LLMs for Everyone: Running the LLaMA-13B model and LangChain in Google Colab

-

Experimenting with Large Language Models at no cost (Part 2)

Photo by Glib Albovsky, Unsplash

In the first part of this story, we used a free Google Colab instance to run a Mistral-7B model and extract information using the FAISS (Facebook AI Similarity Search) database. In this part, we will go further, and I will show how to run a LLaMA 2 13B model; we will also test some extra LangChain functionality, like making chat-based applications and using agents. In the same way as in the first part, all components used are based on open-source projects and will work completely for free.

Let’s get into it!

LLaMA.cpp

LLaMA.CPP is a very interesting open-source project, originally designed to run LLaMA models on MacBooks, but its functionality grew far beyond that. First, it is written in plain C/C++ without external dependencies and can run on almost any hardware (CUDA, OpenCL, and Apple Silicon are supported; it can even work on a Raspberry Pi). Second, LLaMA.CPP can be connected with LangChain, which allows us to test a lot of its functionality for free without needing an OpenAI key. Last but not least, because LLaMA.CPP works everywhere, it is a good candidate to run in a free Google Colab instance. As a reminder, Google provides free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM, which can be opened from the Colab Research page. The code is opened in the web browser and runs in the cloud, so everybody can access it, even from a minimalistic budget PC.

Before using LLaMA, let’s install the library. The installation itself is straightforward; we only need to enable LLAMA_CUBLAS before using pip:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False

For the first test, I will be using a 7B model. Here, I also installed the huggingface-hub library, which allows us to automatically download the “Llama-2-7b-Chat” model in the GGUF format needed for LLaMA.CPP. I also installed LangChain and the sentence-transformers library, which we will use later.
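As a quick preview of how the downloaded file can be used, here is a minimal sketch of loading it with LangChain’s LlamaCpp wrapper. The parameter values (n_gpu_layers, n_ctx, temperature) and the test prompt are illustrative assumptions, not tuned settings, and depending on the LangChain version, the import may come from langchain_community.llms instead of langchain.llms:

# A minimal sketch: load the GGUF model downloaded above with LangChain's LlamaCpp wrapper.
# Parameter values here are illustrative assumptions, not tuned settings.
from langchain.llms import LlamaCpp  # or: from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf",  # file downloaded by huggingface-cli above
    n_gpu_layers=-1,   # offload all layers to the Colab GPU
    n_ctx=2048,        # context window size
    temperature=0.1,
    verbose=True,
)

print(llm.invoke("Name the planets of the Solar System."))

If everything is installed correctly, the model should load into the Colab GPU and return a plain-text answer to the prompt.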
