In the first part of the story, we used a free Google Colab instance to run a Mistral-7B model and extract information using the FAISS (Facebook AI Similarity Search) database. In this part, we will go further: I will show how to run a LLaMA 2 13B model, and we will also test some extra LangChain functionality, like making chat-based applications and using agents. As in the first part, all the components used are based on open-source projects and will work completely for free.
Let’s get into it!
LLaMA.cpp
LLaMA.cpp is a really interesting open-source project, originally designed to run a LLaMA model on MacBooks, but its functionality grew far beyond that. First, it is written in plain C/C++ without external dependencies and can run on almost any hardware (CUDA, OpenCL, and Apple silicon are supported; it can even work on a Raspberry Pi). Second, LLaMA.cpp can be connected with LangChain, which allows us to test a lot of LangChain's functionality for free, without needing an OpenAI key. Last but not least, because LLaMA.cpp works everywhere, it is an excellent candidate to run in a free Google Colab instance. As a reminder, Google provides free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM, which can be opened from the Colab Research page. The notebook is opened in a web browser and the code runs in the cloud, so everybody can use it, even from a minimalistic budget PC.
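To make the "runs in a free Colab instance" point more concrete, here is a back-of-the-envelope estimate of how much memory the model weights alone need. This is a rough sketch under my own assumptions: 16 bits per weight for FP16 and about 4.8 bits per weight as an approximate average for a 4-bit quantized GGUF file, nominal parameter counts of 7B and 13B, and no allowance for the KV cache or runtime overhead, so real memory usage will be higher.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Only the weights are counted; the KV cache, activations, and CUDA
    runtime overhead need additional memory on top of this.
    """
    return n_params * bits_per_weight / 8 / 1e9


COLAB_VRAM_GB = 16  # free-tier GPU memory quoted in the text

# Assumed bits per weight: 16 for FP16, ~4.8 on average for a 4-bit quant.
FORMATS = {"FP16": 16.0, "4-bit quantized (approx.)": 4.8}
MODELS = {"7B": 7e9, "13B": 13e9}  # nominal parameter counts

for model_name, n_params in MODELS.items():
    for fmt_name, bpw in FORMATS.items():
        size_gb = estimate_weights_gb(n_params, bpw)
        verdict = "fit" if size_gb < COLAB_VRAM_GB else "do not fit"
        print(f"{model_name} {fmt_name}: ~{size_gb:.1f} GB, "
              f"weights alone {verdict} in {COLAB_VRAM_GB} GB")
```

Under these assumptions, a 13B model in FP16 needs about 26 GB for its weights and cannot fit into 16 GB of VRAM, while a 4-bit quantized version needs roughly 7.8 GB, which leaves room for the context cache. This is why quantized models are the practical choice for a free Colab GPU.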
Before using LLaMA.cpp, let's install its Python binding, llama-cpp-python. The installation itself is straightforward; we only need to enable the LLAMA_CUBLAS CMake option (for CUDA GPU support) before running pip:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False
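The download command above saves the model file into /content. As an optional sanity check before loading it, the sketch below verifies that the file really is in GGUF format by reading its header: a GGUF file starts with the four ASCII bytes "GGUF", followed by a little-endian 32-bit format version. The helper name read_gguf_version is my own, hypothetical; it is not part of llama-cpp-python or huggingface-hub.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these four ASCII bytes


def read_gguf_version(path) -> int:
    """Return the GGUF format version, or raise ValueError if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # The magic is followed by the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    # Path produced by the huggingface-cli command above.
    model_path = Path("/content/llama-2-7b-chat.Q4_K_M.gguf")
    if model_path.exists():
        print(f"{model_path.name}: GGUF v{read_gguf_version(model_path)}, "
              f"{model_path.stat().st_size / 1e9:.2f} GB")
    else:
        print(f"{model_path} not found; check the download step")
```

A truncated or failed download (for example, an HTML error page saved under the .gguf name) will fail this check immediately, which is easier to diagnose than an error raised later while loading the model.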
For the first test, I will be using a 7B model. In the commands above, I also installed the huggingface-hub library, which allows us to automatically download a "Llama-2-7b-Chat" model in the GGUF format needed for LLaMA.cpp. I also installed a LangChain…