How to Run Your Own LLaMA
Download LLaMA Weights
Set Up Conda and create an environment for LLaMA
Create env and install dependencies
Create a swapfile
Run the models
Add custom prompts
Level Up Coding

Set Up Conda and create an environment for LLaMA

  1. Open a terminal and run: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  2. Run chmod +x Miniconda3-latest-Linux-x86_64.sh
  3. Run ./Miniconda3-latest-Linux-x86_64.sh
  4. Accept the default options. When it shows you the license, press q to proceed with the installation.
  5. Refresh your shell by logging out and logging back in.
Create env and install dependencies

  1. Create an env: conda create -n llama
  2. Activate the env: conda activate llama
  3. Install the dependencies:
    NVIDIA:
    conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
    AMD:
    pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2
  4. Clone the INT8 repo by the user tloen: git clone https://github.com/tloen/llama-int8 && cd llama-int8
  5. Install the requirements: pip install -r requirements.txt && pip install -e .
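Once the install finishes, a quick sanity check can confirm the packages landed in the active env. This is a minimal sketch; the module names to check are assumptions taken from the install commands above:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be found in this env."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Modules the steps above should have installed (names assumed from the commands).
deps = ["torch", "torchvision", "torchaudio"]
print(missing_modules(deps))  # an empty list means everything is importable
```

Run it inside the activated llama env; any name it prints still needs installing.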
Create a swapfile

  1. Create a swapfile: sudo dd if=/dev/zero of=/swapfile bs=4M count=13000 status=progress. This creates a roughly 50 GB swapfile (13000 blocks of 4 MB each). Adjust the count to your preference.
  2. Mark it as swap: sudo mkswap /swapfile
  3. Activate it: sudo swapon /swapfile
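The dd arguments determine the swapfile size: bs=4M writes 4 MiB blocks and count=13000 writes 13000 of them. A small sketch of the arithmetic, if you want to size the file differently:

```python
# dd sizing: total bytes = block size * block count
BS_BYTES = 4 * 1024 * 1024   # bs=4M -> 4 MiB per block
COUNT = 13000                # count=13000 blocks

total_bytes = BS_BYTES * COUNT
print(total_bytes)                      # 54525952000 bytes
print(round(total_bytes / 1024**3, 1))  # 50.8 GiB
```

To target a different size, divide the desired byte count by the block size to get the count to pass to dd.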
Run the models

  1. Open a terminal in your llama-int8 folder (the one you cloned).
  2. Run: python example.py --ckpt_dir ~/Downloads/LLaMA/7B --tokenizer_path ~/Downloads/LLaMA/tokenizer.model --max_batch_size=1
  3. You’re done. Wait for the model to complete loading and it’ll generate a prompt.
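If you plan to switch between model sizes, the command above can be assembled programmatically. A sketch, where the paths mirror the example above and are placeholders for wherever you extracted the weights:

```python
import os

def build_cmd(model_dir, tokenizer_path, max_batch_size=1):
    """Assemble the argv list for example.py using the flags shown above."""
    return [
        "python", "example.py",
        "--ckpt_dir", os.path.expanduser(model_dir),
        "--tokenizer_path", os.path.expanduser(tokenizer_path),
        f"--max_batch_size={max_batch_size}",
    ]

# Swap "7B" for another size directory (e.g. 13B) to run a larger model.
cmd = build_cmd("~/Downloads/LLaMA/7B", "~/Downloads/LLaMA/tokenizer.model")
print(" ".join(cmd))
```

Pass the list to subprocess.run(cmd) from the llama-int8 directory, or just copy the printed line into your terminal.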
Add custom prompts

  1. Open the example.py file in the llama-int8 directory.
  2. Navigate to line 136. It starts with triple quotes, """.
  3. Replace the existing prompt with whatever you have in mind.
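The prompt at that line is just a triple-quoted Python string, so editing it amounts to swapping one literal. A hypothetical replacement might look like this (the wording below is an example only, not what ships in example.py):

```python
# Hypothetical replacement for the triple-quoted prompt in example.py.
prompt = """Write a short poem about running large language models at home."""

# The model continues from whatever text the string contains,
# so any non-empty string works as a starting prompt.
print(repr(prompt))
```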
