Inside LLaMA: Meta AI Recent Large Language Model that Outperforms GPT-3 Across Many Tasks

Created Using Midjourney

I recently started an AI-focused educational newsletter that already has over 150,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

Large Language Models (LLMs) have recently taken the world by storm with their remarkable ability to perform new tasks from textual instructions or a handful of examples. This ability, often called few-shot learning, was first observed when models were scaled up to a sufficient size. As a result, researchers have focused on scaling these models even further, under the general assumption that more parameters lead to better performance. However, recent research has shown that, for a given compute budget, the best performance is not achieved by the largest models. Instead, smaller models trained on more data outperform their larger counterparts. In that context, Meta AI recently published a paper detailing LLaMA, a 65B-parameter LLM that is able to outperform GPT-3 across many tasks despite being significantly smaller.

The core principle behind LLaMA is to achieve the best possible performance at various inference budgets by training on more tokens than is typically used. LLaMA ranges from 7B to 65B parameters and is competitive with the best existing LLMs. For example, LLaMA-13B outperforms GPT-3 on most benchmarks despite being 10× smaller. This model is likely to democratize access to and study of LLMs, since it can be run on a single GPU. At the higher end of the scale, the 65B-parameter model is also competitive with the best large language models, such as Chinchilla or PaLM-540B.
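To get an intuition for this trade-off, we can plug the publicly reported parameter and token counts into the rough approximations commonly used in the scaling-law literature: about 6·N·D FLOPs to train a model with N parameters on D tokens, and about 2·N FLOPs per generated token at inference. This is a back-of-the-envelope sketch, not a calculation from the LLaMA paper:

# Rough scaling-law approximations (not figures from the paper):
# training compute ~ 6 * N * D, inference compute per token ~ 2 * N.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params: float) -> float:
    """Approximate inference compute per generated token in FLOPs."""
    return 2 * n_params

# GPT-3: 175B parameters trained on roughly 300B tokens.
# LLaMA-13B: 13B parameters trained on 1T tokens.
print(f"GPT-3 training:            {train_flops(175e9, 300e9):.2e} FLOPs")
print(f"LLaMA-13B training:        {train_flops(13e9, 1.0e12):.2e} FLOPs")
print(f"GPT-3 inference/token:     {inference_flops_per_token(175e9):.2e} FLOPs")
print(f"LLaMA-13B inference/token: {inference_flops_per_token(13e9):.2e} FLOPs")

Under these approximations, LLaMA-13B is roughly 13 times cheaper per generated token than GPT-3, which is exactly the kind of inference budget the training strategy is optimized for.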

What sets LLaMA apart from other models is that it only uses publicly available data, making it compatible with open sourcing. Most existing models rely on data that is either not publicly available or undocumented. Although there are some exceptions, such as OPT, GPT-NeoX, BLOOM, and GLM, none of them are competitive with PaLM-62B or Chinchilla.

LLaMA's model is based on the standard transformer architecture, incorporating various improvements from recent research, such as pre-normalization, the SwiGLU activation function, and rotary embeddings. To improve training stability, LLaMA normalizes the input of every transformer sub-layer using the RMSNorm normalizing function instead of normalizing the output as in the original architecture. Moreover, LLaMA replaces the ReLU non-linearity with the SwiGLU activation function, using a hidden dimension of 2/3 · 4d, and removes the absolute positional embeddings, instead adding rotary positional embeddings (RoPE) at each layer of the network.
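As a rough illustration of two of these changes, here is a simplified PyTorch sketch (not Meta AI's actual implementation) of RMSNorm pre-normalization and a SwiGLU feed-forward block with a 2/3 · 4d hidden dimension; rotary embeddings are omitted for brevity:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features, with no mean-centering.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        hidden = int(2 * 4 * dim / 3)                  # 2/3 * 4d hidden dimension
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)   # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)   # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Pre-normalization: the sub-layer sees a normalized input, and its output is
# added back to the residual stream.
dim = 512
norm, ffn = RMSNorm(dim), SwiGLUFeedForward(dim)
x = torch.randn(2, 16, dim)          # (batch, sequence, dim)
x = x + ffn(norm(x))                 # residual + pre-norm sub-layer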

To improve training efficiency, Meta AI used an efficient implementation of the causal multi-head attention operator that reduces memory usage and computation, along with activation checkpointing: expensive activations are saved rather than recomputed during the backward pass, which requires manually implementing the backward function for the transformer layers. Finally, to reduce memory usage further, LLaMA relies on model and sequence parallelism and overlaps the computation of activations with the communication between GPUs over the network. The result was visible during the training of the 65B-parameter model: LLaMA processed roughly 380 tokens/sec/GPU on 2048 A100 GPUs with 80GB of RAM, taking around 21 days to train on a dataset containing 1.4T tokens.
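These ideas can be approximated with standard PyTorch utilities; the following is only a conceptual sketch, not the custom implementation described in the paper. scaled_dot_product_attention provides a memory-efficient causal attention kernel, and torch.utils.checkpoint trades compute for memory by recomputing the wrapped activations during the backward pass instead of storing them:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

class CausalSelfAttention(nn.Module):
    """Toy causal self-attention layer using PyTorch's fused attention kernel."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for the attention kernel.
        shape = (b, t, self.n_heads, d // self.n_heads)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        # Memory-efficient causal attention (PyTorch >= 2.0).
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))

layer = CausalSelfAttention(dim=512, n_heads=8)
x = torch.randn(2, 128, 512, requires_grad=True)

# Activation checkpointing: the intermediate activations of the wrapped call
# are not stored; they are recomputed when gradients are needed.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()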

LLaMA was evaluated on 20 benchmarks, including zero-shot and few-shot tasks, and compared with other foundation models, such as GPT-3, Gopher, Chinchilla, and PaLM, as well as the OPT models, GPT-J, and GPT-Neo. Results showed that LLaMA was able to outperform GPT-3 despite being 10 times smaller.

Image Credit: Meta AI

Some of LLaMA's outputs are remarkably sophisticated and factually accurate, showing strong signs of reasoning.

Image Credit: Meta AI

LLaMA hasn't been open-sourced yet. Nevertheless, wasting no time, AI startup Nebuly released ChatLLaMA, an open-source implementation built on LLaMA and RLHF. ChatLLaMA enables the implementation of a ChatGPT-style service using pre-trained LLaMA models. Compared with the original ChatGPT, ChatLLaMA offers faster and cheaper training and single-GPU inference, thanks to the smaller size of the LLaMA architectures. The library also includes built-in support for DeepSpeed ZeRO, allowing you to speed up the fine-tuning process, and it supports all LLaMA model sizes (7B, 13B, 33B, 65B), giving you the flexibility to fine-tune the model based on your preferences for training time and inference performance.

The code for using ChatLLaMA is super easy, as illustrated below:

from chatllama.rlhf.trainer import RLTrainer
from chatllama.rlhf.config import Config

# Load the training settings from a YAML configuration file.
path = "path_to_config_file.yaml"
config = Config(path=path)

# Set up and run the RLHF training loop, then plot training statistics.
trainer = RLTrainer(config.trainer)
trainer.distillate()
trainer.train()
trainer.training_stats.plot()

LLaMA is certainly a very interesting development in the LLM space. Meta AI has enabled early access to the model, and hopefully a generally available release will follow soon.
