LLaMA: Open and Efficient Foundation Language Models | by Roopal Jain | May, 2023

LLaMA is a set of foundation language models ranging from 7B to 65B parameters.

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.

Their training data is a mixture of large, publicly available text corpora: CommonCrawl, C4, GitHub, Wikipedia, Books, arXiv, and Stack Exchange.

They tokenize the data with the byte-pair encoding (BPE) algorithm, using the SentencePiece implementation. After tokenization, the entire training set contains roughly 1.4T tokens.
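
As a quick illustration (not the authors' exact tokenizer setup), here is how a trained SentencePiece BPE model could be loaded and applied; the "tokenizer.model" path is a placeholder, not a file that ships with the paper.

```python
# A minimal sketch of BPE tokenization with SentencePiece; the
# "tokenizer.model" path is a placeholder for a trained tokenizer file.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

text = "LLaMA models are trained on roughly 1.4T tokens."
ids = sp.encode(text, out_type=int)       # text -> BPE token ids
pieces = sp.encode(text, out_type=str)    # text -> subword pieces

print(len(ids), pieces[:8])
print(sp.decode(ids))                     # round-trip back to text
```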

  • To improve training stability, they normalize the input of each transformer sub-layer instead of normalizing the output, using the RMSNorm normalizing function.
  • They replaced the ReLU non-linearity with the SwiGLU activation function.
  • They replaced the absolute positional embeddings with rotary positional embeddings (RoPE); a minimal sketch of these three components follows this list.
  • The AdamW optimizer was used with the following hyperparameters: β1 = 0.9 and β2 = 0.95.
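
As a rough illustration of these modifications, here is a minimal PyTorch sketch under my own assumptions about shapes and names; it is not the authors' implementation.

```python
# Minimal sketches of RMSNorm pre-normalization, a SwiGLU feed-forward
# block, and rotary position embeddings; dimensions are illustrative.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, applied to sub-layer inputs."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block replacing the ReLU MLP."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.w2 = nn.Linear(dim, hidden, bias=False)   # value projection
        self.w3 = nn.Linear(hidden, dim, bias=False)   # output projection

    def forward(self, x):
        return self.w3(nn.functional.silu(self.w1(x)) * self.w2(x))

def rotary(x, base=10000.0):
    """Apply rotary position embeddings to (batch, seq, dim) activations."""
    b, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=x.dtype) / half)
    angles = torch.arange(t, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```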

They used a cosine learning rate schedule, a weight decay of 0.1, and gradient clipping at 1.0.
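
A small sketch of what such a schedule and optimizer setup could look like in PyTorch (the paper also reports 2,000 warmup steps and a final learning rate of 10% of the peak); the tiny model, peak learning rate, and step count below are placeholders.

```python
# Sketch of a cosine LR schedule with warmup, AdamW weight decay of 0.1,
# and gradient clipping at 1.0; the tiny model and data are placeholders.
import math
import torch

def cosine_lr(step, max_steps, peak_lr, warmup=2000, min_ratio=0.1):
    """Linear warmup, then cosine decay down to min_ratio * peak_lr."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1 - min_ratio) * cosine)

model = torch.nn.Linear(16, 1)                                   # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)

max_steps = 5000
for step in range(max_steps):
    lr = cosine_lr(step, max_steps, peak_lr=3e-4)
    for group in optimizer.param_groups:
        group["lr"] = lr

    x = torch.randn(8, 16)                                       # dummy batch
    loss = model(x).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)      # clip at 1.0
    optimizer.step()
    optimizer.zero_grad()
```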

This preprocessing involves deduplicating the data, removing non-English pages, filtering low-quality content, stripping HTML tags, and discarding pages not classified as references.
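
A toy sketch of that kind of filtering (my own simplified stand-in, not the pipeline used in the paper; `detect_language` and the per-page quality score are hypothetical placeholders):

```python
# Simplified cleaning pass: strip HTML, exact-deduplicate, drop non-English
# and low-quality pages. This is an illustrative stand-in only.
import hashlib
import re

def clean_pages(pages, detect_language, min_quality=0.5):
    """pages: iterable of dicts with 'html' and an optional 'quality' score."""
    seen = set()
    for page in pages:
        text = re.sub(r"<[^>]+>", " ", page["html"])          # remove HTML tags
        text = re.sub(r"\s+", " ", text).strip()

        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:                                     # deduplicate
            continue
        seen.add(digest)

        if detect_language(text) != "en":                      # keep English only
            continue
        if page.get("quality", 1.0) < min_quality:             # quality filter
            continue
        yield text

# Usage with a trivially fake language detector:
pages = [{"html": "<p>Hello world</p>", "quality": 0.9}]
print(list(clean_pages(pages, detect_language=lambda t: "en")))
```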

Several recent works (Zhang et al., 2022; Hoffmann et al., 2022) have used the RealToxicityPrompts benchmark (Gehman et al., 2020) as an indicator of how toxic their models are.

RealToxicityPrompts consists of about 100k prompts that the model must complete; a toxicity score is then automatically evaluated by making a request to the Perspective API.

For each of the 100k prompts, they measure the toxicity score of the model's completion. The score per prompt ranges from 0 (non-toxic) to 1 (toxic).
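
For illustration, scoring a single completion with the Perspective API could look roughly like this. The API key and the example completion are placeholders, and this sketch follows the publicly documented client usage rather than the paper's exact setup.

```python
# Query the Perspective API for a TOXICITY score in [0, 1] for one
# model completion. API_KEY and the completion string are placeholders.
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity_score(text):
    """Return the Perspective API TOXICITY summary score for `text`."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=body).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

completion = "an example model completion"   # in practice: the model's output
print(toxicity_score(completion))
```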

They observed that toxicity increases with the size of the model. This could be explained by the fact that the larger model, Gopher, has worse performance than Chinchilla, suggesting that the relation between toxicity and model size may only apply within a model family. This also makes it difficult to compare toxicity across models from different works.

Most notably, LLaMA-13B outperforms GPT-3 while being more than 10× smaller, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B on most benchmarks. On MMLU, the instruction-finetuned LLaMA-I (65B) scores 68.9%, outperforming most instruction-finetuned models of moderate size, but it is still far from the state-of-the-art GPT code-davinci-002 (77.4%).

Finetuning these models on instructions leads to promising results. One can finetune LLaMA by following this guide: LLaMA (huggingface.co).
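
A hedged sketch of what instruction finetuning through the Hugging Face transformers API could look like; the checkpoint path, toy dataset, and training arguments are placeholders, not the contents of the linked guide.

```python
# Minimal instruction-finetuning sketch with transformers' Trainer.
# "path/to/llama-7b-hf" is a placeholder for a locally converted checkpoint.
from transformers import (LlamaForCausalLM, LlamaTokenizer,
                          Trainer, TrainingArguments)

model_path = "path/to/llama-7b-hf"                  # placeholder checkpoint
tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = LlamaForCausalLM.from_pretrained(model_path)

# Toy instruction-response pair; a real run would use a full dataset.
example = "### Instruction: Say hello.\n### Response: Hello!"
enc = tokenizer(example, truncation=True, max_length=512, return_tensors="pt")
train_dataset = [{"input_ids": enc["input_ids"][0],
                  "attention_mask": enc["attention_mask"][0],
                  "labels": enc["input_ids"][0]}]

args = TrainingArguments(
    output_dir="llama-7b-instruct",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    fp16=True,                                      # requires a GPU
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```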
