
LLaMA: LLMs for Everyone!


High-performing language models that are open-source…

(Photo by Raspopova Marina on Unsplash)

For years, the deep learning community has embraced openness and transparency, leading to massive open-source projects like HuggingFace. Many of the most profound ideas in deep learning (e.g., transformers [2], self-supervised learning, etc.) are openly available online, either via public code repositories or arXiv. Although open-source has been the norm for quite some time, the popularity (and commercial applicability) of large language models (LLMs) has recently challenged this trend.

Many of the most powerful LLMs available today can only be accessed via APIs (e.g., from OpenAI or Anthropic), making the source code and model parameters inaccessible to researchers and developers. While it's not my goal to spark a moral discussion of current trends in the LLM landscape, this fact is relevant to the topic of this post: openly-available LLMs. Interestingly, not all powerful language foundation models are hidden behind a paywall. Some models, such as LLaMA, are both openly available and incredibly high-performing, thus maintaining a sense of openness in the deep learning research community.

LLaMA is not a single model, but rather a suite of LLMs with sizes ranging from 7 billion to 65 billion parameters. Taking inspiration from Chinchilla [3], these LLMs are a bit smaller than their counterparts but are pre-trained extensively (i.e., smaller models, more tokens) and developed with the goal of providing a diverse group of models with different tradeoffs between performance and inference efficiency. LLaMA models perform surprisingly well; e.g., the 13 billion parameter model is roughly comparable to GPT-3 [4], while the 65 billion parameter model often surpasses the performance of PaLM [5].
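Because the weights are openly available, these models can be run locally with standard open-source tooling rather than through a paywalled API. Below is a minimal sketch of loading and prompting the smallest model, assuming the weights have been converted to the Hugging Face `transformers` format; the repository id shown is an illustrative community mirror, not an official release.

```python
# Minimal sketch: run an openly available LLaMA checkpoint locally.
# Assumes weights converted to the Hugging Face format; the repo id
# "huggyllama/llama-7b" is a community mirror used here for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # illustrative, not an official Meta release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to cut memory roughly in half
    device_map="auto",          # place layers on available GPUs/CPU (needs accelerate)
)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("The LLaMA models were trained on", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same snippet works for the larger checkpoints by swapping the repository id; the half-precision and device-mapping options matter more as the parameter count grows, since the 65 billion parameter model will not fit on a single consumer GPU.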

“GPT-4 has learned from a variety of licensed, created, and publicly available data sources, which may include publicly available personal information.” — from [6]

Beyond their impressive performance, LLaMA models use only publicly available data for pre-training. Taking a step (back) towards open-source within the LLM landscape, LLaMA models can be reproduced completely from online resources. Recent models such as GPT-4 are known to have been trained with a combination of public and…
