When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all the other words, based on how the word appears in countless examples across the model's training data.
Each word gets replaced by a form of code?
Yeah. But there's a bit more to it. The numerical value that represents each word, known as an embedding, is in fact a list of numbers, with each number in the list representing a distinct facet of meaning that the model has extracted from its training data. The length of this list is another thing that LLM designers can specify before an LLM is trained. A typical size is 4,096.
Every word inside an LLM is represented by a list of 4,096 numbers?
Yup, that's an embedding. And each of those numbers is tweaked during training. An LLM with embeddings that are 4,096 numbers long is said to have 4,096 dimensions.
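To make that concrete, here is a minimal sketch (not any real model's code) of an embedding table, using a hypothetical five-word toy vocabulary and NumPy. Looking up a word's embedding is just indexing a row of a big matrix; training then nudges every number in that matrix.

```python
import numpy as np

# Toy example: a 5-word vocabulary with 4,096-dimensional embeddings.
# In a real LLM the table holds tens of thousands of entries, and every
# number in it gets tweaked during training.
vocab = {"table": 0, "chair": 1, "astronaut": 2, "moon": 3, "Musk": 4}
embedding_dim = 4096

# Start the table with small random values, as training typically would.
rng = np.random.default_rng(seed=0)
embedding_table = rng.normal(scale=0.02, size=(len(vocab), embedding_dim))

# A word's embedding is simply its row in the table.
vector_for_table = embedding_table[vocab["table"]]
print(vector_for_table.shape)  # (4096,): one number per dimension
```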
Why 4,096?
It might seem like a weird number. But LLMs (like anything that runs on a computer chip) work best with powers of two: 2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capability and efficiency. Models with fewer dimensions are less capable; models with more dimensions are too expensive or slow to train and run.
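Indeed, 4,096 is 2 raised to the 12th. A quick sanity check, plus a common bit-twiddling test for powers of two:

```python
# 4,096 is a power of two: 2 ** 12.
assert 4096 == 2 ** 12

def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two exactly when it has a
    # single 1 bit, so n & (n - 1) clears that bit and leaves zero.
    return n > 0 and (n & (n - 1)) == 0

print(is_power_of_two(4096))  # True
print(is_power_of_two(4095))  # False
```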
Using more numbers allows the LLM to capture very fine-grained details about how a word is used in many different contexts, what subtle connotations it might have, how it relates to other words, and so on.
Back in February, OpenAI released GPT-4.5, the firm's largest LLM yet (some estimates have put its parameter count at more than 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that larger models can work with more information, like emotional cues, such as when a speaker's words signal hostility: "All of those subtle patterns that come through a human conversation—those are the bits that these bigger and bigger models will pick up on."
The upshot is that all the words inside an LLM get encoded into a high-dimensional space. Picture thousands of words floating in the air around you. Words that are closer together have similar meanings. For example, "table" and "chair" will be closer to each other than they are to "astronaut," which is near "moon" and "Musk." Way off in the distance you can see "prestidigitation." It's a bit like that, but instead of being related to one another across three dimensions, the words inside an LLM are related across 4,096 dimensions.
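One common way to measure that closeness is cosine similarity, the cosine of the angle between two embedding vectors. The sketch below uses hand-picked three-dimensional stand-ins (real embeddings would be 4,096 numbers long, learned rather than chosen) just to show the idea:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative 3-dimensional vectors, chosen so "table" and "chair"
# point in roughly the same direction while "astronaut" does not.
embeddings = {
    "table":     np.array([0.9, 0.8, 0.1]),
    "chair":     np.array([0.8, 0.9, 0.2]),
    "astronaut": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(embeddings["table"], embeddings["chair"]))      # high
print(cosine_similarity(embeddings["table"], embeddings["astronaut"]))  # low
```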
Yikes.
It's dizzying stuff. In effect, an LLM compresses the whole web into a single monumental mathematical structure that encodes an unfathomable amount of interconnected information. It's both why LLMs can do astonishing things and why they're impossible to fully understand.
