DeepSeek may have found a new way to improve AI’s ability to remember


Currently, most large language models break text down into thousands of tiny units called tokens. This turns the text into representations that models can understand. However, these tokens quickly become expensive to store and compute with as conversations with users grow longer. When a user chats with an AI over a long period, this can cause the AI to forget things it has been told and to muddle information, a problem some call “context rot.”
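
To make the scale of the problem concrete, here is a minimal Python sketch, using the open-source tiktoken tokenizer purely as an illustrative stand-in (the article does not specify which tokenizer DeepSeek uses), showing how the tokens a model must store and attend over pile up as a chat grows:

```python
# Illustrative only: count tokens with an off-the-shelf tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

message = "Could you summarize the report I pasted earlier?"
print(f"{len(message)} characters -> {len(enc.encode(message))} tokens")

# Chat models typically re-read the whole history on every turn, so
# the tokens to store and attend over grow with each exchange.
history = [message] * 200  # a 200-turn conversation, for illustration
total = sum(len(enc.encode(turn)) for turn in history)
print(f"200-turn history -> {total} tokens per new reply")
```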

The new method developed by DeepSeek (and published in its latest paper) could help to overcome this issue. Instead of storing words as tokens, its system packs written information into image form, almost as if it were photographing pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.
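
As a rough sketch of the idea, text can be “photographed” onto a canvas and counted in visual tokens rather than text tokens. The patch size and compression ratio below are assumptions for illustration; the article does not give DeepSeek’s actual figures.

```python
# Conceptual sketch: render a page of text as an image and compare its
# text-token count with a compressed visual-token budget. The 16-pixel
# patches and 16x compression are illustrative assumptions, not
# figures from DeepSeek's paper.
from PIL import Image, ImageDraw
import tiktoken

text = open("page.txt").read()  # roughly one printed page of text
enc = tiktoken.get_encoding("cl100k_base")
print(f"text tokens:   {len(enc.encode(text))}")

# "Photograph" the page: draw the text onto a fixed-size canvas.
img = Image.new("RGB", (1024, 1024), "white")
ImageDraw.Draw(img).text((16, 16), text, fill="black")

# A vision encoder splits the image into patches; a compressor then
# reduces them to a much smaller set of visual tokens (assumed 16x).
raw_patches = (img.width // 16) * (img.height // 16)  # 4096 patches
visual_tokens = raw_patches // 16                     # 256 tokens
print(f"visual tokens: {visual_tokens}")
```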

Essentially, the OCR model serves as a test bed for these new methods, which allow more information to be packed into AI models more efficiently.

Besides using visual tokens instead of just text tokens, the model is built on a type of tiered compression that is not unlike the way human memories fade: older or less critical content is stored in a slightly blurrier form in order to save space. Even so, the paper’s authors argue, this compressed content can remain accessible in the background while maintaining a high level of system efficiency.
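
Here is a minimal sketch of what such tiering could look like, assuming the simplest possible scheme (the resolution tiers below are invented for illustration, not taken from the paper): older pages are re-stored at lower resolution, so they cost fewer visual tokens while staying retrievable.

```python
# Hypothetical tiered compression: older context images are downsampled,
# so they occupy fewer visual tokens -- a stand-in for the paper's idea
# that older memories are kept in blurrier form.
from PIL import Image

PATCH = 16  # assumed pixels per square patch (one patch = one token)

def tier_side(age_in_turns: int) -> int:
    """Newer context stays sharp; older context fades (illustrative tiers)."""
    if age_in_turns < 10:
        return 1024  # recent: full resolution
    if age_in_turns < 100:
        return 512   # older: a quarter of the pixels
    return 256       # oldest: a blurry thumbnail

def token_cost(img: Image.Image) -> int:
    return (img.width // PATCH) * (img.height // PATCH)

page = Image.new("RGB", (1024, 1024), "white")  # a stored "page" image
for age in (0, 50, 500):
    stored = page.resize((tier_side(age),) * 2)  # downsampling = fading
    print(f"age {age:>3} turns -> {token_cost(stored):>4} visual tokens")
# age   0 turns -> 4096 visual tokens
# age  50 turns -> 1024 visual tokens
# age 500 turns ->  256 visual tokens
```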

Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek’s model is quickly capturing researchers’ attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper on X, saying that images may ultimately be better than text as inputs for LLMs. Text tokens can be “wasteful and just terrible on the input,” he wrote.

Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing existing challenges in AI memory. “While the idea of using image-based tokens for context storage isn’t entirely new, this is the first study I’ve seen that takes it this far and shows it might actually work,” Li says.
