Extending Transformers by memorizing up to 262K tokens
This article is about a superb attempt to let Transformer language models memorize information with the least required effort. The point is that it can also be used with already available pre-trained models.
Three important questions you should know about:
We have all heard a lot about language models these days, but we usually take [large] pre-trained models and then fine-tune them; if not, we would have to train models on large datasets ourselves to get good generalization. Both retraining and fine-tuning, however, change what is stored within the model.
Retraining or fine-tuning a model to specialize in a specific subject causes it to forget some of the previously gained knowledge. To solve this issue, the Google researchers describe adding an external memory of (key, value) pairs into the attention (the stored keys and values are not trained; note that only the queries learn how to use them).
To prove the point, they used a variety of datasets and benchmarks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), and formal theorems (Isabelle). Finally, the results are significant: the memory can be increased up to 262K tokens with only minimal changes in the code.
Okay!! Now let’s see the solution in detail 😉
Procedure
The procedure is 95 percent the same as in the popular, general attention-based Transformer paper “Attention Is All You Need!”. First, the sentences are tokenized; then an embedding layer transforms the tokens into embeddings. The embeddings are passed to the attention layers, which model the interactions between tokens, and then to the FFN (Feed Forward Network). Finally, we use the token embeddings of the last layer to predict the next token, on and on.
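To make that concrete, here is a rough sketch in plain NumPy of one decoder block as described above (causal self-attention followed by an FFN); the weight names and shapes are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decoder_block(x, Wq, Wk, Wv, W1, W2):
    """One attention + FFN block; x has shape (seq_len, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: each token may only attend to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    h = x + softmax(scores) @ v                  # attention + residual
    return h + np.maximum(0.0, h @ W1) @ W2      # FFN (ReLU) + residual

# The embeddings coming out of the last block are projected onto the vocabulary
# to get the logits used for predicting the next token.
```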
Q) What if we want to train on long documents?
A) We divide each document into subsequences of 512 tokens, where each subsequence is one input. As you can see in Figure 2:
From the figure, it’s obvious that the subsequences of a document are processed one after another. Note: instead of shuffling the subsequences, we feed them to the model sequentially, using truncated backpropagation through time. Also, the Transformer-XL style cache is used. What’s that?? We keep the keys and values of the previous subsequence and let the current subsequence attend to them (by prepending them to the current keys and values). In this way, there is always a sliding context window containing the previous subsequence.
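A minimal sketch of this chunking plus XL-style cache (my own illustration; `model` and its `xl_cache` argument are assumed names, not the official API):

```python
SEG_LEN = 512  # length of each subsequence

def train_on_document(token_ids, model):
    cache = None  # (keys, values) of the previous subsequence, Transformer-XL style
    for start in range(0, len(token_ids), SEG_LEN):
        segment = token_ids[start:start + SEG_LEN]
        # The model attends to the cached keys/values of the previous segment
        # in addition to the current one; no gradients flow into the cache.
        loss, cache = model(segment, xl_cache=cache)
        yield loss
```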
KNN-Augmented Attention Layer
We don’t insert this layer into all attention layers, but only into one layer near the top of the stack. Moreover, it performs a kNN lookup into the external memory while still using standard self-attention over the local context.
This layer uses the same queries for the kNN lookup as for the local self-attention. After processing each subsequence, we add its (key, value) pairs to the external memory. For batching, we keep a separate memory for each document and clear it at document boundaries. Subsequently, context over as many tokens as the memory can hold (up to 262K) can be used.
OK, what’s its output?? The output is a set of retrieved memories: the top-k (key, value) pairs that the kNN search returns for each query.
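Here is a simplified sketch of such a memory (an assumed interface, not the paper’s implementation): it stores the (key, value) pairs of each processed subsequence, is cleared at document boundaries, and returns the top-k pairs per query. Exact search is used here for clarity; the paper uses an approximate one.

```python
import numpy as np

class ExternalMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def clear(self):
        """Called at document boundaries so the memory holds one document only."""
        self.keys, self.values = [], []

    def add(self, k, v):
        """k, v: (seq_len, d_head) keys/values of the subsequence just processed."""
        self.keys.append(k)
        self.values.append(v)

    def topk(self, queries, k=32):
        """Return the top-k (key, value) pairs for each query (exact search)."""
        K = np.concatenate(self.keys)          # (mem_size, d_head)
        V = np.concatenate(self.values)
        scores = queries @ K.T                 # dot-product similarity
        idx = np.argsort(-scores, axis=-1)[:, :k]
        return K[idx], V[idx]                  # (num_queries, k, d_head) each
```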
Q) So what’s the procedure for combining them with local attention??
A) Just as in the popular paper, the queries attend to the retrieved keys to aggregate the retrieved values, which gives a memory result (Vm); in parallel, standard local self-attention gives a local result (Vc). Then, a learned gate is calculated to combine the two.
If we want to take a look at the formulation:
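Following the paper’s description, the memory result and the local attention result are mixed with a learned, per-head gate:

Va = g ⊙ Vm + (1 − g) ⊙ Vc,  with g = σ(bg),

where Vm is the attention result over the retrieved (key, value) pairs, Vc is the result of local self-attention, and bg is a learned bias (one per head).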
The gate g lets each head interpolate between the local context and the external memory. In this fashion, the heads can learn when to rely on the retrieved memories and when to stick to local attention. A small sketch of this per-head gating follows below.
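As a sketch (illustrative shapes, not the official code), the gate is just one learned scalar per head, broadcast over the sequence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_heads(v_local, v_mem, gate_bias):
    """v_local, v_mem: (num_heads, seq_len, d_head); gate_bias: (num_heads,)."""
    g = sigmoid(gate_bias)[:, None, None]    # one learned gate per head
    return g * v_mem + (1.0 - g) * v_local   # each head picks memory vs. local context
```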
What about Distributional Shift?
Because the stored keys and values are never updated, they were produced by older versions of the model parameters. As training progresses they become stale, so there is a shift between the distribution of the stored (key, value) pairs and the distribution of the current queries. The memory within the model thus becomes a kind of outdated snapshot of it. In practice this is manageable, but there is a mismatch between the old entries in the memory and the fresh keys and values of the current subsequence.
Approximate KNN
Results show that approximate kNN search scales much better than exact search. Why?? Because it avoids comparing each query against every single entry in the memory. Here, they used a simple approximate kNN, but there are other options, such as ScaNN or Faiss (scalable to billions of entries).
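Purely as an illustration (not the setup used in the paper), here is how an approximate search over stored keys could look with a library like Faiss, trading a little recall for a large speed-up:

```python
import numpy as np
import faiss

d = 64                                        # dimension of the attention keys
keys = np.random.randn(100_000, d).astype("float32")   # stored memory keys
queries = np.random.randn(512, d).astype("float32")    # queries of one subsequence

quantizer = faiss.IndexFlatIP(d)              # coarse quantizer (inner product)
index = faiss.IndexIVFFlat(quantizer, d, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(keys)                             # learn 1024 coarse clusters
index.add(keys)
index.nprobe = 8                              # only search 8 clusters -> approximate

scores, ids = index.search(queries, 32)       # top-32 neighbours per query
```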
Well, that covers the technical part; now let’s dive into the experiments they did.
It’s common for all papers to claim that their approach outperformed its predecessors or has some advantages over its counterparts. This one is no different, but I’m not gonna re-write all of that here; I’ll stick to the boldest results.
Effect of External Memory
As you can see, adding external memory to both the vanilla Transformer (the popular one) and Transformer-XL considerably improves perplexity. For instance, on the PG-19 dataset, adding a memory of size 8192 improves the vanilla Transformer’s perplexity from 13.71 to 12.39, and the same holds for Transformer-XL.
Increasing the size of the memory increases the benefit of the memory
The best perplexity, across all models and datasets, is achieved with a memory size of 65K.
Is this approach scalable (from an architectural perspective)??
It seems yes; they checked this by increasing the model size up to 1 and 8 billion parameters.
The result, in their words: “”
Finetuning on a bigger memory
It showed unstable training.
Finetuning a non-memory model to use memory
Q) Can we take a pre-trained Transformer and then finetune it to use external memory?
A) Yes, of course.
The model quickly learns to make use of the memory. (Here, fine-tuning only takes a small number of steps, which is only a small fraction of the pre-training.)
Which tokens benefit from memory?
We can see that the benefit is sparse: most tokens gain little, while a small fraction of tokens gain a lot from the memory.
- The main point of this approach is that you need only a minimal amount of code changes to adopt this external memory. That is a big plus for everyone who is using Transformers in their work.
- The official code is publicly available.
- The main paper: “Memorizing Transformers” (Wu et al., ICLR 2022).