A Gentle Introduction to GPT Models
Generative pre-trained language models
4 versions and many more subversions
ChatGPT
How can you use GPT models?
Limitations of GPT models
Conclusion

Image from Pixabay — Modified by the author

With the recent releases of ChatGPT and GPT-4, GPT models have drawn a lot of interest from the scientific community. These recent versions of OpenAI's GPT models are so powerful and versatile that it may take a long time before we can exploit their full potential.

Even though they are impressive, what you may not know is that the main ideas and algorithms behind GPT models are far from new.

Whether you are a seasoned data scientist or simply someone curious about GPT, knowing how GPT models evolved is particularly insightful about the impact of data and what to expect in the coming years.

In this article, I explain how GPT models became what they are today. I will mainly focus on how OpenAI scaled GPT models over the years. I will also give some pointers if you want to start using GPT models.

Generative pre-trained language models

GPT models are language models.

Language models have existed for more than 50 years.

The first generation of language models was "n-gram based": they modeled the probability of a word given some previous words.

For instance, if you have the sentence:

The cat sleeps in the kitchen.

With n=3, a 3-gram language model can give you the probability of "in" following "cat sleeps".
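To make this concrete, here is a toy sketch of how a 3-gram model is estimated, simply by counting:

```python
# A toy 3-gram language model: estimate P(word | two previous words)
# from raw counts. Illustrative only; real n-gram models add smoothing
# to handle sequences never seen in training.
from collections import Counter

corpus = "the cat sleeps in the kitchen . the dog sleeps in the garden .".split()

trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts = Counter(zip(corpus, corpus[1:]))

def prob(w1, w2, w3):
    # P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)
    return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

print(prob("cat", "sleeps", "in"))  # 1.0 in this tiny corpus
```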

n-gram models remained useful in many natural language and speech processing tasks until the beginning of the 2010s.

They suffer from several limitations. The computational complexity increases dramatically with a higher n, so these models were often limited to n=5 or lower.

Then, thanks to neural networks and more powerful machines, this main limitation was alleviated and it became possible to compute probabilities for much longer n-grams, for instance n=20 or higher.

Generating text with these models was also possible, but their outputs were of such poor quality that they were rarely used for this purpose.

Then, in 2018, OpenAI proposed the first GPT model.

GPT stands for "generative pre-trained". "Pre-trained" means that the model was simply trained on a large amount of text to model probabilities, without any other purpose than language modeling. GPT models can then be fine-tuned, i.e., further trained, to perform more specific tasks.

For instance, you can use a small dataset of news summaries to obtain a GPT model very good at news summarization, or fine-tune it on French-English translations to obtain a machine translation system capable of translating from French to English.

Note: The term "pre-training" suggests that the models are not fully trained and that another step is necessary. With recent models, the need for fine-tuning tends to disappear: the pre-trained models are now directly used in applications.
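Even though fine-tuning is less needed today, it helps to see what it looks like. Here is a minimal sketch using the open GPT-2 weights and the Hugging Face transformers and datasets libraries (the closed OpenAI models are fine-tuned through their API instead); my_summaries.txt is a hypothetical text file with one training example per line:

```python
# Fine-tuning the open GPT-2 weights on a custom text file.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load a plain-text file, one training example per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "my_summaries.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False: causal language modeling, i.e., next-token prediction
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```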

GPT models are now very good at almost all natural language processing tasks. I particularly studied their ability to do machine translation, as you can read in the following article:

The scale of the training, and the Transformer neural network architecture that they exploit, are the main reasons why they can generate fluent text.

4 versions and many more subversions

Since 2018 and the first GPT, several versions and subversions of GPT have followed.

GPT and GPT-2

GPT-2 came out only a few months after the first GPT was announced. Note: The term "GPT" was never mentioned in the scientific paper describing the first GPT. Arguably, we could say that "GPT-1" never existed. To the best of my knowledge, it was also never released.

What is the difference between GPT and GPT-2?

The scale. GPT-2 is much bigger than GPT.

GPT was trained on the BookCorpus, which contains 7,000 books. The model has 120 million parameters.

What is a parameter?

A parameter is a variable learned during the model training. Typically, a model with more parameters is bigger and better.
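You can verify such numbers yourself with the openly released models, for instance with the Hugging Face transformers library:

```python
# Count the trainable parameters of the smallest released GPT-2 model.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
print(sum(p.numel() for p in model.parameters()))  # about 124 million,
# matching the "small" version listed below
```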

120 million was a huge number in 2018.

With GPT-2, OpenAI proposed an even bigger model containing 1.5 billion parameters.

It was trained on an undisclosed corpus called WebText. This corpus is 10 times bigger than the BookCorpus (according to the paper describing GPT-2).

OpenAI gradually released 4 versions of GPT-2:

  • small: 124 million parameters
  • medium: 355 million parameters
  • large: 774 million parameters
  • xl: 1.5 billion parameters

They are all publicly available and can be used in commercial products.

While GPT-2-XL excels at generating fluent text in the wild, i.e., without any particular instructions or fine-tuning, it remains far less powerful than more recent GPT models for specific tasks.
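Since the GPT-2 weights are public, generating text locally takes only a few lines, for instance with the Hugging Face transformers library:

```python
# Text generation "in the wild" with GPT-2: no instructions,
# the model simply continues the given text.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # "gpt2-xl" for the largest version
output = generator("Language models have existed for more than", max_new_tokens=40)
print(output[0]["generated_text"])
```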

The release of GPT-2-XL was the last open release of a GPT model by OpenAI. GPT-3 and GPT-4 can only be used through OpenAI's API.

GPT-3

GPT-3 was announced in 2020. With its 175 billion parameters, it was an even bigger jump from GPT-2 than GPT-2 was from the first GPT.

It is also since GPT-3 that OpenAI stopped disclosing precise details about the training of its GPT models.

Today, there are 7 GPT-3 models available through OpenAI's API, but we know only a little about them.

With GPT-3, OpenAI demonstrated that GPT models can be extremely good at specific language generation tasks if the users provide a few examples of the task they want the model to achieve.
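For instance, a few-shot prompt for translation could look like this. This is a sketch with the pre-1.0 openai Python package; "davinci" was the base GPT-3 model exposed by the API, so treat the model name and interface details as illustrative:

```python
# A few-shot prompt: two solved examples of the task, then the model
# completes the third one by imitation.
import openai

prompt = """English: Where is the station?
French: Où est la gare ?

English: I would like a coffee.
French: Je voudrais un café.

English: The weather is nice today.
French:"""

response = openai.Completion.create(model="davinci", prompt=prompt,
                                    max_tokens=30, stop="\n")
print(response["choices"][0]["text"].strip())
```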

GPT-3.5

With the GPT-3 models running in the API and attracting more and more users, OpenAI could collect a very large dataset of user inputs.

They exploited these inputs to further improve their models.

They used a technique called reinforcement learning from human feedback (RLHF). I won't explain the details here, but you can find them in a blog post published by OpenAI.

In a nutshell, thanks to RLHF, GPT-3.5 is much better at following user instructions than GPT-3. OpenAI calls this class of GPT models "InstructGPT".

With GPT-3.5, you can "prompt" the model to perform a specific task without the need to give it examples of the task. You just have to write the "right" prompt to get the best result. This is where "prompt engineering" becomes important, and why expert prompt engineers are receiving incredible job offers.
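Compare this with the few-shot GPT-3 sketch above: with an instruction-following model, the prompt states the task directly, with no examples (same pre-1.0 openai package, purely as a sketch):

```python
# A zero-shot instruction: no examples needed, the prompt is the task.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Translate into French: The weather is nice today."}],
)
print(response["choices"][0]["message"]["content"])
```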

GPT-3.5 is the current model used to power ChatGPT.

GPT-4

GPT-4 was released in March 2023.

We know almost nothing about its training.

The main difference from GPT-3/GPT-3.5 is that GPT-4 is bimodal: it can take both images and text as input.

It can generate text but won't directly generate images. Note: GPT-4 can generate code that will generate an image, or retrieve one from the Web.

At the time of writing these lines, GPT-4 is still in a "limited beta".

ChatGPT

ChatGPT is just a user interface with chat functionalities. When you write something to ChatGPT, a GPT-3.5 model generates the answer.

A particularity of ChatGPT is that it does not take as input only the current query of the user, as an out-of-the-box GPT model would do. To work properly as a chat engine, ChatGPT must keep track of the conversation: what has been said, what the user's goal is, etc.

OpenAI didn't disclose how it does that. Given that GPT models can only accept a prompt of limited length (I'll explain this below), ChatGPT can't just concatenate all the dialogue turns together and put them in the same prompt. Such a prompt could be far too large to be handled by GPT-3.5.
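As an illustration only (again, OpenAI hasn't disclosed its actual method), a chat application could keep just the most recent dialogue turns that fit within a token budget:

```python
# A naive conversation-truncation strategy: keep the newest messages
# whose combined token count fits the budget, drop older ones.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fit_history(messages, budget=3000):
    kept, used = [], 0
    for msg in reversed(messages):            # walk from newest to oldest
        n = len(enc.encode(msg["content"]))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))               # restore chronological order
```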

How can you use GPT models?

You can easily get GPT-2 models online and use them on your computer, as shown above. If you want to run large language models on your machine, you may be interested in reading my tutorial:

For GPT-3 and GPT-3.5, we have no alternative but to use OpenAI's API. You will first have to create an OpenAI account on their website.

Once you have an account, you can start playing with the models inside the "playground", a sandbox that OpenAI provides for experimenting with the models. You can access it only when you are logged in.

If you want to use the models directly in your application, OpenAI and the open-source community offer libraries in many languages, such as Python, Node.js, and PHP, to call the models through the OpenAI API.

You can create and get your OpenAI API key in your OpenAI account. Note: Keep this key secret. Anyone who has it can consume your OpenAI credits.
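A common pattern, in Python for instance, is to read the key from an environment variable so that it never appears in your source code:

```python
import os

import openai

# Never hard-code the API key; read it from the environment instead.
openai.api_key = os.environ["OPENAI_API_KEY"]
```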

Each model has different settings that you can adjust. Keep in mind that GPT models are non-deterministic: if you prompt a model twice with the same prompt, there is a high likelihood that you will get two close but different answers.

Note: If you want to reduce the variations between answers given the same prompt, you can set the model's "temperature" parameter to 0. As a side effect, it will also significantly decrease the diversity of the answers; in other words, the generated text may be more redundant.
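With the openai Python package (pre-1.0 interface), for instance, temperature is just another parameter of the request:

```python
import openai

# temperature=0 makes the output near-deterministic but less varied.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give a synonym of 'fast'."}],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```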

You will also have to care about the "maximum context length". This is the length of your prompt plus the length of the answer generated by GPT. For instance, gpt-3.5-turbo has a maximum context length of 4,096 tokens: if your prompt already contains 3,000 tokens, the answer can't be longer than 1,096 tokens.

A token is the minimal unit of text used by GPT models to generate text. Yes, GPT models are not exactly word generators but rather token generators. A token can be a character, a piece of a word, a word, or even a sequence of words for some languages.

OpenAI gives an example in the API documentation.

"ChatGPT is great!" is encoded into six tokens: ["Chat", "G", "PT", " is", " great", "!"].

As a rule of thumb, expect 750 English words to yield about 1,000 tokens.
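You can reproduce this kind of tokenization yourself with OpenAI's open-source tiktoken library. Note that each model family has its own tokenizer, so the split (and the count) can differ slightly from the documentation example above:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = enc.encode("ChatGPT is great!")
print(len(tokens))                        # number of tokens in the prompt
print([enc.decode([t]) for t in tokens])  # how the text was split
```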

In my opinion, managing the maximum context length is the most tedious part of working with the OpenAI API. First, there is no straightforward way to know how many tokens your prompt contains without tokenizing it (the tiktoken sketch above helps with that). Then, you can't know in advance how many tokens the model's answer will contain.

You have to guess, and you can only guess right if you have some experience with the models. I recommend experimenting a lot with them to better gauge how long the answers to your prompts will be.

If your prompt is too long, the answer will be cut off.

I won't give more details about the API here, as it can become quite technical.

Limitations of GPT models

GPT models are just token generators trained on the Web. They are biased by the content they were trained on and thus can't be considered fully safe.

Since GPT-3.5, OpenAI has trained its models to avoid answering with harmful content. To achieve this, they used machine learning techniques; consequently, this "self-moderation" of the models can't be 100% trusted.

This self-moderation may work for a given prompt, but may then completely fail after changing just one word in this prompt.

I also recommend reading the Terms of Use of OpenAI products. In this document, the limitations of GPT models appear more clearly, in my opinion.

If you plan to build your application with the API, you should pay particular attention to this point:

You must be at least 13 years old to use the Services. If you are under 18 you must have your parent or legal guardian's permission to use the Services. If you use the Services on behalf of another person or entity, you must have the authority to accept the Terms on their behalf. You must provide accurate and complete information to register for an account. You may not make your access credentials or account available to others outside your organization, and you are responsible for all activities that occur using your credentials.

Italy temporarily banned ChatGPT because it could generate inappropriate answers for people under 18, among other reasons.

If you are a developer building an application on top of the OpenAI API, you must check the age of your users.

OpenAI also published a list of usage policies mentioning all the prohibited usages of the models.

Conclusion

GPT models are very simple models whose architecture hasn't evolved much since 2018. But when you train a simple model at a large scale, on the right data and with the right hyperparameters, you can get extremely powerful AI models such as GPT-3 and GPT-4.

They are so powerful that we have not come close to exploring their full potential.

While recent GPT models are not open-source, they remain easy to use with OpenAI's API. You can also play with them through ChatGPT.
