Linear Regression to GPT in Seven Steps

Step 1: Prediction by Linear Regression
Step 2: Prediction by Neural Networks
Step 3: Prediction on a word
Step 4: Prediction for Naive Translation
Step 5: Prediction with Context
Step 6: Prediction of Next Word
Step 7: Prediction for Generative AI
Summary

There is a lot of writing about Generative AI. There are essays dedicated to its applications, its ethical and moral issues, and its risks to human society. If you want to understand the technology itself, there is a wide range of material available, from the original research papers to introductory articles and videos. Depending on your current level and interest, you can find the right resources for study.

This article is written for a specific class of readers. These readers have studied machine learning, though not as a major subject. They are aware that prediction and classification are the two main use cases of ML that cover most of its applications. They have also studied common machine learning algorithms for prediction and classification such as linear regression, logistic regression, support vector machines, decision trees and a bit of neural networks. They may have coded a few small projects in Python using libraries such as scikit-learn, and perhaps used some pre-trained TensorFlow models like ResNet. I believe many students and professionals will be able to relate to this description.

For these readers it is natural to wonder: is generative AI a new kind of ML use case? It certainly looks different from both prediction and classification. There is enough jargon going around to discourage anyone from venturing into generative AI. Terms such as transformers, multi-head attention, large language models, foundational models, sequence to sequence, and prompt engineering can easily persuade you that this is a very different world from the comfortable prediction-classification one we used to know.

The message of this article is that generative AI is just a special case of prediction. If you fit the description of ML enthusiasts I gave earlier, then you can understand the basic working of generative AI in seven easy steps. I start with linear regression (LinReg), the ML technique that everybody knows. In this article I have treated a particular branch of generative AI called Large Language Models (LLMs), mainly because the wildly popular ChatGPT belongs to this branch.

Image by Rajashree Rajadhyax

Step 1: Prediction by Linear Regression

LinReg identifies the best line that represents the given data points. Once this line is found, it is used to predict the output for a new input.

Image by author

We can write the LinReg model as a mathematical function. Written in an easy-to-understand way, it looks like:

new output = Line Function (new input)

We can also draw a schematic for it:

This is prediction at its most basic level. A LinReg model ‘learns’ the best line and uses it for prediction.
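
As an illustration, here is a minimal sketch of this learn-a-line-then-predict workflow, using scikit-learn and made-up data (the numbers are invented for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up single-input data: y is roughly 3*x + 2, plus a little noise.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# 'Learn' the best line for these points.
model = LinearRegression()
model.fit(X, y)

# Use the learned line to predict the output for a new input.
new_input = np.array([[6.0]])
print(model.predict(new_input))  # roughly 20
```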

Step 2: Prediction by Neural Networks

You can use LinReg only if you know that the data will fit a line. This is usually easy to check for single-input-single-output problems: we can simply draw a plot and inspect it visually. But in most real-life problems there are multiple inputs, and we cannot visualize such a plot.

In addition, real-world data does not always follow a linear path. Many times, the best-fitting shape is non-linear. See the plot below:

Image by author

Learning such a function in multi-dimensional data is not possible with simple methods like LinReg. This is where neural networks (NNs) come in. NNs do not require us to decide which function they should learn; they find it themselves and then go on to learn it, however complex it may be. Once an NN has learned the complex, multi-input function, it uses this function for prediction.

We can again write a similar equation, but with a change. Our inputs are now many, so we have to represent them by a vector. In fact, the outputs can also be many, and we will use a vector for them too.

output vector = NN Function (input vector)

We will draw the schematic of this new, more powerful prediction:

Image by author
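
Here is a minimal sketch of this vector-in, vector-out prediction, using scikit-learn's MLPRegressor on made-up data that follows a non-linear rule a straight line cannot capture:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Made-up data: two inputs, one output, following y = x1^2 + sin(x2).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = X[:, 0] ** 2 + np.sin(X[:, 1])

# The network finds and learns the non-linear function itself.
nn = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
nn.fit(X, y)

# output vector = NN Function (input vector)
print(nn.predict(np.array([[1.0, 0.5]])))  # roughly 1.0**2 + sin(0.5) ≈ 1.48
```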

Step 3: Prediction on a word

Now consider a problem in which the input to the NN is a word from some language. Neural networks can only accept numbers and vectors, so to handle this, words are converted into vectors. You can imagine them as the residents of a many-dimensional space, where related words are close to one another. For example, the vector for ‘Java’ will be near other vectors for programming technologies; but it will also be near the vectors for places in the Far East, such as Sumatra.

A (very imaginary) word embedding, image by author

Such a collection of vectors corresponding to the words in a language is called an ‘embedding’. There are many methods to create embeddings, Word2Vec and GloVe being two popular examples. Typical sizes of such embeddings are 256, 512 or 1024.
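
To make ‘closeness’ concrete, here is a minimal sketch with made-up 4-dimensional vectors (real embeddings are learned and have hundreds of dimensions), measuring how near two word vectors are with cosine similarity:

```python
import numpy as np

# Made-up, tiny 4-dimensional 'embeddings'; real ones have 256-1024 dimensions
# and are learned by methods such as Word2Vec or GloVe.
embedding = {
    "java":    np.array([0.9, 0.8, 0.1, 0.3]),
    "python":  np.array([0.8, 0.9, 0.0, 0.1]),
    "sumatra": np.array([0.7, 0.1, 0.9, 0.2]),
    "banana":  np.array([0.0, 0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Higher value means the two word vectors point in similar directions."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embedding["java"], embedding["python"]))   # high
print(cosine_similarity(embedding["java"], embedding["banana"]))   # low
```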

Once we have vectors for words, we can use the NN for prediction on them. But what can we achieve by prediction on words? We can do many things: we can translate a word into another language, get a synonym for the word, or find its past tense. The equation and schematic for this prediction will look very similar to those in the previous step.

output word embedding = NN Function (input word embedding)
Image by author
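
As a tiny illustration of prediction on a word, here is a sketch with made-up 3-dimensional embeddings in which a simple least-squares linear map stands in for the NN Function, predicting a word's past tense from its embedding and decoding the result by nearest-neighbour lookup:

```python
import numpy as np

# Made-up 3-dimensional word embeddings; real embeddings are learned and much larger.
embed = {
    "walk":   np.array([0.9, 0.1, 0.0]),
    "talk":   np.array([0.8, 0.3, 0.1]),
    "go":     np.array([0.1, 0.9, 0.2]),
    "walked": np.array([0.7, 0.2, 0.1]),
    "talked": np.array([0.6, 0.4, 0.2]),
    "went":   np.array([0.2, 0.8, 0.3]),
}

# A linear map standing in for the 'NN Function', fitted by least squares
# on (present tense, past tense) embedding pairs.
X = np.stack([embed["walk"], embed["talk"], embed["go"]])
Y = np.stack([embed["walked"], embed["talked"], embed["went"]])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_past_tense(word):
    output_embedding = embed[word] @ W
    # Decode the output embedding by finding the nearest known word vector.
    candidates = ["walked", "talked", "went"]
    return min(candidates, key=lambda w: np.linalg.norm(embed[w] - output_embedding))

print(predict_past_tense("walk"))   # expected: 'walked'
```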

Step 4: Prediction for Naive Translation

In a translation problem, the input is a sentence in one language and the output is a sentence in another language. How can we implement translation using what we already know about prediction on a word? Here we take a naive approach to translation: we convert each word of the input sentence to its equivalent in the other language. Of course, real translation will not work like this; but for this step, pretend that it will.

This time we will draw the schematic first:

Image by author

The equation for the first word will be:

NN Translation Function (Embedding for word no. 1 in input sentence)
= Embedding for word no. 1 in output sentence

We can similarly write the equations for the other words.

The neural network used here has learned the translation function by looking at many examples of word pairs, one from each language. We are using one such NN for every word.
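
Here is a minimal sketch of this naive scheme, with a plain dictionary standing in for the trained word-to-word NN (in the real version, the NN would map each input word's embedding to an output word's embedding):

```python
# Hypothetical word-to-word mapping; a trained NN would play this role.
english_to_hindi = {
    "Ashok": "Ashok",
    "sent": "bheja",
    "a": "",          # no one-to-one equivalent -- a hint the scheme is too naive
    "letter": "khat",
    "to": "ko",
    "Sagar": "Sagar",
}

def naive_translate(sentence: str) -> str:
    # Translate each word independently, ignoring order and context.
    translated = [english_to_hindi.get(word, word) for word in sentence.split()]
    return " ".join(t for t in translated if t)

print(naive_translate("Ashok sent a letter to Sagar"))
# 'Ashok bheja khat ko Sagar' -- the word order is wrong and 'ne' is missing,
# exactly the problems tackled in the next two steps.
```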

We thus have a translation system that uses prediction. I have already admitted that this is a naive approach to translation. What additions would make it work in the real world? We will see that in the next two steps.

Step 5: Prediction with Context

The first problem with the naive approach is that the translation of one word depends on the other words in the sentence. For example, consider the following English to Hindi translation:

Input (English): ‘Ashok sent a letter to Sagar’

Output (Hindi): ’Ashok ne Sagar ko khat bheja’.

The word ‘sent’ is translated as ‘bheja’ in the output. However, if the input sentence is:

Input (English): ‘Ashok sent sweets to Sagar’

then the same word is translated as ‘bheji’:

Output (Hindi): ’Ashok ne Sagar ko mithai bheji’.

Thus it is necessary to add the context from the other words in the sentence while predicting the output. We will draw the schematic for just one word:

Image by author

There are many methods to generate the context. The most powerful and state-of-the-art one is called ‘attention’. The neural networks that use attention for context generation are called ‘transformers’. BERT and GPT are examples of transformers.

We now have a form of prediction that uses context. We can write the equation as:

NN Translation Function (Embedding for word no. 1 in input sentence
+ context from other words in input sentence)
= Embedding for word no. 1 in output sentence
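
To make ‘context generation’ a little more concrete, here is a minimal numpy sketch of scaled dot-product attention, the operation at the heart of transformers. The vectors are made up, and a real transformer learns separate query, key and value projections rather than reusing the raw embeddings:

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: blend 'values' according to how well
    each key matches the query, producing a context vector for one word."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)               # similarity with every word
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the sentence
    return weights @ values                          # weighted mix = context

# Made-up 4-dimensional embeddings for the words of a short sentence.
sentence = np.array([
    [0.9, 0.1, 0.0, 0.2],   # 'Ashok'
    [0.1, 0.8, 0.3, 0.0],   # 'sent'
    [0.0, 0.2, 0.9, 0.1],   # 'letter'
])

# Context for the word 'sent', computed from the whole sentence.
context_for_sent = attention(sentence[1], sentence, sentence)
print(context_for_sent)
```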

Step 6: Prediction of Next Word

We will now handle the second problem with the naive approach to translation: translation is not a one-to-one mapping of words. See the example from the previous step:

Input (English): ‘Ashok sent a letter to Sagar’

Output (Hindi): ’Ashok ne Sagar ko khat bheja’.

You will notice that the order of the words is different, and there is no equivalent of the word ‘a’ in the input, or of the word ‘ne’ in the output. Our one-NN-per-word approach will not work in this case. In fact, it will not work in general.

Fortunately, there is a better method available. After giving the input sentence to an NN, we ask it to predict only one word: the word that will be the first word of the output sentence. We can represent this as:

Image by author

In our letter-sending example, we can write this as:

NN Translation Function (Embeddings for 'Ashok sent a letter to Sagar' 
+ context from input sentence)
= Embedding for 'Ashok'

To get the second word in the output, we change our input to:

Input = Embeddings for input sentence + Embedding for first word in output

We also have to include this new input in the context:

Context = Context from input sentence + context from first word in output

The NN will then predict the next (second) word in the sentence:

This can be written as:

NN Translation Function
(Embeddings for 'Ashok sent a letter to Sagar' + Embedding for 'Ashok',
Context from input sentence + context from 'Ashok')
= Embedding for 'ne'

We can continue this process until the NN predicts the embedding for ‘.’; in other words, until it signals that the output has ended.

We have thus reduced the translation problem to the ‘predict the next word’ problem. In the next step we will see how this approach to translation leads us to a more generic Generative AI capability.
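
Here is a minimal sketch of this loop, with a hypothetical `predict_next_word` function standing in for the trained NN (the embeddings and attention are hidden inside it):

```python
def translate(input_sentence, predict_next_word, end_token=".", max_len=50):
    """Greedy next-word decoding: repeatedly feed the input sentence plus the
    output generated so far, and ask the NN for one more word."""
    output_words = []
    while len(output_words) < max_len:
        next_word = predict_next_word(input_sentence, output_words)
        if next_word == end_token:   # the NN signals that the output has ended
            break
        output_words.append(next_word)
    return " ".join(output_words)

# A toy stand-in 'NN' that knows one canned translation, just to show the loop.
canned = ["Ashok", "ne", "Sagar", "ko", "khat", "bheja", "."]
def toy_nn(sentence, words_so_far):
    return canned[len(words_so_far)]

print(translate("Ashok sent a letter to Sagar", toy_nn))
# 'Ashok ne Sagar ko khat bheja'
```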

Step 7: Prediction for Generative AI

The ‘prediction of next word’ method is not limited to translation. The NN can be trained to predict the next word in such a way that the output is the answer to a question, or an essay on a topic you specify. Imagine that the input to such a Generative NN is the sentence:

‘Write an article on the effect of global warming on the Himalayan glaciers’.

The input to the generative model is called a ‘prompt’. The GNN predicts the first word of the article, and then goes on predicting further words until it generates a complete essay. This is what Large Language Models such as ChatGPT do. As you can imagine, the internals of such models are far more complex than what I have described here, but they contain the same essential components that we have seen: embeddings, attention and a next-word-prediction NN.

Image by author
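
If you want to watch next-word generation in action, a small pre-trained model can be driven in exactly this way. A minimal sketch using the Hugging Face transformers library and the freely available GPT-2 model (not mentioned in this article, and far smaller than ChatGPT, but built on the same predict-the-next-word principle):

```python
# pip install transformers torch
from transformers import pipeline

# GPT-2 is a small, freely available next-word-prediction model.
generator = pipeline("text-generation", model="gpt2")

prompt = "The effect of global warming on the Himalayan glaciers is"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)

# The model keeps predicting the next word (token) until the length limit.
print(result[0]["generated_text"])
```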

Apart from translation and content generation, LLMs can also answer questions, plan travel and do many other wonderful things. The basic method used in all these tasks remains prediction of the next word.

Summary

We started from the basic prediction technique, LinReg. We made the prediction problem more complex by adding vectors and then word embeddings. We learned how to apply prediction to language by solving the naive translation problem. While enhancing the naive method to handle real translation, we became oriented with the essential elements of LLMs: context and next-word prediction. We realized that Large Language Models are all about prediction on text sequences. We became familiar with important terms such as prompt, embeddings, attention and transformers.

The sequences do not really have to be text. We can have any sequence, such as images or sounds. The message of this article, which thus covers the whole gamut of generative AI, is:

Generative AI is prediction on sequences using neural networks.
