Generative AI: A Beginner’s Guide
Intro
What is Generative AI?
Text
Images — Video — Audio
Why now?
What are the problems in the NLP space
Limitations of RNN and LSTM models
Transformation Brought by Transformers
Application of Transformers in Multiple Modalities
A Basic Introduction to Transformers
1. Tokenisation (Input and Output)
2. Attention
3. Encoder Decoder
Transformer Models Today
How do I use these models?
Evaluation of LLMs
How are these models trained?
Building Applications
Basics of Prompt Engineering
Plugins and Augmentation
Fine Tuning
References

Intro

In November 2022, ChatGPT was introduced. The launch of this AI chatbot marked a turning point in the history of technology: its growth outpaced that of any other platform in history, and it sparked a revolution in generative AI applications. This new wave has swept across industries, from healthcare to finance to entertainment. As a result, generative AI technologies have many potential uses, and their impact on society is still being explored.

ChatGPT reached 100 million monthly active users in 2 months

The field of generative AI has been evolving rapidly in recent years, with significant advancements in models, data, and computing. These advancements have opened up new possibilities and applications for generative AI, such as in the fields of natural language processing, computer vision, and music generation. Moreover, the increasing availability of data and computing power has allowed more complex and sophisticated models to be developed, leading to even greater potential for generative AI in the future. As this field continues to grow and develop, it will be exciting to see what new breakthroughs emerge and how they shape our world.

Sequoia Capital’s Report (Sep 2022)

This surge of interest in generative AI has led to the emergence of many startups offering a variety of services built on this technology.

What is Generative AI?

Analytical AI, also known as traditional AI, refers to the use of machines to analyse existing data and identify patterns or make predictions for applications such as fraud detection or content recommendation. It focuses on analysing and processing available information. Generative AI, on the other hand, is a field in which machines generate new data, such as images, text, or music, based on learned patterns and models. Let's look at some examples.

Text

The ability of language models to generate fluent, human-like text is what first brought generative AI into the mainstream.

These models not only have language generation capabilities but also language understanding capabilities. Language understanding is a powerful tool that can be used to improve the capabilities of software systems in many ways. Some of the most important benefits of language understanding include improved summarisation, neural search, and text categorisation.

In addition to these benefits, language understanding can also improve the user experience of software systems in many other ways. For example, it can be used to provide natural language interfaces, which allow users to interact with software systems using everyday language. This can make software systems more accessible and easier to use.

Images — Video — Audio

AI image generation is another exciting area in the generative AI space, where models like DALL-E, Midjourney, and Stable Diffusion have taken social media by storm.

Examples: https://www.midjourney.com/top/

Why now?

Generative AI has been in the works for a while now, but over the past few years the entire generative AI ecosystem has undergone significant development. Nevertheless, to fully understand the current state of affairs and appreciate the full potential of generative AI, it is important to delve into the advancements made in the field of natural language processing. The advent of transformers has played a crucial role in this regard. Through the use of transformers, AI can now process and generate language, images, and video, and work across multiple modalities combined.

Evolution of Natural Language Processing Domain

What are the problems in the NLP space

To effectively solve problems in the NLP space, a machine learning practitioner encounters several challenges:

  • Complexity of language: Human language is nuanced, ambiguous, and context-dependent. It therefore poses a significant challenge for machine learning models to understand and generate coherent, meaningful text.
  • Long-term dependencies: In many instances, the meaning of a sentence or phrase depends heavily on context established much earlier in the text. Traditional NLP models struggle to maintain and understand these long-term dependencies.
  • Scalability: Large-scale text processing requires significant computational resources, making it difficult to scale traditional NLP systems to larger tasks.
  • Generalisation: Models often struggle to generalise their understanding of language across different tasks, genres, and languages.

Limitations of RNN and LSTM models

For a long time, we tried to solve these problems with Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models, which were once the cornerstone of NLP tasks, but they carry certain limitations:

  • Sequential processing: RNNs and LSTMs process data sequentially, which is computationally expensive, especially for long sequences. This makes them ill-suited for processing large texts or handling real-time applications.
  • Vanishing gradients: Although LSTMs mitigate the vanishing gradient problem to some extent, they do not completely overcome it. This issue hampers the model's ability to learn long-term dependencies.
  • Limited parallelisation: Due to their inherent sequential nature, these models cannot be easily parallelised, limiting their training efficiency on modern hardware.

Transformation Brought by Transformers

Transformers have revolutionized the NLP space by overcoming the limitations of RNN- and LSTM-based models:

  • Attention mechanism: Transformers introduced the concept of "attention," which allows the model to weigh the importance of different parts of the input when generating the output. This mechanism effectively solves the long-term dependency issue.
  • Parallel processing: Unlike RNNs and LSTMs, transformers process all the data points in the input sequence simultaneously, allowing for efficient parallelisation and faster training.
  • Scalability: Transformers can handle longer sequences of data more effectively than their predecessors, making them more scalable for large-scale NLP tasks.
  • Superior performance: With these features, transformers have shown superior performance on a variety of NLP tasks, such as translation, summarisation, and sentiment analysis.

Application of Transformers in Multiple Modalities

The unique features of transformers make them apt for applications beyond text, across different modalities such as images, audio, and video:

  • Images: Transformers can process images by treating them as a sequence of pixels or patches. This has led to impressive results in tasks like image classification and generation.
  • Audio: In the audio domain, transformers have been used for speech recognition, music generation, and even audio synthesis.
  • Video: Videos can be viewed as sequences of images, and transformers are able to handle the temporal dependencies between frames, enabling tasks like video classification and generation.
  • Multimodal applications: Transformers can process and relate information across different modalities, leading to breakthroughs in areas like automatic captioning and image-text co-generation.

In conclusion, the advent of transformers has been instrumental in pushing the boundaries of what is possible with generative AI. By enabling advanced capabilities in natural language processing and extending those to other modalities, transformers have truly transformed the landscape of AI research and applications.

A Basic Introduction to Transformers

Transformers are a type of model architecture introduced in the paper "Attention Is All You Need" by Vaswani et al. at Google Brain in 2017. They have been particularly successful in a variety of tasks and have formed the basis for a number of high-profile models.

The architecture of transformer models

To understand the architecture, let's simplify it and break it down into components.

There are three main components we can dive a bit deeper into:

1. Tokenisation (Input and Output)

ML models don't understand words, but they do understand numbers.

Tokenisation is the process of splitting text into individual words or subwords, which are called tokens and represented as numbers. This is often the first step in NLP pipelines.

The sentence is broken into words, and each word is assigned a fixed number.
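As a concrete illustration, here is a minimal sketch using the Hugging Face tokenizer for gpt2 (the example sentence is illustrative; every tokenizer assigns different numbers):

from transformers import AutoTokenizer

# Load the tokenizer that was trained alongside the gpt2 model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "My name is Sarah"
token_ids = tokenizer.encode(text)

print(token_ids)                                   # the fixed numbers the model sees
print(tokenizer.convert_ids_to_tokens(token_ids))  # the subword each number stands for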

2. Attention

The transformer model relies on a mechanism called attention to weigh the importance of different words or elements in an input sequence when generating an output. This means transformers are capable of modelling complex patterns and dependencies in data, including long-range dependencies.

Each input word is represented as a token. The attention mechanism then calculates a weight for each token based on its relevance to the current token. For example:

In the sentence "The animal didn't cross the street because it was too tired", the word "it" refers to the animal, and we can see in the figure below that the model has learned to place the highest attention on the word "animal".
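Under the hood, each token is turned into query, key, and value vectors, and the attention weights are a softmax over query-key dot products. Below is a minimal NumPy sketch of scaled dot-product attention over toy 4-dimensional vectors (real models learn separate projection matrices for Q, K, and V):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # relevance of each token to every other
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = scores / scores.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy self-attention over 3 tokens: Q, K and V all come from the same vectors
x = np.random.randn(3, 4)
output, weights = scaled_dot_product_attention(x, x, x)
print(weights)  # each row sums to 1: how strongly a token attends to the others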

Based on the attention over the text it has been given, the model then predicts the next word. It generates words one at a time, as we see in the ChatGPT UI:

  1. When the word "My" is given as input, the model outputs "name" in the first step.
  2. In the second step, the model predicts the word "is" based on the context of the input word ("My") and the words already generated, in this case "name". A minimal sketch of this generation loop follows.
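Here is that loop as a minimal greedy-decoding sketch with gpt2 via the transformers library (the prompt and the number of steps are arbitrary):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer.encode("My name", return_tensors="pt")
for _ in range(5):                        # generate 5 tokens, one at a time
    logits = model(ids).logits            # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()      # greedy: take the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))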

3. Encoder Decoder

The encoder processes the input text, converting it into a meaningful representation, while the decoder generates an output sequence based on that representation. This facilitates tasks like machine translation, text summarisation, and question answering.
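A small encoder-decoder model such as t5-small can be tried through the same pipeline API (the model and sentence are illustrative choices):

from transformers import pipeline

# The encoder reads the English sentence; the decoder generates French token by token
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The animal didn't cross the street because it was too tired."))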

Transformer Models Today

With a vast array of models available in the market, it becomes difficult to determine which ones to pick for a given use case.

The main specifications of modern LLMs (large language models) that we can look at to understand the variety of models are:

  • Context length: The maximum number of tokens that can be considered when predicting the next token. Commonly available context lengths are 2K and 4K tokens; the largest to date are ~65K and 100K (see the token-count sketch after this list).
  • Vocabulary size: The number of unique tokens that the model can understand.
  • Parameter count: The number of learnable weights in the model. This can be in the billions or even trillions. Note: parameter count alone is not an indicator of performance.
  • Training tokens: The number of tokens that the model was trained on. This can be in the hundreds of billions or even trillions.
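Because context length is measured in tokens rather than words, it is worth checking prompt sizes with the model's own tokenizer. A small sketch, using gpt2 as an example:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Generative AI has been in the works for a while now..."
n_tokens = len(tokenizer.encode(prompt))

# The prompt plus the generated tokens must fit within the context length
print(n_tokens, "tokens; model limit:", tokenizer.model_max_length)  # gpt2: 1024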

These specifications have been increasing rapidly in recent years as LLMs have become more powerful and capable.

How do I use these models?

The latest offerings come as a complete package that handles both tokenisation and generation: they take text as input and output generated text.

Example using the transformers library from Hugging Face:

from transformers import pipeline

# With no model specified, the text-generation pipeline defaults to gpt2
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

Output:
[{'generated_text':
'In this course, we will teach you how to understand and use '
'data flow and data interchange when handling user data. We '
'will be working with one or more of the most commonly used '
'data flows — data flows of various types, as seen by the HTTP'
}]

The above example uses gpt2 as its model, but we can easily swap in any other model available on the Hugging Face Model Hub:
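For instance (distilgpt2 here is just an illustrative choice):

generator = pipeline("text-generation", model="distilgpt2")
generator("In this course, we will teach you how to")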

The Hugging Face NLP Course covers the main concepts, working with models and datasets, and tackling NLP tasks using Transformers, including for speech processing and computer vision. The course aims to prepare learners to apply 🤗 Transformers to a variety of machine-learning problems.

As an easier alternative, companies like OpenAI, Google, Anthropic, Cohere, and many others offer these models behind APIs that can be integrated into AI workflows without the need for LLM Ops.
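As a sketch, here is a minimal call to OpenAI's chat API using the pre-1.0 openai Python package (you need your own API key; the model name is illustrative):

import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise what a transformer is in one sentence."}],
)
print(response.choices[0].message.content)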

Evaluation of LLMs

  1. LMSYS runs the Chatbot Arena, a benchmarking platform for large language models (LLMs) that features anonymous, randomised battles in a crowdsourced manner.
  2. The HF Open LLM Leaderboard and the C-Eval Benchmark aim to track, rank, and evaluate LLMs and chatbots as they are released, by automatically running multiple benchmark tests.

How are these models trained?

A ChatGPT-like assistant is trained in multiple steps:

Pretraining

  • The model learns the statistical relationships between words and phrases in this step. Most of the training work happens here.
  • Involves training the model on a large corpus of publicly available text from the internet.
  • Requires a large amount of GPU compute to train such models (100-1000+ GPUs).
  • Uses unsupervised learning: the model learns to predict the next word in a sentence, thus learning the structure of the language.
  • Results in a "base model" that has a general understanding of language but no specific expertise.

Supervised fine-tuning

  • Can be used to improve the performance of the pre-trained model on a specific task.
  • Requires a low volume of high-quality data.
  • With the supervised fine-tuning approach, the ability to engage in a dialogue or chat can be introduced to a base model.

Reward modeling

  • The reward comes from human evaluators, who are presented with multiple responses to a single prompt and must score each of them based on the relative quality of the response.
  • The model is trained to predict the reward along with the generated text.
  • Reward modeling can be used to improve the performance of the models on a variety of tasks, such as generating creative text formats, translating languages, and writing different kinds of creative content.

Reinforcement learning

  • It is used together with reward modeling to enhance the model's ability to consistently generate text that earns higher rewards.

Training these models requires an enormous amount of language data. The dataset comprises data from multiple sources and is called a data mixture. Example of a data mixture:

Building Applications

Multiple models are available as APIs, ready to use for the base layer.

For the middle layer, the following techniques can be used to adapt the base models to custom use cases:

  • Prompt engineering: Guide the model towards the desired outputs.
  • Plugins: Connect models to tools like calculators, Wolfram Alpha, or custom APIs.
  • Augmentation: Augment the input context with proprietary data.
  • Fine tuning: Build a custom model for specific use cases.

Basics of Prompt Engineering

In essence, think of prompting as writing code (pseudocode) in English: use instructions and conditions, and specify the desired outputs.

To perform tasks that the models weren't explicitly trained on, like summarisation, we can provide a prompt that describes the desired output.

For example, if "In summary," does not lead to a good generation, we may want to try "To summarise in plain language," or "The main point to take from this article is that".

To improve the accuracy of generation, it is important to describe the desired output and show examples of how it should look. This helps the model understand what is expected, as in the few-shot sketch below.
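Here is a minimal few-shot prompt sketch (the task, reviews, and labels are all illustrative); the examples teach the model the expected format before it sees the real input:

prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and it works flawlessly.
Sentiment:"""

# Any text-generation model or API can complete this; "Positive" is the expected continuation
print(generator(prompt))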

Plugins and Augmentation

Augmentation involves loading context or information into the working memory of LLMs. The need for augmentation arises because LLMs have a training cutoff date; OpenAI models, for example, have a cutoff of September 2021. To access content newer than that, models can use a web-browser plugin to bring that knowledge into the conversation. The same applies to proprietary content that the models haven't been trained on: augmentation methods can give the model access to it.

When it comes to enhancing the capabilities of a language model, several techniques can be employed to enrich its context. By expanding the context, we can improve the model's understanding and generate more accurate and relevant responses. Here are three common approaches for augmenting the context:

  • Chains: Augment with more LLM calls
  • Tools: Augment with an outside source
  • Retrieval: Augment with a bigger corpus

Chains augment the context of a language model by chaining together multiple calls to the model: the output of one call becomes the input of the next. For example, you could use one call to generate a list of possible answers to a question, and then use another call to select the best answer from that list. To perform deliberate decision-making by considering multiple different reasoning paths, one can use self-consistency CoT or the Tree of Thoughts method, as shown in the figure.
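A minimal sketch of that generate-then-select pattern, assuming a hypothetical llm(prompt) helper that wraps any completion API:

def llm(prompt: str) -> str:
    """Hypothetical helper: send the prompt to any completion API and return the text."""
    ...

question = "What is a good name for a bakery that only sells sourdough?"

# Call 1: generate candidate answers
candidates = llm(f"List five possible answers to the question: {question}")

# Call 2: the first output becomes part of the next input
best = llm(f"Question: {question}\nCandidates:\n{candidates}\n"
           "Pick the single best candidate and briefly justify the choice.")
print(best)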

Another way to expand the context is to leverage external tools or resources, which can provide supplementary information to the model, allowing it to draw on a wider range of knowledge. For instance, the model can access APIs or search engines to retrieve real-time information or gather specific data relevant to the conversation. By incorporating these external sources, the model can offer accurate and up-to-date responses that go beyond its pre-trained knowledge.
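In its simplest form, tool use just means running a tool and pasting its result into the prompt. A sketch, reusing the hypothetical llm() helper from above:

from datetime import date

# The "tool" here is an ordinary function; its output is injected into the prompt
today = date.today().isoformat()

answer = llm(f"Today's date is {today}. "
             "How many days are left until New Year's Eve? Show your reasoning.")
print(answer)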

Retrieval involves finding relevant data in a large dataset. For language models, retrieval finds similar text using vector databases. These databases store vectors and use techniques like indexing, similarity measures, and approximate search for efficient retrieval. For example, to find text related to "artificial intelligence," a vector database would index the dataset, calculate similarity distances, and return the most similar vectors. Retrieval with vector databases improves search speed and accuracy for various data types like text and images.
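The core similarity step can be sketched in a few lines of NumPy, assuming a hypothetical embed() helper that maps text to a vector (e.g. an embedding API or model); a vector database does the same thing at scale with indexing:

import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical helper: return an embedding vector for the text."""
    ...

documents = [
    "Transformers weigh tokens with attention.",
    "Sourdough bread rises because of wild yeast.",
    "LLMs are trained to predict the next token.",
]
doc_vectors = np.stack([embed(d) for d in documents])
query_vector = embed("How do language models work?")

# Cosine similarity between the query and every document
sims = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector))

print(documents[int(sims.argmax())])  # the most relevant document is added to the prompt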

Each technique for augmenting the context of a language model has its own pros and cons. Chains are efficient but difficult to control, tools are powerful but difficult to integrate, and retrieval is effective but expensive. The choice of technique depends on the application: chains for speed, tools for accuracy, and retrieval for knowledge-intensive tasks. Overall, augmented language models are a valuable tool for improving task performance. By employing the right techniques, language models can become more accurate, informative, and efficient.

Fine Tuning

Fine-tuning large language models (LLMs) can be used to adapt them to any custom use case. It is becoming increasingly accessible thanks to a number of recent advances, including:

  • Parameter-efficient fine-tuning, which trains a much smaller number of parameters to achieve performance similar to traditional fine-tuning. This makes fine-tuning cheaper and more efficient (a sketch follows this list).
  • Quantisation, which uses lower-precision numbers to represent the model's weights and can further reduce the cost of inference.
  • Open-source base models, such as LLaMA, which can be used as a starting point for fine-tuning, reducing the amount of data and expertise required.
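As an illustration of parameter-efficient fine-tuning, here is a minimal LoRA setup with the peft library on gpt2 (the hyperparameters are illustrative, not a recommendation):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA freezes the base weights and trains small low-rank adapter matrices instead
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)

model.print_trainable_parameters()  # only a small fraction of gpt2's weights are trainable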

That said, fine-tuning still has its own requirements. These include:

  • High-quality data, which is necessary for training the model.
  • Compute, such as GPUs, to train the model.
  • Training expertise, as the process can be complex and time-consuming.

Finally, generative AI is a rapidly growing field with the potential to revolutionize many aspects of our lives. By learning from data, generative AI models can create new content, such as images, text, and music. However, generative AI is still in its early stages of development, and there are a number of limitations that need to be addressed. One limitation is that generative AI models can be biased, reflecting the biases that exist in the data they are trained on. Another is that generative AI models can be computationally expensive to train, which limits their accessibility to smaller organizations and individuals.

Despite these limitations, generative AI is a promising technology with the potential to make a significant impact on our world. As the technology continues to develop, we can expect to see even more creative and innovative applications of generative AI.

References

https://www.sequoiacap.com/article/generative-ai-a-creative-new-world/

http://jalammar.github.io/generative-ai-and-ai-product-moats/

http://jalammar.github.io/how-gpt3-works-visualizations-animations/

http://jalammar.github.io/illustrated-transformer/

https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/

https://docs.cohere.com/docs/prompt-engineering

https://lifearchitect.ai/gpt-4/

https://youtu.be/bZQun8Y4L2A
