Ten Years of AI in Review

From image classification to chatbot therapy

Yearly timeline, from 2012 to 2023, highlighting the most significant advances in AI.
Image by the author.

The last decade has been an exhilarating and eventful ride for the field of artificial intelligence (AI). What began as modest explorations of the potential of deep learning turned into an explosive proliferation of a field that now includes everything from recommender systems in e-commerce to object detection for autonomous vehicles and generative models that can produce anything from realistic images to coherent text.

In this article, we'll take a walk down memory lane and revisit some of the key breakthroughs that got us to where we are today. Whether you are a seasoned AI practitioner or simply curious about the latest developments in the field, this article will give you a comprehensive overview of the remarkable progress that led AI to become a household name.

2013: AlexNet and Variational Autoencoders

The year 2013 is widely regarded as the "coming-of-age" of deep learning, initiated by major advances in computer vision. According to a recent interview with Geoffrey Hinton, by 2013 "just about all the computer vision research had switched to neural nets". This boom was primarily fueled by a rather surprising breakthrough in image recognition one year earlier.

In September 2012, AlexNet, a deep convolutional neural network (CNN), pulled off a record-breaking performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), demonstrating the potential of deep learning for image recognition tasks. It achieved a top-5 error of 15.3%, 10.9 percentage points lower than that of its nearest competitor.

Bar chart showing the top-5 errors of various teams that participated in the 2012 ImageNet challenge.
Image by the author.

The technical improvements behind this success were instrumental for the future trajectory of AI and dramatically changed the way deep learning was perceived.

First, the authors applied a deep CNN consisting of five convolutional layers and three fully-connected linear layers, an architectural design dismissed by many as impractical at the time. Furthermore, due to the large number of parameters produced by the network's depth, training was done in parallel on two graphics processing units (GPUs), demonstrating the ability to significantly speed up training on large datasets. Training time was further reduced by swapping traditional activation functions, such as sigmoid and tanh, for the more efficient rectified linear unit (ReLU).

Image showing the activation functions sigmoid, tanh, and ReLU.
Image by the author.
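
To make the architecture more concrete, below is a minimal PyTorch sketch of an AlexNet-style network with five convolutional layers, three fully-connected layers, and ReLU activations. It is an illustrative approximation rather than a faithful reproduction of the original model, which also used local response normalization, dropout, and a split across two GPUs.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Illustrative AlexNet-style CNN: five conv layers, three fully-connected layers, ReLU."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # (N, 256, 6, 6) for 224x224 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))  # logits.shape == (1, 1000)
```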

These advances, which collectively led to the success of AlexNet, marked a turning point in the history of AI and sparked a surge of interest in deep learning among both academics and the tech community. Consequently, 2013 is considered by many to be the inflection point after which deep learning truly began to take off.

Also happening in 2013, albeit somewhat drowned out by the noise around AlexNet, was the development of variational autoencoders, or VAEs: generative models that can learn to represent and generate data such as images and sounds. They work by learning a compressed representation of the input data in a lower-dimensional space, known as the latent space. This allows them to generate new data by sampling from this learned latent space. VAEs later turned out to open up new avenues for generative modeling and data synthesis, with applications in fields like art, design, and gaming.
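
To make the idea of a latent space more tangible, here is a minimal VAE sketch in PyTorch, assuming flattened 28x28 inputs with pixel values in [0, 1] (for example, MNIST). The encoder maps an input to a mean and log-variance, a latent vector is sampled via the reparameterization trick, and the decoder reconstructs the input from it; new data is generated by decoding samples drawn from the prior.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch: encode to a latent distribution, sample, decode."""
    def __init__(self, input_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)        # mean of q(z|x)
        self.to_logvar = nn.Linear(256, latent_dim)    # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (batch, 784), values in [0, 1]
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        x_hat = self.decoder(z)
        # Training objective: reconstruction error + KL divergence to the standard normal prior.
        recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return x_hat, recon + kl

# Generating new data after training: sample from the latent prior and decode.
vae = TinyVAE()
samples = vae.decoder(torch.randn(8, 16))   # eight synthetic, flattened 28x28 "images"
```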

2014: Generative Adversarial Networks

The following year, in June 2014, the field of deep learning witnessed another serious advance with the introduction of generative adversarial networks, or GANs, by Ian Goodfellow and colleagues.

GANs are a type of neural network capable of generating new data samples that resemble a training set. Essentially, two networks are trained simultaneously: (1) a generator network produces fake, or synthetic, samples, and (2) a discriminator network evaluates their authenticity. Training is performed in a game-like setup, with the generator trying to create samples that fool the discriminator, and the discriminator trying to correctly call out the fakes.
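
This game-like setup can be summarized in a short training loop. The following PyTorch sketch uses a toy two-dimensional "dataset" and deliberately small networks; it is meant to illustrate the alternating generator/discriminator updates, not to reproduce any particular GAN.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))        # generator
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0       # stand-in for samples from a real dataset
    fake = G(torch.randn(64, latent_dim))

    # 1) Train the discriminator: real samples get label 1, generated samples get label 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator: try to make the discriminator label its samples as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```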

At the time, GANs represented a powerful and novel tool for data generation, used not only for generating images and videos, but also music and art. They also contributed to the advancement of unsupervised learning, a domain widely regarded as underdeveloped and challenging, by demonstrating that it was possible to generate high-quality data samples without relying on explicit labels.

2015: ResNets and NLP Breakthroughs

In 2015, the field of AI made considerable advances in both computer vision and natural language processing, or NLP.

Kaiming He and colleagues published a paper titled "Deep Residual Learning for Image Recognition", in which they introduced the concept of residual neural networks, or ResNets: architectures that allow information to flow more easily through the network by adding shortcuts. Unlike in a regular neural network, where each layer takes only the output of the previous layer as input, a ResNet adds residual connections that skip one or more layers and connect directly to deeper layers in the network.

As a result, ResNets were able to mitigate the problem of vanishing gradients, which enabled the training of much deeper neural networks than was considered possible at the time. This, in turn, led to significant improvements in image classification and object recognition tasks.
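
A minimal PyTorch sketch of a basic residual block illustrates the idea: the block computes a transformation of the input and then adds the input back in, so gradients can always flow through the identity shortcut.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x), where F is two conv layers."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)      # the shortcut: add the input back onto the output

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)       # torch.Size([1, 64, 32, 32])
```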

At around the same time, researchers made considerable progress with the development of recurrent neural networks (RNNs) and long short-term memory (LSTM) models. Despite having been around since the 1990s, these models only started to generate real buzz around 2015, mainly due to factors such as (1) the availability of larger and more diverse datasets for training, (2) improvements in computational power and hardware, which enabled the training of deeper and more complex models, and (3) modifications made along the way, such as more sophisticated gating mechanisms.

As a result, these architectures made it possible for language models to better understand the context and meaning of text, leading to vast improvements in tasks such as language translation, text generation, and sentiment analysis. The success of RNNs and LSTMs around that time paved the way for the large language models (LLMs) we see today.
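
As a small illustration of how such models are used in NLP, here is a toy LSTM-based text classifier in PyTorch (with hypothetical vocabulary and label sizes): token embeddings are fed through an LSTM, and the final hidden state is used to predict a label such as sentiment.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Toy LSTM text classifier: embed tokens, run an LSTM, classify from the last hidden state."""
    def __init__(self, vocab_size: int = 10_000, embed_dim: int = 128,
                 hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):        # token_ids: (batch, seq_len) of integer token ids
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_dim), the final hidden state
        return self.head(h_n[-1])        # (batch, num_classes) logits

logits = LSTMClassifier()(torch.randint(0, 10_000, (4, 20)))   # four sequences of 20 tokens
```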

2016: AlphaGo

After Garry Kasparov's defeat by IBM's Deep Blue in 1997, another human vs. machine battle sent shockwaves through the gaming world in 2016: Google's AlphaGo defeated the world champion of Go, Lee Sedol.

An image showing the board and stones of the game Go.
Photo by Elena Popova on Unsplash.

Sedol's defeat marked another major milestone in the trajectory of AI advancement: it demonstrated that machines could outsmart even the most skilled human players in a game that was once considered too complex for computers to handle. Using a combination of deep reinforcement learning and Monte Carlo tree search, AlphaGo analyzes millions of positions from previous games and evaluates the best possible moves, a strategy that far surpasses human decision-making in this context.

2017: Transformer Architecture and Language Models

Arguably, 2017 was the most pivotal year, as it laid the foundation for the breakthroughs in generative AI that we are witnessing today.

In December 2017, Vaswani and colleagues released the foundational paper "Attention Is All You Need", which introduced the transformer architecture, leveraging the concept of self-attention to process sequential input data. This allowed for more efficient processing of long-range dependencies, which had previously been a challenge for traditional RNN architectures.

An image showing two transformer figures.
Photo by Jeffery Ho on Unsplash.

Transformers consist of two main components: encoders and decoders. The encoder is responsible for encoding the input data, which, for example, can be a sequence of words. It takes the input sequence and applies multiple layers of self-attention and feed-forward neural networks to capture the relationships and features within the sentence and to learn meaningful representations.

Essentially, self-attention allows the model to understand the relationships between different words in a sentence. Unlike traditional models, which process words in a fixed order, transformers examine all the words at once. They assign so-called attention scores to each word based on its relevance to the other words in the sentence.
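
A minimal single-head self-attention sketch in PyTorch shows the mechanism: each token is projected to queries, keys, and values, the attention scores are the scaled dot products between queries and keys, and each output is a score-weighted average of the values. Real transformers add multiple heads, masking, and positional encodings on top of this.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention: scores = softmax(QK^T / sqrt(d)); output = scores @ V."""
    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** 0.5

    def forward(self, x):                # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)   # (batch, seq, seq)
        return scores @ v                # every token's output is a weighted mix of all values

x = torch.randn(1, 5, 64)                # a "sentence" of 5 tokens with 64-dim embeddings
out = SelfAttention(64)(x)               # same shape: (1, 5, 64)
```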

The decoder, on the other hand, takes the encoded representation from the encoder and produces an output sequence. In tasks such as machine translation or text generation, the decoder generates the translated sequence based on the input received from the encoder. Like the encoder, the decoder also consists of multiple layers of self-attention and feed-forward neural networks. However, it includes an additional attention mechanism that allows it to focus on the encoder's output. This lets the decoder take relevant information from the input sequence into account while generating the output.

The transformer architecture has since become a key component in the development of LLMs and has led to significant improvements across the NLP domain, in tasks such as machine translation, language modeling, and question answering.

2018: GPT-1, BERT and Graph Neural Networks

Just a few months after Vaswani et al. published their foundational paper, the Generative Pre-trained Transformer, or GPT-1, was introduced by OpenAI in June 2018. It utilized the transformer architecture to effectively capture long-range dependencies in text. GPT-1 was one of the first models to demonstrate the effectiveness of unsupervised pre-training followed by fine-tuning on specific NLP tasks.

Also taking advantage of the still quite novel transformer architecture was Google, which, in late 2018, released and open-sourced its own pre-training method called Bidirectional Encoder Representations from Transformers, or BERT. Unlike previous models that process text in a unidirectional manner (including GPT-1), BERT considers the context of each word in both directions simultaneously. To illustrate this, the authors provide a very intuitive example:

… in the sentence "I accessed the bank account", a unidirectional contextual model would represent "bank" based on "I accessed the" but not "account". However, BERT represents "bank" using both its previous and next context — "I accessed the … account" — starting from the very bottom of a deep neural network, making it deeply bidirectional.

The concept of bidirectionality was so powerful that it led BERT to outperform state-of-the-art NLP systems on a variety of benchmark tasks.
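
This bidirectional, masked-language-modeling behavior is easy to try out. The sketch below uses the Hugging Face transformers library (assuming it is installed) and mirrors the bank-account example above; the exact predictions and scores will vary.

```python
from transformers import pipeline

# BERT predicts the masked token using both its left and right context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("I accessed the [MASK] account."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Words like "bank" should rank highly, since both sides of the context are taken into account.
```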

Apart from GPT-1 and BERT, graph neural networks, or GNNs, also made some noise that year. They belong to a class of neural networks specifically designed to work with graph data. GNNs use a message-passing algorithm to propagate information across the nodes and edges of a graph. This enables the network to learn the structure and relationships in the data in a much more intuitive way.
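
A single message-passing step can be sketched in plain PyTorch on a small, hypothetical graph: each node averages the feature vectors of its neighbors and combines that aggregate with its own features through a learned linear layer. Dedicated graph libraries implement the same idea far more efficiently.

```python
import torch
import torch.nn as nn

# Hypothetical graph: 4 nodes, edges given as (source, target) pairs.
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
adj = torch.zeros(4, 4)
adj[edges[:, 0], edges[:, 1]] = 1.0
adj = adj + adj.T                                   # make the graph undirected

x = torch.randn(4, 16)                              # one 16-dim feature vector per node
mix = nn.Linear(2 * 16, 16)

# One message-passing step: average the neighbors' features, then combine with each node's own.
deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
neighbor_mean = (adj @ x) / deg
x_next = torch.relu(mix(torch.cat([x, neighbor_mean], dim=1)))   # updated node embeddings
```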

This approach allowed much deeper insights to be extracted from data and, consequently, broadened the range of problems that deep learning could be applied to. With GNNs, major advances became possible in areas like social network analysis, recommendation systems, and drug discovery.

2019: GPT-2 and Improved Generative Models

The year 2019 marked several notable advancements in generative models, particularly the introduction of GPT-2. This model really left its peers in the dust by achieving state-of-the-art performance on many NLP tasks and, in addition, was able to generate highly realistic text, which, in hindsight, gave us a teaser of what was about to come in this arena.

Other improvements in this domain included DeepMind's BigGAN, which generated high-quality images that were almost indistinguishable from real ones, and NVIDIA's StyleGAN, which allowed for better control over the appearance of the generated images.

Collectively, these advancements in what is now known as generative AI pushed the boundaries of the domain even further, and…

2020: GPT-3 and Self-Supervised Learning

… not long thereafter, another model was born, one that has become a household name even outside of the tech community: GPT-3. This model represented a major step forward in the scale and capabilities of LLMs. To put things into context, GPT-1 sported a measly 117 million parameters. That number went up to 1.5 billion for GPT-2, and to 175 billion for GPT-3.

This vast number of parameters enables GPT-3 to generate remarkably coherent text across a wide range of prompts and tasks. It also demonstrated impressive performance on a variety of NLP tasks, such as text completion, question answering, and even creative writing.

Furthermore, GPT-3 highlighted once again the potential of self-supervised learning, which allows models to be trained on large amounts of unlabeled data. The advantage is that such models can acquire a broad understanding of language without extensive task-specific training, which makes training far more economical.
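
The core of this self-supervised objective is simple: the "labels" are just the input sequence shifted by one position, so raw, unlabeled text is all that is required. The toy PyTorch sketch below shows the loss computation; a real LLM would replace the tiny stand-in model with a deep stack of transformer blocks.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
token_ids = torch.randint(0, vocab_size, (2, 32))   # two "sentences" of 32 token ids, no labels

# A toy stand-in for a language model; a real LLM would use a stack of transformer blocks here.
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim), nn.Linear(embed_dim, vocab_size))
logits = model(token_ids)                           # (2, 32, vocab_size)

# Self-supervision: predict token t+1 from what comes before it; the data itself is the label.
pred_logits, targets = logits[:, :-1], token_ids[:, 1:]
loss = nn.functional.cross_entropy(pred_logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```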

Yann LeCun tweets about an NYT article on self-supervised learning.

2021: AlphaFold 2, DALL·E and GitHub Copilot

From protein folding to image generation and automated coding assistance, 2021 was an eventful year thanks to the releases of AlphaFold 2, DALL·E, and GitHub Copilot.

AlphaFold 2 was hailed as a long-awaited solution to the decades-old protein folding problem. DeepMind's researchers extended the transformer architecture to create Evoformer blocks (architectures that leverage evolutionary strategies for model optimization) to build a model capable of predicting a protein's 3D structure from its 1D amino acid sequence. This breakthrough has enormous potential to revolutionize areas like drug discovery and bioengineering, as well as our understanding of biological systems.

OpenAI also made it into the news again that year with the release of DALL·E. Essentially, this model combines the concepts of GPT-style language models and image generation to enable the creation of high-quality images from textual descriptions.

To illustrate how powerful this model is, consider the image below, which was generated with the prompt "Oil painting of a futuristic world with flying cars".

An AI-produced image showing a city with flying cars above it.
Image produced by DALL·E.

Lastly, GitHub released what would later become every developer's best friend: Copilot. This was achieved in collaboration with OpenAI, which provided the underlying language model, Codex. Trained on a large corpus of publicly available code, Codex learned to understand and generate code in various programming languages. Developers can use Copilot by simply providing a code comment stating the problem they are trying to solve, and the model then suggests code to implement the solution. Other features include the ability to explain input code in natural language and to translate code between programming languages.
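
To illustrate the workflow, here is a purely hypothetical example: the developer writes only the comment and the function signature, and Copilot suggests a completion along these lines (actual suggestions vary and should always be reviewed).

```python
# Check whether a string is a palindrome, ignoring case and non-alphanumeric characters.
def is_palindrome(text: str) -> bool:
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())   # suggested completion
    return cleaned == cleaned[::-1]                                # suggested completion

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
```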

2022: ChatGPT and Stable Diffusion

The rapid development of AI over the past decade has culminated in a groundbreaking advancement: OpenAI's ChatGPT, a chatbot released into the wild in November 2022. The tool represents a cutting-edge achievement in NLP, capable of generating coherent and contextually relevant responses to a wide range of queries and prompts. Moreover, it can engage in conversations, provide explanations, offer creative suggestions, assist with problem-solving, write and explain code, and even simulate different personalities or writing styles.

A screenshot from ChatGPT showing its ability to understand and explain Python code.
Image by the author.

The simple and intuitive interface through which one can interact with the bot also stimulated a sharp rise in usage. Previously, it was mostly the tech community that would play around with the latest AI-based inventions. Nowadays, however, AI tools have penetrated almost every professional domain, from software engineering to writing, music, and advertising. Many companies are also using the model to automate services such as customer support, language translation, or answering FAQs. In fact, the wave of automation we are seeing has rekindled some worries and stimulated discussions about which jobs might be at risk of being automated.

Although ChatGPT took up much of the limelight in 2022, there was also a significant advance in image generation. Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images from text descriptions, was released by Stability AI.

Stable Diffusion is an extension of traditional diffusion models, which work by iteratively adding noise to images and then reversing the process to recover the data. It was designed to speed up this process by operating not directly on the input images, but on a lower-dimensional representation, or latent space, of them. In addition, the diffusion process is modified by adding the transformer-embedded text prompt from the user to the network, allowing it to guide the image generation process during each iteration.
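
The following heavily simplified, runnable sketch conveys the idea of the reverse diffusion loop in latent space; the tiny linear layers are stand-ins for Stable Diffusion's actual text encoder, U-Net denoiser, and VAE decoder, and the update rule is schematic rather than the real sampler.

```python
import torch
import torch.nn as nn

# Stand-ins for Stable Diffusion's components: text encoder, denoiser (U-Net), decoder (VAE).
text_encoder = nn.Linear(32, 64)            # maps a toy prompt vector to a conditioning embedding
denoiser = nn.Linear(64 + 64, 64)           # predicts the noise in a latent, given the text embedding
decoder = nn.Linear(64, 3 * 64 * 64)        # maps a latent back to (toy) image pixels

prompt_embedding = text_encoder(torch.randn(1, 32))
latent = torch.randn(1, 64)                 # start from pure noise in the low-dimensional latent space

# Reverse diffusion (schematic): repeatedly predict the noise and remove a fraction of it,
# with the text embedding conditioning every step.
for step in range(50):
    predicted_noise = denoiser(torch.cat([latent, prompt_embedding], dim=1))
    latent = latent - 0.02 * predicted_noise

image = decoder(latent).reshape(1, 3, 64, 64)   # decode the denoised latent into an image
```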

Overall, the release of both ChatGPT and Stable Diffusion in 2022 highlighted the potential of multimodal, generative AI and sparked a huge boost in further development and investment in this area.
