
Ten Years of AI in Review


From image classification to chatbot therapy

Yearly timeline, from 2012 to 2023, highlighting the most significant advances in AI.
Image by the Author.

The last decade has been an exciting and eventful ride for the field of artificial intelligence (AI). Modest explorations of the potential of deep learning turned into an explosive proliferation of a field that now includes everything from recommender systems in e-commerce to object detection for autonomous vehicles and generative models that can create everything from realistic images to coherent text.

In this article, we’ll take a walk down memory lane and revisit some of the key breakthroughs that got us to where we are today. Whether you are a seasoned AI practitioner or simply curious about the latest developments in the field, this article will give you a comprehensive overview of the remarkable progress that led AI to become a household name.

2013: AlexNet and Variational Autoencoders

The year 2013 is widely regarded as the “coming-of-age” of deep learning, initiated by major advances in computer vision. According to a recent interview with Geoffrey Hinton, by 2013 “pretty much all the computer vision research had switched to neural nets”. This boom was primarily fueled by a rather surprising breakthrough in image recognition one year earlier.

In September 2012, AlexNet, a deep convolutional neural network (CNN), pulled off a record-breaking performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), demonstrating the potential of deep learning for image recognition tasks. It achieved a top-5 error of 15.3%, which was 10.9 percentage points lower than that of its nearest competitor.

Bar chart showing the top-5 errors of various teams that participated in the 2012 ImageNet challenge.
Image by the Author.

The technical improvements behind this success were instrumental to the future trajectory of AI and dramatically changed the way deep learning was perceived.

First, the authors applied a deep CNN consisting of five convolutional layers and three fully-connected linear layers, an architectural design dismissed by many as impractical at the time. Furthermore, due to the large number of parameters produced by the network’s depth, training was done in parallel on two graphics processing units (GPUs), demonstrating the ability to significantly speed up training on large datasets. Training time was further reduced by swapping traditional activation functions, such as sigmoid and tanh, for the more efficient rectified linear unit (ReLU).

Image showing the activation functions sigmoid, tanh, and ReLU.
Image by the Author.
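For readers who prefer to see the idea in code, here is a minimal PyTorch sketch of an AlexNet-style network: five convolutional layers followed by three fully-connected layers, with ReLU activations throughout. The layer sizes follow the common torchvision variant and are illustrative; this is not a reproduction of the original two-GPU implementation.

```python
import torch
import torch.nn as nn

# Condensed AlexNet-style CNN: 5 conv layers + 3 fully-connected layers,
# with ReLU activations. Illustrative sketch, not the original 2012 code.
class AlexNetLike(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = AlexNetLike()(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)
```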

These advances, which collectively led to the success of AlexNet, marked a turning point in the history of AI and sparked a surge of interest in deep learning among both academics and the tech community. Consequently, 2013 is considered by many to be the inflection point after which deep learning truly began to take off.

Also happening in 2013, albeit a little drowned out by the noise around AlexNet, was the development of variational autoencoders, or VAEs: generative models that can learn to represent and generate data such as images and sounds. They work by learning a compressed representation of the input data in a lower-dimensional space, known as the latent space. This allows them to generate new data by sampling from this learned latent space. VAEs later turned out to open up new avenues for generative modeling and data generation, with applications in fields like art, design, and gaming.
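As a minimal sketch of the idea in PyTorch (with illustrative layer sizes): the encoder maps each input to a distribution over the latent space, the reparameterization trick allows sampling during training, and the decoder turns latent vectors back into data.

```python
import torch
import torch.nn as nn

# Minimal VAE sketch: encoder -> latent distribution -> decoder.
# Layer sizes (784-dim inputs, 16-dim latent space) are illustrative.
class VAE(nn.Module):
    def __init__(self, input_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Generating new data: decode random points sampled from the latent space.
new_samples = VAE().decoder(torch.randn(8, 16))  # -> shape (8, 784)
```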

2014: Generative Adversarial Networks

The following year, in June 2014, the field of deep learning witnessed another serious advance with the introduction of generative adversarial networks, or GANs, by Ian Goodfellow and colleagues.

GANs are a type of neural network capable of generating new data samples that are similar to a training set. Essentially, two networks are trained simultaneously: (1) a generator network produces fake, or synthetic, samples, and (2) a discriminator network evaluates their authenticity. This training is performed in a game-like setup, with the generator trying to create samples that fool the discriminator, and the discriminator trying to correctly call out the fake samples.
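A minimal PyTorch sketch of this adversarial training loop; the MLP architectures and hyperparameters are illustrative placeholders, not a recipe tuned for any particular dataset.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator on flattened 28x28 images (784 dims).
latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch: torch.Tensor):
    batch_size = real_batch.size(0)
    ones, zeros = torch.ones(batch_size, 1), torch.zeros(batch_size, 1)

    # 1) Discriminator: label real samples as 1, generated samples as 0.
    fake = G(torch.randn(batch_size, latent_dim)).detach()
    loss_d = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator: try to make the discriminator output 1 for fake samples.
    fake = G(torch.randn(batch_size, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```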

At the time, GANs represented a powerful and novel tool for data generation, being used not only for generating images and videos, but also music and art. They also contributed to the advance of unsupervised learning, a domain largely regarded as underdeveloped and challenging, by demonstrating the possibility of generating high-quality data samples without relying on explicit labels.

2015: ResNets and NLP Breakthroughs

In 2015, the field of AI made considerable advances in both computer vision and natural language processing, or NLP.

Kaiming He and colleagues published a paper titled “Deep Residual Learning for Image Recognition”, in which they introduced the concept of residual neural networks, or ResNets: architectures that allow information to flow more easily through the network by adding shortcuts. Unlike in a regular neural network, where each layer takes the output of the previous layer as input, in a ResNet additional residual connections are added that skip one or more layers and connect directly to deeper layers in the network.
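In code, the idea boils down to adding the block’s input back to its output. Below is a simplified PyTorch sketch of a basic residual block (without the downsampling variant used in deeper stages of the network).

```python
import torch
import torch.nn as nn

# Basic residual block: the input x is added back to the output of two
# convolutional layers, so information and gradients can flow through
# the shortcut. Simplified sketch without a downsampling path.
class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # the skip connection: F(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```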

As a result, ResNets were able to address the problem of vanishing gradients, which enabled the training of much deeper neural networks than was thought possible at the time. This, in turn, led to significant improvements in image classification and object recognition tasks.

At around the same time, researchers made considerable progress with the development of recurrent neural networks (RNNs) and long short-term memory (LSTM) models. Despite having been around since the 1990s, these models only began to generate some buzz around 2015, mainly due to factors such as (1) the availability of larger and more diverse datasets for training, (2) improvements in computational power and hardware, which enabled the training of deeper and more complex models, and (3) modifications made along the way, such as more sophisticated gating mechanisms.

As a result, these architectures made it possible for language models to better understand the context and meaning of text, leading to vast improvements in tasks such as language translation, text generation, and sentiment analysis. The success of RNNs and LSTMs around that time paved the way for the development of the large language models (LLMs) we see today.
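As a small illustration, here is a minimal PyTorch sketch of an LSTM-based text classifier of the kind used for sentiment analysis; the vocabulary size, dimensions, and number of classes are placeholders.

```python
import torch
import torch.nn as nn

# Minimal LSTM text classifier sketch (e.g. for sentiment analysis).
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # hidden: (1, batch, hidden_dim)
        return self.head(hidden[-1])            # (batch, num_classes)

logits = LSTMClassifier()(torch.randint(0, 10_000, (4, 32)))  # -> shape (4, 2)
```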

2016: AlphaGo

After Garry Kasparov’s defeat by IBM’s Deep Blue in 1997, another human vs. machine battle sent shockwaves through the gaming world in 2016: Google’s AlphaGo defeated the world champion of Go, Lee Sedol.

An image showing the board and stones of the game Go.
Photo by Elena Popova on Unsplash.

Sedol’s defeat marked another major milestone in the trajectory of AI advancement: it demonstrated that machines could outsmart even the most skilled human players in a game that was once considered too complex for computers to handle. Using a combination of deep reinforcement learning and Monte Carlo tree search, AlphaGo analyzes millions of positions from previous games and evaluates the best possible moves, a strategy that far surpasses human decision-making in this context.
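As a rough illustration of the tree-search half of that recipe, the sketch below shows the classic UCT selection rule used in Monte Carlo tree search. AlphaGo’s actual variant additionally mixes in policy-network priors and value-network evaluations, so treat this only as a simplified stand-in.

```python
import math

# Simplified sketch of the UCT (Upper Confidence bound for Trees) rule used
# to pick which move to explore next during Monte Carlo tree search.
class Node:
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.visits = 0
        self.total_value = 0.0

def select_child(node: Node, exploration: float = 1.4) -> Node:
    # Balance exploitation (average value so far) against exploration
    # (moves that have rarely been visited).
    def uct(child: Node) -> float:
        if child.visits == 0:
            return float("inf")
        exploit = child.total_value / child.visits
        explore = exploration * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=uct)
```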

2017: Transformer Architecture and Language Models

Arguably, 2017 was the most pivotal year, laying the foundation for the breakthroughs in generative AI that we are witnessing today.

In December 2017, Vaswani and colleagues released the foundational paper “Attention Is All You Need”, which introduced the transformer architecture, leveraging the concept of self-attention to process sequential input data. This allowed for more efficient processing of long-range dependencies, which had previously been a challenge for traditional RNN architectures.

An image showing two transformer figures.
Photo by Jeffery Ho on Unsplash.

Transformers consist of two main components: encoders and decoders. The encoder is responsible for encoding the input data, which, for example, can be a sequence of words. It takes the input sequence and applies multiple layers of self-attention and feed-forward neural nets to capture the relationships and features within the sentence and learn meaningful representations.

Essentially, self-attention allows the model to understand relationships between different words in a sentence. Unlike traditional models, which process words in a fixed order, transformers examine all the words at once. They assign something called attention scores to each word based on its relevance to the other words in the sentence.
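A bare-bones sketch of scaled dot-product self-attention in PyTorch, with random projection matrices standing in for learned weights:

```python
import torch

# Scaled dot-product self-attention: every token attends to every other
# token, producing attention scores that weight the value vectors.
def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_k) projections
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # (seq_len, seq_len) relevance of each word to every other
    weights = torch.softmax(scores, dim=-1)   # attention scores sum to 1 per token
    return weights @ v                        # weighted combination of value vectors

d_model, d_k, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
out = self_attention(x, torch.randn(d_model, d_k), torch.randn(d_model, d_k), torch.randn(d_model, d_k))
print(out.shape)  # torch.Size([5, 8])
```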

The decoder, on the other hand, takes the encoded representation from the encoder and produces an output sequence. In tasks such as machine translation or text generation, the decoder generates the translated sequence based on the input received from the encoder. Similar to the encoder, the decoder also consists of multiple layers of self-attention and feed-forward neural nets. However, it includes an additional attention mechanism that allows it to focus on the encoder’s output. This enables the decoder to take relevant information from the input sequence into account while generating the output.
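For a sense of how the two halves fit together in practice, PyTorch ships an encoder-decoder module, torch.nn.Transformer, which wires up self-attention in both stacks plus the decoder’s cross-attention to the encoder output. The shapes below are purely illustrative.

```python
import torch
import torch.nn as nn

# Encoder-decoder wiring via PyTorch's built-in Transformer module.
# The decoder attends to its own target sequence and, via cross-attention,
# to the encoder's output.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)

src = torch.randn(2, 10, 512)   # embedded source sentence: (batch, src_len, d_model)
tgt = torch.randn(2, 7, 512)    # embedded target sequence generated so far
out = model(src, tgt)           # (2, 7, 512): decoder states conditioned on the encoder output
```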

The transformer architecture has since become a key component in the development of LLMs and has led to significant improvements across the domain of NLP, such as in machine translation, language modeling, and question answering.

2018: GPT-1, BERT and Graph Neural Networks

A few months after Vaswani et al. published their foundational paper, the Generative Pre-trained Transformer, or GPT-1, was introduced by OpenAI in June 2018. It utilized the transformer architecture to effectively capture long-range dependencies in text. GPT-1 was one of the first models to demonstrate the effectiveness of unsupervised pre-training followed by fine-tuning on specific NLP tasks.

Also taking advantage of the still quite novel transformer architecture was Google, which, in late 2018, released and open-sourced its own pre-training method called Bidirectional Encoder Representations from Transformers, or BERT. Unlike previous models that process text in a unidirectional manner (including GPT-1), BERT considers the context of each word in both directions simultaneously. To illustrate this, the authors provide a very intuitive example:

… in the sentence “I accessed the bank account”, a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account”. However, BERT represents “bank” using both its previous and next context — “I accessed the … account” — starting from the very bottom of a deep neural network, making it deeply bidirectional.

The concept of bidirectionality was so powerful that it led BERT to outperform state-of-the-art NLP systems on a variety of benchmark tasks.
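This bidirectional behavior is easy to observe with the Hugging Face transformers library. The small example below (assuming the library is installed and the bert-base-uncased checkpoint can be downloaded) asks BERT to fill in a masked word; plausible completions such as “bank” depend on the word “account” that appears after the mask.

```python
# pip install transformers
from transformers import pipeline

# The fill-mask pipeline uses BERT's masked-language-model head.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("I accessed the [MASK] account."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Completions like "bank" only make sense once the model has also read
# the word "account" that comes *after* the masked position.
```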

Apart from GPT-1 and BERT, graph neural networks, or GNNs, also made some noise that year. They belong to a class of neural networks specifically designed to work with graph data. GNNs utilize a message-passing algorithm to propagate information across the nodes and edges of a graph. This allows the network to learn the structure and relationships of the data in a much more natural way.
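A bare-bones sketch of a single message-passing layer in PyTorch, using a dense adjacency matrix for simplicity; real GNN libraries use sparse operations and more elaborate aggregation and normalization schemes.

```python
import torch
import torch.nn as nn

# One round of message passing: each node aggregates its neighbours'
# features (via the adjacency matrix) and mixes them with its own.
class MessagePassingLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.self_transform = nn.Linear(in_dim, out_dim)
        self.neighbor_transform = nn.Linear(in_dim, out_dim)

    def forward(self, node_features, adjacency):
        # node_features: (num_nodes, in_dim); adjacency: (num_nodes, num_nodes)
        messages = adjacency @ node_features  # sum of neighbour features per node
        return torch.relu(self.self_transform(node_features) +
                          self.neighbor_transform(messages))

adjacency = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-node path graph
h = MessagePassingLayer(8, 16)(torch.randn(3, 8), adjacency)
print(h.shape)  # torch.Size([3, 16])
```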

This work allowed for the extraction of much deeper insights from data and, consequently, broadened the range of problems that deep learning could be applied to. With GNNs, major advances were made possible in areas like social network analysis, recommendation systems, and drug discovery.

2019: GPT-2 and Improved Generative Models

The year 2019 marked several notable advancements in generative models, particularly the introduction of GPT-2. This model truly left its peers in the dust by achieving state-of-the-art performance in many NLP tasks and, in addition, was able to generate highly realistic text, which, in hindsight, gave us a teaser of what was about to come in this arena.

Other improvements in this domain included DeepMind’s BigGAN, which generated high-quality images that were almost indistinguishable from real ones, and NVIDIA’s StyleGAN, which allowed for better control over the appearance of generated images.

Collectively, these advancements in what is now known as generative AI pushed the boundaries of this domain even further, and…

2020: GPT-3 and Self-Supervised Learning

… not long thereafter, another model was born that has become a household name even outside of the tech community: GPT-3. This model represented a major step forward in the scale and capabilities of LLMs. To put things into context, GPT-1 sported a measly 117 million parameters. That number went up to 1.5 billion for GPT-2, and 175 billion for GPT-3.

This vast parameter space enables GPT-3 to generate remarkably coherent text across a wide range of prompts and tasks. It also demonstrated impressive performance in a variety of NLP tasks, such as text completion, question answering, and even creative writing.

Furthermore, GPT-3 once again highlighted the potential of self-supervised learning, which allows models to be trained on large amounts of unlabeled data. This has the advantage that such models can acquire a broad understanding of language without the need for extensive task-specific training, which makes it much more economical.

Yann LeCun tweets about an NYT article on self-supervised learning.

2021: AlphaFold 2, DALL·E and GitHub Copilot

From protein folding to image generation and automated coding assistance, the year 2021 was an eventful one thanks to the releases of AlphaFold 2, DALL·E, and GitHub Copilot.

AlphaFold 2 was hailed as a long-awaited solution to the decades-old protein folding problem. DeepMind’s researchers extended the transformer architecture to create Evoformer blocks, modules that reason over evolutionary information from multiple sequence alignments of related proteins, to build a model capable of predicting a protein’s 3D structure from its 1D amino acid sequence. This breakthrough has enormous potential to revolutionize areas like drug discovery and bioengineering, as well as our understanding of biological systems.

OpenAI also made it into the news again that year with the release of DALL·E. Essentially, this model combines the concepts of GPT-style language models and image generation to enable the creation of high-quality images from textual descriptions.

To illustrate how powerful this model is, consider the image below, which was generated with the prompt “Oil painting of a futuristic world with flying cars”.

An AI-produced image showing a city with flying cars above it.
Image produced by DALL·E.

Lastly, GitHub released what would later become every developer’s best friend: Copilot. This was achieved in collaboration with OpenAI, which provided the underlying language model, Codex, which was trained on a large corpus of publicly available code and, in turn, learned to understand and generate code in various programming languages. Developers can use Copilot by simply providing a code comment stating the problem they are trying to solve, and the model then suggests code to implement the solution. Other features include the ability to explain input code in natural language and translate code between programming languages.
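A hypothetical example of that comment-driven workflow: the developer writes only the comment and the function signature, and the body below is the kind of completion Copilot might suggest. This is an illustration, not an actual Copilot output.

```python
# Check whether a given string is a palindrome, ignoring case and spaces.
def is_palindrome(text: str) -> bool:
    # --- the body below is the kind of suggestion Copilot might generate ---
    cleaned = "".join(ch.lower() for ch in text if not ch.isspace())
    return cleaned == cleaned[::-1]

print(is_palindrome("Never odd or even"))  # True
```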

2022: ChatGPT and Stable Diffusion

The rapid development of AI over the past decade culminated in a groundbreaking advancement: OpenAI’s ChatGPT, a chatbot that was released into the wild in November 2022. The tool represents a cutting-edge achievement in NLP, capable of generating coherent and contextually relevant responses to a wide range of queries and prompts. Moreover, it can engage in conversations, provide explanations, offer creative suggestions, assist with problem-solving, write and explain code, and even simulate different personalities or writing styles.

A screenshot from ChatGPT showing its ability to understand and explain Python code.
Image by the Author.

The simple and intuitive interface through which one can interact with the bot also drove a sharp rise in usage. Previously, it was mostly the tech community that would play around with the latest AI-based inventions. Recently, however, AI tools have penetrated almost every professional domain, from software engineers to writers, musicians, and advertisers. Many companies are also using the model to automate services such as customer support, language translation, or answering FAQs. In fact, the wave of automation we are seeing has rekindled some worries and stimulated discussions about which jobs might be at risk of being automated.

Though ChatGPT took up much of the limelight in 2022, there was also a significant advancement in image generation. Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images from text descriptions, was released by Stability AI.

Stable Diffusion is an extension of traditional diffusion models, which work by iteratively adding noise to images and then learning to reverse the process to recover the data. It speeds up this process by operating not directly on the input images, but instead on a lower-dimensional representation, or latent space, of them. In addition, the diffusion process is conditioned on the user’s transformer-embedded text prompt, allowing it to guide the image generation process at each iteration.
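For readers who want to try this themselves, the snippet below sketches how an image can be generated with the Hugging Face diffusers library. The model identifier and arguments reflect common usage and may differ across library versions; a GPU is assumed.

```python
# pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (model id is illustrative and may change).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Text prompt guides the denoising process in latent space at every step.
image = pipe("Oil painting of a futuristic world with flying cars",
             num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("futuristic_world.png")
```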

Overall, the release of both ChatGPT and Stable Diffusion in 2022 highlighted the potential of multimodal, generative AI and sparked a massive boost in further development and investment in this area.
