AI models have an expiry date: Continual Learning may be a solution


Why, in a world where the only constant is change, we need a Continual Learning approach to AI models.

Image by the author, generated in Midjourney

Imagine you have a small robot designed to walk around your garden and water your plants. Initially, you spend a few weeks collecting data to train and test the robot, investing considerable time and resources. The robot learns to navigate the garden efficiently when the ground is covered with grass and bare soil.

However, as the weeks go by, flowers begin to bloom and the appearance of the garden changes significantly. The robot, trained on data from a different season, now fails to recognise its surroundings accurately and struggles to complete its tasks. To fix this, you need to add new examples of the blooming garden to the model.

Your first thought is to add the new data examples to the training set and retrain the model from scratch. But this is expensive, and you don't want to do it every time the environment changes. In addition, you have just realised that you no longer have all of the historical training data available.

Next, you consider simply fine-tuning the model on the new samples. But this is risky, because the model may lose some of its previously learned capabilities, leading to catastrophic forgetting (a situation where the model loses previously acquired knowledge and skills when it learns new information).

...so is there an alternative? Yes: Continual Learning!

Of course, the robot watering plants in a garden is just an illustrative example of the problem. Later in the text you will see more realistic applications.

Learn adaptively with Continual Learning (CL)

It is not possible to foresee and prepare for every scenario a model may be confronted with in the future. Therefore, in many cases, adaptively training the model as new samples arrive can be a good option.

In CL we want to find a balance between the stability of a model and its plasticity. Stability is the ability of a model to retain previously learned information, and plasticity is its ability to adapt to new information as new tasks are introduced.

“(…) in the Continual Learning scenario, a learning model is required to incrementally build and dynamically update internal representations as the distribution of tasks dynamically changes across its lifetime.” [2]

But how do we control stability and plasticity?

Researchers have identified a number of ways to build adaptive models. In [3] the following categories are established:

  1. Regularisation-based approach
  • In this approach we add a regularisation term that balances the effects of old and new tasks on the model's parameters.
  • For example, weight regularisation aims to control the variation of the parameters by adding a penalty term to the loss function that penalises a change in a parameter in proportion to how much it contributed to previous tasks.
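The weight-regularisation idea can be sketched in a few lines of NumPy. Assume we have the parameters from the previous task, `theta_old`, and per-parameter importance weights, `importance` (e.g. a diagonal Fisher estimate, as in Elastic Weight Consolidation); all names here are illustrative:

```python
import numpy as np

def regularised_loss(task_loss, theta, theta_old, importance, lam=1.0):
    """Task loss plus a quadratic penalty that anchors important
    parameters close to their values from previous tasks (EWC-style)."""
    penalty = np.sum(importance * (theta - theta_old) ** 2)
    return task_loss + lam * penalty

# Toy check: the same parameter shift costs more on the important parameter.
theta_old = np.array([1.0, -2.0])
importance = np.array([10.0, 0.1])  # first parameter mattered a lot before
loss_a = regularised_loss(0.5, np.array([1.5, -2.0]), theta_old, importance)
loss_b = regularised_loss(0.5, np.array([1.0, -1.5]), theta_old, importance)
print(loss_a > loss_b)  # True: moving the important parameter is penalised more
```

During training, the gradient of this penalty pulls important parameters back towards their old values while leaving unimportant ones free to adapt.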

  2. Replay-based approach
  • This group of methods focuses on reusing some of the historical data so that the model can still reliably solve previous tasks. One of the limitations of this approach is that we need access to historical data, which is not always possible.
  • For example, experience replay, where we preserve and replay a sample of old training data. When training on a new task, some examples from previous tasks are added to expose the model to a mixture of old and new task types, thereby limiting catastrophic forgetting.
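A minimal sketch of experience replay, assuming a fixed-size buffer maintained with reservoir sampling so that every example seen so far has an equal chance of being kept (class and method names are illustrative):

```python
import random

class ReplayBuffer:
    """Fixed-size buffer of past examples, filled by reservoir sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a random slot with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mixed_batch(self, new_examples, k):
        """New-task examples plus up to k replayed old examples."""
        replay = random.sample(self.buffer, min(k, len(self.buffer)))
        return list(new_examples) + replay

buf = ReplayBuffer(capacity=100)
for x in range(1000):            # stream of task-1 examples
    buf.add(x)
batch = buf.mixed_batch(["new_a", "new_b"], k=8)
print(len(batch))  # 10: 2 new examples + 8 replayed old ones
```

Training on such mixed batches is what exposes the model to old and new tasks at once; the buffer size is the knob that trades memory cost against forgetting.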

  3. Optimisation-based approach
  • Here we manipulate the optimisation process itself to maintain performance on all tasks while reducing the effects of catastrophic forgetting.
  • For example, gradient projection, a technique where gradients computed for new tasks are projected so as not to interfere with the gradients of previous tasks.
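The core of gradient projection can be illustrated in plain NumPy: the new task's gradient is projected onto the subspace orthogonal to a direction that was important for an old task. This is a deliberate simplification of methods such as OGD or GEM, and the variable names are illustrative:

```python
import numpy as np

def project_orthogonal(g_new, g_old):
    """Remove from g_new its component along g_old, so the update
    does not move the parameters along the old task's gradient."""
    coef = np.dot(g_new, g_old) / np.dot(g_old, g_old)
    return g_new - coef * g_old

g_old = np.array([1.0, 0.0, 0.0])    # direction important for the old task
g_new = np.array([0.8, 0.5, -0.2])   # raw gradient for the new task
g_proj = project_orthogonal(g_new, g_old)
print(g_proj)                  # [ 0.   0.5 -0.2]
print(np.dot(g_proj, g_old))   # 0.0: no interference along g_old
```

Updating with `g_proj` instead of `g_new` leaves the loss on the old task (locally, to first order) unchanged, which is exactly the stability half of the stability/plasticity trade-off.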

  4. Representation-based approach
  • This group of methods focuses on obtaining and using robust feature representations to avoid catastrophic forgetting.
  • For example, self-supervised learning, where a model learns a robust representation of the data before being trained on specific tasks. The idea is to learn high-quality features that generalise well across the different tasks the model may encounter in the future.

  5. Architecture-based approach
  • The previous methods assume a single model with a single parameter space, but there are also a number of CL techniques that exploit the model's architecture.
  • For example, parameter allocation, where during training each new task is given a dedicated parameter subspace in the network, which removes the problem of destructive interference between parameters. However, if the network size is not fixed, it will grow with the number of new tasks.
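A toy sketch of parameter allocation with binary masks: each task is only allowed to update its own disjoint slice of the parameter vector, so later tasks cannot overwrite earlier ones. This is purely illustrative (real methods such as PackNet learn the masks rather than fixing them by hand):

```python
import numpy as np

params = np.zeros(8)
# Each task gets a dedicated, disjoint subset of the parameters.
masks = {
    "task1": np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool),
    "task2": np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool),
}

def apply_update(params, grad, task, lr=0.1):
    """Only the parameters allocated to `task` are changed, so
    training task2 cannot interfere with task1's parameters."""
    return params - lr * np.where(masks[task], grad, 0.0)

params = apply_update(params, np.ones(8), "task1")
params = apply_update(params, 2 * np.ones(8), "task2")
print(params)  # first half moved by task1's step, second half by task2's
```

The cost of this hard isolation is the one noted above: if each task needs fresh capacity, the parameter budget (or the network itself) grows with the number of tasks.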

And how do we evaluate the performance of CL models?

The performance of CL models can be measured from a number of angles [3]:

  • Overall performance evaluation: the average performance across all tasks
  • Memory stability evaluation: the difference between a task's maximum past performance and its current performance after further continual training
  • Learning plasticity evaluation: the difference between joint-training performance (i.e. training on all data at once) and the performance obtained with CL
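The first two metrics can be computed from an accuracy matrix `R`, where `R[i][j]` is the accuracy on task `j` after sequentially training on tasks `0..i` (a common convention in the CL literature; the numbers below are made up). The plasticity metric additionally needs joint-training accuracies, which are omitted here:

```python
import numpy as np

# R[i][j]: accuracy on task j after sequentially training on tasks 0..i
R = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.88],
])

# Overall performance: average accuracy over all tasks at the end of training.
avg_accuracy = R[-1].mean()

# Memory stability (forgetting): best past accuracy minus final accuracy,
# averaged over all tasks learned before the last one.
forgetting = np.mean([R[:-1, j].max() - R[-1, j] for j in range(R.shape[1] - 1)])

print(round(avg_accuracy, 4))  # 0.7767
print(round(forgetting, 4))    # 0.15
```

A perfectly stable learner would have zero forgetting; a purely plastic one would score well on the newest task but show large drops in the earlier columns.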

So why don’t all AI researchers switch to Continual Learning immediately?

If you have access to the historical training data and are not worried about the computational cost, it may seem easier to just train from scratch.

One of the reasons for this is that our understanding of what happens inside a model during continual training is still limited. If training from scratch gives the same or better results than continual training, people may prefer the simpler approach, i.e. retraining from scratch, rather than spending time trying to understand the performance problems of CL methods.

In addition, current research tends to focus on the evaluation of models and frameworks, which may not reflect the real use cases that businesses have. As mentioned in [6], there are many synthetic incremental benchmarks that do not reflect real-world situations, where there is a natural evolution of tasks.

Finally, as noted in [4], many papers on CL focus on storage costs rather than computational costs, when in reality storing historical data is far cheaper and less energy-consuming than retraining the model.

If there were more focus on the computational and environmental costs of model retraining, more people might be interested in improving the current state of the art in CL methods, as they would see measurable benefits. For example, as mentioned in [4], retraining can exceed 10,000 GPU days for recent large models.

Why should we work on improving CL models?

Continual learning seeks to address one of the most challenging bottlenecks of current AI models: the fact that data distributions change over time. Retraining is expensive and requires large amounts of computation, which is not a very sustainable approach from either an economic or an environmental perspective. Therefore, in the future, well-developed CL methods may allow models to be more accessible and reusable by a larger community of people.

As summarised in [4], there is a list of applications that inherently require or could benefit from well-developed CL methods:

  1. Model editing
  • Selectively editing an error-prone part of a model without damaging its other parts. Continual Learning techniques could help to continuously correct model errors at a much lower computational cost.

  2. Personalisation and specialisation
  • General-purpose models sometimes need to be adapted to be more personalised for specific users. With Continual Learning, we could update only a small set of parameters without introducing catastrophic forgetting into the model.

  3. On-device learning
  • Small devices have limited memory and computational resources, so methods that can train a model efficiently in real time as new data arrives, without starting from scratch, could be useful in this area.

  4. Faster retraining with warm start
  • Models need to be updated when new samples become available or when the distribution shifts significantly. With Continual Learning, this process can be made more efficient by updating only the parts affected by the new samples, rather than retraining from scratch.

  5. Reinforcement learning
  • Reinforcement learning involves agents interacting with an environment that is often non-stationary. Efficient Continual Learning methods could therefore be useful for this use case.

Learn more

As you can see, there is still a lot of room for improvement in the area of Continual Learning methods. If you are interested, you can start with the materials below:

  • Introductory course: [Continual Learning Course] Lecture #1: Introduction and Motivation from ContinualAI on YouTube https://youtu.be/z9DDg2CJjeE?si=j57_qLNmpRWcmXtP
  • A paper on the motivation for Continual Learning: Continual Learning: Applications and the Road Forward [4]
  • A survey of state-of-the-art techniques in Continual Learning: A Comprehensive Survey of Continual Learning: Theory, Method and Application [3]

If you have any questions or comments, please feel free to share them in the comments section.

Cheers!


[1] Awasthi, A., & Sarawagi, S. (2019). Continual Learning with Neural Networks: A Review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.

[2] Continual AI Wiki Introduction to Continual Learning https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning

[3] Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362–5383.

[4] Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, & Gido M. van de Ven. (2024). Continual Learning: Applications and the Road Forward https://arxiv.org/abs/2311.11908

[6] Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, & Fartash Faghri. (2024). TiC-CLIP: Continual Training of CLIP Models.
