Foundation Models: The Next Frontier of AI


Modern artificial intelligence (AI) focuses on learning from data – and the more data, the better it learns.

This is why AI research and practice have so far largely focused on training ever-larger models on more data using highly efficient computational resources. But while significant progress has been made this way, many application areas – such as healthcare and manufacturing – have limited data, which constrains how far AI can go in those domains.

Foundation models could be the solution to this. The term refers to the general-purpose nature of these models: while traditional AI models must be trained on large datasets for each individual use case, foundation models can be adapted to a wide range of downstream tasks, reducing the work needed to put AI into production and improving efficiency. (Also read: 7 Key Challenges to AI Adoption – and How to Overcome Them.)

Foundation models build on standard ideas in transfer learning and on recent advances in training deep learning models with self-supervised learning. They have also demonstrated remarkable emergent capabilities and dramatically improved performance across a wide variety of use cases, making them an attractive prospect for businesses.

But the potential foundation models present is even bigger than that: they represent a growing paradigm shift in AI. Until now, AI researchers and developers had to train models from scratch for every use case, requiring them to collect large task-specific datasets. Foundation models, by contrast, are general-purpose models that can be adapted to specific use cases using the data an organization already has.

In this way, foundation models will make it easier for organizations to adopt and deeply integrate AI into their operations. (Also read: Robotic Process Automation: What You Need to Know.)

How do foundation models work?

From a technological point of view, foundation models are deep neural networks trained using self-supervised learning. Although these technologies have been around for many years, what is truly revolutionary is the scale at which these models are built.

Recent foundation models contain hundreds of billions to trillions of parameters and are trained on hundreds of gigabytes of data. Existing foundation models are mainly built on the transformer architecture.

Although the transformer is not strictly required for foundation models, it has a few properties that make it an ideal backbone for them:

  1. They are easily parallelizable. Transformers can be parallelized easily in both the training and inference phases, as illustrated in the sketch after this list. This property is particularly important for natural language processing (NLP), where earlier state-of-the-art models – including recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) – process data sequentially and therefore cannot be parallelized.
  2. They have fewer inductive biases. Compared to other contemporary architectures, such as convolutional neural networks (CNNs) and RNNs, transformers have minimal inductive bias. Inductive bias refers to design choices that exploit certain characteristics of the input data – for example, the locality of features in CNNs and the sequential dependencies between features in RNNs. With fewer inductive biases, the transformer is a more universal architecture than these alternatives, which makes it better suited to building foundation models. However, it also means transformers require more training data, due to the well-known trade-off between inductive bias and data. (Also read: Why Diversity Is Essential for Quality Data to Train AI.)
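To make the parallelism point concrete, here is a minimal sketch, assuming PyTorch, that contrasts a transformer-style self-attention layer (which processes the whole sequence in one call) with an RNN cell (which must step through positions one by one). The tensor sizes are illustrative only.

```python
# A minimal sketch (assuming PyTorch) contrasting parallel self-attention
# with sequential RNN processing. Dimensions are illustrative only.
import torch
import torch.nn as nn

batch, seq_len, d_model = 8, 128, 64
x = torch.randn(batch, seq_len, d_model)

# Transformer-style self-attention: the whole sequence is processed in one
# call, so the work across positions can run in parallel on a GPU.
attention = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
attn_out, _ = attention(x, x, x)          # (batch, seq_len, d_model)

# RNN-style processing: each step depends on the previous hidden state,
# so positions must be computed one after another.
rnn_cell = nn.RNNCell(input_size=d_model, hidden_size=d_model)
hidden = torch.zeros(batch, d_model)
rnn_outputs = []
for t in range(seq_len):                  # sequential dependency
    hidden = rnn_cell(x[:, t, :], hidden)
    rnn_outputs.append(hidden)
rnn_out = torch.stack(rnn_outputs, dim=1) # (batch, seq_len, d_model)
```

The explicit Python loop in the RNN branch is exactly the sequential dependency that prevents parallelization across positions, while the attention call has no such dependency.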

Foundation models are usually trained using self-supervised learning, which, unlike supervised learning, requires little human labeling effort. Instead, self-supervised learning lets a model "teach itself" using supervision cues that occur naturally in the training data.

Here are some examples of these supervisory signals (a minimal sketch of the first one appears after the list):

  • Hiding words in a sentence and training the model to recover the missing words, as BERT does.
  • Predicting the next character or word in a sentence, as GPT-3 does.
  • Judging the correspondence between an image and a transformed version of it, as SimCLR does.
  • Judging the similarity between an image and its caption, as CLIP does.
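The following is a toy sketch of the first signal – masked-token recovery – assuming PyTorch. The tiny encoder, random "tokens" and vocabulary size are illustrative stand-ins, not BERT itself; the point is that the targets come from the data, with no human labels.

```python
# A toy sketch (assuming PyTorch) of the masked-token objective: hide some
# tokens, then train the model to recover them. The tiny model and vocabulary
# here are illustrative stand-ins, not BERT itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, mask_id = 1000, 64, 0          # token id 0 reserved as [MASK]
token_ids = torch.randint(1, vocab_size, (8, 32))   # (batch, seq_len) of fake tokens

# Randomly choose ~15% of positions to mask, as in BERT-style pre-training.
mask = torch.rand(token_ids.shape) < 0.15
inputs = token_ids.clone()
inputs[mask] = mask_id

# A deliberately tiny "model": embedding -> transformer encoder -> vocabulary logits.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
to_vocab = nn.Linear(d_model, vocab_size)

logits = to_vocab(encoder(embed(inputs)))            # (batch, seq_len, vocab_size)

# The supervision comes from the data itself: the loss is computed only on
# the masked positions, using the original tokens as targets.
loss = F.cross_entropy(logits[mask], token_ids[mask])
loss.backward()
```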

Self-supervised learning is useful for training foundation models for at least two reasons:

  1. It scales better than supervised learning, because it is far easier to obtain unlabeled data than labeled data.
  2. It learns more expressive features, because it draws on the rich structure of the raw data itself rather than on supervised labels, whose label spaces are notoriously limited.

The combination of a high-capacity, computationally efficient model architecture, a highly scalable training objective and powerful hardware allows foundation models to be scaled to an extraordinary degree.

The Rise of Foundation Models

The rise of foundation models can be understood in terms of emergence and homogenization. Emergence refers to behavior of a system that arises implicitly rather than being explicitly built in. Homogenization refers to the consolidation of methods for building machine learning systems across a wide range of applications.

To better place foundation models within the broader AI conversation, let's look at the rise of AI over the past 30 years. (Also read: A Brief History of AI.)

1. Machine Learning

Most contemporary AI developments are driven by machine learning (ML), which learns predictive models from historical data in order to make predictions about new data. The rise of ML within AI began in the 1990s and was a paradigm shift from how AI systems were built before.

ML algorithms can infer how to perform a given task from the data they are trained on. This was a major step toward homogenization, as a wide range of AI use cases could be tackled with a single generic ML algorithm.

However, an important part of the ML workflow is feature engineering, which requires domain experts to turn raw data into higher-level features.

2. Deep Learning

Neural networks got a new lease on life in the form of deep learning (DL) around 2010.

Unlike earlier, shallow neural networks, DL models are powered by deep neural networks (i.e. networks with many layers of computation), efficient computing hardware and larger datasets. A major advantage of DL is that it takes raw input (e.g. pixels) and builds a hierarchy of features during training. Thus, in DL, features themselves emerge from the learning process.

This evolution has led DL to show extraordinary performance on standard benchmarks. The rise of DL was also a step closer to homogenization, as the same DL algorithm could be used for many AI use cases without domain-specific feature engineering.

DL models, however, require a lot of domain-specific data for training. (Also read: Basic machine learning terms you need to know.)

3. Foundation Models

The era of foundation models began in 2018 in the field of natural language processing. Technically, foundation models are enabled by transfer learning and scale.

Transfer learning works by taking the knowledge a model acquired while learning one set of tasks and building on it to teach the model new tasks – essentially "transferring" what the model knows to new use cases.

In deep learning, a dominant approach to transfer learning is to pre-train a model using self-supervised learning and then adapt it to a specific use case.
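As a concrete illustration of the "adapt" step, here is a minimal sketch assuming the Hugging Face transformers library and PyTorch: a pre-trained model (bert-base-uncased) is loaded with a small classification head and fine-tuned on task-specific examples. The example texts, labels and single training step are made up for illustration; a real project would use a proper dataset and training loop.

```python
# A minimal sketch (assuming the Hugging Face `transformers` library and
# PyTorch) of adapting a pre-trained foundation model to a specific task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"             # a publicly available pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy task-specific examples and labels, purely for illustration.
texts = ["The device overheated after an hour.", "Setup was quick and painless."]
labels = torch.tensor([0, 1])                # 0 = complaint, 1 = praise

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)      # loss on the new classification head
outputs.loss.backward()                      # one illustrative gradient step
optimizer.step()
```

The key point is that the expensive pre-training has already been done; adapting the model to a new use case only requires the (comparatively small) task-specific data an organization already has.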

While transfer learning is what makes foundation models feasible, scale is what makes them powerful. Scale depends on three key factors:

  1. Computationally efficient model architectures that take advantage of hardware parallelism (e.g. the transformer).
  2. Computing hardware with better throughput and memory (e.g. GPUs).
  3. Access to larger datasets.

Unlike deep learning, where large task-specific datasets must be available for the model to learn use-case-specific features, foundation models aim to learn general-purpose features that can be adapted to many use cases.

Thus, foundation models open up the possibility of an unprecedented level of homogenization. For example, almost all state-of-the-art NLP models are now adapted from one of a few foundation models (e.g. BERT, GPT-3, T5, CLIP, DALL-E 2, Codex and OPT).

Conclusion

Foundation models represent the start of a paradigm shift in how AI systems are built and deployed around the world. They have already made their mark in NLP and are being explored in other areas such as computer vision, speech recognition and reinforcement learning.

Given their potential, we can expect foundation models to transcend the research world and revolutionize how AI is adopted in business. Automating processes within the enterprise will no longer require data science teams to retrain models from scratch for every task they wish to automate; instead, they can start from a pre-trained foundation model and fine-tune it for each use case. (Also read: 3 Amazing Examples of Artificial Intelligence in Action.)
