“Foundational Model” refers to a large-scale pre-trained model that serves as a base for various applications.
A Foundational Model is:
Large-Scale: Foundational models are typically trained on vast amounts of data and contain a very large number of parameters. Their sheer size and capacity enable them to capture a broad range of patterns and knowledge.
Pre-Trained: Foundational models are first trained on broad, general-purpose datasets to capture a wide spectrum of information. This pre-training is usually unsupervised or self-supervised (for example, predicting masked or next tokens in text), which lets it exploit massive unlabeled datasets to build a general understanding of language, vision, or other domains.
Adaptable: A defining characteristic of foundational models is that they can be fine-tuned or adapted to specific tasks with relatively small amounts of labeled data. By reusing the knowledge captured during pre-training, they can be adjusted quickly to perform well on specialized tasks (see the fine-tuning sketch after this list).
Multimodal (in some cases): Some foundational models are designed to process and generate multiple types of data (e.g., text, images, audio) concurrently; these are called “multimodal” models (see the second sketch after this list).
Widely Used: Because of their versatility and strong performance, foundational models often serve as the standard starting point for a wide variety of applications in AI research and industry.
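To make the “adaptable” point concrete, the following is a minimal sketch of fine-tuning a pre-trained foundational model on a small labeled classification task. It assumes the Hugging Face transformers and torch libraries are installed; the checkpoint name, label count, and tiny dataset are illustrative placeholders, not a prescribed recipe.

```python
# Minimal fine-tuning sketch: adapt a pre-trained model to a small labeled task.
# Assumes `transformers` and `torch` are installed; model name and data are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # pre-trained foundational checkpoint (illustrative)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A tiny labeled dataset stands in for the "relatively small amount of labeled data".
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps, purely for illustration
    outputs = model(**batch, labels=labels)  # loss is computed against the labels
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the pre-trained weights already encode general language knowledge, only a small classification head and a few gradient steps are typically needed to reach useful performance on the new task.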
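As a sketch of the multimodal case, the snippet below scores how well a set of text captions describes an image using a pre-trained vision-language model (CLIP is used here purely as an illustration). It assumes the Hugging Face transformers and Pillow libraries are installed, and the image path is a hypothetical placeholder.

```python
# Minimal multimodal sketch: compare an image against candidate text captions.
# Assumes `transformers` and `Pillow` are installed; "photo.jpg" is a placeholder path.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local image
captions = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # probability of each caption matching
print(dict(zip(captions, probs[0].tolist())))
```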
Examples of foundational models in natural language processing include OpenAI’s GPT series (such as GPT-3 and GPT-4) and Google’s BERT. In the vision domain, large pre-trained generative models such as BigGAN and DALL·E play a similar role for image generation.