Inference is the process by which a trained machine learning model is used to make predictions on new, unseen data. Once a model has been trained on a dataset, it can apply the patterns it learned to predict outputs for inputs it has never encountered before.
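As a minimal sketch of the two phases, the example below (assuming scikit-learn and a toy dataset) fits a simple classifier and then uses the fitted model to predict labels for new inputs:

```python
from sklearn.linear_model import LogisticRegression

# Training phase: fit a simple model on a small labeled dataset.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression()
model.fit(X_train, y_train)

# Inference phase: the trained model predicts labels for new, unseen inputs.
X_new = [[0.5], [2.5]]
print(model.predict(X_new))  # e.g. [0 1]
```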
For instance, consider a Large Language Model (LLM) trained on vast amounts of text data. During the training phase, the LLM learns to understand patterns, structures, and nuances in language from a colossal corpus of diverse textual content. Once the LLM is trained, it can predict or generate coherent and contextually relevant text based on new, unseen prompts or questions. This is the inference phase.
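As an illustrative sketch of what LLM inference can look like in code, the example below assumes the Hugging Face transformers library and uses the publicly available gpt2 checkpoint as a stand-in for any trained language model:

```python
from transformers import pipeline

# "gpt2" is only an illustrative, publicly available checkpoint; any
# trained causal language model could be substituted here.
generator = pipeline("text-generation", model="gpt2")

# Inference: the trained model generates a continuation for a new prompt
# it has never seen, based on patterns learned during training.
prompt = "Inference in machine learning is"
output = generator(prompt, max_new_tokens=30, do_sample=False)
print(output[0]["generated_text"])
```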
Inference can be performed on a single piece of data (like one sentence) or on a batch of data. The goal of inference is to use the patterns the model learned during training to make accurate predictions or generate relevant outputs on new data. While training an LLM requires substantial computational power and resources, inference is comparatively less resource-intensive, allowing it to be performed on a broader range of devices. The inference phase is critical in the machine learning pipeline, as it’s when the trained model is put to work on the task it was designed for, whether that’s generating text, recognizing images, translating languages, recommending products, or any other task.
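The sketch below, assuming PyTorch and a small stand-in model, illustrates batch inference and one reason inference is cheaper than training: gradients do not need to be computed or stored.

```python
import torch
import torch.nn as nn

# Hypothetical small classifier standing in for any already-trained model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()  # switch layers such as dropout to their inference behavior

# A batch of three new, unseen inputs (4 features each).
new_data = torch.randn(3, 4)

# torch.no_grad() skips gradient bookkeeping, which is part of why
# inference needs less memory and compute than training.
with torch.no_grad():
    logits = model(new_data)
    predictions = logits.argmax(dim=1)

print(predictions)  # predicted class index for each item in the batch
```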