The phase where a trained model processes new inputs to generate predictions or outputs, as opposed to the training phase where it learns from data.
When the AI actually uses its brain to answer your question: the thinking part after it's done learning.
The process of an AI actually generating an answer or making a prediction after it's been trained. Training is learning; inference is doing.
Forward-pass computation through a trained model to produce outputs for novel inputs, characterized by latency, throughput, and cost metrics, and optimized via quantization, batching, and speculative decoding.
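As one illustration of the optimizations named above, here is a minimal sketch of quantization, assuming symmetric per-tensor int8 rounding. The function names and the sample weights are hypothetical, and production systems typically use per-channel scales and calibrated ranges rather than this simple scheme.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to keep the arithmetic easy to follow.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing small integers instead of 32-bit floats reduces memory traffic during the forward pass, at the cost of the bounded rounding error measured by `max_error`.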
The computational process of sampling from the learned conditional distribution p(y|x;θ), involving KV-cache management, autoregressive token generation, and decoding strategies (greedy, beam search, nucleus sampling) that trade off quality, diversity, and latency.
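The autoregressive loop and two of the decoding strategies can be sketched as follows. This is a toy illustration: `toy_logits` is a hypothetical stand-in for a trained model, the vocabulary is invented, and the loop recomputes from the full prefix at every step, where a real system would reuse a KV cache.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "down", "<eos>"]


def toy_logits(tokens):
    """Hypothetical stand-in for a trained model: one logit per vocab entry."""
    # Favor the entry that follows the last token in VOCAB order.
    nxt = (VOCAB.index(tokens[-1]) + 1) % len(VOCAB) if tokens else 0
    return [3.0 if i == nxt else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs, rng):
    """Always pick the most probable token (deterministic)."""
    return max(range(len(probs)), key=probs.__getitem__)


def nucleus(probs, rng, top_p=0.9):
    """Sample from the smallest set of tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept])[0]


def generate(pick, max_new_tokens=10, seed=0):
    """Autoregressive loop: each new token is conditioned on all previous ones."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        token = VOCAB[pick(probs, rng)]
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens


greedy_out = generate(greedy)
sampled_out = generate(nucleus)
```

Greedy decoding follows the single most likely path, while nucleus sampling draws from the high-probability tokens and so can produce different sequences for different seeds. This is the quality versus diversity trade-off the definition refers to.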