Inference

Technical

The phase where a trained model processes new inputs to generate predictions or outputs, as opposed to the training phase where it learns from data.

Explained at 5 levels

👶5 Year Old

When the AI actually uses its brain to answer your question — the thinking part after it's done learning.

📚Middle Schooler

The process of an AI actually generating an answer or making a prediction after it's been trained. Training is learning; inference is doing.

🎓College Student

The phase where a trained model processes new inputs to generate predictions or outputs, as opposed to the training phase where it learns from data.

🧑Adult

Forward-pass computation through a trained model to produce outputs for novel inputs. It is characterized by latency, throughput, and cost, and is commonly optimized via quantization, batching, and speculative decoding.
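As an illustration of one optimization named above, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names (`quantize_int8`, `dequantize`) and the toy weight values are hypothetical, chosen only for this example; production inference stacks use more elaborate schemes (per-channel scales, calibration data, fused kernels).

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Toy weights (illustrative values only).
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding bounds the per-weight error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The trade-off this sketch shows is the usual one: each weight takes 8 bits instead of 32, at the price of a rounding error of at most half the scale per weight. Small weights such as 0.003 collapse to zero, which is why finer-grained scales are often used in practice.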

🧠Genius

The computational process of generating outputs from the learned conditional distribution p(y|x;θ). For autoregressive models this involves token-by-token generation, KV-cache management, and decoding strategies (greedy, beam search, nucleus sampling) that trade off quality, diversity, and latency.
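Two of the decoding strategies named above can be sketched for a single generation step. This is a toy illustration, not any particular library's implementation: the logits are made up, and the helper names (`softmax`, `greedy`, `nucleus_candidates`, `nucleus_sample`) are hypothetical. A real decoder would repeat this step once per token, feeding each chosen token back into the model.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Greedy decoding: always pick the single most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)


def nucleus_candidates(probs, top_p):
    """Smallest set of top tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for token in order:
        kept.append(token)
        mass += probs[token]
        if mass >= top_p:
            break
    return kept


def nucleus_sample(probs, top_p, rng):
    """Nucleus (top-p) sampling: sample only among the kept tokens."""
    kept = nucleus_candidates(probs, top_p)
    # random.choices renormalizes the weights over the kept tokens.
    return rng.choices(kept, weights=[probs[t] for t in kept], k=1)[0]


# Made-up logits for a four-token vocabulary.
probs = softmax([2.0, 1.0, 0.1, -1.0])
rng = random.Random(0)  # seeded so runs are repeatable

print("greedy token:", greedy(probs))
print("nucleus set:", nucleus_candidates(probs, top_p=0.9))
print("nucleus draws:", [nucleus_sample(probs, 0.9, rng) for _ in range(5)])
```

Greedy decoding is deterministic and cheap but can produce repetitive text. Nucleus sampling adds diversity while cutting off the low-probability tail, here excluding the least likely token entirely.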
