SeekBox

Token

Technical

The basic unit of text that LLMs process. Text is split into tokens (subwords) by a tokenizer, and the model predicts the next token in a sequence.

Explained at 5 levels

👶 5 Year Old

A tiny piece of a word that the AI reads, like breaking "butterfly" into "butter" and "fly".

📚 Middle Schooler

The small chunks that AI breaks text into before reading it, usually parts of words. A sentence might be 10–20 tokens.

🎓 College Student

The basic unit of text that LLMs process. Text is split into tokens (subwords) by a tokenizer, and the model predicts the next token in a sequence.
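The splitting step can be illustrated with a minimal sketch. It uses greedy longest-match lookup against a tiny made-up vocabulary, which is similar in spirit to WordPiece-style inference but is not how any particular production tokenizer is implemented. The names `TOY_VOCAB` and `split_into_tokens` are hypothetical and exist only for this example.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB = {"butter", "fly", "but", "ter", "b", "u", "t"}


def split_into_tokens(word: str, vocab: set[str]) -> list[str]:
    """Split `word` by repeatedly taking the longest prefix found in `vocab`."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No entry matches: emit the single character as its own token.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


print(split_into_tokens("butterfly", TOY_VOCAB))
```

Joining the returned pieces always reproduces the input word, because every character ends up in exactly one token.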

🧑 Adult

A subword unit produced by a tokenizer (e.g., BPE or SentencePiece) that maps text to integer IDs consumed by the model. Context window size, cost, and latency all scale with token count.
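As a rough illustration of how BPE maps text to integer IDs, the sketch below learns merges over the UTF-8 bytes of a short string. It is a simplified teaching version under stated assumptions: there is no pre-tokenization, there are no special tokens, and the function names `learn_bpe`, `apply_merge`, and `decode` are hypothetical. Production tokenizers differ in many details.

```python
from collections import Counter


def apply_merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def learn_bpe(text: str, num_merges: int):
    """Learn up to `num_merges` merges; return (merges, encoded ids)."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges: dict[tuple[int, int], int] = {}
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so merging gains nothing
        new_id = 256 + len(merges)
        merges[pair] = new_id
        ids = apply_merge(ids, pair, new_id)
    return merges, ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Expand merged IDs back to bytes and decode as UTF-8."""
    expand = {new_id: pair for pair, new_id in merges.items()}
    stack, out = list(reversed(ids)), bytearray()
    while stack:
        token = stack.pop()
        if token in expand:
            left, right = expand[token]
            stack.extend([right, left])  # left is popped first
        else:
            out.append(token)
    return out.decode("utf-8")


text = "low lower lowest"
merges, ids = learn_bpe(text, num_merges=2)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
```

Each merge shortens the ID sequence, which is why a larger learned vocabulary tends to lower the token count for the same text, and with it the context usage and cost.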

🧠 Genius

A discrete symbol from a finite vocabulary constructed via byte-pair encoding or unigram language modeling, serving as the atomic unit of the autoregressive factorization $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{<i})$.
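The factorization can be made concrete with a small sketch that sums per-token log-probabilities. As a loudly stated simplification, the toy model below conditions only on the previous token (a bigram assumption), whereas an LLM conditions on the whole prefix. The probabilities in `TOY_MODEL` are invented for illustration.

```python
import math

# Hypothetical conditional distributions P(next token | previous token).
# "<s>" is an assumed start-of-sequence marker.
TOY_MODEL = {
    "<s>": {"butter": 0.6, "fly": 0.4},
    "butter": {"fly": 0.9, "butter": 0.1},
    "fly": {"butter": 0.5, "fly": 0.5},
}


def sequence_log_prob(tokens: list[str]) -> float:
    """Chain rule: log P(x_1..x_n) = sum_i log P(x_i | x_<i)."""
    log_p = 0.0
    prev = "<s>"
    for token in tokens:
        # Bigram simplification: the prefix is summarized by `prev` alone.
        log_p += math.log(TOY_MODEL[prev][token])
        prev = token
    return log_p


joint = math.exp(sequence_log_prob(["butter", "fly"]))
print(f"P(butter, fly) = {joint:.2f}")  # 0.6 * 0.9
```

Working in log space avoids numerical underflow when the product runs over thousands of tokens.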
