Attention Mechanism

Architecture

A mechanism in neural networks that computes weighted relevance scores between elements of a sequence, allowing the model to focus on the most pertinent info...

Explained at 5 levels

👶5 Year Old

The way AI decides which words in a sentence are most important — like when you highlight the key words in a book.

📚Middle Schooler

A technique that lets AI focus on the most relevant parts of the input when generating each word, instead of treating everything equally.

🎓College Student

A mechanism in neural networks that computes weighted relevance scores between elements of a sequence, allowing the model to focus on the most pertinent information for each output.

🧑Adult

The core operation in transformers that computes pairwise relevance via scaled dot-product of query, key, and value projections, enabling dynamic context-dependent weighting of input representations.

🧠Genius

Scaled dot-product attention: Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V — extended via multi-head projections, causal masking, and positional encodings, with O(n²) complexity driving research into linear, sparse, and sub-quadratic alternatives.

Want to explore Attention Mechanism in depth?

Ask SeekBox and get answers from 7 AI engines at once.

Try it in SeekBox →