DSA Studio
Self-Attention Checkpoint
Test your understanding of the attention mechanism.
1. The attention formula divides the query-key dot products by sqrt(d_k) in order to:
Prevent extreme softmax values
Speed up computation
Reduce memory
Enable parallelism
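A minimal NumPy sketch of the scaled dot-product attention that question 1 refers to (the function name and toy shapes are illustrative assumptions, not part of the quiz); the comment notes what the 1/sqrt(d_k) factor guards against:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Without the 1/sqrt(d_k) factor, dot products grow with d_k,
    # pushing the softmax toward near one-hot weights with tiny gradients.
    scores = Q @ K.T / np.sqrt(d_k)                      # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # shape (n, d_v)

# Toy example: 4 positions, d_k = d_v = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```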
2. In causal attention, position i can attend to:
Positions 0 to i
Positions i to n
All positions
Only position i
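An illustrative sketch of the causal masking behind question 2, assuming a precomputed score matrix; entries above the diagonal (future positions) are set to -inf before the softmax:

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask out positions j > i so position i only sees positions 0..i."""
    n = scores.shape[-1]
    # Boolean mask of the strict upper triangle marks the "future".
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)             # future -> -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))
print(np.round(causal_attention_weights(scores), 2))
# Row i spreads weight uniformly over positions 0..i; masked positions get 0.
```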
3. Multi-head attention allows the model to:
Attend to different representation subspaces
Process longer sequences
Use less memory
Train faster
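A rough sketch of the head split behind question 3 (the helper name and dimensions are assumptions): the model dimension is divided into num_heads slices, and each head runs attention in its own lower-dimensional subspace:

```python
import numpy as np

def split_heads(X, num_heads):
    """Reshape (n, d_model) into (num_heads, n, d_model // num_heads)."""
    n, d_model = X.shape
    d_head = d_model // num_heads
    return X.reshape(n, num_heads, d_head).transpose(1, 0, 2)

# Each head attends over its own d_head-dimensional slice of the
# representation, so different heads can specialize on different patterns.
X = np.arange(6 * 8, dtype=float).reshape(6, 8)   # n=6 positions, d_model=8
heads = split_heads(X, num_heads=2)
print(heads.shape)   # (2, 6, 4): 2 heads, 6 positions, 4 dims per head
```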
4. Self-attention's complexity is O(n^?) with respect to sequence length (fill in the exponent).
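For intuition on question 4, the score matrix compares every query against every key, so its size grows with the square of the sequence length; a shape-only check with toy sizes (not a benchmark):

```python
import numpy as np

for n in (128, 256, 512):
    Q = np.zeros((n, 64))
    K = np.zeros((n, 64))
    scores = Q @ K.T
    print(n, scores.shape, scores.size)   # entry count grows quadratically
# 128 (128, 128) 16384
# 256 (256, 256) 65536
# 512 (512, 512) 262144
```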
5. Q, K, V in attention stand for:
Query, Key, Value
Quality, Kernel, Vector
Quantized, Known, Variable
Queue, Key, Validation
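For question 5, a small sketch of where the three tensors come from in self-attention: three learned linear projections of the same input sequence (the matrix names W_Q, W_K, W_V and the sizes are illustrative assumptions):

```python
import numpy as np

# In self-attention, queries, keys, and values are all computed from
# the same input X via separate learned projection matrices.
rng = np.random.default_rng(0)
n, d_model, d_k = 6, 16, 8
X = rng.normal(size=(n, d_model))
W_Q = rng.normal(size=(d_model, d_k))   # query projection
W_K = rng.normal(size=(d_model, d_k))   # key projection
W_V = rng.normal(size=(d_model, d_k))   # value projection
Q, K, V = X @ W_Q, X @ W_K, X @ W_V
print(Q.shape, K.shape, V.shape)        # (6, 8) each
```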