Self-Attention Checkpoint

Test your understanding of the attention mechanism.


1. Why does the attention formula scale the dot products QK^T by sqrt(d_k)?
2. In causal (masked) attention, which positions can position i attend to?
3. What does multi-head attention allow the model to do that a single head cannot?
4. Self-attention has O(n^?) complexity with respect to sequence length n. What is the exponent, and why?
5. What do Q, K, and V stand for in the attention mechanism, and what role does each play?
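
If you want to check your answers against running code, here is a minimal sketch of single-head scaled dot-product attention with an optional causal mask, written in NumPy. The function and variable names are illustrative only, not from any particular library, and real implementations add learned projections for Q, K, and V.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention sketch: Q, K, V have shape (n, d_k)."""
    d_k = Q.shape[-1]
    # Scale by sqrt(d_k) so the dot products keep unit-order variance,
    # which keeps the softmax from saturating as d_k grows (question 1).
    scores = Q @ K.T / np.sqrt(d_k)   # (n, n) matrix: quadratic in n (question 4)
    if causal:
        # Position i may attend only to positions j <= i (question 2).
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Softmax over the key dimension; masked entries get zero weight.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                # (n, d_k)

# Toy usage: n = 4 tokens, d_k = 8 dimensions, self-attention (Q = K = V = x).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x, causal=True)
print(out.shape)  # (4, 8)
```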