Transformer Block Checkpoint
Test your understanding of the transformer block architecture. A minimal code sketch follows the questions for reference.
1. A transformer block consists of:
Attention + FFN + LayerNorm + Residuals
Only attention
Only FFN
Convolutions
2. Pre-norm means layer normalization is applied:
Before attention/FFN
After attention/FFN
Only at the end
Not at all
3. The FFN typically expands the hidden dimension by:
4x
2x
8x
1x
4. Residual connections help with training deep networks by enabling:
Better gradient flow through many layers
Fewer parameters per layer
Lower memory usage during inference
Faster tokenization
5. GPT-2 uses which activation in the FFN?
GELU
ReLU
Sigmoid
Tanh
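For reference, here is a minimal pre-norm transformer block written as a PyTorch sketch (an assumed framework choice, not one specified by this checkpoint; the class name and dimension defaults are illustrative). It ties the questions together: LayerNorm applied before attention and the FFN, residual connections around both sublayers, a 4x FFN expansion, and GELU activation as used in GPT-2. The causal mask of a GPT-2-style decoder is omitted to keep the example short.

```python
# Minimal pre-norm transformer block sketch (PyTorch assumed; names illustrative).
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        # Pre-norm: LayerNorm is applied *before* attention and the FFN.
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        # FFN expands the hidden dimension by 4x, then projects back down.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),                       # GPT-2 uses GELU, not ReLU
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connections: each sublayer's output is added back to its
        # input, which keeps gradients flowing through a deep stack.
        # (Causal masking omitted for brevity.)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x


if __name__ == "__main__":
    block = TransformerBlock()
    tokens = torch.randn(2, 16, 768)   # (batch, sequence, d_model)
    print(block(tokens).shape)         # torch.Size([2, 16, 768]): shape preserved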
```