Transformer Block Checkpoint

Test your understanding of the transformer block architecture.


1. A transformer block consists of:
2. Pre-norm means layer normalization is applied:
3. The FFN's inner layer typically expands the model dimension by:
4. Residual connections help with training deep networks by enabling:
5. GPT-2 uses which activation in the FFN?