The Illustrated Transformer
A visual walkthrough by Jay Alammar of the Transformer architecture introduced in “Attention Is All You Need” (Vaswani et al., 2017). It uses step-by-step diagrams to explain self-attention, multi-head attention, positional encoding, and the encoder-decoder structure. The post is part of Alammar’s “Visualizing machine learning one concept at a time” series, which also covers GPT-2 and BERT, and is widely used as a reference for understanding the architecture behind modern Large Language Models.
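As a taste of what the article illustrates, the scaled dot-product attention at the heart of the Transformer can be sketched in a few lines of NumPy. This is a minimal illustration of the formula softmax(QKᵀ/√d_k)V from the Vaswani et al. paper, not code from the article itself; the function name and toy shapes are chosen here for clarity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy self-attention: 3 tokens with d_k = 4, using Q = K = V = X
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one output vector per input token
```

In real multi-head attention, Q, K, and V are separate learned projections of the input, and this operation runs once per head before the results are concatenated.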