Attention Is All You Need
“Attention Is All You Need” is a landmark 2017 paper by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, researchers at Google Brain and Google Research (with Gomez at the University of Toronto). It introduced the Transformer architecture, which replaces recurrent processing with self-attention, allowing entire sequences to be processed in parallel. The Transformer underlies most modern large language models, including GPT and Claude; the “T” in GPT stands for Transformer.
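
At the core of the Transformer is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The NumPy sketch below illustrates that formula and how self-attention processes all positions at once; the variable names and toy input are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # similarity of every query to every key
    weights = softmax(scores, axis=-1)              # each query's distribution over positions
    return weights @ V                              # weighted sum of value vectors

# Toy example: a sequence of 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8): every position attends to all positions in parallel
```

Because the whole computation is a few matrix multiplications, every position is processed simultaneously, unlike a recurrent network, which must step through the sequence one token at a time.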