Large Language Model
Wikipedia - Large language model | IBM - What Are Large Language Models
A Large Language Model (LLM) is a neural network with billions of parameters, trained on vast amounts of unlabeled text via self-supervised learning. LLMs can generate, summarize, translate, and reason over natural language. Their architectural foundation is the Transformer, introduced by Vaswani et al. in the landmark paper “Attention Is All You Need” (NeurIPS, 2017). OpenAI’s GPT (2018) and Google’s BERT (2018) were the first widely recognized LLMs. Subsequent models such as GPT-4, Claude, Gemini, and DeepSeek scaled to hundreds of billions of parameters, enabling capabilities ranging from code generation to complex reasoning. LLMs are the core technology behind generative artificial intelligence.
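The phrase "self-supervised learning" means the training labels come from the text itself rather than from human annotation: each token's target is simply the token that follows it. The sketch below illustrates this idea with a toy sliding-window tokenizer; it is an illustrative simplification, not any specific model's training pipeline.

```python
# Illustrative sketch of the self-supervised next-token objective.
# No human labels are needed: the "label" for each context window is
# just the next token in the raw text stream.

def make_training_pairs(tokens, context_size=3):
    """Slide a window over the token stream; input = context, target = next token."""
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i : i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs

corpus = "the cat sat on the mat".split()
for context, target in make_training_pairs(corpus):
    print(context, "->", target)
```

In a real LLM the same principle applies at massive scale: a Transformer is trained to assign high probability to each next token given its preceding context, and the "labels" are generated automatically from trillions of tokens of raw text.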