A Brief History of AI

Studying electrical engineering in the late 1990s was a demanding endeavor. The mathematics alone pushed me to my limits. The ideal of the engineer was clear: calculate precisely how a system behaves before building it. Model it. Predict it. Control it.

Then, in a lab seminar, I watched an experienced practitioner do something that contradicted everything I had learned. He was tasked with tuning a control system. Instead of setting up the differential equations that describe the system’s dynamics, he fed a Dirac impulse into the system and observed what came out.

A Dirac impulse is an idealized, infinitely short signal. Feed it into a system, and the response gives you hints about its internal dynamics, if you have the experience to read them: how fast the system reacts, whether it oscillates, where it stabilizes. It is the engineer’s empirical shortcut. Instead of predicting the system from theory, you probe it and observe.

Stimulate, observe, adjust: the empirical shortcut in control engineering

From the impulse response, the practitioner drew conclusions about the system’s behavior. Based on experience and pattern recognition, he adjusted the controller. No differential equations. Just: stimulate, observe, adjust. Repeat.
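
To make that probe-and-observe style concrete, here is a minimal sketch in Python. It simulates a simple first-order discrete-time system (a hypothetical stand-in for a real plant, not the practitioner's actual setup), feeds it a unit impulse, and reads a rough time constant off the response.

    # Probe a simple first-order system with a unit impulse and observe the response.
    # The system is a hypothetical stand-in: y[n] = a * y[n-1] + (1 - a) * u[n].
    a = 0.8                       # decay factor; closer to 1 means a slower system
    impulse = [1.0] + [0.0] * 29  # idealized impulse: a single short spike
    y, response = 0.0, []
    for u in impulse:
        y = a * y + (1 - a) * u   # system dynamics, unknown to the observer
        response.append(y)

    # "Reading" the response: how long until it decays below ~37% of its peak?
    peak = max(response)
    settled = next(n for n, v in enumerate(response) if v < peak * 0.37)
    print(f"peak {peak:.3f}, decays below 37% of peak after {settled} steps")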

In analog engineering, where systems are inherently non-deterministic, this was standard practice. A theoretical model exists, but in practice you proceed empirically first. The theory told you the direction. The practice told you the truth.

I did not know it at the time, but this principle would turn out to be the key to understanding the most important technology of our era.

The Dream (1950s-1960s)

In the summer of 1956, a small group of researchers gathered at Dartmouth College in New Hampshire. Their proposal, written the year before, was ambitious: “We propose that a 2-month, 10-man study of artificial intelligence be carried out.” The term “artificial intelligence” was coined for the occasion.

The idea was straightforward. Human intelligence follows rules. If we can encode those rules, a machine can reason. This approach, later called symbolic AI, produced early successes: programs that could prove mathematical theorems, play checkers, and solve simple logic puzzles.

The optimism was immense. Herbert Simon predicted in 1960 that “machines will be capable, within twenty years, of doing any work a man can do.” The funding flowed. The expectations soared.

The First Winters (1970s-1980s)

Reality caught up. Symbolic AI worked for narrow, well-defined problems but collapsed in the face of real-world complexity. Language understanding, image recognition, common-sense reasoning: none of these yielded to hand-coded rules. Funding dried up. Researchers moved to other fields. This period became known as the First AI Winter.

Then came Expert Systems: software that encoded the decision rules of human specialists. Banks used them for credit scoring. Hospitals used them for diagnosis. Manufacturers used them for quality control. The technology boomed in the 1980s. Billions were invested. Companies like Symbolics and Lisp Machines Inc. went public.

But Expert Systems were brittle. Every new situation required new rules, written by expensive specialists. They could not learn. They could not adapt. When the market realized this, the Second AI Winter set in. By the early 1990s, AI had become a term most serious researchers avoided.

A pattern had emerged that would repeat: promise, overinvestment, disappointment, quiet survival of the core ideas. Each cycle left behind real knowledge, but also real scars.

The Quiet Revolution (1990s-2000s)

While the public, myself included, lost interest, a quieter shift was underway. Statistical methods began replacing hand-coded rules. Instead of telling the machine what to look for, researchers fed it data and let it find patterns. This was machine learning: algorithms that improve through experience rather than explicit programming.

Neural networks, inspired by the structure of the human brain, had existed in theory since the 1940s. But they were small and impractical. In the late 1990s, you could already experiment with neural networks on a PC. I had a software simulator with, if I remember correctly, fewer than 100 parameters, the adjustable numbers that define a neural network’s behavior.

One of my professors at TU Darmstadt, Wolfgang Hilberg, was passionate about this technology. Hilberg was an engineer of the old school, best known as the inventor of the radio-controlled clock. He held 45 patents, and his roots ran deep in analog electronics.

Wolfgang Hilberg (1932-2015)

The late 1990s were the era of the great shift from analog to digital. The promise was seductive: imprecise, fuzzy analog technology replaced by precise, mathematically deterministic digital technology. But Hilberg swam against the current. He pursued analog neural networks and later published on artificial cognition. His books “Sprache und Denken in neuronalen Netzen” (Language and Thought in Neural Networks, 2008) and “Wie denkt das Gehirn?” (How Does the Brain Think?, 2012) explored how neural architectures could model human cognition.

His direction was not wrong, just early. Today, researchers are pursuing programmable photonic hardware that implements neural network functions directly in optical circuits. Precisely the kind of analog AI that Hilberg envisioned decades ago.

For me, and for most others at the time, neural networks were an interesting academic exercise with no real practical use. Impressive as a demonstration, useless for anything serious.

To understand a neural network, think of it as a web of interconnected parameters that learn patterns from data. Each parameter is a single number. During training, data flows through the network, and an algorithm called backpropagation adjusts the parameters based on errors: “the output was wrong, so let me tweak the numbers that produced it.” Repeat this billions of times with billions of data points, and the network starts recognizing patterns that no human programmed into it.
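
Here is a minimal sketch of that adjust-on-error principle in Python: a single-parameter “network” trained by gradient descent on invented data. It is a toy illustration, not production training code.

    # A one-parameter "network": predict y = w * x, and learn w from examples.
    # Real networks have billions of parameters, but the principle is the same.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # invented examples of y = 2x
    w = 0.0                                      # the single adjustable parameter
    learning_rate = 0.05

    for epoch in range(200):
        for x, target in data:
            prediction = w * x
            error = prediction - target
            # "The output was wrong, so tweak the number that produced it":
            # move w against the gradient of the squared error.
            w -= learning_rate * 2 * error * x

    print(f"learned w = {w:.3f}")  # converges toward 2.0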

But with fewer than 100 parameters, there was not much to recognize. I dismissed neural networks entirely, as most people did. Until the day I first used ChatGPT.

Deep Learning Breaks Through (2012)

In 2012, a neural network called AlexNet won the ImageNet competition, a benchmark for image recognition, by a dramatic margin. Deep neural networks with many layers and millions of parameters suddenly outperformed all traditional approaches.

The breakthrough had a surprising enabler: GPUs, the graphics processors originally built for video games. Their architecture, designed for massive parallel computation, turned out to be perfect for training neural networks. What had been computationally impossible on CPUs became feasible on commodity gaming hardware.

The quiet revolution became loud. Research funding returned. Companies started building teams. The term “deep learning” entered the mainstream.

The Transformer (2017)

In 2017, a team at Google published a paper with a deceptively simple title: “Attention Is All You Need.” It introduced the Transformer architecture, a new way to process sequences of data. Instead of reading input word by word, as previous architectures did, the Transformer processes entire sequences in parallel, using a mechanism called “attention” to determine which parts of the input matter for each part of the output.
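
A rough sketch of the attention idea in Python (simplified scaled dot-product attention on made-up vectors, not the full Transformer): each position scores every other position and mixes their values according to those scores.

    import math

    def softmax(xs):
        exps = [math.exp(x) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Toy sequence of three positions; queries, keys, and values are invented.
    queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    keys    = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values  = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    scale   = math.sqrt(len(keys[0]))

    for i, q in enumerate(queries):
        # Score how much position i should "attend" to every position j.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Mix the values according to those attention weights.
        output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(2)]
        print(f"position {i}: weights {[round(w, 2) for w in weights]} -> {[round(o, 2) for o in output]}")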

This was the architectural breakthrough that would make everything that followed possible. It is the T in GPT (Generative Pre-trained Transformer). Without it, large language models as we know them would not exist. In the years that followed, OpenAI built GPT-1 (2018), GPT-2 (2019), and GPT-3 (2020), each dramatically larger than the last.

The Explosion (2022+)

ChatGPT launched in November 2022. It was built on GPT-3.5, a fine-tuned descendant of GPT-3, a model with 175 billion parameters.

The jump from about 100 parameters in my 1990s simulator to 175 billion is not just quantitative. Somewhere along this scale, emergent capabilities appeared: abilities that the developers themselves did not predict and did not explicitly program. The models could write poetry, debug code, discuss philosophy, draft legal contracts, explain quantum mechanics to a five-year-old. They could simulate conversations, teach, coach, and create.

Why was this suddenly possible? Not because of a single invention, but because four enablers had matured simultaneously (see Developing a Strategy for the GenAI Era for a deeper strategic analysis):

  • Compute: GPU clusters and cloud infrastructure had evolved from expensive custom hardware to commodity services available on demand.
  • Data: The internet had produced training corpora of unprecedented scale. Decades of human text, code, and conversation became the raw material.
  • Algorithms: The Transformer architecture (2017) provided the engine. Techniques like reinforcement learning from human feedback (RLHF) refined it.
  • Capital: Billions of dollars in investment funded the training runs that turned these ingredients into working models.

Each enabler evolved along the evolution path from Genesis to Product or Commodity. Only when the lower layers mature can the higher layer emerge. This is why Hilberg’s vision in the 1990s was right in direction but wrong in timing: the enablers were not ready.

At the same time, Diffusion Models (Midjourney, Stable Diffusion, DALL-E) did for images what Large Language Models (LLMs) did for text: generate new content from learned patterns. An LLM is a neural network trained on vast amounts of text that can generate, translate, summarize, reason, and write code. A Diffusion Model generates images by gradually refining random noise into coherent visual output.

Both work by predicting what is statistically most likely: the next word in a text, or the next refinement of a noisy image. They do not look up answers. They guess, informed by patterns learned from billions of examples.
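
A minimal illustration of that guessing step in Python, using an invented probability distribution over possible next words (a real model computes such probabilities from billions of learned parameters):

    import random

    # Invented probabilities for the word following "The cat sat on the".
    next_word_probs = {"mat": 0.55, "sofa": 0.25, "roof": 0.15, "moon": 0.05}

    # The model does not look anything up; it guesses from this distribution.
    most_likely = max(next_word_probs, key=next_word_probs.get)
    sampled = random.choices(list(next_word_probs), weights=next_word_probs.values())[0]

    print(f"most likely next word: {most_likely}")
    print(f"sampled next word:     {sampled}")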

Reasoning and Agents (2024+)

The next step arrived faster than most expected. Models gained the ability to reason: to think step by step before answering, breaking complex problems into sub-problems, evaluating intermediate results, and revising their approach. Chain-of-thought prompting revealed that models could solve problems they previously failed at, simply by being asked to show their work.
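
A small illustration of the difference, with hypothetical prompts (the exact wording is invented, and the effect depends on the model):

    # Hypothetical prompts illustrating chain-of-thought prompting.
    direct_prompt = (
        "A train travels 120 km in 90 minutes. What is its average speed in km/h?"
    )
    cot_prompt = direct_prompt + (
        " Think step by step: convert the time to hours first, then divide "
        "distance by time, and only then state the final answer."
    )
    # With the second prompt, a model is nudged to produce the intermediate
    # steps (90 min = 1.5 h; 120 / 1.5 = 80 km/h) before giving the answer,
    # which often improves accuracy on multi-step problems.
    print(cot_prompt)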

Then came agents: AI systems that act autonomously. An agent does not wait for a prompt. It receives a goal, breaks it into tasks, uses tools (search engines, code interpreters, APIs, file systems), executes actions, evaluates results, and adjusts its plan. It works like a capable colleague who takes an assignment and comes back with a result. Not just an answer to a question, but completed work.
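
A heavily simplified sketch of such a loop in Python. Everything here is hypothetical: call_model() stands in for an LLM call and search() for a tool; a real agent system adds planning, memory, and error handling.

    # A hypothetical, heavily simplified agent loop: plan, act, observe, repeat.
    def call_model(goal, observations):
        """Stand-in for an LLM call that proposes the next action."""
        if observations:
            return {"done": True, "result": f"summary based on {len(observations)} findings"}
        return {"tool": "search", "query": goal}

    def search(query):
        """Stand-in for a real tool such as a search engine or an API."""
        return f"search results for: {query}"

    def run_agent(goal, max_steps=5):
        observations = []
        for _ in range(max_steps):
            action = call_model(goal, observations)        # let the model plan the next step
            if action.get("done"):
                return action["result"]                    # goal reached: hand back finished work
            observations.append(search(action["query"]))   # execute the chosen tool
        return "stopped after too many steps"

    print(run_agent("collect recent findings on diffusion models"))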

AI Agents in Practice Beyond Coding describes how this works in daily practice. The chapter you are reading right now was written with the help of an AI agent system that manages research, cross-references, image generation, and editorial review across dozens of specialized skills. My role was to envision, structure, direct, check, edit, and correct. The agent predicted, and I decided whether the prediction was useful. Sometimes it went in a direction I did not want. Sometimes it was simply wrong. That is the nature of working with non-deterministic systems, which is exactly where the next section picks up.

The Return of the Non-Deterministic

Now the arc comes full circle.

The opening of this chapter described an engineer who worked empirically with an analog control system. He did not compute the differential equations. He probed the system, observed the response, and adjusted. This was standard practice in analog engineering, because analog systems are inherently non-deterministic. Temperature fluctuations, manufacturing tolerances, material variations: the same circuit never behaves exactly the same way twice.

The great promise of digitalization was to leave this behind. Digital technology is precise. Deterministic. Repeatable. The same input always produces the same output. Strictly speaking, this was never entirely true, but the belief in it was strong, and for most practical purposes it held.

Now we work with a technology that is fundamentally non-deterministic again. Ask an LLM the same question twice, and you get two different answers. Not because it is broken, but because that is how it works. A Large Language Model is, as Tudor Girba and Simon Wardley describe in Rewilding Software Engineering, “a coherence engine, not a truth engine. It does not give you what is, but what is likely and sounds coherent.”
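
A small sketch of why repeated runs differ, in Python. It builds on the idea of a probability distribution over next words and adds a temperature parameter, which real LLM interfaces commonly expose; the numbers here are invented.

    import math
    import random

    # Invented scores (logits) for candidate next words; real models produce
    # such scores from billions of learned parameters.
    logits = {"mat": 2.0, "sofa": 1.2, "roof": 0.7, "moon": -0.5}

    def sample(logits, temperature=1.0):
        # Temperature reshapes the distribution: lower = safer, higher = more varied.
        scaled = {w: s / temperature for w, s in logits.items()}
        total = sum(math.exp(s) for s in scaled.values())
        probs = {w: math.exp(s) / total for w, s in scaled.items()}
        return random.choices(list(probs), weights=list(probs.values()))[0]

    # The same "question", asked three times, need not yield the same answer.
    for run in range(3):
        print(f"run {run + 1}: ... sat on the {sample(logits, temperature=0.9)}")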

This is not a flaw. It is the nature of the technology. And it is strikingly familiar.

We humans never think a thought exactly the same way twice. We never make the same decision again in exactly the same way, because we have gained new experience, or our environment has shifted, even minimally. Just as the environment of a control system changes: temperature, component aging, external disturbances. The analog engineers understood this. They worked empirically because they had to.

The AI era demands the same.

Why Empiricism Matters Now

This has consequences for how we think about knowledge itself.

The empirical loop: observation, induction, theory, deduction, and back to observation

Deduction works top-down: from a general theory to a specific prediction. If the premises are correct, the conclusion is guaranteed. This is the promise of deterministic systems. You model the system, compute the answer, and trust the result.

Induction works bottom-up: from specific observations to general patterns. The conclusions are probable, not certain. This is how we learn from experience. The more observations, the stronger the pattern, but never a guarantee.

Empiricism combines both in a continuous loop. Observe the system. Induce a pattern from what you see. Form a theory. Deduce a prediction. Test it against reality. Observe again. This is the scientific method. It is also the AAA-Loop at the heart of AME3, the framework at the center of this book: Anticipate, Advance, Assess.

AI makes induction central again. You cannot deduce the correct output of an LLM from theory alone. You must observe what it produces, recognize patterns in its behavior, form hypotheses about when it works well and when it does not, test those hypotheses, and refine your approach. The practitioners who worked empirically with analog systems had the right method all along. What changed is the scale and the speed. What remains is the principle.

The professor who was passionate about neural networks in the 1990s saw the direction. The timing was wrong. The enablers were not ready. But the insight was right: intelligence, whether human or artificial, does not follow the deterministic model that digital technology promised. It follows the empirical model that engineers have used for centuries.

The next chapter, AI and the Principles of Evolution, explores why this pattern is not unique to AI. Complexity always grows. Tools always get used to build more, not less. And the social structures through which humans organize to handle this complexity (teams, arenas, enterprises) persist through every technological revolution. Including this one.

The game is empirical. It always was.