Agile Development Practices in AI-Enhanced Teams

In the early 2000s at Web.de, Andy had one of the least glamorous jobs in software development: integration duty. His task was to take a version snapshot of everything the developers had produced, as often as possible, and turn the pile of contributions into one working system. By hand. The plan was to do it test-driven, until part of the development crew started excluding failing tests after a rebuild, and his integration runs turned into archaeology.

More than twenty years later, Andy does not write integration scripts at all. He describes what the pipeline should do, and an AI coding agent generates the GitHub Actions workflows. The practice he once executed by hand has become something he governs.

That journey raises a question that splits the software community: are agile engineering practices still relevant in the age of AI?

The Uncomfortable Question

The provocation is real. Marc Bless, a longtime colleague, argues that we should stop worrying about whether teams do Scrum, because soon there will be no teams at all. AI agents will take over the work completely. A panel at the Regional Scrum Gathering Tokyo 2026 framed the same moment more carefully: an era has begun in which small Teams can achieve enormous results. And in the same discussion, the sobering counterpoint: AI has increased individual productivity, but productivity at the organizational level has not risen.

These two observations frame the tension. Position A: traditional engineering practices remain valid, and AI makes them more important, not obsolete. Position B: practices like AI pairing and dynamic teaming dissolve the classical boundaries, and what we call a team today will not exist tomorrow.

When Andy put the two positions to the participants of a recent webinar, developers and team leads, every one of them picked Position A. All of them worked with AI daily. That is not a coincidence. The closer you work with AI agents, the more you appreciate the practices that keep their output trustworthy.

Seven Practices and the Gap

In the late 1990s, Kent Beck described the practices of eXtreme Programming: the way a team creates software together. The count varies between twelve and fourteen depending on the edition, but a stable core of seven carries the engineering load. Read the list with one question in mind: what happens to each practice when an agent does the typing?

Pair Programming: two people working on one problem in different modes
Test-Driven Development: write a failing test, make it pass, then refactor
Continuous Integration: integrate early and often, never wait until the end
Refactoring: improve the design continuously, not in heroic cleanup projects
Simple Design: make the system no more complicated than necessary
Collective Code Ownership: the code belongs to everyone, not to its author
Coding Standards: shared conventions so the team can work as one

Andy asked the webinar participants two questions about this list. How many of these practices do you know? And how many does your team actually live? The difference between the two answers is your engineering gap. Most teams know five or more. Few live more than two or three well.

That gap was tolerable in a world where humans typed every line. It becomes expensive in a world where agents generate code faster than humans can read it. The practices endure because their purpose was never the typing. Their purpose is managing complexity and making mistakes visible early, and that need only grows.

Practices Evolve, They Do Not Expire

A practice is the combination of a technology and a way of working with it. Both sides evolve. Each of the seven practices has traveled its own path along the evolution axis, from novel idea to commodity.

Figure: Engineering practices evolve with their tooling. The discipline stays, the execution moves toward AI.

Continuous Integration shows the full arc, and Andy’s integration duty at Web.de was its manual starting point. The first CI servers automated the checkout, the build, and the tests. Today the pipeline definition itself is generated. Three generations of tooling, and the purpose never changed: integrate early, fail visibly, fix immediately.

Test-Driven Development followed the same path. The red-green-refactor cycle that I learned as a programmer (described in Empirical Control) can now be executed by an agent: the human defines the failing test, the agent implements until green, then both refactor. The danger is the default behavior of Large Language Models: build something that looks right and see whether it works. An LLM wants to complete the pattern. Telling it to build a web app produces a plausible web app. Telling it to work test-first against the business logic forces a different level of rigor. The discipline is the lever, not the tool.
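A minimal sketch of that division of labor, in Python and under loud assumptions: the business rule (a badge is issued only at or above a pass mark) and the names `may_issue_badge` and `PASS_MARK` are hypothetical and not taken from any system described in this chapter. The human writes the tests first and watches them fail. The function body stands in for what an agent would be asked to add until they pass.

```python
"""Test-first sketch: the tests are the human's specification, the
function body is what an agent would be asked to write until green.
The rule and all names are hypothetical illustrations."""

PASS_MARK = 70  # assumed pass mark in percent, for illustration only


def may_issue_badge(score: int, pass_mark: int = PASS_MARK) -> bool:
    """Return True when a valid score reaches the pass mark."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    return score >= pass_mark


# Step 1 (red): these tests exist before the implementation does.
def test_score_at_pass_mark_earns_badge() -> None:
    assert may_issue_badge(70) is True


def test_score_below_pass_mark_earns_no_badge() -> None:
    assert may_issue_badge(69) is False


def test_impossible_score_is_rejected() -> None:
    try:
        may_issue_badge(101)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a score above 100")


if __name__ == "__main__":
    # Step 2 (green): run every test; any failure stops the run.
    for test in (
        test_score_at_pass_mark_earns_badge,
        test_score_below_pass_mark_earns_no_badge,
        test_impossible_score_is_rejected,
    ):
        test()
    print("all tests green")
```

The point of the sketch is the order, not the rule: the boundary cases (exactly at the pass mark, one below, out of range) are decided by a human before any implementation exists, so the agent completes a specification instead of a pattern.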

The same arc runs through the other practices. Refactoring still starts with the first rule: secure the spot with a test before you touch it. Drafting that securing test for existing behavior is exactly the kind of low-risk work an agent can take over, and reviewing it is a good moment to check whether you actually understood the existing behavior. Simple Design gets a new kind of reviewer: an agent instructed as a system architect, asked where the critical assumptions and weak points sit.
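Such a securing test is often called a characterization test. The following Python sketch shows the idea under stated assumptions: `legacy_display_name`, its quirk, and the pinned examples are invented for illustration. The pinned table records what the code does today, not what it should do, and the refactored version has to reproduce it exactly.

```python
"""Characterization test sketch: pin what legacy code does today,
then refactor against that pinned behavior. All names are hypothetical."""


def legacy_display_name(first: str, last: str) -> str:
    # Existing behavior, quirks included.
    name = ""
    if first != "":
        name = name + first.strip() + " "
    name = name + last.strip()
    return name.upper()


def refactored_display_name(first: str, last: str) -> str:
    # Same behavior in fewer moving parts, including the quirk.
    prefix = f"{first.strip()} " if first != "" else ""
    return f"{prefix}{last.strip()}".upper()


# Drafted by running the legacy code and recording its answers.
PINNED_BEHAVIOR = {
    ("Ada", "Lovelace"): "ADA LOVELACE",
    ("", "Lovelace"): "LOVELACE",
    (" Ada ", " Lovelace "): "ADA LOVELACE",
    ("  ", "Lovelace"): " LOVELACE",  # surprising, but it is what the code does
}


def deviations(func) -> list:
    """Return the inputs for which func differs from the pinned outputs."""
    return [
        args
        for args, expected in PINNED_BEHAVIOR.items()
        if func(*args) != expected
    ]


if __name__ == "__main__":
    assert deviations(legacy_display_name) == []
    assert deviations(refactored_display_name) == []
    print("refactoring preserved the pinned behavior")
```

The whitespace-only first name is the kind of entry a review of the drafted test brings to the surface: the leading space looks like a bug, and whether to keep it is a human decision to make after the refactoring, not during it.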

The practices also depend on each other, with AI just as without. Continuous Integration without Test-Driven Development integrates untested code faster. Pair Programming without Coding Standards drowns in debates about formatting. Simple Design keeps the system understandable enough that Collective Code Ownership can mean something. Pulling one practice forward drags the others with it.

Three Stories from One Journey

The integration story is only the first station of Andy’s coding journey. He tells it in three. Past, present, and the near future.

Team of One (2019-2023). When DasScrumTeam’s developers left for more exciting work, our custom training administration system stayed behind. Andy adopted it. He worked solo through a Swift backend, a JavaScript admin frontend, PHP, Moodle, and AWS scripts. The first AI code snippets arrived through IDE assistants. He knew the engineering practices from two decades of training others. He did not live them systematically in his own code. It went fast, and it mostly went well. Until something broke, and he could not distinguish a bug from a false assumption, because the safety net of tests was missing. “It works” was the only acceptance criterion. In solo mode, engineering practices are invisible, until something breaks.

Qualify (2023-now). For a certification association we co-founded, Andy built a middleware connecting an online test platform with an Open Badge certification service. This time with Claude Code as a pair partner: a second presence that can read and write. He started fast, generated a lot of PHP, integrated quickly. Then he noticed he had lost his way. In an earlier session, Andy had asked the agent to “build the whole requirements catalog from A to K”, and it had hallucinated requirements that contradicted the actual business logic. The system itself even asked how to handle the contradiction, and he had clicked “yes” without reading. He stopped, threw the code away, and kept only the insights about the integration. The rebuild used an API-first TypeScript backend, acceptance tests that he wrote himself while the agent implemented, and Continuous Integration as a heartbeat: green means the next commit may go out. The thrown-away prototype was not waste. It was one full pass through an AAA-Loop: anticipate a solution, advance fast, assess honestly, and start the next loop with the insights instead of the code.

Agentic Team (now). Today, the same system is maintained by what Andy calls an agent cluster: a design review agent that asks “is this more complicated than necessary?”, a UX agent that checks affordances and error messages, and a security lens that looks for secrets, missing validation, and permission gaps. He added a GDPR check, because programmers prefer not to think about it. A reflection skill runs a Plan-Do-Study-Act cycle (Andy deliberately renames the PDCA check phase to study): after each work cycle, the agent reviews which assumptions held, how many cycles a task consumed, and which process fixes follow. Not which bug fixes. Which prompt, instruction, or skill adjustments will reduce errors in the next cycle. The developer becomes the retrospective facilitator of their own agents. A persistent context layer gives all agents a shared language and caches recurring knowledge, so the expensive model calls happen only for genuinely new work. Architecture decisions are documented and stay binding across sessions. This is a production system. Trainers certify real people with it. The same architectural pattern, skills, memory files, and review loops, runs the publishing system behind this book and is described in AI Agents in Practice Beyond Coding.

The three stories map cleanly onto the four-era model of team evolution. The journey from solo coding through human-AI pairing to orchestrating specialist agents is the path from Era II to Era IV, compressed into one person’s experience.

From Coding Standards to Prompting Standards

The real strength appears at team level. An agent cluster that only one developer uses is a productivity gain. An agent cluster the whole Team shares is a practice.

Figure: From pair to agent cluster. The reflection loop stays, the cluster runs inside Continuous Integration.

This changes what Coding Standards mean. They used to govern variable names, indentation, and function size. Now they grow into prompting standards and documentation standards for AI: which instructions the agents receive, where decisions are recorded, what context every agent must load. Ken Judy, an engineering leader Andy exchanges notes with, sees the failure mode daily: developers race ahead with generated code and end up stuck with systems they no longer understand.

Collective Code Ownership answers this, extended by one sentence that a webinar participant had already written into her team’s standards: generated code was understood by a human, not just accepted. The rule applies to human and agent contributions alike. We do not commission an AI and let it own the code. We own it, which means we can evolve it.

Andy closed the webinar with a commitment exercise, and it works just as well for a reader. Take the seven practices, place your own implementation of each on the evolution axis, and pick the one practice where AI augmentation would create the greatest leverage for your team. Describe the AI variant in two sentences. Define the first step for next week.

After the Recording Stopped

The most candid ten minutes came after the webinar ended. The recording was off, the participants had left, and Andy and I kept talking. Two threads from that conversation belong in this chapter.

Agents are not team members. When a participant praised “pair programming as a trio”, I pushed back. One of the biggest problems in the AI and teams debate is that we use vocabulary made for humans to describe a tool. The agent is not a third person. It is a prompt we are working on together. The confusion is understandable: we build multi-agent systems and give them human perspectives, the editor, the tester, the designer, because the models simulate human behavior convincingly. Andy added the warning label he likes: these systems are stochastic parrots, with real limits and no consciousness.

But what the humans in the room actually do is orchestrate, steer, and supervise those simulations. And they do that in pair mode. Test-Driven Development never aimed to make testers obsolete, and agent clusters do not replace them either: simulated tester behavior finds the routine problems so that human testers can hunt the hidden ones. The agents have no emotions. They only simulate them. We have real ones, which is exactly why the human conversation about what our agents can and cannot do, where they fail, and how we adapt our prompts is the new pairing.

This terminology discipline matters because three different conversations hide inside every “AI and teams” debate, and they need different answers. The first: do teams still exist? That question has little to do with AI. Teams persist because humans operate the technology and humans answer for the results. A Team is accountable for the outcome its work produces. The AI is not. The second: what can AI actually do? That answer changes with every model release and is outdated by the time you finish reading it. The third: which practices help a team work with agents? That was the webinar, and it is the most useful conversation of the three, because a team can act on it tomorrow.

Non-determinism challenges Continuous Integration. Hosted Large Language Models are not deterministic. What we generate today, we cannot reproduce exactly tomorrow. That is new territory for software engineers. Our entire profession rests on repeatability: code is unambiguous, a bug can be reproduced, a build can be rerun. Regenerating the same code from the same prompt is practically impossible, and checking in the conversations does not help, because they offer no guarantee of reproducibility either. The consequence is uncomfortable but clear. The generated code base is the only ground truth. Continuous Integration becomes sacred: it must always run, and it is owned by the developers who watch it, not by the agents that feed it. My next test for our own setup is the oldest one in the book: take the system to a fresh machine and build it from scratch.

The same evening, we landed on the multi-team question. Andy and I already struggle to keep our two agent setups in sync. A team of five, where two developers are enthusiastic about AI and three are not, faces the same problem with higher stakes, plus the shadow inventory of tools people use without being allowed to. And when one team’s heavily customized agents produce code that the next team can neither read nor trust, no interface contract repairs that relationship. None of this is a technical problem. Microservices did not solve the coordination problems of multi-team setups either, even though the interfaces and contracts were technically sound, because the problem was social all along. AI agents will not solve it for the same reason. Multi-team coordination is a social challenge wearing a technical costume.

The Practices Were Never About Typing

So Position A wins, but in a sharpened form. The agile engineering practices survive because they were never about typing code. They are mechanisms for making mistakes visible early, for keeping systems understandable, and for learning faster than the system grows. AI raises the stakes on all three. The generation is cheap now. The understanding is not.

Era IV teams, the AI-Enhanced Teams, will not abandon Pair Programming, Test-Driven Development, or Continuous Integration. They will run them at a different altitude: humans defining intent and acceptance, agents executing cycles, and the empirical loop holding the whole thing honest. The engineering gap between knowing a practice and living it does not close itself. AI just turned it from a quality issue into a survival issue.

References

Fokusthema webinar Agile Entwicklungspraktiken in KI-verstärkten Teams, Andreas Schliep, DasScrumTeam, June 10, 2026
Kent Beck, Extreme Programming Explained (eXtreme Programming)
Verband für adaptive Organisationen, Are teams still relevant?
Regional Scrum Gathering Tokyo 2026, panel discussion on AI and teams