Agentic Engineering Weekly for March 20-27, 2026


This was the week practitioners started naming the costs nobody had been tracking. AI CodeCon crystallized an agent orchestration playbook, but the more interesting signal came from the margins: new vocabulary for the things going wrong. The agentic tar pit. Craft alienation. The liability gap. When people coin terms, they're trying to get a handle on problems they can feel but couldn't previously articulate.


My top 3 picks this week


Judgment is measurable: Jeff Gothelf ships the rubric

AI made building cheap. That's no longer the controversial claim it was a year ago. The interesting question is what happens next: if execution cost drops to near-zero, what's left to differentiate? Jeff Gothelf answered with a concrete 4-part rubric for measuring product judgment, the thing everyone agrees matters most but nobody could score. This is the first actionable framework that turns "taste" from a vague hiring criterion into something visible, scorable, and teachable.

Three separate pieces this week converged on the same conclusion from different angles. Kent C. Dodds argued that prompting is now a junior skill and that architecture and product taste are where senior leverage lives. Luca Mezzalira wrote that AI-assisted coding is splitting the profession into three distinct roles with different leverage points. Derek Comartin pointed out that the design decisions, trade-offs, and systems thinking were always the real work: we just couldn't see it clearly while coding was the bottleneck. I frame it with a paint-by-numbers metaphor (hi Cynefin!): the creative choices (composition, palette, subject) remain yours; filling in the colors can be delegated to the slop-machine.

What Gothelf adds is the measurement piece. It's not enough to say judgment matters. If you can't score it, you can't teach it, you can't hire for it, and you can't tell whether your team is getting better at it. The rubric makes the implicit explicit, and that's the kind of tool that changes behavior.

Worth reading:


The agentic tar pit: agents can't tell accidental from essential complexity

Multiple speakers at AI CodeCon independently named the same trap, and it's worth paying attention when uncoordinated practitioners converge on the same diagnosis. The pattern: agents accumulate accidental complexity freely but lack the judgment to recognize when they've hit essential complexity. Brooks' No Silver Bullet applies directly here. The agentic tar pit turns every greenfield project brownfield, not through malice or bugs, but through the steady accumulation of decisions that no one is steering.

The .NET team's 10-month report on GitHub Copilot Coding Agent in dotnet/runtime provides rare empirical grounding. This isn't a startup demo or a weekend project: it's a massive, real-world codebase with data-driven lessons. Meanwhile, the failure modes compound. When you sequence agents, errors propagate: each agent's output becomes the next agent's input, and the compounding makes multi-agent systems fragile at scale.
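The compounding claim is easy to make concrete with back-of-envelope arithmetic. A minimal sketch (the 0.95 per-agent reliability is an illustrative assumption, not a figure from any of the cited reports):

```python
# If each agent in a chain succeeds with probability p, and each agent
# consumes the previous one's output, end-to-end reliability is p**n.
# p = 0.95 is an assumed number, chosen only to show the shape of the curve.
p = 0.95
for n in (1, 3, 5, 10):
    print(f"{n} chained agents -> {p ** n:.3f} end-to-end reliability")
```

Ten agents at 95% each leave you below 60% end-to-end, which is why sequencing without human checkpoints gets fragile fast.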

The practical antidote showed up in the same week's reading. Djordje Babic's piece on using hexagonal architecture to keep agents from eating their own tail is the most actionable response: old patterns solving new problems. If you separate the decision boundary from the execution boundary, you can let agents fill in the deterministic parts without ceding the architectural choices. The tar pit is avoidable, but only if you know it's there.
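A minimal sketch of that split in Python (the `PaymentGateway` port and `FakeGateway` adapter are hypothetical names for illustration, not from Babic's piece):

```python
from abc import ABC, abstractmethod

# Port: the decision boundary. A human chooses this interface and its
# invariants; it is the part an agent must not redesign.
class PaymentGateway(ABC):
    @abstractmethod
    def charge(self, amount_cents: int) -> bool: ...

# Domain logic depends only on the port, never on a concrete adapter.
def checkout(cart_total_cents: int, gateway: PaymentGateway) -> str:
    if cart_total_cents <= 0:
        return "nothing to charge"
    return "paid" if gateway.charge(cart_total_cents) else "declined"

# Adapter: the execution boundary. This deterministic fill-in can be
# delegated to an agent, because the port pins the contract.
class FakeGateway(PaymentGateway):
    def __init__(self, succeed: bool = True):
        self.succeed = succeed

    def charge(self, amount_cents: int) -> bool:
        return self.succeed

print(checkout(500, FakeGateway()))       # paid
print(checkout(500, FakeGateway(False)))  # declined
```

The human owns the port and the domain function; the adapter is the part an agent can regenerate freely without touching the architecture.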

Worth reading:


Kent Beck's 'Still Burning': a manifesto for geeks who lost their footing

Kent Beck launched a new podcast this week, and the title tells you everything: Still Burning. It's not just about AI technique. It's about the emotional reality of practitioners whose hard-won skills lost leverage overnight. The opening episode is a fireside manifesto for geeks navigating a world that shifted under their feet, and the honesty is striking: "Nobody knows." No framework, no five-step plan. Just an acknowledgment that the ground moved and the only way to play is to, well, start playing and poking around with the "genie".

Beck isn't alone in this lane. Hong Minhee published a piece applying Marx's concept of labour alienation to LLM-assisted coding: when work bypasses the meaningful parts, workers lose connection to intentional creation. The source of that alienation isn't the tool itself but the structural pressure tying livelihood to output metrics. Mario Zechner's raw pushback against relentless acceleration ("Thoughts on slowing the fuck down") hits the same nerve from a different angle. These aren't Luddite takes. They're practitioners with decades of credibility saying: the acceleration is real, the disorientation is real, and pretending otherwise helps nobody.

What makes this signal rather than noise is the audience. These aren't people who resist change. Beck co-authored the Agile Manifesto. Minhee maintains open-source infrastructure. Zechner builds game engines. When the people who've spent their careers embracing change start saying "hold on," it's worth listening to what they're actually saying: not "stop" but "acknowledge the cost."

Worth reading:


From conductor to orchestrator: the agent playbook is crystallizing

AI CodeCon produced something we haven't had before: a concrete engineering playbook for agent orchestration that goes beyond "just prompt better." Addy Osmani's "Code Agent Orchestra" laid out the architecture: subagents for context window management, agent teams with messaging, and the shift from pair programmer to swarm coordinator. Philip Carter's "two computers" model (deterministic and non-deterministic, running in parallel) gives engineers a mental framework for deciding what belongs where. The key insight: 99% of today's knowledge work doesn't have a skill file yet, and context quality, not quantity, is what separates a skilled operator from a vanilla agent.

Anthropic's own engineering team published practical guidance on harness design for long-running agentic development. This is infrastructure-level thinking: how do you keep agents productive over hours or days without losing control? Birgitta Boeckeler from Thoughtworks added the leadership angle: AI augmentation brings new responsibilities for technical leaders, not fewer. And Martin Fowler's team framed the design problem as positioning humans and agents in complementary loops rather than replacement hierarchies.

Tim O'Reilly opened the conference with the dragon metaphor: you can ride it or be eaten by it, but you can't ignore it. What's notable about this week's playbook is that it's moving past the dragon-riding metaphors into actual engineering patterns. "Decompose before you prompt" is becoming a design principle, not a suggestion. The gap between practitioners who internalize this playbook and those who don't will widen fast.

Worth reading:


Red-Green-Refactor your context: TDD discipline applied to AI sessions

TestDouble published the most practically useful framing of the week: applying TDD's red-green-refactor loop to AI coding sessions. The problem they're solving is real and underappreciated. Every AI coding session generates knowledge: what worked, what didn't, what the constraints actually are. That knowledge vanishes when the session ends. The TDD-inspired loop captures what you learned and routes it where your team will find it.

The same week, three separate pieces renegotiated what "good code" and "craft" mean in agentic workflows. TestDouble's own companion piece asks directly: what does "good code" even mean now? If we've optimized code for human readers for decades, and agents are becoming the primary readers and writers, the emphasis shifts toward system-level observability and problem-space implementation. Thoughtworks' "Beyond Vibe Coding" makes the enterprise case: throwing raw prompts at a chat interface doesn't scale, and they propose five building blocks for what comes next.

The connecting thread is that the discipline of TDD and agentic AI reinforce each other when applied thoughtfully. The red-green-refactor loop isn't just a testing pattern: it's a knowledge-capture pattern. Write a failing test (make the requirement explicit), make it pass (let the agent do the filling-in), refactor (capture what you learned into the codebase's structure). The practitioners who were already doing TDD have a structural advantage in the agentic era, because they already have the habit of making implicit knowledge explicit before the session ends.
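The loop in miniature (the `slugify` helper and its tests are hypothetical, purely to illustrate the rhythm, not TestDouble's example):

```python
# Red: the failing test comes first -- the requirement made explicit
# before any agent touches the code.
def test_slugify():
    assert slugify("Hello Agentic World") == "hello-agentic-world"
    # Refactor-phase capture: a constraint discovered mid-session,
    # written down as a test so the knowledge outlives the session.
    assert slugify("a  b") == "a-b"

# Green: the deterministic fill-in, safe to delegate to an agent
# because the tests above pin the contract.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

test_slugify()
print("contract holds")
```

The point isn't this trivial function; it's that each test doubles as a durable record of what the session taught you.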

Worth reading:


Who goes to prison when AI ships the bug?

Mark Seemann posed the question that the "ship it faster" crowd keeps dodging: who is liable for code written by LLMs? It's not a rhetorical exercise. Are we waiting for another Challenger disaster as we further normalize deviance? As AI-generated code moves from prototypes to production, the legal and economic infrastructure hasn't caught up. Taleb's ergodicity comes to mind here: there is definitely an upside, but there's also a chance of a catastrophic downside in every success story being sold today.

The liability gap sits alongside a broader economic reckoning. Ardalis examined what happens when investor-subsidized pricing gives way to realistic business models, asking where the real costs of agentic development will land. Ed Zitron's detailed analysis of NVIDIA, Anthropic, and OpenAI finances provides the numbers behind the vibes. Karen Hao's investigative journalism, based on interviews with 90 OpenAI employees, documents the gap between AI claims and reality from the inside. Whether or not you agree with the "bubble" framing, the convergence of legal uncertainty and economic reality checks in a single week is worth noting.

The practical takeaway for engineering leaders: if your team is shipping AI-generated code to production, you need a position on accountability before your legal team asks for one. The technology is years ahead of the infrastructure that governs it.

Worth reading:


Quick Hits


Curated from 40 sources across articles, podcasts, and videos. Week of March 20-27, 2026.