Agentic Engineering Weekly for March 28 - April 4, 2026

The industry spent this week naming things it couldn't articulate before. The new terms point at structural problems: the debts AI coding creates that aren't in the code, the paradox of agents that solve complexity by generating more of it, and the moment "harness engineering" stopped being a blog post and became a discipline.


My top 3 picks this week


Three debts, not one: code, cognition, and intent

We've spent years talking about technical debt as the central liability of software systems. AI coding tools are remarkably good at paying it down: refactors that took weeks now take minutes. But something uncomfortable is happening in return. Teams are accumulating two new forms of debt that live outside the codebase entirely, and we're only just finding the words for them.

Cognitive debt is the erosion of shared mental models. Peter Naur argued decades ago that a program is not its source code but the understanding in the heads of its makers. When agents generate large swaths of implementation, that understanding never forms. Story after story follows the same arc: vibe-coding goes well until you hit a wall, and no amount of prompting gets you past it, because nobody on the team has an internal model of how the system actually behaves. The countermeasure isn't "read more code." It's deliberate practices that rebuild understanding: component walkthroughs, generated architecture visualizations, and occasionally the radical move of regenerating a component from scratch specifically to rebuild the team's mental model.

Intent debt is subtler: the absence of captured rationale. Why was this boundary drawn here? What trade-off was made and what was rejected? When decisions get made in a prompt and the agent executes them, the reasoning evaporates. Six months later, nobody can tell whether a design choice was deliberate or an LLM default. The fix is intent-first workflows: ADRs, spec-driven development, BDD. Artifacts that let both future humans and agents make informed changes instead of guessing.
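
As one concrete shape for captured rationale, a lightweight ADR might look like the sketch below. The number, title, and section names are illustrative, not a prescribed format; the part that pays down intent debt is the explicit record of alternatives and trade-offs.

```markdown
# ADR-007: Use event sourcing for the billing ledger

## Status
Accepted (2026-04-01)

## Context
What constraint or problem triggered the decision, and which forces were at play.

## Decision
What was chosen, stated in full sentences, including who decided.

## Alternatives considered
What was rejected, and why. This is what saves the next reader, human or
agent, from re-litigating the trade-off or mistaking the choice for an LLM default.

## Consequences
What becomes easier, what becomes harder, and what cost we knowingly accept.
```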

Worth reading:


Stripe ships 1,300 agent PRs per week: dark factories aren't coming, they're here

Dan Shapiro coined the term dark factory for the endgame of agentic engineering: a software production process so automated you can turn the lights off. This week, Stripe showed us what that looks like at scale. Their internal "minions" system ships approximately 1,300 pull requests per week with minimal human coding. That's not a research demo. That's production infrastructure at one of the most demanding engineering organizations on the planet.

What makes the Stripe story worth studying isn't the raw throughput. It's the preconditions. Their years of investment in developer experience, comprehensive documentation, blessed paths for common tasks, and robust CI/CD didn't just make human engineers faster. It made agents dramatically more successful. Clear docs on "how to add a new API field" translate directly into agent instructions. The lesson: organizations that invested in developer experience for humans are now reaping compound returns from agents. Organizations that didn't are stuck teaching agents to navigate the same chaos that frustrated their human developers.

Simon Willison frames this as a practical threshold crossing: late-2025 models made agentic engineering loops reliable enough to iterate without constant babysitting. The bottleneck shifted from writing code to specification, judgment, testing, and security. The control surface for software quality is moving from manual inspection toward simulation, testing, and system-level verification. We're not there yet for most teams. But the trajectory is no longer theoretical.

Worth reading:


Harness engineering graduates from blog post to discipline

Three independent publications landed in the same week, and the convergence is the signal. Birgitta Boeckeler published a mental model for building trust in coding agents through feedforward guides and feedback sensors. LangChain broke down the anatomy of an agent harness into composable components. And Addy Osmani documented orchestration patterns from subagents to full agent teams with quality gates. When Fowler's bliki, an AI-native framework company, and a Google DevRel lead all arrive at the same concept independently, it's time to pay attention.

The practical implications are already sharp. We're finding that LLMs can realistically follow somewhere between 100 and 500 instructions before performance degrades. In a world of AGENTS.md files, custom skills, and layered system prompts, that instruction budget compounds fast. The temptation is to keep adding: more rules, more guardrails, more context. But the winning move is ruthless curation. Build your harness, evaluate it with and without each component to measure actual impact, and expect to deprecate parts of it as models improve and your understanding deepens. Today's harness is tomorrow's legacy code.
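
That "evaluate with and without each component" loop fits in a few lines. Everything below is hypothetical scaffolding: `run_agent` stands in for however you invoke your agent, and `eval_cases` for your own eval dataset.

```python
def score(instructions, eval_cases, run_agent):
    """Fraction of eval cases the agent passes under one instruction set."""
    passed = sum(1 for case in eval_cases if run_agent(instructions, case))
    return passed / len(eval_cases)

def ablate(components, eval_cases, run_agent):
    """Score the full harness, then re-score with each named component removed.

    `components` is a list of (name, instruction_text) pairs. A component whose
    removal barely moves the score is spending instruction budget for nothing
    and is a candidate for deprecation.
    """
    report = {"full": score([text for _, text in components], eval_cases, run_agent)}
    for name, _ in components:
        subset = [text for n, text in components if n != name]
        report[f"without {name}"] = score(subset, eval_cases, run_agent)
    return report
```

Run it on every harness change: the report makes "does this rule actually help?" a measurement rather than a feeling.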

The deeper shift is from vibes to measurement. Jessica Wang at the Coding Agents Conference put it bluntly: without real eval datasets, scoring, and experiments, you're just guessing whether your agent setup actually works. "Shipping on vibes" is how AI breaks. The teams that treat their harness as an engineering artifact, with tests, metrics, and iteration cycles, will outperform the ones who treat it as a config file they tweak by feel.

Worth reading:


The agentic tar pit: every greenfield becomes brownfield

Agents are excellent at obliterating Fred Brooks' accidental complexity: the boilerplate, the scaffolding, the ceremony that exists because humans are slow at typing. But they re-introduce new accidental complexity in its place: wrong abstractions, defensive boilerplate, overcomplicated solutions that no human would have chosen. Left untouched, agents choke on their own bloated codebases. Every greenfield project, given enough unsupervised agent iterations, becomes brownfield.

Wes McKinney called this the Mythical Agent-Month at the O'Reilly AI CodeCon, and the framing is sharp. Brooks told us adding people makes late projects later. McKinney extends the argument: adding agents makes messy projects messier, because agents can't distinguish accidental from essential complexity. They treat everything as a problem to solve by generating more code. When code generation is effectively free, the only defense is product taste: the discipline to say no. Every feature agents produce is free to create but not free to maintain. Each one adds surface area for bugs, confusion, and future agentic mistakes.

Karpathy's framing complements this from the operator side. When things go wrong with agents, he argues, it's an orchestration failure, a skill issue in the human operator, not a model failure: bad instructions, poor memory setup, weak task decomposition. The bottleneck has shifted from generation to verification, and the more agents you run, the more you need to invest in verification infrastructure. Guardrails are not optional; they are the name of the game. But build them with the expectation that they'll be temporary: model capability improves, harnesses like Claude Code expand, building blocks get commoditized. Pro tip: if you're a leader struggling with these investment choices today, look into Wardley mapping.
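
As a minimal sketch of what "verification infrastructure" can mean in practice: a gate that runs a battery of checks over an agent's change and reports what failed. The commands are placeholders for whatever test, lint, and security tooling your stack uses, and `run` is injectable so the gating logic itself can be tested without those tools installed.

```python
import subprocess

# Placeholder check suite; substitute your own commands.
CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
]

def gate(checks=CHECKS, run=subprocess.run):
    """Run every check; return (ok, failures) where ok is True only if all passed."""
    failures = [name for name, cmd in checks if run(cmd).returncode != 0]
    return (not failures, failures)
```

Running all checks instead of stopping at the first failure gives the agent (or operator) a complete picture to fix against in one iteration.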

Worth reading:


Teach agents first, coach humans alongside

Here's a reframe I've been sitting with: if you're a technical coach, your highest-leverage move right now might not be teaching people directly. It might be encoding your expertise as an agent skill first. The skill becomes a reusable asset the next time you coach someone on the same topic. You can use it as a script to guide a learner through the material. The learner can study the concept interactively and autonomously. And your coding agents can leverage it immediately if it benefits the harness. One artifact, four returns.

This doesn't mean coaching becomes a solo act between a person and a chatbot. The human coach still matters enormously, especially for the undercurrent work that no agent can do: understanding why someone resists a new practice, helping people navigate the grief of a forced change in identity, meeting people where they actually are rather than where you think they should be. Forcing AI adoption is a losing strategy: at best you get compliance, at worst conflict, and neither leads to the genuine curiosity that adoption requires. The coaching fundamentals haven't changed: invite over inflict, lead by example, amplify what's already working.

TechWolf open-sourced their AI-first bootcamp this week, a program that built AI champions across HR, marketing, product, and finance. It's a useful reference for how to structure hands-on training that crosses disciplinary boundaries. Meanwhile, Kent Beck's Genie Sessions showed what it looks like to encode a specific practice (TCR: Test && Commit || Revert) as an agent skill. The pattern is emerging: coaching expertise becomes agent-consumable, and the coach's role shifts from being the delivery vehicle to curating and verifying the delivery system.
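
The TCR loop itself is tiny, which is part of what makes it a good candidate for encoding as a skill. Here is a sketch of the mechanic in Python with the process runner injectable; the git and pytest commands are assumptions about a typical setup, not Beck's published tooling.

```python
import subprocess

def tcr(test_cmd=("pytest", "-q"), run=subprocess.run):
    """Test && Commit || Revert: commit on green, throw the change away on red."""
    if run(list(test_cmd)).returncode == 0:
        run(["git", "add", "-A"])
        run(["git", "commit", "-m", "tcr: tests green"])
        return "committed"
    # Tests failed: discard the working-tree change instead of debugging it.
    run(["git", "checkout", "--", "."])
    return "reverted"
```

The brutal revert branch is the whole point of the practice: it forces changes small enough that losing one costs almost nothing, which is also exactly the granularity agents work best at.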

Worth reading:


Quick Hits


Curated from 229 sources across articles, podcasts, and videos. Week of March 28 - April 4, 2026.