💻 The code isn't the bottleneck anymore (and your metrics are lying)

Your team is adopting AI coding tools, but are you measuring what truly matters? A look at how agentic AI is flipping the script on software development

You’ve seen the demos. You’ve heard the hype. Your engineers are probably already using AI coding assistants, and the pressure is on to show the ROI. But as you look at the dashboards, a nagging question emerges: are we just getting more code, or are we actually getting better, faster? If you’re nodding along, you’re not alone. The way we measure engineering productivity is rapidly becoming obsolete.

A wave of insights from leaders at the forefront of this shift, including Anthropic’s CPO Mike Krieger, and from veteran engineers like Thomas Ptacek and Armin Ronacher, reveals a startling new reality. At Anthropic, an astonishing 90-95% of the code for their Claude Code tool is written by AI itself. Meanwhile, the AI tool company Windsurf reports its agentic systems can write over 94% of a user’s committed code. But as these numbers soar, the real story isn’t just about code volume. It’s about a fundamental transformation in where the true bottlenecks lie. This isn’t just another tool; it’s a paradigm shift that demands a new playbook for product and innovation leaders.

What we’ll cover:

  • What is “agentic coding” and why is it a game-changer?

  • Why your “percentage of code written” metric can be deceptive

  • How should you adapt your team’s workflow?

    • 1. Treat your tools and prompts as first-class citizens

    • 2. Shift from code review to intent and outcome review

    • 3. Redefine “productivity” around new bottlenecks


What is “agentic coding” and why is it a game-changer?

First, let’s clear up a key term. This isn’t about your standard autocomplete. Thomas Ptacek, a developer with decades of experience, puts it plainly:

If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing.

Agentic coding is a significant leap forward. Think of an AI agent as an autonomous junior developer. You give it a task, and it gets to work. According to experts like Ptacek and Armin Ronacher, these agents can (a minimal loop sketch in code follows this list):

  • Independently navigate your codebase, reading files to understand context.

  • Run tools like compilers, tests, and linters to check their own work.

  • Analyze the output from those tools and iterate on the code, correcting their own mistakes.

  • Make essentially arbitrary tool calls that you set up, extending their capabilities even further.
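
To make that loop concrete, here is a minimal sketch in Python. generate_patch and apply_patch are hypothetical placeholders for a real model call and editor integration (not any vendor’s API), and run_checks assumes a project with a “make test” target.

    import subprocess

    def generate_patch(context: str) -> str:
        """Hypothetical: ask an LLM for an edit, given the task and feedback."""
        raise NotImplementedError("wire up your model of choice here")

    def apply_patch(patch: str) -> None:
        """Hypothetical: write the proposed edit to the working tree."""
        raise NotImplementedError("wire up your editor/VCS integration here")

    def run_checks() -> tuple[bool, str]:
        """The agent checks its own work by running the project's test target."""
        result = subprocess.run(["make", "test"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def agent_loop(task: str, max_iterations: int = 5) -> bool:
        context = task
        for _ in range(max_iterations):
            apply_patch(generate_patch(context))  # propose and apply an edit
            passed, output = run_checks()         # compile / test / lint
            if passed:
                return True                       # checks are green: done
            # Feed the tool output back so the agent can correct its own mistakes.
            context = f"{task}\n\nChecks failed with:\n{output}"
        return False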

This autonomous loop is what separates a simple suggestion tool from a true collaborator. Windsurf’s data highlights the chasm: their non-agentic tools see a “Percentage of Code Written” (PCW) around 41%, while their agentic tools soar past 85%, even hitting 94% internally. This isn’t just a small bump; it’s a different category of impact.


Why your “percentage of code written” metric can be deceptive

Windsurf introduced a metric, Percentage of Code Written (PCW), which measures the proportion of committed code bytes attributable to its AI (a tiny worked example follows the definitions below):

\[ \text{PCW} = 100 \times \frac{W}{W + D} \]

Where:

  • W represents the number of new, persisted bytes of code attributed to the AI tool.

  • D represents the number of new, persisted bytes of code attributed to the developer typing manually.
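
The arithmetic is trivial; the hard part in practice is attributing persisted bytes to the AI versus the developer, which this tiny Python sketch assumes has already been done:

    def pcw(ai_bytes: int, dev_bytes: int) -> float:
        """Percentage of Code Written: share of persisted code bytes from the AI."""
        return 100 * ai_bytes / (ai_bytes + dev_bytes)

    # Example: 8,500 AI-written bytes vs. 1,500 hand-typed bytes.
    print(pcw(8_500, 1_500))  # 85.0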

While it’s a more trustworthy metric than easily gamed stats like “acceptance rate,” it comes with a massive caveat that every leader needs to understand.

Windsurf is transparent that PCW is a directional proxy for value, not an absolute measure. A 90% PCW does not equal a 90% boost in overall productivity. Why?

  • AI handles the boilerplate: As Windsurf and Ptacek point out, AI is exceptional at writing the tedious, repetitive, and “easy” code. That last 5-10% written by a human is often the core, complex logic that takes a disproportionate amount of time and critical thinking.

  • It ignores the rest of the workflow: Coding is just one part of the software development lifecycle. PCW doesn’t capture the time spent on architecture, deep debugging, code reviews, or strategic planning (a back-of-the-envelope calculation after this list shows how much this matters).

  • The bottleneck just moves: Anthropic’s CPO, Mike Krieger, offers a critical insight from his experience as “patient zero” in this new world. As coding speed skyrockets, the delays don’t disappear; they just pop up somewhere else:

We really rapidly became bottlenecked on other things like our merge queue... I’ve just found all these new bottlenecks in our system. There’s an upstream bottleneck, which is decision making and alignment.
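
A quick check with Amdahl’s law makes the gap concrete. The numbers here are purely illustrative: suppose writing code is 40% of your delivery cycle (c = 0.4) and agentic tools make that part ten times faster (s = 10). The overall speedup is then

\[ \text{speedup} = \frac{1}{(1 - c) + c/s} = \frac{1}{0.6 + 0.4/10} \approx 1.6\times \]

A real win, but nowhere near 10×, and the untouched 60% (reviews, decisions, the merge queue) now dominates.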

For leaders, this means your focus must shift. Optimizing for raw code output is like celebrating how fast you can pour water into a pipe that’s clogged downstream. Your new job is to find and unblock the real constraints, which are increasingly human and strategic.


How should you adapt your team’s workflow?

So, if raw output is a vanity metric, what should you focus on? The consensus from experts points to a new set of best practices for working with, and leading, teams in the agentic era.

1. Treat your tools and prompts as first-class citizens

The quality of AI output is directly tied to the quality of its environment and instructions.

  • Build fast, reliable tooling: Armin Ronacher emphasizes that tools for agents must be fast and provide clear, user-friendly error messages. An agent can’t fix what it can’t understand. He even wires custom tooling into a Makefile so the agent can simply run “make dev” (see the sketch after this list).

  • Make prompts a team asset: The team behind an open-source Cloudflare library, built almost entirely by Claude, included the prompt in every single git commit. This turns your version history into a record of intent, not just a record of code changes. It becomes a new, powerful form of documentation.

  • Master the art of the prompt: Developer Philipp Spiess advises breaking down large problems and using sub-agents for specialized tasks. Instead of one giant, one-shot prompt, think in iterations and precise, context-rich instructions.
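
To illustrate the first point, here is a minimal sketch of the kind of agent-friendly check script you might expose behind a “make dev” target. It assumes a Python project with ruff and pytest installed; the specific tools are placeholders for whatever your project uses.

    # check.py: run fast checks and fail with short, actionable output.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],  # fast lint pass first
        ["pytest", "-x", "-q"],  # stop at the first failing test
    ]

    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep failures short and specific so an agent can parse and act
            # on them, instead of wading through a wall of logs.
            print(f"FAILED: {' '.join(cmd)}")
            print((result.stdout or result.stderr).strip())
            sys.exit(1)

    print("All checks passed.")

The contract matters more than the particular tools: checks run fast, and failures explain themselves in a form an agent can act on.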

2. Shift from code review to intent and outcome review

Mike Krieger says that his Claude Code team realized traditional, line-by-line reviews of large, AI-generated pull requests were impractical.

  • Is human oversight still there? Absolutely, but it’s shifting: the developer is no longer just a writer but a curator, an editor, and a guide.

  • Focus on the “Why” and the “What”: Does the change accomplish the intended goal? Does it align with the product strategy? Is the user experience right? These are the questions that require human judgment.

  • Embrace the Human-in-the-Loop: Philipp Spiess found that trying to achieve full autonomy with automated feedback cycles was often less effective than simply having a human review the output and provide direct, contextual feedback for the next iteration.

3. Redefine “productivity” around new bottlenecks

With code generation becoming a commodity, competitive advantage shifts elsewhere.

  • From typing to thinking: The crucial skills are now about framing the problem correctly, asking the right questions, and exercising good judgment. As one member of the Cursor team said, “taste in code... is actually gonna become even more important as these models get better.”

  • Identify your real blockers: Is your merge queue jammed? Is decision-making stalled in meetings? Is your team misaligned on strategy? That’s where you’ll find your new leverage points for improvement.


The era of agentic coding is here, and it’s moving at a breakneck pace. It challenges our long-held assumptions about what it means to build software. As code becomes abundant and cheap, the value shifts from the act of writing it to the wisdom of knowing what to write and why.

This leaves product and innovation leaders with a critical question to ponder: When your AI can write the code, what is the most valuable thing you can ask your humans to do?

