
Claude Code's Creator Says Anthropic Has Zero Manually Written Code — What That Means for Agentic Engineering

Boris Cherny says no one at Anthropic writes code by hand anymore. Here's what that signals about the future of agentic engineering.

MindStudio Team

Boris Cherny Just Told You Something Important About How Software Gets Built

Boris Cherny, the creator of Claude Code, said something at Anthropic’s Code with Claude event that deserves more attention than it got: “There is literally no manually written code anywhere in the company anymore.” Not “we use AI a lot.” Not “we’re experimenting with agentic workflows.” Literally zero manually written code. At Anthropic. The company building the models.

If you’re an engineer or an AI builder, you should sit with that for a second.

This isn’t a marketing claim. Cherny said it in a panel discussion, almost as a side note, while explaining why the term “vibe coding” was starting to annoy him. His point was that the term undersells what’s actually happening. Vibe coding implies something casual, imprecise, a little sloppy. What Anthropic is actually running is something more systematic: Claudes coordinating with each other over Slack, coding in loops, resolving issues across the codebase, with copious automated testing and verification before anything ships.

That’s not vibing. That’s a different kind of engineering.


Why “Vibe Coding” Is the Wrong Frame

Andrej Karpathy coined the term “vibe coding” and it stuck, partly because it captured something real: the feeling of describing what you want and watching code appear. But Karpathy himself has since suggested a replacement — “agentic engineering” — and Cherny is actively soliciting better alternatives because neither term quite fits what’s happening at the frontier.


The distinction matters. Vibe coding implies you’re the creative director and the AI is your intern. You describe, it produces, you accept or reject. The loop is short. The human is in the loop constantly.

What Cherny is describing is different in kind. The agents are running loops autonomously. They’re coordinating with each other. They’re catching their own errors. The human isn’t reviewing every output — the system is. This is closer to managing an engineering org than to pair programming with an AI.

The terminology gap is actually a signal. When practitioners at the frontier can’t find a word for what they’re doing, it usually means the practice has outrun the mental models people are using to describe it.


What the Anthropic Stack Actually Looks Like

To understand why Cherny’s claim is credible, you need to look at what Anthropic shipped at the same event.

The managed agents platform now supports multi-agent orchestration: a lead agent breaks a job into pieces and delegates to specialist sub-agents, each with its own model, prompts, and tools. Sub-agents work in parallel on a shared file system. The lead agent can check in mid-workflow. Everything is auditable in Claude Console — what each sub-agent did, in what order, and the reasoning behind each decision.

This is the infrastructure that makes “no manually written code” plausible. You’re not asking one model to write a whole codebase. You’re running an orchestrated system where different agents handle different parts of the problem, verify each other’s work, and loop until the output meets a standard.
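To make the shape concrete, here's a minimal TypeScript sketch of that pattern. The names (`SubAgentSpec`, `runAgent`, `leadAgent`) are hypothetical stand-ins, not Anthropic's actual managed agents API; what matters is the structure: decompose the job, delegate in parallel against a shared workspace, and keep an audit trail.

```typescript
// Hypothetical types -- illustrative only, not Anthropic's managed agents API.
interface SubAgentSpec {
  name: string;        // e.g. "backend", "tests", "docs"
  model: string;       // each specialist can use its own model
  systemPrompt: string;
  tools: string[];
}

interface Task {
  description: string;
  assignee: SubAgentSpec;
}

interface AuditEntry {
  agent: string;
  task: string;
  result: string;
  finishedAt: Date;
}

// Stand-in for "call this sub-agent and let it work against the shared workspace".
async function runAgent(task: Task, workspace: string): Promise<string> {
  // In a real system this would invoke a model with the sub-agent's prompt and tools.
  return `${task.assignee.name} completed: ${task.description} (workspace: ${workspace})`;
}

// Lead agent: decompose, delegate in parallel, keep an auditable trail.
async function leadAgent(job: string, specialists: SubAgentSpec[], workspace: string) {
  // Decomposition is itself a model call in practice; here it is a trivial split.
  const tasks: Task[] = specialists.map((s) => ({
    description: `${job} -- ${s.name} portion`,
    assignee: s,
  }));

  const audit: AuditEntry[] = [];
  const results = await Promise.all(
    tasks.map(async (t) => {
      const result = await runAgent(t, workspace);
      audit.push({ agent: t.assignee.name, task: t.description, result, finishedAt: new Date() });
      return result;
    })
  );

  return { results, audit }; // the audit log is what a console view would surface
}
```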

That standard is enforced by a feature called Outcomes. The user writes a rubric for what success looks like. A separate grading agent scores the output against that rubric — independently, without being influenced by the reasoning of the task agent. If the output doesn’t pass, the grading agent kicks it back for another run. Anthropic reported that using Outcomes improved file generation quality by 8.4% for Word documents and 10.1% for PowerPoint slides on their internal benchmarks. That’s the first time Anthropic has applied external grading to non-code knowledge work at scale, and the numbers are meaningful.
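As a rough illustration of that loop (not the Outcomes API itself), the structure looks something like this. `generate` and `grade` are hypothetical stand-ins; the load-bearing detail is that the grader only ever sees the output and the rubric, never the task agent's reasoning.

```typescript
// Hypothetical sketch of an external grading loop, in the spirit of Outcomes.
interface Rubric {
  criteria: string[];     // e.g. "cites sources", "matches house style"
  passingScore: number;   // threshold the grader must report
}

async function generate(prompt: string, feedback?: string): Promise<string> {
  // Task agent produces a draft; feedback from the grader shapes the retry.
  return `draft for: ${prompt}${feedback ? ` (revised after: ${feedback})` : ""}`;
}

async function grade(output: string, rubric: Rubric): Promise<{ score: number; notes: string }> {
  // Grading agent scores the output against the rubric only -- it never sees
  // the task agent's chain of reasoning, so it can't be talked into a pass.
  const score = rubric.criteria.length > 0 ? 0.9 : 0; // placeholder score
  return { score, notes: `scored ${output.length} chars against ${rubric.criteria.length} criteria` };
}

async function runWithOutcome(prompt: string, rubric: Rubric, maxAttempts = 3): Promise<string> {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await generate(prompt, feedback);
    const { score, notes } = await grade(output, rubric);
    if (score >= rubric.passingScore) return output; // passes the rubric, ship it
    feedback = notes;                                 // otherwise kick it back for another run
  }
  throw new Error("Output never met the rubric within the attempt budget");
}
```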

For code specifically, the rubric is even more tractable. A PR either works or it doesn’t. Unit tests pass or they fail. The grading loop is tight and well-defined. When Cherny says there’s no manually written code at Anthropic, part of what he means is that the verification layer is automated — the agents aren’t just writing code, they’re running it, testing it, and iterating until it passes.
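For code, the "grader" can literally be the test suite. A minimal sketch, assuming a hypothetical `writePatch` agent call and a project with an npm test script:

```typescript
import { execSync } from "node:child_process";

// Hypothetical task-agent call that writes a patch into the working tree.
async function writePatch(goal: string, lastFailure?: string): Promise<void> {
  // In practice: invoke the coding agent with the goal plus the failing test output.
}

// The verification layer for code is just "do the tests pass" -- tight and well-defined.
async function codeUntilGreen(goal: string, maxIterations = 5): Promise<boolean> {
  let lastFailure: string | undefined;
  for (let i = 0; i < maxIterations; i++) {
    await writePatch(goal, lastFailure);
    try {
      execSync("npm test", { stdio: "pipe" });   // run the suite
      return true;                               // green: the loop closes with no human review
    } catch (err: any) {
      lastFailure = String(err.stdout ?? err);   // red: feed the failure back to the agent
    }
  }
  return false; // escalate to a human only if the agent can't converge
}
```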


The Memory Layer That Makes This Durable

One thing that often gets missed in discussions of agentic coding is the memory problem. An agent that can write good code in a single session is useful. An agent that gets better at writing code for your codebase over time is something else.


Anthropic’s answer to this is a feature called Dreaming: a scheduled background process that reviews past agent sessions and memory stores, extracts patterns, and curates memories so agents improve over time. The framing in Anthropic’s documentation is precise — Dreaming “surfaces patterns that a single agent can’t see on its own, including recurring mistakes, workflows that agents converge on, and preferences shared across a team.” It also restructures memory so it stays high signal as it scales.

The core idea is that agents don’t just complete tasks — they report what they learned while doing them. Those learnings get encoded in orchestration memory and preloaded the next time a sub-agent is called. Sessions compound.
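Dreaming's internals aren't public, but the pattern it describes is familiar: a scheduled job that reads recent sessions, distills them, and rewrites the memory store. A speculative sketch, with `summarizeSessions` standing in for the model call that does the actual pattern extraction:

```typescript
// Hypothetical sketch of a Dreaming-style background job -- not Anthropic's implementation.
interface SessionLog { agent: string; transcript: string; finishedAt: Date; }
interface Memory { key: string; content: string; lastUpdated: Date; }

// Stand-in for a model call that looks across many sessions at once and pulls out
// recurring mistakes, converged workflows, and shared preferences.
async function summarizeSessions(sessions: SessionLog[], existing: Memory[]): Promise<Memory[]> {
  // Placeholder: a real pass would cluster transcripts and rewrite memories.
  return existing.concat({
    key: `patterns-${Date.now()}`,
    content: `Patterns extracted from ${sessions.length} sessions`,
    lastUpdated: new Date(),
  });
}

// Scheduled (e.g. nightly) curation pass: review recent sessions, restructure memory,
// and preload the result into the next orchestration run.
async function dreamingPass(store: { sessions: SessionLog[]; memories: Memory[] }) {
  const recent = store.sessions.filter(
    (s) => Date.now() - s.finishedAt.getTime() < 24 * 60 * 60 * 1000
  );
  const curated = await summarizeSessions(recent, store.memories);
  store.memories = curated;   // memory stays high-signal instead of growing unbounded
  return curated;             // the next sub-agent call gets these preloaded
}
```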

This is architecturally similar to what the open-source Hermes agent has been doing for months — reviewing past conversations, building skills from experience, maintaining cross-session memory. Jeten Gar put it plainly: “The open-source agent ecosystem is leading on primitives… The closed labs have raw model capability. The open source ecosystem has agent primitives.” Anthropic is now shipping production versions of patterns the open-source community validated first. If you’ve been following Claude Code’s three-layer memory architecture, the Dreaming feature is the orchestration-level complement to what’s happening at the session level.


What “No Manually Written Code” Actually Requires

Here’s the non-obvious part. Cherny’s claim isn’t just about model capability. It’s about the entire system around the model.

You can’t run a zero-manual-code engineering org with a single agent in a chat window. You need:

Reliable orchestration. The lead agent needs to decompose tasks correctly and delegate to the right specialist. Multi-agent orchestration with shared file systems and mid-workflow check-ins is what makes this tractable at scale.

External verification. The grading agent that scores output against a rubric is doing the work that a human code reviewer used to do. Without this, you’re trusting the agent to evaluate its own output, which is a known failure mode. The separation between the task agent and the grading agent is load-bearing.

Persistent, improving memory. If agents start from scratch every session, you’re not building an engineering org — you’re hiring contractors who forget everything between projects. Dreaming is what turns a collection of capable agents into something that behaves more like a team.

Automated testing. Cherny specifically mentioned “copious automated testing and verification” as part of Anthropic’s workflow. This is the part that makes the loop close. The grading agent can run tests. Tests can fail. The task agent can fix the failure. No human required.

The interesting thing is that none of these components are new in isolation. Automated testing has existed for decades. External code review is standard practice. Memory systems have been in open-source agent frameworks for a year. What’s new is that all of these pieces are now integrated into a managed platform that a non-expert can configure. If you want to understand the building blocks Anthropic is exposing, the WAT framework (Workflows, Agents, and Tools) is a useful mental model for how these components fit together.
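Read that way, the three primitives are small enough to hold in your head. Here's a toy TypeScript rendering, my shorthand rather than anything formal from the framework:

```typescript
// Toy rendering of the Workflows / Agents / Tools split -- shorthand, not a formal spec.
type Tool = (input: string) => Promise<string>;  // deterministic capability: run tests, query a DB

interface Agent {
  model: string;
  prompt: string;
  tools: Record<string, Tool>;                   // an agent decides which tools to call, and when
}

interface Workflow {
  steps: Array<{ agent: Agent; goal: string }>;  // a workflow fixes the order; agents fill in the judgment
}
```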


The Abstraction Layer Is Moving Up

There’s a longer arc here worth naming.

Every generation of programming has been about raising the abstraction level. Assembly over machine code. C over assembly. High-level languages over C. Managed runtimes over manual memory management. Each step made programmers more productive and made certain classes of errors impossible by construction.


Agentic engineering is the next step in that sequence. You’re not writing code — you’re writing specs, rubrics, and orchestration logic. The agents write the code. The grading agents verify it. The memory systems make the whole thing improve over time.

This is why tools like Remy are worth paying attention to: you write a spec — annotated markdown where readable prose carries intent and annotations carry precision — and a full-stack application gets compiled from it. TypeScript backend, SQLite database, auth, deployment, all of it. The spec is the source of truth; the code is derived output. That’s the same abstraction shift Churnney is describing, applied to the full-stack layer.

The implication for builders is that the skill that matters is no longer “can you write correct code” but “can you specify what correct looks like.” Rubrics, test criteria, success definitions — these are the new primitives. The Claude Code skills and plugins system reflects this: you’re writing playbooks that define what good looks like, and the agents execute against them.
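Concretely, "specify what correct looks like" means writing artifacts like the one below: a definition of done that a grading agent can score against. This is a hypothetical example in TypeScript, not the Claude Code skill file format.

```typescript
// Hypothetical "definition of done" for a feature -- the artifact the human now writes.
const doneCriteria = {
  feature: "CSV export for invoices",
  mustPass: [
    "all existing unit tests stay green",
    "new tests cover empty invoices and >10k rows",
    "export completes in under 2 seconds for 10k rows",
  ],
  style: ["no new dependencies", "follow the repo's error-handling conventions"],
  reviewer: "grading agent scores against this list; a human sees only the final diff",
};
```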


What Comes Next

Anthropic’s research head Diane Penn hinted at three things in development: higher judgment and code taste in future models, context windows that “feel infinite,” and better multi-agent coordination. The infinite context framing is interesting — Penn’s precise wording was “context windows that feel infinite,” which suggests this might be sophisticated compaction rather than a fundamental architectural change. But the functional result is the same: agents that can maintain coherent context across arbitrarily long tasks.
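If it is compaction, the general shape is well understood: when the transcript outgrows the budget, condense the old turns into a summary and keep the recent tail verbatim. A generic sketch, speculative because Anthropic hasn't said how they'll do it:

```typescript
// Generic context-compaction sketch -- speculative, inferred only from the "feel infinite" wording.
interface Turn { role: "user" | "assistant"; text: string; }

function estimateTokens(turns: Turn[]): number {
  return turns.reduce((n, t) => n + Math.ceil(t.text.length / 4), 0); // crude 4-chars-per-token estimate
}

// Stand-in for a model call that condenses old turns into one short summary turn.
async function summarize(turns: Turn[]): Promise<Turn> {
  return { role: "assistant", text: `Summary of ${turns.length} earlier turns` };
}

// Keep the window "feeling infinite": replace the oldest turns with a summary
// whenever the estimated token count exceeds the budget.
async function compact(history: Turn[], budget: number, keepRecent = 20): Promise<Turn[]> {
  if (estimateTokens(history) <= budget) return history;
  const old = history.slice(0, -keepRecent);
  const recent = history.slice(-keepRecent);
  return [await summarize(old), ...recent];
}
```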

Higher code taste is the one I’d watch most closely. Right now, the grading agent in Outcomes is scoring against a rubric you write. That rubric is only as good as your ability to specify what good code looks like. If future models have better intrinsic judgment about code quality, the rubric becomes a floor rather than a ceiling — the model can exceed your spec, not just meet it.

That’s when “no manually written code” stops being a claim about infrastructure and starts being a claim about capability.


The Practical Implication for Engineers Right Now

If you’re building with Claude Code or any agentic coding system, the Cherny claim is a useful calibration point. The question isn’t whether you should use AI for coding — that’s settled. The question is whether your setup has the properties that make zero-manual-code plausible.

Do you have a verification layer that’s independent of the generation layer? Do your agents have memory that persists and improves across sessions? Do you have a rubric that defines what “done” means, not just “generated”?

Most setups don’t have all three. Most setups have a capable model and a human in the loop doing the verification work that the grading agent should be doing. That’s fine — it’s where most teams are. But it’s worth being honest that what Anthropic is running is architecturally different from “I use Claude Code to write functions.”

For teams building multi-agent systems without wanting to wire up all the orchestration infrastructure themselves, platforms like MindStudio handle the coordination layer: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows. The orchestration primitives Anthropic is shipping as managed agents, MindStudio exposes as composable building blocks.


The Every/Spiral writing agent is a good example of what this looks like in practice. Spiral uses a multi-agent system with different Anthropic models for cost optimization, and now uses the Outcomes feature with an editorial rubric to enforce writing quality. The rubric is based on editorial standards and writer voice. The grading agent enforces it. Humans aren’t reviewing every draft — the system is. That’s the pattern.

Cherny’s claim isn’t a prediction about where AI is going. It’s a description of where Anthropic already is. The infrastructure to replicate it is now available to anyone building on managed agents. The question is whether you’re willing to invest in the verification and memory layers that make it work — or whether you’re still thinking of agentic coding as a faster way to write code manually.

Those are different systems. One of them scales. The other one doesn’t.

Presented by MindStudio
