Vibe Coding vs Agentic Engineering — Karpathy's Framework for Knowing Which One You're Actually Doing

The Line Karpathy Drew That Most Builders Are Ignoring

You are either raising the floor or raising the ceiling. Karpathy’s distinction between vibe coding and agentic engineering is that clean, and most people building with AI right now have not figured out which one they’re actually doing.

Karpathy coined “vibe coding,” so he gets to define it. At Sequoia’s annual AI event, he drew the line explicitly: vibe coding raises the floor for everyone in terms of what they can do with software. Agentic engineering preserves the quality bar of what existed before in professional software. Two different goals. Two different disciplines. Conflating them is how you end up with either a toy that can’t scale, because you treated production code like a weekend hack, or a professional workflow that’s slower than it should be, because you treated a weekend hack like production code.

The distinction matters right now because December 2025 was, in Karpathy’s words, a “clear point” where something fundamentally changed. He was on a break, had more time, started noticing that with the latest models the chunks just came out fine. He kept asking for more. It kept coming out fine. He can’t remember the last time he corrected the model. That’s not a minor improvement in autocomplete. That’s a different category of tool — and it demands a different mental model for how you use it.

What Actually Separates the Two

The surface difference is obvious: vibe coding is casual, agentic engineering is professional. But that framing undersells the real distinction, which is about accountability and intent.

Vibe coding is about access. The point is that anyone — someone who has never written a line of Python, never thought about a call stack — can now build software. They don’t need to understand syntax. They don’t need to know why something works. They describe an outcome and something gets built. That’s genuinely valuable. The floor of what’s possible for a non-programmer has risen dramatically.

But “the floor rises” is the tell. Floor-raising is about inclusion, not excellence. The output of vibe coding is not meant to be deployed to a million users. It’s not meant to pass a security audit. It’s not meant to be maintained by a team of engineers who didn’t write it. Vibe coding produces software the way a weekend cooking class produces food — real, edible, sometimes surprisingly good, but not what you’d serve at a restaurant.

Agentic engineering is about throughput at professional quality. Karpathy’s framing is precise: you are still responsible for your software just as before. You are not allowed to introduce vulnerabilities because you were vibe coding. What changes is speed. Can you go faster? Yes. But how do you coordinate agents — which he describes as “spiky entities, a bit fallible, a little bit stochastic, but extremely powerful” — without sacrificing the quality bar?

That coordination is the skill. It’s not a prompt. It’s a discipline.

Vibe Coding: What It Is and What It Isn’t

Vibe coding is real and it’s useful. Karpathy invented the term; he’s not dismissing it. The ability for a non-programmer to describe what they want and get working software is a genuine expansion of who gets to build things.

The Software 3.0 framing is relevant here. Software 1.0 was explicit rules — you wrote code. Software 2.0 was learned weights — you arranged datasets and trained neural networks. Software 3.0 is prompting: your lever over the interpreter is the context window. Vibe coding is the most accessible expression of Software 3.0. You don’t need to know anything about the underlying system. You just describe the outcome.
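To make the contrast concrete, here is a minimal sketch of the same task written in the Software 1.0 style and in the Software 3.0 style. The task (sentiment classification), the function names, and the prompt wording are all assumptions chosen for illustration. No model is called; the second function only builds the text that would go into a context window.

```python
def is_positive_v1(review: str) -> bool:
    """Software 1.0: behavior lives in rules the programmer wrote by hand."""
    positive_words = {"great", "love", "excellent"}
    return any(word in positive_words for word in review.lower().split())


def build_prompt_v3(review: str) -> str:
    """Software 3.0: behavior lives in the text placed in the context window.

    Hypothetical prompt; a real one would be tuned for the model in use.
    """
    return (
        "Classify the sentiment of the review as POSITIVE or NEGATIVE.\n"
        f"Review: {review}\n"
        "Answer:"
    )
```

In the first function, changing the behavior means editing code. In the second, it means editing the prompt, which is why no programming knowledge is required to steer it.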

The OpenClaw installation is a good illustration. You’d expect a shell script that you run to install the tool. Instead, the installation is a block of text you copy and paste to your agent. There is no precise specification of individual steps. The agent applies its own intelligence: it looks at your environment, takes actions, and debugs in a loop. That’s the Software 3.0 paradigm. And for someone who doesn’t know what a bash script is, this is enormously enabling.

But vibe coding has a ceiling, and it’s lower than most people admit. The code it produces is often, in Karpathy’s own words, “not super amazing code necessarily all the time — very bloaty, a lot of copy-paste, awkward abstractions that are brittle.” It works. It’s just gross. For a personal project or a prototype, that’s fine. For production software with real users and real security requirements, it’s a liability.

The other thing vibe coding doesn’t give you: understanding. Karpathy cited a tweet he thinks about every other day — “you can outsource your thinking but you can’t outsource your understanding.” If you vibe-coded your way to a working app and something breaks, you have no model of why. You’re not debugging; you’re guessing. That’s a fine tradeoff for a weekend project. It’s a serious problem for anything you’re accountable for.

Agentic Engineering: The Ceiling Gets Higher

Agentic engineering is what happens when a professional software engineer — someone who already understands the quality bar — uses agents to go faster without dropping that bar.

The example Karpathy gives is Peter Steinberger, who runs dozens of agents in parallel, sometimes as many as a hundred, automating different parts of his development flow: not just writing code, but also the deployment sequence, bug detection, and PR management. Steinberger understands what each agent is doing. He can verify the output. He’s not outsourcing his understanding; he’s outsourcing his execution.

That’s the key distinction. Agentic engineering requires you to understand what correct output looks like. You’re not hoping the agent got it right. You’re checking. The verifiability thesis runs through this: LLMs automate what you can verify. An experienced engineer can verify code quality, architecture decisions, security posture. A vibe coder, by definition, often cannot.
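One way to picture “checking, not hoping” is a gate that agent output must pass before it is accepted. The sketch below is an illustration under stated assumptions, not a real review pipeline. The check names (`compiles`, `no_eval`) and the `accept` function are hypothetical, and the two checks are deliberately crude stand-ins for the judgment an experienced engineer would encode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool


def compiles(source: str) -> bool:
    """Cheapest possible check: does the output parse as Python at all?"""
    try:
        compile(source, "<agent-output>", "exec")
        return True
    except SyntaxError:
        return False


def no_eval(source: str) -> bool:
    """Crude stand-in for a security review: reject obvious eval() calls."""
    return "eval(" not in source


# Hypothetical quality bar. The engineer decides what belongs in this table.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "compiles": compiles,
    "no_eval": no_eval,
}


def review(source: str) -> List[CheckResult]:
    """Run every check and keep the individual results for inspection."""
    return [CheckResult(name, check(source)) for name, check in CHECKS.items()]


def accept(source: str) -> bool:
    """Accept agent output only if every check passes."""
    return all(result.passed for result in review(source))
```

The interesting part is not the code but the `CHECKS` table: someone has to know what belongs in it, and that knowledge is exactly what the vibe coder lacks.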

The jaggedness of current models makes this even more important. Claude Opus 4.7 can refactor a 100,000-line codebase and, in the same breath, tell you to walk 50 meters to a car wash rather than drive, missing that the car has to be there to get washed. Everyday questions like that are not a domain with strong RL training signal. The model’s capability profile is spiky. An agentic engineer knows where the spikes are and routes work accordingly. A vibe coder doesn’t know what they don’t know.

This is also why the orchestration layer matters so much. Platforms like MindStudio handle this kind of coordination across agents — 200+ models, 1,000+ integrations, a visual builder for chaining agents and workflows — which lets engineers focus on the quality bar rather than the plumbing. But the engineer still has to set the quality bar. The platform doesn’t do that for you.

The Accountability Gap

Here’s the opinion: most people calling themselves “agentic engineers” right now are actually vibe coding with extra steps.

The tell is whether you’re checking the output or just shipping it. Agentic engineering is not defined by the number of agents you’re running or the complexity of your orchestration. It’s defined by whether you maintain accountability for what gets shipped. If you’re running a hundred agents in parallel but you’re not verifying the output of each one against a professional quality standard, you’re vibe coding at scale. That’s not a compliment.
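The difference between agentic engineering and “vibe coding at scale” can be sketched as a fan-out step followed by a verification step on every single result. Everything here is an assumption made for illustration: `fake_agent` stands in for a real agent call and deliberately returns a buggy implementation for every third task, and `verify` is a tiny behavioral test suite for a hypothetical `add` function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def fake_agent(task_id: int) -> Callable[[int, int], int]:
    """Hypothetical agent: returns a candidate implementation of add()."""
    if task_id % 3 == 0:
        return lambda a, b: a - b  # plausible-looking but wrong
    return lambda a, b: a + b


def verify(candidate: Callable[[int, int], int]) -> bool:
    """Run the candidate against cases whose answers are known in advance."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    return all(candidate(*args) == expected for args, expected in cases)


def run_and_triage(task_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Fan out to agents in parallel, then verify each output individually."""
    ids = list(task_ids)
    with ThreadPoolExecutor(max_workers=8) as pool:
        candidates = list(pool.map(fake_agent, ids))  # map preserves order
    shipped, rejected = [], []
    for task_id, candidate in zip(ids, candidates):
        (shipped if verify(candidate) else rejected).append(task_id)
    return shipped, rejected
```

Calling `run_and_triage(range(1, 10))` ships tasks 1, 2, 4, 5, 7, and 8 and rejects 3, 6, and 9. Remove the `verify` call and all nine ship, which is the failure mode the paragraph above describes.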

Karpathy is explicit about this: “You’re still responsible for your software just as before.” The agents don’t absorb your liability. They’re interns — remarkable interns, but interns. You’re still the engineer of record.

This matters especially for anyone building production applications. The WAT framework for structuring agent workflows is one approach to maintaining that accountability — separating workflows, agents, and tools into distinct layers so you can reason about what each component is doing. The point isn’t the framework; the point is having a mental model that lets you verify outputs at each layer.
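As a rough picture of what layering buys you, the sketch below separates the three layers in a few lines. It is not the WAT framework’s actual API. The tool names, the `stub_agent` rule, and the `Workflow` class are all hypothetical, and the agent is a fixed rule rather than a model call so that the example stays runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Tool layer: deterministic functions that can be unit-tested on their own.
def word_count(text: str) -> int:
    return len(text.split())


def shout(text: str) -> str:
    return text.upper()


TOOLS: Dict[str, Callable] = {"word_count": word_count, "shout": shout}


# Agent layer: decides which tool fits an instruction.
# A real agent would ask a model; this stub uses a fixed rule.
def stub_agent(instruction: str) -> str:
    return "word_count" if "count" in instruction else "shout"


# Workflow layer: a fixed sequence of steps, plus a trace for review.
@dataclass
class Workflow:
    steps: List[str]
    trace: List[dict] = field(default_factory=list)

    def run(self, text: str) -> list:
        results = []
        for instruction in self.steps:
            tool_name = stub_agent(instruction)
            result = TOOLS[tool_name](text)
            # Record what was decided and what came back at each step.
            self.trace.append(
                {"instruction": instruction, "tool": tool_name, "result": result}
            )
            results.append(result)
        return results
```

Because each layer is separate, a wrong result can be traced to one place: the tool computed badly, the agent picked the wrong tool, or the workflow asked for the wrong step. The `trace` list is what makes that question answerable after the fact.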

For teams thinking about how production apps get built from AI-generated source, Remy takes a different approach: you write a spec — annotated markdown where prose carries intent and annotations carry precision — and the full-stack app compiles from it. TypeScript backend, SQLite database, auth, deployment. The spec is the source of truth; the code is derived output. That’s not vibe coding. The spec is explicit. It’s a higher-level form of specification, not an abdication of specification.

Which One You’re Actually Doing

The practical test is simple. After the agent produces output, do you understand it well enough to defend it?

If you’re building a personal tool, a prototype, or something to validate an idea, vibe coding is the right choice. The floor has risen. Use it. You don’t need to understand every line of generated code to ship something useful for yourself or a small audience. Models with strong agentic coding capabilities, such as Qwen 3.6 Plus, have made this genuinely accessible in a way that wasn’t true eighteen months ago.

If you’re building production software — software with real users, real security requirements, real accountability — you need to be doing agentic engineering. That means:

You understand the architecture before you ask the agent to implement it.
You verify outputs against a quality standard you can articulate.
You’re not just checking “does it run” but “is this the right abstraction.”
You maintain the judgment layer even as you delegate the execution layer.

The Claude Code source code leak revealed how much of the agentic coding infrastructure is designed around exactly this kind of human-in-the-loop verification — features built on the assumption that a competent engineer is reviewing what the agent produces, not just accepting it.

The ceiling is genuinely higher now. Karpathy said December 2025 was the inflection point. He’s right. But a higher ceiling only helps you if you’re doing the work to reach it. Vibe coding gets you off the floor. Agentic engineering is what you do when you’re ready to climb.

When to Use Vibe Coding, When to Use Agentic Engineering

Use vibe coding if:

You’re not a professional developer and you want to build something that works
You’re prototyping an idea and speed matters more than quality
The stakes of failure are low — personal use, internal tools, throwaway scripts
You don’t need to maintain or scale the output
You’re learning what’s possible before committing to a real build

Use agentic engineering if:

You’re a professional developer who is accountable for what ships
You’re building software with real users, real security requirements, or real uptime expectations
You need to maintain the codebase after the initial build
You’re running agents at scale and need to verify output quality across all of them
You want to go faster without introducing the kind of technical debt that vibe coding accumulates

The distinction Karpathy drew is not a judgment about which approach is better in the abstract. It’s a description of two different tools for two different jobs. A hammer is not better than a screwdriver. But if you’re using a hammer on a screw because you don’t know the difference, that’s a problem.

For teams navigating this in practice, the comparison between Claude Opus 4.6 and GPT-5.4 for agentic workflows is worth reading — not because the model choice is the main variable, but because understanding what each model is actually good at is part of what separates agentic engineering from vibe coding. You have to know your tools.

Karpathy coined the term. He drew the line. The question is which side of it you’re actually on.