Claude Code /ultra review: 5 Things You Need to Know Before Running It ($5–$20 Per Run)
Ultra review spins up parallel reviewer agents but costs $5–$20 per run and requires a Claude account, not just an API key. What to know first.
Anthropic shipped /ultra review alongside Opus 4.7, and it’s the most consequential addition to Claude Code in months. Here are 5 things buried in the details that matter before you run it.
The short version: /ultra review spins up parallel reviewer agents against your codebase, each attacking from a different angle — logic, security, performance, edge cases — and only surfaces bugs that have been independently reproduced and verified. It requires Claude Code v2.1.86 or later, a Claude account login (an API key alone won’t work), and costs $5–$20 per run after your first three free ones on Pro or Max plans.
That last part deserves more attention than it’s getting.
The Mechanics: What /ultra review Actually Does
When you type /review in Claude Code, you get a structured local code review. Fast, cheap, useful. It runs in your session, costs you normal usage tokens, and catches the obvious stuff.
/ultra review is a different operation entirely. It uploads your branch to a cloud sandbox and dispatches a fleet of reviewer agents in parallel. Each agent has a specific mandate — one is looking at logic, one at security, one at performance, one at edge cases. They work simultaneously, not sequentially.
The key design decision: before a bug appears on your list, it has to be independently reproduced and verified by multiple agents. This is the part that makes /ultra review genuinely different from running /review twice. You’re not getting a pile of style nitpicks or speculative warnings. You’re getting confirmed bugs.
The tradeoff is time and money. A run takes 10–20 minutes. It runs in the background, so you can keep working, but you’re paying for that cloud compute. The $5–$20 range depends on codebase size. Large monorepos will hit the high end.
The Account Requirement Nobody Mentions in the Headline
Here’s the detail that will catch people: you cannot use /ultra review with an API key alone.
You need to be signed into Claude Code with an actual Claude account. If you’ve been running Claude Code purely through the API — which is a common setup, especially for teams managing costs through OpenRouter’s free model tier — you’ll need to change your authentication setup before /ultra review works at all.
This is a meaningful distinction. Anthropic is tying the feature to account-level billing, not API-level billing. The implication is that /ultra review is a subscription feature, not a pure pay-per-token feature. The three free runs on Pro and Max plans reinforce this — it’s positioned as a plan benefit, not just a metered API call.
Check your Claude Code version first: claude --version. You need 2.1.86 minimum. If you’re below that, the command won’t exist.
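A quick sanity check from the terminal, with a hedged update path (the npm command assumes you installed Claude Code via npm; adjust if you used a different installer):

```bash
claude --version   # needs to report 2.1.86 or later for /ultra review to exist
# if it's older and you installed via npm, updating the global package is usually enough:
npm install -g @anthropic-ai/claude-code
```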
When to Use /review vs. /ultra review
The practical workflow here is straightforward once you understand the cost structure.
Use /review constantly. It’s fast, it’s cheap, it runs locally, and it catches real issues. Every meaningful commit should go through /review. Think of it as the automated code review you run before asking a senior engineer to look at something.
Reserve /ultra review for the commits where a production bug would cost more than $20 to fix. That’s a short list, but it’s a real one:
- Major refactors touching core business logic
- Anything that touches payments, authentication, or database migrations
- Security-sensitive code paths
- Pre-launch releases where you can’t afford a hotfix cycle
The framing from the source material is exactly right: “the kind of commit where a production bug costs way more than the time and the tokens that ultra review may cost you.” That’s the decision rule. Apply it literally.
If you’re already using the Opus plan mode for planning and Sonnet for execution to manage costs, /ultra review slots in at the end of that workflow — plan with Opus, execute with Sonnet, verify with /ultra review before merge.
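As a rough sketch, that sequence might look like this inside a Claude Code session. The /model switches and model aliases are assumptions about your setup; the review commands are the ones described above:

```bash
# inside a Claude Code session, not your shell
/model opus        # plan with Opus (alias assumed; run /model alone to see your options)
# ...agree on the plan in plan mode...
/model sonnet      # execute with the cheaper Sonnet model
# ...implement and commit...
/review            # local review on every meaningful commit, normal token cost
/ultra review      # paid cloud verification, reserved for the merges that matter
```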
The Workflow Stack /ultra review Fits Into
/ultra review doesn’t exist in isolation. It’s the final gate in a workflow that starts much earlier.
The skills that precede it matter. The superpowers skill — 150,000+ GitHub stars — forces Claude to plan before coding, work in an isolated environment, and write tests before writing implementation code. It does two-stage review: once for spec conformance, once for code quality. Running superpowers throughout development means /ultra review is catching the bugs that survived a rigorous process, not cleaning up after a sloppy one.
The GSD (Get Shit Done) skill handles context engineering — spawning fresh sub-agents per task so your main session doesn’t degrade as the context window fills. Install it from your terminal, then run /gsd-help inside Claude Code to see available commands. GSD’s three phases — plan, execute, verify — map cleanly onto the broader workflow: superpowers handles the plan and execute quality, GSD handles the context hygiene, /ultra review handles the final verification.
The Context Mode skill is worth mentioning here because it affects how much context survives to the review stage. It compresses tool call output aggressively — a 56KB Playwright snapshot becomes 299 bytes, a 46KB access log becomes 155 bytes, and a full session that would have been 315KB becomes 5KB total. That compression matters when you’re trying to maintain coherent context across a long development session before running your final review.
The complete stack: superpowers for a disciplined development process → GSD for clean context across a multi-session project → /review for continuous feedback → /ultra review before merge on anything that matters.
The Non-Obvious Cost Calculus
$5–$20 per run sounds expensive until you price the alternative.
A production bug in a payment flow costs engineering time to diagnose, time to fix, time to deploy a hotfix, potential customer support load, and potential revenue impact. Even at the low end of that calculation — say, two hours of senior engineer time — you’re already past $20 in labor cost. For anything touching auth or database migrations, the cost of a missed bug is measured in hours or days, not minutes.
The real question isn’t “is $20 expensive?” It’s “what’s the expected value of catching one confirmed bug before it ships?” For most production codebases, the answer makes /ultra review cheap.
That said, the cost structure does create a discipline incentive. When /review is free and /ultra review costs real money, you naturally reserve the expensive tool for the expensive commits. That’s a good forcing function. It pushes you to use /review more aggressively throughout development so that by the time you’re ready for /ultra review, you’re not paying $20 to catch bugs that a free local review would have caught.
This is also where Claude Code’s token management practices compound with /ultra review economics. Keeping your sessions clean — compacting early, managing context deliberately — means you’re not burning tokens on context rot before you even get to the review stage. The /compact command is the most direct lever here: run it before context rot sets in, not after.
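In practice that means compacting at a natural checkpoint rather than waiting for the context warning. A minimal example, with the optional focus hint shown purely as an assumption about how you might steer the summary:

```bash
/compact                                      # summarize the session so far and reclaim context
/compact focus on the payment-flow refactor   # optional instructions, if you want to steer what survives
```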
The Skill That Builds the Skills You’re Reviewing
One more piece of the stack worth understanding: the skill creator skill, which is an official Anthropic skill you install with /plugin install skill creator. It drafts, tests, and packages new skills from plain English descriptions. You describe what you want, Claude drafts the skill, iterates on it, and packages it into a reusable .md file.
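A minimal sketch of that flow, with a hypothetical skill description purely for illustration:

```bash
/plugin install skill creator
# then describe the skill in plain English, for example:
# "Create a skill that runs our migration linter before any schema change
#  and fails the review if the linter flags anything."
```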
The relevance to /ultra review is indirect but real. The quality of what you’re reviewing depends on the quality of the skills that built it. If your codebase was assembled with well-structured, tested skills rather than ad-hoc prompting, /ultra review is doing final verification on solid foundations. If it was built with flaky, manually-cobbled skills, /ultra review is doing damage control.
The skill creator compresses the skill-building learning curve significantly. You don’t need to understand the internal structure of a skill file to produce a reliable one. That matters for teams where not everyone has deep Claude Code expertise — the skill creator democratizes skill quality, which raises the floor on what /ultra review is reviewing.
For teams building production applications from AI-generated code, the question of how that code gets structured is increasingly important. Tools like Remy take a different approach to this problem: you write a spec — annotated markdown — and a complete full-stack application gets compiled from it, TypeScript backend, SQLite database, auth, deployment, all of it. The spec is the source of truth; the generated code is derived output. That’s a different abstraction layer than Claude Code skills, but the underlying question is the same: how do you maintain quality and coherence as AI generates more of your codebase?

If you’re evaluating platforms for orchestrating the broader AI workflow around that code — model selection, integrations, agent coordination — MindStudio is worth understanding: it’s an enterprise AI platform with 200+ models, 1,000+ integrations, and a visual builder for orchestrating agents and workflows, which makes it a natural complement to the kind of multi-agent review infrastructure /ultra review represents.
The ClaudeMem Factor for Long Projects
Another skill worth understanding in the context of /ultra review: ClaudeMem, which handles cross-session memory via SQLite and vector search.
ClaudeMem uses a three-layer retrieval system — compact index first, timeline around relevant observations second, full details only when needed for handoff. The repo reports roughly 10x token savings on retrieval compared to dumping all past context at session start. For long projects spanning multiple sessions, this means Claude actually remembers what you were building and why when you come back to it.
Why does this matter for /ultra review? Because the quality of a code review depends partly on understanding intent. If Claude has persistent memory of the architectural decisions, the requirements, the edge cases you already considered — that context informs what the reviewer agents flag as a bug versus an intentional design choice. A reviewer that knows you deliberately chose eventual consistency for a specific reason won’t flag it as a missing transaction. ClaudeMem is what gives that context continuity across sessions.
Install it through the plugin marketplace. The repo has a specific warning: don’t run the npm install command — that installs the SDK library only and the hooks never register. Use the two plugin commands instead.
What to Actually Do Right Now
Check your Claude Code version. If you’re below 2.1.86, update before anything else.
Confirm you’re signed in with a Claude account, not just an API key. If you’re on an API-key-only setup, decide whether you want to switch authentication or keep /ultra review off the table for now.
Run /review on your current working branch. Get a baseline sense of what local review catches. Then, the next time you’re about to merge something that touches a critical path — payments, auth, a schema migration — run /ultra review and compare what it surfaces versus what /review caught.
If you’re on Pro or Max, you have three free runs. Use them on real merges, not toy projects. The point is to calibrate whether the confirmed-bug quality justifies the cost for your specific codebase and workflow.
If you’re building the kind of multi-agent workflow infrastructure where agentic workflow patterns matter — chained skills, scheduled tasks, parallel agents — /ultra review is the quality gate that makes autonomous execution trustworthy. You can’t have agents shipping code autonomously if you don’t have a reliable way to verify what they produced.
The feature is real. The cost is real. The decision rule is simple: if a production bug would cost more than $20, run it.