How to Save Tokens in Claude Code Using the Opus Plan Mode

The Hidden Cost of Using Opus for Everything

If you’ve been using Claude Code for serious development work, you’ve probably hit the wall: a session that chews through your token budget before you’ve finished debugging, or a monthly bill that climbs faster than expected because Opus is doing all the heavy lifting — including the routine stuff it doesn’t need to.

Claude Code’s Opus Plan Mode is a practical fix for this. By typing /model opusplan, you split the cognitive labor: Claude Opus handles the thinking and planning while the session is in plan mode, and Claude Sonnet executes the actual code changes. The result is fewer Opus tokens spent on execution, longer sessions before hitting limits, and lower costs, without sacrificing the reasoning quality that makes Opus worth using in the first place.

This guide explains exactly how it works, when to use it, and how to get the most out of it.

What Opus Plan Mode Actually Does

When you run Claude Code with Opus selected as the model, every step of the process (understanding your request, reasoning through a solution, writing code, reviewing diffs, responding to follow-ups) consumes Opus tokens. Opus is exceptional at deep reasoning, but it’s also the most expensive Claude model per token, and a lot of what it does during a session doesn’t require that level of capability.

Opus Plan Mode changes this by introducing a two-model workflow:

Planning phase (Opus): The model thinks through the problem, breaks it into steps, and builds a structured plan. This is where Opus earns its place — complex reasoning, architectural decisions, edge case analysis.
Execution phase (Sonnet): Once the plan exists, Sonnet takes over to implement it. Writing boilerplate, applying straightforward changes, running commands, formatting output — Sonnet handles this fast and cheaply.

The command to activate this is:

/model opusplan

Once active, Claude Code uses Opus while the session is in plan mode and switches to Sonnet once you leave plan mode to implement, all within the same session. You don’t have to manually switch models or manage two separate contexts.
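One way to picture the routing is as a function of the session’s current mode. The sketch below is purely illustrative and is not how Claude Code is implemented; the Step type, the pick_model and route_session helpers, and the "opus" and "sonnet" labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical labels; the real model versions are chosen by Claude Code.
PLANNING_MODEL = "opus"
EXECUTION_MODEL = "sonnet"


@dataclass
class Step:
    description: str
    planning: bool  # True while the session is planning rather than editing


def pick_model(step: Step) -> str:
    """Return the model that would serve this step under the hybrid setting."""
    return PLANNING_MODEL if step.planning else EXECUTION_MODEL


def route_session(steps: list[Step]) -> list[tuple[str, str]]:
    """Pair each step's description with the model chosen for it."""
    return [(step.description, pick_model(step)) for step in steps]


if __name__ == "__main__":
    session = [
        Step("analyze the auth flow and draft a plan", planning=True),
        Step("write middleware from the approved plan", planning=False),
        Step("run the test suite", planning=False),
    ]
    for description, model in route_session(session):
        print(f"{model:>6}: {description}")
```

The point of the sketch is that the choice depends only on which phase the session is in, not on the content of the request.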

Why This Matters: Token Economics in Claude Code

Understanding why this optimization works requires a quick look at how Claude Code consumes tokens and how the two models are priced.

Input vs. Output Tokens

Every message you send, every file Claude reads, and every response it generates costs tokens. In an agentic coding session, Claude Code is reading file contents, tracking context across multiple turns, and generating both reasoning and code. Token consumption adds up fast — especially with large codebases.

The Opus vs. Sonnet Pricing Gap

Claude Opus is significantly more expensive than Claude Sonnet per million tokens. The exact pricing changes, but the ratio is substantial — Opus can cost several times more per token than Sonnet. For a heavy coding session, this difference translates directly into how far your token budget goes.

Session Limits

Claude Code sessions have limits: each conversation has a finite context window, and overall usage is capped by your plan or your API budget. When you hit a limit, you either start a new session (losing context) or pay for extended usage. Routing execution to Sonnet does not change how many tokens a task needs, but each of those tokens costs less, so the same budget stretches across more work and you can keep one context going longer.

The math is simple: if 60% of your session tokens are spent on execution tasks that Sonnet can handle just as well, moving those tasks to Sonnet divides the cost of that 60% by the Opus-to-Sonnet price ratio, while the remaining 40% is still billed at the Opus rate.
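To make the arithmetic concrete, here is a minimal sketch that expresses a hybrid session’s cost as a fraction of an all-Opus session. It assumes a single price ratio applied to every token (real pricing distinguishes input from output tokens), and the ratios in the demo are placeholders, not published prices. The relative_cost function is a hypothetical helper.

```python
def relative_cost(execution_share: float, price_ratio: float) -> float:
    """Cost of a hybrid session relative to an all-Opus session (1.0 = no change).

    execution_share: fraction of session tokens moved to the cheaper model.
    price_ratio: how many times more the expensive model costs per token.
    """
    if not 0.0 <= execution_share <= 1.0:
        raise ValueError("execution_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    planning_share = 1.0 - execution_share
    # Planning tokens keep the full price; execution tokens are discounted.
    return planning_share + execution_share / price_ratio


if __name__ == "__main__":
    # The 60% share is the example from the text; the ratios are placeholders.
    for ratio in (2.0, 3.0, 5.0):
        cost = relative_cost(execution_share=0.6, price_ratio=ratio)
        print(f"ratio {ratio:.0f}x: {cost:.0%} of all-Opus cost, saving {1 - cost:.0%}")
```

With 60% of tokens moved and a 5x ratio, the hybrid session costs 52% of the all-Opus one; at a 2x ratio it costs 70%. The saving shrinks quickly as the price gap narrows, so it is worth rerunning the numbers with current prices.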

How to Enable and Use Opus Plan Mode

Prerequisites

Before you start, make sure you have:

Claude Code installed (npm install -g @anthropic-ai/claude-code or the current install method from Anthropic’s documentation)
An Anthropic account with access to both Claude Opus and Claude Sonnet, either through an API key or a Claude subscription that includes Claude Code
A Claude Code session open in your terminal

Step 1: Start Your Claude Code Session

Open your project directory in the terminal and start Claude Code as you normally would:

claude

Step 2: Activate Opus Plan Mode

Inside the Claude Code session, type the slash command:

/model opusplan

You’ll see a confirmation that the model has been switched to opusplan. From this point forward, Claude Code will use the two-model routing internally.

Step 3: Describe Your Task as a Planning Problem

To get the most out of Opus Plan Mode, put the session in plan mode (Shift+Tab cycles through the modes) and frame your initial request in a way that encourages upfront planning. Instead of jumping straight to implementation, describe the goal and ask Claude to plan it first.

Less effective:

Add authentication to my Express app

More effective:

I need to add JWT authentication to my Express app. 
The app currently has user routes but no auth middleware. 
Plan out what needs to change before we start implementing.

The second prompt signals that you want a plan. Opus will think through the architecture, identify which files need changes, flag potential issues, and outline the steps. Once that plan is complete, Sonnet handles the implementation.

Step 4: Review the Plan Before Execution

One of the underappreciated benefits of this workflow is that you get a clear plan to review before any code changes happen. This is a good moment to:

Catch misunderstandings before they compound
Adjust scope (“let’s skip the refresh token logic for now”)
Ask Opus follow-up questions about specific decisions
Confirm the plan matches your actual intentions

Don’t rush past this step. The plan review is where you can prevent Sonnet from writing code based on incorrect assumptions.

Step 5: Let Sonnet Execute

Once you approve the plan, tell Claude to proceed:

That looks right. Go ahead and implement it.

Sonnet takes over from here: applying changes, writing functions, creating files, running any necessary commands. If something unexpected comes up during execution that requires re-planning, switch the session back into plan mode so the revised plan comes from Opus.

Step 6: Iterate Efficiently

For follow-up tasks in the same session, keep using the same pattern. Complex decisions go to Opus; routine implementation goes to Sonnet. The session stays open longer, your context is preserved, and your token spend stays lower.

When to Use Opus Plan Mode (and When Not To)

Opus Plan Mode isn’t the right choice for every situation. Here’s a practical breakdown.

Good fits for Opus Plan Mode

Refactoring large codebases — Multiple files, complex dependencies, risk of breaking changes. Opus reasons through the full impact; Sonnet makes the changes.
Building new features from scratch — Architectural decisions benefit from Opus’s reasoning. Implementation is mostly mechanical.
Debugging complex issues — Opus is better at hypothesizing causes and ruling them out systematically. Sonnet writes the fix.
Long sessions with many tasks — Any time you expect to be in a session for an extended period, Opus Plan Mode extends how long you can work before hitting token limits.
Unfamiliar codebases — When Claude needs to read and understand a lot of context before acting, Opus’s planning phase is worth the token cost.

Cases where it adds less value

Quick one-off tasks — If you’re asking Claude to rename a variable or add a comment, the overhead of plan-then-execute isn’t worth it.
Highly iterative, exploratory work — When you’re not sure what you want and expect to change direction frequently, a rigid plan can slow you down.
Strictly Sonnet-level tasks — If the entire task is something Sonnet can handle without deep reasoning, just use Sonnet directly.

Tips to Maximize Token Savings

Beyond just enabling Opus Plan Mode, a few habits will make a meaningful difference in your token usage.

Be specific in your planning prompts

Vague prompts produce verbose plans. The more precisely you describe the problem, the tighter Opus’s plan will be — and the fewer tokens it will use.

Use /clear between unrelated tasks

If you’ve finished one feature and are moving to something unrelated, clearing the context with /clear prevents Claude from carrying forward irrelevant history that adds to your input token count.
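A rough way to see why this matters: in a simplified model where every turn resends the whole retained history, input tokens grow roughly quadratically with the number of turns. The sketch below assumes exactly that model and ignores prompt caching and context compaction, so treat the numbers as directional only; total_input_tokens and the turn sizes are hypothetical.

```python
from typing import Iterable


def total_input_tokens(turn_sizes: list[int], clear_before: Iterable[int] = ()) -> int:
    """Sum the input tokens sent over a session.

    Simplified model: every turn resends the whole retained history plus the
    new turn. Indexes in clear_before mark turns that start with empty history.
    """
    clear_points = set(clear_before)
    history = 0
    total = 0
    for index, size in enumerate(turn_sizes):
        if index in clear_points:
            history = 0  # models running /clear before this turn
        history += size
        total += history
    return total


if __name__ == "__main__":
    turns = [2_000] * 10  # hypothetical: ten turns of 2,000 tokens each
    print(total_input_tokens(turns))                    # one long conversation
    print(total_input_tokens(turns, clear_before=[5]))  # cleared halfway through
```

Under these assumptions the uncleared session sends 110,000 input tokens and the session cleared halfway sends 60,000, even though the new content per turn is identical.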

Batch related changes into one prompt

Instead of asking for one change at a time, describe a set of related changes in one planning prompt. Opus will handle them as a coherent plan, and Sonnet will execute them in sequence. This is more efficient than repeated back-and-forth.

Limit file context when possible

If you’re working on a specific module, tell Claude which files are relevant. Asking it to read the entire codebase for a scoped change wastes tokens on context that won’t be used.
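To get a feel for the difference before you start, you can estimate how much text a scoped read involves compared with the whole repository. The sketch below uses a crude four-characters-per-token rule of thumb rather than a real tokenizer, and the glob patterns and the src/auth path are hypothetical; estimate_tokens and estimate_context are illustrative helpers, not part of Claude Code.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_context(root: Path, patterns: list[str]) -> dict[str, int]:
    """Estimate tokens per file for files under root matching the glob patterns."""
    estimates: dict[str, int] = {}
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="ignore")
                estimates[str(path.relative_to(root))] = estimate_tokens(text)
    return estimates


if __name__ == "__main__":
    # Hypothetical layout: compare one module against every Python file in the repo.
    repo = Path(".")
    scoped = estimate_context(repo, ["src/auth/**/*.py"])
    everything = estimate_context(repo, ["**/*.py"])
    print(f"scoped: ~{sum(scoped.values())} tokens in {len(scoped)} files")
    print(f"whole repo: ~{sum(everything.values())} tokens in {len(everything)} files")
```

If the scoped total is a small fraction of the whole-repo total, naming the relevant files in your prompt is likely to pay off.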

Check token usage mid-session

Claude Code provides visibility into your current session’s token consumption; the /cost command, for example, reports usage for API-billed sessions. Periodically check where you are, especially before starting a large task. If you’re close to the limit, consider whether to start a new session rather than hitting the wall mid-task.

Common Mistakes and How to Avoid Them

Skipping the plan review

The biggest mistake is treating the plan as a formality and immediately saying “go ahead.” Plans can have errors. Opus might misunderstand the structure of your codebase, plan changes to the wrong files, or over-engineer a simple problem. Reading the plan takes 30 seconds. Undoing bad implementation changes takes much longer.

Using Opus Plan Mode for trivial tasks

Every planning phase costs Opus tokens, even when the resulting plan is short. If your task doesn’t benefit from planning — like a simple string replacement — you’re spending Opus tokens to produce a one-line plan that Sonnet executes in seconds. Use the right model for the right task.

Not providing enough context upfront

If your initial prompt is missing important constraints or context, Opus will plan based on assumptions. You’ll catch the problem at the review stage, which is fine — but you’ve spent tokens on a plan you’re about to revise. Give Claude the constraints it needs the first time.

Ignoring the model switch confirmation

When you type /model opusplan, confirm that the mode actually activated before proceeding. If there’s an error or the command isn’t recognized, you’ll be running the session on whatever model was previously selected.

Extending This Pattern Beyond Claude Code

The plan-then-execute model that Opus Plan Mode implements is a broadly useful pattern for AI-assisted work. The core idea — use your most capable model for reasoning and decision-making, use a faster/cheaper model for execution — applies anywhere you’re coordinating AI tasks.
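Outside Claude Code, the same pattern can be sketched in a few lines. The example below treats a model as any function from prompt to text, so it runs with stub functions and no API keys; plan_then_execute, the Model alias, and the prompt wording are all hypothetical, and in practice you would plug in real client calls for the planner and the executor.

```python
from typing import Callable

# A "model" here is any callable from prompt to text; real API clients would go here.
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, executor: Model) -> list[str]:
    """Ask the planner for one step per line, then run each step on the executor."""
    plan = planner(f"Break this task into steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(f"Task: {task}\nCarry out this step: {step}") for step in steps]


if __name__ == "__main__":
    # Stub models so the sketch runs without network access.
    def stub_planner(prompt: str) -> str:
        return "1. Add auth middleware\n2. Protect user routes"

    def stub_executor(prompt: str) -> str:
        return "done: " + prompt.splitlines()[-1]

    for result in plan_then_execute("Add JWT auth", stub_planner, stub_executor):
        print(result)
```

Keeping the planner and executor as separate arguments is the design choice that matters: the expensive model is called once per task, and the cheaper one is called once per step.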

Where MindStudio Fits

If you’re building AI workflows or agents outside of Claude Code, MindStudio lets you implement this same multi-model pattern visually, without code.

MindStudio gives you access to 200+ models — including the full Claude family, GPT-4o, Gemini, and others — in a single workflow builder. You can chain models together: use Claude Opus for a complex reasoning step, then route the output to Sonnet or another model for implementation or generation. The model selection logic is part of the workflow, not hardcoded into application logic.

This is particularly useful when you’re building agents that need to reason carefully about one thing (say, analyzing a support ticket’s intent and complexity) before taking action (routing it, drafting a reply, triggering an integration). You get the quality of Opus where it matters, without paying for it everywhere.

You can try MindStudio free at mindstudio.ai. No API key setup required — the models are available out of the box.

Frequently Asked Questions

What is Opus Plan Mode in Claude Code?

Opus Plan Mode is a Claude Code configuration activated with the /model opusplan command. It uses Claude Opus for planning and reasoning while the session is in plan mode and Claude Sonnet for execution, all within the same session. The goal is to reduce Opus usage during the execution phase while preserving Opus’s reasoning quality where it matters most.

How much does Opus Plan Mode actually save?

The savings depend on how token-heavy your execution steps are. In sessions where a significant portion of work is mechanical implementation (writing functions, applying changes across files, running commands), you can reduce costs substantially, since those steps run on Sonnet instead of Opus. Sessions with complex, iterative reasoning throughout will see smaller savings. Execution-heavy sessions should also last longer before hitting usage limits, though the exact gain depends on your plan and workload.

Does using Sonnet for execution reduce code quality?

For most execution tasks, no. Sonnet is highly capable at writing clean, correct code when given a clear plan. The quality difference between Sonnet and Opus shows up most in complex reasoning tasks — architectural analysis, nuanced debugging, understanding ambiguous requirements. When those tasks are handled by Opus in the planning phase, Sonnet has what it needs to execute well.

Can I switch back to full Opus mode mid-session?

Yes. You can use /model opus (or the full name of whichever Opus version you want) to switch back to a single-model Opus session at any point. You might do this if you’re hitting a particularly complex issue that requires sustained Opus reasoning throughout.

Is Opus Plan Mode the same as using extended thinking?

No, they’re separate features. Extended thinking gives a single model more space to reason before responding. Opus Plan Mode routes tasks between two different models. You can use extended thinking within the Opus planning step for especially complex problems, but they’re independent controls.

Does this work with all versions of Claude Opus and Sonnet?

The specific model versions available in Claude Code depend on your account’s access and Anthropic’s current offerings. The opusplan alias maps to the default Opus and Sonnet versions configured by Claude Code at the time of your session. Check Anthropic’s Claude Code documentation for the current version mappings and any updates to the command syntax.

Key Takeaways

/model opusplan activates a two-model workflow in Claude Code: Opus for planning, Sonnet for execution.
This reduces Opus usage on execution-heavy tasks, which often make up most of the steps in a coding session.
Sessions last longer before hitting token limits, and overall API costs are lower without sacrificing reasoning quality.
The plan review step — between Opus’s planning output and Sonnet’s execution — is where you catch misunderstandings before they become problems.
The same plan-then-execute pattern applies broadly to AI workflows, including multi-model pipelines built with tools like MindStudio.

If you’re spending serious time in Claude Code, Opus Plan Mode is one of the highest-leverage adjustments you can make. It takes ten seconds to enable and has a real impact on how far your session budget goes.