How to Save Tokens in Claude Code Using Opus Plan Mode

Stop Burning Through Your Claude Code Token Budget

If you’ve been using Claude Code for serious development work, you’ve probably hit the wall. The session limit warning appears, your workflow stalls, and you’re left choosing between waiting for your quota to reset or losing momentum on a complex task.

There’s a smarter way to handle this. By using /model opusplan in Claude Code, you can get Opus-quality reasoning for the parts of your workflow that actually need it (planning and architecture) while switching to Sonnet for the heavier execution work. The result is a meaningfully longer session before you hit the limit, without sacrificing the quality that makes Opus worth using in the first place.

This guide explains exactly how Opus plan mode works, when to use it, and how to build it into your Claude Code workflows.

What “Opus Plan Mode” Actually Means

Claude Code lets you switch models mid-session using the /model command. Opus plan mode takes advantage of this to split your work across two models based on what each does best.

Here’s the core idea:

Claude Opus handles planning, reasoning, and architecture decisions
Claude Sonnet handles execution — writing code, making edits, running commands

This matters because most of the token consumption in a Claude Code session happens during execution. Generating files, writing functions, making iterative edits — these steps are where your token budget disappears. Sonnet is significantly cheaper per token than Opus, so routing execution through Sonnet while preserving Opus for high-stakes reasoning cuts your effective cost per session substantially.

The quality tradeoff is smaller than you’d expect. Opus is exceptional at reasoning through ambiguous problems, understanding nuanced requirements, and designing architectures. Sonnet is more than capable of implementing a well-specified plan. When you give Sonnet clear instructions shaped by Opus’s thinking, the output quality holds up.

How to Enable Opus Plan Mode in Claude Code

The Basic Command

Inside a Claude Code session, use the slash command to switch models:

/model claude-opus-4-5

Or to switch back to Sonnet for execution:

/model claude-sonnet-4-5

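The “opus plan mode” workflow described in this guide is a disciplined pattern of using these commands at the right moments. You invoke Opus when you need to think, and Sonnet when you need to build. Claude Code’s opusplan model alias automates the same split by using Opus while in plan mode and Sonnet for execution; check the documentation for your version before relying on it.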

Setting Up the Workflow Pattern

Here’s the sequence that works in practice:

Start your session with Opus — Use Opus to understand the problem, define requirements, and generate a detailed plan
Review and confirm the plan — Make sure the output captures everything you need before switching models
Switch to Sonnet for implementation — Use /model claude-sonnet-4-5 before you start the actual build work
Return to Opus for complex decisions — If you hit an unexpected problem or architectural question mid-build, switch back to Opus, resolve it, then return to Sonnet

This isn’t complicated. It’s just being intentional about which model is active at each stage.
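The sequence above can be sketched as a small lookup from workflow phase to model. This is only an illustration, assuming the two model identifiers quoted in this guide; the Phase names and the switch_commands helper are hypothetical and are not part of Claude Code.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"            # understand the problem, draft the plan
    REVIEW = "review"        # confirm the plan before building
    IMPLEMENT = "implement"  # write code, make edits, run commands
    ESCALATE = "escalate"    # unexpected problem or architecture question


# Model identifiers as quoted in this guide; adjust to what your install accepts.
OPUS = "claude-opus-4-5"
SONNET = "claude-sonnet-4-5"

PHASE_MODEL = {
    Phase.PLAN: OPUS,
    Phase.REVIEW: OPUS,
    Phase.IMPLEMENT: SONNET,
    Phase.ESCALATE: OPUS,
}


def switch_commands(phases):
    """Return the /model commands to type, skipping redundant switches."""
    commands = []
    active = None  # the starting model is treated as unknown
    for phase in phases:
        target = PHASE_MODEL[phase]
        if target != active:
            commands.append(f"/model {target}")
            active = target
    return commands


session = [Phase.PLAN, Phase.REVIEW, Phase.IMPLEMENT, Phase.ESCALATE, Phase.IMPLEMENT]
for command in switch_commands(session):
    print(command)
```

Planning and review share a model, so this five-phase session needs only four switches.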

What to Ask Opus Before Switching

The quality of your execution depends on the quality of your plan. When you’re in the Opus phase, push for specificity:

Ask Opus to break the task into numbered implementation steps
Request that it identify potential failure points or edge cases upfront
Have it specify file structure, function signatures, and data shapes before any code is written
Ask it to flag any ambiguities that need to be resolved before implementation starts

The more complete this plan, the less you’ll need to jump back to Opus during execution — which means more token savings.
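One way to make those four requests a habit is to keep them in a reusable prompt template. The sketch below is a hypothetical helper (build_planning_prompt is not a Claude Code feature); it only assembles text that you would paste into the session during the Opus phase.

```python
# The four requests from the list above, phrased as instructions.
PLAN_REQUESTS = [
    "Break the task into numbered implementation steps.",
    "Identify potential failure points and edge cases up front.",
    "Specify file structure, function signatures, and data shapes before any code is written.",
    "Flag any ambiguities that must be resolved before implementation starts.",
]


def build_planning_prompt(task):
    """Combine a task description with the standing planning requests."""
    lines = [f"Task: {task.strip()}", "", "Before writing any code:"]
    lines += [f"{i}. {request}" for i, request in enumerate(PLAN_REQUESTS, start=1)]
    return "\n".join(lines)


print(build_planning_prompt("Add JWT-based auth to the /api/* routes"))
```

Keeping the requests in one place means every planning prompt asks for the same level of detail, whatever the task.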

Why This Extends Your Session Limit

Claude Code enforces usage limits per session. When you’re working on a large codebase or a complex feature, it’s easy to consume tokens fast. The cost difference between Opus and Sonnet is significant: at the API level, Opus tokens have cost up to roughly five times as much as Sonnet tokens, though the exact ratio depends on which model versions you compare. Anthropic’s model pricing page has the current rates if you want to calculate exact savings for your usage patterns.

In practice, if your session would normally hit the limit after generating 10 files with Opus, and usage is weighted at that five-to-one ratio, routing the same generation through Sonnet might let you handle 40–50 files before hitting the same wall. Fifty is the ceiling, and the Opus planning phase takes a share of it. The planning overhead is small compared to the implementation volume.
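The file-count estimate can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: the budget is measured in made-up units, every file costs the same, and the five-to-one ratio is the figure quoted above rather than a current price. The function name and parameters are illustrative.

```python
import math


def files_before_limit(budget_units, opus_units_per_file, cost_ratio, planning_units=0):
    """Files that fit in a budget measured in Opus-priced units.

    cost_ratio is how many times as much an Opus token costs as a token
    from the model doing the generation (1 means Opus itself).
    planning_units is the part of the budget spent on Opus planning.
    """
    remaining = budget_units - planning_units
    if remaining <= 0:
        return 0
    return math.floor(remaining * cost_ratio / opus_units_per_file)


# Hypothetical numbers: a budget that covers exactly 10 files on Opus.
BUDGET = 1000
PER_FILE_ON_OPUS = 100

print(files_before_limit(BUDGET, PER_FILE_ON_OPUS, cost_ratio=1))  # everything on Opus
print(files_before_limit(BUDGET, PER_FILE_ON_OPUS, cost_ratio=5))  # everything on Sonnet
print(files_before_limit(BUDGET, PER_FILE_ON_OPUS, cost_ratio=5, planning_units=100))
```

With these inputs the three calls give 10, 50, and 45 files: spending a tenth of the budget on Opus planning still leaves room for 45 Sonnet-generated files.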

This also makes your sessions more predictable. Instead of burning Opus tokens on routine code generation, you reserve that budget for the moments where Opus genuinely earns it.

When Opus Is Worth the Token Cost

Not everything needs Opus reasoning. Here’s a practical breakdown of when each model makes sense:

Use Opus For:

Initial problem scoping — Translating vague requirements into a concrete technical plan
Architecture decisions — Choosing between approaches with real tradeoffs (database design, API structure, state management patterns)
Debugging complex issues — When the error isn’t obvious and you need genuine reasoning to trace the cause
Code review and critique — Evaluating whether an approach is sound before committing to it
Refactoring strategy — Planning how to restructure existing code without breaking things

Use Sonnet For:

Writing boilerplate and scaffolding — Standard files, CRUD operations, test stubs
Implementing a defined plan — Executing steps that are already well-specified
Making targeted edits — Changing specific functions, updating configs, fixing defined bugs
Generating repetitive patterns — Similar components, consistent formatting, documentation strings
Running and interpreting terminal commands — File operations, dependency installs, test runs

The heuristic: if the task requires judgment and creativity, use Opus. If it requires accuracy and execution, Sonnet is fine.

Common Mistakes When Using Opus Plan Mode

Switching to Sonnet Before the Plan Is Complete

The most common error is switching models too early — before Opus has actually resolved all the ambiguities. If Sonnet encounters a decision point that wasn’t covered in the plan, it’ll make a reasonable guess. That guess might be fine, or it might send the implementation in the wrong direction. Take the extra time upfront to make the plan exhaustive.

Using Opus for Every Small Decision

Some people switch back to Opus every time they hit a minor question, which defeats the purpose. If Sonnet proposes two equally reasonable approaches to a small formatting question, just pick one. Save the Opus switch for real architectural decisions.

Forgetting to Check Which Model Is Active

Claude Code shows the active model in the interface, but it’s easy to lose track during a long session. Before a major generation task, verify you’re on Sonnet. Before asking for complex analysis or planning, verify you’re on Opus. A quick /model check takes two seconds and saves you from wasting tokens on the wrong model.

Writing Plans That Are Too High-Level

“Build the authentication system” is not a plan Sonnet can execute well. “Create a JWT-based auth middleware that validates tokens on /api/* routes, stores user ID in request context, and returns 401 with a JSON error body on failure” is a plan Sonnet can execute accurately. Opus should be generating the second kind of plan, not the first.

Building Token-Efficient Workflows Beyond Model Switching

Model switching is the highest-leverage move, but there are other habits that reduce unnecessary token consumption in Claude Code sessions.

Be Specific in Your Prompts

Vague prompts produce exploratory responses that consume tokens without moving the work forward. The more precise your instruction, the more directly the model can respond. This applies to both Opus and Sonnet, but it’s especially important when you’re trying to preserve session budget.

Avoid Asking for Explanations You Don’t Need

Claude Code will often explain its reasoning or describe what it’s doing as it works. This is useful for complex tasks, but unnecessary during routine execution. If Sonnet is writing a standard React component, you don’t need a paragraph explaining what a useState hook does. Use prompts like “write the code without explanation” when you’re in pure execution mode.

Use Compact Context Files

When working on large codebases, Claude Code loads context from the files you reference. Keeping your context files focused — rather than including entire large files when only a section is relevant — reduces the tokens consumed per turn. Reference specific functions or sections rather than whole files when possible.
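As an illustration of referencing a section instead of a whole file, the sketch below pulls a single function out of a Python source string using the standard library ast module. The helper names are hypothetical, and the four-characters-per-token figure is a crude assumption for comparison only, not how Claude counts tokens.

```python
import ast


def extract_function(source, name):
    """Return the source text of the first function found with this name.

    Decorators are not included in the returned segment.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(f"no function named {name!r}")


def rough_token_estimate(text):
    # Assumption: about four characters per token. A crude heuristic only.
    return len(text) // 4


SOURCE = '''import os

def helper():
    return os.getcwd()

def target(x):
    return x + 1
'''

snippet = extract_function(SOURCE, "target")
print(snippet)
print(rough_token_estimate(SOURCE), "vs", rough_token_estimate(snippet))
```

On a real module with dozens of functions the gap between the whole file and the one function you care about is much larger than in this toy example.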

Plan Your Session Before Starting

Before you open Claude Code, spend two minutes writing out what you’re trying to accomplish. This is especially useful for complex multi-file work. When you start the session with a clear brief, Opus’s planning phase goes faster and produces a better plan — which means Sonnet’s execution phase is more efficient.

How MindStudio Fits Into AI-Heavy Development Workflows

Claude Code with Opus plan mode is a sharp tool for developers building features and debugging code. But once you’ve built something with Claude Code, you often need infrastructure around it — ways to trigger workflows, connect external services, handle data routing, or expose your work to other systems.

That’s where MindStudio becomes relevant. MindStudio is a no-code platform for building AI agents and automated workflows, with access to 200+ models — including Claude — and 1,000+ pre-built integrations with tools like Slack, HubSpot, Airtable, and Google Workspace.

For developers specifically, MindStudio’s Agent Skills Plugin is worth looking at. It’s an npm SDK (@mindstudio-ai/agent) that lets Claude Code and other AI agents call MindStudio’s capabilities as simple method calls — things like agent.sendEmail(), agent.searchGoogle(), or agent.runWorkflow(). The plugin handles rate limiting, retries, and auth, so your agent focuses on reasoning instead of infrastructure plumbing.

If you’re building agents with Claude Code that need to interact with external services, this removes a significant amount of boilerplate. You can try MindStudio free at mindstudio.ai.

Frequently Asked Questions

What is Opus plan mode in Claude Code?

Opus plan mode, as described in this guide, is a workflow pattern where you use Claude Opus for planning and architectural reasoning, then switch to Claude Sonnet for implementation and code generation. You apply it by using the /model command in Claude Code to switch models at the appropriate stages of your work.

How much does Opus plan mode actually save in tokens?

The savings depend on your workflow, but the underlying math is straightforward. Assume Opus costs roughly five times as much per token as Sonnet. Routing execution through Sonnet then cuts the cost of the execution phase by about 80%. If 80% of your session tokens are consumed during execution (which is typical for large builds), the whole session costs about 36% of what it would on Opus alone, a saving of roughly 64%. Under those assumptions the same limit stretches to roughly 2.8 times as much work, and further when planning is a smaller share of the total.
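The arithmetic behind this answer can be checked with a short sketch. It assumes the five-to-one cost ratio quoted above and treats the share of tokens spent on execution as an input; session_savings is an illustrative name, and real sessions will not split this cleanly.

```python
def session_savings(execution_share, cost_ratio):
    """Fraction of an all-Opus session's cost saved by running execution on Sonnet.

    execution_share: fraction of session tokens spent on execution (0.0 to 1.0).
    cost_ratio: how many times as much an Opus token costs as a Sonnet token.
    """
    planning_share = 1.0 - execution_share
    # Planning stays on Opus at full price; execution is discounted by the ratio.
    relative_cost = planning_share + execution_share / cost_ratio
    return 1.0 - relative_cost


for share in (0.5, 0.8, 0.95):
    print(f"{share:.0%} execution -> {session_savings(share, 5):.0%} saved")
```

The saving over the whole session is always smaller than the 80% saved on execution alone, because the planning tokens are still billed at the Opus rate.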

Does using Sonnet for execution hurt code quality?

For well-specified tasks, the quality difference is minimal. Sonnet is a capable model for code generation, especially when it’s executing a clear plan rather than making open-ended architectural decisions. The places where Opus noticeably outperforms Sonnet are in reasoning through ambiguous problems and making nuanced tradeoffs — not in writing functions that follow a defined specification.

Can I set a default model for Claude Code sessions?

Yes. You can configure a default model in Claude Code’s settings so new sessions start on your preferred model without requiring a manual switch. This is useful if you want Sonnet as your default and only switch to Opus when needed. Check the Claude Code documentation for the current configuration options, as the specifics may vary by version.

When should I switch back to Opus mid-session?

Switch back to Opus when you hit a problem that requires genuine reasoning rather than execution — unexpected bugs with unclear root causes, architectural decisions that weren’t anticipated in the plan, or situations where the requirements have changed significantly enough to need replanning. Don’t switch back for minor implementation decisions or questions where either answer would be reasonable.

Does this work for non-coding tasks in Claude Code?

Yes, the same logic applies to any complex task in Claude Code, not just writing code. If you’re using Claude Code for research tasks, document generation, or analysis work, the same pattern holds: use Opus to reason through the approach and structure, then use Sonnet to generate the bulk of the output. The task type doesn’t change the underlying economics.

Key Takeaways

Opus plan mode routes planning to Opus and execution to Sonnet, using the /model command to switch mid-session
Most token consumption happens during execution, so using Sonnet for that phase extends your session limit significantly
Plan quality determines execution quality — invest the time in Opus’s planning phase before switching to Sonnet
The quality tradeoff is small when Sonnet is executing a well-specified plan rather than making open-ended decisions
Other habits help too: specific prompts, avoiding unnecessary explanations, and focused context files all reduce token usage
MindStudio’s Agent Skills Plugin can handle the infrastructure layer for Claude Code agents that need to interact with external services

If you’re building automated workflows around the things Claude Code produces, MindStudio is worth exploring — especially if you want to connect your AI work to external tools without writing the plumbing yourself.