How to Use Claude Code Ultra Code Mode for Deep Research and Complex Tasks

What Makes Claude Code’s Multi-Agent Mode Different

If you’ve used Claude Code for everyday tasks — fixing a bug, writing a function, explaining a codebase — you already know it’s fast and capable. But there’s a class of problems where a single agent working sequentially just isn’t enough: large-scale refactors, deep research across dozens of sources, complex architectural analysis, or generating multiple competing implementations and picking the best one.

That’s where Claude Code’s multi-agent approach comes in. Often called “Ultra Code” mode in developer circles, this pattern uses Claude as an orchestrator that spawns parallel sub-agents, applies adversarial checks, and runs tournament-style selection to produce results that a single-pass agent rarely matches. Understanding how it works — and when to use it — is the difference between getting a good answer and getting the right one.

This guide covers the core multi-agent patterns: fan-out, adversarial verification, and tournament selection. It also covers the practical side: token costs, when not to use these patterns, and how to keep things from getting expensive.

Understanding the Core Architecture

Claude Code is Anthropic’s agentic coding assistant that operates directly in your terminal. Beyond its standard conversational mode, it supports multi-agent orchestration — meaning it can act as a top-level orchestrator that delegates subtasks to specialized sub-agents, each with its own context, tools, and instructions.

Cursor

ChatGPT

Figma

Linear

GitHub

Vercel

Supabase

remy.msagent.ai

Seven tools to build an app. Or just Remy.

Editor, preview, AI agents, deploy — all in one tab. Nothing to install.

This isn’t a separate product you install. It’s a capability you invoke through how you structure your prompts and workflows, particularly when running Claude Code in non-interactive or “headless” mode via its API or command-line flags.

The Orchestrator-Subagent Model

In a multi-agent setup, one Claude instance acts as the planner and coordinator. It breaks down a complex task into discrete units, spawns sub-agents to handle each unit in parallel, and then synthesizes the results. Sub-agents can have access to different tools — one might read the filesystem, another might run tests, a third might search external documentation.

This model works because it mirrors how software teams actually operate: a lead engineer defines the architecture, specialists handle individual components, and someone reviews the final output before it ships.

Why Parallelism Matters for Complex Tasks

Sequential reasoning has a ceiling. If you ask a single agent to analyze ten files, it processes them one at a time, and earlier context can dilute later analysis as the context window fills up. With parallel sub-agents, each one gets a clean, focused context window for its specific subtask. The orchestrator then combines outputs that were produced with full attention on each individual piece.

For deep research tasks — say, reviewing the security posture of a large codebase — this parallel approach can cut execution time dramatically and produce more thorough results.

The Fan-Out Pattern: Parallel Research and Analysis

Fan-out is the foundational multi-agent pattern. The orchestrator takes a task and “fans out” work across multiple sub-agents simultaneously, then aggregates their results.

How Fan-Out Works in Practice

Here’s a concrete example: You want to understand how a large open-source library handles authentication across different versions.

A single-agent approach processes each version’s source sequentially. A fan-out approach works like this:

The orchestrator identifies the scope — say, five major versions of the library.
It spawns five sub-agents, one per version.
Each sub-agent independently analyzes authentication logic in its assigned version.
The orchestrator receives five structured reports and synthesizes a comparative analysis.

The wall-clock time is roughly that of analyzing one version. The coverage is all five.

When Fan-Out Is Worth It

Fan-out adds real value when:

Tasks are naturally parallel — analyzing multiple files, repositories, or documents that don’t depend on each other.
Breadth matters — you need complete coverage, not a representative sample.
Context windows are a bottleneck — splitting work prevents earlier analysis from crowding out later work.

Fan-out is less valuable when tasks are inherently sequential, where output A must inform input B. In those cases, a linear chain of agents (or a single agent with good prompting) is more appropriate.

Structuring Fan-Out Prompts

To trigger effective fan-out behavior, be explicit in your orchestrator prompt. Describe the task decomposition upfront:

Analyze the following five modules independently, then produce a unified summary:
1. auth/session.py
2. auth/token.py
3. auth/middleware.py
4. auth/validators.py
5. auth/permissions.py

For each module, report: [specific criteria]

This gives Claude clear decomposition boundaries. Without explicit structure, a single agent will often just process things sequentially rather than parallelizing.

Adversarial Verification: Building In a Critic

Fan-out improves breadth and speed. Adversarial verification improves accuracy and catches errors that a single-pass agent misses.

Hermes Crash Course — free 1-hour live workshop

The pattern is simple: after an agent produces output, a second agent is given that output and asked to challenge it, find flaws, or verify claims independently. The first agent’s work becomes the input, not the ground truth.

Why You Need an Adversarial Layer

AI agents, including Claude, can be confidently wrong. When an agent generates an analysis or implementation, it tends to be internally consistent — which means errors that fit the agent’s internal model will pass unnoticed. The same agent reviewing its own work often can’t see its own blind spots.

A separate agent with instructions to be skeptical, to find edge cases, or to independently verify key claims will catch a different category of errors. This is the same reason code review exists: you need someone who didn’t write the code to read it.

Implementing Adversarial Verification

A two-agent adversarial loop looks like this:

Generator agent: Produces the initial output (code, analysis, plan).
Critic agent: Receives the output and instructions to find errors, logical gaps, missing edge cases, or incorrect assumptions.
Orchestrator: Reconciles the critique with the original output, either triggering a revision or producing a synthesized result.

The critic’s prompt matters a lot. A weak critic prompt (“Is this correct?”) produces weak criticism. A strong critic prompt is specific:

You are reviewing the following code implementation. Your job is to find problems.
Look specifically for:
- Off-by-one errors
- Unhandled edge cases (null inputs, empty arrays, concurrent access)
- Security vulnerabilities
- Performance issues at scale
- Places where the implementation diverges from the stated requirements

Do not summarize what the code does. Only report problems.

That specificity produces actionable feedback, not general endorsements.

Adversarial Verification for Research Tasks

This pattern also applies to non-code tasks. If you’re using Claude Code for deep research — summarizing a body of literature, synthesizing findings across documentation — an adversarial agent can fact-check claims, identify where conclusions outrun the evidence, and flag contradictions.

For high-stakes analysis, running at least one adversarial pass before trusting the output is good practice.

Tournament Patterns: Competing for the Best Result

Tournament selection takes a different approach to quality. Instead of generating one answer and then critiquing it, you generate multiple independent answers and select the best one.

The Tournament Model

The pattern works like this:

Spawn N independent generator agents — each with the same task but potentially different prompting or temperature settings.
Each agent produces a complete solution independently, with no visibility into what the others are doing.
A judge agent evaluates all outputs against defined criteria and selects a winner, or synthesizes the best elements from each.

This is particularly effective for creative or optimization-heavy tasks where there are multiple valid approaches and you want to explore the solution space rather than commit to one path.

Use Cases Where Tournaments Shine

Algorithm implementation: Three different agents implement a sorting algorithm. The judge picks the one with the best balance of clarity, performance, and correctness.
API design: Multiple agents propose different interface designs. The judge evaluates usability, consistency, and extensibility.
Refactoring strategies: Give multiple agents a messy codebase and ask each to propose a refactoring approach. Compare the strategies before committing.
Prompt engineering: Generate multiple candidate prompts for a downstream task and test which performs best.

Wondering what the Hermes hype is about? Free 60-minute primer

Defining Good Evaluation Criteria

A tournament is only as good as its judge. The judge agent needs explicit, weighted criteria — not just “which is best?”

Example judge prompt structure:

You are evaluating three implementations of the same function. Score each on:
1. Correctness (40 points): Does it handle all specified test cases?
2. Readability (30 points): Is the code easy to understand and maintain?
3. Performance (20 points): Does it avoid unnecessary computation?
4. Edge case handling (10 points): Does it handle nulls, empty inputs, and boundary conditions?

Score each implementation. Identify the winner. If no single implementation wins on all criteria, propose a synthesized approach.

Numeric scoring forces the judge to make real tradeoffs rather than producing vague comparative statements.

When to Use Multi-Agent Patterns (and When Not To)

These patterns are genuinely useful — and genuinely expensive. Using them for tasks that don’t require them is wasteful. Here’s a practical decision framework.

Use Multi-Agent Patterns When:

The task is large enough that context window limits are a real constraint. If your codebase analysis requires reading more content than fits in one context window, fan-out is necessary, not optional.
Accuracy stakes are high. Adversarial verification is worth the cost when an error has real consequences — shipping broken code, making a wrong architectural decision, publishing incorrect analysis.
The solution space is genuinely open. Tournament patterns help when you don’t know which approach is best and the cost of choosing wrong is significant.
Time-to-completion matters. Parallel sub-agents can dramatically cut wall-clock time for tasks that can be decomposed.

Don’t Use Multi-Agent Patterns When:

The task is simple and self-contained. Writing a utility function or explaining a concept doesn’t need adversarial verification.
Token budget is tight. Every sub-agent consumes tokens. A three-agent tournament with an adversarial pass might use 10x the tokens of a single-agent response.
Tasks are inherently sequential. If step 2 requires the output of step 1, parallel fan-out doesn’t help and adds coordination overhead.
Speed matters more than depth. A quick single-agent answer is often better than a multi-agent process that takes longer.

Controlling Token Costs in Multi-Agent Workflows

Token costs are the main practical constraint on multi-agent usage. Here’s how to keep them manageable.

Use Smaller Models for Sub-Agent Tasks

Not every sub-agent needs to be running the most capable model. An orchestrator making high-level decisions might benefit from Claude Opus or Sonnet. Sub-agents doing mechanical work — reading files, extracting structured data, running specific analysis scripts — can often use smaller, faster, cheaper models.

Match model capability to task complexity. Reserve the heavyweight models for tasks that actually require them.

Limit Context per Sub-Agent

Sub-agents should get exactly the context they need — not the full project context. If a sub-agent’s job is to analyze one file, pass it that file. Don’t pass it the entire codebase “for reference.”

Tight context scoping keeps per-agent token consumption low and keeps each agent focused.

Set Explicit Stopping Conditions

Multi-agent workflows can loop. Adversarial agents can trigger revision after revision if the critic always finds something. Set explicit stopping conditions: maximum revision rounds, minimum quality thresholds, or time limits.

Without stopping conditions, a well-intended adversarial loop can run far longer than it should.

Cache Where You Can

The Anthropic API supports prompt caching, which reduces costs for repeated context. If multiple sub-agents are all starting with the same system prompt or the same large document, caching that shared prefix cuts costs significantly.

Monitor and Profile Before Scaling

Before running a multi-agent workflow at scale, run it on a small subset and profile the token usage. Multiply out to your full dataset. If the cost is acceptable, proceed. If not, optimize the workflow first.

How MindStudio Fits Into Multi-Agent Workflows

If you’re building on top of Claude Code’s multi-agent capabilities and want to productionize them — connecting them to business tools, scheduling runs, or exposing them as services — that’s where MindStudio becomes relevant.

MindStudio is a no-code platform for building and deploying AI agents and workflows. It supports 200+ models out of the box (including the full Claude family) and provides the infrastructure layer that multi-agent systems need: scheduling, retries, error handling, integrations, and monitoring.

The relevant piece for developers working with Claude Code is the Agent Skills Plugin — an npm SDK (@mindstudio-ai/agent) that lets any AI agent, including Claude Code, call MindStudio’s 120+ typed capabilities as simple method calls. Instead of building infrastructure around your multi-agent workflows from scratch, you can call methods like agent.searchGoogle(), agent.sendEmail(), agent.runWorkflow(), or agent.generateImage() directly from your agent code.

This is particularly useful for the kind of deep research workflows described in this article. A Claude Code orchestrator running a fan-out research pattern can use MindStudio’s search capabilities to pull live data, store structured results to Airtable or Notion, and trigger downstream notifications — all without building any of that plumbing yourself.

You can build multi-step AI workflows visually on MindStudio or use the SDK to extend agents you’re already building. It’s free to start at mindstudio.ai.

Practical Tips for Getting Results

A few things that make a real difference when running multi-agent workflows with Claude Code:

Be explicit about the agent’s role. The orchestrator prompt should clearly state that it’s coordinating sub-agents, not doing everything itself. Sub-agent prompts should clearly define scope and deliverable format.

Use structured output formats. Ask sub-agents to return JSON or clearly delimited sections. This makes the orchestrator’s synthesis job dramatically easier and reduces parsing errors.

Log intermediate results. In complex workflows, write sub-agent outputs to disk or a structured store before aggregating. This gives you debuggability and lets you rerun failed steps without rerunning everything.

Test your critic prompts separately. A weak critic doesn’t add value. Before integrating adversarial verification into a workflow, test your critic prompt in isolation on known-flawed outputs to verify it actually catches problems.

Version your prompts. Multi-agent workflows have many moving parts. Keeping prompts versioned and changes tracked makes debugging much easier when output quality shifts.

Frequently Asked Questions

What is Claude Code Ultra mode?

“Ultra mode” refers to using Claude Code in a multi-agent orchestration configuration — specifically patterns where Claude spawns parallel sub-agents, applies adversarial verification, or runs tournament-style solution selection. It’s not a separately named feature but a way of structuring Claude Code workflows to handle tasks that exceed what a single agent pass can reliably produce.

How do I start a multi-agent workflow in Claude Code?

You invoke multi-agent behavior through your prompting strategy and, for automated workflows, through Claude Code’s headless API mode. Describe the decomposition of tasks explicitly in your orchestrator prompt, and use the --output-format json flag when running non-interactively so results are machine-parseable for downstream agents.

How much does multi-agent mode cost compared to single-agent?

It depends heavily on the number of agents and their context sizes. A simple fan-out with three sub-agents will use roughly 3x the tokens of a single-agent response, plus the orchestrator overhead. Adversarial passes add another agent’s worth of tokens. Tournament patterns multiply costs by the number of competitors. For complex workflows, expect 5–20x the token consumption of a single-pass approach. Use smaller models for sub-agents where possible to control costs.

When should I use adversarial verification vs. tournament selection?

Use adversarial verification when you have one candidate answer and want to stress-test it for errors. Use tournament selection when you’re uncertain which approach or solution is best and want to explore multiple options before committing. You can combine them: run a tournament to generate candidates, then apply adversarial verification to the winner.

Can Claude Code multi-agent workflows connect to external tools?

Yes. Sub-agents can use any tools available to Claude Code — file system access, bash commands, web search (via tools), and API calls. For broader integration with business tools like Slack, HubSpot, or Google Workspace, the MindStudio Agent Skills SDK (@mindstudio-ai/agent) provides pre-built, typed methods that any Claude Code agent can call without building custom integration code.

What’s the biggest mistake developers make with multi-agent setups?

Using them when they’re not needed. Multi-agent patterns add real complexity and cost. The most common mistake is reaching for fan-out or tournament selection for tasks that a well-prompted single agent would handle fine. Start with the simplest approach. Add multi-agent patterns when you hit a concrete limit — context window, accuracy, or coverage — not as a default.

Key Takeaways

Fan-out lets Claude Code work in parallel across multiple independent subtasks, improving both speed and coverage for large-scale analysis.
Adversarial verification uses a separate critic agent to challenge outputs before you trust them — catching errors the generator can’t see in its own work.
Tournament patterns generate multiple independent solutions and select the best, useful when the solution space is open and choosing wrong is costly.
Token costs scale with agent count — use smaller models for mechanical sub-agent tasks, tight context scoping, and explicit stopping conditions to keep costs manageable.
Multi-agent patterns aren’t always better — reserve them for tasks where single-agent approaches hit real limits: context windows, accuracy requirements, or breadth of coverage.
For teams that want to productionize these workflows, tools like MindStudio provide the integration layer that turns prototype multi-agent systems into deployable, connected applications.