Claude Code Sub-Agents Explained: Context, Cost, and Parallel Execution
Sub-agents in Claude Code let you delegate tasks to fresh sessions, use cheaper models, and run work in parallel. Here's how to build and use them.
What Sub-Agents Actually Do in Claude Code
Claude Code’s sub-agent system is one of the more practical features in agentic AI development — and one of the least understood. The concept is simple: instead of doing all work in a single, growing context window, you delegate tasks to separate agent sessions. Each one starts fresh, costs less if you pick the right model, and can run at the same time as other sub-agents.
If you’re building multi-agent workflows with Claude, understanding sub-agents — how they handle context, how they affect your bill, and how parallel execution actually works — will save you real money and make your systems noticeably faster.
This article breaks down all three dimensions: context isolation, cost management, and parallel execution. By the end, you’ll know exactly when and how to use sub-agents, and what mistakes to avoid.
The Context Problem That Sub-Agents Solve
Every Claude session runs inside a context window. As you add messages, tool results, and code outputs, that window fills up. When it gets large, two things happen: responses slow down, and costs go up — because you’re sending more tokens with every request.
For simple tasks, this isn’t an issue. For long-running agentic workflows — debugging a large codebase, running a research pipeline, coordinating multi-step builds — the context can balloon fast.
What “Fresh Context” Means in Practice
When Claude Code spawns a sub-agent, it creates a new session with its own context. That sub-agent starts with only what you explicitly pass to it: a task description, relevant files, or a few sentences of background.
This matters for two reasons:
- Accuracy: Large contexts increase the chance that the model gets confused by competing information. A sub-agent working on one focused task with minimal context tends to perform better on that task.
- Cost: Fewer input tokens per request means lower API costs per sub-agent call.
The tradeoff is that the sub-agent doesn’t know what the parent agent knows unless you tell it. You have to be deliberate about what context you pass down — and what you leave out.
How Context Flows Between Parent and Sub-Agent
Claude Code uses a handoff model. The orchestrating agent (parent) decides:
- What task to delegate
- What context the sub-agent needs
- What format the sub-agent should return its result in
The sub-agent completes its work and returns a result. The parent takes that result and incorporates it into its own context. The sub-agent’s session ends — it doesn’t persist.
This is intentional. Sub-agents are stateless by design. They do a job, return output, and close. This keeps things predictable and prevents context from accumulating in unexpected ways.
How Claude Code Implements Sub-Agents
Claude Code exposes sub-agents through its tool system. When Claude needs to delegate work, it calls a tool — typically Task — with a description of what it wants done. Claude Code handles the mechanics of spinning up a new agent session, running the task, and returning the result.
The Task Tool
The Task tool is the primary mechanism for spawning sub-agents. It takes a prompt and optionally some context, kicks off a new Claude session, and returns the output when that session completes.
Here’s what a basic sub-agent delegation looks like from the orchestrator’s perspective:
Use the Task tool to:
- Analyze the test failures in /src/tests/
- Return a list of failing tests with error messages
- Do not fix anything, just report
The sub-agent receives this as its entire job. It doesn’t know what the parent is building, what the broader goal is, or what happened in previous turns. It just runs the analysis and returns structured output.
Controlling What Gets Passed Down
Good sub-agent design is mostly about information hygiene. You want to pass exactly what’s needed — no more. Common things to include:
- The specific task (clear, unambiguous)
- Relevant file paths or code snippets
- Output format requirements (JSON, plain text, specific fields)
- Constraints (don’t write files, only analyze, etc.)
What you typically leave out: conversation history, prior tool results, context from other sub-agents, and anything the sub-agent doesn’t need to complete its job.
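As a rough illustration of that hygiene, the sketch below assembles a sub-agent prompt from exactly the four ingredients listed above and nothing else. The `SubAgentBrief` class and its field names are hypothetical and are not part of Claude Code; the point is only that anything not placed in the brief never reaches the sub-agent.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgentBrief:
    """Hypothetical container: everything the sub-agent is told lives here."""
    task: str
    file_paths: list[str] = field(default_factory=list)
    output_format: str = "plain text"
    constraints: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the brief as the prompt text handed to the sub-agent."""
        lines = [f"Task: {self.task}"]
        if self.file_paths:
            lines.append("Relevant files:")
            lines.extend(f"- {path}" for path in self.file_paths)
        lines.append(f"Return format: {self.output_format}")
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"- {rule}" for rule in self.constraints)
        return "\n".join(lines)


brief = SubAgentBrief(
    task="Analyze the test failures in /src/tests/",
    file_paths=["/src/tests/"],
    output_format="JSON list of objects with test and error_message fields",
    constraints=["Do not fix anything, just report"],
)
prompt = brief.to_prompt()
print(prompt)
```

Because the prompt is built from named fields rather than copied from the conversation, leaving out history and prior tool results becomes the default rather than something to remember.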
Cost Management: Using Cheaper Models for Sub-Tasks
One of the most underused aspects of Claude Code sub-agents is model selection. Not every task needs the most capable (and expensive) model. Sub-agents give you a natural place to route simpler work to cheaper models.
The Model Routing Principle
Think of your agent system as having tiers:
- Orchestrator: Handles reasoning, planning, and decision-making. This is where you want Claude Opus or Claude Sonnet — the models with strong reasoning.
- Sub-agents: Handle execution, data extraction, formatting, and other mechanical tasks. Claude Haiku or smaller models often work fine here.
If your orchestrator costs $15 per million output tokens and your sub-agents cost $0.25 per million output tokens, every output token you move to a sub-agent costs one-sixtieth as much, so the savings add up quickly on mechanical work.
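To make that arithmetic concrete, here is a small worked example using the two prices quoted above. The workload (2 million output tokens, 80% of them mechanical) is an assumption chosen for illustration, not a measurement.

```python
ORCHESTRATOR_PRICE = 15.00  # USD per million output tokens (figure from the text)
SUB_AGENT_PRICE = 0.25      # USD per million output tokens (figure from the text)


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of producing `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


# Assumed workload: 2M output tokens, of which 1.6M are mechanical work.
total_tokens = 2_000_000
mechanical_tokens = 1_600_000
reasoning_tokens = total_tokens - mechanical_tokens

single_model = output_cost(total_tokens, ORCHESTRATOR_PRICE)
routed = (
    output_cost(reasoning_tokens, ORCHESTRATOR_PRICE)
    + output_cost(mechanical_tokens, SUB_AGENT_PRICE)
)

print(f"single model: ${single_model:.2f}")  # single model: $30.00
print(f"routed:       ${routed:.2f}")        # routed:       $6.40
```

Under these assumptions almost all of the remaining cost comes from the orchestrator's own output, which is why the share of work you can safely route matters more than the exact sub-agent price.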
When to Use a Lighter Model
Lighter models make sense for sub-agents that:
- Extract specific fields from structured data
- Reformat or transform text
- Check for the presence or absence of something
- Run repetitive pattern-matching tasks
- Generate boilerplate code from a template
They’re less suited for tasks requiring nuanced judgment, complex reasoning chains, or handling ambiguous inputs where a wrong interpretation could cascade into larger errors.
Specifying Models in Sub-Agents
In Claude Code, you can configure which model a sub-agent uses. When spawning a sub-agent via the Task tool, you can pass model preferences as part of the configuration. This lets the orchestrator dynamically route tasks to the appropriate model based on complexity.
A practical pattern: have your orchestrator classify the incoming task by complexity, then route to Haiku for simple tasks and Sonnet or Opus for complex ones. This adds a small overhead to the classification step but typically pays for itself on any workflow that runs frequently.
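A minimal sketch of that classify-then-route pattern follows. The keyword heuristic is a deliberate simplification (a real classifier might itself be a cheap model call), and the strings `"haiku"`, `"sonnet"`, and `"opus"` are tier labels used for illustration, not exact model identifiers.

```python
COMPLEX_HINTS = ("design", "debug", "refactor", "trade-off")
SIMPLE_HINTS = ("extract", "reformat", "check", "boilerplate")

# Tier labels only; substitute whatever model names your setup expects.
MODEL_FOR = {"simple": "haiku", "moderate": "sonnet", "complex": "opus"}


def classify_complexity(task: str) -> str:
    """Crude keyword heuristic standing in for a real complexity classifier."""
    text = task.lower()
    # Check for complex work first so a mixed task is never under-served.
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    if any(hint in text for hint in SIMPLE_HINTS):
        return "simple"
    return "moderate"


def route(task: str) -> str:
    """Pick the model tier for a task description."""
    return MODEL_FOR[classify_complexity(task)]


for task in (
    "Extract the invoice number from this JSON",
    "Debug the flaky auth test",
    "Summarize the changelog",
):
    print(f"{route(task):7} <- {task}")
```

Unrecognized tasks fall through to the middle tier rather than the cheapest one, which trades a little cost for a lower chance of a wrong interpretation cascading into larger errors.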
Parallel Execution: Running Sub-Agents at the Same Time
This is where sub-agents really earn their keep for performance-sensitive workflows. Claude Code can spawn multiple sub-agents and run them concurrently rather than sequentially.
Sequential vs. Parallel: The Real Difference
Sequential execution looks like this:
- Sub-agent A runs (30 seconds)
- Sub-agent B runs (30 seconds)
- Sub-agent C runs (30 seconds)
- Total: 90 seconds
Parallel execution looks like this:
- Sub-agents A, B, and C all start simultaneously
- All complete within roughly 30–35 seconds
- Total: ~35 seconds
For workflows with independent tasks — tasks that don’t depend on each other’s output — parallel execution can cut wall-clock time by 60–80%.
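The numbers above follow from a simple rule: sequential time is the sum of the task durations, while parallel time is roughly the longest single task plus some overhead. The sketch below reproduces the example; the 5-second overhead is an assumption that matches the "30–35 seconds" estimate.

```python
def wall_clock(durations, overhead=0):
    """Return (sequential, parallel, fraction_saved) for independent tasks."""
    sequential = sum(durations)
    parallel = max(durations) + overhead  # bounded by the slowest task
    return sequential, parallel, 1 - parallel / sequential


sequential, parallel, saved = wall_clock([30, 30, 30], overhead=5)
print(f"sequential={sequential}s parallel={parallel}s saved={saved:.0%}")
# prints: sequential=90s parallel=35s saved=61%

# Uneven tasks benefit far less, because the slowest one dominates.
_, skewed_parallel, skewed_saved = wall_clock([10, 10, 70], overhead=5)
print(f"skewed: parallel={skewed_parallel}s saved={skewed_saved:.0%}")
# prints: skewed: parallel=75s saved=17%
```

The second case is worth keeping in mind when splitting work: evenly sized sub-tasks are what push the saving toward the upper end of the range.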
What Tasks Can Run in Parallel
The key constraint is dependency. Sub-agents that need each other’s output must run sequentially. Sub-agents that can operate independently are candidates for parallelism.
Good candidates for parallel execution:
- Analyzing different files or modules in a codebase
- Running tests across different test suites
- Fetching and processing data from multiple sources
- Generating multiple independent outputs (e.g., drafts for different sections)
- Checking multiple APIs or endpoints
Must run sequentially:
- Tasks where output from one becomes input to another
- Tasks that write to the same file or resource
- Tasks where the order of operations matters
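One way to turn these rules into a schedule is to write the dependencies down as a graph and group tasks into "waves": everything within a wave can run in parallel, and the waves run one after another. The sketch below uses Python's standard `graphlib` module; the task names are hypothetical. A shared-resource conflict (two tasks writing the same file) is not detected automatically and has to be encoded as an explicit dependency.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose output it needs first.
depends_on = {
    "analyze_auth": set(),
    "analyze_billing": set(),
    "run_unit_tests": set(),
    "merge_findings": {"analyze_auth", "analyze_billing"},
    "write_report": {"merge_findings", "run_unit_tests"},
}


def parallel_waves(graph):
    """Group tasks into waves; tasks in the same wave are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks with no pending inputs
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock the next one
    return waves


waves = parallel_waves(depends_on)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

For this graph the three analysis tasks form the first wave, `merge_findings` the second, and `write_report` the third.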
How Claude Code Manages Parallelism
Claude Code uses an async execution model for sub-agents. When the orchestrator issues multiple Task calls without waiting for each to complete before issuing the next, they run in parallel. The orchestrator waits until all spawned tasks are complete, then processes their combined results.
In practice, you design for this explicitly. The orchestrator’s prompt should specify when to fan out (spawn multiple sub-agents) and when to fan in (wait for results and synthesize them). Clear output formats from sub-agents make the synthesis step much easier.
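The fan-out and fan-in steps can be sketched with Python's `asyncio`. This is an analogy for the pattern rather than Claude Code's internals: `run_sub_agent` is a hypothetical stand-in that sleeps instead of calling a model.

```python
import asyncio


async def run_sub_agent(task: str) -> dict:
    """Hypothetical stand-in for a Task call; returns a structured result."""
    await asyncio.sleep(0.01)  # placeholder for the real work
    return {"task": task, "status": "ok"}


async def fan_out_fan_in(tasks: list[str]) -> dict:
    # Fan out: every sub-agent is started before any result is awaited.
    results = await asyncio.gather(*(run_sub_agent(task) for task in tasks))
    # Fan in: gather preserves input order, so results line up with tasks.
    return {"completed": len(results), "results": list(results)}


summary = asyncio.run(fan_out_fan_in(["lint src/", "scan deps", "run tests"]))
print(summary)
```

Because every sub-agent returns the same dictionary shape, the synthesis step reduces to iterating over a list instead of interpreting free text.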
Managing Rate Limits in Parallel Workflows
Running many sub-agents in parallel means many API requests firing simultaneously. This can trigger rate limits, especially on the lower usage tiers of the Anthropic API.
A few things help:
- Stagger launches slightly: Add small delays between sub-agent spawns to spread request load
- Cap concurrency: Don’t spawn 50 sub-agents at once — set a max concurrency that your API tier supports
- Use exponential backoff: Build retry logic into your orchestrator for rate limit errors
Claude Code handles some of this internally, but for high-volume workflows, you’ll want to think about concurrency limits explicitly.
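If you do manage concurrency yourself, a semaphore plus exponential backoff covers the last two points in the list above. The sketch below is a simulation under stated assumptions: `RateLimited` is a hypothetical exception, the flaky call fakes one rate-limit rejection per sub-agent, and the delays are shortened so the example runs quickly (real delays would be on the order of seconds).

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical error for a request rejected by a rate limit."""


async def call_with_backoff(call, max_retries=4, base_delay=0.01):
    """Retry `call` on RateLimited, doubling the wait (with jitter) each time."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimited:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)


async def run_capped(calls, max_concurrency=3):
    """Run every call with at most max_concurrency in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(call):
        async with semaphore:  # slot is held during backoff waits too
            return await call_with_backoff(call)

    return await asyncio.gather(*(guarded(call) for call in calls))


def make_flaky_call(name, stats):
    """Simulated sub-agent: rate-limited on its first attempt, fine afterwards."""
    attempts = 0

    async def call():
        nonlocal attempts
        attempts += 1
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.005)
            if attempts == 1:
                raise RateLimited(name)
            return name
        finally:
            stats["active"] -= 1

    return call


stats = {"active": 0, "peak": 0}
names = [f"agent-{n}" for n in range(8)]
results = asyncio.run(run_capped([make_flaky_call(n, stats) for n in names]))
print(results, "peak concurrency:", stats["peak"])
```

Holding the semaphore slot while backing off is a design choice: it keeps a rate-limited sub-agent from being replaced immediately by another request, which would only add to the pressure.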
Building a Sub-Agent Workflow: A Practical Example
Here’s a concrete example: a code review agent that analyzes a pull request.
The Task
Given a PR with 12 changed files, you want to:
- Check each file for security issues
- Identify style violations
- Summarize the overall changes
Sequential Approach (the slow way)
A single agent reads all 12 files sequentially, runs both checks on each file, then writes the summary. Context grows with each file. By file 10, the agent is carrying a lot of accumulated context that may or may not be relevant.
Total time: High. Context size: Very large. Cost: Elevated due to large context on every call.
Sub-Agent Approach
The orchestrator:
- Splits the 12 files into 3 groups of 4
- Spawns 3 sub-agents in parallel — one per group
- Each sub-agent runs security + style checks on its 4 files with minimal context
- Sub-agents return structured results (JSON with file, issue type, description)
- Orchestrator synthesizes results into a final report
Total time: Roughly 1/3 of sequential. Context per sub-agent: Small and focused. Cost: Lower per sub-agent call, and the lighter analysis tasks can use Haiku.
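The two mechanical steps of that workflow, splitting the files and merging the structured results, might look like the sketch below. The file names and the sample report are invented for illustration; the record fields (`file`, `issue_type`, `description`) follow the JSON shape described above.

```python
import json


def split_into_groups(files, group_count):
    """Split files into contiguous groups of near-equal size."""
    size = -(-len(files) // group_count)  # ceiling division
    return [files[i:i + size] for i in range(0, len(files), size)]


def merge_reports(raw_reports):
    """Fan in: parse each sub-agent's JSON report and flatten into one list."""
    issues = []
    for raw in raw_reports:
        issues.extend(json.loads(raw))
    return sorted(issues, key=lambda issue: (issue["file"], issue["issue_type"]))


files = [f"src/file_{n:02d}.py" for n in range(1, 13)]  # hypothetical PR files
groups = split_into_groups(files, 3)
print([len(group) for group in groups])  # [4, 4, 4]

# Invented sample output from two of the three sub-agents.
reports = [
    json.dumps([{"file": "src/file_05.py", "issue_type": "style",
                 "description": "line too long"}]),
    json.dumps([{"file": "src/file_01.py", "issue_type": "security",
                 "description": "hard-coded credential"}]),
]
merged = merge_reports(reports)
print([issue["file"] for issue in merged])
```

Sorting the merged list by file gives the orchestrator a stable input for the final report regardless of which sub-agent finished first.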
This pattern — fan out, process in parallel, fan in and synthesize — is the core of effective sub-agent design.
Common Mistakes When Using Sub-Agents
Even well-designed systems run into predictable issues. Here are the ones that come up most often.
Over-Passing Context
Passing too much context to a sub-agent defeats the purpose. If you’re sending the entire conversation history “just in case,” you’re recreating the bloated context problem you were trying to solve.
Be ruthless: what does this sub-agent actually need to complete its specific task? Start there.
Unstructured Output from Sub-Agents
If a sub-agent returns a blob of free text, the orchestrator has to parse it — and that parsing can fail or introduce errors. Ask sub-agents to return structured output: JSON objects, numbered lists with specific fields, or clearly delimited sections.
Define the output format explicitly in the sub-agent’s task prompt.
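On the orchestrator side, it helps to validate what comes back before acting on it. The following is a minimal sketch that assumes the sub-agent was asked for a JSON array of objects with `file`, `issue_type`, and `description` fields; the field names are illustrative.

```python
import json

REQUIRED_FIELDS = {"file", "issue_type", "description"}


def parse_sub_agent_output(raw: str) -> list:
    """Return validated issue records, or raise ValueError with the reason."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of issue objects")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"item {index} is missing {sorted(missing)}")
    return data


good = '[{"file": "a.py", "issue_type": "style", "description": "unused import"}]'
print(parse_sub_agent_output(good))

try:
    parse_sub_agent_output("Sure! Here are the issues I found: ...")
except ValueError as error:
    print("rejected:", error)
```

A specific error message is useful beyond logging: it can be sent back to the sub-agent in a retry prompt so the second attempt knows what to correct.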
Ignoring Dependencies
Running tasks in parallel when they actually depend on each other causes hard-to-debug race conditions or incorrect results. Map your dependencies clearly before designing parallelism. Draw it out if needed — a simple dependency graph will tell you exactly what can and can’t run simultaneously.
Not Handling Sub-Agent Failures
Sub-agents can fail. Network errors, rate limits, and hallucinated outputs all happen. Your orchestrator needs explicit logic for what to do when a sub-agent returns an error or unexpected result: retry, skip, or escalate to a human.
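The retry, skip, or escalate decision can be written down as a small policy function so it is applied consistently. The error categories and retry budgets below are assumptions for illustration; tune them to your own workflow.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    SKIP = "skip"
    ESCALATE = "escalate"


# Assumed retry budget per error kind; unknown kinds get no retries.
MAX_RETRIES = {"network": 3, "rate_limit": 5, "bad_output": 1}


def decide(error_kind: str, retries_so_far: int, critical: bool) -> Action:
    """Choose what the orchestrator does after a sub-agent failure."""
    if retries_so_far < MAX_RETRIES.get(error_kind, 0):
        return Action.RETRY
    # Out of retries: a critical task goes to a human, the rest are skipped.
    return Action.ESCALATE if critical else Action.SKIP


print(decide("network", 0, critical=False))     # Action.RETRY
print(decide("network", 3, critical=False))     # Action.SKIP
print(decide("bad_output", 1, critical=True))   # Action.ESCALATE
```

Malformed output gets a smaller retry budget than transient errors here, on the assumption that a sub-agent that misreads its task once is likely to do so again without a changed prompt.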
Where MindStudio Fits for Multi-Agent Workflows
Building sub-agent systems in Claude Code is powerful, but it requires writing and maintaining orchestration logic yourself. You’re responsible for spawning agents, passing context, handling parallelism, managing retries, and synthesizing results.
MindStudio handles that orchestration layer visually. It’s a no-code platform where you can build multi-step AI workflows — including multi-agent pipelines — without writing the plumbing code.
The platform gives you access to 200+ AI models including the full Claude family. You can assign different models to different steps of a workflow, which maps directly to the “use cheaper models for simpler sub-tasks” pattern described in this article. An Opus-level model handles reasoning; Haiku handles extraction — and you configure this with dropdowns, not code.
For developers who are already building with Claude Code, MindStudio’s Agent Skills Plugin (@mindstudio-ai/agent) is worth looking at. It lets Claude Code call MindStudio’s 120+ typed capabilities — things like agent.sendEmail(), agent.searchGoogle(), or agent.runWorkflow() — as simple method calls. This lets your Claude Code sub-agents tap into external integrations without you building those connections from scratch.
If you’re building a workflow that involves Claude Code orchestrating multiple sub-agents that also need to hit external tools — Slack, HubSpot, Google Workspace — this combination handles the full stack.
You can try MindStudio free at mindstudio.ai.
Frequently Asked Questions
What is a sub-agent in Claude Code?
A sub-agent in Claude Code is a separate agent session spawned by an orchestrating (parent) agent to handle a specific delegated task. It starts with a fresh context window, runs its task independently, and returns results to the parent. Sub-agents are stateless — they don’t persist between calls.
How does context isolation work with Claude Code sub-agents?
Each sub-agent gets only the context explicitly passed to it by the parent agent. It doesn’t inherit the parent’s conversation history or tool results. This keeps each session’s context small, reduces costs, and often improves accuracy by eliminating irrelevant information.
Can Claude Code sub-agents run in parallel?
Yes. Claude Code supports concurrent sub-agent execution. When the orchestrator spawns multiple Task calls without waiting for sequential completion, they run simultaneously. Tasks that don’t depend on each other’s output are good candidates for parallel execution, which can significantly reduce total workflow time.
How do sub-agents reduce costs in Claude Code?
Sub-agents reduce costs in two ways. First, each sub-agent carries far less context than a single monolithic session handling all tasks, so it sends fewer input tokens per request. Second, you can assign lighter, cheaper models (like Claude Haiku) to sub-agents handling simple, mechanical tasks, reserving expensive models for high-reasoning orchestrator tasks.
What’s the difference between an orchestrator and a sub-agent?
The orchestrator is the parent agent responsible for planning, decision-making, and coordinating work. Sub-agents are the workers — they execute specific tasks delegated by the orchestrator and return results. Orchestrators typically use more capable models; sub-agents can often use faster, cheaper ones.
How do I prevent sub-agents from conflicting when running in parallel?
Map task dependencies before designing parallel execution. Sub-agents that write to the same resource, depend on each other’s output, or require a specific order of operations must run sequentially. Only tasks that are genuinely independent should run in parallel. Also, cap your concurrency to avoid hitting API rate limits.
Key Takeaways
- Sub-agents in Claude Code create isolated sessions with fresh context, reducing context bloat and improving accuracy on focused tasks.
- Model routing (using cheaper models for sub-agents handling simple work) can substantially reduce per-workflow costs, as long as the routed tasks are mechanical enough for the lighter model.
- Parallel execution cuts wall-clock time significantly for workflows with independent tasks, but requires careful dependency mapping.
- The fan-out/fan-in pattern (split tasks, run in parallel, synthesize results) is the most effective structure for most sub-agent workflows.
- Structured output from sub-agents makes orchestration far more reliable — always define the expected format explicitly.
- For teams that want multi-agent orchestration without building the plumbing from scratch, MindStudio provides the infrastructure layer — including model routing, integrations, and workflow management — out of the box.