How to Use Sub-Agents for Codebase Analysis Without Hitting Rate Limits

Learn how to delegate codebase research to cheap, fast sub-agents in Claude Code and Codex to keep your main agent focused and under rate limits.

MindStudio Team

Why Your Main Agent Keeps Running Into Rate Limits

When you’re using Claude Code or a similar AI coding agent to analyze a real codebase — one with hundreds of files, sprawling dependencies, and layers of abstraction — the main agent runs into a wall fast.

The problem isn’t intelligence. It’s bandwidth.

Every file the main agent reads, every pattern it searches, every import chain it traces consumes tokens in its active context window. For a modestly sized project with 200 files, a thorough codebase analysis can burn through 100,000+ tokens before the agent produces a single useful answer. For larger codebases, you’ll hit context limits and rate limits before you get anything back.

Sub-agents fix this by offloading the grunt work — file reading, pattern matching, module summarization — to cheap, fast worker instances that run bounded tasks and return concise results. The main agent stays focused on reasoning and synthesis, not file I/O. That’s the core of a multi-agent codebase analysis setup.

This guide covers how to build that setup in Claude Code, what patterns work best, and how to avoid the mistakes that waste tokens instead of saving them.


What Sub-Agents Are (and What They’re Not)

A sub-agent isn’t a special tool or a separate product. It’s just another agent instance with a bounded task, a smaller scope, and often a cheaper model.

In the context of Claude Code, sub-agents are spawned using the Task tool — a built-in capability that lets the main agent delegate work to a fresh context. That sub-agent gets a specific instruction, does its work, and returns a summary. The main agent never has to load all that raw content into its own context.

Think of it like a research team:

  • The orchestrator (main agent) decides what questions to ask, coordinates the work, and synthesizes answers.
  • The workers (sub-agents) go dig through specific files, modules, or directories and report back focused findings.

The orchestrator uses a capable model like Claude Sonnet 4 or Opus 4 for its reasoning. The workers can use Claude Haiku 3.5 or another fast, cheap model — because they’re doing mechanical work, not complex reasoning.

This model split is where the rate limit savings actually come from.


How Sub-Agents Reduce Rate Limit Pressure

Rate limits have two dimensions: tokens per minute and requests per minute. Both get exhausted when a single agent is doing all the work.

Here’s a quick comparison of the two approaches:

Single-agent analysis:

  • Main agent reads 50 relevant files directly
  • Each file averages ~2,000 tokens
  • Total: 100,000 tokens loaded into one context
  • You’ve burned most of a Sonnet session before answering a single question

Sub-agent analysis:

  • Main agent dispatches 5 sub-agents, each tasked with reading 10 files
  • Each sub-agent uses ~20,000 tokens in its own context
  • Each sub-agent returns a ~1,500-token summary
  • Main agent receives 7,500 tokens total — structured, synthesized, actionable

The arithmetic matters, but so does the model split. Claude Haiku 3.5 has significantly higher rate limits per tier than Sonnet or Opus. When your sub-agents run on Haiku and your orchestrator runs on Sonnet, you’re drawing from two separate rate limit pools. That’s a real advantage when you’re doing bulk file analysis.
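The token arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is a hypothetical accounting helper, not part of any Claude Code API; the numbers mirror the article's example and should be adjusted for your own codebase.

```python
# Back-of-the-envelope comparison of orchestrator token load for the
# single-agent vs. sub-agent approaches described above.

def single_agent_load(num_files: int, tokens_per_file: int) -> int:
    """Tokens loaded into the orchestrator when it reads every file itself."""
    return num_files * tokens_per_file

def sub_agent_load(num_workers: int, summary_tokens: int) -> int:
    """Tokens the orchestrator receives when workers return summaries."""
    return num_workers * summary_tokens

# 50 files at ~2,000 tokens each vs. 5 workers returning ~1,500-token summaries
direct = single_agent_load(50, 2_000)    # 100,000 tokens
delegated = sub_agent_load(5, 1_500)     # 7,500 tokens
print(f"direct={direct} delegated={delegated} savings={direct - delegated}")
```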


Choosing the Right Model for Each Role

Not every task needs the same model. Here’s a practical breakdown:

Orchestrator model (Sonnet 4 or Opus 4):

  • Breaking down the analysis problem into sub-tasks
  • Evaluating sub-agent outputs
  • Synthesizing findings into coherent answers
  • Writing code or making recommendations based on results

Worker model (Haiku 3.5 or similar):

  • Reading and summarizing individual files
  • Searching for function usages or class definitions
  • Listing directory contents matching criteria
  • Extracting imports, exports, or schema definitions
  • Counting occurrences of patterns

The rule is simple: if the task is primarily reading and pattern-matching, it belongs on a cheaper model. If it requires judgment, planning, or synthesis, it belongs on the orchestrator.

Some teams also use embedding-based retrieval (a vector search step) before any model sees the files at all. This pre-filters the file list to only what’s relevant, which further reduces the token load for even the cheapest sub-agent.
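To make the pre-filtering step concrete, here is a minimal sketch of the ranking half of that retrieval step, assuming you already have a vector per file. A real setup would use a proper embedding model; toy bag-of-words counts stand in here just to show how candidates get ranked and truncated before any model reads file content.

```python
# Toy pre-filter: rank candidate files against a query by cosine
# similarity, return only the top-k paths. Bag-of-words vectors are a
# stand-in for real embeddings.
from collections import Counter
import math

def vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)   # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prefilter(query: str, files: dict, top_k: int = 2) -> list:
    q = vec(query)
    ranked = sorted(files, key=lambda p: cosine(q, vec(files[p])), reverse=True)
    return ranked[:top_k]

files = {
    "src/auth/login.ts": "session token login auth user password",
    "src/db/pool.ts": "postgres connection pool query",
    "src/auth/session.ts": "session expiry refresh token auth",
}
print(prefilter("session expiry auth", files))
```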


Setting Up Sub-Agents in Claude Code

Claude Code handles sub-agents through its built-in Task tool. Here’s how to structure the setup in practice.

Step 1: Write a CLAUDE.md That Defines the Strategy

The CLAUDE.md file in your project root is where you instruct the main agent on how to behave. For codebase analysis with sub-agents, include something like:

## Analysis Strategy

When asked to analyze or understand a large section of the codebase:

1. Do NOT read all files directly. First map the directory structure.
2. Use the Task tool to delegate file reading to sub-agents.
3. Each sub-agent should read no more than 10–15 files and return a structured summary.
4. Use Claude Haiku for sub-agent tasks where possible.
5. Collect summaries before drawing conclusions.

This instruction shapes how the orchestrator approaches any analysis request. Without it, the main agent defaults to reading everything itself.

Step 2: Define Bounded Sub-Agent Tasks

When the main agent dispatches a Task, it should give the sub-agent a precise, bounded scope. Vague tasks produce bloated responses. Specific tasks produce summaries.

Good sub-agent task:

Read the files in /src/auth/ and return:
- A list of all exported functions with one-line descriptions
- Any external dependencies imported
- Any notable TODOs or error handling gaps
Keep your response under 1,500 tokens.

Bad sub-agent task:

Analyze the authentication module.

The difference is clarity about scope and output format. When you give a sub-agent an explicit token budget, it tends to respect it.

Step 3: Configure Allowed Tools for Sub-Agents

By default, sub-agents in Claude Code inherit the main agent’s tool permissions. For codebase analysis, you usually want sub-agents to have access to:

  • Read — file reading
  • Glob — file pattern matching
  • Grep — text search

You typically do not want sub-agents to have write access or the ability to execute code during a pure analysis workflow. Restricting their tool set keeps them focused and prevents unintended side effects.

You can configure this in .claude/settings.json:

{
  "allowedTools": ["Read", "Glob", "Grep", "Task"],
  "subtaskAllowedTools": ["Read", "Glob", "Grep"]
}

Step 4: Set a Token Budget Per Sub-Agent

This is the step most people skip, and it’s where sub-agent setups fall apart. If you don’t constrain the sub-agent’s output length, it will return everything it finds — and you’ve just moved the context problem from the main agent to the orchestrator’s input buffer.

Include a token limit in every sub-agent instruction. 1,000–2,000 tokens is a reasonable target for most file-reading tasks. If a sub-agent needs to return more, that’s a signal the task scope is too large and should be split further.
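The splitting rule is mechanical enough to sketch directly: cap the number of files per sub-agent, and break any oversized scope into batches.

```python
# Scope-splitting sketch: a 40-file module at a cap of 10 becomes
# four sub-agent batches of 10 files each.

def split_scope(files: list, max_per_agent: int = 10) -> list:
    return [files[i:i + max_per_agent] for i in range(0, len(files), max_per_agent)]

batches = split_scope([f"src/db/file{i}.ts" for i in range(40)], max_per_agent=10)
print(len(batches))
```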


Core Patterns for Codebase Analysis

Once you have the basic setup running, these four patterns cover most codebase analysis use cases.

Pattern 1: File Discovery Before Analysis

Before reading a single file, dispatch a sub-agent to map the territory.

Sub-agent task: “List all files in /src that match *.service.ts. Return the file paths and file sizes only — no content.”

The main agent gets a clean file manifest. It can then decide which files are worth analyzing and dispatch targeted sub-agents for each, rather than reading everything speculatively.

This one step alone cuts unnecessary token usage significantly, especially in large monorepos.
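The manifest step itself is plain filesystem work. Here is a runnable local stand-in for what the sub-agent does: match a pattern and report only paths and sizes, never content. The throwaway temp directory stands in for your `/src` tree.

```python
# "File manifest first": match a glob pattern, return (path, size) pairs
# only -- no file content enters any context window.
from pathlib import Path
import tempfile

def file_manifest(root: str, pattern: str) -> list:
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in Path(root).rglob(pattern)
    )

# Demo with a throwaway tree standing in for /src
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "user.service.ts").write_text("export class UserService {}")
    (Path(root) / "auth.service.ts").write_text("export class AuthService {}")
    (Path(root) / "index.ts").write_text("import './user.service'")
    manifest = file_manifest(root, "*.service.ts")
    print(manifest)
```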

Pattern 2: Chunked Module Summarization

Split the codebase into logical modules — authentication, database layer, API routes, utilities — and assign one sub-agent per module.

Each sub-agent reads all files in its module and returns:

  • What the module does
  • Key classes and functions exported
  • External dependencies
  • Any obvious issues or complexity hotspots

The main agent collects these summaries and can answer questions about the full codebase without ever reading a raw file itself.

This pattern works especially well for onboarding questions: “How does the payment flow work?” or “Where does error handling happen?” The sub-agents do the legwork, and the main agent reasons over clean summaries.
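Assigning one sub-agent per module usually reduces to grouping file paths by their top-level directory. A minimal sketch of that grouping step:

```python
# Group file paths into logical modules by top-level directory under
# src/, so each group can become one sub-agent's scope.
from collections import defaultdict

def group_by_module(paths: list) -> dict:
    modules = defaultdict(list)
    for p in paths:
        parts = p.split("/")
        module = parts[1] if len(parts) > 2 else "(root)"
        modules[module].append(p)
    return dict(modules)

groups = group_by_module([
    "src/auth/login.ts", "src/auth/session.ts",
    "src/db/pool.ts", "src/index.ts",
])
print(groups)
```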

Pattern 3: Targeted Symbol Search

When you need to find all usages of a function, trace a class hierarchy, or locate where a specific API endpoint is defined, dispatch a search-focused sub-agent.

Sub-agent task: “Use Grep to find all files that import or reference UserAuthService. Return each file path, the line number, and two lines of surrounding context. Do not read full files.”

This is much cheaper than having the main agent grep interactively. The sub-agent does one bounded search, returns a compact result, and terminates.
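The shape of the result the sub-agent should return — path, line number, a couple of lines of context — can be modeled locally. This sketch runs the search over an in-memory file map rather than real Grep output:

```python
# Local stand-in for the search-focused sub-agent: find references to a
# symbol and return (path, line number, surrounding context) tuples.

def find_references(files: dict, symbol: str, context: int = 2) -> list:
    hits = []
    for path, text in files.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if symbol in line:
                lo, hi = max(0, i - context), i + context + 1
                hits.append((path, i + 1, "\n".join(lines[lo:hi])))
    return hits

files = {
    "src/api/users.ts": "import { UserAuthService } from '../auth'\n\n"
                        "const svc = new UserAuthService()",
    "src/db/pool.ts": "// no auth here",
}
for path, lineno, ctx in find_references(files, "UserAuthService"):
    print(path, lineno)
```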

Pattern 4: Dependency Chain Mapping

Understanding how modules depend on each other is critical for impact analysis (“if I change X, what breaks?”). A sub-agent can trace import chains without the main agent touching every file.

Sub-agent task: “Starting from /src/index.ts, extract all first-level and second-level imports. Return a flat list of file paths in dependency order. Do not read file content beyond import statements.”

This gives the main agent a dependency graph it can reason over, without loading hundreds of files into context.
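A sketch of the two-level trace itself, assuming the imports have already been extracted into a path-to-dependencies map (which a worker can produce with a grep over import statements — real module resolution with aliases and extensions is out of scope here):

```python
# Two-level import tracing over a pre-extracted dependency map:
# breadth-first, bounded by depth, duplicates skipped.

def trace_imports(graph: dict, entry: str, depth: int = 2) -> list:
    seen = []
    frontier = [entry]
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            for dep in graph.get(path, []):
                if dep not in seen and dep != entry:
                    seen.append(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
    return seen

graph = {
    "src/index.ts": ["src/app.ts", "src/config.ts"],
    "src/app.ts": ["src/auth/login.ts", "src/db/pool.ts"],
    "src/config.ts": [],
}
print(trace_imports(graph, "src/index.ts"))
```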


Avoiding Common Mistakes

Most sub-agent setups fail for one of four reasons.

Sub-Agent Context Explosion

A sub-agent tasked with “summarizing the database layer” reads 40 files and fills its own context window. Now you have a rate-limited sub-agent instead of a rate-limited main agent — same problem, different location.

Fix: Cap the number of files per sub-agent. If a module has 40 files, split it across 4 sub-agents of 10 files each.

Too Much Parallelism at Once

It’s tempting to dispatch 20 sub-agents simultaneously to maximize speed. But if all 20 are calling the API at the same time, you’ll hit concurrent request limits or requests-per-minute caps just as fast as the single-agent approach.

Fix: Use sequential batches with 3–5 concurrent sub-agents. Let each batch complete before dispatching the next.
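The batching rule can be sketched with asyncio: run at most `batch_size` tasks at once and wait for each batch to finish before dispatching the next. `fake_run` is a placeholder for whatever actually calls the sub-agent API.

```python
# Sequential batches of concurrent sub-agent tasks: at most batch_size
# requests in flight at any time.
import asyncio

async def run_in_batches(tasks, run_task, batch_size: int = 4):
    results = []
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        results.extend(await asyncio.gather(*(run_task(t) for t in batch)))
    return results

async def fake_run(task: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real API call
    return f"summary of {task}"

tasks = [f"module-{n}" for n in range(10)]
results = asyncio.run(run_in_batches(tasks, fake_run, batch_size=4))
print(len(results))
```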

Lossy Summaries

Sub-agents instructed to be brief sometimes discard exactly the details the main agent needs. A summary that says “handles authentication” doesn’t help if the main agent’s follow-up question is “what session expiry logic is used?”

Fix: Include explicit output schemas in sub-agent prompts. Tell the sub-agent what categories to always include, even if the content is brief.
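One way to make "explicit output schema" concrete is a dataclass the orchestrator expects every sub-agent response to fill in — every category present even when empty, so nothing gets silently dropped. The field names here are illustrative:

```python
# Explicit output schema for sub-agent responses: every category always
# appears, even if its content is empty, which prevents lossy summaries.
from dataclasses import dataclass, field, asdict

@dataclass
class ModuleSummary:
    module: str
    exports: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)
    issues: list = field(default_factory=list)
    out_of_scope: list = field(default_factory=list)

summary = ModuleSummary(
    module="src/auth",
    exports=["login()", "refreshSession()"],
    issues=["session expiry hardcoded to 24h"],
)
print(asdict(summary))
```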

Redundant Work Across Sub-Agents

Two sub-agents assigned overlapping modules will read the same files. This wastes tokens and can produce conflicting summaries.

Fix: Map your file assignments before dispatching. Give each sub-agent an explicit, non-overlapping file list rather than a fuzzy module description.
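Overlap is cheap to check before dispatch. This sketch validates a set of assignments and flags any file that appears in more than one sub-agent's list:

```python
# Validate file assignments before dispatching sub-agents: flag any
# file assigned to more than one agent.

def check_assignments(assignments: dict) -> list:
    """Return the files that appear in more than one sub-agent's list."""
    seen = {}
    overlaps = []
    for agent, files in assignments.items():
        for f in files:
            if f in seen:
                overlaps.append(f)
            seen[f] = agent
    return overlaps

assignments = {
    "agent-auth": ["src/auth/login.ts", "src/auth/session.ts"],
    "agent-db": ["src/db/pool.ts", "src/auth/session.ts"],  # overlap!
}
print(check_assignments(assignments))
```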


Applying This in OpenAI Codex CLI

The same principles apply if you’re working in OpenAI’s Codex CLI rather than Claude Code. Codex supports multi-agent patterns through its agent orchestration layer.

The key difference is that Codex’s sub-agent spawning is more explicit — you define the agent graph manually rather than relying on the Task tool. But the logic is identical:

  1. Orchestrator maps the codebase structure
  2. Worker agents handle file reading and pattern matching
  3. Workers return structured summaries
  4. Orchestrator synthesizes

If you’re building a custom orchestration layer around either Claude or Codex, libraries like LangGraph give you fine-grained control over how agents pass context to each other, which parallel execution tracks run simultaneously, and how results get merged. This is worth exploring if you need production-grade multi-agent workflows beyond what the default CLI tools support.


How MindStudio Fits Into This Workflow

The sub-agent patterns above are powerful, but they require you to manually wire up orchestration logic — writing CLAUDE.md instructions, managing file lists, handling batching. That works well for developers who are comfortable in the CLI. But it’s brittle when the workflow needs to run reliably at scale or get shared across a team.

This is where MindStudio’s Agent Skills Plugin becomes useful. The plugin (@mindstudio-ai/agent) is an npm SDK that lets any AI agent — including Claude Code agents — call MindStudio’s typed capabilities as simple method calls. Rather than reinventing the infrastructure for routing sub-agent tasks, scheduling batches, or handling retries, you call a method and MindStudio handles the plumbing.

For codebase analysis workflows specifically, this means you can build an orchestrated multi-agent pipeline in MindStudio’s visual builder — define your orchestrator logic, assign sub-agent roles, set token budgets, configure model selection per role — and then expose that pipeline as an endpoint your Claude Code agent calls via agent.runWorkflow().

You get the sub-agent architecture described in this article without managing it manually in every project.

You can build and test this kind of workflow for free at mindstudio.ai. The average build takes under an hour, and you don’t need to write any infrastructure code to get the agent graph running.

For teams that want to go deeper on building multi-agent workflows in MindStudio, the platform supports connecting 200+ AI models, so you can mix Claude Haiku for worker agents and Sonnet for orchestration inside the same workflow — without managing separate API keys for each.


Frequently Asked Questions

What is a sub-agent in Claude Code?

A sub-agent in Claude Code is a separate agent instance spawned by the main agent using the Task tool. The sub-agent gets its own context window, a specific instruction, and a set of allowed tools. It completes its task and returns a result to the main agent. Sub-agents don’t share context with the main agent — each one starts fresh.

How do sub-agents help avoid rate limits?

Rate limits apply per model and per API tier. When sub-agents use a cheaper model (like Haiku) while the main agent uses Sonnet, they draw from different rate limit pools. Additionally, sub-agents return compressed summaries rather than raw file content, which drastically reduces the token load on the main agent’s context. Less token consumption in the orchestrator means fewer rate limit events overall.

Can sub-agents use different models than the main agent?

Yes. In Claude Code, you can specify the model for sub-agent tasks. This is one of the primary advantages of the sub-agent architecture — assigning a cheaper, faster model to mechanical tasks (file reading, search) while keeping the more capable model for reasoning and synthesis.

How many sub-agents can run in parallel?

This depends on your API tier’s concurrent request limit. As a practical rule, 3–5 concurrent sub-agents is a safe starting point for most teams. Running more than that simultaneously risks hitting concurrent request caps, which defeats the purpose. Batching sub-agents into groups of 3–5 and running batches sequentially tends to be more reliable.

What should a sub-agent return?

Sub-agents should return structured, concise summaries — not raw file content. A good sub-agent response includes the key findings (exports, dependencies, patterns found), any notable issues, and an explicit note if something was out of scope. Token-constrained output (1,000–2,000 tokens) forces sub-agents to prioritize and filter, which is what you want.

Does this work with tools other than Claude Code?

Yes. The sub-agent pattern is model-agnostic. OpenAI’s Codex CLI, custom LangGraph agents, CrewAI setups, and any orchestrator-worker architecture can apply the same principles: cheap models for file I/O, capable models for reasoning, structured summaries flowing upward. The specifics of how you spawn sub-agents vary by framework, but the logic is the same.


Key Takeaways

  • The main agent shouldn’t read raw files directly in large codebase analysis. Offloading file I/O to sub-agents keeps its context clean and its rate limit usage low.
  • Model selection matters as much as architecture. Using Haiku for worker agents and Sonnet for the orchestrator splits rate limit consumption across two pools.
  • Bounded tasks produce better results. Sub-agents with explicit file lists, output schemas, and token budgets outperform sub-agents with vague instructions.
  • Parallelism needs a ceiling. Batching 3–5 concurrent sub-agents is more reliable than flooding the API with 20 simultaneous requests.
  • The four core patterns — file discovery, chunked summarization, targeted symbol search, and dependency mapping — cover the majority of real codebase analysis use cases.

If you want to run these multi-agent patterns reliably in production without managing the orchestration layer yourself, MindStudio is worth a look. You can wire up a full orchestrator-worker workflow visually, assign different models to different roles, and expose it as an endpoint your agents can call — without writing infrastructure code from scratch.
