How to Use Sub-Agents for Codebase Analysis Without Hitting Context Limits
Delegate codebase research to sub-agents running cheaper models, keep your main agent focused, and get clean summaries back without polluting your context.
Why Single-Agent Approaches Break on Large Codebases
Anyone who’s tried building multi-agent workflows for codebase analysis runs into the same problem: you feed source files into a Claude conversation, ask for an architectural overview, and either hit the context limit immediately or get vague answers because the model is buried in irrelevant code.
Even Claude’s 200K token context window fills faster than expected once you start loading source files, test files, configuration, and documentation together. A modest production Rails app can run 80,000 lines of code. A Node.js microservice architecture across multiple services can exceed 300,000 lines. And those numbers climb fast once you count dependencies and tests.
The cost problem compounds the size problem. Sending thousands of lines of irrelevant code to answer a question about one function is expensive. Every token counts, and most of those tokens don’t contribute to the answer.
The third issue is the one most people underestimate: context noise. When you fill a context window with everything, the model has to reason through irrelevant code to find what matters. Answer quality drops. Responses get imprecise. The reasoning loses coherence.
Sub-agents solve all three problems. Instead of one agent reading everything, a network of focused sub-agents reads specific slices and returns structured summaries. The orchestrator—the main reasoning agent—never sees raw source code. It sees clean summaries, decides what to investigate next, and accumulates only what’s relevant to the goal.
This guide covers how to design that system from the architecture down to the prompt details.
How the Sub-Agent Architecture Works
The structure has two layers with clearly separated responsibilities.
The orchestrator is the reasoning agent. It holds the analysis goal, decides what to investigate, dispatches tasks, and synthesizes what it learns into a final output. It runs on a capable model because its decisions determine the quality of everything else.
Sub-agents are focused workers. Each one gets a specific, bounded task—read this file, find these patterns, map these imports—and returns a compact structured summary. They don’t accumulate state. They don’t reason about the broader system. They run, return a result, and exit.
The critical mechanism is information compression. Sub-agents take raw source code and return a dense, structured representation of what matters. The orchestrator accumulates summaries, not code. Its context stays manageable throughout the analysis because it only grows by the size of summaries, not source files.
This maps to how an experienced engineer actually works with an unfamiliar codebase. They don’t read every file. They look at the directory structure, pick likely entry points, read targeted sections, and build an understanding incrementally. Sub-agents externalize and automate that process.
The Orchestrator’s Responsibilities
The orchestrator has three jobs:
- Maintain goal focus. It knows the specific question being answered—“where is user input validated?” or “how does the payment flow work?”—and uses that to filter what’s worth investigating.
- Dispatch tasks. Based on what it knows and what’s still unclear, it decides which sub-agent task to run next and on which file or directory.
- Synthesize results. It reads incoming summaries, updates its working model of the codebase, and determines when it has enough to answer the goal.
Use Claude 3.5 Sonnet or Claude 3.7 Sonnet for orchestrator work. These models handle multi-step planning reliably and maintain coherent reasoning across many iterations. The orchestrator is not the place to cut costs—weak reasoning here degrades the entire analysis.
What Sub-Agents Handle
Sub-agents handle reading and extraction. Common task types:
- File summarization — Describe what a module does, what it exports, what it depends on
- Pattern search — Find all functions matching a description (“functions that write to the database”)
- Dependency mapping — Trace what a file imports and where those imports come from
- Schema extraction — Parse database model files and describe data structures and relationships
- Endpoint cataloging — List API routes with method, path, and a one-sentence description
- Security pattern detection — Flag input validation, authentication checks, or suspicious patterns
Each task is narrow enough that the sub-agent doesn’t need to understand the larger system. It just needs to read accurately and return useful output.
Choosing Models: Spend Smart, Not Uniformly
This is where the economics of multi-agent workflows become concrete. Not every step needs the same model, and matching model to task makes a significant cost difference.
Sub-agents are doing extraction and summarization—accurate reading, structured output, nothing more. Claude 3 Haiku or Claude 3.5 Haiku handle this well. They’re fast, cheap, and reliable for structured extraction tasks. Using Haiku instead of Sonnet for extraction work typically reduces token costs by 10–20x. If you’re running 20 sub-agent calls to analyze a codebase, the difference is substantial.
The orchestrator requires genuine reasoning: interpreting summaries, connecting architectural patterns, deciding what’s still unknown, and producing coherent output. This is where you use Claude 3.5 Sonnet or Claude 3.7 Sonnet. For very complex architectural analysis of large, unfamiliar systems, Claude 3 Opus is worth considering.
The rule: match the model to the cognitive complexity of the task, not the importance of the step. Extraction is not cognitively complex—it just needs to be accurate. Reasoning about what the extracted information means is cognitively complex.
There’s also a latency argument. Haiku responds significantly faster than Sonnet. When sub-agents run in parallel, the total analysis time is bounded by the slowest sub-agent call. Faster models at the worker level compound across batches.
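One way to encode this split is a simple task-to-model routing table. This is a sketch in Python; the model IDs are illustrative aliases and should be checked against Anthropic’s current model list:

```python
# Map each task type to the cheapest model that handles it reliably.
# Model IDs are illustrative; verify against Anthropic's current model list.
MODEL_FOR_TASK = {
    "summarize_file": "claude-3-5-haiku-latest",   # extraction: cheap and fast
    "search_pattern": "claude-3-5-haiku-latest",
    "trace_imports": "claude-3-5-haiku-latest",
    "orchestrate": "claude-3-5-sonnet-latest",     # reasoning: capable model
}

def pick_model(task_type: str) -> str:
    """Return the model for a task, defaulting to the cheap extraction model."""
    return MODEL_FOR_TASK.get(task_type, "claude-3-5-haiku-latest")
```

The default falls back to the cheap model deliberately: an unrecognized task type is more likely to be extraction work than orchestration work.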
Setting Token Budgets for Sub-Agent Outputs
Even cheap models can return verbose output if you don’t constrain it. A sub-agent returning 2,000 tokens to summarize a 200-line file isn’t saving much context space. The goal is compression: dense, relevant output.
Enforce length limits explicitly in the prompt. “Your response must not exceed 200 tokens. Do not reproduce any source code. Use short phrases rather than full sentences wherever possible.” Structured output formats—JSON with predefined keys—work better than open-ended prose because they constrain verbosity and make orchestrator parsing deterministic.
Designing Sub-Agent Prompts That Return Clean Summaries
The orchestrator’s reasoning quality depends directly on sub-agent summary quality. Most failures trace back to vague prompts, inconsistent output formats, or missing length constraints.
A reliable sub-agent prompt has four components:
A precise task definition. Not “analyze this file” but “identify: (a) the module’s primary responsibility, (b) all exported functions or classes, (c) external dependencies imported, (d) any authentication or input validation logic.”
A fixed output format. JSON is usually best because it’s easy to parse programmatically. Here’s a template that works well in practice:
{
  "module_purpose": "one sentence",
  "exports": ["function or class names"],
  "key_dependencies": ["external libraries or internal modules"],
  "security_patterns": ["auth checks, validation, or anomalies—max 3"],
  "worth_investigating_further": true
}
An explicit length constraint. Specify a token cap and instruct the model to prefer brevity.
Only the relevant content. Pass the sub-agent the specific file or function it needs to summarize, nothing more. If a file is large, split it and send sections to separate sub-agents.
The worth_investigating_further field is small but useful. It lets the sub-agent signal relevance back to the orchestrator, so the orchestrator doesn’t have to reason carefully about every summary to decide what matters.
Here’s a complete example prompt:
You are a code analysis assistant. Read the following Python file and return ONLY a JSON object with this structure:
{
  "module_purpose": "one sentence",
  "exports": ["list of function/class names"],
  "key_dependencies": ["external libraries or internal modules imported"],
  "security_patterns": ["auth checks, validation, anomalies—max 3 items"],
  "worth_investigating_further": true or false
}
Do not reproduce any source code. Keep each field brief. Use null or empty list if nothing to report.
Response must not exceed 200 tokens.
FILE:
[file contents here]
This takes a source file, extracts what matters, and returns something the orchestrator can act on without needing to read the source.
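Wrapping that prompt in code is mostly string assembly plus a validation step before the reply reaches the orchestrator. A minimal Python sketch, with the model call itself left out (any client that sends the prompt and returns text will do):

```python
import json

# Field names match the JSON template from the prompt above.
REQUIRED_KEYS = {
    "module_purpose", "exports", "key_dependencies",
    "security_patterns", "worth_investigating_further",
}

PROMPT_TEMPLATE = """You are a code analysis assistant. Read the following Python file \
and return ONLY a JSON object with keys: module_purpose, exports, key_dependencies, \
security_patterns, worth_investigating_further.
Do not reproduce any source code. Keep each field brief.
Response must not exceed 200 tokens.

FILE:
{file_contents}"""

def build_subagent_prompt(file_contents: str) -> str:
    """Assemble the sub-agent prompt for one file."""
    return PROMPT_TEMPLATE.format(file_contents=file_contents)

def parse_summary(raw: str) -> dict:
    """Validate a sub-agent reply before it reaches the orchestrator."""
    summary = json.loads(raw)  # raises on non-JSON replies
    missing = REQUIRED_KEYS - summary.keys()
    if missing:
        raise ValueError(f"summary missing keys: {missing}")
    return summary
```

Rejecting malformed replies here, rather than letting the orchestrator see them, keeps the reasoning loop’s input uniform.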
Implementing the Orchestrator Loop Step by Step
Here’s how to assemble the system, regardless of the framework or platform.
Step 1: Define a specific analysis goal.
Start with a question, not “analyze everything.” Good goals:
- “Map the authentication flow from login request to session creation.”
- “Identify all external API calls and what services they contact.”
- “Understand how database queries are constructed and whether any are vulnerable to injection.”
A specific goal gives the orchestrator a clear exit condition. Vague goals produce vague analyses.
Step 2: Build a lightweight codebase index.
Before running any agents, generate a file index—a list of paths and rough sizes. You can do this with a simple directory listing tool or script. The orchestrator uses this to plan what to look at without loading any source code upfront.
Don’t feed the entire index to the orchestrator at once if the codebase is large. Feed it the files that match likely-relevant directories, or let it request sections of the index as needed.
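A directory walk is enough for this index. The sketch below records paths and line counts while skipping directories that are rarely worth an agent’s attention (the skip list is an assumption—tune it per project):

```python
import os

def build_index(root: str, skip_dirs=("node_modules", ".git", "vendor")):
    """List file paths and line counts without loading contents into any agent."""
    index = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    lines = sum(1 for _ in f)
            except OSError:
                continue  # unreadable file: leave it out of the index
            index.append({"path": os.path.relpath(path, root), "lines": lines})
    return index
```

Line counts matter later: they tell the orchestrator which files will need chunking before a sub-agent can read them.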
Step 3: Create sub-agent task templates.
Build a small library of task types—file summary, pattern search, dependency trace, endpoint catalog—each with a prompt template and a slot for the target content. Consistent templates mean consistent output formats, which makes orchestrator parsing reliable across many iterations.
Step 4: Configure the orchestrator reasoning loop.
The orchestrator prompt should make the loop structure explicit:
Goal: [ANALYSIS GOAL]
What you've learned so far:
[ACCUMULATED SUMMARIES]
Available tasks:
- summarize_file(path)
- search_pattern(directory, description)
- trace_imports(file_path)
What is the next task to run? If you have enough to answer the goal, respond with DONE followed by your analysis.
This format keeps the orchestrator goal-focused and gives it a natural exit condition.
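The loop driver then only needs to distinguish a DONE response from a task call. A sketch of that dispatcher, assuming the orchestrator emits calls in the `task_name(args)` shape shown above:

```python
import re

# Task names mirror the "Available tasks" list in the orchestrator prompt.
TASK_CALL = re.compile(r"^(summarize_file|search_pattern|trace_imports)\((.*)\)$")

def parse_orchestrator_reply(reply: str):
    """Return ("done", analysis) or ("task", name, args)."""
    reply = reply.strip()
    if reply.startswith("DONE"):
        return ("done", reply[len("DONE"):].strip())
    first_line = reply.splitlines()[0] if reply else ""
    match = TASK_CALL.match(first_line)
    if not match:
        raise ValueError(f"unparseable orchestrator reply: {reply[:80]}")
    name, raw_args = match.groups()
    args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
    return ("task", name, args)
```

In practice, tool-use/function-calling APIs can replace this text parsing entirely; the regex version just makes the loop’s control flow explicit.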
Step 5: Dispatch tasks and collect results.
When the orchestrator returns a task call, run that sub-agent with the specified input. Append the structured result to the “What you’ve learned so far” section and re-run the orchestrator.
For tasks that don’t depend on each other—summarizing five separate files, for instance—run them in parallel and collect all results before the next orchestrator iteration. This is one of the bigger time savings in the pattern.
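A thread pool is sufficient for this, since sub-agent calls are I/O-bound API requests. A minimal sketch, where `run_subagent` stands in for whatever function executes one task against the model:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(tasks, run_subagent, max_workers=5):
    """Run independent sub-agent tasks in parallel; results keep task order.
    `run_subagent` is whatever callable executes one task against the model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, tasks))
```

`max_workers` doubles as a crude rate limiter: keep it at or below your API concurrency budget.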
Step 6: Manage accumulated context actively.
As the analysis runs, the “learned so far” section grows. Prune it periodically: remove summaries from files marked worth_investigating_further: false, and consolidate repeated findings into single entries. Don’t let this section develop its own context problem.
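The pruning step can be mechanical. A sketch that drops summaries flagged irrelevant and deduplicates repeat reads of the same file (assuming each summary carries a `path` field alongside the template fields):

```python
def prune_summaries(summaries: list) -> list:
    """Drop irrelevant summaries and keep one entry per file path."""
    seen_paths = set()
    kept = []
    for s in summaries:
        if not s.get("worth_investigating_further", True):
            continue  # sub-agent flagged this file as irrelevant to the goal
        path = s.get("path")
        if path is not None:
            if path in seen_paths:
                continue  # same file summarized twice: keep the first entry
            seen_paths.add(path)
        kept.append(s)
    return kept
```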
Step 7: Synthesize at completion.
When the orchestrator responds with DONE, it produces the final analysis from accumulated summaries. This is the only step where it writes free-form prose. Because it reasoned from summaries rather than raw code, the output is focused and on-point.
Building This in MindStudio
Implementing the orchestrator/sub-agent pattern in code requires real infrastructure work: routing tasks between agents, managing inter-agent state, handling retries, and keeping model configuration consistent. It’s all doable, but none of it is the interesting part of the problem.
MindStudio’s visual workflow builder handles the infrastructure so you can focus on the logic. You can build the orchestrator loop and sub-agent tasks as connected workflow steps, configure each step to use a different model, and wire structured JSON outputs from sub-agents directly into the orchestrator’s input without writing plumbing code.
The per-step model configuration is especially useful here. You can set your orchestrator step to use Claude 3.5 Sonnet and your file-reading sub-agent steps to use Claude 3 Haiku—MindStudio handles the API calls, rate limiting, and retries for each. With access to 200+ AI models including the full Claude family, you can experiment with different model pairings to find the right cost-quality balance for your specific codebase.
For teams connecting this to real repositories, MindStudio has pre-built integrations with GitHub and other development tools—so sub-agents can pull file contents from a connected repository directly rather than requiring manual file passing. Analysis outputs can go to Notion, Confluence, Slack, or any other connected tool in the same workflow.
Developers who want to call MindStudio-built workflows from external agent environments—including Claude Code, LangChain, or CrewAI pipelines—can use the Agent Skills Plugin, an npm SDK (@mindstudio-ai/agent) that exposes MindStudio workflows as typed method calls. Build and test your sub-agent workflow visually, then invoke it from wherever your analysis pipeline runs.
You can start for free at mindstudio.ai.
Common Mistakes to Avoid
Passing too much content to sub-agents. Sub-agents should receive the minimum content needed for their task. If you’re asking about one function, don’t pass the whole file. Trim inputs to the relevant section.
Not enforcing output format. Without strict format requirements, sub-agents return inconsistent results—sometimes JSON, sometimes prose, sometimes code blocks with no structure. The orchestrator then has to handle each case, introducing fragility. Specify the format and validate output before it reaches the orchestrator.
Too many orchestrator iterations. If the loop runs 30+ cycles on a simple question, the goal is too vague or sub-agents aren’t returning enough signal. Tighten the goal definition or pre-filter the file index before starting.
Skipping parallelism. File summarization tasks are almost always independent. Running them sequentially offers no benefit and wastes time. Use parallel execution for batches of non-dependent calls—it’s one of the most impactful optimizations available.
Using large models for extraction. Claude 3 Opus reading a file to list its imports is expensive and unnecessary. Save capable models for tasks that require deep reasoning.
Letting accumulated summaries grow unbounded. The orchestrator’s context can develop its own size problem if you never prune. Drop irrelevant summaries, consolidate duplicates, keep the “learned so far” section tightly focused on the goal.
Frequently Asked Questions
How many sub-agents can run in parallel?
There’s no architectural limit—it depends on your platform’s concurrency support and the API rate limits of the model. In practice, 5–10 parallel sub-agents is a solid default. For large codebases requiring hundreds of file reads, batch execution in groups and collect results before the next batch to avoid rate limit errors.
Which Claude model is best for sub-agents doing code reading?
Claude 3 Haiku and Claude 3.5 Haiku are the right defaults for extraction and summarization. They handle structured output formats reliably, respond quickly, and cost far less than Sonnet. Reserve Sonnet for sub-agent tasks that require more nuanced judgment—detecting subtle security patterns, inferring intent from ambiguous code, or analyzing complex dependency chains where a cheaper model produces unreliable results. Anthropic’s Claude model documentation has current context window sizes and pricing for each model if you want to optimize further.
What if a single file is too large for one sub-agent?
Chunk it by logical boundary—function, class, or fixed line count—and send each chunk to a separate sub-agent. Add a consolidation step: a roll-up sub-agent receives the per-chunk summaries and produces a single file-level summary. This layered pattern (file chunks → chunk summaries → file summary → orchestrator) handles files of any size without hitting context limits at any layer.
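The fixed-line-count variant is the simplest to sketch; a logical-boundary splitter (per function or class, via an AST walk) is better when the language makes it cheap:

```python
def chunk_by_lines(source: str, max_lines: int = 300):
    """Split a large file into fixed-size line chunks for separate sub-agents.
    Prefer logical boundaries (function/class) when a parser is available."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]
```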
How do I prevent sub-agents from reproducing source code in their output?
Make it explicit in the prompt: “Do not reproduce any source code. Describe what the code does, not what it says.” Pair this with a structured output format and a token limit. If a sub-agent does reproduce code, add a post-processing step that validates output length before feeding it back to the orchestrator and truncates or flags responses that exceed the limit.
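That post-processing guard can be a few lines. A sketch using a character budget as a token proxy (the 4-characters-per-token ratio is a rough rule of thumb, not an exact conversion):

```python
def enforce_output_budget(raw: str, max_chars: int = 800):
    """Truncate over-long sub-agent replies and flag them for the orchestrator.
    800 chars ~ 200 tokens under a rough 4-chars-per-token assumption."""
    if len(raw) <= max_chars:
        return raw, False
    return raw[:max_chars], True  # flagged=True: likely reproduced source code
```

The flag matters more than the truncation: a repeatedly flagged sub-agent usually means the prompt’s format constraints need tightening.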
How is this different from RAG-based code search?
Retrieval-augmented generation retrieves chunks based on semantic similarity to a query—useful for finding relevant snippets, but it can miss structural relationships in code: call chains, module boundaries, import hierarchies. The sub-agent approach is deliberate and navigational: the orchestrator decides what to look at and in what order, guided by goal-directed reasoning rather than similarity matching. The two approaches complement each other. Use RAG to help the orchestrator identify which files are worth reading; use sub-agents to do the careful reads.
Can this pattern handle monorepos with multiple services?
Yes, with an adjustment. For monorepos, build a service-level index first. Let the orchestrator identify which services are relevant to the analysis goal before building per-service file indexes. Sub-agents operate within service boundaries, and the orchestrator synthesizes across them. This keeps the orchestrator’s task space manageable even when the total codebase is enormous.
Key Takeaways
- Context limits, token cost, and context noise all undermine single-agent codebase analysis—sub-agents address all three by keeping raw code out of the orchestrator’s context window.
- The orchestrator reasons and decides; sub-agents read and extract. Use different Claude models for each role: fast and cheap for extraction, capable for reasoning.
- Structured JSON output formats and explicit length limits are essential for keeping sub-agent responses compact and consistently parseable.
- Run independent sub-agent tasks in parallel—it’s the most impactful performance optimization in this pattern.
- Actively prune the orchestrator’s accumulated context as the analysis progresses. Drop irrelevant summaries and consolidate repeated findings.
- For files that exceed sub-agent context limits, chunk by logical boundary and use a consolidation agent to roll up chunk summaries.
If you want to build this kind of multi-agent workflow without assembling the infrastructure from scratch, MindStudio handles the orchestration layer visually and supports per-step model configuration out of the box—so you can focus on the analysis logic rather than the plumbing.