Claude Code MCP Servers and Token Overhead: What You Need to Know
Each connected MCP server loads tool definitions into every message, costing up to 18,000 tokens per turn. Here's how to audit and reduce that overhead.
The Hidden Cost Lurking in Your Claude Code Setup
If you’ve been using Claude Code with multiple MCP servers connected, you may have noticed your context window filling up faster than expected, responses getting slower, or API costs climbing without an obvious explanation. The culprit is often sitting right in your configuration file, invisible until you start looking at raw token counts.
Every MCP server you connect to Claude Code doesn’t just sit idle. It injects its full tool schema — every tool name, description, and parameter definition — into the context of every single message you send. That overhead stacks up fast, and it applies whether you use those tools or not during a given conversation.
Understanding how Claude Code MCP token overhead works, how to measure it, and how to reduce it is one of the highest-leverage optimizations available to anyone building or working with AI agents today.
How MCP Servers Actually Load Into Claude Code
MCP, short for Model Context Protocol, is an open standard developed by Anthropic that lets AI models connect to external tools and data sources through a standardized interface. It’s designed to make Claude aware of what it can do — search the web, read files, query a database, call an API.
But there’s a mechanical reality to how that awareness works.
When Claude Code initializes a session and you have MCP servers configured, it fetches the tool list from each connected server. That list gets serialized into a structured schema — essentially a JSON blob describing every available tool, its purpose, and the shape of its inputs and outputs. That schema is then prepended to your context.
And it happens on every turn, not just once at the start.
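Concretely, each tool travels as a JSON object with a name, a description, and a JSON Schema for its inputs. A minimal sketch of the shape (this particular tool is invented for illustration; real definitions follow the same pattern):

```python
import json

# Illustrative single MCP tool definition, shaped like an entry in a
# tools/list response. The tool itself is made up for this example.
tool = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to read"},
        },
        "required": ["path"],
    },
}

serialized = json.dumps(tool)
# Rough heuristic: about 4 characters per token.
approx_tokens = len(serialized) // 4
print(approx_tokens)
```

Even this deliberately terse definition lands in the dozens of tokens; production tools with richer descriptions and more parameters cost several times as much.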
Why Every Message Carries the Full Overhead
You might expect Claude to load tool definitions once and cache them. In practice, the Messages API is stateless: tool definitions are sent with each API call. The model doesn't carry state between turns the way a human conversation does. Each message is a fresh API request that includes the full conversation history plus the full tool schema.
This means if your connected MCP servers have 50 tools between them, each with a moderately verbose description and parameter schema, you’re paying that token cost every single time you send a message.
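A sketch of why the cost recurs, using dictionaries shaped like Anthropic Messages API request bodies (the model name and tool list here are stand-ins):

```python
# Each turn's request carries the SAME full tool list plus the whole
# history so far; nothing persists server-side between turns.
# Tool list and model name are placeholders for illustration.
tools = [{"name": f"tool_{i}", "description": "x" * 400} for i in range(50)]

def build_request(history, user_msg):
    return {
        "model": "claude-sonnet-placeholder",
        "tools": tools,  # resent in full on every call
        "messages": history + [{"role": "user", "content": user_msg}],
    }

history = []
turn1 = build_request(history, "first message")
history = turn1["messages"] + [{"role": "assistant", "content": "..."}]
turn2 = build_request(history, "second message")

# Both turns carry the identical tool payload.
assert turn1["tools"] is turn2["tools"]
```

The history grows each turn, but the tool payload is a fixed tax paid on top of it every single time.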
What a Tool Definition Actually Costs
A single tool definition in an MCP schema typically runs between 100 and 500 tokens, depending on how verbose the descriptions are. A server with 10 well-documented tools could cost 1,500–3,000 tokens per turn. A server with 30 tools, rich descriptions, and complex parameter schemas? That can easily hit 5,000–8,000 tokens on its own.
Connect three or four such servers, and you’re looking at 12,000–20,000 tokens of tool overhead before you’ve typed a single word of your actual query.
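To put a dollar figure on that, here is the back-of-the-envelope arithmetic, assuming an illustrative input price of $3 per million tokens (check current Anthropic pricing for real numbers):

```python
# Session cost from tool overhead alone. The $3-per-million-input-tokens
# price is illustrative, not a quoted rate.
overhead_tokens = 18_000
messages_per_session = 50
price_per_million = 3.00

session_cost = overhead_tokens * messages_per_session * price_per_million / 1_000_000
print(f"${session_cost:.2f} per session just for tool definitions")
# prints: $2.70 per session just for tool definitions
```

Multiply that by a team of developers running many sessions a day and the overhead becomes a visible line item.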
Where the 18,000-Token Figure Comes From
The 18,000-token figure you’ll see cited for MCP overhead isn’t a worst-case scenario — it’s a realistic mid-range estimate for a developer who has assembled a practical working setup.
Consider a common Claude Code configuration:
- Filesystem MCP server — tools for reading, writing, listing, and searching files. 8–12 tools, ~1,500 tokens.
- Git MCP server — tools for diff, log, commit, branch operations. 15–20 tools, ~3,000 tokens.
- Database MCP server — tools for querying, schema inspection, and writing. 10–15 tools, ~2,500 tokens.
- Web search MCP server — tools for search, fetch, and content extraction. 5–8 tools, ~1,200 tokens.
- Slack or communication MCP server — tools for reading channels, posting, listing members. 10–15 tools, ~2,000 tokens.
- Custom internal tooling — varies widely, but often adds another 5,000–8,000 tokens.
Add those together and you’re comfortably in the 15,000–20,000 token range. Every message you send in that session carries that payload.
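The arithmetic, using midpoints of the estimates above:

```python
# Midpoint token estimates from the server list above.
server_overhead = {
    "filesystem": 1_500,
    "git": 3_000,
    "database": 2_500,
    "web_search": 1_200,
    "slack": 2_000,
    "custom_internal": 6_500,  # midpoint of the 5,000-8,000 range
}

total = sum(server_overhead.values())
print(total)  # 16700 tokens of overhead on every message
```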
The Compounding Effect on Longer Conversations
Early in a conversation, 18,000 tokens of overhead is painful but manageable — Claude’s context window is large enough to absorb it. The real problem emerges as the conversation grows.
Claude Code’s context window (using Claude Sonnet or Opus models) is 200,000 tokens. But when 18,000 of those are consumed by tool definitions, your effective working space for conversation history, code, file contents, and reasoning is already reduced by nearly 10%. As the session continues and you add more file reads, code outputs, and back-and-forth exchanges, you approach limits sooner than expected.
This leads to truncated conversations, unexpected context loss, and — if you’re on a usage-based billing plan — meaningfully higher costs.
Auditing Your Current MCP Token Overhead
Before you can reduce overhead, you need to know exactly how much you’re carrying. Here’s a practical audit process.
Step 1: List Every Connected Server
Open your Claude Code configuration. MCP servers are typically defined in ~/.claude.json (user scope) or in a .mcp.json at the project root (project scope); the exact location varies by Claude Code version. Look for the mcpServers key.
Write down every server listed. For each one, note:
- The server name
- Whether you actually use it regularly
- What category of tools it provides
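This step is easy to script. A sketch, with the caveat that the config location varies by Claude Code version (the inline sample config here stands in for your real file):

```python
import json
from pathlib import Path

# Audit step 1: list configured MCP servers. The path you pass is an
# assumption -- depending on your Claude Code version, servers may live
# in ~/.claude.json or in a project's .mcp.json.
def list_mcp_servers(config_path):
    config = json.loads(Path(config_path).read_text())
    return sorted(config.get("mcpServers", {}).keys())

# Demonstrated against an inline sample instead of a real file:
sample = {"mcpServers": {"filesystem": {}, "git": {}, "postgres": {}}}
print(sorted(sample["mcpServers"].keys()))  # ['filesystem', 'git', 'postgres']
```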
Step 2: Count the Tools Per Server
For each server, start it locally and call its tools/list method directly. The tools/list request is part of the MCP specification, so every compliant server responds to it with its full tool schema. If you're using the Claude Desktop app alongside Claude Code, you can also see connected servers in the settings panel.
Count the number of tools. Then read through the descriptions — verbose, multi-sentence tool descriptions cost significantly more than concise one-liners.
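The tools/list call is plain JSON-RPC. After the initialize handshake, the request you send over the server's stdio or HTTP transport looks like this:

```python
import json

# JSON-RPC request for the MCP tools/list method. In practice this is
# sent over the server's transport AFTER the initialize handshake;
# shown here only to illustrate the message shape.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}
print(json.dumps(request))

# The response carries the schema you pay for each turn:
# {"jsonrpc": "2.0", "id": 1, "result": {"tools": [ ...definitions... ]}}
```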
Step 3: Estimate Token Cost
A rough formula:
tokens ≈ (number_of_tools × 200) + (total_character_count_of_all_descriptions ÷ 4)
This is an approximation, but it gets you close enough to prioritize. If you want an exact count, serialize the tool schema and run it through Anthropic's token-counting API (the count_tokens endpoint accepts tool definitions and reports precise counts for Claude models).
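The formula as code, applied to a hypothetical 12-tool server:

```python
# Rough token estimate for a server's tool schema, per the formula above.
def estimate_tokens(num_tools, total_description_chars):
    return num_tools * 200 + total_description_chars // 4

# Hypothetical server: 12 tools, roughly 150 characters of description each.
print(estimate_tokens(12, 12 * 150))  # 2850
```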
Step 4: Identify Your Biggest Offenders
Sort your servers by estimated token cost. You’ll almost always find that two or three servers account for the majority of overhead. Those are your targets.
Strategies to Reduce MCP Token Overhead
Once you know what’s costing you, you have several options. The right approach depends on your workflow.
Remove Servers You Don’t Use Regularly
This sounds obvious, but many developers accumulate MCP servers over time and never prune them. If you connected a GitHub server six months ago to try it out and haven’t used it since, it’s still loading into every session.
Do a ruthless audit: for any server you haven’t actually invoked a tool from in the past two weeks, remove it from your default configuration. You can always re-add it for specific projects.
Use Project-Level Configurations Instead of Global Ones
Claude Code supports both global and project-level MCP configurations. Rather than loading every server you’ve ever configured for every project, maintain a lean global configuration and extend it per project.
For a frontend project, you probably need filesystem and possibly a package registry tool. You almost certainly don’t need a database introspection server. Keeping that separation dramatically reduces overhead for the majority of your work.
Trim Tool Descriptions at the Source
If you control or can fork an MCP server, look at how verbose its tool descriptions are. Many open-source MCP servers have descriptions written for maximum human readability — long, thorough, full of examples. Those descriptions were written to be helpful, but they cost tokens.
A description like:
“Reads the complete contents of a file at the specified path. Supports relative and absolute paths. Returns the file content as a string. Useful when you need to inspect, analyze, or modify the contents of a file in your project.”
…can often be shortened to:
“Read file contents at path. Returns content as string.”
That change alone on a single tool might save 60–80 tokens. Across 50 tools, that’s 3,000–4,000 tokens per turn.
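You can check the arithmetic with the chars-divided-by-4 heuristic from earlier (a real tokenizer will give somewhat different numbers):

```python
# Compare rough token costs of the verbose vs. trimmed description
# using the chars/4 heuristic. A real tokenizer will differ somewhat.
verbose = (
    "Reads the complete contents of a file at the specified path. "
    "Supports relative and absolute paths. Returns the file content as "
    "a string. Useful when you need to inspect, analyze, or modify the "
    "contents of a file in your project."
)
trimmed = "Read file contents at path. Returns content as string."

saved = len(verbose) // 4 - len(trimmed) // 4
print(saved)  # tokens saved on this one description, by the heuristic
```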
Implement Tool Filtering or Lazy Loading
Some more advanced MCP server implementations support filtering — only exposing a subset of tools based on configuration flags or environment variables. If you have a server with 30 tools but only regularly use 8 of them, see if you can configure it to expose only those 8.
This requires looking at the specific server’s documentation or source code. Not all servers support it, but it’s worth checking for your heaviest-overhead sources.
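If you end up forking a server to add filtering yourself, the layer is typically small. A hypothetical sketch, where the ALLOWED_TOOLS environment variable and the tool names are invented for illustration:

```python
import os

# Hypothetical filtering layer for a forked MCP server: expose only the
# tools named in an ALLOWED_TOOLS env var. The variable name and tools
# are made up; real servers will differ.
ALL_TOOLS = [
    {"name": "read_file", "description": "Read a file."},
    {"name": "write_file", "description": "Write a file."},
    {"name": "search_code", "description": "Search the codebase."},
]

def exposed_tools():
    allowed = os.environ.get("ALLOWED_TOOLS")
    if not allowed:
        return ALL_TOOLS  # no filter configured: expose everything
    names = {n.strip() for n in allowed.split(",")}
    return [t for t in ALL_TOOLS if t["name"] in names]

os.environ["ALLOWED_TOOLS"] = "read_file,search_code"
print([t["name"] for t in exposed_tools()])  # ['read_file', 'search_code']
```

The filtered server still works normally; it simply never advertises the tools you excluded, so they never cost you context.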
Separate Agents for Separate Tool Sets
One of the more effective architectural approaches is to stop trying to give a single Claude Code session access to every possible tool. Instead, break your workflow into focused agents, each with a minimal, relevant tool set.
An agent handling code review doesn’t need web search tools. An agent handling customer data doesn’t need filesystem tools. Matching tool sets to tasks is both a performance optimization and a security improvement.
How MindStudio Handles Tool Overhead Differently
If you’re building AI agents that need to call many different capabilities — web search, email, image generation, data lookups, workflow triggers — the MCP token overhead problem gets worse as you scale. More capabilities mean more tool definitions, more overhead, more cost.
MindStudio takes a different approach with its Agent Skills Plugin. Instead of loading all tool definitions into the context, the SDK exposes capabilities as typed method calls in your agent code. When your agent needs to search the web or send an email, it calls agent.searchGoogle() or agent.sendEmail() directly rather than advertising the entire capability set to the language model through tool schemas.
This keeps your context window clean. The model focuses on reasoning and decision-making. The infrastructure layer — rate limiting, retries, auth, and execution — is handled outside the prompt entirely.
For teams working with Claude Code and finding that MCP overhead is eating into usable context, using MindStudio’s agent architecture for capability-heavy tasks can be a meaningful optimization. You keep Claude Code for the reasoning and coding tasks it’s best at, while offloading execution to an agent infrastructure that doesn’t inflate your token counts.
You can try MindStudio free at mindstudio.ai — no API keys required, and the Agent Skills Plugin is available on npm as @mindstudio-ai/agent.
Practical Configuration: A Leaner MCP Setup
Here’s what a more intentional Claude Code MCP configuration looks like in practice.
Global Configuration (Minimal)
Your user-scoped configuration (typically ~/.claude.json; the exact file varies by Claude Code version) should contain only what you genuinely use in almost every project:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
    }
  }
}
```
One server. Just filesystem access. That’s your baseline.
Project-Level Extensions
For specific projects, add a .mcp.json at the project root (the project-scoped configuration Claude Code picks up automatically):

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```

Note that the reference Postgres server takes its connection string as a command-line argument.
Now that database server only loads when you’re working in that specific project directory. Every other session stays lean.
Monitoring Token Usage Over Time
Build a habit of periodically checking your token consumption. If you’re on the Claude API directly, log the usage field from API responses and track how much goes to input tokens over time. A sudden jump in input tokens without a corresponding increase in conversation length is a signal that your tool overhead has grown.
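A sketch of that habit, processing usage objects shaped like the Messages API's usage field (the sample numbers are invented):

```python
# Track input-token usage across logged API responses. The usage field
# shape (input_tokens / output_tokens) matches the Anthropic Messages
# API; the sample numbers below are invented.
logged_usage = [
    {"input_tokens": 21_000, "output_tokens": 900},
    {"input_tokens": 24_500, "output_tokens": 1_100},
    {"input_tokens": 43_000, "output_tokens": 800},  # sudden jump: investigate
]

baseline = logged_usage[0]["input_tokens"]
for i, usage in enumerate(logged_usage):
    ratio = usage["input_tokens"] / baseline
    flag = "  <-- overhead jump?" if ratio > 1.5 else ""
    print(f"turn {i}: {usage['input_tokens']} input tokens{flag}")
```

A jump like the third entry, without a matching growth in conversation length, is the signature of new tool overhead.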
Frequently Asked Questions
How many tokens do MCP tool definitions actually use per server?
It varies significantly based on the server, but a typical MCP server with 10–15 tools averages 1,500–4,000 tokens per turn. Servers with extensive documentation, complex parameter schemas, or many tools can exceed 5,000–8,000 tokens. To get an exact count for your specific setup, serialize your tool schema and run it through a tokenizer.
Does Claude Code cache MCP tool definitions between messages?
Not in a way that removes the overhead. Each API call to Claude includes the full tool schema alongside the message and conversation history. Anthropic's prompt caching can reduce what you are billed for those repeated tokens, but cached tokens still occupy space in the context window, so every message carries the full context cost of your tool definitions.
Can I use MCP servers without loading all their tools?
Some MCP servers support filtering or partial tool exposure through configuration. Check the documentation for any server you’re using. If the server is open source, you can often fork it and remove tools you don’t need, or add a filtering layer based on environment variables.
Why does having more MCP servers slow down responses?
It’s not just the token cost — it’s also the initialization overhead. When Claude Code starts a session, it connects to and fetches the tool list from every configured server. If any server has slow startup time or network latency, your session initialization slows down. More servers means more initialization round trips.
What’s the difference between global and project-level MCP configuration in Claude Code?
Global (user-scoped) configuration, typically in ~/.claude.json, applies to all Claude Code sessions. Project-level configuration, typically a .mcp.json at the project root, applies only when you're working within that directory. Using project-level configs lets you keep your global setup lean while adding specialized servers only where needed.
Does reducing MCP token overhead actually save money?
Yes, directly. On usage-based billing for Claude’s API, input tokens cost money. If you’re sending 18,000 tokens of tool overhead on every message in a long session, and you have a team of developers doing this all day, the cost accumulates. Trimming that to 3,000–5,000 tokens through better configuration is a real reduction in API spend.
Key Takeaways
- MCP tool definitions load into the context of every message, not just once per session.
- A typical multi-server setup can add 15,000–20,000 tokens of overhead per turn.
- The overhead reduces your effective working context and increases API costs proportionally.
- Audit your connected servers — most developers have MCP servers loaded that they rarely use.
- Use project-level configurations to only load servers relevant to the current task.
- Trim verbose tool descriptions in servers you control to reduce per-tool token cost.
- Consider agent architectures that handle capability execution outside the prompt context for heavy-use scenarios.
MCP is genuinely useful — connecting Claude Code to external tools and data sources is one of the things that makes it powerful for real development workflows. The goal isn’t to avoid MCP, it’s to use it intentionally. An audited, right-sized configuration gives you the connectivity you need without burning context on tools you’re not using.
If you’re building agents that need broad capabilities at scale, exploring how MindStudio’s infrastructure handles that execution layer separately from the reasoning layer is worth your time. MindStudio’s platform is free to start and designed to work alongside tools like Claude Code rather than replace them.