
Claude Code MCP Servers and Token Overhead: What You Need to Know

Every connected MCP server loads its tool definitions into every message you send; with several servers connected, that overhead can reach 18,000 tokens per turn. Here's how to audit and reduce it.

MindStudio Team

The Hidden Cost of Connecting Tools to Claude Code

If you’ve been using Claude Code with multiple MCP servers and wondering why your API costs are climbing or why responses feel slower, tool definitions are likely the culprit. Each connected MCP server injects its tool schemas into every single message you send — and that overhead adds up fast.

Claude Code MCP server token overhead is one of the most underappreciated cost drivers in AI development workflows today. A single MCP server with a modest set of tools can consume 2,000–5,000 tokens per turn. Connect three or four servers, and you’re looking at 10,000–18,000 tokens of overhead on every message before Claude processes a single word of your actual prompt.

This post explains how that overhead works, how to measure it, and what you can do to cut it down without sacrificing the integrations you actually need.


How MCP Servers Load Tool Definitions

The Model Context Protocol (MCP) is Anthropic’s open standard for connecting AI models to external tools and data sources. When Claude Code starts a session with MCP servers configured, it queries each server for its list of available tools and loads those definitions into the system prompt or context window.

Each tool definition includes:

  • A name (usually something like read_file, search_web, query_database)
  • A description (sometimes a full paragraph explaining what the tool does and when to use it)
  • An input schema (a JSON Schema object describing each parameter, its type, whether it’s required, and what it does)

The input schema is where things get expensive. A well-documented tool might have five or six parameters, each with its own description string. Multiply that across a dozen tools on one server, and you can easily hit 3,000–4,000 tokens for that server alone.
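To make that concrete, here is a hypothetical tool definition shaped like an entry in a server's tools/list response, sized with the common rough heuristic of about four characters per token (the tool name, descriptions, and schema are invented for illustration):

```python
import json

# Hypothetical tool definition, shaped like one entry in an MCP tools/list response.
tool = {
    "name": "query_database",
    "description": "Run a read-only SQL query against the configured database.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "The SQL statement to execute."},
            "timeout_ms": {"type": "integer", "description": "Query timeout in milliseconds."},
            "max_rows": {"type": "integer", "description": "Maximum number of rows to return."},
        },
        "required": ["sql"],
    },
}

# Rough estimate: English/JSON text averages about 4 characters per token.
estimated_tokens = len(json.dumps(tool)) // 4
print(f"~{estimated_tokens} tokens for one modestly documented tool")
```

A single three-parameter tool already lands in the low hundreds of tokens; a dozen such tools per server is how the per-server totals above accumulate.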

Why This Happens on Every Turn

This isn’t a one-time cost. Claude doesn’t cache tool definitions between messages in the same way a traditional application might cache configuration. Each API call to Claude includes the full tool list because the model needs to “see” those definitions to reason about whether and how to call them.

In a multi-turn conversation, you pay this overhead on every single turn — not just the first one. A 20-turn debugging session with 15,000 tokens of MCP overhead per turn costs you 300,000 tokens just in tool definitions, regardless of what you’re actually asking Claude to do.

What 18,000 Tokens Actually Means

For context: 18,000 tokens is roughly 13,500 words — about the length of a long research paper. That’s what some teams are loading into every Claude Code message just to have their tools available, even when Claude never ends up using most of them.

At Claude 3.5 Sonnet pricing (around $3 per million input tokens as of mid-2025), 18,000 tokens of overhead per message costs about $0.054 per turn. At 100 turns a day, that’s $5.40/day or ~$162/month in pure overhead — before accounting for your actual prompts and responses.
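The arithmetic above can be sketched as a small cost model (the $3-per-million-input-tokens price is the assumption stated above; substitute your model's current rate):

```python
# Back-of-envelope cost model for MCP tool-definition overhead,
# assuming $3 per million input tokens.
PRICE_PER_INPUT_TOKEN = 3 / 1_000_000  # USD

def overhead_cost(tokens_per_turn: int, turns: int) -> float:
    """Dollar cost of tool-definition overhead across a number of turns."""
    return tokens_per_turn * turns * PRICE_PER_INPUT_TOKEN

per_turn = overhead_cost(18_000, 1)          # one message's overhead
per_month = overhead_cost(18_000, 100 * 30)  # 100 turns/day for 30 days
print(f"${per_turn:.3f} per turn, ~${per_month:.0f}/month")
```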


Auditing Your Current MCP Token Overhead

Before you can reduce overhead, you need to know how much you’re actually paying. Here’s how to measure it.

Step 1: List Your Connected MCP Servers

In Claude Code, your MCP servers are configured in your project settings or .claude/settings.json file. Start by listing every server you have connected:

jq '.mcpServers' ~/.claude/settings.json

or check your project-level config:

cat .claude/settings.json

Write down every server — even ones you rarely use. Forgotten servers still load their tool definitions.
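If you want to script this audit, a small helper can pull the server list from either config location. The paths and the mcpServers key follow the layout described above; adjust them if your Claude Code version stores MCP config elsewhere:

```python
import json
from pathlib import Path

def list_mcp_servers(config_path: str) -> list[str]:
    """Return the names of MCP servers declared in a Claude Code config file."""
    path = Path(config_path).expanduser()
    if not path.exists():
        return []
    config = json.loads(path.read_text())
    return sorted(config.get("mcpServers", {}).keys())

# Check both the user-level and project-level configs.
for cfg in ("~/.claude/settings.json", ".claude/settings.json"):
    print(cfg, "->", list_mcp_servers(cfg))
```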

Step 2: Measure Each Server’s Tool Definition Size

For each MCP server, you can inspect the tools it exposes by running the server in isolation and capturing its tools/list response. Most MCP servers support this via their CLI or a simple initialization call.

A quick way to estimate: count the number of tools a server exposes and multiply by an average of 300–800 tokens per tool. Servers with rich documentation or complex schemas will be at the higher end.
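That rule of thumb reduces to a one-line calculation (the 12-tool server below is hypothetical):

```python
# Quick per-server estimate: tool count x average tokens per tool
# (300 for terse schemas, 800 for heavily documented ones).
def estimate_server_overhead(tool_count: int, avg_tokens_per_tool: int = 500) -> int:
    return tool_count * avg_tokens_per_tool

# A hypothetical 12-tool server with fairly verbose documentation:
print(estimate_server_overhead(12, avg_tokens_per_tool=700))  # 8400
```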

Tools that commonly bloat token counts:

  • Database query tools — parameter descriptions for SQL dialects, table names, filter options
  • File system tools — path handling, encoding options, permission flags
  • API integration tools — endpoint-specific parameters for REST APIs with many options
  • Search tools — query syntax documentation embedded in the schema description

Step 3: Calculate Total Per-Turn Overhead

Add up the estimated token cost for every server’s tool definitions. That’s your baseline overhead per turn. If you’re already using Claude Code’s --verbose flag or logging API calls, you can also check the usage.input_tokens in API responses and compare a session with MCP servers against one without.
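The with-vs-without comparison might look like this, where the per-turn counts come from logged usage.input_tokens values (the numbers below are made up for illustration):

```python
# Per-turn input token counts from two otherwise-identical sessions:
# one with MCP servers connected, one without. In practice these values
# come from usage.input_tokens in logged Anthropic API responses.
with_mcp = [21_400, 22_100, 23_800]
without_mcp = [6_200, 7_050, 8_400]

overhead_per_turn = [w - wo for w, wo in zip(with_mcp, without_mcp)]
avg_overhead = sum(overhead_per_turn) / len(overhead_per_turn)
print(f"~{avg_overhead:,.0f} tokens of MCP overhead per turn")
```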


Why Token Overhead Compounds in Agentic Workflows

In simple back-and-forth conversations, high token overhead is annoying but manageable. In agentic workflows — where Claude Code runs autonomously through many steps — it becomes a serious cost multiplier.

Multi-Step Tasks Pay the Tax Repeatedly

If you’re using Claude Code to complete a complex task like refactoring a codebase, writing and running tests, or orchestrating a deployment, that might involve 50–100 agentic turns. At 15,000 tokens of MCP overhead per turn, you’re looking at 750,000–1,500,000 tokens of pure overhead for a single task.

That’s not just expensive — it can push long tasks into the territory where you hit context limits, especially if you’re also accumulating conversation history.

Tool Noise Degrades Reasoning Quality

There’s another cost beyond dollars: having too many tools visible can actually hurt Claude’s reasoning. When Claude sees 50+ tools from multiple servers, it has to reason through a larger solution space on every step. This can lead to suboptimal tool choices, increased latency, and occasionally incorrect tool selection.

Research on large language models and tool use consistently shows that models perform better with focused, relevant tool sets rather than large, general-purpose ones. Fewer options means faster, more accurate decisions.


Strategies to Reduce MCP Token Overhead

Once you’ve audited your overhead, there are several concrete ways to reduce it.

1. Disconnect Servers You Don’t Need for the Current Task

The most effective optimization is also the simplest: don’t connect servers you aren’t using for a specific task. Claude Code supports project-level MCP configuration, so you can create different config files for different workflows.

For example:

  • A settings.json for code review tasks that only connects your GitHub and code analysis servers
  • A separate config for data work that connects your database and data visualization servers
  • A minimal config for writing tasks that strips out everything except a search tool

Switching between configs takes seconds and can reduce your per-turn overhead by 60–80%.
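As one example, a project-level config for code review might declare nothing but a GitHub server. The mcpServers layout matches the files inspected in the audit step; the command and package shown are one common way to launch the reference GitHub server, so substitute whatever servers you actually use:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```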

2. Use Lightweight Server Implementations

Not all MCP servers are created equal in terms of token efficiency. Some servers are built with extremely verbose tool descriptions — partly for human readability, partly to help models understand edge cases.

If you control the MCP server code, audit your tool descriptions for verbosity. Ask yourself: does Claude actually need this level of detail to use this tool correctly? In many cases, you can trim parameter descriptions significantly without affecting functionality.

Watch out for:

  • Long prose descriptions when a single sentence would do
  • Duplicated information in the tool description and parameter descriptions
  • Examples embedded in descriptions (useful for humans, expensive for every API call)
  • Deprecated parameters still listed in the schema

3. Split Large Servers Into Focused Ones

If you have a single MCP server that exposes 30 tools across many categories, consider splitting it into smaller, purpose-specific servers. Then you only connect the relevant one for each task type.

This is especially worth doing for “utility belt” MCP servers that bundle many unrelated capabilities together. Splitting a 30-tool general server into three 10-tool focused servers doesn’t reduce the total available functionality — it just means you’re only paying for the tools you need at any given time.

4. Implement Tool Filtering on the Server Side

Some MCP server implementations support dynamic tool filtering — serving different subsets of tools based on context or configuration flags. If your server supports this, configure it to expose only the tools relevant to your current project.

This approach works well when you have a core set of 5–10 tools you use constantly alongside a larger set of specialized tools you use occasionally. Configure the server to expose the core set by default and require explicit opt-in for the specialized ones.

5. Write Tighter Tool Schemas

If you maintain your own MCP servers, invest time in compressing your JSON schemas without losing semantic clarity. Some specific techniques:

  • Use enum types with short option strings instead of long descriptions of each option
  • Replace verbose description strings with concise one-liners
  • Mark parameters as required accurately — don’t include optional parameters that Claude rarely needs
  • Remove default values from descriptions when they’re obvious from the type

A well-optimized tool schema can be 40–60% smaller than a first-pass version with no loss in usability.
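Here is what that trimming looks like on a hypothetical first-pass schema, using the same rough 4-characters-per-token estimate as before:

```python
import json

def est_tokens(obj) -> int:
    """Rough token estimate: ~4 characters per token of serialized JSON."""
    return len(json.dumps(obj)) // 4

# Hypothetical first-pass schema: verbose, duplicated documentation.
verbose = {
    "name": "search_docs",
    "description": (
        "Search the documentation index. This tool performs a full-text "
        "search over all indexed documentation pages and returns the most "
        "relevant matches. Use this tool whenever you need to look up "
        "information in the documentation."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query string. Any natural "
                               "language phrase or keyword is accepted.",
            },
            "limit": {
                "type": "integer",
                "description": "The maximum number of results to return. "
                               "Defaults to 10 if not specified.",
            },
        },
        "required": ["query"],
    },
}

# The same tool after trimming: one-line descriptions, no duplication.
trimmed = {
    "name": "search_docs",
    "description": "Full-text search over the documentation index.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query."},
            "limit": {"type": "integer", "description": "Max results (default 10)."},
        },
        "required": ["query"],
    },
}

saving = 1 - est_tokens(trimmed) / est_tokens(verbose)
print(f"trimmed schema is ~{saving:.0%} smaller")
```

The trimmed version carries the same semantics Claude needs to call the tool correctly; only the prose padding is gone.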

6. Consider Server-Side Caching With Prompt Caching

Anthropic supports prompt caching for Claude models, which can dramatically reduce the effective cost of repeated tool definitions. If you’re making many API calls with the same MCP tool schemas, caching the system prompt (which includes tool definitions) means you only pay full price for the first call — subsequent calls reuse the cached version at reduced cost.

Prompt caching is particularly effective for agentic workflows where the tool definitions are stable across many turns. The cache breakpoints need to be set correctly in your API calls for this to work, so check Anthropic’s documentation on how to structure cached system prompts.
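When you call the Messages API directly, a cache breakpoint is attached via a cache_control marker; a breakpoint on the last tool caches the entire tool block. The sketch below follows Anthropic's prompt-caching API, with illustrative tool contents (note that the Messages API spells the schema field input_schema, unlike MCP's inputSchema):

```python
# Tool definitions as passed to the Anthropic Messages API.
tools = [
    {
        "name": "read_file",
        "description": "Read a file from disk.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_query",
        "description": "Run a read-only SQL query.",
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
]

# Cache breakpoint on the final tool: subsequent calls with identical tool
# definitions read them from cache at reduced cost.
tools[-1]["cache_control"] = {"type": "ephemeral"}
```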


Practical Configurations for Common Claude Code Workflows

Here are some concrete starting points for common use cases.

Minimal Coding Workflow

For straightforward code editing and review:

  • File system MCP (read/write operations)
  • Git MCP (diffs, commits, history)
  • Optional: language-specific linter/formatter

Expected overhead: 3,000–6,000 tokens per turn.

Data and Analytics Workflow

For working with databases and data analysis:

  • Database MCP (query execution)
  • File system MCP (CSV/file I/O)
  • Optional: charting or visualization server

Expected overhead: 4,000–8,000 tokens per turn.

Full-Stack Development Workflow

For building and deploying applications:

  • File system MCP
  • Git MCP
  • Shell execution MCP
  • Package manager MCP
  • Optional: cloud provider MCP

Expected overhead: 8,000–14,000 tokens per turn. For this workload, prompt caching becomes especially valuable.

Research and Documentation Workflow

For research-heavy tasks:

  • Web search MCP
  • Browser/fetch MCP
  • File system MCP (notes and output)

Expected overhead: 4,000–7,000 tokens per turn.


Where MindStudio Fits in MCP-Heavy Workflows

One pattern that reduces MCP overhead significantly is consolidating multiple specialized integrations into a single, purpose-built agent — and then exposing that agent as a single MCP tool rather than connecting many individual servers.

This is exactly what MindStudio’s agentic MCP servers enable. Instead of connecting a Slack MCP server, a HubSpot MCP server, a Google Workspace MCP server, and a database MCP server separately — each adding thousands of tokens to every turn — you can build a MindStudio agent that orchestrates all of those integrations internally, then expose it as a single MCP endpoint with one clean tool definition.

From Claude Code’s perspective, this looks like one tool: something like run_business_workflow with a natural language input parameter. That single tool definition might cost 200–400 tokens rather than the 8,000–12,000 tokens you’d pay for four separate MCP servers.
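For a rough size comparison, the consolidated pattern leaves Claude with one small definition. The tool below is hypothetical, sized with the same 4-characters-per-token heuristic used earlier:

```python
import json

# Hypothetical consolidated tool standing in for several per-service servers.
consolidated_tool = {
    "name": "run_business_workflow",
    "description": "Run a high-level business workflow (CRM updates, Slack "
                   "messages, document edits) described in plain language.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "instruction": {
                "type": "string",
                "description": "What the workflow should do.",
            }
        },
        "required": ["instruction"],
    },
}

estimated_tokens = len(json.dumps(consolidated_tool)) // 4
print(f"~{estimated_tokens} tokens for the single consolidated tool")
```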

MindStudio connects to 1,000+ business tools natively — HubSpot, Salesforce, Slack, Notion, Airtable, Google Workspace, and more — without requiring API keys or separate accounts. You build the agent visually, define what it does, and then expose it as an MCP endpoint that Claude Code (or any other MCP-compatible AI system) can call.

The practical benefit: Claude Code doesn’t need to know the mechanics of how your CRM integration works or how to format a Slack API call. It just calls your MindStudio agent with a high-level instruction, and the agent handles the implementation details internally. You get cleaner tool interfaces, lower token overhead, and a more focused reasoning surface for Claude.

You can try MindStudio free at mindstudio.ai.


Monitoring Token Usage Over Time

Reducing overhead is a one-time effort. Keeping it low requires some ongoing attention.

Set Up Usage Tracking

Claude’s API returns token counts in every response. If you’re building tooling on top of Claude Code, log usage.input_tokens and usage.output_tokens for every turn. Watching for sudden increases in input token counts is usually the first signal that a new MCP server has been added or an existing one has grown its tool set.
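A minimal per-turn logger might look like this. The input_tokens and output_tokens field names match the usage object the Anthropic Messages API returns; everything else is an illustrative sketch:

```python
import json
import time

def log_usage(usage: dict, logfile: str = "claude_usage.jsonl") -> dict:
    """Append one turn's token counts to a JSON Lines log and return the record."""
    record = {
        "ts": time.time(),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example with a made-up usage payload:
log_usage({"input_tokens": 21_400, "output_tokens": 850},
          logfile="example_usage.jsonl")
```

A log in this shape makes the "sudden jump in input tokens" signal easy to spot with a one-line jq or pandas query.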

Review MCP Server Updates Before Applying

When MCP servers publish updates, the tool definitions sometimes change — new parameters get added, descriptions get expanded, new tools get included. Before updating a server, check the changelog for any mentions of schema changes. A seemingly minor update can add thousands of tokens to your per-turn overhead.

Audit Quarterly

Workflows change. Tools that were essential six months ago might be unused now. Do a quick audit every quarter: which MCP tools has Claude actually called in recent sessions? Tools that haven’t been called in weeks are candidates for removal from your configuration.


Frequently Asked Questions

How many tokens does a typical MCP server add to each Claude message?

It varies significantly based on how many tools the server exposes and how verbose the tool descriptions are. A minimal server with 5–8 simple tools might add 1,500–3,000 tokens. A large server with 20–30 complex tools can add 8,000–12,000 tokens. Most commonly used MCP servers fall in the 2,000–5,000 token range.

Does Claude Code cache MCP tool definitions between turns?

No — by default, Claude Code includes the full tool definitions in every API call. However, Anthropic’s prompt caching feature can be used to cache system prompt content (including tool definitions) at a reduced cost for repeated calls with the same configuration. This requires explicit implementation at the API level.

Can you have too many MCP servers connected to Claude Code?

Yes, both in terms of cost and quality. Beyond the token overhead, having many servers means Claude is reasoning over a large tool set on every turn. This increases latency and can lead to suboptimal tool selection. Most experienced Claude Code users keep their active server list to 3–5 servers for any given workflow.

Does disconnecting an MCP server affect Claude’s capabilities?

Only for that session’s context. Claude won’t be able to call tools from a disconnected server, but your underlying server and its tools remain intact. You can reconnect at any time. Managing this through project-specific config files makes it easy to switch contexts without permanently removing anything.

What is the maximum number of tools Claude can handle?

Anthropic doesn’t publish a hard limit on the number of tools, but practical performance degrades as tool counts increase. Claude’s reasoning about which tool to use becomes less reliable with very large tool sets. Most practitioners report best results with 10–30 total tools visible at once. Beyond 50 tools, you’ll likely see both quality and speed degradation.

Are some types of MCP tools more expensive than others?

Yes. Tools with complex nested JSON schemas — like those for querying structured databases, managing cloud resources, or interacting with APIs with many optional parameters — tend to be significantly more expensive in terms of token overhead than simple tools like file read/write. If you’re optimizing aggressively, start with your most schema-heavy tools.


Key Takeaways

  • Every connected MCP server loads its full tool definitions into every Claude Code message — this isn’t a one-time cost.
  • Overhead of 10,000–18,000 tokens per turn is realistic with multiple servers; this translates to real money at scale.
  • Audit your current overhead by listing connected servers and estimating token cost per server.
  • The single most effective fix is disconnecting servers you don’t need for a specific task — use project-level configs to manage this.
  • For servers you control, compressing tool schemas (shorter descriptions, fewer parameters) can cut overhead 40–60%.
  • Anthropic’s prompt caching can reduce the effective cost of stable tool definitions in agentic workflows.
  • Consolidating multiple integrations into a single MindStudio agent and exposing it as one MCP endpoint is a powerful way to reduce overhead while preserving functionality.

Token overhead is a fixable problem. A few hours of configuration work can cut your Claude Code API costs substantially — and often improve response quality at the same time.
