The Subtraction Principle for AI Agents: Why Fewer Tools Means Better Performance

When Vercel Cut 80% of Its Agent’s Tools, Performance Got Better

Most teams building AI agents assume more capability equals more performance. If the agent can search the web, query the CRM, send emails, look up pricing, check inventory, pull competitor data, and generate reports — surely it’ll be smarter and more useful, right?

Vercel’s engineering team tested exactly that assumption. And the result surprised them: their AI sales agent performed significantly better after they deleted the vast majority of its tools. This pattern — call it the subtraction principle — keeps showing up across AI agent deployments, and understanding why it works can save you weeks of debugging and iteration.

This article explains the mechanism behind tool overload in AI workflows, walks through how Vercel applied subtraction to a real production agent, and gives you a practical framework for auditing your own agent’s toolset.

The Tool Overload Problem in AI Agents

When you give an AI agent a set of tools, you’re not just expanding what it can do. You’re also expanding the decision space it has to navigate on every single step of every single task.

An agent equipped with 20 tools has to reason about which tool is most appropriate at each point in the workflow. That reasoning consumes context window, introduces ambiguity, and creates more surface area for errors. The model might pick the wrong tool, pick the right tool but in the wrong order, or waste tokens evaluating options that were never relevant to the task.

How Tool Selection Actually Works Inside an Agent

Most AI agents work by presenting the language model with a list of available tools — usually as a block of JSON or natural language descriptions — and asking it to decide which tool to call next. The model reads through all the options, reasons about the current state of the task, and selects one.

This process works well when the tool list is short and the tools are clearly differentiated. It breaks down as the list grows. The model starts confusing similar-sounding tools, hedging between two options that overlap in purpose, or missing the most relevant tool because it’s buried among less relevant ones.
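To make the tool list concrete, here is a minimal sketch of the kind of JSON block a model might receive. The field names (`name`, `description`, `parameters`) loosely follow common function-calling formats, and both tools are hypothetical examples rather than anything from a real agent.

```python
import json

# Hypothetical tool definitions. The shape loosely follows common
# function-calling formats; exact field names vary by provider.
TOOLS = [
    {
        "name": "get_subscription_plan",
        "description": "Look up the customer's current subscription plan.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "search_docs",
        "description": "Search product documentation for a query string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]


def render_tool_block(tools):
    """Serialize the tool list into the JSON block placed in the prompt."""
    return json.dumps(tools, indent=2)


tool_block = render_tool_block(TOOLS)
print(tool_block)
```

With two clearly distinct entries, the choice is easy. Each additional entry lengthens this block and adds one more candidate the model has to rule out.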

Work on LLM tool use and function calling generally reports that agents make more selection errors as the number of available tools increases. This is not a model limitation that will simply disappear with the next generation. It is a structural challenge tied to how language models process lists of options under uncertainty.

The Context Window Cost

Every tool you add to an agent has a cost measured in tokens. Tool descriptions, parameters, usage examples — all of that lives in the context window. In a long-running agent workflow, that overhead compounds. You’re spending token budget on tools that may never get called in a given session, leaving less room for the actual reasoning, memory, and output the agent needs to do good work.

For agents handling complex, multi-step tasks, this is a real bottleneck. A leaner toolset means more headroom for the things that actually matter.
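A rough way to see this cost is to estimate how many tokens the tool definitions consume over a workflow. The sketch below assumes about four characters per token (a crude rule of thumb, not a real tokenizer) and assumes the full tool list is resent on every model call. The placeholder tools are hypothetical.

```python
import json


def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate. The 4-chars-per-token ratio is an assumption;
    use your model's real tokenizer for accurate numbers."""
    return -(-len(text) // chars_per_token)  # ceiling division


def manifest_overhead(tools, steps):
    """Estimated tokens spent on tool definitions across a workflow,
    assuming the full tool list is resent on each of `steps` model calls."""
    per_step = estimate_tokens(json.dumps(tools))
    return per_step * steps


def make_tool(i):
    # Hypothetical placeholder tool, used only to size the manifest.
    return {
        "name": f"tool_{i}",
        "description": "Placeholder description of what this tool does.",
        "parameters": {"type": "object", "properties": {}},
    }


small = [make_tool(i) for i in range(5)]
large = [make_tool(i) for i in range(25)]

print("5 tools, 10 steps:", manifest_overhead(small, steps=10))
print("25 tools, 10 steps:", manifest_overhead(large, steps=10))
```

Real tool definitions with parameter schemas and usage examples are usually much longer than these placeholders, so treat the output as a lower bound on the shape of the problem rather than a measurement.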

The Vercel Case Study: Fewer Tools, Better Results

Vercel’s internal AI agent was built to help sales teams handle inbound inquiries, qualify leads, and surface relevant product information. In its early versions, the agent was equipped with a broad set of tools — covering everything from CRM lookups to documentation search to competitor comparisons to billing data retrieval.

The logic was reasonable: sales conversations are unpredictable, and a well-equipped agent should be able to handle any direction a conversation might take.

But in practice, the agent struggled. It would retrieve information the user hadn’t asked for. It would call tools in suboptimal sequences. It sometimes appeared confused about which data source to consult when similar tools overlapped in scope. Performance metrics — accuracy, task completion rate, conversation quality scores — were inconsistent.

What They Cut

When the team audited the agent’s actual usage patterns, they found that the overwhelming majority of successful, high-quality interactions relied on a small core set of tools. The rest were rarely used, occasionally misused, and consistently added noise.

They stripped the agent down to its most-used, most-differentiated tools — removing roughly 80% of what had been available. Tools that overlapped in purpose were consolidated. Tools that were rarely invoked were removed entirely and handled through other means (static lookup tables, preprocessing, or human handoff).

What Changed

The results were clear and appeared quickly. The agent’s tool selection accuracy improved significantly. Conversations became more coherent because the agent was no longer hedging between overlapping options. Response quality went up, hallucination rates went down, and the overall system became easier to evaluate and debug.

Perhaps counterintuitively, the narrower agent also felt more capable to users — because it was consistently good at the things it did, rather than inconsistently attempting everything.

Why This Pattern Keeps Appearing

Vercel isn’t an outlier. The same finding shows up repeatedly across teams building production AI agents.

The Paradox of Choice in Agent Design

Barry Schwartz’s research on the paradox of choice argues that more options don’t make human decision-makers better off — they make decisions harder and outcomes worse. Language models face an analogous problem. When given a longer list of plausible options, they perform worse on selection tasks, even if the right answer is present.

This isn’t unique to tool selection. It shows up in classification tasks, routing decisions, and any context where a model must pick from a menu of options. The signal degrades as the list grows.

Specificity Beats Versatility

A general-purpose tool that “retrieves relevant customer information” is harder for an agent to use correctly than a specific tool that “looks up the customer’s current subscription plan.” The more precisely scoped the tool, the more accurately the agent can reason about when and how to use it.

But if you have 25 precisely scoped tools, you’ve created a different problem: too many precise options that might overlap or conflict. The sweet spot is a small set of well-defined, non-overlapping tools that each handle a clearly distinct type of action.

Debugging Becomes Tractable

One underappreciated benefit of tool subtraction is what it does for maintainability. An agent with five tools is dramatically easier to evaluate, debug, and improve than one with 25. When something goes wrong, you have fewer places to look. When you want to improve performance, you can focus on making the remaining tools work better rather than asking whether adding another tool might help.

How to Audit Your Agent’s Toolset

If you’re running an AI agent in production — or building one — here’s a practical approach to evaluating whether tool subtraction would help.

Step 1: Instrument Your Agent’s Tool Calls

Before cutting anything, get data. Log every tool call your agent makes over a meaningful sample of interactions — at minimum a few hundred, ideally a few thousand. Record:

Which tool was called
Whether the call succeeded
Whether the tool’s output was actually used in the agent’s response
User satisfaction or task completion signals downstream

This gives you a usage distribution. In most agents, you’ll find something close to a power law: a small number of tools account for the large majority of successful interactions.
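The logging described in this step can be sketched as follows. The record fields mirror the four items listed above. The names, the sample log, and the 80% coverage target are illustrative assumptions, not values from Vercel’s audit.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One logged tool call. Field names are illustrative."""
    tool: str
    succeeded: bool       # did the call return without error?
    output_used: bool     # did the agent use the result in its reply?
    task_completed: bool  # downstream success signal for the interaction


def useful_counts(calls):
    """Count calls per tool that succeeded, were used, and ended well."""
    return Counter(
        c.tool for c in calls
        if c.succeeded and c.output_used and c.task_completed
    )


def core_tools(calls, coverage=0.8):
    """Smallest set of top-ranked tools covering `coverage` of useful calls."""
    counts = useful_counts(calls)
    target = coverage * sum(counts.values())
    picked, running = [], 0
    for tool, n in counts.most_common():
        picked.append(tool)
        running += n
        if running >= target:
            break
    return picked


# Made-up sample log for illustration only.
log = (
    [ToolCall("crm_lookup", True, True, True)] * 6
    + [ToolCall("search_docs", True, True, True)] * 3
    + [ToolCall("competitor_compare", True, False, False)] * 2
    + [ToolCall("billing_lookup", True, True, True)] * 1
)

print(useful_counts(log).most_common())
print(core_tools(log, coverage=0.8))
```

Ranking by useful calls rather than raw calls matters here: a tool that is invoked often but whose output is ignored should not count toward the core set.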

Step 2: Identify the Dead Weight

Look for tools that appear rarely in successful interactions but more often in failed ones. These are likely tools the agent reaches for when it’s confused — they’re symptoms of problems, not solutions.

Also look for tools that overlap significantly in description or output type. When two tools could plausibly serve the same purpose, the agent will sometimes pick the wrong one, or call both when it should call neither.
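Both checks in this step can be approximated with simple counting. The sketch below flags tools that show up more often in failed interactions than in successful ones, and uses word-set (Jaccard) similarity as a rough stand-in for description overlap. The thresholds, tool names, and sample data are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def failure_skewed(interactions, min_calls=3):
    """Tools seen more often in failed interactions than successful ones.
    `interactions` is a list of (tools_called, succeeded) pairs."""
    ok, bad = Counter(), Counter()
    for tools_called, succeeded in interactions:
        (ok if succeeded else bad).update(set(tools_called))
    return sorted(
        t for t in ok.keys() | bad.keys()
        if ok[t] + bad[t] >= min_calls and bad[t] > ok[t]
    )


def description_overlap(a, b):
    """Jaccard similarity between the word sets of two descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def overlapping_pairs(descriptions, threshold=0.5):
    """Pairs of tool names whose descriptions share many words.
    The 0.5 threshold is an arbitrary starting point, not a tuned value."""
    return [
        (x, y)
        for x, y in combinations(sorted(descriptions), 2)
        if description_overlap(descriptions[x], descriptions[y]) >= threshold
    ]


# Made-up sample data for illustration only.
interactions = [
    (["crm_lookup"], True),
    (["crm_lookup", "search_docs"], True),
    (["crm_lookup"], True),
    (["competitor_compare"], False),
    (["competitor_compare", "crm_lookup"], False),
    (["competitor_compare"], False),
    (["competitor_compare"], True),
]
descriptions = {
    "get_customer_info": "Retrieve customer account information",
    "get_account_details": "Retrieve customer account details",
    "search_docs": "Search product documentation",
}

print(failure_skewed(interactions))
print(overlapping_pairs(descriptions))
```

Word overlap is a blunt instrument. It will miss tools that overlap in purpose but are described in different vocabulary, so treat flagged pairs as candidates for manual review.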

Step 3: Ask Whether Each Tool Is Truly Necessary

For each tool in your agent, ask three questions:

Is this tool called frequently in successful interactions? If not, is there a good reason it’s kept?
Could this task be handled at a different layer? Some things that seem like agent tools are better handled as preprocessing steps, static lookups, or hardcoded logic.
Does removing this tool require anything else to change? If yes, what’s the cost of that change versus the cost of keeping the tool?

Step 4: Cut Aggressively, Then Test

Remove the tools that fail the above questions. Run evaluation sets against the trimmed agent. In most cases, performance on the tasks the agent was already handling well will stay the same or improve. The question is whether removing a tool breaks anything that users actually needed — and the data you collected in Step 1 will tell you how likely that is.
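A before-and-after comparison can be as simple as the sketch below. It assumes each agent is a callable that takes a prompt and returns an answer, and it scores by exact match, which is a simplification. The two stub agents and the evaluation cases are hypothetical.

```python
def pass_rate(agent, cases):
    """Share of evaluation cases where the agent's answer matches the
    expected one. `agent` is any callable taking a prompt string."""
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def regressions(baseline, trimmed, cases):
    """Prompts the baseline agent got right but the trimmed agent got wrong."""
    return [
        prompt for prompt, expected in cases
        if baseline(prompt) == expected and trimmed(prompt) != expected
    ]


# Hypothetical evaluation set: (prompt, expected answer).
CASES = [
    ("plan for acme?", "pro"),
    ("docs for sso?", "sso-guide"),
    ("refund dispute", "handoff"),
]


def baseline_agent(prompt):
    # Stub standing in for the original agent with the full toolset.
    answers = {"plan for acme?": "pro", "docs for sso?": "wrong-page",
               "refund dispute": "handoff"}
    return answers.get(prompt)


def trimmed_agent(prompt):
    # Stub standing in for the agent after tools were removed.
    answers = {"plan for acme?": "pro", "docs for sso?": "sso-guide",
               "refund dispute": "unknown"}
    return answers.get(prompt)


print("baseline:", pass_rate(baseline_agent, CASES))
print("trimmed: ", pass_rate(trimmed_agent, CASES))
print("regressions:", regressions(baseline_agent, trimmed_agent, CASES))
```

The stubs are set up so that the aggregate pass rate is identical while one case regresses. That is the reason to list regressions separately: an unchanged or improved average can hide a capability that users needed.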

What to Do With Removed Capabilities

Removing tools from an agent doesn’t mean those capabilities disappear. It means they get handled differently.

Route Edge Cases to Humans

Some rare but important tasks — like handling unusual billing disputes or escalating to a technical expert — don’t belong in the agent’s toolset at all. Build a clean handoff path instead. The agent acknowledges when a task is outside its scope and routes it appropriately.
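One possible shape for that handoff path is sketched below. It assumes an intent label is already available from some upstream classifier, which is not shown. The intent names and reply text are hypothetical.

```python
# Hypothetical set of intents the trimmed agent is allowed to handle.
IN_SCOPE = {"pricing_question", "plan_lookup", "docs_search"}


def route(intent, message):
    """Send in-scope intents to the agent and everything else to a human
    queue, keeping the original message so nothing is lost in the handoff."""
    if intent in IN_SCOPE:
        return {"handler": "agent", "intent": intent, "message": message}
    return {
        "handler": "human",
        "intent": intent,
        "message": message,
        "reply": "This is outside what I can help with, so I'm passing "
                 "you to a teammate who can.",
    }


print(route("plan_lookup", "What plan is Acme on?"))
print(route("billing_dispute", "I was charged twice last month."))
```

Keeping the scope check outside the model means the agent never has to decide whether an escalation tool applies. The decision is made before the agent runs.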

Handle Predictable Tasks Upstream

If the agent was frequently calling a tool to fetch information that could be retrieved before the conversation starts (like a customer’s account tier or recent activity), move that lookup upstream. Pre-populate the agent’s context with the information it reliably needs, rather than requiring it to fetch the information dynamically.
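As a minimal sketch, the upstream lookup can be rendered into a short preamble that is placed in the agent’s context before the first turn. The field names and the sample customer record are assumptions for illustration.

```python
def build_context(customer):
    """Render facts fetched before the conversation into a preamble, so the
    agent does not need a tool call to learn them."""
    lines = ["Known customer facts (fetched before the conversation):"]
    for key in ("name", "account_tier", "recent_activity"):
        if customer.get(key) is not None:
            lines.append(f"- {key}: {customer[key]}")
    return "\n".join(lines)


# Hypothetical record, as it might come back from a CRM query.
customer = {"name": "Acme Co", "account_tier": "enterprise",
            "recent_activity": None}

preamble = build_context(customer)
print(preamble)
```

Missing fields are skipped rather than printed as empty, so the agent is not handed a fact that looks authoritative but carries no value.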

Consolidate Overlapping Tools

If two tools did similar things, replace them with one well-designed tool that handles both cases clearly. This reduces option space while preserving capability.
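One way to consolidate is to replace two near-duplicate lookups with a single tool that takes an explicit field parameter. The sketch below uses an in-memory dictionary in place of a real backend, and every name in it is hypothetical.

```python
# Hypothetical in-memory data standing in for a real CRM or billing system.
_CUSTOMERS = {"c-1": {"plan": "pro", "billing_status": "paid"}}

ALLOWED_FIELDS = ("plan", "billing_status")


def get_customer_field(customer_id, field):
    """Single tool replacing two overlapping lookups (for example, a
    separate plan lookup and billing lookup). `field` is an explicit enum,
    so the model picks a value instead of picking between tools."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field must be one of {ALLOWED_FIELDS}")
    record = _CUSTOMERS.get(customer_id)
    if record is None:
        raise KeyError(f"unknown customer: {customer_id}")
    return record[field]


print(get_customer_field("c-1", "plan"))
print(get_customer_field("c-1", "billing_status"))
```

The trade-off is that the ambiguity moves from tool choice to parameter choice. That is usually easier to constrain, because an enum can be validated and rejected with a clear error, as the `ValueError` branch does here.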

Common Mistakes When Trimming Agent Tools

Cutting Based on Frequency Alone

A tool that’s rarely called might still be critical. If it handles a rare-but-important edge case (like a legal compliance check or a security verification), removing it creates a real gap even if the usage data looks sparse. Always pair frequency analysis with impact analysis.
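A simple way to pair the two analyses is to let an explicit criticality flag override the usage threshold. In the sketch below, the 2% cutoff, the tool names, and the usage shares are arbitrary illustrations, and the flag is assumed to be set by a human reviewer rather than derived from logs.

```python
def trim_decision(usage_share, critical, min_share=0.02):
    """Combine frequency with impact. Tools flagged as critical are kept
    regardless of usage; the 2% cutoff is an arbitrary illustration."""
    if critical:
        return "keep"
    return "keep" if usage_share >= min_share else "review_for_removal"


# Hypothetical audit input: tool -> (share of useful calls, critical flag).
tools = {
    "crm_lookup": (0.55, False),
    "compliance_check": (0.004, True),
    "competitor_compare": (0.01, False),
}

decisions = {
    name: trim_decision(share, critical)
    for name, (share, critical) in tools.items()
}
print(decisions)
```

Note that low-usage tools come back as `review_for_removal` rather than being dropped automatically, which leaves room for the impact check to happen before anything is cut.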

Trimming Without Re-Evaluating Tool Descriptions

Once you’ve cut tools, the descriptions of remaining tools may need to be updated. Tool descriptions often reference other tools, assume context that no longer exists, or were written to differentiate one tool from another that’s now gone. Clean up the descriptions alongside the removals.

Assuming the Problem Is Always Too Many Tools

Sometimes agents fail because their tools are poorly designed, not because there are too many of them. A tool with a vague description, inconsistent parameter handling, or unreliable outputs will hurt performance even if it’s the only tool available. Audit tool quality, not just tool quantity.

How MindStudio Supports Lean Agent Design

One reason teams end up with bloated agent toolsets is that the infrastructure makes it easy to add capabilities and hard to see the cost of adding them. Every new integration feels free at the time.

MindStudio’s visual workflow builder is structured in a way that makes tool choices explicit and visible. When you’re building an agent in MindStudio, you’re assembling a workflow where each step — each tool call, each decision branch, each output — is represented as a node you can see and reason about. That visual representation makes bloat obvious in a way that a long JSON config file does not.

MindStudio connects to over 1,000 integrations, which means the available tool surface is enormous. But the builder encourages deliberate construction: you add what you need for the task you’re designing, not everything that might conceivably be useful someday. The result is agents that tend to start leaner and stay leaner.

If you’re rebuilding an existing agent that’s grown too complex, MindStudio’s workflow view also makes it straightforward to see which steps are doing real work and which ones are vestigial. You can cut nodes, test the trimmed workflow, and compare results — the kind of iterative subtraction that Vercel applied manually, but with a visual interface that makes the process faster.

You can start building for free at mindstudio.ai — no API keys required, and the average agent build takes less than an hour.

For teams using external agent frameworks, MindStudio’s Agent Skills Plugin lets you expose a curated set of typed capabilities to agents built in LangChain, CrewAI, or Claude Code — so you can deliberately scope what your agent has access to, rather than giving it the kitchen sink.

Frequently Asked Questions

Does the subtraction principle apply to all AI agents, or just sales agents?

The principle applies broadly, though the degree varies by agent type. Agents with highly variable, open-ended tasks (like general-purpose assistants) may need more tools than agents designed for narrow, well-defined workflows. But even general-purpose agents benefit from careful curation. The key insight is that tool selection is a reasoning task, and every additional option makes that task harder. This applies regardless of the domain.

How do I know when an agent has too many tools?

Common symptoms include: inconsistent task completion, frequent tool selection errors in logs, high variance in output quality across similar inputs, and difficulty diagnosing failures. If your agent works well sometimes and poorly other times in ways that seem random, tool overload is often a contributing factor. The audit process described above — logging tool calls and looking at usage distributions — is the most reliable diagnostic.

What’s the right number of tools for an AI agent?

There’s no universal answer, but most practitioners working on production agents find that 5–10 well-scoped tools outperform 20–30 broadly scoped ones for task-specific agents. If your agent needs to handle genuinely diverse domains, a better architecture is often a router agent that delegates to specialized sub-agents, each with its own small toolset, rather than a single agent with a large combined toolset.
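The router-and-sub-agents layout can be sketched roughly as follows. The keyword matching is a deliberately naive stand-in so the example runs on its own. In practice the routing step would more likely be a model call. All agent names, tool names, and keywords are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    tools: list = field(default_factory=list)

    def handle(self, message):
        # Placeholder: a real sub-agent would call a model with only
        # self.tools available.
        return f"[{self.name}] handling: {message}"


SUB_AGENTS = {
    "billing": SubAgent("billing", ["get_invoice", "get_payment_status"]),
    "docs": SubAgent("docs", ["search_docs"]),
}

# Naive keyword routing table, for illustration only.
KEYWORDS = {"invoice": "billing", "payment": "billing", "how do i": "docs"}


def route(message, default="docs"):
    """Pick a sub-agent by keyword match, falling back to `default`."""
    text = message.lower()
    for keyword, agent_name in KEYWORDS.items():
        if keyword in text:
            return SUB_AGENTS[agent_name]
    return SUB_AGENTS[default]


agent = route("Where is my invoice for March?")
print(agent.handle("Where is my invoice for March?"))
```

The router only chooses among sub-agents, and each sub-agent only chooses among its own few tools, so no single decision involves the full combined toolset.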

Can I recover capabilities after cutting tools?

Yes. Removing a tool from an agent doesn’t permanently eliminate that capability from your system. You can route those tasks to human agents, handle them in preprocessing, or build a separate specialized agent that handles only those cases. The goal is to get the right capability to the right place in your system — not to eliminate functionality altogether.

How does tool subtraction interact with context window size?

As language models get larger context windows, the token penalty for tool bloat decreases somewhat, because there is more room for tool descriptions without crowding out reasoning space. But a larger window does not by itself reduce selection confusion. Even a model with a million-token context window is likely to make worse tool choices when given 50 options rather than 5. Context window size and tool selection quality are related but distinct constraints.

Should I apply the subtraction principle during initial agent design, or only after observing problems?

Both. During initial design, the subtraction principle argues for starting minimal — build for the core use case with the smallest plausible toolset, then add capabilities only when specific gaps are identified through real usage. If you’re iterating on an existing agent, the audit process described above gives you the data to make evidence-based cuts rather than guesses.

Key Takeaways

More tools don’t make AI agents more capable — they make tool selection harder, which degrades overall performance.
Vercel’s experience (removing ~80% of tools from a sales agent) reflects a pattern seen broadly across production agent deployments.
Tool overload wastes context window space, introduces selection errors, and makes agents harder to debug.
The right approach: instrument your agent’s tool calls, identify dead weight and overlapping tools, cut aggressively, and test.
Removed capabilities don’t disappear — they move to preprocessing, human handoff, or specialized sub-agents.
A visual workflow builder like MindStudio makes lean agent design easier by keeping every tool choice explicit and visible.

If you’re building or refining an AI agent, the most productive hour you can spend might be deciding what to remove, not what to add. Try MindStudio free and see how a visual approach to agent design helps you keep complexity in check from the start.