The hidden cost of 50 MCP tools

Your agent has 50 MCP tools wired in. It works. But every call feels a little slower than it should, the model reaches for the wrong tool more often than you'd like, and the bill keeps creeping up. You're paying a tax, and it never shows up as a line item.

The good news: it's not mysterious. It's arithmetic. Once you can see the math, you can decide whether you want to keep paying it.

The tax you can see

When you register an MCP tool, its definition goes into the model's context: the name, the description, and the full JSON schema for its arguments. The model can't call a tool it can't see, so all of it has to be there.

A single tool definition runs roughly 50–150 tokens. That's a range, not a constant — a tool with two string arguments costs less than one with a nested object schema and a paragraph of description. But the range is good enough to reason with.

At the 100-token midpoint, ten tools is about 1,000 tokens. Fine. Fifty tools is about 5,000, more if the schemas are heavy, and that's the part worth sitting with. Those tokens are in the model's context on every call. Every turn of the conversation. Every retry after a failure. They are not a one-time setup cost. They are rent.

Put real numbers on it

Take a realistic mid-size agent — not a stress test, just a useful one. It connects to six MCP servers: a source-control server, a cloud console, a database server, a chat integration, a docs server, and your team's internal-tools server. Each one contributes eight to ten tools. Call it 50 tools, at an average of 100 tokens each — 5,000 tokens of definitions.

Now run one real task to completion. A genuine piece of work — investigate something, make a change, verify it — is rarely one turn. Say it takes 20. That 5,000-token catalogue rides along on all 20:

  • 5,000 tokens of tool definitions
  • × 20 turns
  • = 100,000 input tokens spent describing tools
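The same multiplication can be run across the whole 50–150 token range rather than only the midpoint. This is a minimal sketch; the function name is made up for illustration, and the per-tool figures are the rough estimates used above, not measurements.

```python
def definition_overhead(tool_count: int, tokens_per_tool: int, turns: int) -> int:
    """Input tokens spent re-sending tool definitions over one task."""
    return tool_count * tokens_per_tool * turns


# 50 tools over 20 turns, at the low, middle, and high end of the
# 50-150 tokens-per-definition estimate.
low = definition_overhead(50, 50, 20)
mid = definition_overhead(50, 100, 20)
high = definition_overhead(50, 150, 20)

print(low, mid, high)  # 50000 100000 150000
```

Even at the cheap end of the range, the catalogue costs tens of thousands of input tokens per task.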

Here's the uncomfortable part. Across that whole task, the model probably called six or seven of those 50 tools. The other forty-odd were described, in full, twenty times over, and never used. At a uniform 100 tokens per tool, that is roughly 86,000 to 88,000 of the 100,000 tokens buying nothing. It's not a benchmark; it's a shape, and it's the shape of almost every multi-integration agent running today.

The tax you can't see — and it's the bigger one

The definition tokens are the obvious cost. They're not the expensive one.

Before the model can call a tool, it has to choose one. That means weighing the whole list and reasoning about which option fits — and reasoning is output tokens, which cost roughly 3–5× what input tokens cost. The more tools in the list, the more of that weighing happens, on every decision.

Then there's the failure mode nobody budgets for. Show a model 50 tools, several with overlapping descriptions — three different "create issue" variants across three trackers — and it will sometimes pick the wrong one. A wrong pick isn't a small mistake. It's a full wasted round trip: the bad call, the error coming back, the model reading the error, the recovery, the retry. Every one of those steps re-sends the 5,000-token catalogue and burns more reasoning.

You don't need a study to believe this — you can watch it in your own traces. More choices, more wrong choices. And the cost of a mistake is never just the mistake; it's everything you spend cleaning up after it.

Why the two taxes multiply

Here's where it compounds. A bigger tool list makes every call more expensive and makes wrong picks more likely. Wrong picks cause extra calls. Each extra call re-pays the definition tax in full. The two costs don't add — they feed each other.

So the question isn't "can I afford 50 tool definitions in my context?" It's "can I afford 50 tool definitions, times every call, times the extra calls that 50 tools cause?"
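To make the compounding concrete, here is a toy model in Python. Everything in it is an assumption for illustration: the wrong-pick rates and the two recovery turns per mistake are hypothetical knobs, not measured values, and the model counts only definition tokens, not reasoning.

```python
def task_definition_tokens(
    tool_count: int,
    tokens_per_tool: int,
    base_turns: int,
    wrong_pick_rate: float,  # hypothetical: fraction of turns that pick the wrong tool
    recovery_turns: int,     # hypothetical: extra turns needed to recover from one wrong pick
) -> int:
    """Definition tokens for one task, including the turns that mistakes add."""
    catalogue = tool_count * tokens_per_tool
    extra_turns = base_turns * wrong_pick_rate * recovery_turns
    return round(catalogue * (base_turns + extra_turns))


# Illustrative inputs only: a small catalogue with few wrong picks,
# and a large one where lookalike tools cause more of them.
small = task_definition_tokens(10, 100, 20, wrong_pick_rate=0.02, recovery_turns=2)
large = task_definition_tokens(50, 100, 20, wrong_pick_rate=0.10, recovery_turns=2)

print(small, large)  # 20800 120000
```

The catalogue grew 5×, but under these assumed rates the bill grew closer to 6×, because the larger list also added turns. Swap in rates from your own traces to see your real multiplier.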

The fix: make the list stop growing

mcp-flowgate exposes exactly seven tools to the model — three for discovery, four for action — no matter how many capabilities you wire in behind it. Five capabilities or five hundred, the model's tool list is seven entries.

It works because the model stops scanning and starts searching. When it needs to do something, it doesn't read 50 definitions looking for a match. It calls gateway.search:

wire trace — turn 1
 gateway.search { "query": "publish content" }

 { "items": [
      { "id":    "workflow:content_publish",
        "title": "Governed content publishing workflow",
        "tags":  ["content", "governed", "publishing"] } ] }

One query. One result with a title and tags — not 50 tool definitions. The model didn't carry the catalogue; it asked a question and got an answer back.

Then it starts the work, and the response hands it the next move:

wire trace — turn 2
 workflow.start {
    "definitionId": "content_publish",
    "input": { "topic": "Q2 launch", "audience": "enterprise" } }

 { "workflow": { "id": "wf_8f3a", "state": "idea" },
    "links": [
      { "rel":    "create_outline",
        "method": "workflow.submit",
        "args":   { "transition": "create_outline", ... } } ] }

The response carries a links array — the legal next moves, already shaped. The model doesn't reason about which of 50 tools comes next. It follows the one link the server handed back. (That pattern has a name, and it gets its own post — HATEOAS for AI.)
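On the client side, following a link is a lookup, not a decision. The sketch below is an assumption about how a caller might read the response shown in the trace; next_call is a hypothetical helper, not part of mcp-flowgate, and the args dict contains only the field the trace shows (the rest is elided there and is left out here).

```python
from typing import Any


def next_call(response: dict[str, Any], rel: str) -> tuple[str, dict[str, Any]]:
    """Return (method, args) for the link whose rel matches."""
    for link in response.get("links", []):
        if link["rel"] == rel:
            return link["method"], dict(link.get("args", {}))
    raise LookupError(f"no link with rel={rel!r}")


# Response from turn 2 of the trace, with only the args that the trace shows.
response = {
    "workflow": {"id": "wf_8f3a", "state": "idea"},
    "links": [
        {
            "rel": "create_outline",
            "method": "workflow.submit",
            "args": {"transition": "create_outline"},
        }
    ],
}

method, args = next_call(response, "create_outline")
print(method, args)  # workflow.submit {'transition': 'create_outline'}
```

The caller never consults a tool catalogue: the method name and the prefilled arguments both come from the response.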

Schemas on demand, not up front

There's a subtler win hiding in that flow, and it's about the most expensive part of a tool definition: the JSON schema. Descriptions are short. A schema for a tool with a nested argument object — enums, required fields, descriptions per property — is often the bulk of those 100 tokens.

In a flat tool list, you pay every schema, for every tool, on every call. With the seven-tool surface, the model pulls a schema with gateway.describe only when it's about to use that capability — one schema, for one tool, at the moment it's needed. The 49 schemas it isn't using right now cost nothing. You've moved the schema from a standing cost in the model's context to an on-demand lookup.

Before and after

Before: 50 tool definitions — call it 5,000+ tokens — sitting in context on every call, plus the reasoning to choose among 50 options, plus the wrong picks that 50 lookalike options produce.

After: seven tool definitions, a fixed and small cost. One search result when the model needs one. One schema fetched on demand. One prefilled link to follow. Wire in your 200th integration and calls that use the first 199 pay nothing extra for it.

The real shift is that your capability count stops being a context-budget decision. You add the tool because the work needs it — not after weighing it against your token bill.

Where this honestly doesn't pay off

A search call plus a start call is two round trips before the real action. If your agent has five tools and always knows exactly which one it wants, that indirection is overhead, not savings. The seven-tool surface earns its keep when you have many capabilities, or when you want governance around them, not when you have a handful of tools and no rules to enforce. The docs say it plainly: if one MCP server with no governance needs is all you have, point your host straight at it.
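A rough break-even check makes the trade-off visible. This is a sketch under stated assumptions: the seven gateway tools are assumed to average the same 100 tokens as any other definition, each on-demand schema lookup is assumed to cost about one definition's worth of tokens, and the two extra turns are the search and start calls. It counts definition tokens only, not search results or reasoning.

```python
def flat_tokens(tool_count: int, turns: int, tokens_per_tool: int = 100) -> int:
    """Definition tokens when every tool is listed on every turn."""
    return tool_count * tokens_per_tool * turns


def gateway_tokens(
    turns: int,
    extra_turns: int,      # assumed: additional round trips for search and start
    schema_lookups: int,   # assumed: schemas fetched on demand during the task
    tokens_per_tool: int = 100,
    gateway_tools: int = 7,
) -> int:
    """Definition tokens with a fixed seven-tool surface plus on-demand schemas."""
    standing = gateway_tools * tokens_per_tool * (turns + extra_turns)
    on_demand = schema_lookups * tokens_per_tool
    return standing + on_demand


gateway = gateway_tokens(turns=20, extra_turns=2, schema_lookups=3)
print(flat_tokens(5, turns=20), gateway)   # 10000 15700
print(flat_tokens(50, turns=20), gateway)  # 100000 15700
```

With five tools the flat list is cheaper under these assumptions; with fifty, the fixed surface is several times cheaper.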

One more honest caveat: gateway.search is lexical, not semantic. It scores against titles, descriptions, and tags. Vague metadata produces vague search results. The seven-tool surface moves the cost of discoverability from the model's context to your config — which is a much better place for it, but it isn't free.
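To see why metadata quality matters for lexical search, consider a deliberately naive scorer. This is not mcp-flowgate's implementation, which the post does not show; it is a hypothetical word-overlap score over the same three fields, written only to illustrate the failure mode.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_score(query: str, entry: dict) -> int:
    """Count query words that literally appear in title, description, or tags."""
    haystack = tokenize(entry.get("title", "")) | tokenize(entry.get("description", ""))
    for tag in entry.get("tags", []):
        haystack |= tokenize(tag)
    return len(tokenize(query) & haystack)


entry = {
    "id": "workflow:content_publish",
    "title": "Governed content publishing workflow",
    "tags": ["content", "governed", "publishing"],
}

# "content" matches; "publish" does not equal "publishing" for this naive scorer.
print(lexical_score("publish content", entry))  # 1
# Same intent, different words: no overlap at all.
print(lexical_score("ship an article", entry))  # 0
```

A lexical scorer only finds the words you gave it, so titles, descriptions, and tags should use the vocabulary your agent will actually search with.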

Look at your own number

You don't have to take any of this on faith. Open a recent agent trace. Count the tool definitions sent with each request. Multiply by 100 tokens, or by the real per-definition size if your trace reports it. Multiply by the number of turns. Then work out how much of that total belongs to tools the model never called. That share is what you're spending to describe tools that never got used.
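The audit fits in a few lines of Python. This is a minimal sketch: the tool names are placeholders, the 100-token figure is the same rough average used earlier, and the function assumes you have already pulled the registered and called tool names out of your trace by whatever means your host provides.

```python
def unused_definition_tokens(
    registered: list[str],
    called: list[str],
    turns: int,
    tokens_per_tool: int = 100,  # rough average; use measured sizes if you have them
) -> tuple[int, int]:
    """Return (total definition tokens, tokens spent on tools never called)."""
    registered_set = set(registered)
    unused = registered_set - set(called)
    total = len(registered_set) * tokens_per_tool * turns
    wasted = len(unused) * tokens_per_tool * turns
    return total, wasted


# Placeholder data: 50 registered tools, 6 distinct tools called over 20 turns.
registered = [f"tool_{i}" for i in range(50)]
called = ["tool_3", "tool_7", "tool_7", "tool_12", "tool_20", "tool_31", "tool_44"]

total, wasted = unused_definition_tokens(registered, called, turns=20)
print(total, wasted)  # 100000 88000
```

Repeated calls to the same tool count once, since the question is which definitions earned their place, not how often.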

The tax is real whether or not you've measured it. The only question is whether you've looked.

The quick start wires in one tool in about 30 seconds. For what happens to a flat tool list as agents scale past 200 capabilities, read Flat tool lists don't scale.