The bet behind mcp-flowgate

You wired five MCP tools into your agent and it worked beautifully. So you added more. Twenty. Forty. Somewhere in there it got slower, it started reaching for the wrong tool, and the bill crept up — and you couldn't point at the line that caused it.

That's not a prompt you can fix. It's a structural cost, and it's the reason this project exists.

This is the first post on the mcp-flowgate blog. Before the deep-dives start, it's worth saying plainly what this thing is, what bet it makes, and — just as honestly — what it doesn't do yet.

Two problems hiding in plain sight

Every MCP tool you register does two things you never asked for.

First, its definition — the name, the description, the JSON schema — gets loaded into the model's context on every single call. Five tools is nothing. Fifty is thousands of tokens spent before the model has a single thought. You pay that on every turn, every retry, for the whole conversation.
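A back-of-the-envelope sketch shows how this cost scales. It assumes the common rule of thumb of roughly four characters per token. Real tokenizers vary, so treat the result as an order-of-magnitude estimate. The tool definition is a made-up example, not one taken from a real MCP server.

```python
import json

# Assumption: roughly 4 characters per token. Real tokenizers differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def definition_tokens(tool: dict) -> int:
    """Estimate what one tool definition costs once serialized into context."""
    return estimate_tokens(json.dumps(tool))


def conversation_overhead(tools: list, turns: int) -> int:
    """Definitions are resent on every turn, so cost scales with tools AND turns."""
    per_turn = sum(definition_tokens(tool) for tool in tools)
    return per_turn * turns


# Hypothetical tool definition, for illustration only.
EXAMPLE_TOOL = {
    "name": "tracker.create_issue",
    "description": "Create an issue in the tracker with a title, body, and labels.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
}

if __name__ == "__main__":
    for count in (5, 50):
        per_turn = conversation_overhead([EXAMPLE_TOOL] * count, turns=1)
        over_20 = conversation_overhead([EXAMPLE_TOOL] * count, turns=20)
        print(f"{count} tools: ~{per_turn} tokens per turn, ~{over_20} over 20 turns")
```

The exact numbers matter less than the shape of the formula. The cost is the size of each definition, multiplied by the number of tools, multiplied by the number of turns.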

Second, none of it comes with governance. MCP hands the model a flat list of tools and trusts it to behave. No audit trail. No approval gate. No retry policy. No record of what ran, in what order, and why. A flat list and a prayer.

Here's the part that stings: both problems get worse the more useful your agent becomes. The teams hitting them hardest are the ones doing the most ambitious work — the ones wiring in the issue tracker and the cloud console and the data warehouse and the deploy system, because that's where the real value is.

The bet

mcp-flowgate makes one bet: the model should never see your tool list.

No matter how many capabilities you wire in — five, fifty, five hundred — the model sees exactly seven tools. Three for discovery, four for action:

  • Discovery: gateway.home, gateway.search, gateway.describe
  • Action: workflow.start, workflow.get, workflow.submit, workflow.explain

Your capabilities still exist. They just don't sit in the model's context taking up room. The model calls gateway.search to find what it needs, and every response hands back the legal next moves as links. It follows links instead of scanning a menu.
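Here is a toy model of that separation, written in Python purely for illustration. ToyGateway and its methods are hypothetical names, not mcp-flowgate's API. The only thing taken from the post is the seven tool names. Capabilities go into a catalogue, and the list a host would show the model never changes.

```python
# The seven tool names from the post. Everything else stays behind the gateway.
SURFACE = (
    "gateway.home", "gateway.search", "gateway.describe",
    "workflow.start", "workflow.get", "workflow.submit", "workflow.explain",
)


class ToyGateway:
    """Hypothetical sketch: capabilities are registered but never listed to the model."""

    def __init__(self) -> None:
        self.catalogue: dict = {}  # capability name -> description

    def register(self, name: str, description: str) -> None:
        self.catalogue[name] = description

    def list_tools(self) -> tuple:
        # What an MCP host sees: the same seven names, however big the catalogue is.
        return SURFACE


gateway = ToyGateway()
for i in range(500):
    gateway.register(f"cap.{i}", f"capability number {i}")
print(len(gateway.catalogue), "capabilities,", len(gateway.list_tools()), "tools visible")
# prints: 500 capabilities, 7 tools visible
```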

The second half of the bet: governance should be something you declare, not something you code.

Here's a tool with no governance, in a few lines of real config:

gateway.yaml
proxy:
  expose:
    - name: hello.echo
      executor: { kind: noop }

And here's that same shape turned into an approval gate:

gateway.yaml
proxy:
  expose:
    - name: deploy.prod
      executor: { kind: human, queue: prod-deployments }

That's the whole change. With kind: human, deploy.prod no longer deploys when the model asks for it. It records a human.approval.requested event, returns a "pending" status, and stops. A person resolves the queue. The model has no path around the gate — not because you wrote a defensive check inside the tool, but because you declared a rule the runtime enforces.
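Here is a minimal sketch of what a runtime might do with the executor kind. It assumes the capability arrives as a dict shaped like the YAML above. The human.approval.requested event name comes from the post. The Runtime class, the executor.completed event, and the in-memory queue are hypothetical stand-ins for whatever mcp-flowgate actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Hypothetical runtime: dispatches on executor.kind and keeps an audit log."""

    events: list = field(default_factory=list)  # append-only audit events
    queues: dict = field(default_factory=dict)  # queue name -> pending requests

    def execute(self, capability: dict, args: dict) -> dict:
        name = capability["name"]
        executor = capability["executor"]
        kind = executor["kind"]
        if kind == "noop":
            self.events.append({"type": "executor.completed", "capability": name})
            return {"status": "completed"}
        if kind == "human":
            queue = executor["queue"]
            self.queues.setdefault(queue, []).append({"capability": name, "args": args})
            self.events.append(
                {"type": "human.approval.requested", "capability": name, "queue": queue}
            )
            # Stop: nothing runs until a person resolves the queue.
            return {"status": "pending", "queue": queue}
        raise ValueError(f"unknown executor kind: {kind!r}")


runtime = Runtime()
echo = {"name": "hello.echo", "executor": {"kind": "noop"}}
deploy = {"name": "deploy.prod", "executor": {"kind": "human", "queue": "prod-deployments"}}
print(runtime.execute(echo, {}))                      # {'status': 'completed'}
print(runtime.execute(deploy, {"version": "1.4.2"}))  # {'status': 'pending', 'queue': 'prod-deployments'}
```

The gate lives in the dispatch on the executor kind, not in the deploy code. That is why changing one line of config changes the behavior.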

How the model finds anything

If the model only ever sees seven tools, a fair question is: how does it actually use the fifty capabilities behind them?

It searches. When the model needs to do something, it calls gateway.search with a plain-language query and gets back the matching capabilities — each as a hit with a title and a link to start it. If it needs the exact argument schema for one of them, gateway.describe returns that, for that one capability, on demand. The model never holds the whole catalogue. It asks a question and gets a small, specific answer.
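Here is a sketch of the search-then-describe round trip. The keyword-overlap scoring, the catalogue entries, and the link shape are all assumptions made for illustration. The post doesn't say how gateway.search ranks its results.

```python
import re

# Hypothetical catalogue: a title for search hits, a JSON Schema for describe.
CATALOGUE = {
    "deploy.prod": {
        "title": "Deploy the current build to production",
        "schema": {"type": "object", "properties": {"version": {"type": "string"}},
                   "required": ["version"]},
    },
    "tracker.create_issue": {
        "title": "Create an issue in the tracker",
        "schema": {"type": "object", "properties": {"title": {"type": "string"}},
                   "required": ["title"]},
    },
    "warehouse.query": {
        "title": "Run a read-only SQL query against the data warehouse",
        "schema": {"type": "object", "properties": {"sql": {"type": "string"}},
                   "required": ["sql"]},
    },
}


def words(text: str) -> set:
    """Lowercase alphanumeric words; dots and underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, limit: int = 3) -> list:
    """Rank capabilities by keyword overlap; each hit carries a link to start it."""
    wanted = words(query)
    scored = []
    for name, cap in CATALOGUE.items():
        score = len(wanted & words(f"{name} {cap['title']}"))
        if score:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [
        {"title": CATALOGUE[name]["title"],
         "link": {"tool": "workflow.start", "args": {"capability": name}}}
        for _, name in scored[:limit]
    ]


def describe(name: str) -> dict:
    """Return the argument schema for exactly one capability, on demand."""
    return CATALOGUE[name]["schema"]


hits = search("deploy to production")
print(hits[0]["link"])  # {'tool': 'workflow.start', 'args': {'capability': 'deploy.prod'}}
print(describe("deploy.prod")["required"])  # ['version']
```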

From there it's links all the way down. The model calls workflow.start, and the response carries the legal next moves. It submits one, and the next response carries the moves legal from the new state. The model holds one current response and follows what it offers. That interaction model is identical whether there are five capabilities behind the gateway or five hundred — which is the whole point.
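The link-following loop can be sketched as a small state machine in which every response lists the legal next actions. The three-state review workflow and its action names are hypothetical. The point is that the client needs nothing but the current response to know what it may do next.

```python
# Hypothetical workflow: each state maps an action to the state it leads to.
TRANSITIONS = {
    "draft": {"submit_for_review": "review"},
    "review": {"approve": "done", "request_changes": "draft"},
    "done": {},
}


class WorkflowInstance:
    """Toy instance: every response lists the moves legal from the current state."""

    def __init__(self) -> None:
        self.state = "draft"

    def response(self) -> dict:
        return {"state": self.state, "links": sorted(TRANSITIONS[self.state])}

    def submit(self, action: str) -> dict:
        legal = TRANSITIONS[self.state]
        if action not in legal:
            raise ValueError(f"{action!r} is not legal in state {self.state!r}")
        self.state = legal[action]
        return self.response()


wf = WorkflowInstance()
print(wf.response())                   # {'state': 'draft', 'links': ['submit_for_review']}
print(wf.submit("submit_for_review"))  # {'state': 'review', 'links': ['approve', 'request_changes']}
```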

Start flat, add governance later

You don't have to design a state machine on day one. A flat list of tools — "proxy mode" — is just the trivial case: one state, every tool looping back to it. Internally it even compiles to a workflow, one called proxy_default. A flat tool and a fifty-state governed workflow run on the same engine.
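Here is a sketch of that compilation step. It assumes a workflow is represented as a map from each state to its actions and their target states. Under that assumption, a flat tool list becomes a single state in which every transition loops back to itself. Only the name proxy_default comes from the post. The state name ready and the function are hypothetical.

```python
def compile_proxy(tool_names: list) -> dict:
    """Hypothetical sketch: compile a flat tool list into a one-state workflow."""
    state = "ready"  # assumed name for the single state
    return {
        "name": "proxy_default",
        "initial": state,
        # Every tool is an action that loops back to the same state.
        "states": {state: {tool: state for tool in tool_names}},
    }


workflow = compile_proxy(["hello.echo", "deploy.prod"])
print(workflow["states"])  # {'ready': {'hello.echo': 'ready', 'deploy.prod': 'ready'}}
```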

So you start ungoverned. When one tool — the deploy, the refund, the user-delete — needs rules, you wrap that one in a workflow with real states and guards. Nothing else changes. The other tools keep working untouched, and the model's surface stays at seven tools. Governance is something you add where you need it, not a framework you commit to up front.
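Here is a sketch of what wrapping one tool in a governed workflow might look like. The deploy workflow, its states, and the has_role and has_evidence helpers are all hypothetical. They loosely mirror the role and evidence guard types the post lists among what's built today. A transition fires only when every guard attached to it passes.

```python
from typing import Callable

Guard = Callable[[dict], bool]


def has_role(role: str) -> Guard:
    """Guard that passes only if the caller holds the given role."""
    return lambda ctx: role in ctx.get("roles", ())


def has_evidence(key: str) -> Guard:
    """Guard that passes only if a piece of evidence was recorded earlier."""
    return lambda ctx: key in ctx.get("evidence", {})


# Hypothetical governed workflow: state -> action -> (next state, guards).
DEPLOY_WORKFLOW = {
    "built": {"run_tests": ("tested", [])},
    "tested": {
        "deploy": ("deployed", [has_role("release-manager"), has_evidence("tests_passed")])
    },
    "deployed": {},
}


def transition(state: str, action: str, ctx: dict) -> str:
    """Move to the next state only if every guard on the transition passes."""
    target, guards = DEPLOY_WORKFLOW[state][action]
    if not all(guard(ctx) for guard in guards):
        raise PermissionError(f"{action!r} blocked by a guard in state {state!r}")
    return target


ctx = {"roles": ["release-manager"], "evidence": {"tests_passed": True}}
print(transition("built", "run_tests", ctx))  # tested
print(transition("tested", "deploy", ctx))    # deployed
```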

What's actually built today

This is a pre-1.0 project, so let me be precise about what exists rather than what's on a roadmap.

What works now:

  • The seven-tool surface
  • Capabilities wired to MCP servers, CLI commands, and REST APIs
  • Guards that gate on permission, role, a small expression language, or recorded evidence
  • JSON Schema validation that runs before any executor
  • Retries with backoff, fallback executors, and idempotency keys
  • Deterministic chaining for steps that don't need a decision
  • A structured audit event for every meaningful step
  • Persistent stores (SQLite or Postgres) for state that survives a restart
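Retries, backoff, fallbacks, and idempotency keys fit together naturally, as this sketch shows. RetryingRunner and its parameters are hypothetical. The doubling backoff schedule and the in-memory map of completed idempotency keys are assumptions, not mcp-flowgate's documented behavior.

```python
import time
from typing import Callable, Optional


class RetryingRunner:
    """Hypothetical runner: retries with exponential backoff, then an optional
    fallback, and remembers results by idempotency key so a repeat is not re-run."""

    def __init__(self, attempts: int = 3, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.attempts = attempts
        self.base_delay = base_delay
        self.sleep = sleep
        self.completed: dict = {}  # idempotency key -> result

    def run(self, key: str, primary: Callable[[], object],
            fallback: Optional[Callable[[], object]] = None) -> object:
        if key in self.completed:
            return self.completed[key]  # same key: return the stored result
        last_error: Optional[Exception] = None
        for attempt in range(self.attempts):
            try:
                result = primary()
                break
            except Exception as err:
                last_error = err
                if attempt + 1 < self.attempts:
                    self.sleep(self.base_delay * 2 ** attempt)  # 0.1, 0.2, 0.4, ...
        else:
            # All attempts failed: use the fallback executor if there is one.
            if fallback is None:
                raise last_error
            result = fallback()
        self.completed[key] = result
        return result


calls = {"count": 0}


def flaky_deploy():
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "deployed"


delays = []
runner = RetryingRunner(sleep=delays.append)
print(runner.run("deploy-42", flaky_deploy))                  # deployed
print(delays)                                                 # [0.1, 0.2]
print(runner.run("deploy-42", flaky_deploy), calls["count"])  # deployed 3
```

Passing sleep in as a parameter keeps the sketch testable. A test can record the delays instead of actually waiting.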

What I won't claim: there are no published case studies yet. Throughput under real production load hasn't been measured. The only numbers so far are microbenchmarks of the gateway's own overhead, which sits comfortably under a millisecond per call on the reference machine. The permission and role guards assume a multi-tenant deployment with real identity wiring. The bundled binary treats every caller as anonymous, which is the right default for one developer on a laptop but is not a security boundary. The docs say all of this plainly. So will this blog.

Who this is for

Honesty cuts both ways, so here's when not to reach for this. If you have a single MCP server and no governance needs — no audit, no approvals, no retries, no multi-step workflows — point your host straight at it. A gateway in the middle would be machinery you don't need.

mcp-flowgate earns its place when you have several capabilities and you care about at least one of: fewer tokens in the model's context, an audit trail, retries and fallbacks, approval gates, schema validation, or ordered multi-step workflows. The more of those you need, the more the gateway is doing for you.

What this blog is for

Nine deep-dives are coming, each about one specific piece of how this works — and why it works that way:

  • What a deploy pipeline and an expense report have in common — why this isn't a coding tool.
  • The hidden cost of 50 MCP tools — the token math, measured.
  • Why your AI agent needs a state machine — the boring pattern that makes agents governable.
  • HATEOAS for AI — why REST's oldest idea is the right pattern for agents.
  • Stop writing approval gates in code — why governance belongs in config, not application logic.
  • Flat tool lists don't scale — what quietly breaks when an agent has 200 tools.
  • Where MCP security actually lives — and, honestly, where it doesn't yet.
  • Your LLM is a generator, not a calculator — using the right tool for each kind of work.
  • Deterministic chaining — letting the runtime handle the steps that aren't decisions.

Every post is grounded in the actual tool — real config, real output on the wire, real numbers. If a claim isn't traceable to something you can run, it doesn't go in.

Start here

The quick start takes about 30 seconds. Save a one-tool config like the hello.echo example above as hello.yaml, then build and run the gateway:

terminal
# build it
cargo build --release

# run the gateway over stdio
./target/release/mcp-flowgate serve --config hello.yaml

Point your MCP host at it — Claude Desktop, an IDE, an agent runner — and seven tools appear in the model's list. Then add a second capability to the config, and watch the count stay at seven.

That's the bet. The rest of this blog is the evidence — start with the token math, or jump straight to the quick start.