What do a deploy pipeline and an expense report have in common?
If you've read the other posts on this blog, you've watched the same example go past again and again — a deploy pipeline. Run the tests, build the artifact, ship it. It's a fine example. It's also quietly training you to misunderstand what this tool is.
So let's be direct. mcp-flowgate is not an agentic coding tool. Coding is one thing you can do with it. It is nowhere near the most important.
Why everyone reads it as a coding tool
The misread is understandable. MCP itself grew up inside coding assistants — the first place anyone wired a model to real tools was the IDE. The loudest agent demos are coding agents. And yes, two of this project's own examples — a TDD enforcer, a deploy pipeline — are code-shaped, because engineers built it and engineers reach for engineering examples.
So the pattern-match arrives on its own: a model, plus tools, plus governance, must be "a coding-agent thing." Natural. And wrong.
The actual primitive
Strip it back to what the tool really does. mcp-flowgate governs a process in which an LLM takes consequential actions. The process has steps. Some steps are legal only at certain points. Some need a human. Every step should leave a record.
Read that description again and notice what isn't in it: code. Nothing about it is specific to software. It describes a content review. An expense approval. A customer refund. A patient intake. An engineering change. A procurement sign-off. Coding is one instance of the primitive: a loud one, not a special one.
Look at the examples — most of them aren't code
The repository ships a set of examples, and they give the game away. Four of them have nothing to do with software.
Content publishing. The content-publish example's README opens, in its first line, "A non-coding example." An LLM helps draft a piece, runs it through brand review, routes it to a human for approval, and publishes it, and cannot skip a step. It's marketing operations. The README even nominates it as the example to show non-engineers. The mistakes it catches (skip the review, approve your own piece, publish after a rejection) are not coding problems. They're every-review-process problems.
Expense reimbursement. The expense-approval example is finance. An employee submits an expense; an LLM assistant classifies it against policy; a manager approves; finance signs off above a threshold; very-high-value claims need two finance approvers; payroll issues the payment idempotently, so a retried call never double-pays. Manager and finance are different people with different authority: real role-based access control, in a finance process, with not one line of application code in the workflow.
Multi-tenant data. The multi-tenant example is a SaaS data-governance shape: separate tenants, separate databases, reads and writes gated by role.
Engineering change. The governed-change example is hardware-quality territory. An engineering change runs through a risk review (modeled on FMECA, the failure-mode analysis discipline used in automotive, aerospace, and manufacturing), loops on remediation until the residual risk score is acceptable, and only then goes to a human for sign-off. That is about as far from "agentic coding" as a workflow gets.
And then, yes, there are the TDD enforcer and the deploy pipeline. Coding. Two examples out of the set, built from the same primitives as the other four. Coding didn't get a special engine. It got the same one everything else got.
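To make the expense-reimbursement shape concrete, here is a minimal Python sketch of its two rules: tiered approval by amount, and a payment that a retry cannot repeat. It is not taken from the example's source. The threshold values, the step names, and the `Payroll` class are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds in cents; the post does not state the real policy values.
FINANCE_THRESHOLD_CENTS = 500_00
DUAL_APPROVAL_THRESHOLD_CENTS = 5_000_00


def required_approvals(amount_cents: int) -> list[str]:
    """Return the approval steps a claim of this size must clear, in order."""
    steps = ["manager"]
    if amount_cents > FINANCE_THRESHOLD_CENTS:
        steps.append("finance")
    if amount_cents > DUAL_APPROVAL_THRESHOLD_CENTS:
        steps.append("finance_second")
    return steps


@dataclass
class Payroll:
    """Toy payment executor that deduplicates on an idempotency key."""

    issued: dict[str, int] = field(default_factory=dict)

    def pay(self, idempotency_key: str, amount_cents: int) -> bool:
        """Issue a payment once; return False if this key was already paid."""
        if idempotency_key in self.issued:
            return False
        self.issued[idempotency_key] = amount_cents
        return True


payroll = Payroll()
first = payroll.pay("claim-1042", 120_00)   # first call issues the payment
retry = payroll.pay("claim-1042", 120_00)   # retried call is a no-op
```

The point of the sketch is the same as the example's: the routing and the no-double-pay rule are declared up front rather than left to the model's judgment on each claim.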
Why this suddenly matters — a large part of the answer
There's a pattern in that list worth stopping on. Content, finance, role-gated data, engineering change — those aren't just "more use cases." They're the regulated ones.
And regulators have recently converged on a single demand: if an AI system takes a consequential action, you must be able to show what it did, why, and who was watching, and you must keep that record.
The EU AI Act is the sharpest version. For high-risk AI systems it requires the automatic recording of events (logs) across the system's lifetime; that's Article 12, "Record-keeping." It requires deployers to retain those logs for at least six months. And it requires that high-risk systems be built for, and operated with, effective human oversight by people with the competence and authority to exercise it. Under the Act as adopted, those high-risk obligations start to apply on 2 August 2026.
The United States has no single federal AI law, but a 2026 wave of state statutes moved in the same direction — Colorado's AI Act, and transparency and AI-governance laws in California and Texas — while rules in cities and states such as New York layer recordkeeping, notice, and human-review duties specifically onto automated decision-making tools.
Two honest caveats, because this is a section about the law. First: nothing here is legal advice, and no tool "makes you compliant" — compliance is an organizational and legal question far larger than any gateway. Second: the EU AI Act's logging duty is specifically for high-risk systems, not every chatbot. But the direction of travel is not ambiguous. Across exactly the domains in that example list, "what did the AI do, and why?" is turning from a question you would like to answer into one you are required to.
That is a large part of why the audit trail stopped being a nice-to-have. Not the whole reason you'd put a gateway in front of an LLM — the rest of this blog is the rest of the reasons — but a large part of it.
Now you can actually answer the question
Here's what changed. Until recently, answering "what did the model do, and why" was genuinely hard. An LLM's actions were scattered — some in chat logs, some in tool-call traces, some merely implied by a prompt — and none of it was a coherent record of decisions. After an incident you could reconstruct a story, with effort. You could not hand someone the record, because there wasn't one.
mcp-flowgate produces that record as a byproduct of how it runs. It isn't a compliance feature bolted on the side. Every meaningful step emits a structured audit event — the workflow starting, each transition, each guard evaluation and its result, each executor attempt, each approval request, each rejection. A correlation ID threads every event from a single decision together. The recorded state path is, literally, the history of what the AI did and in what order.
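As a rough picture of what "structured events threaded by a correlation ID" means, here is a small Python sketch. It is not mcp-flowgate's event schema: the field names, the event type strings, and the `AuditTrail` class are assumptions chosen only to show the idea.

```python
import json
import uuid
from datetime import datetime, timezone


class AuditTrail:
    """Append-only, in-memory list of structured audit events (illustrative only)."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def emit(self, correlation_id: str, event_type: str, **detail) -> dict:
        """Record one event, stamped with the time and its correlation ID."""
        event = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "correlation_id": correlation_id,
            "type": event_type,
            **detail,
        }
        self.events.append(event)
        return event

    def history(self, correlation_id: str) -> list[dict]:
        """Return, in order, every event that belongs to one decision."""
        return [e for e in self.events if e["correlation_id"] == correlation_id]


trail = AuditTrail()
run_id = str(uuid.uuid4())

# Hypothetical event types, loosely following the list in the paragraph above.
trail.emit(run_id, "workflow_started", workflow="content-publish")
trail.emit(run_id, "guard_evaluated", guard="evidence", passed=False)
trail.emit(run_id, "approval_requested", actor="human")
# An unrelated run gets its own ID and stays out of the first run's history.
trail.emit(str(uuid.uuid4()), "workflow_started", workflow="expense-approval")

for event in trail.history(run_id):
    print(json.dumps(event))
```

Filtering on one ID is what turns a pile of log lines into a single ordered account of one decision.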
The "why" is in there too, because the decision rules are declared, not improvised: when a guard blocks a transition, the trail records which guard blocked it. And "who was watching" isn't a promise. A step marked actor: human is one the model physically cannot take; a person resolves it, and both the request and the resolution land in the trail.
states:
  awaiting_approval:
    transitions:
      approve:
        target: approved
        actor: human   # the LLM cannot take this step
  approved:
    transitions:
      publish:
        target: published
        guards:
          - { kind: evidence, requires: [human_request] }   # only after a logged approval
        executor: { kind: rest, connection: cms, idempotencyKey: true }

That fragment is from the content-publishing example. There is no code in it. It's a marketing approval gate, and it produces, on its own, the logged, human-overseen record that the EU AI Act describes. Point an auditor, a brand director, or a regulator at the audit log, and the answer to "what happened here, and who signed off" is already written.
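For readers who think in code, the gate in that fragment can be modeled in a few lines of Python. This is a toy interpreter, not how mcp-flowgate is implemented. The `WORKFLOW` dictionary, the `Run` class, and the assumption that a human-resolved step leaves evidence named `human_request` are all invented for the sketch.

```python
class TransitionDenied(Exception):
    """Raised when a transition is not legal for this state or actor."""


# Hypothetical in-memory mirror of the YAML fragment above.
WORKFLOW = {
    "awaiting_approval": {
        "approve": {"target": "approved", "actor": "human"},
    },
    "approved": {
        "publish": {"target": "published", "requires": ["human_request"]},
    },
}


class Run:
    """One pass through the workflow, with its evidence and state path."""

    def __init__(self) -> None:
        self.state = "awaiting_approval"
        self.evidence: set[str] = set()
        self.path = [self.state]

    def fire(self, name: str, actor: str) -> str:
        transitions = WORKFLOW.get(self.state, {})
        if name not in transitions:
            raise TransitionDenied(f"{name!r} is not legal in state {self.state!r}")
        spec = transitions[name]
        required_actor = spec.get("actor")
        if required_actor and actor != required_actor:
            raise TransitionDenied(f"{name!r} requires actor {required_actor!r}")
        missing = [e for e in spec.get("requires", []) if e not in self.evidence]
        if missing:
            raise TransitionDenied(f"{name!r} is missing evidence: {missing}")
        if required_actor == "human":
            # Assumption: resolving a human step records this evidence.
            self.evidence.add("human_request")
        self.state = spec["target"]
        self.path.append(self.state)
        return self.state


run = Run()
blocked = None
try:
    run.fire("approve", actor="llm")      # the model tries to approve its own draft
except TransitionDenied as err:
    blocked = str(err)

run.fire("approve", actor="human")        # a person resolves the approval
run.fire("publish", actor="llm")          # now the evidence guard passes
```

The model's attempt to approve is refused, and publishing only becomes legal once the human step has left its evidence behind. `run.path` ends up holding the ordered state history.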
What it's for — and what it isn't
So, plainly: mcp-flowgate is a governance-and-audit layer for any process in which an LLM takes consequential, multi-step action and someone needs to know what it did and why. That someone might be a brand manager, a finance lead, an internal auditor — or, increasingly, a regulator.
It is not for an LLM that only answers questions and touches nothing. No actions means no governance needed and no gateway needed — the welcome post said as much. The breadth here is real, but it isn't infinite: it's exactly the set of processes where an AI acts, and the acting has consequences.
Coding is in that set. It is just neither the largest part of it, nor — soon — the part with the most at stake.
Picture the other rooms
You've been reading this blog imagining a coding agent, because that's the example that kept showing up. Picture the other rooms now. The draft moving through brand review. The expense climbing two tiers of approval. The engineering change clearing its risk gate before anyone signs.
Same seven tools. Same state machine. Same audit trail. And in those rooms, when someone asks what the AI did and why — and more of them, every quarter, are now required to ask — there is finally a file that answers.
Start with the examples — the non-coding ones first — or the quick start.
References
- EU Artificial Intelligence Act, Article 12 — Record-keeping. artificialintelligenceact.eu/article/12
- European Commission — Regulatory framework on Artificial Intelligence. digital-strategy.ec.europa.eu
- White & Case — "AI Watch: Global regulatory tracker — United States." whitecase.com