The Hermes.md Bug That Charged Claude Max Users $200 Extra — and What It Reveals About Subscription Pricing
A string in a git commit triggered API-rate billing on a $200/month Claude Max plan. The bug exposed how fragile AI subscription pricing really is.
A $200 Surprise Charge From a String in a Git Commit
On a $200/month Claude Max plan, a user got hit with an extra $200.98 in API fees — not because they were using a different service, not because they’d exceeded any obvious limit, but because the string hermes.md appeared in a recent git commit message.
That’s the Anthropic Hermes.md billing bug, and it’s one of the more revealing incidents in recent AI pricing history. The user hadn’t been running Hermes, the third-party Claude Code framework. They were just using Claude Code normally. But Claude Code’s system prompt pulls in git status information, and when it found hermes.md in the commit history, it triggered a third-party harness detection routine that kicked the session off the subscription tier and onto API billing — silently, mid-session, with no warning.
The user’s dashboard showed 13% weekly usage and 0% current session usage. 86% of their plan was sitting untouched. The extra charge showed up anyway.
This post is about what actually happened, why it happened, and what it tells you about the structural fragility of flat-fee AI subscriptions in an agentic world.
What the Bug Actually Did
To understand the bug, you need to understand what Claude Code is doing when it starts a session.
Claude Code isn’t just a chat interface. It’s an agentic coding tool that reads your environment — your file system, your git history, your project structure — and includes that context in the system prompt it sends to the model. This is how it knows what branch you’re on, what files you’ve recently changed, and what your project looks like.
Part of that context-gathering includes detecting whether you’re running inside a third-party harness. Anthropic has been actively trying to route third-party harness usage to the API rather than the subscription tier. The reasoning, stated by Boris Cherny at Anthropic, is that “subscriptions weren’t built for the usage patterns of these third-party tools.” Tools like Hermes and OpenClaw can generate enormous token volumes — far more than the subscription pricing was designed to absorb.
So Claude Code looks for signals that you might be in one of these harnesses. And one of those signals, apparently, was the presence of hermes.md in your git commit history.
The bug: the detection logic was reading git status into the system prompt and pattern-matching on strings, without verifying that the harness was actually active. If you’d ever committed a file called hermes.md, or even written a commit message that mentioned it, the detector fired. You got routed to API billing. The session continued normally from your perspective, while charges accumulated in the background.
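The failure mode is easy to sketch. The Python below is a hypothetical reconstruction from the article's description, not Anthropic's actual code; the function names and the safer variant are illustrative:

```python
import re

# Hypothetical reconstruction of the failure mode described above.
# Strings the article says triggered misrouting.
HARNESS_PATTERN = re.compile(r"hermes(\.md)?|openclaw", re.IGNORECASE)

def detect_harness_buggy(git_context: str) -> bool:
    """Buggy approach: fire on any mention of a harness string in the
    git context pulled into the system prompt, even a stale commit message."""
    return bool(HARNESS_PATTERN.search(git_context))

def detect_harness_safer(git_context: str, active_processes: list[str]) -> bool:
    """Safer approach: require evidence that a harness is actually running,
    not just that its name appears somewhere in git history."""
    mentioned = bool(HARNESS_PATTERN.search(git_context))
    running = any(p in ("hermes", "openclaw") for p in active_processes)
    return mentioned and running

git_context = "commit a1b2c3: add notes to hermes.md about prompt design"
print(detect_harness_buggy(git_context))                       # True, a false positive
print(detect_harness_safer(git_context, active_processes=[]))  # False
```

The difference between the two functions is exactly the difference between inferring harness usage from historical strings and verifying active harness state.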
Anthropic support initially told the affected user they couldn’t issue compensation for “incorrect billing routing.” That response, shared on Reddit, got 1.4 million views. Theo Brown posted a related finding — that having OpenClaw in a JSON blob in a recent commit would cause Claude Code to either refuse requests or bill extra — which got another million views.
After that visibility, Tariq from Anthropic posted: “Sorry, this was a bug with the third-party harness detection and how we pull git status into the system prompt. We’re reaching out to affected users and giving them a refund plus another month’s worth of credit.”
The refund happened. The bug got fixed. But the incident is worth sitting with, because it’s not really about one bug.
Why This Bug Was Possible in the First Place
The Hermes.md bug didn’t come from nowhere. It came from Anthropic trying to solve a real problem under real pressure.
Anthropic is, by most accounts, straining under the weight of its own success. Ben Thompson at Stratechery noted that Anthropic’s reluctance to make their Mythos model widely available might be less about security concerns and more about compute constraints. Technology writer Tae Kim put it more directly: “Anthropic vastly underestimated compute growth needs, which is expanding much faster than expected.” You can read more about the specifics of that compute crunch in Anthropic’s compute shortage and what it means for Claude limits.
The agentic era changed the math. A year ago, a Claude subscription was priced around the assumption that users would have conversations — back-and-forth exchanges, maybe some document analysis. Claude Code changed that. A single agentic coding session can consume tokens at a rate that would have been unimaginable in the chat era. One power user reported consuming roughly one billion tokens in a single month — equivalent to about 7,500 books worth of words.
SemiAnalysis called Claude Code “the inflection point for AI agents” in a February piece, arguing it was set to drive exceptional revenue growth for Anthropic. That’s true. It’s also true that exceptional revenue growth from a flat-fee subscription, when the underlying cost is token consumption, is a structural problem.
So Anthropic started making moves. They ran what they called a “small test” of removing Claude Code from the Pro subscription. They began routing third-party harness usage to the API. They implemented the harness detection logic that, when it misfired, produced the Hermes.md bug.
The detection logic itself is the tell. The fact that Anthropic was pattern-matching on git commit strings to identify harness usage suggests they were working fast, under pressure, trying to stop the bleeding on subscription economics. A more careful implementation would have verified active harness state rather than inferring it from historical commit data. But careful implementations take time, and Anthropic was dealing with frequent outages caused by excess traffic.
What This Means If You’re Running Claude Code
If you use Claude Code on a Claude Max subscription, here’s what you actually need to know.
Your git history is in your system prompt. Claude Code reads git status and includes it in the context it sends to the model. This is useful — it helps Claude understand your project. But it also means that strings in your commit history can affect how Claude Code routes your session.
Third-party harness detection is active. Anthropic is actively trying to identify whether you’re running inside Hermes, OpenClaw, or other frameworks that aren’t Claude Code or Claude.ai. If you’re building on top of Claude Code using one of these frameworks — or if you’ve ever experimented with them in a repo you’re now using with Claude Code — you could be affected. The comparison of Claude Code frameworks like GStack, Superpowers, and Hermes is worth reading if you’re evaluating which harness to use, because the choice now has billing implications, not just capability ones.
The subscription tier and the API tier have different pricing. When you get routed to the API, you’re paying per token. On Claude Opus 4.7, that’s $5 per million input tokens and $25 per million output tokens. An agentic session that runs for a few hours can consume millions of tokens. The $200.98 charge in the Hermes.md case wasn’t an anomaly — it’s what API pricing looks like for heavy agentic use.
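At those rates, the arithmetic is easy to run yourself. A quick sketch using the per-token prices cited above; the session token counts are illustrative, not figures from the incident:

```python
# Rates cited above for Claude Opus 4.7 on the API tier:
# $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 25.00 / 1_000_000

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """API-tier cost in dollars for one session."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Agentic sessions re-send context on every turn, so input tokens dominate.
# Illustrative: 30M input and 2M output tokens over a few hours of work.
cost = api_cost(30_000_000, 2_000_000)
print(f"${cost:.2f}")  # $200.00
```

A few hours of heavy agentic use lands squarely in the range of the surprise charge, which is the point: the $200.98 was not an outlier, it was the rate card.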
You may not get a warning. The affected user’s dashboard showed normal usage metrics right up until the charge appeared. The routing happened silently. There was no “you’re about to be billed at API rates” notification.
The practical steps, if you’re a Claude Max subscriber using Claude Code for serious work:
- Check your recent git commit history for any strings that reference third-party harnesses — hermes, openclaw, hermes.md, or similar. If they’re there, be aware that the detection logic may have seen them.
- Watch your billing dashboard more actively than you might expect to need to. The subscription dashboard showing normal usage doesn’t mean API charges aren’t accumulating separately.
- If you’re building agentic workflows that chain Claude Code with other tools, understand that Anthropic’s current posture is to route that usage to the API. This isn’t a bug — it’s policy. The bug was the false positive, not the routing itself.
- If you get an unexpected charge, document everything: your dashboard state, the session details, what you were working on. The Hermes.md user got a refund partly because they did the work of binary-searching their repos to find the trigger and reported it clearly.
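The commit-history check in the first step can be scripted. A minimal sketch with a hypothetical helper that scans commit subjects; in a real repo you would feed it the output of `git log --all --format=%s`:

```python
import re

# Hypothetical helper, not an official tool: scan commit subjects
# for the harness strings the article says triggered misrouting.
HARNESS_STRINGS = re.compile(r"hermes(\.md)?|openclaw", re.IGNORECASE)

def find_harness_mentions(commit_subjects: list[str]) -> list[str]:
    """Return the commit subjects that mention a known harness string."""
    return [s for s in commit_subjects if HARNESS_STRINGS.search(s)]

# Illustrative input; in practice, collect subjects with:
#   git log --all --format=%s
subjects = [
    "fix: handle empty git status",
    "docs: move setup notes into hermes.md",
    "chore: bump deps",
]
print(find_harness_mentions(subjects))  # ['docs: move setup notes into hermes.md']
```

A hit doesn’t mean you were billed incorrectly, only that the strings the detection logic apparently keyed on are present in your history.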
The Broader Pattern: Subscriptions Weren’t Built for This
The Hermes.md bug is a specific incident, but it’s a symptom of something structural.
GitHub Copilot just announced a shift to consumption-based fees, with CPO Mario Rodriguez explaining that “agentic usage is becoming the default and it brings significantly higher compute and inference demands.” The new multiplier table that came with the announcement is stark: Claude Opus 4.7 went from a 7.5x multiplier to 27x. Gemini 3.1 Pro and GPT-5.3 Codex both went from 1x to 6x. Microsoft had been absorbing roughly a 3.6x subsidy on every Opus token. That’s now over.
Replit made this shift earlier, in the summer and fall of 2025, and took significant criticism for it. Now it looks prescient.
The pattern is consistent: flat-fee subscriptions made sense when AI usage was bounded by human conversation speed. Agentic usage isn’t bounded that way. An agent running autonomously for hours generates token volumes that no flat fee can sustainably absorb. The pricing models are catching up to the usage reality, and the catching-up is messy.
For developers building on top of these platforms, this creates a new kind of risk. You can build a workflow that works perfectly within your subscription, then have it reclassified — correctly or incorrectly — as out-of-scope, and suddenly face API pricing on usage you thought was covered. The Hermes.md bug was an incorrect reclassification. But the correct reclassifications are also happening, and they can be just as surprising if you haven’t been tracking the policy changes.
This is part of why the architecture of your AI workflows matters more than it used to. If you’re building agents that chain multiple models or use multiple tools, platforms like MindStudio offer a different approach: 200+ models, 1,000+ integrations, and a visual builder for composing agents — so the orchestration layer is explicit and the model routing is something you control, rather than something that happens to you based on what strings appear in your git history.
The Troubleshooting Reality
When something like this happens, the support path is frustrating. The initial Anthropic support response to the Hermes.md user was a denial — “we are unable to issue compensation for degraded service or technical errors that result in incorrect billing routing.” That’s a policy response, not a technical one, and it’s the kind of response you get when a support team is working from a script that doesn’t account for the specific failure mode you’ve hit.
The resolution came from public pressure, not from the support process. That’s not a sustainable model for getting billing errors corrected.
If you hit an unexpected charge:
- Don’t accept the first support response as final if you believe it’s a billing error.
- Document the technical details — what you were running, what your dashboard showed, what the charge was.
- The Anthropic Claude Code team (separate from general support) has been more responsive to technical billing issues. Tariq’s response came from that team.
- If you’re part of an organization, note that Anthropic has been known to ban entire organizations without explanation — a 110-person agriculture company recently reported being banned with no stated reason and only a Google Form for appeals. If your organization depends on Claude for daily workflows, that’s a risk to factor into your continuity planning.
Where This Is Heading
The Hermes.md bug got fixed. The refunds went out. But the underlying tension — between flat-fee subscription economics and agentic token consumption — isn’t going away.
Anthropic is going to keep tightening the boundary between what’s covered by subscriptions and what gets routed to the API. That’s not a criticism; it’s a rational response to a real cost problem. But it means the rules will keep changing, and the changes may not always be communicated clearly before they affect your bill.
If you’re building serious agentic workflows on Claude Code, the Claude Code source code leak post has useful detail on how Claude Code actually works under the hood — which is increasingly relevant when the internals of how it reads your environment can affect your billing tier.
The deeper question the Hermes.md bug raises is about what kind of infrastructure you want to build on. Subscription tiers that can be silently reclassified mid-session, based on pattern-matching against your git history, are a fragile foundation for production workflows. The more agentic your usage, the more that fragility matters.
For teams building spec-driven applications rather than just running agents interactively, tools like Remy take a different approach: you write your application as an annotated markdown spec, and Remy compiles it into a complete TypeScript backend, database, auth, and deployment — the spec is the source of truth, and the generated code is derived output. That’s a different abstraction layer than prompt-and-agent, and it sidesteps some of the billing unpredictability that comes with heavy interactive agentic use.
The Hermes.md bug was a false positive. But it revealed something true: the pricing infrastructure underneath these tools is under real stress, and the stress is going to keep producing surprises until the pricing models actually catch up to how people are using AI.
Understanding token-based pricing — how it works, what drives costs, where the surprises come from — is now a practical skill for anyone building seriously with these tools. The Hermes.md user learned it the hard way. You don’t have to.