Skip to main content
MindStudio
Pricing
Blog About
My Workspace

Anthropic's Harness Detection Bug: 3 Things That Triggered Unexpected Claude Code Charges

A git commit mentioning 'hermes.md' triggered a $200.98 overage on a plan showing 86% unused. Here's exactly what caused it and how Anthropic responded.

MindStudio Team RSS
Anthropic's Harness Detection Bug: 3 Things That Triggered Unexpected Claude Code Charges

A $200.98 Charge on a Plan Showing 86% Unused

A developer on Anthropic’s $200/month Claude Max plan got hit with a $200.98 overage charge last week. His dashboard showed 13% weekly usage, 0% current session, and 86% of his monthly allocation sitting untouched. The trigger, after hours of binary-searching through repos and commits, turned out to be a single string: hermes.md in a recent git commit message.

That’s the story. And it gets worse before it gets better.

Anthropic support acknowledged the bug three times in their correspondence with the user. They called it an “authentication routing issue.” They thanked him for finding it. Then they refused to issue a refund, writing: “we are unable to issue compensation for degraded service or technical errors that result in incorrect billing routing.”

The post landed on Reddit. Then Anu Patel shared it on X, where it collected 1.4 million views. Theo Brown followed with his own thread — 1 million views — that put the sharpest point on it: “if you have a recent commit that mentions OpenClaw in a JSON blob, Claude Code will either refuse your request or bill you extra money.”

Anthropic reversed course after that. Tariq from Anthropic posted publicly that this was “a bug with the third-party harness detection and how we pull git status into the system prompt,” and confirmed that affected users would receive a refund plus one month’s credit. The refund only materialized after 2.4 million combined views forced the issue.


Not a coding agent. A product manager.

Remy doesn't type the next file. Remy runs the project — manages the agents, coordinates the layers, ships the app.

BY MINDSTUDIO

Why This Wasn’t a Random Billing Glitch

Here’s what makes this more than a billing error: Anthropic was actively scanning git commit messages for competitor harness names.

The keywords that triggered the behavior were OpenClaw and Hermes — both third-party agent harnesses that developers use to run Claude outside of Anthropic’s own Claude Code environment. Hermes, specifically, is a Claude Code framework with its own conventions and file structures. If you’ve been comparing frameworks, you might have already read the GStack vs Superpowers vs Hermes comparison — and if you have any of those names in your commit history, you may have been affected without knowing it.

The system was apparently designed to detect when users were running Claude through a third-party harness and either restrict access or route them to a different billing tier. The bug wasn’t that the detection existed — it was that the detection was reading git commit messages and firing on string matches, not actual harness usage.

So a developer who had simply named a file hermes.md — or committed a JSON blob that mentioned OpenClaw in passing — could trigger the same response as someone actually circumventing Anthropic’s billing controls. The detection logic had no way to distinguish between “this person is using a competitor harness” and “this person typed the word Hermes in a commit message six weeks ago.”

That’s a meaningful distinction. And it’s the kind of bug that only surfaces when someone is determined enough to binary-search their entire commit history to find the cause.


How the Bug Actually Worked

Tariq’s public statement is the most technically precise thing Anthropic has said about this. The bug was in “third-party harness detection and how we pull git status into the system prompt.”

That phrase — “pull git status into the system prompt” — is doing a lot of work. Claude Code, when it runs, ingests context about your environment. That includes git status information: recent commits, branch names, file states. This is useful for giving the model situational awareness about your codebase. But it also means that whatever is in your recent git history gets read into the system prompt on every session.

The harness detection logic was apparently scanning that git context for known competitor keywords. When it found them, it triggered a different billing route — one that counted usage differently, or applied overage charges that the user’s subscription plan was supposed to cover.

The user in this case was on Claude Max, the $200/month tier. That plan is designed to give heavy users a predictable monthly cost. But the harness detection was routing his sessions through a different billing path, one that treated his usage as API-style consumption rather than subscription consumption. Hence the $200.98 charge on top of his monthly fee, despite 86% of his plan being unused.

One coffee. One working app.

You bring the idea. Remy manages the project.

WHILE YOU WERE AWAY
Designed the data model
Picked an auth scheme — sessions + RBAC
Wired up Stripe checkout
Deployed to production
Live at yourapp.msagent.ai

If you want to understand how Claude Code manages context and token consumption more broadly, the Claude Code token management guide covers the mechanics in detail — and some of those techniques may help you audit what’s actually being passed into your sessions.


The Investigation That Forced Anthropic’s Hand

The user didn’t just file a support ticket and wait. He ran a binary search across his repos and commits, systematically eliminating variables until he isolated the trigger. That’s the kind of methodical debugging that takes hours and requires you to already suspect that something in your codebase — not your usage patterns — is causing the problem.

Most users wouldn’t get there. If your dashboard shows 86% unused and you get an overage charge, the natural assumption is that the dashboard is wrong, or that some background process ran up usage you didn’t notice. The idea that a string in a commit message is the cause is not where most people start.

He reported it. Anthropic support confirmed the bug three times, named it an “authentication routing issue,” and still declined to refund the charge. That response — acknowledging a technical error on their end while refusing compensation — is what made the Reddit post land so hard.

Theo Brown’s framing was the sharpest: this isn’t just a billing bug, it’s evidence that Anthropic wrote code specifically to detect and penalize users who appear to be using competitor tools. The bug is that the detection was too broad. But the intent behind the detection is its own conversation.


What You Should Check Right Now

If you’re on a Claude Max plan and you’ve worked with any third-party Claude frameworks, there are a few things worth auditing.

First, search your recent git commit history for the strings OpenClaw, Hermes, and any other third-party harness names. The trigger in this case was hermes.md — a filename, not even a direct reference to the harness itself. If you’ve been experimenting with different frameworks, those names could be anywhere: commit messages, JSON config blobs, markdown files, branch names.

Second, check your billing dashboard against your actual usage. The affected user’s dashboard showed 86% unused — which means the dashboard was not reflecting the overage charges in real time. If you’re on a subscription plan and you see an unexpected charge, don’t assume the dashboard is the ground truth.

Third, if you’re using Claude Code through any non-standard path — including routing through Open Router or running it against local models — be aware that Anthropic’s harness detection exists and is apparently still active. The guide to running Claude Code through Open Router covers the technical setup, but the billing implications of third-party routing are now a documented risk.

For developers building agents that orchestrate multiple models across different providers, this kind of vendor-side detection is a real operational concern. Platforms like MindStudio handle this orchestration layer differently — 200+ models, 1,000+ integrations, and a visual builder for chaining agents — so you’re not dependent on a single provider’s billing logic or harness detection to run your workflows.


The Refund That Took 2.4 Million Views

Remy doesn't write the code. It manages the agents who do.

R
Remy
Product Manager Agent
Leading
Design
Engineer
QA
Deploy

Remy runs the project. The specialists do the work. You work with the PM, not the implementers.

Tariq’s public statement confirmed refunds and one month’s credit for affected users. That’s the right outcome. But the sequence matters: support refused compensation, the post went viral, and then Anthropic reversed course.

Theo Brown put it plainly in his reply to the original thread: “There’s a certain class of bugs that suggests the thing you’re trying to do is a bad idea. Worth reflecting on that.” The implication is that the bug reveals intent — you don’t accidentally write detection logic for competitor harness names. Someone made a product decision to scan for those strings and route users differently based on what they found.

That decision, whatever its original rationale, produced a system that charged a user $200.98 for having a filename in their commit history. And the initial support response — acknowledging the error, thanking the user for finding it, and declining to refund — suggests that the policy around this billing behavior was deliberate enough that front-line support felt they were following the rules when they refused.

The reversal is good. The fact that it required 2.4 million views to get there is the part worth sitting with.


What This Means for How You Build

If you’re building on top of Claude Code, a few things have changed in how you should think about your environment.

Your git history is part of your session context. That’s not new — Claude Code has always ingested environmental information — but the harness detection incident makes it concrete. What’s in your commits affects how your sessions are routed and billed. If you’re working across multiple frameworks or comparing tools, that experimentation leaves traces that can have billing consequences.

The Claude Code source code leak post documented how Claude Code constructs its system prompt from environmental signals. The git status pull is part of that construction. Understanding what gets read into context — and what strings might match detection patterns — is now a practical concern, not just an academic one.

For teams managing token costs carefully, the Opus Plan Mode guide covers how to separate planning from execution to reduce consumption. But token management is only useful if your billing is routing correctly in the first place. The hermes.md incident is a reminder that the billing layer has its own logic, and that logic can misfire in ways that aren’t visible on your dashboard.

The deeper issue is one of trust. Anthropic markets Claude Max as a predictable subscription for heavy users. A bug that charges $200.98 on top of a $200/month plan — on a session where 86% of the plan is unused — breaks that predictability in a specific and painful way. The fix is in. But if you’re building production workflows on Claude Code, you now know that git context is scanned, harness detection exists, and the initial support posture when something goes wrong is not automatically “we’ll make this right.”

That’s useful information. Build accordingly.

When the source of truth for your application lives in a spec rather than scattered across API calls and commit messages, some of these environmental side effects become easier to reason about. Tools like Remy take that approach — you write annotated markdown as the spec, and the full-stack app (TypeScript backend, database, auth, deployment) gets compiled from it. The spec is what you own; the generated code is derived output. It’s a different model of where intent lives, and it sidesteps some of the ambient-context problems that bit this user.

The hermes.md incident will probably be a footnote in a few months. But it’s a clean illustration of something worth keeping in mind: the infrastructure you build on has opinions about how you use it, and sometimes those opinions are encoded in billing logic you can’t see until it fires.

Presented by MindStudio

No spam. Unsubscribe anytime.