Claude Mythos Makes Elite Hacking Cheap: The 'Skill Compression' Risk That's Harder to Stop Than One Super-Hacker

The Cost Per Exploit Is “Not Massive” — and That’s the Whole Problem

Claude Mythos can find and exploit vulnerabilities in every major operating system and web browser, even when used by non-experts. That sentence is from the IMF’s own article, “Financial stability risks mount as artificial intelligence fuels cyber attacks.” It’s not a researcher’s speculation or a red-team thought experiment. It’s the International Monetary Fund putting a specific model name in a financial stability warning.

The framing you’ll see most often focuses on what Mythos found: 271 vulnerabilities in Firefox v150 in a single release cycle, compared to the 22 security-sensitive bugs that Anthropic’s Opus 4.6 found in Firefox v148. That’s a striking number. But the more important story isn’t the vulnerability count. It’s what the economics of finding those vulnerabilities now look like.

Skill compression is the term worth holding onto here. Elite vulnerability research used to require engineers commanding six- or seven-figure compensation: people with deep knowledge of memory safety, browser internals, parser behavior, and years of adversarial intuition. That expertise was expensive, scarce, and slow. Mythos doesn’t eliminate that expertise. It compresses it into something that costs, per Anthropic’s own reporting, “not massive” per exploit.

That cost structure changes everything about who can attack you.

What Skill Compression Actually Means

The word “compression” is doing real work here. It’s not that Mythos makes existing security researchers 20% faster. It’s that it collapses the skill floor required to do the work at all.

Think about what a serious vulnerability research workflow used to require. You needed someone who understood the target codebase at a structural level. You needed someone who could reason adversarially about what the code allows versus what the author intended. You needed someone who could write test cases, reproduce issues in a sandbox, refine findings, and articulate the attack surface clearly enough that a patch could be written. That’s a rare combination of skills. It took years to develop and commanded compensation to match.

Mythos appears to participate in that entire research loop autonomously. It reads code, forms a hypothesis, uses tools, generates test cases, reproduces the issue, refines the finding, and explains the problem. Google’s Project Naptime and Big Sleep have been moving in the same direction. OpenAI’s Codex Security is explicitly built around a similar loop: understand the codebase, build a threat model, validate issues in a sandbox, propose patches for human review. DARPA’s AI Cyber Challenge tested autonomous systems that find and patch vulnerabilities across large codebases. The shape of what these systems are doing is consistent across organizations.

What’s new with Mythos isn’t the concept. It’s that the capability appears to have crossed a threshold where it works reliably enough to surface 271 real vulnerabilities in one of the most security-hardened open-source codebases in the world.

Firefox is not a soft target. It processes untrusted content from the internet constantly. It has fuzzing infrastructure, sandboxing, memory safety work, internal security teams, bug bounty programs, and decades of accumulated paranoia baked into the engineering culture. And yet the gap between what Opus 4.6 found in Firefox v148 (22 bugs, 14 high severity) and what Mythos found in Firefox v150 (271 vulnerabilities), just two releases later, is not a marginal improvement. It’s a different category of capability.

Why Non-Expert Access Is the Actual Risk

The IMF quote specifically calls out non-experts. That’s the part that deserves more attention than it’s getting.

When security professionals look at a tool like Mythos, many of them think: “Yes, but we could already do this. We don’t because it’s illegal.” That’s true. But it misses the point entirely. The people who could already do this represent a tiny fraction of the global population — people with the specific education, experience, and cognitive profile to do elite vulnerability research. The barrier wasn’t just legal. It was skill.

The Amazon Kindle analogy is useful here. After ChatGPT launched, ebook submissions to Amazon didn’t triple because existing authors wrote more books. They tripled because people who had never written a book before suddenly could. The same dynamic happened with iOS App Store submissions after agentic coding tools matured — flat for three years, then a near-vertical spike.

Map that same curve onto cyberattack attempts and you have the scenario that the IMF, the Bank of England, the ECB, the US Treasury, and the Federal Reserve are all separately flagging. Not one super-hacker. Tens of thousands of people with no prior security expertise gaining the ability to run sophisticated vulnerability research campaigns.

The prompt can be in any language. The attack still executes. The skill barrier that used to filter out most of the world’s would-be attackers is eroding.

The Scale Problem Compounds the Skill Problem

Skill compression is one half of the risk. Scale is the other.

When you’re running Claude Code or Codex on a development task and it’s taking a while, the natural move is to open another instance in a separate tab and put it to work on something else. Boris Cherny, the creator of Claude Code, has mentioned running something like five tabs simultaneously with agents and sub-agents working on different projects. That’s just a normal agentic workflow now.

Apply that same pattern to vulnerability research. A human security team working on a codebase is constrained by the number of people on the team and the number of hours in a day. They can’t split their attention across a thousand parallel attack vectors simultaneously. An agentic system running Mythos-level capability can. You can spin up a hundred instances, each attacking a different part of the same codebase, or the same attack vector across a hundred different codebases.

At that point, the binding constraint becomes cost per exploit. And according to Anthropic’s own reporting, that cost is not prohibitive. Mythos is expensive to run as a model. But when you’re measuring dollars per discovered vulnerability rather than dollars per hour of compute, the economics look very different — especially if the vulnerability you find can be monetized through ransomware, data theft, or financial system disruption.
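The difference between dollars per compute-hour and dollars per discovered vulnerability is easy to see with a little arithmetic. The sketch below is illustrative only: the `Campaign` class, its field names, and every number in it are hypothetical placeholders, not figures from Anthropic, Mozilla, or the IMF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Campaign:
    """Hypothetical inputs for one automated review campaign."""
    instances: int              # parallel agent instances
    hours_per_instance: float   # hours each instance runs
    cost_per_hour: float        # assumed dollars per instance-hour
    confirmed_findings: int     # findings that reproduced in a sandbox

    def total_cost(self) -> float:
        """Total spend across all instances."""
        return self.instances * self.hours_per_instance * self.cost_per_hour

    def cost_per_finding(self) -> float:
        """Spend divided by confirmed findings (infinite if none)."""
        if self.confirmed_findings == 0:
            return float("inf")
        return self.total_cost() / self.confirmed_findings


# Placeholder numbers chosen only to make the arithmetic visible.
campaign = Campaign(
    instances=100,
    hours_per_instance=10.0,
    cost_per_hour=20.0,
    confirmed_findings=50,
)

print(campaign.total_cost())        # 20000.0
print(campaign.cost_per_finding())  # 400.0
```

The hourly rate can look steep in isolation, yet the per-finding figure is what a defender budgeting a hardening pass (or an attacker weighing a payoff) would actually compare against.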

This is why the financial sector is treating this as a systemic risk rather than just a cybersecurity problem. A cyberattack on one bank creates a confidence shock. A coordinated attack on payments infrastructure creates a liquidity shock. Attacks on multiple institutions simultaneously can create a market shock. The IMF’s concern isn’t that Mythos will be used once. It’s that the cost structure makes large-scale, parallel attacks economically viable in a way they weren’t before.

Jamie Dimon put it plainly in his shareholder letter: “cyber security remains one of the biggest risks and AI almost surely will make this risk worse.” The CEOs of Goldman Sachs, Bank of America, Citigroup, Morgan Stanley, and Wells Fargo all attended the red alert meetings where these capabilities were demonstrated. These aren’t people who can be easily summoned to the same room. The fact that they all came says something about how seriously the financial sector is taking this.

What’s Buried in the Skill Compression Story

The non-obvious piece here is what skill compression does to the defensive side of the equation.

If Mythos can find 271 vulnerabilities in Firefox in one release cycle, the same capability that makes it dangerous as an offensive tool makes it extraordinarily valuable as a defensive one. Mozilla’s blog post is literally titled “The Zero Days Are Numbered” — the implication being that AI-scale vulnerability discovery, applied defensively, could make zero-day exploits increasingly rare.

Nate B. Jones frames this as a trust model inversion: human-written code is losing its presumption of safety, while AI-reviewed code is gaining it. The reasoning is that human code review has always been limited by human attention and human adversarial imagination. Mythos-level systems can search the consequence space of code far more broadly, at machine scale, in ways humans simply can’t.

This creates an interesting asymmetry. Organizations with access to Mythos (or equivalent capability) can harden their codebases against the same class of attacks that Mythos enables. Organizations without that access are increasingly exposed — not just to Mythos itself, but to the long tail of less-capable attackers who will gain access to similar tools as the capability diffuses.

The cybersecurity capability gap between Claude Mythos and Claude Opus 4.6 is already documented: 83.1% vs 66.6% on cybersecurity benchmarks. That gap matters both offensively and defensively. Understanding what Mythos actually is and what it can do is prerequisite knowledge for any security team trying to reason about their exposure.

For teams building agentic pipelines that incorporate security review steps, platforms like MindStudio offer a way to chain models, tools, and workflows visually — which becomes relevant when you’re trying to integrate automated vulnerability scanning into a build process without writing all the orchestration code yourself.

The Spec Layer Is Where Defense Happens

There’s a deeper point buried in the Mozilla story that’s easy to miss if you focus only on the vulnerability count.

Nate B. Jones makes the argument that comprehensibility is about to become a security property. Code that is structurally legible (narrow modules, explicit boundaries, small interfaces, clear specifications) is code that adversarial AI systems can reason over cleanly. Messy code isn’t just a maintenance problem. It’s structurally resistant to the AI tools that could make it safer.

The practical implication: the way you write code now affects whether Mythos-equivalent systems can defend it later. Technical debt becomes security debt in a much more direct way when the defensive tool is an AI system that needs to reason over your codebase adversarially.

This is also where the abstraction level of software development matters. The trend across the history of programming has been to move human execution upward — from assembly to C to managed runtimes to cloud platforms. Each transition moved a class of error-prone human work into automated systems. Security is pushing that transition again. This is precisely the kind of abstraction shift that tools like Remy are built for: instead of hand-authoring implementation, you write a spec — annotated markdown that carries intent — and the full-stack application is compiled from it. The spec becomes the source of truth; the generated code is derived output. When the meaning layer is explicit and the implementation is derived, adversarial review of the implementation becomes much more tractable.

Claude Mythos’s reported 93.9% score on SWE-bench gives some sense of how capable the model is at reasoning over code. The same capability that makes it effective at software engineering tasks is what makes it effective at adversarial code interpretation.

What to Watch For

The skill compression risk isn’t a future scenario. It’s a present one with a diffusion timeline.

Mythos is currently being released selectively — to organizations that control some of the most powerful systems on the internet, specifically so those systems can be hardened before broader access creates offensive risk. That’s a reasonable approach, but it’s a temporary window. The capability will diffuse. Similar capability from other labs — OpenAI’s GPT-5.5 security variant is already mentioned in the IMF article alongside Mythos — will become more widely available. Open-source models are likely to reach comparable capability within months.

The organizations that use this window to harden their codebases will be in a materially different position than those that don’t. The ones that treat this as a future problem will find themselves on the wrong side of the asymmetry when the capability becomes widely accessible.

A few specific things worth tracking:

Cost per exploit metrics. Anthropic’s reporting describes the cost as “not massive.” As compute costs continue to fall and model efficiency improves, that cost will decrease further. The economic threshold for large-scale automated attack campaigns will keep dropping.

Agentic attack patterns. The risk isn’t a single Mythos instance running a single attack. It’s parallel instances running simultaneous campaigns across multiple targets. The compute constraints that currently limit Mythos availability are real, but they’re temporary. Planning for the world where those constraints are lifted is the right frame.

Defensive tooling adoption. The same capability that makes Mythos dangerous makes it valuable for defense. Organizations that integrate adversarial AI review into their build pipelines — the way Mozilla did — will find vulnerabilities before attackers do. Those that don’t will find out about their vulnerabilities the other way.
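As a rough illustration of what integrating adversarial review into a build pipeline could look like, here is a minimal gate that fails a build when a review step reports reproduced findings at or above a severity threshold. Everything here is an assumption made for the sketch: the `Finding` shape, the severity labels, and the `gate` function are hypothetical and do not describe Mozilla’s process or any vendor’s API.

```python
from dataclasses import dataclass

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    """Hypothetical record emitted by an automated review step."""
    identifier: str
    severity: str     # one of SEVERITY_ORDER
    reproduced: bool  # True if the issue was confirmed in a sandbox


def blocking_findings(findings, threshold="high"):
    """Return reproduced findings at or above the severity threshold."""
    floor = SEVERITY_ORDER.index(threshold)
    return [
        f for f in findings
        if f.reproduced and SEVERITY_ORDER.index(f.severity) >= floor
    ]


def gate(findings, threshold="high") -> int:
    """Return a process exit code: 1 blocks the build, 0 lets it pass."""
    blockers = blocking_findings(findings, threshold)
    for f in blockers:
        print(f"BLOCKING {f.identifier} ({f.severity})")
    return 1 if blockers else 0


# Made-up sample report for illustration.
report = [
    Finding("parser-001", "critical", reproduced=True),
    Finding("ui-014", "low", reproduced=True),
    Finding("net-007", "high", reproduced=False),
]

exit_code = gate(report)
```

Only reproduced findings block the build in this sketch, so an unconfirmed report does not stall a release. A real pipeline would still need a human triage path for the findings that do block.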

The IMF doesn’t flag things as systemic risks lightly. When the same warning comes independently from the Bank of England, the ECB, the US Treasury, and the Federal Reserve, and when every major Wall Street CEO attends a red alert meeting about it, the signal is worth taking seriously.

The danger isn’t one super-hacker. It never was. It’s the same dynamic that tripled Amazon’s ebook catalog and spiked the iOS App Store — a capability that was previously gated behind years of expertise becoming accessible to anyone with a prompt. Applied to vulnerability research, at machine scale, with a cost per exploit that’s “not massive.”

That’s the thing to build your threat model around.
