Anthropic's Economic Index Shows 49% of Jobs Already Have 25%+ of Tasks Done by Claude — Is Yours One of Them?
Nearly half of all jobs have already handed a quarter of their tasks to Claude. Here's how to find out where your role stands.
The Anthropic Economic Index published a finding that should stop most knowledge workers mid-sentence: approximately 49% of jobs have already had at least 25% of their tasks performed using Claude. Not “could be affected someday.” Already performed. Past tense.
That number comes from Anthropic’s own usage data — real conversations, real tasks, real work that people brought to Claude and got back in a form useful enough to act on. It’s not a projection from a research model. It’s a count of what actually happened.
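To see how a statistic like that is constructed, here is a toy version of the computation: treat each occupation as a bundle of tasks, mark which tasks show up in usage data, and count the occupations that cross the threshold. The occupations and task flags below are invented for illustration; this is not Anthropic’s actual pipeline, which maps real conversations onto occupational task categories.

```python
# Toy illustration of how a "49% of jobs have 25%+ of tasks performed
# with Claude" statistic falls out of task-level usage data.
# The occupations, task counts, and usage flags are invented.

# Each occupation is a bundle of tasks; a task is True if observed
# usage data shows it being performed with Claude.
occupations = {
    "technical writer":   [True, True, True, False, False, False],
    "financial analyst":  [True, True, False, False, False, False, False, False],
    "site electrician":   [False, False, False, False, False],
    "support specialist": [True, True, True, True, False, False],
}

THRESHOLD = 0.25  # at least a quarter of the occupation's tasks

def claude_task_share(tasks: list[bool]) -> float:
    """Fraction of an occupation's tasks observed being done with Claude."""
    return sum(tasks) / len(tasks)

exposed = [
    name for name, tasks in occupations.items()
    if claude_task_share(tasks) >= THRESHOLD
]

share_of_jobs = len(exposed) / len(occupations)
print(f"{share_of_jobs:.0%} of occupations have {THRESHOLD:.0%}+ of tasks performed with Claude")
print("exposed:", exposed)
```

The headline number is just this calculation run over a real task taxonomy instead of four made-up rows, which is why it measures task absorption rather than job replacement.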
If you work in a role that involves writing, synthesizing information, drafting recommendations, or coordinating decisions — which describes most knowledge work — you should assume you’re in that 49% until you have specific evidence otherwise.
The Data Behind the Number
The Anthropic Economic Index isn’t the only signal here. OpenAI and University of Pennsylvania researchers estimated that roughly 80% of US workers could have at least 10% of their tasks affected by large language models, and about one in five workers could see 50% or more of their tasks affected. Those are projections. The Anthropic number is observed behavior.
Microsoft’s research team looked at 200,000 Bing Copilot conversations to understand what work people actually bring to AI. The answer wasn’t exotic. The most common tasks were gathering information and writing. The most common outputs from the AI side were writing, teaching, providing information, and advising. This is not niche automation. This is the core of what most office jobs produce on a Tuesday afternoon.
The pattern across all three data sources points the same direction: AI isn’t coming for whole job titles. It’s absorbing specific task types that happen to be distributed across nearly every knowledge work role. The unit of disruption is the task, not the job description.
Why Your Performance Review Won’t Tell You This
Here’s the uncomfortable part. Most performance systems are built to measure visible output — did the document get written, did the update go out, did the meeting happen. They are not built to measure whether the output required you specifically.
This creates a dangerous lag. Your work can look completely fine from the outside. The calendar is full. The manager is happy. The deliverables are shipping. But the question the performance system isn’t asking is: how much of that work would have degraded if you hadn’t been there?
The travel agent analogy is instructive here. Expedia didn’t erase travel agents overnight. Online booking changed the economics of the routine work first, and for a while nothing visibly changed. The profession still existed. People still had jobs. The visible break came later, when economic downturns forced the industry to confront what had already quietly shifted. The agents who survived weren’t the ones who defended routine booking as a professional identity — they moved toward complex itineraries, corporate travel, emergency handling, the work that the booking interface couldn’t do.
Most knowledge work is sitting in that pre-break window right now. The tasks are still being performed. The roles still exist. But the fraction of the work that requires a specific human is shrinking, and the performance review system is measuring the old thing.
What the Index Is Actually Measuring
The 49% figure measures task-level displacement, not job-level replacement. This distinction matters more than it might seem.
Your job is not one thing. It’s 50 or 60 or 300 small things bundled into a title. Some of those things are genuinely hard to replicate — they depend on context you carry, judgment you’ve built from years of being wrong, the ability to read what’s actually happening in a room versus what’s being said. Some of those things are real work that produces real value but doesn’t require you specifically. And some of them, if you’re honest, are theater: work that exists because the organization performs it, not because it produces examined value.
The Anthropic data is capturing the middle category most aggressively. Summarizing, routing, applying known rules to known situations, writing the first draft of a document whose shape is already well-understood — this is the commodity layer of knowledge work, and it’s exactly what Claude is being used for at scale.
The Microsoft Bing Copilot data confirms this. “Gathering information and writing” — that’s commodity work. That’s the layer that’s already being absorbed.
What the index can’t measure is the durable layer: the work where your presence visibly changed the outcome in a way that goes beyond competence, where you changed the question more than you answered it, where the output depended on something you couldn’t have fully specified in advance. That work isn’t showing up in the 49% because it’s not the work people are bringing to Claude. It’s the work that doesn’t fit neatly into a prompt.
The Non-Obvious Problem With Being Good at Commodity Work
Here’s the part that stings. A lot of commodity work was genuinely hard to learn. You spent years getting good at synthesizing messy context into a clean memo. You learned how to write the update that calms people down. You learned how to turn a chaotic meeting into actionable next steps. That was real skill development.
The problem is that a skill can be real and still become less scarce. Markets don’t protect a skill because it once took a decade to build. They protect whatever is scarce now.
If you’ve built your professional identity around being the person who produces excellent commodity outputs — fast, clean, reliable — the Anthropic data suggests that identity is sitting on thin ice. Not because you’re bad at the work. Because the work is becoming less specific to you.
The trap is using AI to do more of that work faster. Claude helps you write the update in five minutes instead of thirty, so you write more updates. You free up two hours from routine coordination, and you fill them with more routine coordination. You become twice as productive at the part of your job whose value is collapsing. It feels like progress because the system still rewards visible throughput. It probably isn’t progress.
For builders thinking about this at the tooling level: the same dynamic applies to how you architect AI workflows. Platforms like MindStudio give you 200+ models, 1,000+ integrations, and a visual builder for chaining agents — which means you can automate commodity workflows quickly. The question worth asking is whether you’re automating the right layer, or just making it easier to produce more of what’s already becoming abundant.
The Legibility Problem That Makes This Hard to See
There’s a structural reason why durable work — the kind that survives this shift — is hard to measure and hard to defend.
Durable work has to be partially legible. You need to be able to show outcomes, point to decisions that changed the trajectory of something, make visible the moments where your judgment mattered. If your best work is completely invisible, you’ll be undervalued even when you’re doing exactly the right things.
But durable work can’t be fully specified. The moment you write down the complete decision tree for how you make a particular kind of judgment call, you’ve turned it into a process. A process can be delegated. A delegated process can be systematized. A systematized process can be automated. You’ve just converted your durable work into commodity work by explaining it too clearly.
This is the legibility paradox: show enough to be valued, but not so much that you’ve written yourself out of the loop.
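To make the paradox concrete, consider a hypothetical judgment call, whether to escalate a customer issue, written out as a complete decision tree. Every name and threshold below is invented. The point is structural: once the branches are exhaustive, the judgment has become a function.

```python
# A hypothetical "judgment call" written down as a complete decision
# tree. All thresholds are invented. Once the rules are exhaustive,
# the judgment is a process, and a process can be delegated,
# systematized, and automated.

from dataclasses import dataclass

@dataclass
class Issue:
    revenue_at_risk: float   # annual contract value in dollars
    is_repeat: bool          # same customer, same problem before
    days_open: int

def should_escalate(issue: Issue) -> bool:
    # The "tree": every branch is explicit, so nothing here needs you.
    if issue.revenue_at_risk > 100_000:
        return True
    if issue.is_repeat and issue.days_open > 3:
        return True
    return issue.days_open > 14

print(should_escalate(Issue(revenue_at_risk=250_000, is_repeat=False, days_open=1)))  # True
print(should_escalate(Issue(revenue_at_risk=5_000, is_repeat=True, days_open=5)))     # True
print(should_escalate(Issue(revenue_at_risk=5_000, is_repeat=False, days_open=2)))    # False
```

The durable version of this skill is knowing when the tree itself is wrong, and that is exactly what the function cannot encode.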
The practical implication is that the language you use to describe your contributions matters. “I was concerned we were solving the wrong problem, so I got us to reframe the question — here’s what changed as a result” is a legible outcome claim. It doesn’t expose the mechanism. It doesn’t turn your judgment into a recipe. It just makes the outcome visible enough for the system to credit it.
The distinction between question-answering and question-holding is where this gets concrete. Most organizations reward question-answering: someone asks for a plan, you make the plan. That’s commodifiable — the question is already given, the frame is set, the output can be judged against the prompt. Question-holding is different. It’s the ability to recognize that the question being asked is the wrong question, and to keep the real question open long enough for a better answer to become possible, without losing the room in the process. That skill is much harder to absorb into a prompt.
What the Index Suggests You Should Actually Do
The Anthropic data doesn’t tell you what to do, but it does tell you what to look at. If 49% of jobs have already had 25%+ of tasks performed by Claude, the useful question isn’t “will AI replace me” — it’s “how much of my last two weeks still needed me specifically.”
That’s a calendar question. You can answer it. Open the last ten business days of your calendar, your sent email, your Slack DMs, your docs and tickets. Go line by line and tag each item honestly:

Theater: work that exists because the organization performs it.
Commodity: real work that doesn’t require you specifically.
On-the-Line: pattern recognition where the patterns are structured; work that feels like judgment but might be becoming repeatable.
Durable: work where your presence visibly changed the outcome beyond competence.
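If you want to make the tally concrete, here is a minimal sketch of the audit as a script. The work items, hours, and tags are placeholders to be replaced with your own calendar entries; the four categories are the ones defined above.

```python
# Minimal sketch of the TCLD audit: tag the last two weeks of work
# items and see where the hours actually went. Items and tags below
# are placeholders.

from collections import Counter

TAGS = ("theater", "commodity", "on-the-line", "durable")

# (item, hours, tag) -- fill these in from your own calendar and sent mail.
audit = [
    ("weekly status update",          1.0, "commodity"),
    ("standing sync with no agenda",  1.0, "theater"),
    ("triaged support queue",         3.0, "on-the-line"),
    ("reframed Q3 planning question", 2.0, "durable"),
]

assert all(tag in TAGS for _, _, tag in audit), "unknown tag"

hours = Counter()
for _, h, tag in audit:
    hours[tag] += h

total = sum(hours.values())
for tag in TAGS:
    print(f"{tag:12s} {hours[tag]:5.1f}h  {hours[tag] / total:6.1%}")
```

The output is a single uncomfortable number: the share of your time that would survive a honest pass of the commodity layer to an AI.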
The sister posts to this one cover the TCLD audit framework in detail and what to do after you run it — this post is about understanding why the data says you should run it at all.
What the Anthropic index makes hard to ignore is that this isn’t a future risk. The 49% figure is past tense. The tasks are already being performed. The question is whether you know which of your tasks are in that bucket, or whether you’re going to find out when the next reorg forces the question.
One concrete action worth taking regardless of where your audit lands: start keeping a private record of judgment calls. At the end of each week, write one line about a call you made where the outcome depended on something you can’t fully reduce to rules. After a year, you have roughly 50 entries. After three years, you have a portfolio of evidence that your durable work is real — not reconstructed from half-remembered impressions, but documented at the time. When someone asks why the work should come to you instead of a cheaper process, you have an answer.
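The mechanics can be as simple as an append-only text file. A minimal sketch, where the file path and entry format are arbitrary choices rather than a prescribed system:

```python
# Minimal judgment journal: append one dated line per week to a
# private file. Path and format are arbitrary, not a prescribed system.

from datetime import date
from pathlib import Path

JOURNAL = Path.home() / "judgment-journal.txt"

def log_call(entry: str) -> None:
    """Append one line describing a judgment call made this week."""
    with JOURNAL.open("a", encoding="utf-8") as f:
        f.write(f"{date.today().isoformat()}  {entry}\n")

log_call("Killed the dashboard rewrite; usage data said nobody would miss it.")
```

The discipline matters more than the tooling: one line, once a week, written while the context is still fresh.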
If you’re building tools that sit in this space — agents that handle the commodity layer so humans can focus on the durable layer — the spec-driven approach is worth understanding. Remy takes this seriously at the code level: you write your application as annotated markdown, and the full-stack TypeScript app gets compiled from that spec. The spec is the source of truth; the code is derived output. It’s a different relationship between human intent and machine execution than “AI writes code so you don’t have to.”
The Anthropic Economic Index is a snapshot of what’s already happened. The more interesting question is what the same index will show in two years, and whether the tasks that get absorbed next are the ones you’ve been building your career around.
For more on how Claude is being used in practice, the Claude model overview covers the capability landscape. And if you’re watching Anthropic’s infrastructure decisions as a signal for where this is heading, the compute shortage analysis is worth reading — demand is outpacing supply, which tells you something about the rate of adoption. The GPT-5.4 vs Claude Opus 4.6 comparison is also useful context for understanding which model capabilities are converging and which are still differentiated, since that affects which task types get commodified fastest.
The 49% figure is the starting point. The audit is how you find out where you stand inside it.