Durable Work vs. Commodity Work — How to Position Yourself on the Right Side of AI Automation

The Trap Between Invisible and Automatable

You are caught between two bad outcomes. Make your work too visible — document every step, explain every decision, turn your judgment into a process — and you’ve written a spec for your own replacement. Keep it too hidden, and the organization can’t value what it can’t see, and you get cut in the next reorg. This is the legibility paradox, and it’s the central career problem of 2026.

Not “will AI take my job.” That question is too blunt to be useful. The real question is whether the parts of your job that are visible are the parts that are durable — or whether you’ve accidentally made yourself legible in exactly the wrong places.

Most people have. And the data is not comforting.

Anthropic’s Economic Index found that 49% of jobs have already had at least a quarter of their tasks performed using Claude. Microsoft researchers looked at 200,000 Bing Copilot conversations and found the most common work people bring to AI is gathering information and writing. Not exotic edge cases. The rewriting-email thing. The summarizing-the-meeting thing. The work that fills most calendars.

The question worth sitting with: how much of your last two weeks still needed you?

Durable Work vs. Commodity Work — What Actually Separates Them

The distinction sounds simple. Durable work is work where the output depends on something you cannot fully specify in advance. Commodity work is real, valuable work that doesn’t specifically require you.

But the line is harder to draw than it sounds, and the legibility paradox is why.

Here’s the problem. When you make durable work legible — when you write down the decision tree, explain the judgment call, document the process — you do two things simultaneously. You make the organization understand your value. And you create a spec that can be delegated, automated, or handed to someone cheaper. The act of articulating the work is also the act of commodifying it.

This is not a theoretical concern. OpenAI and University of Pennsylvania researchers estimate that roughly 80% of US workers could have at least 10% of their tasks affected by language models, and roughly one in five could see at least half of their tasks affected. The tasks most at risk are the ones that can be described. If you can write a prompt for it, someone will.

So the goal is not to make your work fully legible. It’s to make it partially legible — visible enough that the system values it, opaque enough that the system can’t run it without you.

That’s a much harder needle to thread.

What Commodity Work Actually Looks Like

Commodity work is not bad work. It’s often essential work. Summarizing messy context, routing decisions, writing the update that calms people down, turning chaos into a document — these are real skills that took real time to develop.

The problem is that a skill can be real and still become less scarce. Markets don’t protect skills because they were hard to acquire. They protect whatever is scarce now.

The test for commodity work is simple: could you write a spec and have someone else in your organization produce an output that’s roughly as useful? If yes, it’s commodity. And the uncomfortable truth is that if you can write a spec for a colleague to follow, an AI can probably follow it too.

This is where the travel agent pattern is instructive. Expedia didn’t erase travel agents overnight. The routine booking layer eroded first. The visible break came later, during downturns, when the industry finally had to admit what had already changed. The agents who survived moved toward complex trips, corporate travel, emergencies — work where the stated problem and the real problem were often different things.

Most knowledge work is sitting inside that same lag right now. The work still exists. The calendar is full. The manager is happy. But less and less of it specifically requires you.

The dangerous window is not after the reorg. It’s before it, when the performance system is still rewarding visible throughput while the actual economics of the role are already shifting.

What Durable Work Actually Looks Like

Durable work is not just hard work. Some hard work is commodity work. Some durable work looks almost invisible from the outside.

The clearest marker: you changed the question more than you answered it. You saw that the stated problem was not the real problem. Your presence in the conversation visibly changed the outcome in a way that goes beyond competence or speed.

The underlying skill is what you might call question-holding rather than question-answering. Most organizations reward question-answering. Someone asks for a plan, you make the plan. Someone asks for an analysis, you do the analysis. That’s valuable, but it’s also the surface area AI is best positioned to absorb — because the question is already given, the frame is set, and the output can be judged against the prompt.

Durable work often starts before the question is settled. Someone asks a question, and the right move is to say, “I think we’re asking the wrong question.” That’s uncomfortable. The meeting wants to move forward. The executive wants the recommendation. The team wants next steps. Question-holding is the ability to honor those demands while keeping the real question open long enough for a better answer to become possible.

The output of durable work is often the thing that didn’t happen. The bad hire that didn’t get made. The product detour that didn’t consume six months. The customer escalation that didn’t become a crisis. Performance systems are terrible at crediting avoided damage. But avoided damage is often where senior judgment lives.

Durable work also compounds differently. The pattern recognition you build from holding ambiguous questions for years doesn’t live entirely in a document or a workflow. It lives in your calibration — in the scar tissue of having been wrong before, in the moment where you can feel a clean story becoming just a little too clean. You can teach around it. You can share parts of it. You can’t fully transfer it. That’s the point.

The Legibility Paradox in Practice

Here’s where most people make the mistake, and it comes from good intentions.

You do something valuable — you read the room, you prevent the bad decision, you shape the strategy in a conversation. And then you want to be credited for it, which means you have to explain it. So you write it down. You turn the judgment into a process document. You explain every step.

Two things happen. First, the organization understands your value. Second, you’ve just written a delegation spec. If you described it accurately, someone can now follow the process. If you described it inaccurately, you’ve created a broken process that generates mistakes.

Either way, you’ve lost something.

The alternative — keeping everything invisible — is equally bad. If you read the room and prevent the bad decision and nobody can explain what you did, you get undercredited. You become the person who’s hard to evaluate, which is a dangerous position in any reorg.

The answer is partial legibility. Show the outcomes, not the mechanism. “I was concerned we were solving the wrong problem, and I got us to have the conversation. We changed the plan.” That’s a visible claim. It helps the system understand where you contribute outside commoditized work. But it doesn’t turn your judgment into a recipe.

Separate analysis from judgment in the way you talk. Analysis is the work that can be transferred. Judgment is what you do with the analysis. “The analysis says this; my judgment is that we should still do that.” “The framework says one thing, but this case is different.” That language teaches people where to bring you in — give me the cases where the analysis isn’t enough.

This is also why building a private track record matters. At the end of every week, write down one call you made where the outcome depended on judgment you can’t fully reduce to rules. Not a brag sheet. Just the record: what was the call, what was the context, what was the result. After a year, you have 50 entries. After three years, you have a portfolio of judgment. When someone asks why your role matters, you’re not reconstructing value from half-remembered memories. You have evidence.
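
The weekly record can live in a paper notebook just as well. For readers who prefer something scriptable, here is a minimal Python sketch of one way to keep it. The names `JudgmentEntry`, `add_entry`, and `entries_per_year` are hypothetical, and the sample entry is invented for illustration; the only things taken from the description above are the three fields (the call, the context, the result) and the one-entry-per-week rhythm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JudgmentEntry:
    """One judgment call, recorded at the end of a week (hypothetical structure)."""
    week_ending: date
    call: str      # what was the call
    context: str   # what was the context
    result: str    # what was the result


def add_entry(log: list[JudgmentEntry], entry: JudgmentEntry) -> list[JudgmentEntry]:
    """Return a new date-sorted log, allowing at most one entry per week_ending date."""
    if any(existing.week_ending == entry.week_ending for existing in log):
        raise ValueError(f"already have an entry for the week ending {entry.week_ending}")
    return sorted([*log, entry], key=lambda e: e.week_ending)


def entries_per_year(log: list[JudgmentEntry]) -> dict[int, int]:
    """Count entries by calendar year, to see how the record accumulates."""
    counts: dict[int, int] = {}
    for entry in log:
        year = entry.week_ending.year
        counts[year] = counts.get(year, 0) + 1
    return counts


# Invented sample entry, for illustration only.
log = add_entry([], JudgmentEntry(
    week_ending=date(2026, 1, 9),
    call="Pushed to revisit the problem statement before approving the plan",
    context="A recommendation was ready, but the stated problem looked too narrow",
    result="We changed the plan",
))
print(entries_per_year(log))  # {2026: 1}
```

The entries are frozen and the log is rebuilt on each addition, so past entries cannot be quietly edited after the fact. That is a design choice for the sketch, not a requirement of the practice.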

The Identity Problem Nobody Talks About

Running an audit on your own work, sorting each block of the week into theater, commodity work, or durable work, is mechanically straightforward. The hard part is what the results do to the story you tell yourself about who you are at work.

Theater threatens the story because it says some of your professional time was meaningless and now you know which parts. Commodity work threatens the story because it says skills you worked hard to build are becoming less scarce. The durable bucket threatens the story because the part of you that you think is most valuable might be a small fraction of your actual week.

Most people avoid thinking about their time this way not because they lack the tools but because the audit asks you to challenge your self-identity. Identity updates are psychologically expensive. You pay the cost now and the benefit arrives later.

The advantage goes to the person who can update their self-image before the organization forces the update on them.

Six Moves After You See the Picture Clearly

Once you’ve honestly sorted your work — and the theater and commodity numbers will be bigger than you want — the question is what to do about it.

Stop performing theater that exists by inertia. The recurring report nobody reads. The meeting where you’re the third senior person and the second one is enough. Cancel. Skip. Send three sentences in Slack. Most of the time, nothing happens. That’s the point.

Don’t pour recovered time into more commodity work. This is the trap. AI helps you write the update faster, so you write more updates. You become twice as productive at the part of your job whose value is collapsing. The better move is to put recovered time into cases that don’t fit the patterns you already know — projects where the framing is unclear, not just the execution.

Build the private track record. One judgment call per week. The record becomes the evidence.

Use that record to gradually refuse commodity work that doesn’t fit your trajectory. This doesn’t happen through formal declaration. It happens through project selection. When you have a choice, choose the project where the answer is uncertain over the project where the path is documented. Durable judgment needs raw material — messy cases, exposure to reality before it’s been cleaned up into a memo.

Make durable work partially legible. Outcomes visible. Mechanism protected. The distinction between analysis and judgment, spoken out loud.

And if the audit says you’re in the wrong role — if most of your week is theater and commodity work with no realistic path to meaningful durable work — the answer may not be better time management. Some roles are theater-heavy because the organization is theater-heavy. Some roles were designed for an earlier era and nobody has rebuilt them yet. If you’re evaluating a new role, don’t read the job description. Ask the people doing it what they spent time on last week. Ask what calls they made that couldn’t have been made by a process.

The broader point here connects to how AI tools are being built right now. Platforms like MindStudio handle orchestration across 200+ models and 1,000+ integrations — the commodity layer of AI work is getting compressed fast, which means the humans in the loop need to be doing something other than coordination and summarization. The tools are getting better at the describable parts of knowledge work. That’s not a threat to durable work. It’s a clarification of what durable work actually is.

The Power Law Nobody Wants to Hear

There’s a case worth making for people who build their careers on great moments rather than consistent throughput.

The impact of the hours in a working life follows a power law. A handful of moments across decades are extraordinarily impactful. The decision that saved the company. The question you held open that prevented six months of work on the wrong problem. The hire you didn’t make. There are not many of those hours. But they make or break careers.

If your durable work is small in volume but high in impact, that’s a legitimate career. But the audit asks a cold question: are those moments actually happening, or are you telling yourself they’re happening while filling the week with commodity work?

The difference matters. And the only way to know is to look at the actual week.
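
Looking at the actual week can be as simple as a tally. The sketch below is one hypothetical way to do the count in Python: each block of time gets a description, a number of hours, and one of the three buckets the audit uses (theater, commodity, durable). The function name `audit_week` and the sample week are assumptions made up for illustration, not data from any real calendar.

```python
BUCKETS = ("theater", "commodity", "durable")


def audit_week(blocks):
    """Return each bucket's share of total hours.

    blocks: iterable of (description, hours, bucket) tuples.
    """
    totals = {bucket: 0.0 for bucket in BUCKETS}
    for description, hours, bucket in blocks:
        if bucket not in totals:
            raise ValueError(f"unknown bucket {bucket!r} for {description!r}")
        totals[bucket] += hours
    total_hours = sum(totals.values())
    if total_hours == 0:
        return {bucket: 0.0 for bucket in BUCKETS}
    return {bucket: totals[bucket] / total_hours for bucket in BUCKETS}


# Invented 40-hour week, for illustration only.
week = [
    ("recurring report nobody reads", 4, "theater"),
    ("meeting summaries and email rewrites", 10, "commodity"),
    ("analysis that follows a documented template", 18, "commodity"),
    ("reframing the question before the plan was approved", 8, "durable"),
]

shares = audit_week(week)
print({bucket: round(share, 2) for bucket, share in shares.items()})
# {'theater': 0.1, 'commodity': 0.7, 'durable': 0.2}
```

The arithmetic is trivial on purpose. The judgment is in the labeling, and that is the part only you can do honestly.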

This is also where the macro picture is worth keeping in mind. Software engineering job postings are up 18% from the May 2025 inflection point, according to Citadel Securities data — the most AI-exposed occupation is seeing accelerating demand, not displacement. The Wall Street Journal reported that AI created 640,000 jobs between 2023 and 2025. Stripe Atlas Q1 2026 incorporations were up 130% year-over-year. The unemployment rate in March 2026 was 4.3%, compared to 4.4% in March 2020. The doom narrative is not matching the macro data.

What this suggests is not that AI poses no risk to knowledge workers. It’s that the risk is more surgical than apocalyptic. AI doesn’t have to replace your whole job to put you on thin ice. It only has to pick away at enough pieces that when the next shock comes — a recession, a budget freeze, a reorg — the rest of the story stops holding together.

The people who get sorted badly in that moment are the ones who built their professional identity around the describable parts of their work. The ones who survive are the ones who moved toward the work that still requires a person — not because they were faster or more organized, but because they were carrying context and taste and judgment that couldn’t be fully specified before the work started.

For anyone building AI-assisted workflows to handle the commodity layer, tools like Remy take a similar approach to the spec problem: you write annotated markdown describing what you want, and the full-stack application — TypeScript backend, database, auth, deployment — gets compiled from it. The spec is the source of truth; the code is derived output. The human judgment lives in the spec, not in the implementation details.

That’s the shape of durable work in a world where the implementation layer keeps getting cheaper. The judgment about what to build, what question to hold open, what problem is actually worth solving — that’s where the value accumulates.

The audit is not a verdict. It’s a starting point. The question after the count is done: what part of your week are you going to stop defending, and what part are you going to start feeding?

Those are different questions. Most people only ask the first one.

A practical walkthrough of building AI agents for personal productivity, the kind that compress the commodity layer of your work, is a good place to start. If you want to understand the underlying model capabilities driving this shift, the Claude overview covers the current state. And if you’re thinking about how to structure agentic workflows, the kind that handle the describable work so you can focus on the judgment work, the WAT framework is worth understanding.