Run the 4-Bucket AI Job Audit in 20 Minutes: Which Parts of Your Work Are Already on Thin Ice?
Theater, Commodity, On-the-Line, Durable. Audit the last two weeks of your work and find out what AI can already replace before your boss does.
Your Calendar Is Full and That’s the Problem
Most performance reviews will tell you you’re doing fine right up until the moment you’re not. The work gets done, the manager is happy, the meetings happen — and none of that tells you whether the work actually needed you. You can find out in about 20 minutes, using a framework called the TCLD audit: Theater, Commodity, On-the-Line, Durable, applied to the last two weeks of your actual work.
This post walks you through the audit step by step. By the end, you’ll have a concrete count of which parts of your job are already on thin ice and which parts are worth building around.
Why You’d Want to Know This Before Your Boss Does
The most dangerous moment in a job isn’t when work disappears. It’s when work still exists but less and less of it needs you specifically — and that’s invisible to most performance systems.
That’s exactly what happened to travel agents. Expedia didn’t erase the profession overnight. Online booking made the routine layer less defensible. Nothing looked different at first. Then a downturn hit, and the industry admitted what had already changed. The agents who survived moved toward complex trips, corporate travel, emergencies — work that still required judgment. The ones who defended routine booking as a professional identity had a rough decade.
OpenAI and University of Pennsylvania researchers estimate that roughly 80% of US workers could have at least 10% of their tasks affected by language models, and about 1 in 5 could see 50% or more affected. Anthropic’s Economic Index found that approximately 49% of jobs have already had at least 25% of their tasks performed using Claude. A Microsoft study of 200,000 Bing Copilot conversations found the most common work people bring to AI is gathering information and writing — not exotic edge cases. For a deeper look at what Claude is actually capable of and how it’s being used in agent workflows, the What is Claude and How to Use It for AI Agents post is worth reading alongside this audit.
AI doesn’t have to replace your whole job. It only has to pick away at enough pieces that when the next budget freeze or reorg hits, the organization asks a question it’s been avoiding: why is this role bundled this way?
The TCLD audit gives you a way to ask that question yourself, before someone else does.
What You Need Before You Start
No special tools required. You need:
- Your calendar for the last 10 business days, open in front of you
- Your sent email for the same period
- Slack DMs or equivalent — wherever your actual work conversations happen
- Any docs, tickets, code commits, spreadsheets, or memos you produced
You’re not auditing your role or your projects. You’re auditing individual items — one meeting, one memo, one decision, one conversation. The unit of analysis is the actual thing you did, not the category it belongs to.
Set aside 20 minutes. This is a private exercise. Nobody sees the tags.
The Audit, Step by Step
Step 1: Open everything and go line by line
Open your calendar. Open sent mail. Open Slack. Pull up whatever contains your real work from the last two weeks.
Go item by item. For each one, you’re going to assign a single letter: T, C, L, or D. Don’t overthink any single item — first instinct is fine. If you genuinely can’t decide, tag it L and keep moving.
Now you have: a list of work items ready to tag.
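If it helps to keep everything in one place as you go, the audit can be tracked as a plain list of tagged items. A minimal Python sketch, assuming made-up item names (they are illustrative, not from this post):

```python
# Each item from the last two weeks gets exactly one tag:
# "T" (Theater), "C" (Commodity), "L" (On-the-Line), "D" (Durable).
VALID_TAGS = {"T", "C", "L", "D"}

items = [
    {"item": "Monday status meeting", "tag": None},
    {"item": "Q3 roadmap memo", "tag": None},
    {"item": "Customer escalation call", "tag": None},
]

def tag_item(entry, tag):
    # First instinct is fine; when genuinely unsure, use "L".
    if tag not in VALID_TAGS:
        raise ValueError(f"tag must be one of {sorted(VALID_TAGS)}")
    entry["tag"] = tag

tag_item(items[0], "T")
```

A spreadsheet column works just as well; the point is one tag per item, no blanks left behind.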
Step 2: Tag Theater (T)
Theater is work that exists because the organization performs it, not because it produces examined value.
The test: if this work disappeared, would the main consequence be that the organization had to admit it had been performing rather than producing?
Examples that are almost always T:
- The status meeting where nothing changed and nobody got unblocked
- The deck that exists because one senior person needs something to flip through, but nobody reads it carefully
- The alignment call that produced no alignment but let everyone say alignment was attempted
- The recurring update that gets sent because someone asked for it 18 months ago
- The review process that once solved a real problem, but the problem is gone
This is the hardest tag to use honestly. Tagging something T means admitting you spent professional time on something that didn’t need to happen. It’s uncomfortable. Do it anyway.
One useful reframe: theater is not a moral failure. Large organizations create theater because theater is legible — it gives people something to point at, lowers social risk, creates the appearance of coordination. It’s structural. You didn’t invent it. But you’re the one paying the time cost.
Now you have: your T items tagged. Expect this number to be bigger than you want.
Step 3: Tag Commodity (C)
Commodity is real work that produces real value — it just doesn’t require you specifically.
The test: could you write a spec and have someone else in your organization produce an output that’s just about as useful?
Examples:
- Summarizing meeting notes into next steps
- Routing decisions that have already been made
- Writing a status report that someone does read, but anyone could have written
- Producing the first draft of a doc where the shape is already well-known
- Applying known rules to known situations
This is not an insult. A lot of C work is genuinely important. Companies run on it. The question isn’t whether the work matters — it’s whether your career should be built on being the one who does it.
The uncomfortable part: commodity work often took real skill to learn. You spent years learning how to summarize messy context, how to write the update that calms people down, how to turn chaos into a doc. That was real. The problem is that a skill can be real and still become less scarce. Markets don’t protect a skill because it once took a decade to build.
Understanding what token-based pricing for AI models actually looks like helps clarify why commodity tasks are the first to get automated — the economics of running a model against a well-defined prompt are already cheaper than a human hour for a wide range of routine work.
Now you have: your C items tagged. This number is also probably bigger than you want.
Step 4: Tag On-the-Line (L)
On-the-Line is the uncomfortable middle — work that doesn’t cleanly fit into commodity or durable.
Examples:
- Pattern recognition where the patterns are structured
- Relationship management that depends on history you carry
- Editorial calibration in an established format
- Routine synthesis across familiar inputs
- Work that used to feel hard and now feels a little too repeatable
The tell for L work: a strong junior person could do 70% of it, but the last 30% feels like yours. If someone asked you to explain exactly what judgment you applied, you might struggle to articulate it.
L is where most of your work anxiety lives. It’s where your professional identity and the moving line of AI capability are pressing against each other. Some L work will drift toward C. Some will become D. The point of tagging it isn’t to resolve that — it’s to see where the line is moving.
Now you have: your L items tagged. This is probably the largest bucket.
Step 5: Tag Durable (D)
Durable is work where the output depends on something you cannot fully describe in advance.
The test: would this work have degraded without you there, and is the reason not simply that you’re faster, more organized, or more available?
Signs of D work:
- You changed the question more than you answered it
- You read what was happening in a room and acted on it
- You saw that the stated problem wasn’t the real problem
- Your presence visibly changed the outcome in a way that goes beyond competence
D is not just hard work. Some hard work is commodity. Some durable work looks almost invisible from the outside — the bad hire that didn’t get made, the product detour that didn’t consume six months, the customer escalation that didn’t become a crisis. Performance systems are terrible at crediting avoided damage. But avoided damage is often where senior judgment lives.
One specific marker: durable work often starts with question-holding rather than question-answering. Most organizations reward people for answering questions — someone asks for a plan, you make the plan. That’s valuable, but it’s also very commodifiable. The question is already given; the frame is set; the output can be judged against the prompt. Durable work often starts before that, when the right move is to say “I think we’re asking the wrong question” — and then hold that gap without losing the room.
If you agonize over whether something is D, tag it L and keep going.
Now you have: all items tagged with T, C, L, or D. Count them up.
Step 6: Read the numbers
Add up each bucket. That count is the first honest picture of your job you’ve seen in a while.
The number that matters first is T + C combined. That’s the fraction of your current week where your personal claim on the work is weakest. It doesn’t mean all of that work vanishes tomorrow. It means that’s where you’re most exposed when the next shock hits.
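The arithmetic is trivial, but making it concrete helps. A minimal Python sketch, assuming a made-up two-week tag list (the tags below are illustrative, not real audit data):

```python
from collections import Counter

# Hypothetical result of tagging 20 items across two weeks.
tags = ["T", "C", "C", "L", "D", "T", "L", "L", "C", "T",
        "L", "C", "D", "T", "L", "L", "C", "L", "T", "C"]

counts = Counter(tags)            # counts: L=7, C=6, T=5, D=2
exposed = counts["T"] + counts["C"]   # where your personal claim is weakest

print(f"T+C share: {exposed / len(tags):.0%}")  # prints "T+C share: 55%"
```

A T+C share above half, as in this hypothetical, is a common first result; the follow-up question is how many of those hours you can stop, delegate, or compress.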
Here’s what most people find:
- T is bigger than expected (because we confuse “professionally expected” with “actually creates value”)
- C is bigger than expected (because commodity work often took real skill to build)
- D is smaller than expected (because our professional identity is built around what we think is unique, not how many hours we actually spend on it)
If your D number is small, the question isn’t whether you have durable skills. It’s whether your week is organized around them.
If you want to run this audit with AI assistance, the transcript this post is based on suggests using Codex with computer use to work across multiple UIs — calendar, email, Slack — though you’ll need to chunk the work and express your own preferences clearly. A separate agent for email, one for calendar, Slack summary tools for the rest. You still have to do the thinking; the agent helps with the retrieval.
Now you have: a concrete breakdown of your last two weeks by displacement risk.
Where the Audit Gets Confusing
“I can’t tell if something is C or L.” That’s normal. The line between commodity and on-the-line is genuinely blurry. The useful question: if you had to explain the judgment you applied, could you? If yes, it’s probably C. If you’d struggle to articulate it, it’s probably L. Don’t spend more than 30 seconds on any single item.
“Everything feels durable when I’m doing it.” This is the identity trap. The audit asks a cold question: how many hours last week did you actually spend on work that would have degraded without you? Not how much of your self-image depends on it. Count the hours, not the feeling.
“My T number is embarrassingly high.” Good. That’s the point. The theater number being high doesn’t mean you’re bad at your job — it means you’re in a large organization that generates theater structurally. The question is what you do with that information.
“I don’t have any D work in my current role.” This is the most important finding the audit can produce. Some roles are theater-heavy because the organization is. Some roles were designed for an earlier era and haven’t been rebuilt. If there’s no realistic path inside your current role to build meaningful durable skills, the answer might not be better time management — it might be moving. When evaluating a new role, don’t read the job description. Ask the people doing the role what they spent time on last week. Ask what calls they made that couldn’t have been made by a process.
What to Do With the Count
The audit is an input, not a verdict. Here’s where to take it.
Stop performing the theater you can stop. Start with the theater that exists by inertia — the recurring report nobody reads, the check-in that made sense two years ago. Cancel it. Send a short version. Watch what happens. Most of the time, nothing happens.
Don’t reinvest recovered time into more commodity work. This is the trap. AI helps you write the update faster, so you write more updates. You cut two useless meetings, so you fill the space with more routine coordination. You become twice as productive at the part of your job whose value is collapsing, and it feels like progress because the system still rewards visible throughput.
Build a private record of durable calls. At the end of each week, write one line: what was the call you made where the outcome depended on judgment you can’t reduce to rules? After a year, you have roughly 50 entries. After three years, you have a portfolio of judgment — actual evidence, not half-remembered impressions from three months ago.
Make durable work partially legible. Talk about outcomes: “I was concerned we were solving the wrong problem, and I got us to have the conversation — we changed the plan.” That’s a visible claim. It helps the system understand where you contribute outside commoditized work. But it doesn’t turn your judgment into a recipe. There’s a real legibility paradox here: durable work has to be visible enough that the system values it, but not so fully specified that the system can run it without you. Separate analysis from judgment in the way you talk. “The analysis says X; my judgment is that this case is different” teaches people where to bring you in.
If you want to build tooling around any of this — say, an agent that surfaces your durable calls from Slack history, or a workflow that flags when your week is skewing toward commodity — MindStudio makes it possible to chain those kinds of agents visually, connecting to Slack, email, and calendar integrations without writing the orchestration code yourself. It’s an enterprise AI platform with 200+ models and 1,000+ integrations, built around a visual workflow builder that handles the orchestration layer so you can focus on what the agent actually needs to do.
Use your durable track record to gradually refuse commodity work. Most people can’t simply stop doing routine work. The sequence is: become visibly valuable on non-routine work first, then use that value to renegotiate the routine load. This usually happens through project selection before it happens through formal authority. When you have a choice, choose the project where the answer is uncertain over the project where the path is documented.
The deeper point is that the system you’re operating inside was built around an old assumption — that human output was the scarce thing. Performance reviews, promotion frameworks, quarterly goals, meeting rituals: all of it assumed that. That assumption is breaking unevenly right now, not all at once, not the same way for every role. During the lag, the people who can see their own work clearly have an advantage over the people waiting for the review cycle to catch up.
The audit is just a way to see clearly. One sitting, last two weeks, four letters. The count doesn’t lie.
Where to Take This Further
If the audit surfaces a lot of L work you want to move toward D, the practical question is how to get more exposure to genuinely ambiguous cases — work where the framing is unclear, not just the execution. That usually means choosing harder projects, sitting in on conversations where you’re not the expert, and getting closer to the people carrying context your normal workflow abstracts away.
For the commodity work you want to compress rather than just abandon, understanding what the WAT framework means for workflows, agents, and tools is a useful lens — it maps out where human judgment still sits in AI-augmented workflows, which helps you identify which C tasks are worth automating first and which L tasks are worth protecting.
If your audit reveals that your role is heavily theater-and-commodity with no clear durable path, it’s worth thinking carefully about what a durable role actually looks like in practice. The What Is OpenClaw? post on open-source AI agents that take real actions is one concrete example of where the commodity layer is moving — understanding what agents can already do autonomously sharpens your sense of where human judgment is still genuinely required.
And if you’re building something from this — a spec for a judgment-tracking tool, a workflow for surfacing durable calls, anything that starts as a structured document and needs to become a working application — Remy takes a different approach to that problem: you write the application as an annotated spec in markdown, and it compiles that into a complete TypeScript backend, database, auth, and deployment. The spec stays the source of truth; the code is derived from it.
The audit takes 20 minutes. What you do with the count is the longer work.