
Jeff Bezos's 'What Won't Change' Principle Applied to AI Tool Stacks — And Why It Matters Now

Build for what won't change, not what will. Here's how the Bezos principle translates into a durable AI workflow that survives any model or tool being replaced.

MindStudio Team

Jeff Bezos Built Amazon Around What Wouldn’t Change. Your AI Stack Should Work the Same Way.

Jeff Bezos famously told his team to stop asking what would change in the next ten years and start asking what wouldn’t. Customers would always want lower prices, faster delivery, more selection. Build toward those constants, and every investment compounds. Build toward the trends, and you’re on a treadmill.

The same principle applies directly to how you build your AI workflow. And most people are getting it wrong.

Right now, the default behavior for AI builders is to chase tools. A new model drops, you rebuild your setup around it. A new agent framework gets announced, you spend a week migrating. Cursor was essential six months ago. ChatGPT was the obvious default before that. OpenClaw was the go-to for agentic work until something better came along. The churn is real, and it’s expensive.

The Bezos principle, applied to AI tool stacks: think about what will never change, not what will change. And the corollary that follows from it: build directories like they’re going to outlive any tool. Because they will.

The Non-Obvious Thing About Tool Churn

Here’s what most people miss. The problem isn’t that they’re using the wrong tools. The problem is that their architecture is coupled to specific tools.

If your workflow only works because of Cursor’s specific UI, or because ChatGPT’s memory feature stores your context, or because some tool has a proprietary project format — you’ve built fragility into the foundation. When that tool gets replaced (and it will), you don’t just swap one thing out. You rebuild.

The alternative is to treat your working directory as the durable artifact. Your claude.md file, your agent instructions, your scripts, your skills — these live in a directory that any agent can read. Claude Code can work in it. Codex can work in it. Hermes Agent can work in it. Some new tool that doesn’t exist yet will be able to work in it too, because it’s just a directory with well-structured files.

This is not a subtle distinction. It’s the difference between owning your workflow and renting it.

The reason this is non-obvious is that tool-specific features feel like productivity gains in the short term. Cursor’s inline editing is fast. ChatGPT’s interface is familiar. But those features are the UI layer, not the substance. The substance is your project structure, your prompts, your domain knowledge encoded in markdown. That’s what you should be investing in.

What the Evidence Actually Shows

Consider the tools that have been “graduated out” of active use over the past year: ChatGPT regular chat, Cursor, OpenClaw, Notebook LM, WhisperFlow. These weren’t bad tools. They were genuinely useful at the time. But they’ve been replaced — not because they failed, but because better alternatives emerged that fit into a more durable architecture.

Claude Code replaced Cursor. The reason isn’t that Claude Code has a better UI (it doesn’t, really — it runs in a terminal). The reason is that Claude Code operates directly on your filesystem, in your directory, with your context. There’s no proprietary project format. There’s no lock-in. You can run it in VS Code, in the CLI, in any IDE. If Claude Code disappeared tomorrow, everything you built with it — your directory structure, your .md files, your scripts — would still be there, ready for the next tool to pick up. For a deeper look at how Claude Code’s architecture actually works, the Claude Code source code leak analysis surfaced eight specific behaviors that explain why it’s structured this way.

Hermes Agent replaced OpenClaw for similar reasons. It wakes on demand via Telegram, runs instant crons, and — critically — it can work inside the same directory as Claude Code and Codex. You’re not maintaining three separate project contexts. You have one directory, and multiple agents can operate on it.

WhisperFlow got replaced by Glydo. This one is interesting because it’s a smaller, newer tool replacing an established one. Glydo is faster, more private, and Windows support is imminent. The switch happened because the new ceiling was higher — which brings up the 20% productivity dip rule.

The 20% Dip Rule and When to Ignore It

Every tool switch costs you roughly 20% efficiency in the short term. Muscle memory, workflow familiarity, knowing where things are — all of that resets. This is real and it’s worth taking seriously.


The question isn’t whether the dip happens. It’s whether the new ceiling justifies it. If you switch tools and eventually plateau at the same level you were at before, the dip wasn’t worth it. If you switch and break through to a higher level of output, it was.
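The trade-off above can be put into numbers. A minimal sketch, with all figures assumed for illustration (the article's 20% dip, a hypothetical two-week transition and 10% higher ceiling):

```python
# Illustrative arithmetic for the "20% dip" trade-off. All numbers are
# assumptions for the sake of the example, not measurements.

def weeks_to_break_even(dip=0.20, dip_weeks=2, ceiling_gain=0.10):
    """Weeks of post-dip work needed to recoup the output lost
    while learning the new tool.

    dip: fractional productivity lost during the transition
    dip_weeks: how long the transition lasts
    ceiling_gain: fractional productivity gained once ramped up
    """
    lost_output = dip * dip_weeks      # output sacrificed up front
    return lost_output / ceiling_gain  # weeks of surplus needed to repay it

# A 20% dip for 2 weeks, repaid by a 10% higher ceiling:
print(weeks_to_break_even())  # 4.0, i.e. the switch pays off after ~4 weeks
```

If the new ceiling equals the old one, `ceiling_gain` is zero and there is no break-even point: the dip is pure loss, which is the plateau case described below.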

This is why the decision framework matters. When a new tool or feature appears, the first question is: does this solve a pain point I have right now? Not a hypothetical future pain point. Not a pain point someone else has. A real, current bottleneck in your actual work.

If the answer is no, save the link. Don’t learn it. Don’t experiment with it. You can always come back when it becomes relevant.

If the answer is yes, test it in a real scenario with real data. Not mock data, not a toy project — something that actually matters to your work. Run it for a week. Then ask: did this solve the problem? Is the ceiling higher? If yes, keep it. If no, discard it and move on.

This framework sounds simple, but it’s surprisingly hard to follow when you’re surrounded by announcements and demos and people telling you that the latest thing is essential. The discipline is in the “save the link” step. Most new tools are not relevant to your current work. That’s fine. They don’t need to be.
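The framework above is simple enough to write down as a tiny sketch. The function and its parameters are invented for illustration; the structure mirrors the prose:

```python
# A minimal sketch of the keep / save-the-link / discard framework.
# Names here are invented for illustration, not taken from any tool.

def triage_new_tool(solves_current_pain: bool,
                    solved_after_trial: bool = False,
                    ceiling_is_higher: bool = False) -> str:
    if not solves_current_pain:
        return "save the link"  # don't learn it, don't experiment with it
    # Real pain point: run a one-week trial on real work, then judge.
    if solved_after_trial and ceiling_is_higher:
        return "keep"
    return "discard"

print(triage_new_tool(False))             # save the link
print(triage_new_tool(True, True, True))  # keep
print(triage_new_tool(True, True, False)) # discard: same plateau, dip wasted
```

Note that the default path is "save the link": a new tool has to clear two gates, a current pain point and a higher ceiling, before it earns a place in the stack.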

What Tool-Agnostic Actually Means in Practice

Tool-agnostic doesn’t mean tool-indifferent. You should have strong opinions about which tools you use for which tasks. The point is that your workflow shouldn’t break if any single tool disappears.

Think about it at the task level. A YouTube video production workflow might use Perplexity for research (good at web-sourced synthesis), Claude Code with a skill for structuring the script (because it has context about your style and format), GPT Image 2 for thumbnail creation (strong at generative image work), and Nano Banana 2 for post-processing effects — what you might call a Photoshop-style layer on top of the generated image. Each tool is chosen for a specific task. None of them is load-bearing for the others.

Fal.ai fits into this pattern as the routing layer for image and video generation inside agents — essentially what Open Router does for language models, but for image and video models. You swap the underlying model without changing the integration. Open Router itself belongs in the same category: it’s infrastructure for model-agnosticism, not a tool you’re dependent on.
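A sketch of why a routing layer decouples the workflow from the model: OpenRouter-style APIs accept an OpenAI-compatible payload in which the model is just a string, so swapping providers changes one field, not the integration. The model names below are examples, not recommendations:

```python
# Swapping the underlying model without touching the integration.

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload for a routing layer."""
    return {
        "model": model,  # the only field that changes between providers
        "messages": [{"role": "user", "content": prompt}],
    }

a = chat_request("anthropic/claude-sonnet-4", "Summarize this repo")
b = chat_request("openai/gpt-4o", "Summarize this repo")

# Identical shape, different model: the rest of the pipeline is untouched.
assert a["messages"] == b["messages"]
```

The same shape argument applies to Fal.ai for image and video models: the request format is the durable interface, and the model identifier is a swappable string.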

This is the architecture principle in action. The workflow is the durable thing. The tools are interchangeable at the task level.

For teams building more complex agent infrastructure, MindStudio’s approach extends this further — 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — so the orchestration layer doesn’t couple you to any single model provider either.

The Productivity Metric That Changes Everything

Productivity is needle moved per hour, not hours worked.

This sounds obvious but it has a specific implication for how you think about tool switching and learning. A 12-hour day spent watching demos, reading release notes, and experimenting with new tools is not a productive day by this definition. It might feel productive. It’s not.


The North Star question is: what are you actually trying to accomplish? If you’re building a product, the path to that goal is probably not “learn every new AI framework that drops this month.” Most of what gets announced is not on your critical path. Knowing about something is different from knowing how to use it, and you only need the latter when it’s directly relevant to your work.

This is where the “what won’t change” principle becomes operational. Your core competencies — your domain knowledge, your project structure, your understanding of how to decompose problems — these are the constants. Tools are the variables. Invest heavily in the constants. Be selective about the variables.

The WAT framework for workflows, agents, and tools is a good example of investing in constants: it’s a way of structuring Claude Code projects that will remain useful regardless of which specific models or tools you’re using underneath.

Building the Durable Directory

Concretely, what does a tool-agnostic directory look like?

You have a project directory. Inside it, you have a claude.md or agent.md file that encodes context about the project — what it is, how it’s structured, conventions to follow, domain knowledge that any agent needs to be useful. You have a skills/ or workflows/ directory with reusable procedures. You have your scripts and your data.

Any agent — Claude Code, Codex, Hermes, something that doesn’t exist yet — can read this directory and get up to speed. The context isn’t locked in a chat history or a proprietary memory system. It’s in files, in your repo, under your control.
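A minimal sketch of that layout, built with nothing but the standard library. The file names follow the article's conventions (claude.md, skills/); the contents are placeholders:

```python
# Scaffold a tool-agnostic project directory: plain files, no
# proprietary formats, readable by any agent.

from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp()) / "my-project"
(root / "skills").mkdir(parents=True)
(root / "scripts").mkdir()

(root / "claude.md").write_text(
    "# Project context\n"
    "What this project is, its conventions, and the domain knowledge\n"
    "any agent needs before touching the code.\n"
)
(root / "skills" / "publish.md").write_text(
    "# Skill: publish\nSteps any agent can follow to ship a release.\n"
)

# Nothing here is tool-specific: any agent that can read files can pick
# up this context, and the directory survives every tool swap.
print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
```

Everything in it is inspectable, diffable, and version-controllable, which is exactly what makes it durable.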

This is why the Andrej Karpathy LLM wiki approach to personal knowledge bases resonates: structured markdown that any model can query is more durable than any tool-specific memory feature. The knowledge outlives the tool.

The same principle applies when you’re thinking about spec-driven development. Tools like Remy take this further: you write your application as an annotated markdown spec — readable prose carrying intent, annotations carrying precision — and the full-stack TypeScript application gets compiled from it. The spec is the source of truth; the generated code is derived output. That’s the same philosophy: invest in the durable artifact (the spec), not the generated artifact (the code).

When you’re thinking about cost management within this architecture, using Open Router’s free model tier with Claude Code is a practical example of tool-agnosticism paying off — you can route to different models without restructuring your workflow.

The Lean Stack Principle

The final thing worth saying is that a lean stack is not a compromise. It’s a feature.

The S-tier daily drivers — Claude Code, VS Code as the IDE, Glydo for speech-to-text — are three tools. That’s the core. Everything else is a specialist that gets called for a specific task. Apify for scraping inside automations. HeyGen when you need avatars. ElevenLabs for voice cloning. These aren’t things you live in. They’re tools you reach for when the task requires them.

The instinct to add more tools is usually a sign that you’re optimizing for the wrong thing. More tools means more context switching, more maintenance, more surface area for things to break. The question is never “what’s the best tool in this category?” The question is “for this specific task in this specific context, what do I already have that can handle it?”


And if nothing in your current stack handles it, then you apply the decision framework: is this a real pain point right now? Test it. Evaluate after a week. Keep or discard.

The tools will keep changing. The directory outlasts them all.
