
My 2026 AI Builder Stack: S-Tier Daily Drivers, What I Retired, and the 20% Rule for Switching

Claude Code is the OS. Hermes replaced OpenClaw. Glido replaced Whisper. Here's the full ranked stack and the rule for when to switch tools.

MindStudio Team

Six Tools Make the Cut. Everything Else Is Noise.

Most AI builder stacks I see are either embarrassingly thin or hopelessly bloated. One builder I follow closely — Nate Herk — just published his full ranked stack for 2026, and the specificity is worth unpacking. The short version: S-tier daily drivers are Claude Code, VS Code, and Glido. A-tier weekly tools are Codeex, Claude Chat, Hermes Agent, Perplexity, and Grok. Graduated and retired: ChatGPT chat, OpenClaw, Cursor, NotebookLM, and Whisper Flow. That’s the whole picture. What’s interesting isn’t the list itself — it’s the reasoning behind each placement, and the framework for deciding when a tool earns a spot at all.

You don’t need to use every tool that drops. You need to know which ones actually move your needle.


S-Tier: The Three Tools That Run Every Day

Claude Code as an operating system

Claude Code isn’t described as a coding assistant here. It’s described as an operating system — the environment you live inside, not a tool you reach for occasionally. That framing matters.

The practical implication is that Claude Code becomes the hub through which other tools connect. Research from Perplexity feeds in. Scripts built with Codeex land in the same directory. Voice input from Glido drives prompts. The IDE is just VS Code with Claude Code running in the terminal or via the extension — nothing exotic, but the combination means you’re working inside a single coherent environment rather than context-switching between five browser tabs.

[Infographic: time spent building real software — about 5% typing the code, 95% knowing what to build, coordinating agents, debugging and integrating, and shipping to production. Caption: "Coding agents automate the 5%. Remy runs the 95%."]

The bottleneck was never typing the code. It was knowing what to build.

If you want to understand what Claude Code is actually capable of at a lower level, the Claude Code source code leak revealed 8 hidden features worth knowing about — including some behaviors that aren’t documented anywhere in the official docs.

One thing worth flagging: Claude Code’s effort level setting has a real cost impact. Running everything at max effort burns tokens fast. The Claude Code effort levels guide covers when to dial back to low or medium — which matters if you’re running this as your primary environment all day.

VS Code as the interface layer

This is a deliberate choice over the Claude Code desktop app or raw CLI. VS Code gives you file tree visibility on the left, Claude Code in the terminal below, and the option to use the Claude Code extension if you prefer that surface. It’s also what makes the stack portable — you could swap Claude Code for Codeex or another agent and the IDE stays the same.

The IDE choice is less important than the principle: pick one interface and stay in it. Switching between the desktop app, the CLI, and a browser-based tool for the same project is friction that compounds.

Glido replacing Whisper Flow entirely

This is the most interesting S-tier move. Glido is a speech-to-text startup — not a household name — and it has fully displaced Whisper Flow in this stack. The stated reasons: it’s faster, it’s private, and Windows support is imminent.

Speech-to-text as a daily driver for an AI builder might seem like an odd priority, but think about what it’s actually doing. If your primary interface is Claude Code and you’re spending hours in it, voice input for prompts is a genuine productivity multiplier. Dictating a complex instruction is often faster than typing it, especially for longer context-setting prompts.

The retirement of Whisper Flow here is clean and unambiguous — not “I use both” but “Glido replaced it.” That’s the kind of signal worth paying attention to.


A-Tier: Weekly Tools That Earn Their Place

Codeex as Claude Code’s complement

Codeex is positioned not as a competitor to Claude Code but as a complement. The framing is that Codeex has strengths where Claude Code has weaknesses, and vice versa. Both agents can work inside the same project directory — the same herk2 folder, the same claude.md or agents.md files — so switching between them for different tasks doesn’t require any project restructuring.

This is the tool-agnostic directory principle in practice. You’re not locked into one agent because your project isn’t built around one agent. It’s built around a directory.
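The directory-first principle fits in a few lines. This is a sketch, not anything from the original post beyond the two context files it names (claude.md and agents.md); the subdirectory names are illustrative:

```python
from pathlib import Path

def scaffold(root: str) -> Path:
    """Create a tool-agnostic project directory.

    Any agent (Claude Code, Codeex, ...) reads the same shared
    context files, so swapping agents never means restructuring.
    """
    project = Path(root)
    project.mkdir(parents=True, exist_ok=True)
    # Shared context that every agent reads on startup.
    (project / "claude.md").touch(exist_ok=True)
    (project / "agents.md").touch(exist_ok=True)
    # Work products live beside the context, not inside any one tool.
    (project / "scripts").mkdir(exist_ok=True)
    (project / "research").mkdir(exist_ok=True)
    return project
```

The point of the sketch is the invariant: the directory, not the agent, owns the project state.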

The comparison between frameworks that sit on top of Claude Code is worth understanding if you’re choosing between them — the GStack vs Superpowers vs Hermes comparison covers the tradeoffs in detail.

Claude Chat for quick access

Claude Chat stays in A-tier not because it’s better than Claude Code for chat, but because it’s faster to open when you just need a quick answer. This is an honest admission that sometimes the overhead of your primary environment isn’t worth it for a 30-second question.

[Promo card: "Other agents ship a demo. Remy ships an app" — React + Tailwind UI, typed REST API, a real SQL database, auth with roles/sessions/tokens, and git-backed deployment to a live URL.]

The lesson here isn’t “use Claude Chat more.” It’s that your stack should have a fast path for low-stakes queries so you don’t interrupt your primary workflow unnecessarily.

Hermes Agent: the OpenClaw replacement

Hermes Agent is the most significant A-tier addition, and its arrival explains the most surprising retirement: OpenClaw. Hermes runs through Telegram, wakes on demand when you message it, and supports instant cron jobs. The setup overhead is lower than building equivalent infrastructure in Claude Code.

The use case is specific: general knowledge work when you’re away from your desk, and lightweight automations that don’t need the full Claude Code infrastructure. It’s not replacing Claude Code for heavy work — it’s filling the gap where Claude Code is overkill.
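The "instant cron" idea can be sketched at its core. This is not Hermes's API (which isn't documented here) — just the scheduling check any lightweight automation needs: does a cron expression match the current time?

```python
from datetime import datetime

def _field_matches(field: str, value: int) -> bool:
    # Supports "*", single values, and comma lists — a small
    # subset of cron syntax, enough for lightweight automations.
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}

def cron_matches(expr: str, now: datetime) -> bool:
    """Check a 5-field cron expression (minute hour day month weekday).

    Note: weekday uses Python's convention (Mon=0), not classic
    cron's (Sun=0) — a real scheduler would translate between them.
    """
    minute, hour, day, month, weekday = expr.split()
    return (
        _field_matches(minute, now.minute)
        and _field_matches(hour, now.hour)
        and _field_matches(day, now.day)
        and _field_matches(month, now.month)
        and _field_matches(weekday, now.weekday())
    )
```

A loop that calls this once a minute and fires an agent prompt on a match is the whole "automation that doesn't need full Claude Code infrastructure."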

Platforms like MindStudio handle similar orchestration at a different layer: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — useful when you want the agent infrastructure without writing the plumbing yourself.

Perplexity for agent research

Perplexity shows up here specifically in the context of automations — having agents use Perplexity for research tasks rather than using it as a personal search tool. That’s a narrower use case than most people associate with Perplexity, and it explains why it’s A-tier rather than S-tier. It’s not a daily driver; it’s a reliable component in automated pipelines.

Grok for searching X

Grok’s placement is specific: searching through Twitter threads and finding particular posts or insights. Not general research, not coding help — just the thing it does better than anything else, which is searching X’s corpus. Using the right tool for a specific sub-task rather than forcing your primary tool to do everything is the pattern this whole stack is built on.


Specialist Tools: Reach for These When the Task Demands It

The specialist tier is where the stack gets interesting for builders who work across content, automation, and media.

Apify handles web scraping via actors — pre-built scraping modules you can call from automations or have Hermes Agent invoke. You don’t live in Apify; you reach for it when a pipeline needs structured data from the web.
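As a sketch, kicking off an Apify actor from an automation is one authenticated HTTP call: POST the actor's input as JSON to a run URL, then poll the run for results. The helper below only builds that URL — the actor ID and token are placeholders, and the exact endpoint shape should be checked against Apify's current API reference:

```python
from urllib.parse import quote, urlencode

def actor_run_url(actor_id: str, token: str) -> str:
    """Build the URL an automation would POST to in order to
    start an Apify actor run (assumed v2 API shape)."""
    # Actor IDs like "apify/web-scraper" contain a slash and
    # must be URL-escaped before going into the path.
    escaped = quote(actor_id, safe="")
    base = f"https://api.apify.com/v2/acts/{escaped}/runs"
    return f"{base}?{urlencode({'token': token})}"
```

An agent like Hermes could invoke this in a pipeline step and hand the structured results to the next stage.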

GPT Image 2 and Nano Banana 2 serve different image needs. GPT Image 2 is the generative/creator tool — thumbnails, concept images, illustrative content. Nano Banana 2 is the editing tool, described explicitly as a Photoshop replacement for adding effects, adjusting existing images, making things stand out. Having two image tools with distinct roles is more useful than trying to force one tool to do both jobs adequately.

Fal.ai is described as “the Open Router for image and video models” — a routing layer that gives you access to multiple image and video generation APIs through a single interface. If you’re building agents that need to generate images or video, Fal.ai means you’re not hard-coded to one provider. On the subject of routing, using Open Router’s free model tier with Claude Code is worth reading if token costs are a concern — you can route through free models for tasks that don’t need frontier-level capability.
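The routing principle behind both Fal.ai and Open Router fits in a few lines. The task labels and preference order here are hypothetical, not either service's actual catalog — the point is that agents ask for a capability, not a vendor:

```python
# Map each task type to an ordered preference list of providers;
# the first available one wins, so no agent is hard-coded to a vendor.
ROUTES = {
    "image": ["fal.ai", "openrouter"],
    "video": ["fal.ai"],
    "text":  ["openrouter"],
}

def pick_provider(task: str, available: set[str]) -> str:
    """Return the first preferred provider that is currently up."""
    for provider in ROUTES.get(task, []):
        if provider in available:
            return provider
    raise LookupError(f"no provider available for task {task!r}")
```

Swapping providers then means editing one table, not rewriting every agent.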

[Promo card: "Day one: idea. Day one: app" — not a sprint plan or a quarterly OKR, but a finished product delivered by end of day.]

Open Router itself earns a B-tier specialist spot for model routing on the text side. Same principle as Fal.ai: don’t hard-code your agents to one model provider.

HeyGen for avatars, ElevenLabs for voice cloning and voice agent builds — both narrow specialists used for specific production tasks, not general-purpose tools.

Claude Design is the newest addition to the specialist tier, adopted at the team level for landing pages and design systems. The value isn’t just that it generates designs — it’s that everyone on the team can build landing pages using the same design system, with shared context and the ability to comment on work. That’s a workflow problem solved, not just a capability added.

Speaking of going from design and spec to deployed application: Remy takes a different approach to the build step — you write an annotated markdown spec, and it compiles a complete full-stack TypeScript application from it, including backend, database with auto-migrations, auth, and deployment. The spec is the source of truth; the generated code is derived output.


C-Tier and Experimenting: What’s Being Watched

Gemini is in the experimenting tier with an honest assessment: it’s rarely used, and Nano Banana 2 gets reached for more often than Gemini 2.5 Pro. That’s not a knock on Gemini’s capabilities — it’s a statement about where it fits in this particular workflow.

Ollama is used for downloading and experimenting with open-source models, not for production workloads. If you want to run Claude Code against local models via Ollama, the guide to running Claude Code free with Ollama and Open Router covers the setup. It’s a useful experimentation path even if you don’t end up running local models in production.

Manus is acknowledged as a capable tool — potentially an S-tier daily driver for someone new to AI building — but it doesn’t displace Claude Code for someone already deep in that workflow.


The Graduated List: What Got Retired and Why

The retired tools are worth examining because the reasons vary.

Whisper Flow was replaced by a better tool (Glido). Clean substitution.

OpenClaw was replaced by Hermes Agent, which does the same job with easier setup and better mobile access via Telegram.

Cursor graduated in favor of VS Code with Claude Code. Both are IDEs for AI-assisted coding; one won.

ChatGPT chat graduated as Claude Code absorbed the chat use case.

NotebookLM graduated not because it’s bad but because the functionality got replicated inside Claude Code with more customization and lower cost. This is a pattern worth watching: as Claude Code becomes more capable, tools that were previously necessary for specific tasks become redundant.

Poppy AI is the same story — the functionality is now achievable inside Claude Code, more customized and at lower cost.

The graduation category is doing something important: it’s distinguishing between “this tool is bad” and “I’ve moved past this tool.” NotebookLM is a good tool. It just doesn’t fit this stack anymore.


The Decision Framework: When to Actually Switch

[Promo card: "Remy doesn't write the code. It manages the agents who do" — Remy acts as product-manager agent, running the project while specialist agents handle design, engineering, QA, and deploy; you work with the PM, not the implementers.]

The 20% productivity dip rule is the most practically useful piece of this whole stack discussion. Every tool switch comes with a productivity dip — you’re learning new interfaces, new mental models, new failure modes. The question isn’t whether the dip happens; it’s whether the ceiling after the dip is higher than your current trajectory.

If switching tools gets you back to exactly where you were, the dip wasn’t worth it. If it breaks through a plateau you couldn’t otherwise clear, it was.

The decision framework that follows from this is simple enough to actually use:

  1. New tool drops. Does it solve a current pain point? If no, save the link and move on.
  2. If yes, test it in a real scenario — not mock data, not a toy project — for one week.
  3. After the week: did it solve the pain point? Keep it. If not, discard it.

The “real scenario” requirement is important. Testing a scraping tool on a fake dataset tells you nothing. Testing it on the actual data your automation needs to process tells you whether it works.

There’s also a meta-principle underneath all of this: build your directories like they’ll outlive any tool, because they will. The herk2 project directory has had OpenClaw, Hermes, Codeex, and Claude Code all working inside it. The agents change; the directory persists. If Claude Code disappeared tomorrow, you’d pick up Codeex and keep working. That’s the goal.

Productivity, in this framing, is needle moved per hour — not hours worked. A four-hour day where you ship something that matters beats a twelve-hour day of reading threads and watching demos. The stack exists to serve that metric, not to be impressive on a tier list.

The Jeff Bezos principle applied here: think about what will never change, not what will change. The tools will change. The directory structure, the project context, the skill files — those are the durable layer. Build there.

Presented by MindStudio
