
5 Claude Code Skills That Cut Token Costs by Up to 70% — Benchmarked Across Real Sessions

Superpowers saves 14% of tokens. Graphify cuts costs by up to 70x on large codebases. Firecrawl trims token usage by 80% versus raw HTML. Five skills, benchmarked with real data.

MindStudio Team

A Benchmark Nobody Asked For — But Everyone Needed

Someone ran 12 automated Claude Code sessions — six with a plugin called Superpowers installed, six without — using identical prompts and the same model. The result: 9% cheaper runs, 14% fewer tokens consumed, and measurably better output quality on anything that wasn’t trivially simple. That’s not a marketing claim. That’s a controlled test, and the numbers are specific enough to be useful.

If you’ve been using Claude Code without any plugins installed, you’re essentially running the engine without the transmission. The model is capable. The scaffolding is missing.

This post is about the Superpowers plugin specifically — what it does, why the benchmark numbers make sense, and how it fits into a broader set of tools that can meaningfully reduce what you spend on tokens. The other tools (Graphify, Firecrawl) are covered in their own posts. This one is about the plugin that produced the most surprising benchmark data.

Why Claude Code Wastes Tokens by Default

Here’s the thing most people don’t realize when they start using Claude Code: the model’s default behavior is to start doing. You type a prompt, it begins writing code. No clarifying questions, no plan, no verification pass at the end.

That sounds efficient. It isn’t.


When Claude misunderstands what you wanted — and it will, especially on anything with ambiguity — you spend tokens fixing the wrong thing. You re-prompt. You correct. You re-run. The total token cost of a misunderstood task is almost always higher than the cost of a few clarifying questions upfront.

This is the core problem Superpowers solves. It installs 14 skills into Claude Code and forces the model through a five-phase process on every task: clarify, design, plan, code, verify. Before Claude writes a single line, it stops and asks the right questions. It maps out the plan with exact file paths and task breakdowns. At the end, it verifies that what it built actually works.
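As a toy illustration (not Superpowers' actual implementation), the phase gating can be thought of as a small state machine that refuses to run a phase out of order:

```python
from dataclasses import dataclass, field

# The five phases Superpowers enforces, in order.
PHASES = ["clarify", "design", "plan", "code", "verify"]

@dataclass
class Task:
    prompt: str
    completed: list = field(default_factory=list)

    def run_phase(self, phase: str) -> None:
        # Enforce ordering: a phase may only run after all earlier ones.
        expected = PHASES[len(self.completed)]
        if phase != expected:
            raise ValueError(f"expected phase '{expected}', got '{phase}'")
        self.completed.append(phase)

task = Task("Add OAuth login")
for phase in PHASES:
    task.run_phase(phase)
print(task.completed)  # → ['clarify', 'design', 'plan', 'code', 'verify']
```

The point of the gate is the failure case: calling `run_phase("code")` on a fresh task raises immediately, which is the programmatic equivalent of "stop and ask questions before writing code."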

The 14% token reduction in the benchmark isn’t magic. It’s the difference between doing a task once correctly versus doing it twice because the first attempt was off-target.

If you want to understand the broader context of token management in Claude Code, the 18 Claude Code token management hacks post covers the full landscape — but Superpowers addresses the most expensive category: rework caused by misalignment.

The Benchmark, Unpacked

Twelve automated sessions. Six with Superpowers, six without. Same prompts, same model, measured across real tasks.

The headline numbers: 9% cheaper, 14% fewer tokens, better output quality on complex tasks. The quality improvement is harder to quantify precisely, but the token savings are concrete — and they compound. If you’re running Claude Code daily, a 14% reduction in token consumption is meaningful at the end of a month.
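Some back-of-envelope arithmetic shows how that compounds. Every figure below is an illustrative assumption, not benchmark data:

```python
# What a 14% token reduction means over a working month.
daily_tokens = 2_000_000     # assumed daily Claude Code usage
cost_per_million = 15.0      # assumed blended $/M tokens
days = 22                    # working days in a month

baseline = daily_tokens * days * cost_per_million / 1_000_000
with_plugin = baseline * (1 - 0.14)
print(f"baseline: ${baseline:,.0f}/mo, with plugin: ${with_plugin:,.0f}/mo, "
      f"saved: ${baseline - with_plugin:,.0f}/mo")
```

Under these assumptions the monthly spend drops from $660 to roughly $568; the exact dollar figures depend entirely on your usage and pricing, but the 14% slice scales linearly with both.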

What explains the gap? A few things.

First, the clarification phase catches misunderstandings before they become expensive. When Claude asks a question instead of guessing, the answer costs almost nothing. When Claude guesses wrong and builds the wrong thing, the correction costs a lot.

Second, the planning phase with explicit file paths and task breakdowns means Claude isn’t re-reading your entire codebase to figure out where to make changes. It already knows. That’s a direct token saving on every subsequent step.

Third, the verification phase catches errors before you catch them. A bug you find after Claude finishes is a bug you fix in a new session, with new context loading. A bug Claude catches during verification is fixed in the same session, cheaply.

The five-phase structure isn’t bureaucracy. It’s token efficiency disguised as process.
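A rough expected-cost model makes the clarify-versus-guess tradeoff concrete. All of these numbers are illustrative assumptions, not measurements from the benchmark:

```python
# Expected token cost of guessing first vs. clarifying first.
task_tokens = 10_000     # tokens to complete the task once (assumed)
clarify_tokens = 300     # cost of a clarifying exchange (assumed)
p_wrong_guess = 0.35     # assumed chance a guess misreads intent
rework_factor = 0.8      # fraction of the task redone after a bad guess

guess_first = task_tokens + p_wrong_guess * rework_factor * task_tokens
clarify_first = clarify_tokens + task_tokens
print(f"guess-first: {guess_first:,.0f} tokens, "
      f"clarify-first: {clarify_first:,.0f} tokens")
```

With these assumptions, guessing costs an expected 12,800 tokens against 10,300 for clarifying first; the clarification only loses when the task is so unambiguous that the chance of a wrong guess is near zero, which is exactly the "fast iterative work" carve-out discussed below.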

Installing It (It Takes One Conversation)

The installation is genuinely simple. Open Claude Code, type plugin in the chat, hit enter. That opens the Claude Code plugin marketplace. Grab the GitHub link for Superpowers from the project’s description, paste it into the marketplace search, add it, find it in your plugin section, enable it.

One conversation. Done.

The important caveat: don’t use Superpowers for fast iterative design work. If you want to change a button color or tweak a headline, the five-phase process is overkill — it’ll slow you down without meaningful benefit. Superpowers earns its keep on new features, new projects, anything with real ambiguity about what “done” looks like.

The Visual Brainstorming Feature Nobody Talks About

The benchmark numbers get most of the attention, but there’s a feature inside Superpowers that might actually be more valuable for preventing token waste: the visual brainstorming companion.


When you ask Claude to design something, Superpowers spins up a local web dashboard showing you mockups and layout options before Claude commits to building anything. You pick what looks right. Claude builds from there.

This matters because the most common source of wasted tokens in UI work isn’t bad code — it’s correct code for the wrong design. You describe what you want, Claude builds it, you look at it and realize that’s not what you meant. You re-prompt. Claude rebuilds. You’ve now spent two to three times the tokens you needed to.

Seeing mockups before a single line of code gets written eliminates that entire failure mode. It’s the clarification phase applied to visual work, and it’s the kind of thing that’s hard to benchmark but easy to feel in your workflow.

Where This Fits in the Broader Token-Reduction Stack

Superpowers is one piece of a larger set of tools that address different token-waste problems.

Graphify — inspired by Andrej Karpathy’s work on knowledge graphs — tackles a different problem: the cost of navigating large codebases. Every new Claude Code session has to re-read your files to understand what connects to what. On a small project, that’s fine. On a project with hundreds of files, Claude burns a significant number of tokens just orienting itself before it can help you. Graphify pre-maps all the file relationships into a queryable knowledge graph, so Claude can navigate directly to the relevant parts instead of reading everything from scratch. The reported result: up to 70x cheaper on large codebases — that post covers it in detail if you’re working with 500+ file projects.
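The idea is easy to sketch. The toy dependency graph below is a hypothetical project, and the dict-plus-BFS representation is an illustration of the concept, not Graphify's actual storage format:

```python
from collections import deque

# Toy file-dependency graph for a hypothetical project.
graph = {
    "api/routes.py":      ["services/auth.py", "services/orders.py"],
    "services/auth.py":   ["db/users.py"],
    "services/orders.py": ["db/orders.py", "db/users.py"],
    "db/users.py":        [],
    "db/orders.py":       [],
}

def related_files(start: str, depth: int = 2) -> set:
    """Everything reachable from `start` within `depth` dependency hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

# Instead of re-reading every file, query only what touches auth:
print(sorted(related_files("services/auth.py")))
# → ['db/users.py', 'services/auth.py']
```

A pre-built graph like this turns "read 500 files to orient yourself" into "answer one reachability query," which is where the token savings come from.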

Firecrawl addresses web scraping specifically. Raw HTML is noisy — cookie banners, ads, JavaScript artifacts, navigation elements. When Claude reads a raw webpage, it processes a lot of garbage before getting to the actual content. Firecrawl converts any URL into clean structured data, which translates to up to 80% token reduction compared to feeding raw HTML directly. The Firecrawl MCP setup post covers the connector configuration in detail.
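You can see why clean extraction saves tokens with nothing but the standard library. This sketch is a crude stand-in for what Firecrawl does, not its API, and the four-characters-per-token ratio is only a rough rule of thumb:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Strip scripts, styles, and nav chrome; keep visible text."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "nav"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style", "nav"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

raw = """<html><head><style>.ad{color:red}</style></head>
<body><nav>Home | Pricing | Login</nav>
<script>trackUser();</script>
<article>Only the article text survives extraction.</article></body></html>"""

parser = TextExtractor()
parser.feed(raw)
clean = " ".join(parser.parts)
# Rough proxy: ~4 characters per token.
print(len(raw) // 4, "raw-HTML tokens →", len(clean) // 4, "clean tokens")
```

Even on this tiny page, the cookie-banner-and-script overhead dominates; on a real page with megabytes of markup, the ratio is far more lopsided, which is what the 80% figure reflects.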

Then there’s the Awesome Design library — a GitHub collection of 68 complete design systems reverse-engineered from brands like Apple, Lamborghini, and Claude itself. Each system includes typography, color palettes, spacing rules, and component styles. You tell Claude which one to use, and it builds your site in that aesthetic. This doesn’t directly reduce tokens, but it eliminates the back-and-forth design iteration that burns them — you get a professional-looking result on the first pass instead of the fifth.

For teams building more complex agent workflows, platforms like MindStudio handle the orchestration layer: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — which means you can apply token-efficiency thinking at the system level, not just the individual session level.

The Audit and Level-Up Skills

Beyond the five-phase process, Superpowers installs two skills worth knowing about: audit and level-up.


The audit skill grades your AI operating system on four dimensions — context, connections, capabilities, and cadence — and returns a score out of 100. It’s a structured way to find gaps in your setup. If you’ve been using Claude Code for a while and feel like you’re not getting as much out of it as you should, running the audit will usually surface something specific.

The level-up skill asks five questions: What’s the most tedious thing you do repeatedly? What could a smart intern handle if you had time to explain it? What’s your biggest constraint? What would give you the most leverage if it ran automatically? Walk me through your past week. Answer those five questions honestly and you’ll almost always identify at least one automation opportunity you hadn’t thought to build.

These aren’t flashy features. They’re the kind of structured reflection that most people skip — and that’s exactly why they’re valuable.

The Notebook LM Integration

One more skill in the Superpowers ecosystem that deserves mention: the Notebook LM integration.

Google’s Notebook LM is a research tool that takes sources — PDFs, YouTube videos, web articles — and generates slide decks, podcast-style audio summaries, and mind maps from them. Most people use it manually, adding sources one by one. The Superpowers integration lets you automate the entire process with a single prompt.

You tell Claude to research a topic, load 20 sources from YouTube and the web, and generate a notebook with a slide deck and a podcast overview. Claude finds the sources, loads them into Notebook LM, and generates the assets — without you opening Notebook LM once. Notebook LM’s free tier is unlimited, so the only cost is the Claude tokens for the orchestration.

This is the kind of workflow that sounds like a small convenience but compounds significantly if you do research regularly.

The Real Argument for Structured Plugins

There’s a deeper point underneath all of this. Claude Code without plugins is a capable model with no scaffolding. It will do what you ask, but it won’t ask whether what you asked is actually what you want. It won’t plan before it acts. It won’t verify after it finishes.

The Superpowers benchmark — 9% cheaper, 14% fewer tokens, better quality — is evidence that structure isn’t overhead. Structure is efficiency. The five-phase process doesn’t slow Claude down; it prevents Claude from doing the wrong thing at full speed.

This is also why the “just use Claude Code” advice misses something important. The model’s raw capability is high. But raw capability applied without structure produces inconsistent results and unnecessary token spend. The plugins in this ecosystem — Superpowers, Graphify, Firecrawl, Awesome Design — are all solving the same underlying problem from different angles: how do you get Claude to do the right thing the first time?

If you’re thinking about this at the application layer — building tools that need to generate code or compile from structured specifications — Remy takes a related approach: you write your application as an annotated markdown spec, and it compiles into a complete TypeScript backend, SQLite database, auth, and deployment. The spec is the source of truth; the code is derived output. It’s a different abstraction level, but the same underlying logic: structure upstream saves rework downstream.


The Claude Code effort levels post covers another dimension of this — how the model’s reasoning depth affects output quality and token cost, and when to dial each setting up or down. And if you’re hitting session limits before you’ve finished a task, the Opus plan mode post covers a specific technique for extending sessions by separating planning from execution.

What the Numbers Actually Tell You

The 12-session benchmark is small. Anyone who’s run A/B tests knows that 12 sessions isn’t a large sample. But the direction of the results is consistent with the mechanism — and the mechanism makes sense.

Fewer misunderstandings → fewer re-prompts → fewer tokens. Explicit planning → less context re-loading → fewer tokens. Verification during the session → fewer bugs caught after → fewer tokens in follow-up sessions.

The 9% cost reduction and 14% token reduction aren’t the ceiling. They’re the floor — the savings you get from installing the plugin and doing nothing else. As you learn which tasks benefit most from the five-phase process and which don’t (fast iterative work, as noted, isn’t a good fit), the effective savings go up.

The more interesting number might be the quality improvement on complex tasks. Token cost is easy to measure. The cost of a Claude session that produces something you can’t actually use is harder to quantify — but it’s real, and it’s where the biggest savings live.

Fourteen skills. Five phases. One controlled benchmark. The case for structured plugins in Claude Code is no longer theoretical.

Presented by MindStudio
