Claude Code Context Mode Compresses 315KB Sessions to 5KB — Here's How to Install It
Context Mode routes tool calls through a sandbox and shrinks a 56KB Playwright snapshot to 299 bytes. Two commands to install.
Every tool call you make in Claude Code dumps raw data into your context window. A Playwright snapshot: 56KB. Twenty GitHub issues: 59KB. An access log from a thirty-minute session: 46KB. You do this for an hour and you’ve burned through a meaningful fraction of your context window on output that Claude never needed to reason over — it just needed the signal inside it.
Context Mode fixes this at the source. The numbers from its published benchmarks: a 56KB Playwright snapshot compresses to 299 bytes. A 46KB access log becomes 155 bytes. Over a full session, 315KB of raw tool output becomes 5KB total. That’s not a rounding error — that’s a different architecture for how tool calls work.
This post is about what Context Mode actually does, how to install it in two commands, and what to watch for when it breaks.
What’s Actually Happening Under the Hood
The naive approach to tool calls is: run the command, dump the output into context, let the model figure out what matters. This works fine for short sessions. It falls apart around the 30-minute mark when you’ve accumulated enough raw output that compaction kicks in — and compaction is lossy. Claude forgets which files it was editing, what tasks were in progress, what you last asked it to do.
Context Mode intercepts tool calls before they hit your context window. When Claude runs a command or fetches a URL, Context Mode routes that call through a sandbox — an isolated sub-process. The raw output gets captured there. Then only the semantically compressed version comes back into the context window. The 56KB Playwright snapshot doesn’t disappear; it gets processed and the 299-byte summary of what Claude actually needs is what lands in context.
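The intercept-and-compress pattern is easy to sketch in miniature. This is a hypothetical illustration, not Context Mode’s actual code: the function names are invented, and the `compress` heuristic here is a crude line-truncation stand-in for whatever semantic compression the plugin really does.

```python
import subprocess

def run_sandboxed(cmd: list[str]) -> str:
    """Run the tool call in an isolated sub-process and capture raw output there."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return result.stdout

def compress(raw: str, max_lines: int = 5) -> str:
    """Stand-in for semantic compression: keep only a few signal-bearing lines.
    The real plugin does far more than this truncation heuristic."""
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    if len(lines) <= max_lines:
        return raw
    return "\n".join(lines[:max_lines]) + f"\n… ({len(lines) - max_lines} lines elided)"

def tool_call(cmd: list[str]) -> str:
    raw = run_sandboxed(cmd)  # raw output stays in the sandbox,
    return compress(raw)      # only the compressed version enters context
```

The key property is that the raw output is captured in a process the model never sees, so the context window only ever pays for the compressed result.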
This is the first half of the problem. The second half is what happens after compaction.
Context Mode maintains a local SQLite database that tracks every meaningful event in your session: file edits, task creation, decisions, errors. When Claude compacts the conversation, it doesn’t lose this. Context Mode rebuilds a session snapshot from the database and injects it back in. The model picks up where it left off — files, tasks, last prompt — instead of starting from a degraded reconstruction.
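The event-log-plus-snapshot idea can be sketched with a few lines of SQLite. Again, this is a hypothetical illustration of the mechanism described above; Context Mode’s real schema and snapshot format are not documented here.

```python
import sqlite3
import time

# A minimal session event log: every meaningful event gets a row.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (
    ts     REAL,
    kind   TEXT,   -- e.g. 'file_edit', 'task', 'decision', 'error'
    detail TEXT
)""")

def record(kind: str, detail: str) -> None:
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", (time.time(), kind, detail))

def snapshot() -> str:
    """Rebuild a compact session snapshot to re-inject after compaction."""
    rows = conn.execute("SELECT kind, detail FROM events ORDER BY ts").fetchall()
    return "\n".join(f"[{kind}] {detail}" for kind, detail in rows)

record("file_edit", "src/app.py")
record("task", "add retry logic to fetcher")
print(snapshot())
```

Because the log lives outside the conversation, compaction can throw away as much transcript as it likes; the snapshot is rebuilt from durable state, not from whatever survived compaction.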
The practical result: sessions that used to fall apart around 30 minutes now run for three hours. You stop spending prompts re-establishing context that Claude already had.
If you’ve been managing this manually — running /compact at 60% capacity, carefully choosing what to include — you know the discipline required. Context Mode automates the discipline. For more on the manual approach, the post on using the /compact command to prevent context rot is worth reading alongside this one.
What You Need Before Installing
Claude Code version. Context Mode installs as a plugin via the Claude Code plugin system. Any recent version should work. If you’re also planning to use /ultra review, you’ll need v2.1.86 or later specifically for that — but Context Mode itself doesn’t carry that constraint.
A working Claude Code setup. You need Claude Code running locally with plugin installation enabled. If you’ve never installed a plugin before, the process is the same as installing any other skill — you run the install command in your terminal, not inside a Claude Code session.
Node/npm. The Context Mode plugin installs an MCP server under the hood. This requires npm to be available. Run npm --version to confirm.
One thing to know about ClaudeMem. If you’re also looking at ClaudeMem (the cross-session memory plugin that uses SQLite + vector search for 3-layer retrieval), there’s a documented footgun: running npm install for ClaudeMem installs the SDK library only — the hooks never register and nothing works. Stick with the plugin marketplace commands. Context Mode doesn’t have this issue, but worth knowing if you’re setting up both.
Installing Context Mode: Two Commands and a Restart
Step 1: Run the install commands
Open your terminal and run the two install commands for Context Mode. The plugin auto-installs the MCP server, the hooks, and the routing instructions. You don’t manually configure any of this.
/plugin install context-mode
The second command registers the MCP server so Context Mode can intercept tool calls. Both commands need to complete before you restart.
After running both commands, you should see confirmation that the plugin installed and the MCP server registered. Now you have the plugin installed but not yet active.
Step 2: Restart Claude Code
This is the step people skip. The hooks don’t take effect until Claude Code restarts. Quit and reopen — not just a new session, a full restart of the application.
After restart, Context Mode is active. Tool calls are now being routed through the sandbox. Now you have a running Context Mode installation.
Step 3: Verify it’s working
Run /contextmode:ctx-stats inside Claude Code. This shows you the compression stats for your current session — how much raw output has been intercepted and what the compressed size is. At session start, the numbers will be small. After 20-30 minutes of real work involving Playwright, file reads, or command execution, you’ll see the delta.
If /contextmode:ctx-stats returns nothing or errors, the MCP server didn’t register correctly. See troubleshooting below.
Now you have a verified working installation with live compression stats.
Step 4: Run a Playwright task to see the compression
If you want to see the 56KB → 299 bytes compression directly, run any task that involves Playwright — a browser automation, a screenshot, a DOM snapshot. Check /contextmode:ctx-stats before and after. The raw Playwright output that would have landed in your context window is now a 299-byte summary.
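The back-of-envelope ratios are worth computing once, so you know what a healthy delta looks like in the stats output. The figures below come from the post; KB is taken as 1024 bytes, which the post doesn’t specify.

```python
# Back-of-envelope check of the published compression ratios.
raw_snapshot = 56 * 1024   # 56KB Playwright snapshot, in bytes
compressed   = 299         # bytes after Context Mode
print(f"snapshot ratio: {raw_snapshot / compressed:.0f}x")   # ~192x

raw_session   = 315 * 1024  # full session of raw tool output
compressed_ss = 5 * 1024    # compressed session total
print(f"session ratio: {raw_session / compressed_ss:.0f}x")  # 63x
```

If your own session shows ratios anywhere near this range on Playwright-heavy work, the interception is doing its job; ratios near 1x on those tools suggest the calls aren’t being routed through the sandbox.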
For anyone building browser automation workflows, this pairs well with the browser automation with Claude Code and Playwright setup — Context Mode makes those sessions dramatically more sustainable over long runs.
Now you have empirical confirmation of the compression ratio on your own machine.
Where This Fits in a Larger Workflow
Context Mode solves the garbage-in problem. It doesn’t solve the planning problem or the code quality problem.
The workflow that makes sense: use the Superpowers skill (150,000+ GitHub stars, forces plan-first → isolated environment → test-before-code) for the planning and quality discipline. Use GSD (Get Shit Done) for context engineering on complex multi-day projects — it spawns fresh sub-agents per task so each one has a clean context window, with a plan → execute → verify phase structure. Then Context Mode handles the raw output compression that would otherwise degrade both of those.
These aren’t redundant. Superpowers slows Claude down to think the problem through. GSD gives each task a clean context window. Context Mode keeps that window clean by not letting raw tool output fill it up. They address different failure modes.
For cross-session memory — carrying knowledge from one session to the next — Context Mode doesn’t help. That’s ClaudeMem’s job: SQLite + vector search, 3-layer retrieval (compact index → timeline → full details), auto-generated folder-level claude.md files, reported ~10x token savings on retrieval versus dumping all past context at session start. Context Mode handles within-session compression; ClaudeMem handles between-session recall. You want both.
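The 3-layer retrieval idea can be illustrated with a toy sketch: answer from the cheapest layer that satisfies the query, and escalate only when more detail is needed. This is a hypothetical illustration of the pattern described above, not ClaudeMem’s actual data structures.

```python
# Three layers, from cheapest to most expensive to inject into context.
index    = {"auth-refactor": "session 2025-01-12, files: auth.py, jwt.py"}
timeline = {"auth-refactor": ["created task", "edited auth.py", "tests passed"]}
details  = {"auth-refactor": "full transcripts and diffs would live here"}

def retrieve(key: str, depth: str = "index"):
    """Look a key up at the requested layer; callers start at 'index'
    and only escalate to 'timeline' or 'details' when the cheap answer
    isn't enough."""
    layers = {"index": index, "timeline": timeline, "details": details}
    return layers[depth].get(key)

print(retrieve("auth-refactor"))              # one-line index entry
print(retrieve("auth-refactor", "timeline"))  # event list, only if needed
```

The token savings come from the default: most lookups stop at the index layer instead of dumping full past context at session start.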
The 18 Claude Code token management techniques post covers the broader landscape of session optimization — Context Mode is one piece of that, specifically the piece that handles tool output bloat.
Real Failure Modes
The MCP server didn’t register. Symptom: /contextmode:ctx-stats errors or returns nothing. Fix: check that both install commands completed without errors, then do a full restart (not just a new session). If it still fails, check whether another MCP server is conflicting — Context Mode needs to intercept tool calls before they reach Claude, and a conflicting MCP configuration can break this.
Compression isn’t happening on certain tool types. Context Mode routes tool calls through its sandbox, but not every tool call type is intercepted by default. If you’re seeing full-size outputs for a specific tool, check the plugin documentation for which tool types are covered. Playwright and command execution are the primary targets; some custom MCP tools may not be intercepted.
Session snapshot injection is slow. After compaction, Context Mode rebuilds the session snapshot from its SQLite database and injects it. On very long sessions with many events, this injection can take a few seconds. This is expected behavior — the database is doing real work. It’s still faster than manually re-establishing context.
The sandbox is blocking something it shouldn’t. Context Mode runs tool calls in an isolated sub-process. Occasionally this isolation breaks something that depends on the main process environment — environment variables, file system state, etc. If a command that worked before Context Mode now fails, try running it with Context Mode’s passthrough mode to confirm the sandbox is the issue.
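The environment-variable version of this failure mode is easy to reproduce directly, independent of Context Mode. The sketch below launches one sub-process that inherits the parent environment and one with a scrubbed environment, mimicking what a sandbox can do:

```python
import os
import subprocess
import sys

# A variable the parent process sets and a command depends on.
os.environ["MY_TOKEN"] = "secret"

check = [sys.executable, "-c", "import os; print('MY_TOKEN' in os.environ)"]

# Inherited environment: the sub-process sees the variable.
in_parent = subprocess.run(check, capture_output=True, text=True).stdout.strip()

# Scrubbed environment (env={}): the same command no longer sees it.
in_sandbox = subprocess.run(check, capture_output=True, text=True,
                            env={}).stdout.strip()

print(in_parent, in_sandbox)
```

If a command works with the inherited environment but fails under isolation, the sandbox is the culprit, which is exactly what passthrough mode lets you confirm without uninstalling the plugin.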
You’re on an API key, not a Claude account. This doesn’t affect Context Mode directly, but if you’re also trying to use /ultra review (which launched alongside Opus 4.7 and requires Claude Code v2.1.86+), an API key alone won’t work — you need to be signed in with a Claude account. Worth knowing if you’re setting up multiple tools at once.
Where to Take This Further
The compression numbers are the headline, but the session snapshot injection is the part I find more interesting. Compression saves tokens. The snapshot injection changes the failure mode of long sessions — instead of gradual degradation that you have to catch and manage, you get a hard reset that preserves state. That’s a different kind of reliability.
If you’re building agents that run for hours — research pipelines, multi-step code generation, anything with significant tool use — the combination of Context Mode (within-session compression + snapshot injection) and ClaudeMem (cross-session memory with semantic retrieval) gives you a much more stable foundation than either alone.
The memory architecture question is worth thinking through carefully. The self-evolving Claude Code memory system with Obsidian and hooks post covers a complementary approach — Context Mode sits at the session layer handling compression, while hook-based memory systems operate at the persistence layer. Understanding where each one operates helps you pick the right tool for the right problem.
For teams building more complex agent infrastructure — multi-client architectures, scheduled skill chains, orchestration across multiple models — MindStudio handles the orchestration layer with 200+ models and 1,000+ integrations, which is a different approach to the same underlying problem of keeping agents well-contextualized across complex workflows.
One thing worth flagging for anyone building production applications on top of this kind of agentic infrastructure: the spec-driven approach to app generation is maturing fast. Remy takes annotated markdown specs and compiles them into complete TypeScript stacks — backend, SQLite database with auto-migrations, auth, deployment. The spec is the source of truth; the generated code is derived output. If you’re building the application layer that sits above your Claude Code workflows, that abstraction is worth knowing about.
The core insight from Context Mode is simple: the model doesn’t need raw tool output, it needs the signal inside it. Routing through a sandbox and compressing before injection is the right architecture. The 315KB → 5KB number is the proof that it works.