
School CLI Built in 10 Minutes Compresses 132K Tokens to 2K: How Printing Press Solves Context Bloat

A School CLI built by Claude Code in 10 minutes fetched 132K tokens of data but injected only 2K into context — a 66x compression. Here's how it works.

MindStudio Team

A School CLI Fetched 132,000 Tokens and Injected Only 2,000 Into Context

Claude Code built a School CLI in 10 minutes. When asked to pull 10 recent posts from a School community, the CLI fetched 132,000 tokens worth of data from School’s servers — and injected exactly 2,000 tokens into the Claude context window. That’s a 66x compression ratio, on a site with no public API, built by an agent that had never seen the site before.

That number deserves a second look. Not because it’s impressive in the abstract, but because it points at something structural about how agents consume information that most people building with Claude Code haven’t fully internalized yet.

The compression didn’t happen by accident. It happened because of how the CLI was architected — and understanding that architecture changes how you think about every tool your agents touch.


The Number That Should Bother You

Here’s the thing about 132,000 tokens: that’s not a pathological case. That’s what you get when you ask a reasonable question to a modern web community platform. Ten posts, with comments, metadata, author info, timestamps, nested replies. The raw payload is large because the web is large.

The question is what happens to those 132,000 tokens next.


In a naive agent setup — say, an MCP server that hits a REST endpoint and returns JSON — all of that lands in your context window. Or close to it. You pay for it in tokens, you pay for it in latency, and you pay for it in reliability. The agent now has to reason over a massive blob of semi-structured data to find the three things you actually wanted.

The School CLI doesn’t do that. It fetches the full payload, processes it locally, and returns ~2,000 tokens of clean, pre-formatted text. The agent gets the answer, not the raw material to find the answer.

This is the distinction that matters. And it’s not unique to School — it’s a design principle that applies to every data source your agents touch.

If you’ve been thinking carefully about token management in Claude Code sessions, you already know that context bloat is one of the primary ways sessions degrade. What Printing Press is doing is attacking that problem at the source — before the data ever enters the context window.


Why This Is Non-Obvious (And Why Most People Miss It)

When MCP servers appeared, the mental model most people adopted was: “this is how agents talk to tools.” It was a reasonable assumption. MCP solved a real problem — tool discovery. You could point an agent at an MCP server and it would figure out what was available, what parameters each tool took, how to invoke them. That’s genuinely useful.

But MCP’s solution to discovery has a cost that compounds badly at scale. Every tool in an MCP server loads its description into context, whether you invoke it or not. Open a fresh Claude Code session with several MCP servers configured and run /context — you’ll see tokens consumed before you’ve typed a single prompt.

The benchmark from Printing Press is stark: MCP used 35x more tokens than a CLI on the same task. And reliability dropped from 100% with the CLI to 72% with MCP as task complexity increased. That last number is the one that should concern you most. It means MCP isn’t just expensive — it’s actively less reliable when the tasks get hard, which is exactly when you need reliability most.

CLIs solve discovery differently. They use lazy discovery — the agent only loads tool descriptions when it actually needs them. The output is pre-formatted, typically around 200 tokens of clean text rather than a raw JSON blob. Auth tokens are held by the CLI itself, so there’s no OAuth dance on every invocation. And a local SQLite mirror means no round trips for repeated queries, no rate limit exposure on reads.

The School example makes this concrete. The CLI sent about 260 tokens to School. School sent back 132,000 tokens. The CLI processed that locally and returned 2,000 tokens to Claude. The agent never saw the 132,000 tokens. They were never in the context window. They were never counted against your session limit.

This is what “agent-native” actually means in practice — not just “works with agents,” but “designed around the constraints agents actually operate under.”


What Printing Press Actually Does

Printing Press is two things: a library of pre-built CLIs and a factory for building new ones.


The library covers a lot of ground. ESPN (no public API), Craigslist (no public API), Amazon, eBay, TikTok Shop, Shopify, Airbnb, LinkedIn via Contact Goat, Hacker News, Linear. The starter pack — ESPN, Flight Goat, Movie Goat, Recipe Goat — gets you running in minutes. Each CLI in the catalog links to a GitHub repo you can clone directly.

The factory is the more interesting piece. You point it at any website, and Claude Code runs a structured process: research the site, catalog every feature that exists in any tool, generate a Go CLI that covers those features, then verify quality through what the docs call “dog food runtime verification and scoring.” The whole thing takes around 10 minutes for a straightforward site.

The prerequisite is Go — the programming language from Google, free and open source. You don’t need to write Go yourself. Claude Code installs it and uses it. But it’s worth understanding why Go: it’s fast, produces single static binaries, and is well-suited for CLI tools that need to handle HTTP, parse HTML, and manage local SQLite state. The CLIs aren’t scripts — they’re compiled binaries with real performance characteristics.

The School CLI is a good example of what the factory can do with a hard case. School has no public API. It’s a community platform that requires authentication and has no documented endpoints. The factory did deep discovery — essentially reverse-engineered the site’s request patterns — and built a CLI that can authenticate, fetch posts by category, filter by type, and return structured summaries. The agent asked for wins from the wins category, got nine results, and surfaced the three strongest with links. All of that happened in a single natural language request.

Sites like All Recipes have anti-scraping protection, but the CLI uses a real Chrome session to bypass it. Domino’s has no public API. These aren’t edge cases — they’re the normal state of the web. Most of the data sources your agents might want to touch don’t have clean APIs.


The Architecture Underneath the Compression

The 66x compression ratio isn’t magic. It’s the result of several specific architectural decisions working together.

Pre-formatted output. The CLI doesn’t return raw JSON. It returns clean text, formatted for an agent to read. When you ask for NBA games tonight, you get “Knicks vs Sixers at 7pm ET, Spurs vs Timberwolves at 9:30pm ET” — not a JSON object with nested team objects, venue objects, broadcast objects, and a dozen fields you didn’t ask for. The difference between ~200 tokens and several thousand tokens per query, multiplied across every tool call in a session, is significant.

Local SQLite mirror. Frequently-accessed data gets cached locally. The agent can query the local mirror without making a network request, which eliminates round trips and rate limit exposure for read-heavy workflows. This matters especially for compound commands — queries that chain multiple data fetches together.

Auth held by the CLI. The CLI manages authentication tokens. The agent doesn’t need to handle OAuth flows or store credentials. This removes a whole category of context pollution — no auth-related prompting, no credential management in the conversation.


Lazy discovery. Tool descriptions only load when invoked. Compare this to MCP, where every tool in every connected server loads its description at session start. For a session with multiple MCP servers, this overhead is constant and unavoidable.

Compound commands. CLIs can chain operations. Instead of the agent making three separate tool calls and synthesizing the results, a single compound command handles the chaining internally and returns one clean output. Fewer round trips, less context accumulation.

These aren’t independent optimizations — they compound. The SQLite mirror reduces round trips. Pre-formatted output reduces tokens per response. Lazy discovery reduces baseline context load. Compound commands reduce the number of tool calls. The 66x compression is the product of all of them working together.


What This Means for How You Build

The practical implication is a priority ordering for how your agents should talk to external systems. CLIs first. If a CLI exists for what you need, use it. If one doesn’t exist, build it — the factory makes this a 10-minute task, not a multi-day project. APIs second, if there’s no CLI option. MCP last, only when there’s no other path.

This inverts the mental model most people have been operating with. MCP felt like the answer because it was new and it solved discovery elegantly. But discovery is only one part of the problem. Token efficiency and reliability under complexity are the other parts, and CLIs win both.

The team sharing story is worth noting here. Once you build a CLI, you can package it into a GitHub repo and share it across your team. Each person clones it and substitutes their own API key. The Tally CLI example from the video: built once, pushed to a private repo, team members clone it and swap in their credentials. The CLI becomes a shared asset, not a one-off script. This is how you build institutional knowledge around agent tooling — not by documenting which MCP servers to configure, but by maintaining a library of CLIs your agents can actually use efficiently.

For teams building more complex agent workflows, platforms like MindStudio handle the orchestration layer — 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — but the token efficiency problem lives one layer down, at the tool interface level. That’s where CLI architecture matters regardless of what’s orchestrating above it.

The Contact Goat CLI is a useful illustration of what’s possible at the edges. It finds verified emails by looking up a person on LinkedIn, cross-checking on Happenstand, and running a deep email verification pass. LinkedIn has no public API for this kind of lookup. The CLI handles the session management, the cross-referencing, and the verification — and returns a single verified email address. The agent doesn’t see any of the intermediate steps.

If you’re building agents that need to reason over information rather than just retrieve it, the /compact command in Claude Code helps manage context accumulation over long sessions. But compacting a bloated context is treating the symptom. Printing Press attacks the cause — keeping the data that enters context small and clean from the start.


Opus plan mode for token savings addresses another facet of the same problem: how do you get more done within session limits? Planning with Opus and executing with Sonnet helps. But if every tool call is dumping raw JSON into your context, you're fighting a losing battle regardless of which model is executing.

The broader point is that agent efficiency isn’t just about prompt engineering or model selection. It’s about the entire data pipeline — what enters context, in what form, at what cost. The School CLI’s 66x compression is a concrete demonstration of what’s achievable when that pipeline is designed correctly.


The Abstraction That’s Been Missing

There’s a useful analogy here to how programming abstractions have evolved. Each layer — assembly to C to higher-level languages — didn’t just make things easier. It changed what was possible by hiding complexity that didn’t need to be visible. The developer stopped thinking about register allocation and started thinking about algorithms.

CLIs do something similar for agents. The agent stops thinking about pagination, authentication, JSON parsing, and rate limits. It thinks about the question it’s trying to answer. The CLI handles everything else.

This is also why tools like Remy are interesting in a related context: Remy treats a spec — annotated markdown — as the source of truth and compiles it into a complete TypeScript stack with backend, database, auth, and deployment. The code is derived output. The same principle applies here: the CLI is the interface that hides the complexity, and the agent works at the level of intent rather than implementation.

The WAT framework for structuring Claude Code projects — Workflows, Agents, and Tools — puts CLIs squarely in the Tools layer. That framing is useful: CLIs are tools, and well-designed tools have clean interfaces, predictable outputs, and minimal side effects on the systems that use them. The 66x compression is what a well-designed tool interface looks like in practice.

The School CLI took 10 minutes to build. It works on a site with no public API. It compresses 132,000 tokens to 2,000. And it can be shared across a team with a single git clone.

That’s the baseline you should be expecting from every tool your agents touch.

Presented by MindStudio
