Andrej Karpathy's LLM Wiki Pattern: Cut Claude Token Usage 95% with a Two-Folder System
One user turned 383 files and 100+ meeting transcripts into a compact wiki using Karpathy's raw/wiki pattern — and dropped Claude token usage by 95%.
Andrej Karpathy Posted a Two-Folder System That Cuts Claude Token Usage by 95%
Andrej Karpathy shared a note about something he’d been quietly using: a personal knowledge base built from nothing more than two folders and a handful of markdown files. No vector database. No embedding pipeline. No chunking infrastructure. Just raw/ and wiki/, an index.md, and Claude doing the organizational work.
One person who implemented this approach turned 383 scattered files and over 100 meeting transcripts into a compact, queryable wiki — and dropped their Claude token usage by 95% compared to naive RAG. That number is worth sitting with for a moment.
This is the Karpathy LLM wiki pattern: raw/ folder + wiki/ folder + index.md + log.md. Claude auto-maintains the index and relationship links between documents. You query the index first, follow links to what you need, and never dump your entire knowledge base into context. The result is something that behaves like a well-organized colleague’s brain rather than a search engine.
Here’s how to build it.
What You Actually Get Out of This
Before the setup, the outcome. Because “95% fewer tokens” is abstract until you feel it.
The problem with naive RAG — or worse, just pasting documents into context — is that you’re paying for everything whether it’s relevant or not. You have 50 meeting transcripts, you want to ask about a decision made in Q2, and you end up loading all 50 transcripts because you’re not sure which one has the answer. That’s context rot by design.
The wiki pattern inverts this. Claude reads a compact index.md first — maybe 2,000 tokens — which contains summaries of every document and the relationships between them. It then follows links to the two or three pages that actually matter for your question. You load 5% of the knowledge base instead of 100%.
The other thing you get is compounding. Every time you add a document, Claude updates the index and creates relationship links to existing pages. Your knowledge base doesn’t just grow — it gets more connected. Ask about “our pricing strategy” six months from now and Claude will surface not just the pricing doc but the customer interview that informed it and the competitor analysis that ran alongside it.
This is meaningfully different from having a folder of markdown files you point Claude at. The index is the key. Without it, Claude has to read everything. With it, Claude reads almost nothing.
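To make that concrete, here is a minimal sketch of what a compact index.md might look like after a few documents have been ingested. The page names and categories are hypothetical; Claude will generate its own based on your material.

```markdown
# Index

## Topics
- [[Pricing Strategy]]: Q2 decision to move to usage-based tiers; informed by [[Acme Customer Interview]] and [[Competitor Analysis]]
- [[Onboarding Flow]]: current design and open questions; see [[2024-05-14 Product Sync]]

## People
- [[Jane Doe]]: led the pricing work; appears in several meeting transcripts

## Sources
- raw/2024-05-14-product-sync.md → [[2024-05-14 Product Sync]]
- raw/acme-interview.md → [[Acme Customer Interview]]
```

A few hundred lines like this is all Claude needs to decide which two or three pages to open for any given question.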
What You Need Before Starting
Claude Code — this pattern works best with Claude Code because you need an agent that can read, write, and update files autonomously. The desktop app or VS Code extension both work. You’ll need a paid Anthropic subscription.
Obsidian (optional but recommended) — Obsidian is free and gives you a graph view of your wiki. You can see which documents are hubs, which are isolated, and where relationships are forming. It’s not required for the system to function, but it makes the structure visible in a way that’s genuinely useful. Download it at obsidian.md.
The Obsidian Web Clipper Chrome extension — if you want to pull articles from the web directly into your raw/ folder, this extension clips pages straight into your vault. Set the default destination to raw/ in the extension options.
Source material — meeting transcripts, YouTube transcripts, research articles, internal docs, whatever you want to be able to query. The system works at any scale but starts showing its value around 20-30 documents.
A claude.md file — you’ll need to tell Claude how the project works. This is the master prompt that explains the folder structure, how to search, and how to update the wiki.
Building the System
Step 1: Create the vault structure
Open your terminal, navigate to wherever you want this to live, and create the folder:
mkdir my-wiki
cd my-wiki
Then open Claude Code in this directory. Your opening prompt should be something like:
I want to implement Andrej Karpathy’s LLM wiki pattern. Create the following structure: a raw/ folder for source documents, a wiki/ folder for processed knowledge pages, an index.md at the root that will serve as the master index of all wiki pages and their relationships, and a log.md that tracks every ingest operation. Then create a claude.md that explains how this project works and how to search and update it.
Claude will scaffold everything. You now have an empty vault with the right bones.
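The scaffold should look roughly like this (the inline notes describe each piece's role; your claude.md wording will differ):

```
my-wiki/
├── raw/         # source documents: transcripts, clippings, PDFs
├── wiki/        # processed knowledge pages Claude creates
├── index.md     # master index of pages and relationships
├── log.md       # record of every ingest operation
└── claude.md    # instructions Claude reads at the start of each session
```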
Check: You should see raw/, wiki/, index.md, log.md, and claude.md in your directory.
Step 2: Write the claude.md schema
The claude.md is what makes this system self-maintaining. Claude reads it at the start of every session and knows exactly how to behave. The key instructions to include:
- Search protocol: Always read index.md first. Follow links to relevant wiki pages. Only read raw/ files if explicitly asked.
- Ingest protocol: When given a new document in raw/, create wiki pages for the key concepts, update index.md with summaries and links, and log the operation in log.md.
- Relationship maintenance: When creating a new wiki page, check index.md for existing pages that should link to it. Add backlinks both ways.
- Index format: The index should have sections for topics, people, tools, sources, and concepts — whatever categories make sense for your domain.
Karpathy’s original note deliberately left the prompt vague so people could customize it. That’s the right instinct. Your claude.md should reflect what you’re actually storing.
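As a starting point, a minimal claude.md along these lines works; the exact wording is yours to adapt:

```markdown
# How this project works

This vault is a personal knowledge base built on the raw/ + wiki/ pattern.

## Search protocol
1. Always read index.md first.
2. Follow links to the relevant wiki/ pages and answer from them.
3. Never read raw/ files unless explicitly asked for the original source.

## Ingest protocol
When new files appear in raw/:
1. Create or update wiki/ pages for the key concepts, people, and decisions.
2. Update index.md with a one-line summary and relationship links for each page.
3. Add backlinks in both directions between related pages.
4. Record the operation in log.md with the date and files processed.

## Index format
index.md has sections for Topics, People, Tools, Sources, and Concepts.
```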
Check: Ask Claude “how does this project work?” It should describe the folder structure and search protocol back to you accurately.
Step 3: Ingest your first batch of documents
Drop your source files into raw/. PDFs, markdown files, plain text, whatever you have. If you’re using the Obsidian Web Clipper, set it to save to raw/ and start clipping articles.
Then tell Claude:
I’ve added [N] documents to the raw/ folder. Please ingest them all: create wiki pages for the key concepts and entities, update index.md with summaries and relationship links, and log everything in log.md.
For a batch of 36 YouTube transcripts, this took about 14 minutes. For a single long article, closer to 10. The time scales with content density, not document count.
Watch what Claude creates in wiki/. You’ll see it making judgment calls: this concept gets its own page, these two things are the same entity, this document references that earlier one. Let it run. You can always correct the structure afterward.
Check: Open index.md. You should see summaries of every ingested document and links to the wiki pages Claude created. Open a wiki page — it should have backlinks to related pages.
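A typical wiki page ends up looking something like this (hypothetical content, assuming Obsidian-style [[wikilinks]] for the backlinks):

```markdown
# Pricing Strategy

Summary of the Q2 decision to move to usage-based tiers and the reasoning behind it.

- Decision: adopt usage-based pricing for the Pro plan (Q2 planning).
- Reasoning: customers pushed back on flat seats; see [[Acme Customer Interview]].
- Context: competitors moved in the same direction; see [[Competitor Analysis]].

## Related
- [[2024-05-14 Product Sync]]
- [[Jane Doe]]
```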
Step 4: Set up Obsidian to visualize the graph
Open Obsidian, click “Open folder as vault,” and point it at your wiki directory. Switch to graph view. You’ll see nodes for every wiki page and edges for every link.
What you’re looking for: hub nodes (many connections) versus isolated nodes (few connections). Hub nodes are your most important concepts. Isolated nodes might indicate documents that haven’t been fully connected yet, or concepts that genuinely stand alone.
This visualization doesn’t change how the system works — it’s just a useful diagnostic. When you add 50 more documents and the graph starts showing clusters, you’ll understand your knowledge base differently than you did before.
Check: Graph view shows nodes for your wiki pages with visible connections between related concepts.
Step 5: Query the system
Now test it. Ask Claude a question that would require synthesizing across multiple documents:
What decisions have we made about pricing strategy, and what was the reasoning behind them?
Watch what Claude reads. It should open index.md, identify the relevant pages, read those pages, and answer from them — without touching the raw/ folder at all. That’s the token efficiency in action.
If Claude is reading raw files instead of wiki pages, your claude.md search protocol needs to be more explicit. Add a line like: “Never read from raw/ when answering questions. Always use wiki/ pages and index.md.”
Check: Claude answers your question by reading 2-4 wiki pages, not the entire raw/ folder.
Step 6: Establish an ongoing ingest cadence
The system compounds over time, but only if you keep feeding it. The practical pattern: whenever you have a new document worth keeping — a meeting transcript, a research article, a decision log — drop it in raw/ and run a quick ingest.
For recurring sources, you can automate this. Claude Code’s /loop skill can schedule recurring ingest operations within a session by creating cron jobs that check raw/ for new files and process them automatically. Those jobs come with a 3-day expiry, so /loop suits active sprints rather than permanent automation. For permanent scheduled ingests, Claude Code’s remote routines (which run on Anthropic’s cloud against your GitHub repo) are the right tool, though they require your vault to live in a GitHub repository and API keys set as environment variables in the Cloud Environment settings panel.
Check: You have a clear mental model of when and how new documents enter the system.
Where This Breaks Down
The index gets stale. If you add documents to raw/ without running an ingest, index.md won’t know about them. Claude will answer questions as if those documents don’t exist. Fix: always ingest immediately, or run a periodic “check raw/ for unindexed files” prompt.
Wiki pages become inconsistent. Over many ingests, you might end up with two pages for the same concept under slightly different names. Claude won’t always catch this. Fix: periodically ask Claude to “audit the wiki for duplicate or overlapping pages and consolidate them.”
The index grows too large. At some point, even the index becomes expensive to read. Karpathy’s note mentions this — the pattern works well up to roughly hundreds of pages with good indexes, but starts to strain at enterprise scale. If your index.md is getting unwieldy, consider domain sub-indexes: index-people.md, index-projects.md, etc., with a master index that points to them.
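If you go the sub-index route, the master index becomes little more than a table of contents. A sketch, with illustrative category names:

```markdown
# Index

- [[index-people]]: everyone mentioned across transcripts and docs
- [[index-projects]]: active and archived projects
- [[index-decisions]]: decisions with reasoning and source links
- [[index-sources]]: mapping from raw/ files to wiki pages
```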
Claude ignores the search protocol. Sometimes Claude will read raw files anyway, especially if you ask a question in a way that implies you want the original source. Be explicit: “Using only the wiki, tell me…” If this happens repeatedly, strengthen the search protocol in claude.md.
Relationship links are wrong. Claude occasionally creates links between concepts that aren’t actually related, or misses obvious connections. The graph view in Obsidian makes these errors visible. Correct them by editing the wiki pages directly — Claude will learn from the corrected structure in subsequent ingests.
Where to Take This Further
The obvious next step is connecting the wiki to your broader Claude Code setup. If you’re building a personal AI operating system — with context files, skills, and connections to external tools — the wiki becomes the knowledge layer that everything else queries. Your Claude Code token management strategy should account for the wiki as a first-class citizen: it’s the thing that lets you answer questions about your business without loading your entire context.
The claude.md in your wiki project can reference your main AIOS’s context files. When Claude needs to answer a question about your business, it reads the wiki index first, follows links, and only escalates to the full context files if the wiki doesn’t have what it needs. This is progressive disclosure applied to knowledge retrieval — the same principle that makes Claude Code’s effort levels worth understanding, since matching the right effort level to the right task is what keeps token costs from spiraling as your wiki grows.
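In practice this escalation rule can live as a short section in the wiki's claude.md. A minimal sketch, assuming your main context files sit in a sibling context/ folder:

```markdown
## Escalation
1. Read index.md and answer from wiki/ pages when possible.
2. If the wiki has no relevant page, read the full context files in ../context/.
3. Note the gap in log.md so the missing knowledge gets ingested on the next pass.
```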
For teams building more complex agent architectures, the wiki pattern also solves a real problem in multi-agent systems: shared memory. If you’re building agents that need to share knowledge about a domain, a well-maintained wiki gives every agent the same grounding without duplicating context. MindStudio handles this kind of orchestration across multiple agents and models — 200+ models, 1,000+ integrations, and a visual builder for wiring them together — but the underlying knowledge layer still benefits from being a structured wiki rather than a raw document dump.
Karpathy also mentions running periodic “lint” passes over the wiki — LLM health checks that find inconsistent data, identify gaps, and suggest new articles based on what’s missing. This is worth building into your cadence. Once a week, ask Claude: “Audit the wiki. What concepts are underrepresented? What relationships are missing? What should I read next to fill the gaps?” The wiki becomes not just a record of what you know, but a map of what you don’t.
One more thing worth building: a hot.md cache file in your wiki. This is a 500-word summary of the most recent context — what you’ve been working on, what decisions are in flight, what changed this week. Your main AIOS reads this first before touching the full wiki. It’s a small addition that meaningfully reduces the tokens needed for day-to-day queries. If you’re thinking about building a self-evolving memory system that grows smarter over time, the hot cache is one of the most practical pieces to add early.
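A hot.md might look something like this (hypothetical content; the point is that it stays short and current):

```markdown
# Hot cache: week of June 10

## Working on
- Finalizing the usage-based pricing rollout; see [[Pricing Strategy]]

## Decisions in flight
- Whether to gate the API behind the Pro plan; pending [[Acme Customer Interview]] follow-up

## Changed this week
- Ingested 4 new meeting transcripts; [[Onboarding Flow]] updated with the revised checklist
```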
The wiki pattern also pairs naturally with tools that generate structured output from source material. Remy, for instance, takes a different approach to the same underlying idea: you write an annotated markdown spec where prose carries intent and annotations carry precision, and it compiles a complete TypeScript application — backend, database, auth, deployment — from that spec. The spec is the source of truth; the generated code is derived output. The wiki is doing something analogous for knowledge: the structured wiki pages are the source of truth, and the raw documents are the input that generated them.
The deeper point Karpathy is making, and the reason this pattern spread quickly on X, is that you don’t need infrastructure to have good AI memory. You need structure. Two folders, an index, and a model that’s good enough to maintain relationships between documents. That’s it. The comparison between this approach and traditional RAG comes down to scale and use case — but for personal knowledge bases and small-team contexts, the wiki wins on simplicity and token efficiency by a wide margin.
Start with 20 documents. Run the ingest. See what the graph looks like. The system will tell you what it needs next.