What Is the Karpathy LLM Wiki Pattern? How to Build a Personal Knowledge Base Without RAG
Andrej Karpathy's LLM Wiki approach uses plain text files instead of vector databases. Learn how it's 70x more efficient than RAG for agent knowledge retrieval.
Why RAG Is Overkill for Most Knowledge Retrieval Problems
Most teams building AI agents assume they need a full RAG pipeline. That means embedding models, vector databases, chunking strategies, similarity thresholds, reranking logic, and a whole infrastructure layer sitting between your agent and its knowledge.
Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, proposed something simpler. The Karpathy LLM Wiki pattern skips all of that. Instead, it uses plain text files loaded directly into an LLM’s context window. No vector search. No embeddings. No retrieval failures from poor semantic matching.
It sounds almost too simple. But for a wide class of knowledge retrieval problems, it outperforms RAG on accuracy, speed, and maintainability — with a fraction of the complexity.
This article explains how the pattern works, when to use it, and how to build your own personal knowledge base without a single vector database.
What RAG Actually Does (And Where It Breaks Down)
Retrieval-Augmented Generation is a well-established technique. You take a large body of documents, split them into chunks, convert those chunks into vector embeddings, and store them in a database like Pinecone, Weaviate, or Chroma. When a user asks a question, you embed the query, find the most semantically similar chunks, and inject those into the LLM prompt as context.
It works reasonably well for large-scale document retrieval — searching through thousands of PDFs, support tickets, or product pages. But it introduces a lot of moving parts that fail in subtle ways.
The hidden failure modes of RAG
Chunking is lossy. When you split a document into 512-token chunks, you destroy context. A sentence in chunk 14 might only make sense in light of chunk 3. The retrieval step can’t know that.
Semantic similarity ≠ relevance. A query about “billing issues” might semantically match chunks about “invoice formatting” rather than the actual refund policy. Embeddings capture meaning imperfectly, and that imperfection compounds.
Retrieval is a bottleneck. If the right information isn’t retrieved, the LLM can’t use it — even if it’s sitting in your database. This creates silent failures where the model confidently answers with partial information.
It’s expensive to maintain. Every time a document changes, you re-chunk and re-embed. Schema changes in your vector DB can require full reindexing. Debugging retrieval failures is opaque and time-consuming.
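To make these failure modes concrete, here is a toy retrieval loop. It substitutes a bag-of-words cosine for a real embedding model (real systems use learned embeddings; the chunks and query here are invented for illustration), but the moving parts are the same: embed, score, take the top k.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Score every stored chunk against the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Refunds are issued within 30 days of purchase on request.",
    "Invoices are formatted as PDF and emailed monthly.",
    "Billing address changes take effect next cycle.",
]
print(retrieve("billing issues", chunks, k=1))
```

Here the query about “billing issues” surfaces the billing-address chunk, not the refund policy a frustrated customer probably wants. That is the “semantic similarity ≠ relevance” failure in miniature, and it persists with real embeddings, just less visibly.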
For knowledge bases with hundreds or thousands of documents, RAG may still be the right call. But for structured, curated knowledge — the kind an agent or power user maintains about their own domain — it’s often massive overkill.
The LLM Wiki Pattern, Explained
The core idea is simple: instead of retrieving relevant context, you load all of it.
Karpathy’s approach treats knowledge as a set of structured plain text files — a personal wiki — maintained in a human-readable format like Markdown. When an agent needs to answer a question or complete a task, those files are concatenated and passed directly into the LLM’s context window.
That’s it. No vector search. No embedding step. No chunking. The model reads everything and reasons over the full picture.
Why this works now but didn’t before
Context windows used to be tiny. GPT-3 had a 4K token limit. You couldn’t fit much useful knowledge without retrieval.
Modern models have changed the calculus completely:
- GPT-4o supports 128K tokens
- Claude 3.5 Sonnet supports 200K tokens
- Gemini 1.5 Pro supports up to 1 million tokens
At 200K tokens, you can fit roughly 150,000 words — the equivalent of a full-length novel, or a substantial personal knowledge base covering an entire domain of expertise.
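A quick way to sanity-check whether a wiki fits is the rule of thumb that one token is roughly four characters of English text. This is a heuristic, not a tokenizer; real counts vary by model, but it is close enough for budgeting:

```python
import os

def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def wiki_token_estimate(wiki_dir):
    # Sum the rough token estimate across every Markdown file in the wiki.
    total = 0
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                with open(os.path.join(root, name), "r") as f:
                    total += estimate_tokens(f.read())
    return total
```

If the estimate comes in well under the model’s context window, say under half of it, there is comfortable headroom left for the conversation itself.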
Loading your curated wiki directly into that context window means the model sees everything, not just whatever chunks happened to match your query. It can synthesize across multiple topics, reason about relationships between concepts, and answer questions that would have required multi-hop retrieval in a RAG setup.
What “plain text files” actually means in practice
The wiki is typically organized as a folder of Markdown files, each covering a specific topic or domain. For a personal assistant agent, this might look like:
```
/wiki
  /personal
    preferences.md
    schedule-rules.md
    contacts.md
  /work
    projects.md
    meeting-norms.md
    key-stakeholders.md
  /knowledge
    domain-expertise.md
    research-notes.md
```
Each file is written in natural language, structured however makes sense for the content. No special syntax. No metadata schemas. Just information, organized clearly.
The agent concatenates these files into a system prompt or context block before reasoning. The LLM treats them as background knowledge — always available, always complete.
How This Compares to RAG on Efficiency
The 70x efficiency figure mentioned above is a rough illustration, not a benchmark. What it points at is a fundamental difference in how the two approaches use compute.
In a typical RAG system, you might embed a query, retrieve 5–10 chunks from thousands of stored documents, and pass those ~2,000 tokens to the model. The rest of your knowledge base sits idle. If the retrieval step missed something important, you’re out of luck.
With the LLM Wiki pattern, you’re loading maybe 10,000–50,000 tokens of carefully curated knowledge upfront. There’s no retrieval layer, no embedding inference, no vector search latency. The “inefficiency” of loading more tokens is offset by not running embedding models, not paying for vector DB infrastructure, and not building retrieval pipelines.
For curated knowledge bases where completeness matters more than scale, the LLM Wiki pattern processes each query with full context — which means fewer follow-up queries, fewer hallucinations from missing context, and dramatically simpler infrastructure.
When the LLM Wiki pattern wins
- Agent memory: Persistent facts about a user, their preferences, their projects
- Operational playbooks: Step-by-step procedures, decision trees, escalation rules
- Domain expertise: A specialist’s curated notes, frameworks, and heuristics
- Team knowledge: Policies, processes, and institutional knowledge for a small team
- Personal assistant context: Background a personal AI needs to act on your behalf
When RAG still makes sense
- Large-scale document retrieval: Searching through thousands of PDFs, articles, or support tickets
- Frequently updated corpora: News feeds, product catalogs with daily changes
- Public-facing knowledge bases: Where you can’t predict what users will ask
- Datasets too large for any context window: If your knowledge base is 10M+ tokens, you need retrieval
How to Build Your Own LLM Wiki
Building a personal knowledge base using this pattern doesn’t require any specialized tooling. Here’s how to do it from scratch.
Step 1: Define your knowledge domains
Start by identifying the categories of information your agent needs. For a personal productivity agent, this might be:
- Your preferences and communication style
- Active projects and their status
- People you work with and key context about each
- Rules and constraints (e.g., “never schedule meetings before 9am”)
- Reference knowledge specific to your domain
Avoid trying to capture everything. Curated and current beats comprehensive and stale.
Step 2: Write your wiki files
Create one Markdown file per domain. Write in plain language, as if explaining to a smart new hire who doesn’t know your context. Use headers to organize, bullet points for lists, and clear prose for narrative information.
Example: `preferences.md`

```markdown
# Communication Preferences

I prefer concise written communication. Default to bullet points over prose.
Never use jargon unless the audience clearly knows it.

I respond to Slack faster than email. For urgent items, use Slack.

Writing style: direct, no filler phrases, active voice. Don't start emails with "I hope this finds you well."

## Meeting Preferences

Keep meetings to 30 minutes unless complexity demands more. Always have an agenda.
I don't take meetings on Fridays if possible.
```
This level of specificity is what makes the pattern powerful. The LLM doesn’t need to infer your preferences — they’re stated directly.
Step 3: Build a concatenation layer
You need a way to combine your wiki files into a single context block. This can be as simple as a script:
```python
import os

def load_wiki(wiki_dir):
    # Concatenate every Markdown file under wiki_dir into one context string,
    # with a header naming the source file before each section.
    context = ""
    for root, dirs, files in os.walk(wiki_dir):
        for file in sorted(files):
            if file.endswith(".md"):
                with open(os.path.join(root, file), "r") as f:
                    context += f"\n\n## {file}\n\n" + f.read()
    return context
```
This script walks your wiki directory, reads each Markdown file, and concatenates them with headers. Pass the result as a system prompt or context block when calling your LLM.
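From there, sending the wiki to a model is a matter of placing the concatenated string in the system slot of a chat request. A minimal sketch in the OpenAI-style chat format (the model name and the commented-out API call are illustrative assumptions; any chat-completion client works the same way):

```python
def build_messages(wiki_context, user_question):
    # The full wiki rides along as background knowledge on every request.
    return [
        {
            "role": "system",
            "content": "Use the following wiki as background knowledge.\n\n" + wiki_context,
        },
        {"role": "user", "content": user_question},
    ]

# Illustrative call (requires a configured client and API key):
# client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_messages(load_wiki("wiki"), "Can we meet Friday?"),
# )
```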
Step 4: Create an update workflow
A wiki only stays useful if it stays current. Build a habit or automated trigger to update files when things change:
- New project started → update projects.md
- New stakeholder → update contacts.md
- Policy changed → update the relevant file
You can also have the agent itself watch for potentially outdated information: if it notices a conflict between something you’ve said and what’s in the wiki, it can flag the discrepancy for review.
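A cheaper automated nudge, assuming nothing fancier than file modification times, is to flag any wiki file that hasn’t been touched in a while:

```python
import os
import time

def stale_files(wiki_dir, max_age_days=30):
    # Flag Markdown files not modified within max_age_days as review candidates.
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(root, name)
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
    return sorted(stale)
```

Run it weekly from a cron job or a reminder, and review whatever it surfaces.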
Step 5: Test and refine
Run your agent against real questions and tasks. When it gets something wrong or incomplete, check whether the answer was in the wiki. If it wasn’t, add it. If it was but the model missed it, restructure how that information is presented.
Iteration is fast because there’s no reindexing, no embedding pipeline to retrain, and no retrieval threshold to tune. You edit a text file and you’re done.
Common Mistakes When Building an LLM Wiki
Treating it like a dump, not a wiki
The pattern breaks down when you add everything indiscriminately. Raw meeting transcripts, unedited notes, and duplicated information all degrade performance. The LLM has to work harder to extract signal from noise.
Curate aggressively. Summarize rather than paste. Synthesize rather than archive.
Making files too large
Even with a 200K token context window, you can’t load unlimited content. Keep individual files focused. If a file grows beyond a few pages, split it into sub-topics.
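This discipline can be checked mechanically. Using the rough four-characters-per-token heuristic (an assumption, not a real tokenizer), flag files that have outgrown a chosen budget:

```python
import os

def oversized_files(wiki_dir, max_tokens=4000):
    # Flag Markdown files whose rough token estimate (~4 chars/token)
    # exceeds the per-file budget; these are candidates for splitting.
    flagged = []
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(root, name)
                with open(path, "r") as f:
                    tokens = len(f.read()) // 4
                if tokens > max_tokens:
                    flagged.append((path, tokens))
    return sorted(flagged)
```

The 4,000-token default is arbitrary; pick whatever “a few pages” means for your wiki.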
Not updating regularly
Stale information is worse than no information — an agent that confidently acts on outdated context causes real problems. Build a lightweight update process into your workflow.
Skipping structure within files
Plain text works, but structure helps. Use Markdown headers, lists, and clear labeling so the model can navigate your wiki efficiently. Think of it as writing for a very fast, very literal reader.
How to Implement the LLM Wiki Pattern in MindStudio
MindStudio is a no-code platform for building AI agents and workflows, and it’s a natural fit for implementing the LLM Wiki pattern without writing infrastructure code.
Here’s how you’d build this in MindStudio:
Use the system prompt as your primary context layer. MindStudio lets you write detailed, structured system prompts for any agent. You can paste your wiki content directly into the system prompt or use variables to inject it dynamically from a connected data source.
Connect to where your wiki lives. MindStudio integrates with Notion, Google Docs, Airtable, and other tools. You can pull the latest version of your wiki files at runtime, so your agent always has current information without manual copy-paste.
Build an update agent. Create a second MindStudio agent whose job is maintaining the wiki — listening for triggers (a form submission, a Slack message, a new entry in Airtable) and updating the right files. This closes the loop on the hardest part of knowledge management: keeping it current.
Layer in workflows. Once your LLM Wiki is in place, you can build agents that act on its contents — scheduling meetings based on your preference files, drafting emails in your communication style, or routing decisions based on your documented rules.
Because MindStudio supports 200+ AI models, you can pick the model with the context window that fits your wiki size — no API keys or separate accounts required.
You can try it free at mindstudio.ai.
Frequently Asked Questions
What is the Karpathy LLM Wiki pattern?
The Karpathy LLM Wiki pattern is an approach to agent knowledge retrieval where structured plain text files — typically organized as a Markdown wiki — are loaded directly into an LLM’s context window, rather than being stored in a vector database and retrieved via semantic search. It was proposed by Andrej Karpathy as a simpler, more reliable alternative to RAG for curated, structured knowledge bases.
Is the LLM Wiki pattern better than RAG?
It depends on the use case. For curated personal or team knowledge bases with a few hundred to a few thousand pages, the LLM Wiki pattern is often more accurate, faster, and much easier to maintain than RAG. For large-scale document retrieval across tens of thousands of documents, RAG remains the better choice because no context window can hold that much content.
How large can an LLM Wiki be?
This depends on the model. GPT-4o supports 128K tokens (~96,000 words). Claude 3.5 Sonnet supports 200K tokens (~150,000 words). Gemini 1.5 Pro supports up to 1 million tokens. For most personal or small-team knowledge bases, 200K tokens is substantial — you can fit a well-curated domain knowledge base comfortably within that limit.
Does the LLM Wiki pattern require any special tools?
No. At its simplest, you need a folder of text or Markdown files and a script that concatenates them into a context block. There’s no vector database, no embedding model, and no special retrieval infrastructure. You can implement it with a few lines of Python or using a no-code platform like MindStudio.
What file format should I use for an LLM Wiki?
Markdown is the most common choice because it’s human-readable, supports structure (headers, lists, code blocks), and most LLMs parse it well. Plain text works too. Avoid formats that require parsing (like DOCX or PDF) unless you have a pre-processing step to convert them to text first.
How do I keep an LLM Wiki up to date?
The most practical approach is to build an update habit or an automated trigger. For personal use, a weekly review of each file is usually enough. For team use, you can build an agent that watches for specific events (project status changes, new onboarding, policy updates) and updates the relevant file automatically. Tools like MindStudio can connect to your existing data sources to automate this process.
Key Takeaways
- The Karpathy LLM Wiki pattern replaces vector databases and semantic search with plain text files loaded directly into an LLM’s context window.
- Modern large context windows (128K–1M tokens) make this approach viable for substantial knowledge bases.
- For curated, structured knowledge, the pattern is simpler, more accurate, and easier to maintain than RAG.
- RAG still makes sense at scale — for tens of thousands of documents or frequently updated corpora.
- Building an LLM Wiki is straightforward: define domains, write Markdown files, build a concatenation layer, and establish an update workflow.
- No-code platforms like MindStudio make it easy to connect your wiki to live data sources and build agents that act on that knowledge.
If you’re building an AI agent that needs to know a lot about you, your team, or your domain — start with a wiki, not a vector database. Add complexity only when you’ve proven you need it.