What Is the Karpathy LLM Wiki Pattern? How to Build a Personal Knowledge Base Without RAG
Andrej Karpathy's LLM Wiki approach uses plain text files instead of vector databases. Learn how it's 70x more efficient than RAG for agent knowledge retrieval.
Why RAG Is Overkill for Most Knowledge Retrieval Problems
Most teams building AI agents assume they need a full RAG pipeline. That means embedding models, vector databases, chunking strategies, similarity thresholds, reranking logic, and a whole infrastructure layer sitting between your agent and its knowledge.
Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, proposed something simpler. The Karpathy LLM Wiki pattern skips all of that. Instead, it uses plain text files loaded directly into an LLM’s context window. No vector search. No embeddings. No retrieval failures from poor semantic matching.
It sounds almost too simple. But for a wide class of knowledge retrieval problems, it outperforms RAG on accuracy, speed, and maintainability — with a fraction of the complexity.
This article explains how the pattern works, when to use it, and how to build your own personal knowledge base without a single vector database.
What RAG Actually Does (And Where It Breaks Down)
Retrieval-Augmented Generation is a well-established technique. You take a large body of documents, split them into chunks, convert those chunks into vector embeddings, and store them in a database like Pinecone, Weaviate, or Chroma. When a user asks a question, you embed the query, find the most semantically similar chunks, and inject those into the LLM prompt as context.
It works reasonably well for large-scale document retrieval — searching through thousands of PDFs, support tickets, or product pages. But it introduces a lot of moving parts that fail in subtle ways.
The hidden failure modes of RAG
Chunking is lossy. When you split a document into 512-token chunks, you destroy context. A sentence in chunk 14 might only make sense in light of chunk 3. The retrieval step can’t know that.
Semantic similarity ≠ relevance. A query about “billing issues” might semantically match chunks about “invoice formatting” rather than the actual refund policy. Embeddings capture meaning imperfectly, and that imperfection compounds.
Retrieval is a bottleneck. If the right information isn’t retrieved, the LLM can’t use it — even if it’s sitting in your database. This creates silent failures where the model confidently answers with partial information.
It’s expensive to maintain. Every time a document changes, you re-chunk and re-embed. Schema changes in your vector DB can require full reindexing. Debugging retrieval failures is opaque and time-consuming.
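To make these failure modes concrete, here is a toy retrieval loop. It substitutes a bag-of-words cosine for a real embedding model (real systems use learned embeddings; the chunks and query here are invented for illustration), but the moving parts are the same: embed, score, take the top k.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Score every stored chunk against the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Refunds are issued within 30 days of purchase on request.",
    "Invoices are formatted as PDF and emailed monthly.",
    "Billing address changes take effect next cycle.",
]
print(retrieve("billing issues", chunks, k=1))
```

Here the query about “billing issues” surfaces the billing-address chunk, not the refund policy a frustrated customer probably wants. That is the “semantic similarity ≠ relevance” failure in miniature, and it persists with real embeddings, just less visibly.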
For knowledge bases with hundreds or thousands of documents, RAG may still be the right call. But for structured, curated knowledge — the kind an agent or power user maintains about their own domain — it’s often massive overkill.
The LLM Wiki Pattern, Explained
The core idea is simple: instead of retrieving relevant context, you load all of it.
Karpathy’s approach treats knowledge as a set of structured plain text files — a personal wiki — maintained in a human-readable format like Markdown. When an agent needs to answer a question or complete a task, those files are concatenated and passed directly into the LLM’s context window.
That’s it. No vector search. No embedding step. No chunking. The model reads everything and reasons over the full picture.
Why this works now but didn’t before
Context windows used to be tiny. GPT-3 had a 4K token limit. You couldn’t fit much useful knowledge without retrieval.
Modern models have changed the calculus completely:
- GPT-4o supports 128K tokens
- Claude 3.5 Sonnet supports 200K tokens
- Gemini 1.5 Pro supports up to 1 million tokens
At 200K tokens, you can fit roughly 150,000 words — the equivalent of a full-length novel, or a substantial personal knowledge base covering an entire domain of expertise.
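A quick way to sanity-check whether a wiki fits is the rule of thumb that one token is roughly four characters of English text. This is a heuristic, not a tokenizer; real counts vary by model, but it is close enough for budgeting:

```python
import os

def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def wiki_token_estimate(wiki_dir):
    # Sum the rough token estimate across every Markdown file in the wiki.
    total = 0
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                with open(os.path.join(root, name), "r") as f:
                    total += estimate_tokens(f.read())
    return total
```

If the estimate comes in well under the model’s context window, say under half of it, there is comfortable headroom left for the conversation itself.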
Loading your curated wiki directly into that context window means the model sees everything, not just whatever chunks happened to match your query. It can synthesize across multiple topics, reason about relationships between concepts, and answer questions that would have required multi-hop retrieval in a RAG setup.
What “plain text files” actually means in practice
The wiki is typically organized as a folder of Markdown files, each covering a specific topic or domain. For a personal assistant agent, this might look like:
```
/wiki
  /personal
    preferences.md
    schedule-rules.md
    contacts.md
  /work
    projects.md
    meeting-norms.md
    key-stakeholders.md
  /knowledge
    domain-expertise.md
    research-notes.md
```
Each file is written in natural language, structured however makes sense for the content. No special syntax. No metadata schemas. Just information, organized clearly.
The agent concatenates these files into a system prompt or context block before reasoning. The LLM treats them as background knowledge — always available, always complete.
How This Compares to RAG on Efficiency
The 70x efficiency figure mentioned above is a rough illustration, not a benchmark. What it points at is a fundamental difference in how the two approaches use compute.
In a typical RAG system, you might embed a query, retrieve 5–10 chunks from thousands of stored documents, and pass those ~2,000 tokens to the model. The rest of your knowledge base sits idle. If the retrieval step missed something important, you’re out of luck.
With the LLM Wiki pattern, you’re loading maybe 10,000–50,000 tokens of carefully curated knowledge upfront. There’s no retrieval layer, no embedding inference, no vector search latency. The “inefficiency” of loading more tokens is offset by not running embedding models, not paying for vector DB infrastructure, and not building retrieval pipelines.
For curated knowledge bases where completeness matters more than scale, the LLM Wiki pattern processes each query with full context — which means fewer follow-up queries, fewer hallucinations from missing context, and dramatically simpler infrastructure.
When the LLM Wiki pattern wins
- Agent memory: Persistent facts about a user, their preferences, their projects
- Operational playbooks: Step-by-step procedures, decision trees, escalation rules
- Domain expertise: A specialist’s curated notes, frameworks, and heuristics
- Team knowledge: Policies, processes, and institutional knowledge for a small team
- Personal assistant context: Background a personal AI needs to act on your behalf
When RAG still makes sense
- Large-scale document retrieval: Searching through thousands of PDFs, articles, or support tickets
- Frequently updated corpora: News feeds, product catalogs with daily changes
- Public-facing knowledge bases: Where you can’t predict what users will ask
- Datasets too large for any context window: If your knowledge base is 10M+ tokens, you need retrieval
How to Build Your Own LLM Wiki
Building a personal knowledge base using this pattern doesn’t require any specialized tooling. Here’s how to do it from scratch.
Step 1: Define your knowledge domains
Start by identifying the categories of information your agent needs. For a personal productivity agent, this might be:
- Your preferences and communication style
- Active projects and their status
- People you work with and key context about each
- Rules and constraints (e.g., “never schedule meetings before 9am”)
- Reference knowledge specific to your domain
Avoid trying to capture everything. Curated and current beats comprehensive and stale.
Step 2: Write your wiki files
Create one Markdown file per domain. Write in plain language, as if explaining to a smart new hire who doesn’t know your context. Use headers to organize, bullet points for lists, and clear prose for narrative information.
Example: `preferences.md`

```markdown
# Communication Preferences

I prefer concise written communication. Default to bullet points over prose.
Never use jargon unless the audience clearly knows it.

I respond to Slack faster than email. For urgent items, use Slack.

Writing style: direct, no filler phrases, active voice. Don't start emails with "I hope this finds you well."

## Meeting Preferences

Keep meetings to 30 minutes unless complexity demands more. Always have an agenda.
I don't take meetings on Fridays if possible.
```
This level of specificity is what makes the pattern powerful. The LLM doesn’t need to infer your preferences — they’re stated directly.
Step 3: Build a concatenation layer
You need a way to combine your wiki files into a single context block. This can be as simple as a script:
```python
import os

def load_wiki(wiki_dir):
    # Concatenate every Markdown file under wiki_dir into one context string,
    # with a header naming the source file before each section.
    context = ""
    for root, dirs, files in os.walk(wiki_dir):
        for file in sorted(files):
            if file.endswith(".md"):
                with open(os.path.join(root, file), "r") as f:
                    context += f"\n\n## {file}\n\n" + f.read()
    return context
```
This script walks your wiki directory, reads each Markdown file, and concatenates them with headers. Pass the result as a system prompt or context block when calling your LLM.
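From there, sending the wiki to a model is a matter of placing the concatenated string in the system slot of a chat request. A minimal sketch in the OpenAI-style chat format (the model name and the commented-out API call are illustrative assumptions; any chat-completion client works the same way):

```python
def build_messages(wiki_context, user_question):
    # The full wiki rides along as background knowledge on every request.
    return [
        {
            "role": "system",
            "content": "Use the following wiki as background knowledge.\n\n" + wiki_context,
        },
        {"role": "user", "content": user_question},
    ]

# Illustrative call (requires a configured client and API key):
# client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_messages(load_wiki("wiki"), "Can we meet Friday?"),
# )
```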
Step 4: Create an update workflow
A wiki only stays useful if it stays current. Build a habit or automated trigger to update files when things change:
- New project started → update projects.md
- New stakeholder → update contacts.md
- Policy changed → update the relevant file
You can also have the agent itself watch for potentially outdated information: if it notices a conflict between something you’ve said and what’s in the wiki, it can flag the discrepancy for review.
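A cheaper automated nudge, assuming nothing fancier than file modification times, is to flag any wiki file that hasn’t been touched in a while:

```python
import os
import time

def stale_files(wiki_dir, max_age_days=30):
    # Flag Markdown files not modified within max_age_days as review candidates.
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(root, name)
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
    return sorted(stale)
```

Run it weekly from a cron job or a reminder, and review whatever it surfaces.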
Step 5: Test and refine
Run your agent against real questions and tasks. When it gets something wrong or incomplete, check whether the answer was in the wiki. If it wasn’t, add it. If it was but the model missed it, restructure how that information is presented.
Iteration is fast because there’s no reindexing, no embedding pipeline to retrain, and no retrieval threshold to tune. You edit a text file and you’re done.
Common Mistakes When Building an LLM Wiki
Treating it like a dump, not a wiki
The pattern breaks down when you add everything indiscriminately. Raw meeting transcripts, unedited notes, and duplicated information all degrade performance. The LLM has to work harder to extract signal from noise.
Curate aggressively. Summarize rather than paste. Synthesize rather than archive.
Making files too large
Even with a 200K token context window, you can’t load unlimited content. Keep individual files focused. If a file grows beyond a few pages, split it into sub-topics.
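This discipline can be checked mechanically. Using the rough four-characters-per-token heuristic (an assumption, not a real tokenizer), flag files that have outgrown a chosen budget:

```python
import os

def oversized_files(wiki_dir, max_tokens=4000):
    # Flag Markdown files whose rough token estimate (~4 chars/token)
    # exceeds the per-file budget; these are candidates for splitting.
    flagged = []
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(root, name)
                with open(path, "r") as f:
                    tokens = len(f.read()) // 4
                if tokens > max_tokens:
                    flagged.append((path, tokens))
    return sorted(flagged)
```

The 4,000-token default is arbitrary; pick whatever “a few pages” means for your wiki.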
Not updating regularly
Stale information is worse than no information — an agent that confidently acts on outdated context causes real problems. Build a lightweight update process into your workflow.
Skipping structure within files
Plain text works, but structure helps. Use Markdown headers, lists, and clear labeling so the model can navigate your wiki efficiently. Think of it as writing for a very fast, very literal reader.
How to Implement the LLM Wiki Pattern in MindStudio
MindStudio is a no-code platform for building AI agents and workflows, and it’s a natural fit for implementing the LLM Wiki pattern without writing infrastructure code.
Here’s how you’d build this in MindStudio:
Use the system prompt as your primary context layer. MindStudio lets you write detailed, structured system prompts for any agent. You can paste your wiki content directly into the system prompt or use variables to inject it dynamically from a connected data source.
Connect to where your wiki lives. MindStudio integrates with Notion, Google Docs, Airtable, and other tools. You can pull the latest version of your wiki files at runtime, so your agent always has current information without manual copy-paste.
Build an update agent. Create a second MindStudio agent whose job is maintaining the wiki — listening for triggers (a form submission, a Slack message, a new entry in Airtable) and updating the right files. This closes the loop on the hardest part of knowledge management: keeping it current.
Layer in workflows. Once your LLM Wiki is in place, you can build agents that act on its contents — scheduling meetings based on your preference files, drafting emails in your communication style, or routing decisions based on your documented rules.
Because MindStudio supports 200+ AI models, you can pick the model with the context window that fits your wiki size — no API keys or separate accounts required.
You can try it free at mindstudio.ai.
Frequently Asked Questions
What is the Karpathy LLM Wiki pattern?
The Karpathy LLM Wiki pattern is an approach to agent knowledge retrieval where structured plain text files — typically organized as a Markdown wiki — are loaded directly into an LLM’s context window, rather than being stored in a vector database and retrieved via semantic search. It was proposed by Andrej Karpathy as a simpler, more reliable alternative to RAG for curated, structured knowledge bases.
Is the LLM Wiki pattern better than RAG?
It depends on the use case. For curated personal or team knowledge bases with a few hundred to a few thousand pages, the LLM Wiki pattern is often more accurate, faster, and much easier to maintain than RAG. For large-scale document retrieval across tens of thousands of documents, RAG remains the better choice because no context window can hold that much content.
How large can an LLM Wiki be?
This depends on the model. GPT-4o supports 128K tokens (~96,000 words). Claude 3.5 Sonnet supports 200K tokens (~150,000 words). Gemini 1.5 Pro supports up to 1 million tokens. For most personal or small-team knowledge bases, 200K tokens is substantial — you can fit a well-curated domain knowledge base comfortably within that limit.
Does the LLM Wiki pattern require any special tools?
No. At its simplest, you need a folder of text or Markdown files and a script that concatenates them into a context block. There’s no vector database, no embedding model, and no special retrieval infrastructure. You can implement it with a few lines of Python or using a no-code platform like MindStudio.
What file format should I use for an LLM Wiki?
Markdown is the most common choice because it’s human-readable, supports structure (headers, lists, code blocks), and most LLMs parse it well. Plain text works too. Avoid formats that require parsing (like DOCX or PDF) unless you have a pre-processing step to convert them to text first.
How do I keep an LLM Wiki up to date?
The most practical approach is to build an update habit or an automated trigger. For personal use, a weekly review of each file is usually enough. For team use, you can build an agent that watches for specific events (project status changes, new onboarding, policy updates) and updates the relevant file automatically. Tools like MindStudio can connect to your existing data sources to automate this process.
Key Takeaways
- The Karpathy LLM Wiki pattern replaces vector databases and semantic search with plain text files loaded directly into an LLM’s context window.
- Modern large context windows (128K–1M tokens) make this approach viable for substantial knowledge bases.
- For curated, structured knowledge, the pattern is simpler, more accurate, and easier to maintain than RAG.
- RAG still makes sense at scale — for tens of thousands of documents or frequently updated corpora.
- Building an LLM Wiki is straightforward: define domains, write Markdown files, build a concatenation layer, and establish an update workflow.
- No-code platforms like MindStudio make it easy to connect your wiki to live data sources and build agents that act on that knowledge.
If you’re building an AI agent that needs to know a lot about you, your team, or your domain — start with a wiki, not a vector database. Add complexity only when you’ve proven you need it.