Open Brain: The Open-Source Memory System That Lets You Rebuild AI Indexes Without Losing Your Data
Open Brain separates raw data from embeddings in SQL — so when better embedding models arrive, you rebuild the index without touching source data.
Open Brain is an open-source memory system built by Nate Jones that stores raw data and embeddings separately in SQL — which means when a better embedding model ships, you rebuild the index without touching your source data. That design decision sounds minor until you’ve watched someone’s carefully assembled knowledge base become partially useless because they chunked everything into a vector store that’s now entangled with a model they can no longer justify using.
This post is about that architecture: why it matters, how to set it up, and what breaks when you don’t think about it in advance.
The embedding model churn problem is real. In the past eighteen months, the practical quality ceiling for open-weight embedding models has moved substantially. If you built a retrieval system in early 2024 and embedded everything with whatever was available then, you’re leaving meaningful retrieval quality on the table today. The question is whether your architecture lets you do something about it.
The Problem With Treating Your Vector Store as a Database
Most people building personal RAG systems make the same mistake: they treat the vector store as the database. Documents go in, chunks come out, embeddings get written alongside them, and the whole thing becomes one artifact. This feels clean until you need to change anything.
When a better embedding model arrives — and one will — you face a choice between re-ingesting everything from scratch or living with degraded retrieval. If your source documents are entangled with your embeddings in the same records, re-indexing means touching data you’d rather not touch. If you’ve lost track of which documents were indexed with which model, you can’t even do a clean rebuild.
The separation principle is simple: raw data is permanent, embeddings are derived. Your meeting transcript from last Tuesday is a fact. The 768-dimensional vector representation of a chunk of that transcript is a computation performed by a specific model at a specific point in time. Those are different things and they should live in different places.
Open Brain implements this as a SQL-driven database where the source content and the embedding vectors are stored in separate tables, with enough metadata to know which embedding model produced which vectors. When you want to rebuild, you query the source table, re-embed, and write new vectors. The source data is untouched. This is the same principle that makes Postgres with pgvector a serious option for production systems — relational structure for the data you own, vector indexes for the retrieval layer on top.
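To make the separation concrete, here is a minimal sketch of the two-table idea in SQLite. The table and column names are illustrative, not Open Brain's actual schema; the point is the shape, not the specifics.

```python
import sqlite3

conn = sqlite3.connect("memory.db")

# Source table: permanent facts. Nothing in here depends on any model.
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id         INTEGER PRIMARY KEY,
        path       TEXT NOT NULL,
        content    TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# Derived table: computed vectors. Each row records which model produced
# it, so a rebuild can target exactly the rows that are stale.
conn.execute("""
    CREATE TABLE IF NOT EXISTS embeddings (
        id              INTEGER PRIMARY KEY,
        document_id     INTEGER NOT NULL REFERENCES documents(id),
        chunk_index     INTEGER NOT NULL,
        embedding_model TEXT NOT NULL,
        vector          BLOB NOT NULL,
        created_at      TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.commit()
```

Under this layout, dropping the embeddings table destroys nothing you cannot recompute, which is the whole point.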
What You Actually Need Before Starting
Before you touch Open Brain, you need a few things in place.
You need a local inference runtime. Ollama is the practical default here — it gives you a clean CLI, a local server, and an OpenAI-compatible surface that Open Brain and other tools can talk to. If you’re on Apple Silicon, MLX is worth knowing about as a more native performance path, but Ollama is where you start. If you’re on an RTX 5090 or similar CUDA hardware, Ollama works there too, though vLLM becomes relevant once you’re serving real workloads.
You need an embedding model running locally. Qwen’s embedding models are a reasonable choice for a general-purpose local stack. The point is that embeddings are cheap to run, easy to cache, and central to privacy — if your documents leave the machine just to become vectors, you’ve given up one of the clearest wins in local AI.
You need a database. Open Brain supports SQLite with sqlite-vec for personal use — single file, easy to back up, easy to understand. If you’re building something more serious, Postgres with pgvector is the grown-up default: relational data, metadata, permissions, and vector search in one place. For most individuals starting out, SQLite is fine. For a small team or anything with audit requirements, go straight to Postgres.
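If you go the SQLite route, wiring up sqlite-vec from Python looks roughly like this, assuming the sqlite-vec Python package. With sqlite-vec, the vectors live in a vec0 virtual table keyed by rowid, while provenance metadata stays in ordinary tables alongside it.

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

conn = sqlite3.connect("memory.db")
conn.enable_load_extension(True)
sqlite_vec.load(conn)            # registers the vec0 virtual table module
conn.enable_load_extension(False)

# Dimension must match your embedding model (768 here as an example).
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[768])"
)
```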
You need to have thought about what data you’re ingesting. PDFs need different handling than markdown. Meeting transcripts need speaker attribution and timestamps. Code needs symbol-aware indexing. Notes need links preserved. Open Brain handles a lot of the chunking strategy for you, but you still need to know what you’re feeding it.
Setting Up Open Brain: The Actual Steps
Step 1: Clone the repository and review the schema.
Open Brain is on GitHub. Pull it down and read the database schema before you run anything. Understanding how source records and embedding records relate to each other is the mental model you need. Now you have a clear picture of what “separation of raw data and embeddings” looks like in practice — it’s not abstract, it’s two tables with a foreign key.
Step 2: Configure your database connection.
For SQLite, this is a file path. For Postgres with pgvector, you need a connection string and pgvector installed. If you’re running Postgres locally (which you should be for anything beyond personal notes), make sure pgvector is enabled in your instance. Now you have a database that can store both relational content and vector indexes without switching tools.
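For the Postgres path, enabling the extension is one statement once pgvector is installed on the server. A sketch using psycopg; the connection string is an assumption, adjust for your setup.

```python
import psycopg  # pip install "psycopg[binary]"

# Hypothetical local connection string; substitute your own.
with psycopg.connect("postgresql://postgres@localhost:5432/openbrain") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.commit()
```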
Step 3: Point Open Brain at your embedding model.
This is where your local Ollama instance comes in. Open Brain needs an OpenAI-compatible embedding endpoint. Ollama exposes one. Configure the embedding model name — whatever you’re running locally — and the endpoint URL. Now you have a pipeline that embeds locally, which means your documents never leave the machine during indexing.
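For reference, here is roughly what a request against Ollama's OpenAI-compatible embeddings endpoint looks like. The model name is an assumption; use whichever embedding model you actually pulled.

```python
import requests  # pip install requests

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # OpenAI-compatible surface
MODEL = "qwen3-embedding"  # assumption: substitute your local embedding model

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts locally. Nothing leaves the machine."""
    resp = requests.post(OLLAMA_URL, json={"model": MODEL, "input": texts})
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

print(len(embed(["The quarterly review moved to Thursday."])[0]))  # vector dimension
```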
Step 4: Ingest your first document set.
Start small. A folder of markdown notes, a handful of PDFs, a few meeting transcripts. Watch what gets written to the source table and what gets written to the embeddings table. Verify they’re separate. This is the check that matters: if you can delete all the embedding records and re-run indexing without losing any source content, the architecture is working correctly. Now you have a retrieval system you can actually maintain.
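The check is mechanical enough to script. A sketch against the illustrative schema from earlier: wipe the derived layer, re-index, and confirm the source layer never changed.

```python
import sqlite3

conn = sqlite3.connect("memory.db")
docs_before = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]

# Wipe the derived layer entirely. If the architecture is right,
# this destroys nothing that cannot be recomputed.
conn.execute("DELETE FROM embeddings")
conn.commit()

# ... re-run indexing here ...

docs_after = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
assert docs_before == docs_after, "source data was touched during re-index"
```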
Step 5: Start the MCP server.
Open Brain ships with an MCP server, which is what lets Claude, ChatGPT, or any MCP-compatible client query your memory. The MCP server is an executable tool surface — it exposes your database as a set of callable tools. Configure it with appropriate permissions. Think about what each client actually needs access to: a writing assistant doesn’t need to delete records, a search tool doesn’t need to write new ones. Now you have a memory layer that multiple AI clients can query without each one needing to know your database schema.
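This is not Open Brain's server, but the permission principle is easy to see in a sketch built on the official MCP Python SDK: expose only the tools a given client needs, and simply don't register the rest.

```python
import sqlite3
from mcp.server.fastmcp import FastMCP  # pip install "mcp[cli]"

mcp = FastMCP("memory-search")  # a read-only surface: one search tool, no writes

@mcp.tool()
def search_notes(query: str, limit: int = 5) -> list[str]:
    """Keyword search over source documents. No write or delete tool exists here."""
    conn = sqlite3.connect("memory.db")
    rows = conn.execute(
        "SELECT content FROM documents WHERE content LIKE ? LIMIT ?",
        (f"%{query}%", limit),
    ).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

A client connected to this server cannot delete records no matter how it is prompted, because no such tool is exposed.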
Step 6: Test retrieval before you trust it.
Run queries against your indexed content and check whether the results make sense. Bad retrieval is almost never a model problem; it's a chunking problem or a metadata problem. If transcripts return irrelevant chunks, check whether speaker attribution is preserved. If queries over code return poor results, check whether the indexing is symbol-aware. The pipeline is where things go wrong, not the model. Now you have a retrieval system you've actually validated rather than one you've assumed is working.
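A raw nearest-neighbor query is a quick way to exercise the index before layering anything on top. A sketch using the sqlite-vec tables from earlier; the model name and question are placeholders.

```python
import struct
import sqlite3
import requests
import sqlite_vec

conn = sqlite3.connect("memory.db")
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)

# Embed the query with the same local model used at indexing time.
resp = requests.post(
    "http://localhost:11434/v1/embeddings",
    json={"model": "qwen3-embedding", "input": "what did we decide about the launch date?"},
)
query_vec = resp.json()["data"][0]["embedding"]
blob = struct.pack(f"<{len(query_vec)}f", *query_vec)  # little-endian float32, as vec0 expects

# Nearest chunks first. Read them yourself; don't assume they're relevant.
for rowid, distance in conn.execute(
    "SELECT rowid, distance FROM vec_chunks WHERE embedding MATCH ? ORDER BY distance LIMIT 5",
    (blob,),
):
    print(rowid, distance)
```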
Step 7: Document which embedding model produced which indexes.
This is the step people skip and regret. Open Brain stores this metadata, but you should also keep a plain-text record somewhere durable: which model, which version, which date, which document sets. When you upgrade your embedding model in six months, you’ll know exactly what needs to be rebuilt. Now you have an auditable memory system, not just a working one.
Where This Actually Breaks
The failure modes in personal memory systems are predictable, and most of them aren’t about the model.
Chunking strategy mismatch. If you chunk a 40-page PDF into 512-token blocks without regard for section boundaries, your retrieval will return fragments that lack context. The fix is content-aware chunking: respect headers in markdown, respect page breaks in PDFs, respect speaker turns in transcripts. Open Brain handles a lot of this, but you need to verify it’s working for your specific content types.
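For markdown, the simplest content-aware improvement is splitting on headers before any token-length cap is applied. A minimal sketch; real pipelines also enforce a maximum chunk size and attach the header text as metadata.

```python
import re

def chunk_markdown(text: str) -> list[str]:
    """Split a markdown document at section headers so each chunk
    keeps its section context intact."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    return [s.strip() for s in sections if s.strip()]
```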
Embedding model drift without metadata. If you upgrade your embedding model and re-index some documents but not others, you now have a mixed index. Queries will return inconsistent results because different records were embedded in different semantic spaces. The solution is to track embedding model provenance per record and rebuild completely when you switch models — which is exactly what the source/embedding separation makes possible.
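Detecting a mixed index is a one-line query once provenance is stored per record. Against the illustrative schema from earlier:

```python
import sqlite3

conn = sqlite3.connect("memory.db")

# More than one row here means a mixed index: vectors from different
# semantic spaces sharing one table. Rebuild completely before trusting queries.
for model, count in conn.execute(
    "SELECT embedding_model, COUNT(*) FROM embeddings GROUP BY embedding_model"
):
    print(f"{model}: {count} vectors")
```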
MCP permissions too broad. An MCP server in front of your database is not magic. It’s an executable tool surface, and if you configure it with write permissions when a client only needs read access, you’ve created an attack surface. A meeting summarizer doesn’t need to delete records. A search tool doesn’t need to create new ones. Set permissions at the MCP layer, not just at the database layer.
No audit trail for what was indexed. If you don’t know what’s in your memory system, you can’t trust it. Open Brain lets you inspect what was stored, trace where a fact came from, and delete records that are wrong. Use this. The value of a personal memory system compounds over time only if you can trust what’s in it.
Retrieval without reranking. Vector similarity search returns the most similar chunks, not necessarily the most relevant ones. For serious retrieval, you want a reranking step — a small model that scores the top-k results for relevance to the actual query. This is a pipeline addition, not a database change, and it meaningfully improves retrieval quality on longer documents.
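One common way to add that step is a small cross-encoder, which scores each (query, chunk) pair jointly instead of comparing vectors computed independently. A sketch using sentence-transformers; the checkpoint named here is one widely used public reranker, not a specific recommendation.

```python
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_n: int = 5) -> list[str]:
    """Re-score the top-k vector hits against the actual query text."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]
```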
The Deeper Architectural Point
The reason to build this carefully is compounding. Every meeting transcript, every project note, every decision record you add to a well-structured memory system becomes more valuable over time, not less. A frontier model has read the public internet. It hasn’t read your last three years of meeting notes, and it shouldn’t need to — that’s what your local memory layer is for.
The inversion that matters here: in the cloud-first model, the AI service owns your memory and you visit it. In the local model, you own the memory and the models — local or cloud — come to you. Open Brain is one implementation of that inversion. Obsidian with plain markdown is another, simpler one. Postgres with pgvector is the production version. The specific tool matters less than the principle: your knowledge should persist independently of any particular AI application.
This is also why the Claude Code memory architecture that leaked earlier this year is interesting to study — it uses a pointer-index approach where memory.md tracks what’s stored where, which is a different implementation of the same underlying idea: separate the index from the source, keep the source durable.
If you’re building agents that need persistent memory across sessions, the personal AI second brain pattern with Obsidian is a complementary approach — markdown files as the durable source layer, with AI tooling on top. The two approaches aren’t mutually exclusive; many serious local stacks use both, with Obsidian handling unstructured notes and a SQL database handling structured facts and embeddings.
The economic argument for local inference also changes the calculus on how aggressively you run memory-building loops. When you’re paying per token for cloud inference, there’s a psychological cost to running long indexing jobs or continuous transcription pipelines. When inference is local and the marginal cost is electricity, you run Whisper on every call, you index every document, you let the agents run longer. That’s the cost reduction argument for local models made concrete: not just cheaper API calls, but a different relationship with how much you’re willing to automate.
Where to Take This Further
The immediate next step is indexing something you actually use. Not a test document. Your actual meeting notes from last month, or your project drafts, or your research PDFs. Real content reveals real chunking problems that synthetic tests don’t.
After that, the question is retrieval quality. Query your index with questions you’d actually ask. If the results are wrong, work backward: is it a chunking problem, a metadata problem, or a genuine model limitation? Most of the time it’s chunking.
The longer-term question is what you add to the pipeline. Whisper for local transcription means every meeting becomes searchable without audio leaving the machine. A vision model means document screenshots and charts become part of the index. A reranker means retrieval quality improves without changing the underlying database.
The model families worth knowing for this stack: Qwen for embeddings and multilingual work, Gemma 4 for capable small models that run efficiently on local hardware, Llama 4 Scout and Maverick for the mixture-of-experts approach where inference cost scales with what the model actually needs to activate. None of these are permanent recommendations — the model list ages fast. The stack underneath them doesn’t have to.
For a detailed comparison of the open-weight models that work well in local agentic stacks, the Gemma 4 vs Qwen 3.5 comparison covers the tradeoffs in depth, particularly for the kind of tool-use and retrieval tasks that a memory system like Open Brain depends on.
The personal AI computer that Nate Jones describes in the source material is not primarily a hardware argument. It’s a data ownership argument. The hardware enables it, the runtime serves it, the models query it — but the memory layer is the thing that makes it worth building. Build that part right and the rest of the stack becomes much more forgiving.