
AI Second Brain Architecture: 7 Folders That Make Your Obsidian Vault Actually Intelligent

The right folder structure turns Obsidian from a passive note dump into an active AI knowledge base. Here are the 7 folders that make it work.

MindStudio Team

Seven Folders That Turn Obsidian from a Dump Into a Knowledge Engine

Most second brain systems fail the same way: you save 400 articles, clip 200 YouTube transcripts, and then never open any of it again. The folder structure is the reason. Seven specific components (five folders: /raw, /raw/processed, /wiki, /journal, /crm; the behavior spec agents.md; and the navigation pair index.md and log.md) are the difference between a passive archive and a system that actively reasons over what you’ve saved.

This architecture comes from a working implementation built on top of Andrej Karpathy’s LLM Wiki spec on GitHub, extended with journaling and CRM layers. The vault folder structure: /raw, /raw/processed, /wiki, /journal, /crm, agents.md, index.md, log.md — that’s the whole thing. No database. No vector store. No API keys wired to a custom backend. Just markdown files in a directory that any AI agent can read and write.
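
Laid out as a tree, the whole thing fits on one screen:

```
vault/
├── raw/                intake queue for unprocessed web clips
│   └── processed/      clips the agent has already ingested
├── wiki/               AI-generated concept and entity pages
├── journal/            dated reflection entries (plus its own index.md)
├── crm/                one .md file per contact (plus its own index.md)
├── agents.md           plain-text behavior spec the agent follows
├── index.md            catalog of every source file and wiki page
└── log.md              append-only record of agent actions
```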

Here’s why the structure matters, what each piece does, and what the non-obvious design decisions are.


The Folder Map, Explained

/raw — The Intake Queue


Everything you save from the web lands here first. The Obsidian Web Clipper Chrome extension handles the intake: point it at a YouTube video and it pulls the full transcript into a .md file in /raw. Point it at an article and it pulls the full text. The file gets front matter automatically — source title, source URL, date clipped, and a web-clip tag.
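
A clipped file ends up looking roughly like this; the field names are illustrative, since the clipper’s template is configurable:

```
---
title: How to Build Discipline Without Willpower
source: https://www.youtube.com/watch?v=XXXXXXXXXXX
created: 2025-01-14
tags: [web-clip]
---

[full transcript or article text]
```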

This folder is intentionally dumb. Nothing in /raw has been processed, summarized, or linked to anything else. It’s a queue, not a library. The AI hasn’t touched it yet.

One thing to configure: set the Obsidian Web Clipper’s note location to raw explicitly. The default drops files into the vault root, which breaks the automation logic downstream.

/raw/processed — The Audit Trail

After the agent ingests a file from /raw, it moves that file to /raw/processed. This is a small design decision with large practical consequences.

Without it, you have no way to know what’s been ingested and what hasn’t. The hourly Codex automation checks /raw for unprocessed files; once a file has been moved out of /raw, its absence is the signal to skip it on the next run. If you reprocess everything every hour, you burn compute and get duplicate wiki entries. The /raw/processed subfolder is the deduplication mechanism.
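
In agents.md terms, the rule is two lines; this wording is a sketch, not the exact spec:

```
## Hourly processing run
- Only process files sitting directly in /raw. Anything in
  /raw/processed has already been ingested; skip it.
- After ingesting a file, move it from /raw to /raw/processed.
```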

It also gives you a clean audit trail. If a wiki page looks wrong, you can go back to the original source in /raw/processed and see exactly what the agent was working from.

/wiki — The AI-Generated Knowledge Layer

This is where the actual intelligence lives. When the agent processes a source file, it doesn’t just summarize it — it extracts entities, concepts, tools, people, and themes, then creates or updates individual wiki pages for each one. A video about discipline without willpower might generate pages for identity-led-goals.md, temporal-discounting.md, and temptation-bundling.md, each cross-linked to the original source file.

The Zettelkasten-style cross-linking is specified in agents.md as an explicit instruction: generated wiki pages must link back to the original source page. This prevents orphaned nodes — wiki entries that exist but have no traceable origin.
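
A generated page, sketched with the example filenames above (the exact layout is whatever agents.md specifies):

```
# Temptation Bundling

Pairing a task you avoid with something you enjoy, so the enjoyable
thing only happens while the hard task is underway.

Related: [[identity-led-goals]], [[temporal-discounting]]
Source: [[raw/processed/discipline-without-willpower]]
```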

The graph view in Obsidian makes this tangible. When you first build the vault, the graph is a handful of disconnected dots. After a few weeks of consistent clipping, it looks like a neural network — dense clusters of interconnected concepts with clear hubs around your most-saved topics.

The wiki also grows from questions, not just from saved content. When you query the vault — “what are some tips for motivation when I don’t feel like doing the hard task today?” — the agent answers from existing wiki pages and then creates a new wiki page (motivation-for-hard-tasks.md) synthesizing the answer, logs the query in log.md, and updates index.md. The act of asking a question expands the knowledge base. This is the behavior that separates this architecture from a static RAG system.
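
Sketched concretely, a single question leaves three traces in the vault (wording hypothetical):

```
wiki/motivation-for-hard-tasks.md   ← new page synthesizing the answer,
                                      with links back to its sources
log.md                              ← appended: the query, plus the page
                                      it created
index.md                            ← updated to list the new wiki page
```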

/journal — Grounded Reflection

The journal folder is where the system earns its keep for daily use. Start any chat in your Codex project with the word journal and the agent treats the entire conversation differently: it reads the wiki index, scans past journal entries for relevant patterns, checks the CRM for relevant contacts, and then responds — grounded in your saved knowledge rather than generic LLM priors.


The demo in the source video is instructive. A journal entry about YouTube title anxiety — the tension between clickbait titles that perform and literal titles that don’t — got a response that cited specific saved videos from the wiki by date, referenced the “YouTube valley of death” concept from a saved creator strategy page, and framed the anxiety as two distinct fears (creative integrity vs. channel safety) rather than one undifferentiated worry. That specificity comes from the wiki, not from GPT’s base training.

Each journal entry gets saved as a dated .md file with a short title derived from the content. The journal folder has its own index.md listing entries chronologically with one-paragraph summaries. The agent also appends a log entry to the root log.md for every journal session.
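
A sketch of the journal’s own index, using the entry from the demo above (wording hypothetical):

```
## Journal Index

- 2025-01-14 · title-anxiety.md: Tension between clickbait titles that
  perform and literal titles that feel honest. Agent cited two saved
  videos and the "valley of death" page; framed it as creative
  integrity vs. channel safety.
```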

The pattern-detection behavior is worth flagging: if you journal about the same struggle repeatedly, the agent is supposed to surface that pattern in subsequent responses. This requires past journal entries to be in scope when the agent processes new ones — which they are, because the journal index is part of what gets read at the start of every journal session.

/crm — Named Markdown Files as Contact Records

The CRM is the simplest module architecturally. Each contact gets a single .md file named after the person. You add someone by telling the agent “add to CRM” followed by whatever details you have — where you met, what you discussed, contact info, context. The agent creates or updates the file.

The /crm folder has its own index.md with contacts listed alphabetically and a short bio for each. When you ask “where did I meet Matthew Berman?” the agent checks the CRM index and the individual contact file, then answers from the record.
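
A contact record is just a named file. A sketch, with a made-up contact:

```
crm/jane-doe.md

# Jane Doe
- Met: AI conference, June 2024, after the agents panel
- Discussed: evaluation pipelines for support chatbots
- Contact: jane@example.com
- Context: wants to compare notes on agent memory patterns
```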

The practical use case: you’re heading to a conference and want to prep for a meeting with someone you met six months ago. Open Codex, ask about the person, get back the context you saved at the time. No scrolling through notes apps or trying to remember which email thread has the details.

The CRM also feeds into journal responses. If you journal about a business problem and you have a contact in the CRM who’s relevant — someone you had a conversation with about that exact topic — the agent is supposed to surface that connection. This is the “big soup” behavior the architecture is designed for: everything is connected to everything else because it’s all in the same directory.

agents.md — The Prompt File That Governs Everything

This is the most important file in the vault and the one most people will underestimate. agents.md is a plain-text prompt file that specifies how the agent behaves across all operations: how to ingest a source, how to respond to a query, how to handle a journal entry, how to update the CRM, what to do with YouTube channel names, when to move files to /raw/processed, when to push to GitHub.

Every behavioral change you want to make to the system is a change to agents.md. You can edit it directly in Obsidian, or you can prompt the agent to update it for you. The video shows both approaches: manually adding “move the source file from the raw directory to raw/processed” as step six of the ingest operation, and prompting Codex to add the journal and CRM rules automatically.
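
A sketch of that section of agents.md; step six is the line quoted above, and the rest paraphrases the behaviors described in this post:

```
## Ingest operation
1. Read the next file in /raw, including its front matter.
2. Extract entities, concepts, tools, people, and themes.
3. Create or update a wiki page for each, cross-linked back to the
   original source page.
4. Update index.md with the new source and any new wiki pages.
5. Append a log entry to log.md.
6. Move the source file from the raw directory to raw/processed.
```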


The key insight here is that the agent’s behavior is fully inspectable and fully editable. There’s no black box. If the agent is doing something wrong — say, adding the YouTube channel name to the generated wiki page instead of the original source file — you find the relevant instruction in agents.md and fix it. The fix propagates immediately to the next run.

This is also why the system is agent-agnostic. Because agents.md is just a markdown file with plain-text instructions, any agent that can read the vault directory can follow them. Claude Code, Codex, and other tools can all work from the same vault interchangeably. The instructions travel with the data.

index.md and log.md — The Navigation Layer

index.md is the catalog. It lists every source file and every wiki page, updated automatically after each processing run. When the agent needs to answer a query, it reads index.md first to understand what’s in the vault before deciding which files to pull.

log.md is the audit trail for agent actions. Every ingest, every query response, every journal entry, every CRM update gets a log entry. This is useful for debugging (what did the agent actually do during the last hourly run?) and for the agent itself (the log is part of the context that helps it understand the vault’s history).
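
Sketches of both, with assumed formats (agents.md decides the real layout):

```
index.md
  ## Sources
  - raw/processed/discipline-without-willpower.md · clipped 2025-01-14
  ## Wiki
  - wiki/identity-led-goals.md
  - wiki/temptation-bundling.md

log.md
  - 2025-01-14 15:00 · ingested discipline-without-willpower.md ·
    created 3 wiki pages · moved source to /raw/processed
```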


Why the Architecture Works When Others Don’t

The standard second brain failure mode is retrieval. You save content into a flat folder or a tag-based system, and then you can’t find it when you need it because you don’t remember what you called it or when you saved it. Search helps but requires you to know what you’re looking for.

This architecture sidesteps that problem by making the AI do the retrieval work continuously. The hourly Codex automation, set to GPT-4.5 on high reasoning in the source implementation, processes new files in /raw and updates the wiki without you doing anything. By the time you need information, it’s already been extracted, categorized, and cross-linked. You query the index, not the raw files.

The Claude Code memory architecture uses a similar principle: a pointer index (memory.md) that the agent reads first, then follows references into specific files. The Obsidian vault structure here is doing the same thing with index.md as the pointer layer. The pattern is consistent enough across implementations that it’s probably the right way to think about agent memory generally.

For teams building more complex agent workflows, platforms like MindStudio handle this orchestration at a different layer — 200+ models, 1,000+ integrations, and a visual builder for chaining agents — which is useful when the vault pattern needs to connect to external systems like Slack, HubSpot, or Notion rather than staying local.


The Non-Obvious Design Decisions

The processed subfolder is load-bearing. It looks like a minor housekeeping detail. It’s actually the deduplication mechanism for the entire automation. Without it, the hourly run has no way to distinguish “already ingested” from “waiting to be ingested.”


agents.md is the system’s API. Every behavioral change goes through this file. New rules, new triggers, new output formats — all of it is a text edit. This means the system is extensible without touching any code. Want to add a /workouts folder with its own index and processing rules? Add the instructions to agents.md and the next processing run will follow them.
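
As a sketch, that /workouts extension is a few lines appended to agents.md (wording assumed):

```
## /workouts
- When a chat starts with the word "workout", save the session as a
  dated .md file in /workouts.
- Maintain /workouts/index.md with one summary line per entry.
- Log every workout session in the root log.md.
```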

The GitHub backup is part of the architecture, not an afterthought. The Codex automation is set to commit and push to a private GitHub repo after each processing run. This means the vault is versioned. If the agent makes a bad edit to a wiki page, you can roll it back. If you want to run the vault on a different machine, you clone the repo. The backup also means you can hand the repo URL to a different agent — Claude Code, for instance — and it has full context immediately.

The wiki grows from questions. This is the behavior that makes the system compound over time. Every query you ask creates a new wiki page synthesizing the answer and linking back to sources. After six months of daily use, the wiki contains not just what you saved but what you’ve wondered about. That’s a fundamentally different kind of knowledge base.

If you’re thinking about how this pattern scales into production applications — where the “vault” becomes a structured database and the “agents.md” becomes a formal spec — Remy is one answer: you write an annotated markdown spec and it compiles into a complete TypeScript backend, SQLite database, auth, and deployment. The spec-as-source-of-truth principle is the same; the output is a deployed application rather than a local vault.


What to Build This Week

The minimum viable version of this system takes about an hour to set up:

  1. Install Obsidian (free) and create a new vault in a dedicated folder.
  2. Install the Obsidian Web Clipper Chrome extension. Set the note location to raw. Add your vault name to the settings.
  3. Point Codex (or Claude Code) at the vault folder. Paste in Karpathy’s LLM Wiki GitHub URL and ask it to build the architecture. Delete the extra files it generates — you want just /raw, /wiki, agents.md, index.md, and log.md to start.
  4. Clip five to ten things you’ve been meaning to read or watch. Let the agent process them.
  5. Ask the wiki a question about something you saved. Watch it answer and then update the wiki with a new page.

Add the journal and CRM modules once the wiki is working. Add the hourly automation once you’re clipping regularly enough that manual processing becomes annoying.

The system gets more useful the longer you run it, because the wiki compounds. A vault with 50 sources is useful. A vault with 500 sources, six months of journal entries, and 80 CRM contacts is a different kind of tool — one that knows your interests, your struggles, and your conversations well enough to give you genuinely personalized responses rather than generic LLM output.

For builders thinking about AI agents for personal productivity more broadly, this architecture is a useful reference point: the folder structure is the schema, agents.md is the behavior spec, and the markdown files are both the data and the interface. It’s a pattern that scales surprisingly far before you need anything more complex.


The Karpathy LLM Wiki knowledge base approach is worth reading alongside this if you want to go deeper on the wiki module specifically — the source architecture has more nuance than the seven-folder summary suggests, particularly around how entity pages get created and updated across multiple source files.

Start with the folders. The intelligence follows from the structure.
