Claude Code vs. Codex as Your Second Brain Engine — Which AI Agent Works Best with Obsidian?
Your Obsidian vault is just markdown — meaning Claude Code, Codex, or any agent can power it. Here's how each performs as your second brain engine.
The Agent Doesn’t Matter. The Vault Does.
Claude Code and Codex are both capable of running your Obsidian second brain. The choice between them is real, but it’s not the choice most people think they’re making. The entire system is just markdown files — meaning Claude Code, Codex, or OpenClaw can all work from the same vault directory interchangeably. Swap the agent, keep the brain.
That’s the architectural insight worth sitting with before you spend an afternoon configuring one tool over another. Your /raw, /wiki, /journal, and /crm folders don’t care which agent reads them. The agents.md file that governs all behavior is plain text you can edit by hand. The wiki pages are .md files. The CRM records are .md files. The journal entries are .md files. If you build this correctly, you’re not locked into anything.
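Laid out as a tree, the vault is nothing more exotic than this. The folder names come from the setup described here; the root folder name and the exact placement of index.md and log.md are illustrative:

```
second-brain/
├── agents.md           # the instruction layer: processing, journal, and CRM rules
├── raw/                # clipped videos, articles, and transcripts land here unprocessed
│   └── processed/      # sources move here after ingestion
├── wiki/
│   ├── index.md        # master index the agent checks and updates on every run
│   └── *.md            # one page per concept, cross-linked back to its sources
├── journal/            # dated journal entries
├── crm/                # one .md record per contact
└── log.md              # append-only record of each processing run
```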
So the question isn’t “which agent is better?” The question is: given your workflow, your budget, and how you want to interact with this system day-to-day, which agent fits better right now — with the understanding that you can switch later without rebuilding anything?
That framing changes the comparison considerably.
What You’re Actually Evaluating
Before running through Claude Code versus Codex, it helps to agree on the dimensions that matter for this specific use case. A second brain system has different requirements than a coding assistant or a customer support bot.
Autonomous background processing. The hourly automation that checks /raw for unprocessed files and ingests them is the heartbeat of this system. You need an agent that can run on a schedule, operate without supervision, and make consistent decisions about file movement, wiki page creation, and index updates. This is not a “chat with me” task. It’s a daemon.
Instruction fidelity over time. The agents.md file is your prompt layer. It governs everything: how source files get processed, how YouTube channel names get extracted into front matter, how wiki pages cross-link back to original sources, how journal entries get grounded in the wiki before responding. As you add rules to agents.md, the agent needs to follow the updated instructions reliably — not drift back to defaults.
Context window and multi-file reasoning. Processing a new YouTube transcript means reading the raw file, checking the wiki index, creating or updating relevant concept pages, updating index.md, appending to log.md, and moving the source to /raw/processed. That’s six or more file operations in a single pass. The agent needs to hold the full picture across all of them.
Cost per run. If the automation runs hourly and you’re clipping content aggressively, you’re looking at potentially hundreds of runs per month. Model choice has real cost implications here. GPT-5.5 on High reasoning — the recommendation in the source setup — is not cheap at scale.
Interactive quality. The journal and CRM interactions are synchronous. You type journal and then brain-dump your thoughts. The response should cite specific saved videos by date, reference past journal entries, and check the CRM for relevant contacts. This is where model quality shows up most visibly to you as the user.
Codex as the Engine
Codex is the tool the system was built with, and that matters. The automations tab — set to hourly, pointed at the second brain project folder, running against the local file system — is a native feature. You configure it once: process unprocessed files in /raw, then commit and push to the private GitHub repo. Done. It runs without you.
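What that configuration amounts to is a standing natural-language instruction. Something along these lines is all the hourly job needs; the wording here is illustrative, not the exact prompt from the setup:

```
Every hour, in the second brain project folder:
1. Find any files in /raw that have not yet been moved to /raw/processed.
2. Process each one according to agents.md: create or update the relevant
   wiki pages, update index.md, and append a line to log.md.
3. Move the processed source file into /raw/processed.
4. Commit all changes and push to the private GitHub repo.
```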
The model recommendation here is GPT-5.5 on High reasoning. In practice, this means the processing runs are slower and more expensive than a lighter model, but the output quality — particularly the Zettelkasten-style auto-linking and the extraction of entities, concepts, and themes into separate wiki pages — is noticeably better. The demo in the source shows it correctly renaming a video from “How to Trick Your Brain into Becoming So Disciplined Your Friends Will Be Shocked by Your Success” to “Discipline Without Willpower” based on content analysis. That’s the kind of judgment call you want from the processing layer.
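On the source page itself, a pass like that leaves front matter along these lines. The field names are illustrative rather than a schema from the setup; the article only specifies that the channel name and the cross-links live on the original source page:

```yaml
---
title: "Discipline Without Willpower"     # renamed by the agent based on content analysis
original_title: "How to Trick Your Brain into Becoming So Disciplined Your Friends Will Be Shocked by Your Success"
channel: "Example Channel"                # illustrative; extracted from the YouTube source
type: youtube-transcript
processed: 2025-01-01                     # illustrative date
wiki:
  - "[[Discipline Without Willpower]]"    # illustrative page names
  - "[[Habit Formation]]"
---
```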
The interactive layer also works well. The journal demo is instructive: a brain-dump about YouTube title anxiety produced a response that cited “YouTube valley of death” and “creator persistence” pages from the wiki — specific pages that existed because of previously saved videos. The agent checked the vault index first, pulled relevant pages, noted the absence of prior journal entries, and then composed a response grounded in saved content rather than generic LLM knowledge. That’s the system working as intended.
Where Codex has friction: the setup requires you to be comfortable with a local IDE environment. The project folder, the automations configuration, the GitHub integration — these are not difficult, but they’re not zero-friction either. If you’ve never used Codex before, there’s a learning curve that has nothing to do with the second brain itself.
The other issue is that Codex’s automations are tied to the Codex environment. If you want to run this on a server or trigger it from a different tool, you’re working against the grain of how Codex is designed.
Claude Code as the Engine
Claude Code can run the same vault. The agents.md file is model-agnostic — it’s just instructions. Point Claude Code at the same directory, give it the same prompt structure, and it will process files, update the wiki, and handle journal entries. The self-evolving Claude Code memory system with Obsidian and hooks takes this further, using Claude Code hooks to automatically capture session logs and extract lessons — a natural extension of the same markdown-first architecture.
Claude Code’s strengths show up in a few specific places. First, multi-file reasoning. Claude’s context window handling tends to be more consistent when you’re asking it to hold a large vault in mind and make coherent decisions across many files simultaneously. For a wiki that’s grown to hundreds of interconnected pages, this matters. Second, instruction following on complex, multi-step tasks. The kind of nuanced rule in agents.md — “cross-link any wiki pages generated or updated to the original source page” or “add the channel name to the original source page front matter, not the generated wiki page” — tends to stick more reliably with Claude across repeated runs.
The Andrej Karpathy LLM wiki built with Claude Code demonstrates this directly: the same architectural foundation, the same folder structure, running on Claude instead of GPT. The output is structurally identical because the vault is structurally identical.
The friction with Claude Code is on the automation side. Claude Code doesn’t have a native “run this every hour” feature the way Codex does. You’d need to wire up a cron job, a shell script, or an external scheduler to get the background processing running autonomously. That’s not a hard problem, but it’s a problem you have to solve yourself. For builders comfortable with a terminal, this is a ten-minute task. For everyone else, it’s a reason to stick with Codex.
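If you do go the cron route, the wiring really is small. A minimal sketch, assuming the Claude Code CLI is installed as claude and its non-interactive print flag (-p) behaves as documented; paths, model choice, and the commit step are all yours to adjust:

```bash
#!/usr/bin/env bash
# process-vault.sh -- one hourly ingestion pass over the second brain vault
set -euo pipefail

VAULT="$HOME/second-brain"   # adjust to wherever your vault lives
cd "$VAULT"

# Ask Claude Code to run a single processing pass, following the rules in agents.md.
claude -p "Process any unprocessed files in /raw according to agents.md: update the wiki, index.md, and log.md, then move each source into /raw/processed."

# Mirror the Codex setup: commit and push the result to the private repo.
git add -A
git commit -m "Automated vault processing $(date -u +%Y-%m-%dT%H:%MZ)" || true
git push
```

Register it with cron to match the hourly cadence:

```
0 * * * * /path/to/process-vault.sh >> "$HOME/vault-processing.log" 2>&1
```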
Cost is also worth flagging. Claude Opus is expensive for high-volume background processing. Claude Sonnet is a reasonable middle ground — better than GPT-4o for instruction fidelity, cheaper than Opus for batch runs. The GPT-5.5 vs Claude Opus 4.7 coding comparison found that GPT-5.5 uses 72% fewer output tokens than Opus on equivalent tasks, which has direct implications for the cost of running hourly automations at scale.
The agents.md Insight Nobody Talks About
Both agents are ultimately executing against the same prompt file. The agents.md file is the real intelligence layer of this system — not the model. It defines what happens when a source file arrives in /raw, what happens when you prefix a chat with journal, what happens when you say add to CRM. The model is the executor. The spec is the brain.
This is actually a general principle worth internalizing. When you’re building agentic systems on top of file systems, the quality of your instruction layer matters more than the model choice — up to a point. A well-written agents.md running on a mid-tier model will outperform a poorly written one running on the best available model. The demo where the agent misunderstood the channel name instruction — adding it to the generated wiki page instead of the original source page — was a spec problem, not a model problem. The fix was editing agents.md, not switching models.
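To see what that instruction layer looks like in practice, here is the shape a few of those rules might take inside agents.md. This is a sketch of the style, not the actual file from the setup:

```markdown
## Processing rules for /raw

- For each new file in /raw, extract entities, concepts, and themes into
  separate wiki pages, and cross-link every page you create or update back
  to the original source page.
- Add the YouTube channel name to the front matter of the original source
  page, never to a generated wiki page.
- Update wiki/index.md, append one line to log.md, then move the source
  into /raw/processed.

## Journal behavior

- When a message starts with "journal", check the wiki index first, pull the
  relevant pages, and ground the response in saved content before replying.
```

The channel-name rule is exactly the kind of line you tighten after a bad run: the fix lives in this file, not in the model picker.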
This is also why the markdown-first architecture is more durable than it looks. If Anthropic ships a better model next month, you update one line in your automation config. If OpenAI ships something better the month after, same thing. The vault persists. The wiki grows. The agent is interchangeable. Tools like Remy operate on a similar principle — the spec is the source of truth, and the generated output (in Remy’s case, a full TypeScript stack with backend, database, and auth) is derived from it. Fix the spec, recompile, done. The same logic applies here: fix agents.md, re-run, done.
Verdict: Which Agent for Which Situation
Use Codex if you want the fastest path to a working system with native scheduling. The automations tab is genuinely convenient. You configure the hourly run once, point it at the second brain project, add the GitHub commit step, and the background processing just works. If you’re new to this kind of setup and want something that runs without you managing cron jobs, Codex is the lower-friction choice. The GPT-5.5 High reasoning recommendation is real — use it for processing, not for every interaction.
Use Claude Code if you have a large, mature vault and instruction fidelity is your primary concern. Claude’s handling of complex, multi-step agents.md rules tends to be more consistent at scale. You’ll need to handle scheduling yourself, but if you’re already comfortable with Claude Code’s environment — and the Claude Code frameworks comparison gives you a sense of the ecosystem — the tradeoff is worth it. Claude Sonnet is the right model for batch processing; Opus for interactive journal and CRM queries where response quality is visible.
Use both if you’re serious about this system long-term. Run Codex automations for background ingestion — it’s what the tool is designed for. Use Claude Code for interactive queries, journal responses, and CRM lookups — where its instruction-following and reasoning quality shows up most. The vault doesn’t care. Both agents read the same files.
Don’t use either exclusively if you’re building something that needs to connect to external systems — HubSpot, Slack, Notion, your email. The markdown vault is excellent for personal knowledge management, but the moment you want your second brain to push data somewhere or pull from a live API, you’re outside what either agent handles natively. Platforms like MindStudio address this differently: 200+ models, 1,000+ pre-built integrations, and a visual builder for chaining agents and workflows — useful when the vault is one node in a larger system rather than the whole system.
The deeper point is that the agent choice is a deployment decision, not an architectural one. Build the vault correctly — clean folder structure, well-maintained agents.md, consistent front matter, cross-linked wiki pages — and you’ve built something that outlasts any particular model or tool. The graph view in Obsidian that starts as a handful of nodes and grows into a dense interconnected web over weeks? That’s the vault. The agent just tends it.
Pick the tool that gets you to a running system fastest. You can always swap the engine later. You can’t easily rebuild the knowledge base you didn’t start building six months ago.