
Claude Code Memory Levels Explained: 6 Layers from claude.md to Cross-Tool Shared Memory

Claude Code has 6 distinct memory levels. Here's what each one does, when to use it, and which skills unlock the higher tiers.

MindStudio Team

Claude Code Has 6 Memory Levels. Most Builders Are Stuck on Level 1.

Every Claude Code session starts from zero. No memory of last week’s decisions, no awareness of the architecture choices you made three sessions ago, no recollection of the bug you spent four hours debugging on Tuesday. You explain the project again, burn tokens on context-setting, and wonder why the outputs feel generic.

The fix isn’t a single tool or a single file. It’s a 6-level memory taxonomy — native files → session hooks → semantic search → verbatim recall → knowledge bases → cross-tool shared memory — and most builders are only using the first level. The gap between Level 1 and Level 3 alone is the difference between an AI that forgets everything and one that picks up a project you haven’t touched in two weeks and already knows where you left off.

Here’s what each level actually does, when it’s worth implementing, and which skills unlock the tiers that matter most.


Level 1: The claude.md File (And Why It’s Not Enough on Its Own)

The claude.md file is Claude Code’s native identity layer. It’s a plain markdown file that gets injected into the system prompt at the start of every session, giving Claude standing instructions about how you work, what the project is, and what rules to follow.

Every Claude Code user has one. Most stop here.


The problem is structural. A claude.md file is a suggestion, not a guarantee. Claude reads it, but it doesn’t have to act on every instruction. If you tell it to read a separate context file, it might skip that step. If your instructions conflict with something in the conversation, the conversation often wins.

There’s also a maintenance problem. claude.md files are static — you write them once and they decay. The project evolves, decisions get made, new constraints emerge, and the file doesn’t update itself. Miss one update and Claude misses that context entirely.

That said, Level 1 is still foundational. The file structure matters: a user.md (about you — your role, preferences, communication style), a soul.md (the agent’s personality), and a shared brand context folder give you a clean separation between who you are, how the agent should respond, and what your business context looks like. The claude.md then references these files to pull the right context at the right time. This is the static layer — it doesn’t change often, but it has to be right.

For multi-client work, the architecture extends naturally. A master claude.md at the root folder passes shared methodology down to individual client subfolders, where per-client claude.md files can override or extend the parent instructions. Each client gets its own brand context and its own memory. The inheritance is built into how Claude Code reads parent directories.
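A minimal sketch of that layout (file names other than claude.md are conventions from this article, not Claude Code requirements):

```text
workspace/
├── claude.md          # master: shared methodology, references user.md and soul.md
├── user.md            # who you are: role, preferences, communication style
├── soul.md            # the agent's personality
├── brand/             # shared brand context folder
│   └── voice.md
├── client-a/
│   ├── claude.md      # overrides or extends the master instructions
│   └── brand/         # client-a's own brand context and memory
└── client-b/
    ├── claude.md
    └── brand/
```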


Level 2: Session Start Hooks (Deterministic Injection)

Level 2 solves the “Claude might skip it” problem with a blunt instrument: a session start hook that forces context into the conversation window whether Claude wants it or not.

The distinction matters. A claude.md instruction that says “read the brand voice file before responding” is advisory. A session start hook is deterministic — it runs at session initialization and pushes the specified data directly into context. There’s no reading required, no chance of the instruction being deprioritized.

If you have business context or brand voice that absolutely must be present in every session, this is how you guarantee it. The hook fires, the context loads, the session starts with the right foundation already in place.

Level 2 is particularly useful for teams or client work where consistency matters more than flexibility. You’re not relying on Claude’s interpretation of an instruction — you’re wiring the context injection directly into the session lifecycle.
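Concretely, under Claude Code’s hooks configuration (the exact schema can change between versions, so treat this as a sketch to check against the current docs), a SessionStart hook in .claude/settings.json might look like:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "cat .claude/brand-context.md"
          }
        ]
      }
    ]
  }
}
```

Whatever the command prints to stdout is pushed into the context window at session start — every session, no interpretation involved.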


Level 3: Semantic Search (The 80/20 for Most Builders)

This is where memory starts to feel like memory.

Levels 1 and 2 handle static context — things that don’t change much. Level 3 handles dynamic context: what you worked on last week, what decisions you made three sessions ago, what bugs you fixed and why.

The mechanism is semantic search. Tools like mem search or the ClaudeMem skill store session observations in a local SQLite database with vector search, then retrieve the most relevant pieces when you start a new session. You ask a question, the system finds the memory fragments most semantically related to it, and injects those — not everything, just what’s relevant.
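A toy sketch of the mechanism — not ClaudeMem’s actual implementation, which uses real embedding models rather than the word-hashing stand-in here:

```python
import json
import math
import sqlite3

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for a real embedding model: hash words into a fixed vector.
    Real tools use learned embeddings; the retrieval pattern is the same."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are pre-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT, vec TEXT)")

def remember(text: str) -> None:
    db.execute("INSERT INTO memory (text, vec) VALUES (?, ?)",
               (text, json.dumps(embed(text))))

def recall(query: str, k: int = 2) -> list[str]:
    # Rank stored observations by semantic closeness, return only the top k.
    qv = embed(query)
    rows = db.execute("SELECT text, vec FROM memory").fetchall()
    rows.sort(key=lambda r: cosine(qv, json.loads(r[1])), reverse=True)
    return [text for text, _ in rows[:k]]

remember("Fixed the auth token refresh bug by widening the expiry window")
remember("Decided to use Postgres row-level security for tenant isolation")
remember("Renamed the billing service to invoicing")

print(recall("why did the auth bug happen", k=1))
```

The point is the retrieval contract: you never inject everything, only the fragments most related to the question at hand.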


ClaudeMem specifically uses a three-layer retrieval system: it first returns a compact index of observations, then lets you pull a timeline around the ones that matter, then fetches full details only for the handoff you actually need. The repo reports roughly 10x token savings on retrieval compared to dumping all past context at session start. That’s not a rounding error — it’s the difference between a session that starts clean and one that starts bloated.
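The progressive-disclosure pattern is easy to sketch independently of ClaudeMem’s internals (all names here are illustrative, not its API):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    id: int
    session: int
    summary: str   # one line, cheap to inject
    detail: str    # full record, expensive to inject

STORE = [
    Observation(1, 7, "Chose SQLite over Postgres for local memory", "full rationale..."),
    Observation(2, 7, "Fixed flaky test in retrieval layer", "full diff and notes..."),
    Observation(3, 8, "Started migration to plugin install", "full log..."),
]

# Layer 1: compact index — summaries only, smallest token cost.
def index() -> list[tuple[int, str]]:
    return [(o.id, o.summary) for o in STORE]

# Layer 2: timeline around an observation that looked relevant — its session's neighbors.
def timeline(obs_id: int) -> list[tuple[int, str]]:
    session = next(o.session for o in STORE if o.id == obs_id)
    return [(o.id, o.summary) for o in STORE if o.session == session]

# Layer 3: full detail, fetched only for the record you actually need.
def detail(obs_id: int) -> str:
    return next(o.detail for o in STORE if o.id == obs_id)

print(index())       # cheap overview of everything
print(timeline(1))   # context around observation 1
print(detail(1))     # full payload, only now
```

Each layer narrows the candidate set before paying for the next one, which is where the token savings come from.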

ClaudeMem also auto-generates folder-level claude.md files and updates them as you work. Your project documentation writes itself. Install it through the Claude Code plugin marketplace; the repo includes a specific warning not to run the npm install command directly, because that installs the SDK library without registering the hooks — nothing actually works. Use the two plugin commands instead.

The combination of Levels 1, 2, and 3 is what most builders actually need. You get static identity (Level 1), guaranteed context injection (Level 2), and session-aware memory that retrieves by meaning rather than recency (Level 3). For anyone running a real project or managing client work, this stack is the baseline.

This is also the layer that makes tools like MindStudio relevant in a different context — when you’re building agents that need to maintain state across sessions and across models, the orchestration question becomes: where does memory live, and how does it get routed? MindStudio’s approach to chaining 200+ models and 1,000+ integrations means the memory layer has to be explicit, not assumed.


Level 4: Verbatim Recall (When Exact Phrasing Matters)

Most memory systems work by compression. They summarize, extract key points, store semantic meaning. That’s usually what you want — you need the gist of a decision, not a word-for-word transcript.

But sometimes you need the word-for-word transcript.

Level 4 is verbatim recall, implemented through tools like Mem Palace. The use case is specific: client work where exact phrasing matters, legal or compliance contexts where you need to reproduce what was said rather than what it meant, or any situation where summarization introduces unacceptable risk.

This is an optional bolt-on, not a default. The overhead of storing and retrieving verbatim records is higher than semantic search, and for most projects it’s unnecessary. Add it when the cost of paraphrase is higher than the cost of storage.
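A minimal sketch of a verbatim store — illustrative only, not Mem Palace’s API — with a hash check to prove records come back exactly as stored:

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE transcript (
    id INTEGER PRIMARY KEY,
    client TEXT,
    text TEXT,
    sha256 TEXT  -- integrity check: the record was never paraphrased
)""")

def record(client: str, text: str) -> None:
    digest = hashlib.sha256(text.encode()).hexdigest()
    db.execute("INSERT INTO transcript (client, text, sha256) VALUES (?, ?, ?)",
               (client, text, digest))

def recall_exact(client: str) -> list[str]:
    rows = db.execute(
        "SELECT text, sha256 FROM transcript WHERE client = ? ORDER BY id",
        (client,)).fetchall()
    # Verify nothing was altered in storage before returning it verbatim.
    for text, digest in rows:
        assert hashlib.sha256(text.encode()).hexdigest() == digest
    return [text for text, _ in rows]

record("acme", "We agreed the deliverable ships no later than March 3.")
print(recall_exact("acme")[0])
```

No embeddings, no summaries — the cost is that you must query by key (client, date, session) rather than by meaning.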

Mark Kashef’s hive mind system takes a related approach — his entire architecture runs on a local SQLite database that stores agent conversations, tasks, memories, and scheduled jobs with zero cloud cost. The 3D and 2D graph views (the latter inspired by Obsidian’s graph view) are layered on top of a plain list view of the same data. His insight is worth keeping: if the underlying table is populated and accurate, the fancy visualization is just additive. Get the boring data layer right first.


Level 5: Knowledge Bases (Structured External Knowledge)

Level 5 moves beyond session memory into structured knowledge that exists independently of any conversation.

A knowledge base is a curated, queryable store of information — documentation, research, SOPs, past project outputs — that Claude can retrieve from rather than having to regenerate. The distinction from Level 3 is intentionality: semantic search captures what happened during sessions; a knowledge base captures what you’ve deliberately decided Claude should know.

Andrej Karpathy’s approach to building a personal knowledge base with Claude Code is a good reference point here. The idea is to turn raw documents into a structured markdown knowledge base that Claude can query — not a dump of files, but an organized retrieval system with clear structure.

For builders working across multiple clients or domains, a knowledge base at Level 5 is how you encode institutional knowledge that shouldn’t have to be re-explained every time. Your research on a client’s industry, your accumulated understanding of their competitive landscape, your library of past successful outputs — these live here, queryable on demand.
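A stripped-down sketch of the retrieval side, assuming a folder of structured markdown (the files and query function here are invented for illustration):

```python
import pathlib
import tempfile

# Build a tiny markdown knowledge base on disk (stand-in for your curated docs).
root = pathlib.Path(tempfile.mkdtemp())
(root / "clients").mkdir()
(root / "clients" / "acme.md").write_text(
    "# Acme\n\n## Industry\nIndustrial robotics.\n\n## Competitors\nFanuc, KUKA.\n")
(root / "sops").mkdir()
(root / "sops" / "launch.md").write_text(
    "# Launch SOP\n\n1. Draft brief\n2. Review with client\n3. Ship\n")

def query(root: pathlib.Path, term: str) -> list[str]:
    """Return 'file > heading' locations for sections mentioning the term."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        heading = ""
        for line in path.read_text().splitlines():
            if line.startswith("#"):
                heading = line.lstrip("# ").strip()
            elif term.lower() in line.lower():
                hits.append(f"{path.relative_to(root)} > {heading}")
    return hits

print(query(root, "robotics"))
```

The structure is the point: because the files have clear headings, a hit resolves to a specific section Claude can load, not a whole document dump.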

The GSD (Get Shit Done) skill addresses a related problem at the workflow level. Its plan → execute → verify framework spawns fresh sub-agents per task, each with a clean context window focused only on what they need. This isn’t about memory storage — it’s about memory hygiene. By preventing context rot during long sessions, GSD ensures that whatever you’ve stored at Levels 1–5 actually gets used correctly rather than getting crowded out by accumulated session noise.


Level 6: Cross-Tool Shared Memory (The Architecture Question)

Level 6 is the hardest to implement and the most powerful when it works: memory that persists across different AI tools, different devices, and different sessions in a unified, queryable store.

The practical version of this is what Kashef has built — a SQLite database that all his agents write to and read from, accessible through a Telegram interface via an Anthropic SDK bridge. Every agent in his system inherits the same memory infrastructure. When he runs /standup, each agent queries its own entries in the shared database and reports back. The /discuss command opens a multi-agent council where each participant has context on what the others have said.

The key architectural decision is where the database lives. Kashef runs everything locally — no cloud cost, no latency, no vendor dependency. For teams or distributed setups, a hosted option like Supabase or Neon works the same way, just with different tradeoffs on cost and access.
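The shared-store pattern can be sketched in a few lines (table shape and function names are illustrative, not Kashef’s actual schema):

```python
import sqlite3
import time

# One local database every agent reads and writes. A hosted Postgres works the
# same way, with different tradeoffs on cost and access.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (agent TEXT, kind TEXT, body TEXT, ts REAL)")

def write(agent: str, kind: str, body: str) -> None:
    db.execute("INSERT INTO memory VALUES (?, ?, ?, ?)",
               (agent, kind, body, time.time()))

def standup() -> dict[str, list[str]]:
    """Each agent reports its own latest entries, like a /standup command."""
    report = {}
    for (agent,) in db.execute("SELECT DISTINCT agent FROM memory ORDER BY agent"):
        rows = db.execute(
            "SELECT body FROM memory WHERE agent = ? ORDER BY ts DESC LIMIT 3",
            (agent,)).fetchall()
        report[agent] = [body for (body,) in rows]
    return report

write("research", "task", "Summarized competitor pricing pages")
write("engineering", "task", "Shipped retry logic for the scraper")
write("engineering", "memory", "Rate limit on target site is 60 req/min")

for agent, items in standup().items():
    print(agent, "->", items)
```

Because every agent writes to the same table, cross-agent commands like /standup or /discuss are just queries, not integrations.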

Cross-tool memory also means thinking about what gets shared versus what stays siloed. In a multi-client architecture, you want shared methodology (skills, planning frameworks, process knowledge) but isolated client memory (conversations, decisions, brand context). The master claude.md handles the shared layer; client-level folders handle the isolation.

The three-layer memory architecture revealed in the Claude Code source leak — a self-healing memory system that uses memory.md as a pointer index — shows that Anthropic is thinking about this problem at the infrastructure level. The direction of travel is toward more sophisticated cross-session memory, not less.


Choosing Your Stack

The taxonomy isn’t a ladder you have to climb rung by rung. It’s a menu.

For a solo developer working on a single project: Levels 1 and 3 are probably sufficient. Set up your claude.md properly, install ClaudeMem, and you’ve eliminated most of the context-tax that makes long projects painful.

For client work or agency use: Add Level 2 (session start hooks) to guarantee brand context loads every time, and consider Level 4 if exact phrasing matters in deliverables. The multi-client architecture — master claude.md at root, client folders with override files, per-client brand context and memory — handles the separation cleanly.

For teams building autonomous agent systems: Level 6 is where the real leverage is. The self-evolving Claude Code memory system using Obsidian and hooks is one implementation path. Kashef’s SQLite hive mind is another. The common thread is that memory has to be a first-class architectural concern, not an afterthought.

One thing worth saying plainly: the Superpowers skill (150,000+ GitHub stars) and the Context Mode skill address memory-adjacent problems that the taxonomy doesn’t fully capture. Superpowers forces plan-first, test-before-code workflows that prevent the kind of rushed decisions that create bad memories in the first place. Context Mode compresses raw session output — a 56KB Playwright snapshot becomes 299 bytes, 315KB of session data becomes 5KB total — so your context window doesn’t fill up with garbage before your memory system even gets a chance to work.

The memory levels tell you where to store knowledge. Those skills tell you how to keep the context window clean enough that the storage actually matters.

For builders thinking about how this connects to production app development, Remy takes a complementary approach to the spec-as-source-of-truth problem: you write your application as annotated markdown, and it compiles into a complete TypeScript stack — backend, database with auto-migrations, auth, deployment. The spec is the persistent source of truth; the generated code is derived output. It’s a different layer of the same underlying question: what’s the canonical representation of what you’re building?

The deeper you get into Claude Code memory, the more you realize the taxonomy is really a question about data architecture. Where does knowledge live? How does it get retrieved? What’s the cost of retrieval versus the cost of forgetting? The six levels are just a framework for answering those questions systematically rather than by accident.

Most builders are on Level 1. The ones shipping consistently are on Level 3. The ones running autonomous agent systems are thinking about Level 6. The gap between them isn’t model quality or prompt skill — it’s memory architecture.

Presented by MindStudio
