Claude Code Memory Systems Compared: Memarch vs Hermes vs Built-In

Why Memory Is the Hardest Part of Building Claude Agents

Every Claude Code agent hits the same wall eventually: the context window resets, and everything learned in the previous session is gone. Project context, user preferences, past decisions — wiped. You either re-explain everything or watch your agent make the same mistakes twice.

This is the core challenge of persistent memory for Claude agents. Several approaches now compete to solve it: Claude Code’s built-in memory system, the vector-database approach (best represented by Memarch), and the curated-facts approach (as seen in Hermes). Each makes different trade-offs, and which one fits depends on what you’re building.

This article breaks down how each system works, where each one excels, and how to decide which is right for your use case.

The Problem With Context Windows

Before comparing solutions, it helps to understand exactly what’s being solved.

Claude’s context window is large (200,000 tokens is the standard size for recent Claude models), but it is finite and, more importantly, ephemeral. When a session ends, everything in that context is gone unless something explicitly saves it.

For simple, one-shot tasks, that’s fine. But agents doing ongoing work — maintaining codebases, managing projects, learning user preferences — need memory that survives session boundaries.

The three approaches covered here each solve this in different ways:

Built-in memory uses the filesystem (specifically CLAUDE.md files and structured documents) as a persistent store that gets reloaded into context.
Memarch uses vector embeddings to create a semantic memory index, retrieving relevant memories based on similarity to the current task.
Hermes extracts discrete, curated facts from interactions and stores them in a structured format for precise retrieval.

None of these is universally better. They’re optimized for different workloads.

Claude Code’s Built-In Memory: Simple and Surprisingly Powerful

How It Works

Claude Code’s native memory system is straightforward: it reads specific files into the context window at the start of each session. The primary mechanism is the CLAUDE.md file.

There are two scopes:

Project-level (CLAUDE.md in your project root): Contains project-specific context — architecture decisions, coding conventions, file structure, known issues.
User-level (~/.claude/CLAUDE.md): Contains user preferences, common workflows, personal instructions that apply across all projects.

Claude Code reads these automatically, so the agent always starts a session with that baseline context loaded. Beyond CLAUDE.md, agents can also read and write arbitrary files, effectively creating their own memory structures using JSON, Markdown, or any other format.
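
To make the two scopes concrete, here is a minimal sketch of what "load both files into a baseline context" could look like. It is not Claude Code's actual loader: the function name, the section labels, and the user-then-project ordering are assumptions made for illustration, and the demo uses throwaway directories in place of a real home directory and project.

```python
import tempfile
from pathlib import Path


def load_memory_files(project_root: Path, home: Path) -> str:
    """Concatenate user-level and project-level CLAUDE.md files, if present.

    Hypothetical helper: the ordering and labels are illustrative choices.
    """
    candidates = [
        ("user", home / ".claude" / "CLAUDE.md"),
        ("project", project_root / "CLAUDE.md"),
    ]
    sections = []
    for scope, path in candidates:
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"[{scope} memory]\n{text}")
    return "\n\n".join(sections)


# Demo with temporary directories standing in for a home dir and a project.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    project = Path(tmp) / "project"
    (home / ".claude").mkdir(parents=True)
    project.mkdir()
    (home / ".claude" / "CLAUDE.md").write_text(
        "Prefer small, focused commits.\n", encoding="utf-8"
    )
    (project / "CLAUDE.md").write_text(
        "Use pytest for all tests.\n", encoding="utf-8"
    )
    baseline = load_memory_files(project, home)

print(baseline)
```

The point of the sketch is that both files arrive whole: nothing is filtered, so every line written to either file is paid for in every session.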

What It’s Good At

Built-in memory is remarkably effective for structured, predictable information. If you have a codebase with specific conventions, a set of decisions that should never be revisited, or a workflow the agent should always follow, CLAUDE.md handles this well.

Zero setup — it works out of the box
Human-readable — you can inspect and edit the memory directly
Deterministic — the agent always sees the same information at session start
No additional infrastructure required

Where It Falls Short

The problems emerge at scale. CLAUDE.md files are loaded entirely into context, which means large memory files consume tokens before the agent has done any work. If you’re managing a large project or tracking many things over time, this becomes a real constraint.

There’s also no retrieval intelligence. The agent gets everything in the file whether it’s relevant to the current task or not. And managing the file manually — deciding what to keep, what to remove, when to reorganize — creates ongoing maintenance work.
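
A rough way to see this cost is to estimate how much of the window a memory file occupies before any work starts. The sketch below assumes about four characters per token, which is only a rule of thumb for English text (a real tokenizer is needed for an exact count), and uses the 200,000-token window mentioned earlier. Both function names are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed heuristic."""
    return int(len(text) / chars_per_token)


def memory_budget_share(memory_text: str, context_window: int = 200_000) -> float:
    """Fraction of the context window consumed by an always-loaded memory file."""
    return estimate_tokens(memory_text) / context_window


# An 80,000-character memory file, standing in for a large CLAUDE.md.
memory_text = "x" * 80_000
share = memory_budget_share(memory_text)
print(f"{estimate_tokens(memory_text)} tokens, {share:.0%} of the window")
```

Because the file is loaded in full every session, this share is a fixed overhead rather than a one-time cost.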

Best for: Single-project agents, teams with well-defined conventions, situations where memory is mostly static and small.

Memarch: Vector Database Memory for Agents

The Core Idea

Memarch takes a fundamentally different approach. Instead of loading a flat file into context, it stores memories as vector embeddings in a database and retrieves only the memories that are semantically relevant to the current task.

When the agent encounters something worth remembering, Memarch embeds it and stores it. When the agent starts a new session or faces a new problem, Memarch queries the vector store to surface the most relevant past memories — typically the top-k results by cosine similarity — and injects only those into context.

This is essentially RAG (Retrieval-Augmented Generation) applied to agent memory.
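
The retrieval step can be sketched in a few lines. This is not Memarch's API: the `embed` function below is a toy bag-of-words stand-in for a real embedding model, and `top_k` is a hypothetical helper. Only the ranking logic (cosine similarity, keep the top k) reflects the technique described above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


memories = [
    "The orders table migration must run before the billing migration",
    "The UI component library uses Tailwind utility classes",
    "Database migration scripts live in the db/migrations folder",
]
relevant = top_k("plan the database migration for orders", memories, k=2)
print(relevant)
```

In this toy run the two migration memories are kept and the UI memory is dropped. A production setup would swap `embed` for a real embedding model and the sorted list for a vector database query.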

How Memory Gets Created

Memarch typically follows a two-phase pattern:

Storage phase: As the agent works, it identifies memories worth saving — user preferences, project decisions, solutions to recurring problems, context about specific files or systems. These get embedded and written to the vector store.
Retrieval phase: At the start of each task or session, a query (based on the current task description or recent conversation) pulls the most relevant memories from the store.

The agent’s context window therefore only receives relevant memories, not the entire history. A Memarch-powered agent working on a database migration won’t have its context cluttered with memories about the UI component library.

Memory Categories in Memarch

Memarch implementations typically organize memories into categories to improve retrieval precision:

Working memory: Recent, highly relevant context (often kept in context continuously)
Episodic memory: Records of specific past events or sessions
Semantic memory: General knowledge about the project, system, or user
Procedural memory: How-to knowledge — steps for specific tasks

This categorization helps with filtering during retrieval, so you can ask for “only semantic memories about the authentication system” rather than pulling from everything.
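
As an illustration of that kind of filter, the sketch below narrows a store by category and topic before any similarity ranking would run. The `Memory` record, its field names, and `filter_memories` are hypothetical; real vector databases expose metadata filtering through their own query syntax.

```python
from dataclasses import dataclass

CATEGORIES = {"working", "episodic", "semantic", "procedural"}


@dataclass(frozen=True)
class Memory:
    text: str
    category: str      # one of CATEGORIES
    topics: frozenset  # hypothetical free-form topic tags


def filter_memories(store, category, topic):
    """Keep only memories in the given category that carry the given topic."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return [m for m in store if m.category == category and topic in m.topics]


store = [
    Memory("Auth tokens are JWTs signed by the gateway", "semantic",
           frozenset({"authentication"})),
    Memory("Last Tuesday the login test flaked twice", "episodic",
           frozenset({"authentication"})),
    Memory("Buttons come from the shared component library", "semantic",
           frozenset({"ui"})),
]
candidates = filter_memories(store, "semantic", "authentication")
print([m.text for m in candidates])
```

Filtering first shrinks the candidate set, so the similarity search that follows has fewer chances to surface a look-alike memory from the wrong category.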

Strengths of the Vector Approach

The main advantage is scale. A Memarch store can hold thousands of memories without any single session paying the full context cost. The agent surfaces what’s relevant, when it’s relevant.

This makes Memarch well-suited for:

Long-running agents that accumulate substantial history
Agents working across many different subsystems or domains
Multi-user deployments where each user’s preferences need to be tracked independently
Scenarios where the agent should “remember” interactions from weeks or months ago

The Trade-offs

Vector retrieval isn’t perfect. Semantic similarity doesn’t always equal task relevance. An agent working on a bug in the payment system might retrieve memories about a completely different bug that happened to use similar vocabulary. This kind of “false positive” retrieval can mislead the agent or just clutter context with noise.

There’s also real infrastructure overhead. You need a vector database (Pinecone, Qdrant, Weaviate, Chroma, or similar), an embedding model, and logic to manage the embedding and retrieval pipeline. This is meaningfully more complex than CLAUDE.md.

Retrieval latency is another factor — vector search adds time to each operation, which matters for interactive agents.

Best for: Agents with large, growing memory needs; long-term projects; multi-user or multi-domain deployments where selective retrieval is essential.

Hermes: Curated Facts as Memory

A Different Philosophy

Where Memarch stores and retrieves raw experiences, Hermes takes a more opinionated stance: the agent should actively curate what it remembers, extracting structured facts rather than storing episodes verbatim.

The core idea is that most of what happens in a session isn’t worth remembering. What matters is a distilled set of facts — things that will genuinely affect how the agent should behave in future sessions.

When a session ends (or at key points during a session), Hermes runs an extraction pass. The agent reviews what happened and explicitly identifies facts worth preserving:

“User prefers tabs over spaces in all Python files”
“The /api/auth endpoint requires a JWT with 24-hour expiry”
“Never run database migrations on production without a backup confirmation”
“The team has decided not to use Redux in this project”

These facts are stored in a structured, human-readable format (typically JSON or Markdown with a clear schema) and loaded into context at session start.
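
A fact store along these lines might look like the sketch below. The schema (`id`, `scope`, `statement`) and the `render_facts` helper are assumptions for illustration, not a documented Hermes format. The statements themselves are the example facts listed above.

```python
import json

# Hypothetical fact file contents; the field names are illustrative.
FACTS_JSON = """
[
  {"id": "fact-001", "scope": "user",
   "statement": "User prefers tabs over spaces in all Python files"},
  {"id": "fact-002", "scope": "project",
   "statement": "The /api/auth endpoint requires a JWT with 24-hour expiry"},
  {"id": "fact-003", "scope": "project",
   "statement": "Never run database migrations on production without a backup confirmation"},
  {"id": "fact-004", "scope": "project",
   "statement": "The team has decided not to use Redux in this project"}
]
"""


def render_facts(raw: str) -> str:
    """Turn the stored facts into a text block to prepend at session start."""
    facts = json.loads(raw)
    lines = ["Known facts:"]
    for fact in facts:
        lines.append(f"- [{fact['scope']}] {fact['statement']} ({fact['id']})")
    return "\n".join(lines)


context_preamble = render_facts(FACTS_JSON)
print(context_preamble)
```

Keeping an `id` on every fact is what makes the store traceable: a behavior can be tied back to one entry, and that entry can be edited or removed by hand.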

Why Curation Matters

The argument for Hermes is that quality beats quantity in memory. An agent with 50 precise, relevant facts is more useful than one with 5,000 fuzzy semantic memories. The extraction step forces a kind of distillation that weeds out irrelevant detail.

Curated facts are also:

Auditable: You can read every fact the agent “knows” and edit them
Reliable: No retrieval failures — what’s in the fact store is always loaded
Explainable: When an agent behaves a certain way, you can trace it to a specific fact
Compact: Well-curated facts are small, leaving more context budget for actual work

The Extraction Problem

The biggest challenge with Hermes is the extraction step itself. Deciding what’s worth remembering requires judgment. If this is done automatically (the agent decides what to save), the agent might miss important things or save too much noise. If it’s done manually, it creates maintenance overhead similar to CLAUDE.md.

Most Hermes implementations use a semi-automatic approach: the agent proposes facts to save, and a human reviews or approves them. This works well but adds friction to the workflow.
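
The semi-automatic flow can be sketched as a two-stage store: the agent proposes, a human reviews, and only approved facts are committed. `FactStore` and its methods are hypothetical names, not part of any Hermes implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    committed: list = field(default_factory=list)  # facts loaded at session start
    pending: list = field(default_factory=list)    # proposals awaiting review

    def propose(self, statement: str) -> None:
        """Agent side: queue a fact unless it is already known or queued."""
        if statement not in self.committed and statement not in self.pending:
            self.pending.append(statement)

    def review(self, statement: str, approve: bool) -> None:
        """Human side: approve or reject one pending proposal."""
        self.pending.remove(statement)  # raises ValueError if not pending
        if approve:
            self.committed.append(statement)


store = FactStore()
store.propose("The team has decided not to use Redux in this project")
store.propose("The user said good morning")
store.review("The team has decided not to use Redux in this project", approve=True)
store.review("The user said good morning", approve=False)
print(store.committed)
```

The friction mentioned above lives in the `review` call: nothing reaches the committed list until someone has looked at it.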

There’s also a staleness problem. Facts can become outdated. “The API uses OAuth 1.0” might have been true six months ago. Without mechanisms to review and retire stale facts, the fact store can accumulate incorrect beliefs.
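
One possible mechanism is to record when each fact was last confirmed and flag anything older than a threshold for review. In the sketch below, the `last_verified` field, the 180-day default (roughly the six months in the example), and the dates are all illustrative assumptions.

```python
from datetime import date, timedelta


def stale_facts(facts, today, max_age_days=180):
    """Return facts whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [f for f in facts if f["last_verified"] < cutoff]


facts = [
    {"statement": "The API uses OAuth 1.0",
     "last_verified": date(2024, 1, 10)},
    {"statement": "The team has decided not to use Redux in this project",
     "last_verified": date(2024, 6, 20)},
]
to_review = stale_facts(facts, today=date(2024, 8, 1))
print([f["statement"] for f in to_review])
```

Flagging is deliberately all this does. Whether an old fact is wrong is still a human judgment, so the sketch surfaces candidates rather than deleting them.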

Best for: Agents where precision and auditability matter more than scale; regulated or high-stakes environments; teams that want to understand and control exactly what their agent knows.

Head-to-Head Comparison

Dimension	Built-In (CLAUDE.md)	Memarch (Vector)	Hermes (Curated Facts)
Setup complexity	Minimal	High	Medium
Infrastructure required	None	Vector DB + embeddings	Structured store (can be flat files)
Memory capacity	Low (context-bound)	High (scales independently)	Medium (scales, but curation limits growth)
Retrieval method	Always-on (full load)	Semantic similarity	Always-on (full load)
Retrieval accuracy	Deterministic	Probabilistic	Deterministic
Human-readable	Yes	Partially (requires tooling)	Yes
Auditability	Full	Limited without tooling	Full
Staleness handling	Manual	Manual, or automated via metadata (timestamps, expiry)	Manual (more visible)
Latency	None (pre-loaded)	Adds retrieval latency	None (pre-loaded)
Best memory type	Static conventions	Dynamic, experiential	Distilled, high-value facts

Choosing the Right Memory System

When to Use Built-In Memory

If your agent works on a single, well-scoped project with stable conventions, built-in memory is often the right answer. It requires no additional tooling, it’s fast, and it’s easy to maintain.

Start here. Add complexity only when you hit real limits — specifically, when your CLAUDE.md is getting so long that it’s consuming a significant portion of your context budget, or when your agent needs to recall information that changes frequently across sessions.

When to Use Memarch

Choose Memarch when:

Your agent accumulates substantial history over time and needs to recall things from months ago
You’re building for multiple users who each have separate memory requirements
The agent operates across many different domains or subsystems
You can tolerate some retrieval imprecision in exchange for scale

The vector approach shines in long-lived, high-volume scenarios. It loosely resembles human recall: imperfect and associative, but able to handle vast amounts of information.

When to Use Hermes

Choose Hermes when:

Accuracy and auditability are non-negotiable
You’re building in a regulated industry or high-stakes environment where the agent’s beliefs need to be inspectable
Your team wants control over what the agent “knows”
The volume of memory is manageable — you’re storing decisions and facts, not raw experience

Hermes works particularly well for agents that maintain institutional knowledge — things like “this is how we handle compliance requests” or “these are the non-negotiable security constraints on this system.”

Hybrid Approaches

In practice, many production deployments combine approaches. A common pattern:

Use CLAUDE.md for truly static information (project architecture, always-on conventions)
Use Hermes for curated facts that have been deliberately chosen for high importance
Use Memarch for episodic memory — “what happened last week” — where scale matters more than precision

The stack adds complexity, but for sufficiently large agents, the combination avoids the weaknesses of any single approach.
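
A minimal sketch of how such a hybrid might assemble context: static conventions and curated facts are always included, and retrieved episodic memories fill whatever budget remains. The function name and the character-based budget are assumptions (a real system would count tokens), and the episodes are assumed to arrive already ranked best-first.

```python
def assemble_context(conventions: str, facts: list[str],
                     episodes: list[str], max_chars: int) -> str:
    """Always load conventions and facts, then add episodes while they fit."""
    sections = [conventions, *facts]
    used = sum(len(s) for s in sections)
    for episode in episodes:  # assumed ranked best-first by the retriever
        if used + len(episode) > max_chars:
            break
        sections.append(episode)
        used += len(episode)
    return "\n".join(sections)


conventions = "Convention: all services log in JSON."
facts = ["Fact: never migrate production without a backup confirmation."]
episodes = [
    "Last week: the billing migration failed on a missing index.",
    "Last month: the team renamed the staging cluster.",
]
context = assemble_context(conventions, facts, episodes, max_chars=180)
print(context)
```

With this budget the first episode fits and the second is dropped, which mirrors the division of labor above: precision for the always-on layers, best-effort recall for the episodic one.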

Where MindStudio Fits Into This Picture

If you’re spending significant time on memory plumbing — setting up vector databases, writing embedding pipelines, managing CLAUDE.md files across multiple agents — you’re spending time on infrastructure rather than the actual work your agent should be doing.

This is where MindStudio’s Agent Skills Plugin becomes relevant. It’s an npm SDK (@mindstudio-ai/agent) that exposes 120+ typed capabilities as simple method calls, so agents like Claude Code can focus on reasoning rather than managing the underlying infrastructure.

For memory-related workflows, this matters practically. Instead of building custom pipelines for storing user preferences or project context across sessions, you can wire Claude Code into MindStudio’s workflow layer, which handles state persistence, retries, and integration with external tools. The agent calls agent.runWorkflow() and the infrastructure layer handles the rest.

More broadly, if you want to move beyond a single Claude Code agent and build multi-agent systems where memory needs to be shared or handed off between agents, MindStudio’s visual builder makes that orchestration tractable without writing all the coordination logic from scratch. Teams at companies like Microsoft and Adobe use it for exactly this kind of multi-step agentic work.

You can try MindStudio free at mindstudio.ai — setup takes a few minutes and you don’t need to configure API keys separately.

Frequently Asked Questions

What is the CLAUDE.md file and how does it work?

CLAUDE.md is a Markdown file that Claude Code automatically reads into its context at the start of each session. It functions as a persistent memory layer for project conventions, architecture notes, and instructions. You can place one at the project root (for project-specific context) or in your home directory (~/.claude/CLAUDE.md) for user-level preferences that apply across all projects. The file is loaded in full, so everything in it consumes context tokens.

Can Claude Code agents use vector databases for memory?

Yes, though it requires custom implementation or a framework like Memarch. Claude Code itself doesn’t natively connect to a vector database — you’d set up the embedding and retrieval pipeline externally and inject the retrieved memories into the agent’s context before each task. Libraries like LangChain’s memory modules or purpose-built tools like Memarch handle most of this scaffolding.

How does Hermes decide what facts to save?

Most Hermes implementations use the agent itself to propose facts at the end of a session or after significant interactions. The agent reviews the conversation and identifies statements that represent durable, high-value knowledge worth preserving. In some setups, a human approves or edits these proposals before they’re committed to the fact store. This semi-automatic curation is the key differentiator from raw vector storage.

What’s the biggest risk with vector-based memory for agents?

The main risk is retrieval noise — surfacing memories that are semantically similar to the current task but not actually relevant. This can cause an agent to act on outdated information or get confused by superficially similar past events. The fix is usually a combination of better memory categorization, metadata filtering during retrieval, and periodic memory hygiene (reviewing and deleting outdated entries).

Is there a memory system built specifically for multi-agent Claude setups?

Multi-agent memory is an active area of development. The Anthropic documentation on multi-agent architectures outlines several patterns, but there’s no single official memory system for multi-agent Claude deployments. In practice, teams combine shared vector stores (where all agents read and write to a common memory), orchestrator-managed state, and tool-use patterns where one agent explicitly hands context to another.

How do I handle stale memories in any of these systems?

Staleness is a universal problem across all three approaches. For CLAUDE.md and Hermes, the answer is periodic review: scheduling time to read through what the agent “knows” and removing or updating outdated entries. For Memarch, most vector databases support metadata tagging, so you can timestamp memories and expire old ones automatically, or use recency weighting during retrieval to favor newer information. None of these approaches is fully automatic; all of them require some human oversight.
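
Recency weighting and expiry can be sketched as follows. The exponential decay with a 30-day half-life, the 365-day expiry, and both function names are illustrative assumptions rather than settings from Memarch or any particular vector database.

```python
def recency_weighted(similarity: float, age_days: float,
                     half_life_days: float = 30.0) -> float:
    """Scale a similarity score so it halves every half_life_days."""
    return similarity * 0.5 ** (age_days / half_life_days)


def expire(memories, max_age_days=365):
    """Drop memories older than max_age_days."""
    return [m for m in memories if m["age_days"] <= max_age_days]


candidates = [
    {"text": "Old note about the payment retry bug",
     "similarity": 0.9, "age_days": 90},
    {"text": "Yesterday's decision on payment retries",
     "similarity": 0.6, "age_days": 0},
    {"text": "Notes from a long-retired billing system",
     "similarity": 0.8, "age_days": 800},
]
ranked = sorted(
    expire(candidates),
    key=lambda m: recency_weighted(m["similarity"], m["age_days"]),
    reverse=True,
)
print([m["text"] for m in ranked])
```

Here the newer memory outranks the older one despite a lower raw similarity, and the 800-day-old entry never reaches ranking. The half-life is the knob to tune: too short and the agent forgets durable decisions, too long and stale entries keep winning.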

Key Takeaways

Built-in memory (CLAUDE.md) is the right starting point — zero infrastructure, human-readable, deterministic. Its limits only matter at scale.
Memarch’s vector approach solves the scale problem but introduces retrieval uncertainty and significant infrastructure overhead. Use it when you need to recall from a large, growing history.
Hermes’s curated facts prioritize accuracy and auditability over volume. It’s the right choice when you need to inspect and control exactly what your agent believes.
Hybrid approaches — combining static conventions, curated facts, and semantic search — are common in production systems that have outgrown any single approach.
The right choice depends on your agent’s workload: static projects favor built-in memory; long-running, high-volume agents favor vector storage; high-stakes or regulated environments favor curated facts.

If you want to build agents that handle memory and multi-step reasoning without managing all the infrastructure yourself, MindStudio is worth exploring — it handles the plumbing so you can focus on what your agent actually does.