How to Build a Hybrid AI Memory System: Combining Memarch and Hermes for Claude Code

The Memory Problem Every Claude Code User Hits Eventually

If you’ve used Claude Code for more than a few sessions, you’ve probably run into the same wall: start a new conversation, and Claude has no idea what you built last week. You re-explain your architecture, re-clarify your preferences, re-describe which files do what. It’s functional but tedious, and it compounds as your project grows.

The solution most people reach for is a CLAUDE.md file. Dump your context there, point Claude at it, done. But this approach has limits. CLAUDE.md becomes a dumping ground. You either over-stuff it (slowing down every session) or under-stuff it (losing context that matters). Neither is a real memory system.

That’s what a hybrid AI memory system built on Memarch and Hermes solves. Memarch captures everything automatically. Hermes curates what actually matters. Together, they form a three-tier memory architecture that lets Claude Code carry meaningful context across sessions without drowning in noise.

This guide explains how each component works, how they complement each other, and how to combine them into a system you can actually maintain.

What Memarch and Hermes Actually Do

Before getting into setup, it’s worth being precise about what each tool handles — because they solve different parts of the same problem.

Memarch: Automatic Capture at Scale

Memarch is a memory archiving system. Its job is to record everything: every decision Claude Code makes in a session, every file it touches, every error it encounters, every approach it reasons through. It operates like a flight recorder: comprehensive, automatic, and indiscriminate.

The key design principle behind Memarch is that you don’t know what you’ll need to remember. When you’re deep in a debugging session at midnight, you’re not thinking about which observations might matter three weeks from now. Memarch solves this by capturing all of it, without requiring you to tag or classify anything in the moment.

What Memarch captures typically includes:

Session transcripts — full reasoning chains from Claude Code, not just outputs
File change logs — what was modified, when, and what the stated reason was
Decision points — moments where Claude explicitly chose one approach over another
Error encounters — what broke, what was tried, and what resolved it
Project state snapshots — periodic records of directory structure and key configuration

This raw archive isn’t meant to be read by humans in bulk. It’s a reservoir that feeds the second component.
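
To make the capture list above concrete, here is a minimal sketch of what a single archived event might look like as one JSONL line. The field names (`ts`, `session_id`, `kind`, `payload`) are assumptions for illustration, not Memarch's documented schema.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ArchiveEvent:
    """One hypothetical archive event; field names are illustrative only."""
    ts: str                      # ISO 8601 timestamp
    session_id: str
    kind: str                    # e.g. "decision", "file_change", "error", "snapshot"
    payload: dict = field(default_factory=dict)

    def to_jsonl(self) -> str:
        # One compact JSON object per line, following the JSONL convention
        return json.dumps(asdict(self), sort_keys=True)

event = ArchiveEvent(
    ts="2024-11-14T23:58:02Z",
    session_id="2024-11-14_session_001",
    kind="decision",
    payload={"chose": "Postgres", "over": "SQLite", "reason": "concurrent write volume"},
)
line = event.to_jsonl()
print(line)
```

Keeping each event on its own line means a session file can be appended to cheaply and read back one record at a time.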

Hermes: Curation and Delivery

Hermes takes what Memarch captures and decides what’s worth promoting to active memory. Named for the messenger role it plays, Hermes runs periodic passes over the Memarch archive and extracts durable facts — the kind of knowledge that should survive session boundaries.

Where Memarch asks “what happened?”, Hermes asks “what should Claude remember?”

The curation process works through a set of configurable rules and LLM-assisted summaries. Hermes looks for patterns like:

Repeated references — if Claude mentions a specific constraint five times across sessions, that constraint probably belongs in active memory
Decision justifications — “we chose Postgres over SQLite because of concurrent write volume” is worth keeping; “I created a variable called idx” is not
Error resolutions — solutions to non-obvious bugs are high-value memories
Architectural facts — data flow, module boundaries, integration points

The output of a Hermes pass is a structured memory file — clean, ranked by relevance, ready to be loaded into Claude’s context window.

The Three-Tier Architecture

A hybrid memory system built on Memarch and Hermes operates across three distinct tiers. Understanding the tiers helps you configure each one correctly.

Tier 1 — Raw Archive (Memarch)

This is your append-only log. Every session writes to it. Nothing gets deleted. It’s stored as structured JSON or JSONL files, organized by date and project. The archive grows indefinitely, but it’s cheap to store and never accessed directly during active Claude Code sessions.

Think of Tier 1 as the source of truth. It answers forensic questions: “What exactly did Claude do on the 14th? What were its exact words when it proposed that refactor?”

Tier 2 — Curated Memory (Hermes)

This is where meaning lives. Hermes processes Tier 1 on a schedule (or on demand) and produces a set of memory files that are human-readable, concise, and structured around what Claude actually needs to know.

Tier 2 files might look like:

## Architecture Facts
- This project uses a monorepo with Turborepo
- Auth is handled by the `packages/auth` module — do not reimplement
- All API routes are versioned under `/api/v2/`

## Known Constraints
- The staging database has a 50-connection pool limit — don't open long-lived connections in scripts
- CI runs on Node 20.x only; avoid Node 22 APIs

## Recent Decisions
- Switched from REST to tRPC for internal service calls (2024-11-08)
- Dropped Redis caching from the user feed — latency acceptable without it


This file is focused, scannable, and genuinely useful to Claude Code as session-opening context.
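
A file in this shape is straightforward to generate from structured facts. The sketch below assumes a hypothetical in-memory representation (a mapping from section title to a list of fact strings); it illustrates the layout shown above and is not Hermes's actual output code.

```python
def render_memory(sections: dict[str, list[str]]) -> str:
    """Render {section title: [facts]} as headed bullet lists."""
    blocks = []
    for title, facts in sections.items():
        if not facts:
            continue  # skip empty sections so the file stays scannable
        lines = [f"## {title}"] + [f"- {fact}" for fact in facts]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

memory_md = render_memory({
    "Architecture Facts": ["All API routes are versioned under `/api/v2/`"],
    "Known Constraints": ["CI runs on Node 20.x only; avoid Node 22 APIs"],
    "Recent Decisions": [],
})
print(memory_md)
```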

Tier 3 — Active Context (CLAUDE.md + Session Memory)

This is what Claude actually sees. Tier 3 consists of the CLAUDE.md file (which you now stop hand-editing constantly) plus any in-session memory Claude builds up as work progresses.

The key shift in a hybrid system is that CLAUDE.md becomes a pointer to Hermes output, not a document you maintain manually. At session start, a small initialization script pulls the latest Tier 2 output into Claude’s active context. Claude gets the curated facts without you doing anything.

Setting Up Memarch for Automatic Capture

Prerequisites

You’ll need:

Claude Code installed and configured
Node.js 18+ (for the Memarch daemon)
A directory to store your archive (local or cloud-synced)

Step 1: Install and Configure the Memarch Daemon

Memarch runs as a background process that hooks into Claude Code’s output stream. After installing, create a config file at ~/.memarch/config.json:

{
  "archive_dir": "~/.memarch/archive",
  "capture_mode": "full",
  "session_separator": "timestamp",
  "exclude_patterns": ["node_modules", ".git", "*.lock"],
  "flush_interval_seconds": 30
}

The capture_mode: "full" setting tells Memarch to record everything — reasoning, tool calls, file operations, and output. You can dial this back to "decisions_only" if storage is a concern, but full capture gives Hermes more to work with.
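
One detail worth noting: JSON has no notion of `~`, so whatever reads this file has to expand the home directory itself. Below is a minimal sketch of loading the config with defaults, assuming the keys shown above. The `load_config` helper and the default values are hypothetical, not Memarch's code.

```python
import json
import os
import tempfile

# Hypothetical defaults; the keys mirror the example config above
DEFAULTS = {
    "archive_dir": "~/.memarch/archive",
    "capture_mode": "full",
    "flush_interval_seconds": 30,
    "exclude_patterns": [],
}
ALLOWED_MODES = {"full", "decisions_only"}

def load_config(path: str) -> dict:
    with open(os.path.expanduser(path), encoding="utf-8") as fh:
        config = {**DEFAULTS, **json.load(fh)}
    if config["capture_mode"] not in ALLOWED_MODES:
        raise ValueError(f"unknown capture_mode: {config['capture_mode']!r}")
    # JSON does not expand "~", so do it explicitly
    config["archive_dir"] = os.path.expanduser(config["archive_dir"])
    return config

# Demo against a throwaway file rather than the real ~/.memarch directory
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config.json")
    with open(cfg_path, "w", encoding="utf-8") as fh:
        json.dump({"capture_mode": "decisions_only"}, fh)
    config = load_config(cfg_path)
print(config["capture_mode"], config["archive_dir"])
```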

Step 2: Connect Memarch to Claude Code via MCP

Claude Code supports the Model Context Protocol (MCP), which lets you expose external tools and resources to Claude directly. Memarch ships with an MCP server that handles session start/end events.

Add the following to your Claude Code MCP configuration:

{
  "mcpServers": {
    "memarch": {
      "command": "memarch-mcp",
      "args": ["--config", "~/.memarch/config.json"]
    }
  }
}

Once connected, Memarch automatically:

Starts a new session record when you open Claude Code
Logs tool calls and their results in real time
Closes and timestamps the session record on exit
Writes a session summary to the archive

Step 3: Verify the Archive

After running a Claude Code session, check ~/.memarch/archive/. You should see a directory structure like:

archive/
  2024-11/
    2024-11-14_session_001.jsonl
    2024-11-14_session_002.jsonl
    2024-11-15_session_001.jsonl

Each .jsonl file contains one JSON object per line, each representing a discrete event in the session. If your files are populating correctly, Tier 1 is working.
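
If you want a stricter check than eyeballing the directory, a short script can confirm that every line parses as JSON. This sketch relies only on the JSONL convention described above; `check_archive` is a hypothetical helper, not a Memarch command.

```python
import json
import tempfile
from pathlib import Path

def check_archive(archive_dir: Path) -> dict[str, tuple[int, int]]:
    """Return {relative file: (valid event count, malformed line count)}."""
    report = {}
    for path in sorted(archive_dir.rglob("*.jsonl")):
        ok = bad = 0
        for raw in path.read_text(encoding="utf-8").splitlines():
            if not raw.strip():
                continue  # tolerate blank lines
            try:
                json.loads(raw)
                ok += 1
            except json.JSONDecodeError:
                bad += 1
        report[path.relative_to(archive_dir).as_posix()] = (ok, bad)
    return report

# Demo: two valid events and one line cut off mid-write
with tempfile.TemporaryDirectory() as tmp:
    month = Path(tmp) / "2024-11"
    month.mkdir()
    (month / "2024-11-14_session_001.jsonl").write_text(
        '{"kind": "session_start"}\n{"kind": "tool_call"}\n{truncated\n',
        encoding="utf-8",
    )
    report = check_archive(Path(tmp))
print(report)
```

A nonzero malformed count usually points at a session that was interrupted while a line was being written.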

Configuring Hermes for Memory Curation

Step 1: Install Hermes and Set Up Your Curation Config

Hermes reads from your Memarch archive directory and writes structured memory files to a separate output directory. Create a config at ~/.hermes/config.yaml:

source_dir: ~/.memarch/archive
output_dir: ~/.hermes/memory
model: claude-3-5-sonnet-20241022
schedule: daily
lookback_days: 7

extraction:
  decisions: true
  errors_resolved: true
  architectural_facts: true
  repeated_constraints: true
  min_occurrences_for_promotion: 2

output:
  format: markdown
  max_file_size_tokens: 2000
  split_by_project: true

The min_occurrences_for_promotion: 2 setting means a fact has to appear at least twice in the archive before Hermes treats it as durable. This filters out one-off observations that aren’t worth carrying forward.
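
The promotion rule can be pictured as a counting pass. The sketch below counts how often each normalized fact appears and keeps those that meet the threshold, ranked by frequency. The normalization step (lowercasing and collapsing whitespace) is an assumption; the article does not say how Hermes matches repeated facts, and an LLM-assisted pass would presumably match paraphrases as well.

```python
from collections import Counter

def promote(observations: list[str], min_occurrences: int = 2) -> list[tuple[str, int]]:
    """Keep facts seen at least `min_occurrences` times, most frequent first."""
    normalized = [" ".join(obs.lower().split()) for obs in observations]
    counts = Counter(normalized)
    kept = [(fact, n) for fact, n in counts.items() if n >= min_occurrences]
    # Sort by count descending, then alphabetically for a stable order
    return sorted(kept, key=lambda item: (-item[1], item[0]))

observations = [
    "Staging DB has a 50-connection pool limit",
    "staging db has a  50-connection pool limit",
    "CI runs on Node 20.x only",
    "Staging DB has a 50-connection pool limit",
    "CI runs on Node 20.x only",
    "Created a variable called idx",
]
promoted = promote(observations, min_occurrences=2)
print(promoted)
```

With this sample, a threshold of 2 keeps the pool-limit and Node facts and drops the one-off remark about `idx`; a threshold of 3 would keep only the pool limit.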

Step 2: Run Your First Hermes Pass

After accumulating a few sessions in Memarch, trigger a manual Hermes pass:

hermes run --project my-project

Hermes will:

Read the JSONL files from your Memarch archive
Pass batched content through Claude (or your chosen model) with extraction prompts
Rank and deduplicate extracted facts
Write structured Markdown to your memory output directory

Your first pass will likely produce rough output. Review it and adjust the curation config to tune precision.

Step 3: Set Up Scheduled Runs

For ongoing use, configure Hermes to run automatically. On macOS/Linux, add a cron job:

0 9 * * * hermes run --project my-project --quiet

This runs Hermes every morning at 9am. With lookback_days: 7, each run re-reads the past week of sessions, including whatever was captured the previous day, so your Tier 2 memory files stay current without manual effort.
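
The lookback window can be sketched as a date filter over the archive filenames. This assumes the date-prefixed names shown in the archive listing earlier and a window that includes today; `files_in_window` is a hypothetical helper, and Hermes may define the window boundaries differently.

```python
from datetime import date, timedelta

def files_in_window(filenames: list[str], today: date, lookback_days: int = 7) -> list[str]:
    """Select session files whose date prefix falls in the last `lookback_days` days."""
    earliest = today - timedelta(days=lookback_days - 1)  # window includes today
    selected = []
    for name in filenames:
        try:
            session_date = date.fromisoformat(name[:10])  # "YYYY-MM-DD" prefix
        except ValueError:
            continue  # ignore files that do not follow the naming scheme
        if earliest <= session_date <= today:
            selected.append(name)
    return sorted(selected)

names = [
    "2024-11-07_session_001.jsonl",
    "2024-11-08_session_001.jsonl",
    "2024-11-14_session_002.jsonl",
    "notes.txt",
]
recent = files_in_window(names, today=date(2024, 11, 14))
print(recent)
```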

Connecting the Tiers: Making It Work with Claude Code

The final step is wiring Tier 2 output into Claude Code’s active context automatically.

Step 1: Update CLAUDE.md to Reference Hermes Output

Replace static CLAUDE.md content with a dynamic reference. In your project root:

# Project Memory

This file is generated. Edit ~/.hermes/config.yaml to adjust what's captured.

{{HERMES_OUTPUT: my-project}}

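The token is a placeholder. The pre-session script in Step 2 regenerates the whole file from the latest Hermes output before Claude Code starts, so nothing in CLAUDE.md needs to be edited by hand.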

Step 2: Create a Session Initialization Script

Write a small script that pulls the latest Hermes output and injects it into CLAUDE.md before you start working:

#!/bin/bash
# init-session.sh

PROJECT="my-project"
HERMES_MEMORY="$HOME/.hermes/memory/$PROJECT/latest.md"
CLAUDE_MD="./CLAUDE.md"

if [ -f "$HERMES_MEMORY" ]; then
  echo "# Project Memory" > "$CLAUDE_MD"
  echo "" >> "$CLAUDE_MD"
  cat "$HERMES_MEMORY" >> "$CLAUDE_MD"
  echo "Session context loaded from Hermes ($PROJECT)"
else
  echo "No Hermes memory found for $PROJECT — starting fresh"
fi

Run this before opening Claude Code in any project session. You can alias it or integrate it into your shell profile.
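
Note that the script above overwrites CLAUDE.md wholesale. If you would rather keep hand-written sections around the generated block, one option is to keep a separate template file and substitute the `{{HERMES_OUTPUT: my-project}}` token on each run. Here is a minimal sketch, assuming the token syntax from Step 1 and a hypothetical `memory_by_project` lookup that maps project names to the text of their memory files.

```python
import re

# Matches tokens such as {{HERMES_OUTPUT: my-project}}
TOKEN = re.compile(r"\{\{HERMES_OUTPUT:\s*([\w.-]+)\s*\}\}")

def resolve_tokens(template: str, memory_by_project: dict[str, str]) -> str:
    """Replace each token with that project's memory text."""
    def substitute(match):
        project = match.group(1)
        # Leave a visible note rather than failing when no memory exists yet
        fallback = f"(no Hermes memory found for {project})"
        return memory_by_project.get(project, fallback).rstrip("\n")
    return TOKEN.sub(substitute, template)

template = (
    "# Project Memory\n\n"
    "{{HERMES_OUTPUT: my-project}}\n\n"
    "## Local Notes\n- kept by hand\n"
)
memory = {"my-project": "## Known Constraints\n- CI runs on Node 20.x only\n"}
resolved = resolve_tokens(template, memory)
print(resolved)
```

The template has to live in its own file, because the resolved output no longer contains the token for the next run to find.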

Step 3: Let Hermes Write, Not You

The main discipline shift here is stopping manual edits to CLAUDE.md. If you want Claude to remember something permanently, add it to a Hermes “pinned facts” file instead of hand-editing the output of a curation pass:

# ~/.hermes/pinned/my-project.yaml
pinned:
  - "The production database is read-only for Claude — never generate migration scripts that auto-run"
  - "Code style: prefer explicit returns in all functions"

Pinned facts always appear at the top of Hermes output, regardless of the curation pass. This gives you a safety valve for critical constraints without breaking the automated pipeline.
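
The pinned-first behavior can be pictured as a simple merge: pinned facts go first in their written order, and curated facts follow unless they duplicate a pinned one. This is a sketch of the behavior described above, not Hermes's implementation.

```python
def merge_pinned(pinned: list[str], curated: list[str]) -> list[str]:
    """Pinned facts first (original order), then curated facts not already present."""
    merged = []
    seen = set()
    for fact in pinned + curated:
        key = fact.strip().lower()
        if key in seen:
            continue  # drop duplicates, ignoring case and outer whitespace
        seen.add(key)
        merged.append(fact.strip())
    return merged

pinned = ["Code style: prefer explicit returns in all functions"]
curated = [
    "Auth is handled by the packages/auth module",
    "code style: prefer explicit returns in all functions",
]
facts = merge_pinned(pinned, curated)
print(facts)
```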

Troubleshooting Common Issues

Hermes is promoting too much noise

Raise min_occurrences_for_promotion to 3 or higher. You can also add patterns to an exclusion list in the Hermes config to filter out categories of events that are consistently low-value for your workflow.

Claude’s context window is getting overloaded

Set a hard token limit in max_file_size_tokens. When Hermes output exceeds the limit, it drops lower-ranked facts first. If your project is large, consider splitting into sub-project memories and loading only the relevant one per session.
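
Dropping lower-ranked facts first amounts to a greedy fill against the budget. In this sketch the token count is a crude estimate of about four characters per token, which is an assumption for illustration; a real implementation would use the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token
    return max(1, len(text) // 4)

def fit_to_budget(ranked_facts: list[str], max_tokens: int) -> list[str]:
    """Keep facts in rank order until the next one would exceed the budget."""
    kept, used = [], 0
    for fact in ranked_facts:  # highest-ranked first
        cost = estimate_tokens(fact)
        if used + cost > max_tokens:
            break  # everything ranked lower is dropped as well
        kept.append(fact)
        used += cost
    return kept

ranked = ["a" * 40, "b" * 40, "c" * 40]  # 10 estimated tokens each
kept = fit_to_budget(ranked, max_tokens=25)
print(len(kept))
```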

Memarch sessions aren’t closing cleanly

If Claude Code exits unexpectedly, Memarch may leave sessions open. Run memarch repair to close and timestamp any orphaned sessions. These will still be processed by Hermes normally on the next pass.
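
Conceptually, an orphaned session is one that never recorded a closing event. The sketch below detects that condition in a list of parsed events and appends a synthetic close. The event names (`session_start`, `session_end`) and the `repaired` flag are hypothetical, since the article does not document Memarch's event schema or what `memarch repair` actually writes.

```python
def close_if_orphaned(events: list[dict]) -> list[dict]:
    """Return the events with a synthetic session_end appended if none exists."""
    if not events or events[-1].get("kind") == "session_end":
        return events
    # Best available estimate of when the session stopped
    last_ts = events[-1].get("ts")
    return events + [{"kind": "session_end", "ts": last_ts, "repaired": True}]

session = [
    {"kind": "session_start", "ts": "2024-11-14T22:00:00Z"},
    {"kind": "tool_call", "ts": "2024-11-14T22:03:10Z"},
]
fixed = close_if_orphaned(session)
print(fixed[-1])
```

Marking the synthetic event lets a later curation pass tell a repaired close apart from a clean one.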

Hermes output is too sparse early on

The system needs a few weeks of usage before the archive has enough signal to produce rich curated memory. In the first week, supplement with manual pinned facts to cover what Hermes hasn’t had time to infer.

Where MindStudio Fits Into This Stack

The Memarch/Hermes system works well for developers comfortable managing CLI tools and config files. But there’s a meaningful gap for teams where not everyone is deep in terminal workflows — or where you want the memory pipeline to do more than feed Claude Code.

MindStudio’s Agent Skills Plugin lets you expose MindStudio agents — including memory pipelines built as visual workflows — to Claude Code and other AI systems as simple method calls. You can build a Hermes-style curation workflow in MindStudio’s no-code builder, connect it to your preferred storage (Notion, Airtable, Google Workspace), and call it from Claude Code via the @mindstudio-ai/agent SDK:

import { MindStudio } from '@mindstudio-ai/agent';

const agent = new MindStudio();
const memory = await agent.runWorkflow('hermes-curation', { project: 'my-project' });

This is useful if your team wants curated project memory stored somewhere collaborative — a shared Notion database, for example — rather than in individual developer dotfiles. Everyone’s sessions feed the same archive, and everyone benefits from the same curated memory on session start.

MindStudio’s visual builder also makes it easy to customize the curation logic without editing YAML configs. You can build branching rules (e.g., “if the session touched the payments module, always include the PCI compliance constraints”) without writing a line of code.

You can try MindStudio free at mindstudio.ai.

Frequently Asked Questions

What is a hybrid AI memory system?

A hybrid AI memory system combines two complementary approaches: automatic capture (recording everything without filtering) and intelligent curation (extracting durable facts from the raw record). Neither works as well alone. Pure capture generates too much noise. Pure curation misses things you didn’t know mattered. Together, they give AI agents like Claude Code persistent, useful context across sessions.

How is this different from just using CLAUDE.md?

CLAUDE.md is a static document you maintain manually. A hybrid system using Memarch and Hermes makes CLAUDE.md dynamic — it’s generated from your actual session history rather than written by hand. The difference matters as projects grow: manual CLAUDE.md files tend to go stale or bloat, while Hermes output stays current and focused because it’s continuously regenerated from real usage.

Does Claude Code natively support persistent memory?

Claude Code doesn’t have built-in cross-session memory in the way this architecture provides. It reads CLAUDE.md at session start and builds context during a session, but nothing is automatically written back to persistent storage when a session ends. Memarch and Hermes add that persistence layer on top of Claude Code’s native capabilities. The Anthropic documentation on Claude Code covers the CLAUDE.md approach but doesn’t include automatic memory pipelines.

How much does this cost to run?

The main cost is the LLM calls Hermes makes during curation passes. With claude-3-5-sonnet and a typical developer workflow (3–5 sessions per day), a daily curation pass over 7 days of archive data costs roughly $0.10–0.30 per run depending on session length. If cost is a concern, you can configure Hermes to use a smaller model for routine extraction passes and reserve Sonnet for the weekly full-archive consolidation.
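
As a back-of-the-envelope check, the cost of one pass is input tokens times the input price plus output tokens times the output price. The token volumes below are made-up example inputs, and the per-million-token prices ($3 input, $15 output) are an assumption based on Claude 3.5 Sonnet's published list pricing; check current pricing before relying on them.

```python
def pass_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float = 3.0,
                  output_price_per_mtok: float = 15.0) -> float:
    """Estimated cost of one curation pass in US dollars."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Hypothetical volumes: 50k tokens of archive read, 2k tokens of memory written
cost = pass_cost_usd(input_tokens=50_000, output_tokens=2_000)
print(f"${cost:.2f}")
```

At these assumed prices, a range of $0.10 to $0.30 corresponds to very roughly 30k to 100k input tokens per pass, so long sessions under full capture could plausibly cost more.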

Can this work for a team, not just an individual developer?

Yes, with some additional setup. The main requirement is a shared archive location — an S3 bucket, shared NFS mount, or cloud sync directory that all team members write to. Hermes can then run as a shared process that produces team-level memory files alongside individual ones. You can gate which facts get promoted to team memory (e.g., only architectural decisions, not individual debugging notes).
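
The gating rule can be as simple as an allowlist of fact categories. Here is a sketch, assuming each curated fact carries a hypothetical `category` label:

```python
TEAM_CATEGORIES = {"architecture", "decision"}  # hypothetical allowlist

def split_team_and_personal(facts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Route facts to team memory only if their category is on the allowlist."""
    team = [f for f in facts if f.get("category") in TEAM_CATEGORIES]
    personal = [f for f in facts if f.get("category") not in TEAM_CATEGORIES]
    return team, personal

facts = [
    {"category": "architecture", "text": "All API routes are versioned under /api/v2/"},
    {"category": "debugging", "text": "Flaky test fixed by clearing the local cache"},
]
team, personal = split_team_and_personal(facts)
print(len(team), len(personal))
```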

Is the memory system project-specific or global?

Both are supported. Memarch records sessions with project tags (derived from your working directory), and Hermes can output project-specific memory files. You can also configure a “global” memory scope that captures facts applicable across all your projects: your coding style preferences, common tooling patterns, organization-wide constraints. The three-tier architecture is the same; you’re just adding a second scope alongside the per-project one.

Key Takeaways

Memarch captures everything automatically — no manual tagging, no decisions about what to save during a session
Hermes curates what matters — LLM-assisted extraction turns a raw archive into clean, ranked memory that Claude Code can actually use
The three-tier system (raw archive → curated memory → active context) separates capture, curation, and retrieval into distinct, manageable layers
CLAUDE.md becomes generated output, not a document you maintain — the system updates itself as you work
Pinned facts give you a safety valve for critical constraints that should always be in context, regardless of curation pass results
MindStudio can host the curation pipeline if you want a collaborative or no-code approach to memory management across a team

If you’ve been fighting context loss in Claude Code, this architecture solves it without adding meaningful friction to your workflow. The setup takes a few hours; the payoff compounds with every session after that.