How to Add Persistent Memory to Claude Code: Short-Term, Long-Term, and Scoped Access

Claude Code Doesn’t Remember You — Here’s How to Fix That

Claude Code is a capable coding agent. But it has a fundamental limitation: every session starts fresh. Ask it about a decision you made last week, and it has no idea what you’re talking about. Reference a codebase convention your team established months ago, and you’ll need to re-explain it from scratch.

This isn’t a bug — it’s how stateless AI sessions work by design. But it is a real problem for teams using Claude Code as a serious development tool. The solution is building a persistent memory system that Claude Code can read from and write to, across sessions, across team members, and with the right level of access control.

This guide covers exactly how to do that: short-term session memory, long-term semantic storage, and scoped access for teams. It’s practical, not theoretical.

Why Stateless AI Is a Real Workflow Problem

When you work with Claude Code across multiple sessions, you’re constantly re-teaching it things it should already know:

Your project’s architectural patterns
Why you made specific technical decisions
Which libraries you’ve ruled out and why
Team conventions that aren’t obvious from the code itself
Past bugs and how they were resolved

Without persistent memory, Claude Code treats every conversation as if it’s meeting your codebase for the first time. This creates duplicated effort, inconsistent suggestions, and a lot of copy-pasting context into every new session.

The workaround most developers use — dumping a wall of context into the system prompt — only goes so far. You hit token limits quickly, and it’s hard to keep that context fresh and organized.

A real memory system changes this. Instead of re-explaining everything, Claude Code queries what it needs, writes what it learns, and builds up a useful knowledge base over time.

The Three Types of Memory Claude Code Needs

Not all memory is the same. Before building anything, it’s worth being clear about what kind of memory you actually need.

Short-Term Session Memory

This is the working memory for a single session. It holds:

The current file being edited
Recent tool call results
Decisions made earlier in this conversation
Intermediate reasoning steps

Short-term memory doesn’t need to persist across sessions. It just needs to be structured well enough that Claude Code can refer back to it within a single working context window. This is mostly handled by how you structure your system prompt and conversation history, but a lightweight in-memory store can help when context gets long.

Long-Term Persistent Memory

This is the knowledge base that survives between sessions. It contains:

Architectural decisions and the reasoning behind them
Codebase conventions and patterns
Bug postmortems and resolved issues
Technical debt notes
Dependency decisions

Long-term memory needs actual storage — a database, vector store, or structured file system. It also needs semantic search so Claude Code can retrieve relevant context based on meaning, not just keywords.

Scoped Team Memory

This is long-term memory with access control. Different team members (or different Claude Code instances) should have different read/write permissions. Examples:

A junior developer’s Claude Code instance can read architectural docs but can’t overwrite them
Security-sensitive context is only accessible to certain agent configurations
Project-specific memory is isolated from other projects

Scoping prevents memory pollution and keeps sensitive information appropriately contained.

Building Short-Term Session Memory

For a single session, the simplest approach is a structured scratchpad that Claude Code can update as the conversation progresses.

Use a Structured System Prompt Template

Start each session with a template that reserves space for key working context:

## Current Session Context
- Active file: [FILE]
- Session goal: [GOAL]
- Decisions made this session: [empty at start]
- Open questions: [empty at start]

## Project Context
[Pulled from long-term memory — see next section]

Claude Code fills in the “decisions made” and “open questions” sections as the session progresses. You can do this with a simple update_scratchpad tool call pattern.

Implement a Scratchpad Tool

Give Claude Code a tool that lets it update its own working notes:

# Session-scoped working notes; start empty at the beginning of each session
scratchpad: dict = {}

def update_scratchpad(key: str, value: str) -> dict:
    """
    Update a field in the current session scratchpad.
    Returns the updated scratchpad state.
    """
    scratchpad[key] = value
    return {"status": "updated", "scratchpad": scratchpad}

This is simple but effective. Claude Code can now write notes to itself mid-session (“Decided to use PostgreSQL instead of SQLite because of concurrent write requirements”) and reference them later.
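To feed those notes back into the prompt, the scratchpad can be rendered into the session template shown earlier. The following is a minimal sketch, not a fixed format: the key names (active_file, session_goal, decisions, open_questions) and the placeholder text are assumptions chosen to mirror the template.

```python
from typing import Dict

# Hypothetical scratchpad keys mapped to the labels used in the session template
FIELDS = [
    ("active_file", "Active file"),
    ("session_goal", "Session goal"),
    ("decisions", "Decisions made this session"),
    ("open_questions", "Open questions"),
]

def render_scratchpad(scratchpad: Dict[str, str]) -> str:
    """Render session notes as the 'Current Session Context' block."""
    lines = ["## Current Session Context"]
    for key, label in FIELDS:
        # Fields that have not been written yet get a visible placeholder
        lines.append(f"- {label}: {scratchpad.get(key, '[none yet]')}")
    return "\n".join(lines)

notes = {"active_file": "billing/tasks.py", "decisions": "Use PostgreSQL over SQLite"}
print(render_scratchpad(notes))
```

Rendering from a dictionary keeps the template and the stored notes in sync, so the prompt never shows stale hand-edited text.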

Track Tool Call Results

If Claude Code is running shell commands, file reads, or API calls, store the results in a rolling buffer rather than relying on the full conversation history. This keeps context focused on what’s actually relevant.

from datetime import datetime
from typing import Any

tool_call_log = []

def log_tool_result(tool_name: str, args: dict, result: Any):
    tool_call_log.append({
        "timestamp": datetime.now().isoformat(),
        "tool": tool_name,
        "args": args,
        "result": result
    })
    # Keep only the last N results to manage token usage
    if len(tool_call_log) > 20:
        tool_call_log.pop(0)
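An alternative sketch of the same rolling buffer uses collections.deque, which drops the oldest entry automatically once the cap is reached. The names record_tool_result and recent_tool_context, and the 500-character truncation limit, are illustrative assumptions rather than part of any Claude Code API.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Any

MAX_RESULTS = 20  # same cap as the list-based version

# deque(maxlen=...) discards the oldest entry automatically on append
tool_log: deque = deque(maxlen=MAX_RESULTS)

def record_tool_result(tool_name: str, args: dict, result: Any, max_chars: int = 500) -> None:
    """Store a tool result, truncating long output to protect the context budget."""
    text = str(result)
    if len(text) > max_chars:
        text = text[:max_chars] + "...[truncated]"
    tool_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "args": args,
        "result": text,
    })

def recent_tool_context(n: int = 5) -> str:
    """Format the n most recent results for inclusion in a prompt."""
    recent = list(tool_log)[-n:]
    return "\n".join(f"[{entry['tool']}] {entry['result']}" for entry in recent)
```

Truncating at write time means a single large file read cannot crowd out the other entries in the buffer.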

Short-term memory doesn’t need to be complicated. The goal is just to reduce repetition within a session and keep Claude Code oriented.

Building Long-Term Persistent Memory with Semantic Search

Long-term memory is where the real complexity lives. You need a storage layer, a retrieval mechanism, and a way for Claude Code to write new memories without creating noise.

Choose Your Storage Layer

There are three practical options, depending on your scale and setup:

1. A vector database (recommended for most teams)

Vector databases like Pinecone, Weaviate, Chroma, or pgvector store text as embeddings, which makes semantic search possible. You can find records that are conceptually related to a query, not just keyword-matched.

For most Claude Code setups, a locally-run option like Chroma or pgvector (if you’re already on Postgres) is the easiest starting point. Cloud-hosted vector DBs make sense when you have a team spread across machines.

2. A simple JSON or SQLite store

If your knowledge base is small and you don’t need semantic search, a flat file or SQLite database is perfectly adequate. Use this when you have fewer than a few hundred memory entries and can get by with keyword search.

3. A hybrid approach

Store structured metadata (project name, date, tags, author) in a relational database, and store the full text as embeddings in a vector database. Query both layers together for the best results.
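For the second option, a keyword-searchable SQLite store takes only a few lines with the standard library. This is a sketch under assumptions: the table layout, column names, and helper names (open_store, add_memory, search_memories) are invented for illustration.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Sequence

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory table. Pass a file path to persist to disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               project TEXT NOT NULL,
               content TEXT NOT NULL,
               tags TEXT NOT NULL DEFAULT '',
               created_at TEXT NOT NULL
           )"""
    )
    return conn

def add_memory(conn: sqlite3.Connection, project: str, content: str,
               tags: Sequence[str] = ()) -> int:
    cur = conn.execute(
        "INSERT INTO memories (project, content, tags, created_at) VALUES (?, ?, ?, ?)",
        (project, content, ",".join(tags), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid

def search_memories(conn: sqlite3.Connection, project: str, keyword: str,
                    limit: int = 5) -> List[str]:
    # LIKE is case-insensitive for ASCII in SQLite; % and _ in the keyword act as wildcards
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE project = ? AND (content LIKE ? OR tags LIKE ?) "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (project, pattern, pattern, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The same table can later serve as the metadata layer of the hybrid approach, with the embeddings stored alongside in a vector database.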

Set Up the Embedding Pipeline

To do semantic search, you need to convert text to embeddings. Here’s a minimal setup using OpenAI’s embedding API (you can swap in any embedding model):

import uuid

import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
# PersistentClient writes to disk so memories survive restarts;
# chromadb.Client() keeps data in memory only. The path is an example.
chroma_client = chromadb.PersistentClient(path="./claude_memory_db")
# get_or_create_collection avoids an error when the collection already exists
collection = chroma_client.get_or_create_collection("claude_memory")

def embed(text: str) -> list[float]:
    """Convert text to an embedding vector."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def store_memory(content: str, metadata: dict) -> str:
    """
    Store a memory entry with its embedding and return its ID.
    metadata should include: source, session_id, project, timestamp, tags
    """
    memory_id = str(uuid.uuid4())
    collection.add(
        documents=[content],
        embeddings=[embed(content)],
        metadatas=[metadata],
        ids=[memory_id]
    )
    return memory_id

def retrieve_memories(query: str, n_results: int = 5, filter: dict = None):
    """
    Retrieve the most semantically relevant memories for a query.
    """
    return collection.query(
        query_embeddings=[embed(query)],
        n_results=n_results,
        where=filter  # for scoped access
    )

Give Claude Code Memory Read/Write Tools

The memory system only works if Claude Code can actually use it. Give it two core tools:

remember() — Write a new memory:

from datetime import datetime

# current_session_id and current_user are assumed to be set when the session starts

def remember(content: str, tags: list[str], project: str, importance: str = "normal"):
    """
    Store something worth remembering for future sessions.
    Use this when a decision is made, a pattern is established, or a significant finding is reached.
    """
    metadata = {
        "project": project,
        "tags": ",".join(tags),  # joined because metadata values must be scalars
        "importance": importance,
        "session_id": current_session_id,
        "timestamp": datetime.now().isoformat(),
        "author": current_user
    }
    store_memory(content, metadata)
    return {"status": "stored", "preview": content[:100]}

recall() — Query the memory store:

def recall(query: str, project: str = None, limit: int = 5):
    """
    Search long-term memory for relevant context.
    Call this at the start of a session or when hitting a decision point.
    """
    filter = {"project": project} if project else None
    results = retrieve_memories(query, n_results=limit, filter=filter)
    return {
        "memories": results["documents"][0],
        "metadata": results["metadatas"][0]
    }

Inject Memory at Session Start

The most reliable pattern is automatic memory injection. When a new Claude Code session starts, run a retrieval query based on the session’s stated goal, and inject the top results into the system prompt:

def build_session_context(goal: str, project: str) -> str:
    relevant_memories = recall(query=goal, project=project, limit=8)
    
    # recall() returns parallel lists of documents and metadata; pair them up
    memory_block = "\n".join([
        f"- [{meta['timestamp'][:10]}] {doc}"
        for doc, meta in zip(relevant_memories["memories"], relevant_memories["metadata"])
    ])
    
    return f"""
## Relevant Project Context (from memory)
{memory_block}

## Current Session Goal
{goal}
"""

This gives Claude Code a head start without requiring you to re-explain your project every time.

Cited Sources and Audit Trails

One of the most overlooked parts of a memory system is provenance. When Claude Code retrieves a memory and uses it to make a decision, you want to know:

When was this memory created?
Who or what created it?
What session or conversation is it from?
Has it been revised?

Without this, your memory store becomes an unverifiable black box. A bad memory could silently influence decisions without anyone knowing where it came from.

Always Store Source Metadata

Every memory entry should include at minimum:

{
    "content": "We decided to use Celery for background task processing because Redis pub/sub latency was inconsistent under load.",
    "source": "architecture_review_session",
    "session_id": "sess_abc123",
    "author": "claude-code",
    "human_confirmed": False,
    "timestamp": "2025-01-15T14:30:00Z",
    "project": "payments-service",
    "tags": ["architecture", "background-jobs", "celery"]
}

The human_confirmed field is important. By default, memories written by Claude Code are unconfirmed. A human developer can review and mark them as confirmed, giving them higher trust in future retrievals.

Build a Memory Review Interface

Consider building a simple UI or CLI tool that lets developers review, edit, and delete memories. This doesn’t need to be fancy — even a script that prints memories grouped by project and lets you confirm or reject them works well.

$ python memory_review.py --project payments-service --unconfirmed

[2025-01-15] (Unconfirmed)
"We decided to use Celery for background task processing because Redis pub/sub latency was inconsistent under load."
Tags: architecture, background-jobs, celery
Source: session sess_abc123

[c]onfirm / [e]dit / [d]elete / [s]kip:

This review loop keeps your memory store clean and trustworthy.
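The core of such a review script is a small function that applies one action to one entry. The sketch below assumes memories are plain dictionaries with content and human_confirmed fields, and the function name apply_review_action is hypothetical. Treating a human edit as a confirmation is also an assumption you may want to change.

```python
from typing import List, Optional

def apply_review_action(memories: List[dict], index: int, action: str,
                        new_content: Optional[str] = None) -> List[dict]:
    """Return a new list with one review action applied to memories[index].

    Actions follow the prompt above: c = confirm, e = edit, d = delete, s = skip.
    """
    updated = [dict(m) for m in memories]  # copy so the caller's list is untouched
    if action == "c":
        updated[index]["human_confirmed"] = True
    elif action == "e":
        if not new_content:
            raise ValueError("edit requires new_content")
        updated[index]["content"] = new_content
        # Assumption: text rewritten by a human counts as confirmed
        updated[index]["human_confirmed"] = True
    elif action == "d":
        del updated[index]
    elif action != "s":
        raise ValueError(f"unknown action: {action!r}")
    return updated
```

Keeping the function pure (no I/O, no mutation of its input) makes it easy to wrap in either a CLI prompt loop or a web UI.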

Log Every Memory Access

For auditability, log every recall() call alongside the session context in which it was used. This lets you trace why Claude Code made a particular decision:

Session sess_xyz456 recalled:
- "Use Celery for background jobs" (retrieved for query: "task queue approach")
- "Avoid Celery Beat for scheduling — use APScheduler" (retrieved for query: "task scheduling")
Decision output: Recommended Celery + APScheduler combination

This kind of audit trail is valuable for debugging, compliance, and maintaining trust in your AI-assisted development process. Anthropic’s published guidance for Claude emphasizes transparency and support for human oversight, and your memory system should reflect that.
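One low-effort way to keep this trail is an append-only JSON Lines file. The sketch below is an assumption-laden example: the function names, the field names, and the idea of logging memory IDs rather than full text are all illustrative choices.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List

def log_recall(log_path: Path, session_id: str, query: str, memory_ids: List[str]) -> dict:
    """Append one recall event to a JSON Lines audit log and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "query": query,
        "memory_ids": memory_ids,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

def recalls_for_session(log_path: Path, session_id: str) -> List[dict]:
    """Read back every recall event recorded for one session, in order."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e["session_id"] == session_id]
```

Calling log_recall inside recall() means every retrieval is recorded without relying on the agent to remember to do it.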

Implementing Team-Scoped and Role-Scoped Access

Once more than one person is using Claude Code on a shared codebase, you need to think about who can read and write what.

Define Your Scope Hierarchy

A practical three-level scope hierarchy:

Global — Available to all team members across all projects (e.g., company-wide coding standards)
Project — Available to all team members on a specific project
Personal — Private to a specific developer’s Claude Code instance

Each memory entry should have a scope field, and retrieval should filter by the appropriate scope for the current user and project.

Implement Scope-Based Filtering

def recall_scoped(query: str, user: str, project: str, limit: int = 8):
    """
    Retrieve memories from all applicable scopes for this user and project.
    Returns (document, metadata, distance) tuples, closest first.
    """
    scopes = ["global", f"project:{project}", f"personal:{user}"]
    
    all_results = []
    for scope in scopes:
        # Fetch up to `limit` per scope so the merged top results are not
        # skewed by an even split across scopes
        results = retrieve_memories(
            query=query,
            n_results=limit,
            filter={"scope": scope}
        )
        all_results.extend(zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        ))
    
    # Chroma returns distances, where smaller means more similar,
    # so sort ascending and keep the closest matches overall
    return sorted(all_results, key=lambda x: x[2])[:limit]

Set Write Permissions Per Scope

Not everyone should be able to write to global or project-scoped memory. Enforce write permissions at the tool level:

def remember_scoped(content: str, scope: str, user: str, project: str, tags: list):
    """
    Write a memory to the appropriate scope, if permitted.
    """
    write_permissions = {
        "global": ["admin", "senior-engineer"],
        "project": ["team-member", "senior-engineer", "admin"],
        "personal": ["any"]
    }
    
    user_role = get_user_role(user)
    allowed_roles = write_permissions.get(scope.split(":")[0], [])
    
    if "any" not in allowed_roles and user_role not in allowed_roles:
        return {"error": f"User {user} does not have permission to write to {scope} scope"}
    
    metadata = {
        "scope": scope,
        "project": project,
        "author": user,
        "tags": ",".join(tags),
        "timestamp": datetime.now().isoformat()
    }
    store_memory(content, metadata)
    return {"status": "stored"}

Handle Memory Conflicts

When multiple team members are writing memories for the same project, conflicts happen. The same decision might be recorded differently by different people.

A simple conflict resolution strategy:

Recency wins by default — newer memories appear first in retrieval results
Human-confirmed memories rank higher than agent-written ones
Tag duplicates when detected and surface them for human review
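The first two rules can be sketched as a single sort. This example assumes each memory is a dictionary with human_confirmed and an ISO 8601 timestamp in a consistent format, and it makes one design choice the rules leave open: confirmation outranks recency, with recency breaking ties.

```python
from typing import List

def rank_memories(memories: List[dict]) -> List[dict]:
    """Order memories: human-confirmed first, then newest first within each group."""
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings
    return sorted(
        memories,
        key=lambda m: (bool(m.get("human_confirmed", False)), m.get("timestamp", "")),
        reverse=True,
    )

candidates = [
    {"content": "Use RQ for jobs", "human_confirmed": False, "timestamp": "2025-03-01T00:00:00Z"},
    {"content": "Use Celery for jobs", "human_confirmed": True, "timestamp": "2025-01-15T14:30:00Z"},
]
for memory in rank_memories(candidates):
    print(memory["content"])
```

If your team prefers strict recency, drop the first element of the sort key and surface the confirmation flag in the prompt instead.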

You can detect potential duplicates by checking cosine similarity between new memories and existing ones in the same scope:

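def check_for_duplicates(new_embedding, scope: str, threshold: float = 0.92):
    # Chroma returns distances, not similarities. This assumes the collection
    # is configured for cosine distance, so similarity = 1 - distance.
    results = collection.query(
        query_embeddings=[new_embedding],
        n_results=3,
        where={"scope": scope}
    )
    for distance in results["distances"][0]:
        if 1 - distance > threshold:
            return True
    return False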
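To make the threshold concrete, here is cosine similarity computed directly in plain Python, independent of any vector database. It is a sketch for intuition and testing. The helper names are hypothetical, and the 0.92 default simply mirrors the value used above.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def is_near_duplicate(new_vec: Sequence[float],
                      existing_vecs: Sequence[Sequence[float]],
                      threshold: float = 0.92) -> bool:
    """True if any existing vector is at least `threshold` similar to the new one."""
    return any(cosine_similarity(new_vec, v) >= threshold for v in existing_vecs)
```

The right threshold depends on the embedding model, so it is worth checking a handful of known duplicates and known non-duplicates before trusting any single value.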

Common Mistakes and How to Avoid Them

Writing Too Much to Memory

Not everything is worth storing. If Claude Code writes a memory for every minor observation, the store fills with noise and retrieval quality drops.

Set clear guidelines in your system prompt about when to use remember():

Use remember() ONLY when:
- A significant architectural or technical decision is made
- A non-obvious pattern is established
- A recurring error is diagnosed and resolved
- A team convention is explicitly agreed upon

Do NOT use remember() for:
- Routine observations
- Temporary workarounds
- Information already documented in the codebase
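Prompt guidelines can be backed by a cheap check at write time. The sketch below rejects very short entries and exact duplicates (ignoring case and whitespace). The function name, the eight-word minimum, and the rejection labels are assumptions, and judging vagueness is left to human review.

```python
from typing import List, Optional

def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())

def validate_memory(content: str, existing: List[str], min_words: int = 8) -> Optional[str]:
    """Return a rejection reason, or None if the candidate may be stored."""
    normalized = _normalize(content)
    if len(normalized.split()) < min_words:
        return "too short"
    if normalized in {_normalize(e) for e in existing}:
        return "duplicate"
    return None

reason = validate_memory("Use Celery", existing=[])
print(reason or "ok to store")
```

Running this inside remember() before the embedding call also saves an API request for every rejected entry.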

Not Cleaning Up Stale Memories

A memory about a library you deprecated two years ago can actively mislead Claude Code today. Set up a periodic review process — even monthly — to flag memories older than a certain threshold for human review.

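from datetime import datetime, timedelta

def find_stale_memories(project: str, days_threshold: int = 180):
    cutoff = (datetime.now() - timedelta(days=days_threshold)).isoformat()
    # Chroma's $lt compares numbers only, so filter the ISO strings in Python
    entries = collection.get(where={"project": project})
    pairs = zip(entries["documents"], entries["metadatas"])
    return [(doc, meta) for doc, meta in pairs if meta.get("timestamp", "") < cutoff]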

Ignoring Context Window Limits

Injecting too many memories at session start can eat up your context budget. Keep injected memory concise — prefer summaries over full verbatim entries when retrieval results are long.
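A simple guard is to cap injected memories by an estimated token budget. The sketch below uses a rough heuristic of about four characters per token rather than a real tokenizer, and the 800-token default is only an example. Both are assumptions to tune for your model.

```python
from typing import List

def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    # This is an approximation, not a real tokenizer.
    return max(1, len(text) // 4)

def fit_to_budget(memories: List[str], max_tokens: int = 800) -> List[str]:
    """Keep memories in their ranked order until the estimated budget is used up."""
    selected: List[str] = []
    used = 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > max_tokens:
            break  # stop at the first entry that would exceed the budget
        selected.append(memory)
        used += cost
    return selected
```

Because the loop stops at the first entry that does not fit, the input should already be sorted by relevance so the most useful memories are kept.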

Not Testing Retrieval Quality

Retrieval only works if the right memories come back for the right queries. Test your system regularly by running sample queries and checking whether the returned memories are genuinely relevant. Adjust your embedding model or chunking strategy if retrieval quality is poor.
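A small regression harness makes this kind of testing repeatable. The sketch below measures how often an expected memory ID appears in the top k results. The retrieve callable, the memory IDs, and the stand-in retriever are all hypothetical; in practice you would pass a thin wrapper around your real store.

```python
from typing import Callable, Dict, List

def hit_rate_at_k(retrieve: Callable[[str, int], List[str]],
                  cases: Dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected memory ID appears in the top-k results."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected_id in cases.items():
        if expected_id in retrieve(query, k):
            hits += 1
    return hits / len(cases)

# Stand-in retriever for illustration only; replace with a wrapper around your store
def fake_retrieve(query: str, k: int) -> List[str]:
    index = {
        "task queue approach": ["mem_celery", "mem_redis"],
        "task scheduling": ["mem_redis"],
    }
    return index.get(query, [])[:k]

cases = {
    "task queue approach": "mem_celery",
    "task scheduling": "mem_apscheduler",
}
print(hit_rate_at_k(fake_retrieve, cases, k=2))
```

Tracking this number over time shows whether a change to the embedding model or chunking strategy actually helped.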

How MindStudio Fits Into This Architecture

Building and maintaining a memory system from scratch is a reasonable amount of work. If you want to skip the infrastructure layer and connect Claude Code to memory, retrieval, and workflow tools without managing your own vector database setup, MindStudio’s Agent Skills Plugin is worth looking at.

The plugin (@mindstudio-ai/agent) is an npm SDK that lets any AI agent — including Claude Code — call MindStudio’s typed capabilities as simple method calls. Instead of wiring up your own embedding pipeline, storage layer, and retrieval logic, you call methods like agent.searchKnowledgeBase() or agent.runWorkflow() directly from your agent code.

The infrastructure concerns — rate limiting, retries, auth — are handled for you. Claude Code can focus on reasoning about what to remember and when to recall, rather than managing the plumbing.

For teams that want to go further, MindStudio also lets you build full multi-agent workflows where Claude Code is one node in a larger system — reading from a shared memory store, passing context to other agents, and writing outputs back to a central knowledge base. This is particularly useful when you have multiple agents working on the same codebase and need a single source of truth for shared context.

You can try MindStudio free at mindstudio.ai.

Frequently Asked Questions

Does Claude Code have any built-in memory?

Claude Code does not carry conversation state between sessions by default. Each new session starts with only what’s in the system prompt and conversation history, plus any static instruction files (such as CLAUDE.md) that it loads at startup. Anthropic’s Claude models do support a large context window, but that context is not saved automatically, so anything dynamic, searchable, or access-controlled requires a persistence layer you build yourself.

What’s the best vector database for a Claude Code memory system?

For local or small-team setups, Chroma is the easiest to get started with — it runs in-process with no infrastructure setup. For production or team-wide deployments, pgvector (Postgres extension) is a solid choice if you’re already using Postgres, and Pinecone or Weaviate work well for fully managed cloud setups. The right choice depends more on your existing stack than on any inherent performance difference.

How do I prevent Claude Code from writing bad memories?

The most effective approach is a human confirmation step. Memories written by Claude Code are flagged as unconfirmed by default and ranked lower in retrieval. A developer reviews and confirms them before they’re treated as authoritative. You can also add a validation step that checks new memories for obvious issues (very short, very vague, duplicates of existing entries) before they’re stored.

How much memory context should I inject at the start of each session?

This depends on your model’s context window and the complexity of your session goal. A practical starting point is 6–10 memories, summarized to 2–3 sentences each, injected at the top of the system prompt. This keeps memory context to roughly 500–800 tokens while still providing meaningful starting context. Expand this if you’re working on a particularly complex task that requires more background.

Can multiple Claude Code instances share the same memory store?

Yes, and this is one of the most valuable setups for teams. If you use a shared vector database and scope memories by project, multiple Claude Code instances (running for different developers or different tasks) can all read from and write to the same knowledge base. The key is implementing proper write permissions so one instance doesn’t overwrite another’s memories without review.

How do I handle memory about sensitive information like API keys or credentials?

Never store secrets in the memory system. Use environment variables and secrets managers for credentials. If a memory needs to reference a secret, store a reference to where the secret is stored (e.g., “Production DB password is in AWS Secrets Manager under /prod/db/password”) rather than the secret itself. Consider encrypting memory stores that contain security-sensitive architectural context.
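A pre-store scan can catch the most obvious leaks before they reach the memory store. The patterns below are illustrative assumptions, not a complete detector; a dedicated secret scanner will cover far more formats.

```python
import re
from typing import List

# Illustrative patterns only; real secret formats vary widely
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "key_value_secret": re.compile(
        r"(?i)\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*\S{8,}"
    ),
}

def find_secrets(text: str) -> List[str]:
    """Return the names of all patterns that match the candidate memory text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]

candidate = "Production DB password is in AWS Secrets Manager under /prod/db/password"
if find_secrets(candidate):
    print("Rejected: possible secret in memory text")
else:
    print("No obvious secrets found")
```

A reference to where a secret lives passes this check, while an inline key=value credential does not, which matches the guidance above.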

Key Takeaways

Claude Code has no built-in persistence — every session starts from scratch unless you build a memory layer
Short-term memory handles working context within a session; long-term memory survives between sessions
Semantic search via embeddings is the right retrieval mechanism for a knowledge base that grows over time
Every memory entry should carry source metadata and timestamps so you can audit why Claude Code made a given decision
Team memory needs scope-based access control — global, project, and personal tiers with clear write permissions
Keep your memory store clean by reviewing stale entries and setting clear guidelines for what’s worth remembering

If you want to build this kind of system without managing the infrastructure yourself, MindStudio gives Claude Code access to knowledge retrieval, workflow tools, and shared memory through a simple SDK — no database setup required. Start free and add the memory layer your AI development workflow actually needs.