How to Add Persistent Memory to Claude Code: Short-Term, Long-Term, and Scoped Access
Claude Code forgets everything between sessions. Learn how to build a memory system with cited sources, semantic search, and team-scoped access.
Claude Code Doesn’t Remember You — Here’s How to Fix That
Claude Code is a capable coding agent. But it has a fundamental limitation: every session starts fresh. Ask it about a decision you made last week, and it has no idea what you’re talking about. Reference a codebase convention your team established months ago, and you’ll need to re-explain it from scratch.
This isn’t a bug — it’s how stateless AI sessions work by design. But it is a real problem for teams using Claude Code as a serious development tool. The solution is building a persistent memory system that Claude Code can read from and write to, across sessions, across team members, and with the right level of access control.
This guide covers exactly how to do that: short-term session memory, long-term semantic storage, and scoped access for teams. It’s practical, not theoretical.
Why Stateless AI Is a Real Workflow Problem
When you work with Claude Code across multiple sessions, you’re constantly re-teaching it things it should already know:
- Your project’s architectural patterns
- Why you made specific technical decisions
- Which libraries you’ve ruled out and why
- Team conventions that aren’t obvious from the code itself
- Past bugs and how they were resolved
Without persistent memory, Claude Code treats every conversation as if it’s meeting your codebase for the first time. This creates duplicated effort, inconsistent suggestions, and a lot of copy-pasting context into every new session.
The workaround most developers use — dumping a wall of context into the system prompt — only goes so far. You hit token limits quickly, and it’s hard to keep that context fresh and organized.
A real memory system changes this. Instead of re-explaining everything, Claude Code queries what it needs, writes what it learns, and builds up a useful knowledge base over time.
The Three Types of Memory Claude Code Needs
Not all memory is the same. Before building anything, it’s worth being clear about what kind of memory you actually need.
Short-Term Session Memory
This is the working memory for a single session. It holds:
- The current file being edited
- Recent tool call results
- Decisions made earlier in this conversation
- Intermediate reasoning steps
Short-term memory doesn’t need to persist across sessions. It just needs to be structured well enough that Claude Code can refer back to it within a single working context window. This is mostly handled by how you structure your system prompt and conversation history, but a lightweight in-memory store can help when context gets long.
Long-Term Persistent Memory
This is the knowledge base that survives between sessions. It contains:
- Architectural decisions and the reasoning behind them
- Codebase conventions and patterns
- Bug postmortems and resolved issues
- Technical debt notes
- Dependency decisions
Long-term memory needs actual storage — a database, vector store, or structured file system. It also needs semantic search so Claude Code can retrieve relevant context based on meaning, not just keywords.
Scoped Team Memory
This is long-term memory with access control. Different team members (or different Claude Code instances) should have different read/write permissions. Examples:
- A junior developer’s Claude Code instance can read architectural docs but can’t overwrite them
- Security-sensitive context is only accessible to certain agent configurations
- Project-specific memory is isolated from other projects
Scoping prevents memory pollution and keeps sensitive information appropriately contained.
Building Short-Term Session Memory
For a single session, the simplest approach is a structured scratchpad that Claude Code can update as the conversation progresses.
Use a Structured System Prompt Template
Start each session with a template that reserves space for key working context:
## Current Session Context
- Active file: [FILE]
- Session goal: [GOAL]
- Decisions made this session: [empty at start]
- Open questions: [empty at start]
## Project Context
[Pulled from long-term memory — see next section]
Claude Code fills in the “decisions made” and “open questions” sections as the session progresses. One simple way to do this is an update_scratchpad tool.
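As a rough sketch of how the template above could be filled in from code, the snippet below renders the session block from a small dataclass. The SessionScratchpad class and render_session_context function are hypothetical names for illustration, not part of Claude Code.

```python
from dataclasses import dataclass, field


@dataclass
class SessionScratchpad:
    """Working notes for one session (hypothetical structure, not a Claude Code API)."""
    active_file: str
    goal: str
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def _bullets(items: list[str]) -> str:
    # Render a list as indented sub-bullets, or a placeholder when it is empty.
    if not items:
        return " [none yet]"
    return "".join(f"\n  - {item}" for item in items)


def render_session_context(pad: SessionScratchpad) -> str:
    """Fill the session template from the current scratchpad state."""
    return (
        "## Current Session Context\n"
        f"- Active file: {pad.active_file}\n"
        f"- Session goal: {pad.goal}\n"
        f"- Decisions made this session:{_bullets(pad.decisions)}\n"
        f"- Open questions:{_bullets(pad.open_questions)}\n"
    )


pad = SessionScratchpad(active_file="billing/tasks.py", goal="Add retry logic")
pad.decisions.append("Use exponential backoff, capped at 5 attempts")
print(render_session_context(pad))
```

Re-rendering the block after each update keeps the prompt in sync with the notes without replaying the whole conversation.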
Implement a Scratchpad Tool
Give Claude Code a tool that lets it update its own working notes:
# Session-scoped notes; these live only as long as the current process.
scratchpad: dict[str, str] = {}

def update_scratchpad(key: str, value: str) -> dict:
    """
    Update a field in the current session scratchpad.
    Returns the updated scratchpad state.
    """
    scratchpad[key] = value
    return {"status": "updated", "scratchpad": scratchpad}
This is simple but effective. Claude Code can now write notes to itself mid-session (“Decided to use PostgreSQL instead of SQLite because of concurrent write requirements”) and reference them later.
Track Tool Call Results
If Claude Code is running shell commands, file reads, or API calls, store the results in a rolling buffer rather than relying on the full conversation history. This keeps context focused on what’s actually relevant.
from datetime import datetime
from typing import Any

tool_call_log = []

def log_tool_result(tool_name: str, args: dict, result: Any):
    tool_call_log.append({
        "timestamp": datetime.now().isoformat(),
        "tool": tool_name,
        "args": args,
        "result": result
    })
    # Keep only the last N results to manage token usage
    if len(tool_call_log) > 20:
        tool_call_log.pop(0)
Short-term memory doesn’t need to be complicated. The goal is just to reduce repetition within a session and keep Claude Code oriented.
Building Long-Term Persistent Memory with Semantic Search
Long-term memory is where the real complexity lives. You need a storage layer, a retrieval mechanism, and a way for Claude Code to write new memories without creating noise.
Choose Your Storage Layer
There are three practical options, depending on your scale and setup:
1. A vector database (recommended for most teams)
Vector databases like Pinecone, Weaviate, Chroma, or pgvector store text as embeddings, which makes semantic search possible. You can find records that are conceptually related to a query, not just keyword-matched.
For most Claude Code setups, a locally-run option like Chroma or pgvector (if you’re already on Postgres) is the easiest starting point. Cloud-hosted vector DBs make sense when you have a team spread across machines.
2. A simple JSON or SQLite store
If your knowledge base is small and you don’t need semantic search, a flat file or SQLite database is perfectly adequate. Use this when you have fewer than a few hundred memory entries and can get by with keyword search.
3. A hybrid approach
Store structured metadata (project name, date, tags, author) in a relational database, and store the full text as embeddings in a vector database. Query both layers together for the best results.
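To make the second option concrete, here is a minimal sketch of a keyword-searchable store built on Python's standard sqlite3 module. The table layout and the store_note and search_notes helpers are assumptions for illustration. It uses an in-memory database so it runs as-is, and a file path would be needed for real persistence.

```python
import sqlite3

# In-memory database for illustration; pass a file path to persist between sessions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        project TEXT NOT NULL,
        content TEXT NOT NULL,
        tags TEXT NOT NULL DEFAULT '',
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def store_note(project: str, content: str, tags: str = "") -> int:
    """Insert one memory entry and return its row id."""
    cur = conn.execute(
        "INSERT INTO memories (project, content, tags) VALUES (?, ?, ?)",
        (project, content, tags),
    )
    conn.commit()
    return cur.lastrowid


def search_notes(project: str, keyword: str, limit: int = 5) -> list[str]:
    """Substring match on content and tags within one project, newest first."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE project = ? AND (content LIKE ? OR tags LIKE ?) "
        "ORDER BY id DESC LIMIT ?",
        (project, pattern, pattern, limit),
    ).fetchall()
    return [row[0] for row in rows]


store_note("payments-service", "Use Celery for background jobs", "architecture,celery")
store_note("payments-service", "Pin Postgres to version 15", "database")
print(search_notes("payments-service", "celery"))
```

The same table could later serve as the metadata half of the hybrid approach, with the embeddings kept in a separate vector store.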
Set Up the Embedding Pipeline
To do semantic search, you need to convert text to embeddings. Here’s a minimal setup using OpenAI’s embedding API (you can swap in any embedding model):
import uuid

import chromadb
from openai import OpenAI

client = OpenAI()
# PersistentClient writes to disk so memories survive restarts; the path is an example.
chroma_client = chromadb.PersistentClient(path="./claude_memory")
collection = chroma_client.get_or_create_collection(
    "claude_memory",
    metadata={"hnsw:space": "cosine"},  # cosine distance instead of the default L2
)

def embed(text: str) -> list[float]:
    """Convert text to an embedding vector."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def store_memory(content: str, metadata: dict):
    """
    Store a memory entry with its embedding.
    metadata should include: source, session_id, project, timestamp, tags
    """
    collection.add(
        documents=[content],
        embeddings=[embed(content)],
        metadatas=[metadata],
        ids=[str(uuid.uuid4())]  # random unique id per entry
    )

def retrieve_memories(query: str, n_results: int = 5, filter: dict | None = None):
    """
    Retrieve the most semantically relevant memories for a query.
    """
    results = collection.query(
        query_embeddings=[embed(query)],
        n_results=n_results,
        where=filter  # metadata filter, used for scoped access
    )
    return results
Give Claude Code Memory Read/Write Tools
The memory system only works if Claude Code can actually use it. Give it two core tools:
remember() — Write a new memory:
from datetime import datetime

def remember(content: str, tags: list[str], project: str, importance: str = "normal"):
    """
    Store something worth remembering for future sessions.
    Use this when a decision is made, a pattern is established, or a significant finding is reached.
    """
    # current_session_id and current_user are assumed to be set when the session starts
    metadata = {
        "project": project,
        "tags": ",".join(tags),  # stored as one string because metadata values must be scalars
        "importance": importance,
        "session_id": current_session_id,
        "timestamp": datetime.now().isoformat(),
        "author": current_user
    }
    store_memory(content, metadata)
    return {"status": "stored", "preview": content[:100]}
recall() — Query the memory store:
def recall(query: str, project: str = None, limit: int = 5):
    """
    Search long-term memory for relevant context.
    Call this at the start of a session or when hitting a decision point.
    """
    filter = {"project": project} if project else None
    results = retrieve_memories(query, n_results=limit, filter=filter)
    # Results are nested per query; index 0 selects the single query we sent
    return {
        "memories": results["documents"][0],
        "metadata": results["metadatas"][0]
    }
Inject Memory at Session Start
The most reliable pattern is automatic memory injection. When a new Claude Code session starts, run a retrieval query based on the session’s stated goal, and inject the top results into the system prompt:
def build_session_context(goal: str, project: str) -> str:
    relevant_memories = recall(query=goal, project=project, limit=8)
    # recall() returns parallel lists, so pair each document with its metadata
    memory_block = "\n".join(
        f"- [{meta['timestamp'][:10]}] {doc}"
        for doc, meta in zip(relevant_memories["memories"], relevant_memories["metadata"])
    )
    return f"""
## Relevant Project Context (from memory)
{memory_block}

## Current Session Goal
{goal}
"""
This gives Claude Code a head start without requiring you to re-explain your project every time.
Cited Sources and Audit Trails
One of the most overlooked parts of a memory system is provenance. When Claude Code retrieves a memory and uses it to make a decision, you want to know:
- When was this memory created?
- Who or what created it?
- What session or conversation is it from?
- Has it been revised?
Without this, your memory store becomes an unverifiable black box. A bad memory could silently influence decisions without anyone knowing where it came from.
Always Store Source Metadata
Every memory entry should include at minimum:
{
    "content": "We decided to use Celery for background task processing because Redis pub/sub latency was inconsistent under load.",
    "source": "architecture_review_session",
    "session_id": "sess_abc123",
    "author": "claude-code",
    "human_confirmed": False,
    "timestamp": "2025-01-15T14:30:00Z",
    "project": "payments-service",
    "tags": ["architecture", "background-jobs", "celery"]
}
The human_confirmed field is important. By default, memories written by Claude Code are unconfirmed. A human developer can review and mark them as confirmed, giving them higher trust in future retrievals.
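One possible way to act on that flag at retrieval time is to re-rank candidates so unconfirmed entries carry a small penalty. The sketch below assumes each candidate is a plain dict with a distance (lower means more similar) and a human_confirmed flag. The rank_by_trust name, the sample entries, and the 0.1 penalty are illustrative choices, not values from any library.

```python
def rank_by_trust(candidates: list[dict], unconfirmed_penalty: float = 0.1) -> list[dict]:
    """
    Re-rank retrieved memories so human-confirmed entries get a modest boost.

    Each candidate has a "distance" (lower = more similar) and a
    "human_confirmed" flag. The penalty value is an illustrative assumption.
    """
    def adjusted(candidate: dict) -> float:
        penalty = 0.0 if candidate.get("human_confirmed") else unconfirmed_penalty
        return candidate["distance"] + penalty

    return sorted(candidates, key=adjusted)


# Made-up sample data for the demo.
candidates = [
    {"content": "Use RQ for background jobs", "distance": 0.20, "human_confirmed": False},
    {"content": "Use Celery for background jobs", "distance": 0.25, "human_confirmed": True},
    {"content": "Deploys are frozen on Fridays", "distance": 0.60, "human_confirmed": True},
]
for candidate in rank_by_trust(candidates):
    print(candidate["content"])
```

A small penalty lets a confirmed memory win a near tie without burying a clearly more relevant unconfirmed one.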
Build a Memory Review Interface
Consider building a simple UI or CLI tool that lets developers review, edit, and delete memories. This doesn’t need to be fancy — even a script that prints memories grouped by project and lets you confirm or reject them works well.
$ python memory_review.py --project payments-service --unconfirmed
[2025-01-15] (Unconfirmed)
"We decided to use Celery for background task processing because Redis pub/sub latency was inconsistent under load."
Tags: architecture, background-jobs, celery
Source: session sess_abc123
[c]onfirm / [e]dit / [d]elete / [s]kip:
This review loop keeps your memory store clean and trustworthy.
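The non-interactive core of such a review script might look like the sketch below: one function to collect unconfirmed entries per project and one to apply a reviewer's choice. The function names, the id field, and the sample records are hypothetical, and the prompt loop and storage calls are left out.

```python
from collections import defaultdict


def unconfirmed_by_project(memories: list[dict]) -> dict[str, list[dict]]:
    """Group entries that still need human review by project name."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for memory in memories:
        if not memory.get("human_confirmed", False):
            grouped[memory["project"]].append(memory)
    return dict(grouped)


def apply_review(memories: list[dict], memory_id: str, action: str) -> list[dict]:
    """Apply 'confirm' or 'delete' to one entry; any other action leaves the list unchanged."""
    if action == "delete":
        return [m for m in memories if m["id"] != memory_id]
    if action == "confirm":
        return [
            {**m, "human_confirmed": True} if m["id"] == memory_id else m
            for m in memories
        ]
    return memories


# Made-up sample records for the demo.
memories = [
    {"id": "m1", "project": "payments-service", "content": "Use Celery", "human_confirmed": False},
    {"id": "m2", "project": "payments-service", "content": "Pin Postgres 15", "human_confirmed": True},
]
print(unconfirmed_by_project(memories))
memories = apply_review(memories, "m1", "confirm")
print(unconfirmed_by_project(memories))
```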
Log Every Memory Access
For auditability, log every recall() call alongside the session context in which it was used. This lets you trace why Claude Code made a particular decision:
Session sess_xyz456 recalled:
- "Use Celery for background jobs" (retrieved for query: "task queue approach")
- "Avoid Celery Beat for scheduling — use APScheduler" (retrieved for query: "task scheduling")
Decision output: Recommended Celery + APScheduler combination
This kind of audit trail is valuable for debugging, compliance, and maintaining trust in your AI-assisted development process. Anthropic’s published guidance for Claude stresses transparency and human oversight, and your memory system should reflect that.
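A lightweight way to capture a log like the one above is to wrap the recall function so each call records its query and results. This is only a sketch: make_audited_recall and fake_recall are hypothetical names, and the log is an in-memory list of JSON lines where a real setup would append to a file or database.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def make_audited_recall(
    recall_fn: Callable[[str], list[str]], session_id: str, log: list[str]
) -> Callable[[str], list[str]]:
    """Wrap a recall function so every call appends one JSON line to `log`."""
    def audited(query: str) -> list[str]:
        memories = recall_fn(query)
        log.append(json.dumps({
            "session_id": session_id,
            "query": query,
            "recalled": memories,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }))
        return memories
    return audited


# Stand-in for the real recall(); returns canned results for the demo.
def fake_recall(query: str) -> list[str]:
    return ["Use Celery for background jobs"] if "queue" in query else []


audit_log: list[str] = []
recall_audited = make_audited_recall(fake_recall, "sess_xyz456", audit_log)
recall_audited("task queue approach")
print(audit_log[0])
```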
Implementing Team-Scoped and Role-Scoped Access
Once more than one person is using Claude Code on a shared codebase, you need to think about who can read and write what.
Define Your Scope Hierarchy
A practical three-level scope hierarchy:
- Global — Available to all team members across all projects (e.g., company-wide coding standards)
- Project — Available to all team members on a specific project
- Personal — Private to a specific developer’s Claude Code instance
Each memory entry should have a scope field, and retrieval should filter by the appropriate scope for the current user and project.
Implement Scope-Based Filtering
def recall_scoped(query: str, user: str, project: str, limit: int = 8):
    """
    Retrieve memories from all applicable scopes for this user and project.
    """
    # Build a combined result from all relevant scopes
    scopes = ["global", f"project:{project}", f"personal:{user}"]
    all_results = []
    for scope in scopes:
        # Ask each scope for up to `limit` hits so one strong scope is not starved
        results = retrieve_memories(
            query=query,
            n_results=limit,
            filter={"scope": scope}
        )
        all_results.extend(zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        ))
    # Chroma returns distances (lower = closer), so sort ascending and keep the best
    all_results.sort(key=lambda item: item[2])
    return [(doc, meta) for doc, meta, _ in all_results[:limit]]
Set Write Permissions Per Scope
Not everyone should be able to write to global or project-scoped memory. Enforce write permissions at the tool level:
def remember_scoped(content: str, scope: str, user: str, project: str, tags: list[str]):
    """
    Write a memory to the appropriate scope, if permitted.
    """
    write_permissions = {
        "global": ["admin", "senior-engineer"],
        "project": ["team-member", "senior-engineer", "admin"],
        "personal": ["any"]
    }
    user_role = get_user_role(user)
    # "project:payments" -> "project"; unknown scope types get no writers
    allowed_roles = write_permissions.get(scope.split(":")[0], [])
    if "any" not in allowed_roles and user_role not in allowed_roles:
        return {"error": f"User {user} does not have permission to write to {scope} scope"}
    metadata = {
        "scope": scope,
        "project": project,
        "author": user,
        "tags": ",".join(tags),
        "timestamp": datetime.now().isoformat()
    }
    store_memory(content, metadata)
    return {"status": "stored"}
Handle Memory Conflicts
When multiple team members are writing memories for the same project, conflicts happen. The same decision might be recorded differently by different people.
A simple conflict resolution strategy:
- Recency wins by default — newer memories appear first in retrieval results
- Human-confirmed memories rank higher than agent-written ones
- Tag duplicates when detected and surface them for human review
You can detect potential duplicates by checking cosine similarity between new memories and existing ones in the same scope:
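def check_for_duplicates(new_embedding, scope: str, threshold: float = 0.92):
    # Assumes the collection uses cosine distance (hnsw:space set to "cosine"),
    # where Chroma reports distance = 1 - cosine similarity.
    results = collection.query(
        query_embeddings=[new_embedding],
        n_results=3,
        where={"scope": scope}
    )
    for distance in results["distances"][0]:
        if 1 - distance > threshold:
            return True
    return False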
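To show what the 0.92 threshold is measuring, here is a small standalone sketch of cosine similarity on toy vectors. Real embeddings have hundreds or thousands of dimensions. The three-element vectors and the is_duplicate helper are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_duplicate(a: list[float], b: list[float], threshold: float = 0.92) -> bool:
    """Treat two embeddings as duplicates when their similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


existing = [1.0, 0.0, 1.0]
near_copy = [0.9, 0.1, 1.0]   # points in almost the same direction
unrelated = [0.0, 1.0, 0.0]   # orthogonal, so similarity is 0

print(is_duplicate(existing, near_copy))
print(is_duplicate(existing, unrelated))
```

The right threshold depends on the embedding model, so it is worth checking a handful of known duplicates and known distinct entries before settling on a value.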
Common Mistakes and How to Avoid Them
Writing Too Much to Memory
Not everything is worth storing. If Claude Code writes a memory for every minor observation, the store fills with noise and retrieval quality drops.
Set clear guidelines in your system prompt about when to use remember():
Use remember() ONLY when:
- A significant architectural or technical decision is made
- A non-obvious pattern is established
- A recurring error is diagnosed and resolved
- A team convention is explicitly agreed upon
Do NOT use remember() for:
- Routine observations
- Temporary workarounds
- Information already documented in the codebase
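Prompt guidelines can be backed by a cheap check in code before anything is written. The sketch below rejects entries that are very short or lean on phrases that only make sense mid-conversation. The validate_memory name, the word minimum, and the phrase list are assumptions to tune for your own team.

```python
# Phrases that usually mean the note depends on conversation context (illustrative list).
VAGUE_PHRASES = ("fixed it", "works now", "see above", "as discussed")


def validate_memory(content: str, min_words: int = 8) -> tuple[bool, str]:
    """Return (accepted, reason) for a candidate memory before it is stored."""
    text = content.strip()
    if len(text.split()) < min_words:
        return False, "too short to be useful out of context"
    lowered = text.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            return False, f"vague phrase: {phrase!r}"
    return True, "ok"


print(validate_memory("Fixed it"))
print(validate_memory(
    "We chose PostgreSQL over SQLite because the service needs concurrent writes"
))
```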
Not Cleaning Up Stale Memories
A memory about a library you deprecated two years ago can actively mislead Claude Code today. Set up a periodic review process — even monthly — to flag memories older than a certain threshold for human review.
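from datetime import datetime, timedelta

def find_stale_memories(project: str, days_threshold: int = 180):
    # Chroma's $lt only compares numbers, so this assumes each entry also stores
    # a numeric "created_at" field (Unix seconds) next to the ISO timestamp.
    cutoff = (datetime.now() - timedelta(days=days_threshold)).timestamp()
    return collection.get(
        where={"$and": [{"project": project}, {"created_at": {"$lt": cutoff}}]}
    )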
Ignoring Context Window Limits
Injecting too many memories at session start can eat up your context budget. Keep injected memory concise — prefer summaries over full verbatim entries when retrieval results are long.
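One simple way to enforce a budget is to walk the ranked memories and stop before the total gets too large. The sketch below uses a rough four-characters-per-token estimate, which is only a heuristic for English text. A real tokenizer would be more accurate. The function names and the 800-token default are assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text (heuristic)."""
    return max(1, len(text) // 4)


def fit_to_budget(memories: list[str], max_tokens: int = 800) -> list[str]:
    """Keep memories in ranked order until the next one would exceed the budget."""
    selected: list[str] = []
    used = 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > max_tokens:
            break
        selected.append(memory)
        used += cost
    return selected


ranked = ["a" * 400, "b" * 400, "c" * 400]  # about 100 tokens each
print(len(fit_to_budget(ranked, max_tokens=250)))
```

Stopping at the first entry that does not fit preserves ranking order, at the cost of sometimes leaving a little budget unused.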
Not Testing Retrieval Quality
Retrieval only works if the right memories come back for the right queries. Test your system regularly by running sample queries and checking whether the returned memories are genuinely relevant. Adjust your embedding model or chunking strategy if retrieval quality is poor.
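A regular check can be as small as a list of sample queries paired with the memory each one should surface, scored as the fraction that show up in the top results. In the sketch below, recall_at_k is a hypothetical helper and toy_retrieve is a word-overlap stand-in for the real vector search, so the example runs without a database.

```python
from typing import Callable


def recall_at_k(
    retrieve: Callable[[str, int], list[str]],
    cases: list[tuple[str, str]],
    k: int = 5,
) -> float:
    """Fraction of test queries whose expected memory id appears in the top-k results."""
    hits = sum(1 for query, expected_id in cases if expected_id in retrieve(query, k))
    return hits / len(cases)


# Toy corpus and retriever standing in for the real store.
CORPUS = {
    "mem_celery": "use celery for background task processing",
    "mem_postgres": "use postgresql because of concurrent write requirements",
}


def toy_retrieve(query: str, k: int) -> list[str]:
    """Rank memory ids by how many words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda memory_id: len(words & set(CORPUS[memory_id].split())),
        reverse=True,
    )
    return ranked[:k]


cases = [
    ("background task queue", "mem_celery"),
    ("concurrent write database", "mem_postgres"),
]
print(recall_at_k(toy_retrieve, cases, k=1))
```

Tracking this number over time makes it easier to tell whether a change to the embedding model or chunking strategy actually helped.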
How MindStudio Fits Into This Architecture
Building and maintaining a memory system from scratch is a fair amount of work. If you want to skip the infrastructure layer and connect Claude Code to memory, retrieval, and workflow tools without managing your own vector database setup, MindStudio’s Agent Skills Plugin is worth looking at.
The plugin (@mindstudio-ai/agent) is an npm SDK that lets any AI agent — including Claude Code — call MindStudio’s typed capabilities as simple method calls. Instead of wiring up your own embedding pipeline, storage layer, and retrieval logic, you call methods like agent.searchKnowledgeBase() or agent.runWorkflow() directly from your agent code.
The infrastructure concerns — rate limiting, retries, auth — are handled for you. Claude Code can focus on reasoning about what to remember and when to recall, rather than managing the plumbing.
For teams that want to go further, MindStudio also lets you build full multi-agent workflows where Claude Code is one node in a larger system — reading from a shared memory store, passing context to other agents, and writing outputs back to a central knowledge base. This is particularly useful when you have multiple agents working on the same codebase and need a single source of truth for shared context.
You can try MindStudio free at mindstudio.ai.
Frequently Asked Questions
Does Claude Code have any built-in memory?
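Only in a limited form. Claude Code loads instruction files such as CLAUDE.md at the start of each session, which works well for a small set of hand-maintained project notes. It does not automatically carry conversation state, decisions, or tool results from one session to the next, and it offers no semantic search over past context. Anthropic’s Claude models support a large context window, but that context is not saved automatically, so anything beyond those static files needs a persistence layer you build yourself.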
What’s the best vector database for a Claude Code memory system?
For local or small-team setups, Chroma is the easiest to get started with — it runs in-process with no infrastructure setup. For production or team-wide deployments, pgvector (Postgres extension) is a solid choice if you’re already using Postgres, and Pinecone or Weaviate work well for fully managed cloud setups. The right choice depends more on your existing stack than on any inherent performance difference.
How do I prevent Claude Code from writing bad memories?
The most effective approach is a human confirmation step. Memories written by Claude Code are flagged as unconfirmed by default and ranked lower in retrieval. A developer reviews and confirms them before they’re treated as authoritative. You can also add a validation step that checks new memories for obvious issues (very short, very vague, duplicates of existing entries) before they’re stored.
How much memory context should I inject at the start of each session?
This depends on your model’s context window and the complexity of your session goal. A practical starting point is 6–10 memories, summarized to 2–3 sentences each, injected at the top of the system prompt. That typically comes to roughly 500–800 tokens, enough to provide meaningful starting context without crowding out the task. Expand this if you’re working on a particularly complex task that requires more background.
Can multiple Claude Code instances share the same memory?
Yes, and this is one of the most valuable setups for teams. If you use a shared vector database and scope memories by project, multiple Claude Code instances (running for different developers or different tasks) can all read from and write to the same knowledge base. The key is implementing proper write permissions so one instance doesn’t overwrite another’s memories without review.
How do I handle memory about sensitive information like API keys or credentials?
Never store secrets in the memory system. Use environment variables and secrets managers for credentials. If a memory needs to reference a secret, store a reference to where the secret is stored (e.g., “Production DB password is in AWS Secrets Manager under /prod/db/password”) rather than the secret itself. Consider encrypting memory stores that contain security-sensitive architectural context.
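As a last line of defense, candidate memories can be screened for secret-looking strings before they are written. The sketch below is a heuristic under stated assumptions: the three patterns are illustrative and far from exhaustive, and contains_secret is a hypothetical helper, so it complements a proper secret scanner but does not replace one.

```python
import re

# Illustrative patterns only; real secret scanners cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key id shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
]


def contains_secret(text: str) -> bool:
    """Return True if the text matches any known secret-like pattern."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


print(contains_secret("api_key = sk_live_12345"))
print(contains_secret(
    "Production DB password is in AWS Secrets Manager under /prod/db/password"
))
```

Note that the second string, a reference to where the secret lives, passes the check, which matches the storage advice above.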
Key Takeaways
- Claude Code does not carry conversation state between sessions, so decisions and context are lost unless you build a memory layer
- Short-term memory handles working context within a session; long-term memory survives between sessions
- Semantic search via embeddings is the right retrieval mechanism for a knowledge base that grows over time
- Every memory entry should carry source metadata and timestamps so you can audit why Claude Code made a given decision
- Team memory needs scope-based access control — global, project, and personal tiers with clear write permissions
- Keep your memory store clean by reviewing stale entries and setting clear guidelines for what’s worth remembering
If you want to build this kind of system without managing the infrastructure yourself, MindStudio gives Claude Code access to knowledge retrieval, workflow tools, and shared memory through a simple SDK — no database setup required. Start free and add the memory layer your AI development workflow actually needs.
