How to Set Up Claude Dreaming for Persistent Agent Memory: Step-by-Step Guide
Claude's Dreaming feature auto-reviews sessions and pre-loads patterns for future runs. Here's how to configure it for your agent workflows.
Your Agent Forgets Everything When the Session Ends — Here’s How to Fix That in Under an Hour
Every agent you’ve built has the same problem. It finishes a task, does decent work, and then the session closes. Next time it runs, it starts from zero. Same mistakes, same suboptimal paths, same rediscovery of things it already figured out last Tuesday.
You’ve probably worked around this manually — writing notes into a system prompt, maintaining a memory.md file, or just accepting that your agent is perpetually amnesiac. None of those solutions scale. And none of them run automatically between sessions while you’re doing something else.
Anthropic’s answer to this is called Dreaming: a scheduled process that reviews your agent sessions and memory stores, extracts patterns, and curates memories between sessions so your agents improve over time. It shipped as part of the managed agents platform update at Code with Claude 2026. This guide walks through how to configure it for your own agent workflows, what it actually does under the hood, and where it breaks.
What Persistent Memory Actually Gets You
Before touching any configuration, be precise about what you’re solving. There are three distinct memory problems in agent systems, and Dreaming addresses a specific one.
Problem 1: In-session context. The agent forgets what it did three tool calls ago. This is a context window problem, not a memory problem. Dreaming doesn’t help here.
Problem 2: Cross-session state. The agent finishes a task, you close the session, and next time it has no idea what happened before. This is what most people mean when they say “my agent forgets everything.” Dreaming addresses this directly.
Problem 3: Emergent pattern recognition. Across many sessions, the agent keeps making the same class of mistake, or keeps converging on the same workaround, but no single session has enough signal to surface it. This is the harder problem — and it’s the one Dreaming is actually most interesting for.
The concrete outcome Anthropic describes: Dreaming “surfaces patterns that a single agent can’t see on its own, including recurring mistakes, workflows that agents converge on, and preferences shared across a team.” It also “restructures memory so it stays high signal as it evolves.” That second part matters. Naive memory systems accumulate noise. Dreaming is supposed to curate, not just append.
Think of it as the difference between a developer who keeps a running notes file versus one who periodically reviews their notes, discards the stale ones, and writes a clean summary. The first approach degrades over time. The second one compounds.
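That contrast can be made concrete. The sketch below is illustrative only — Dreaming's actual store format isn't documented here — but it shows why keyed, curated memory stays reliable while append-only memory degrades:

```python
def append_only(store: list[str], note: str) -> list[str]:
    # Naive approach: every note is kept forever, including stale ones.
    return store + [note]

def curated(store: dict[str, str], key: str, note: str) -> dict[str, str]:
    # Curated approach: notes are keyed by topic, so a fresh observation
    # about a topic replaces the stale one instead of piling up beside it.
    updated = dict(store)
    updated[key] = note
    return updated

history = append_only([], "API returns null for missing fields")
history = append_only(history, "API now returns empty string for missing fields")
# Both entries survive -- and they contradict each other.

memory = curated({}, "api.missing_fields", "returns null for missing fields")
memory = curated(memory, "api.missing_fields", "returns empty string for missing fields")
# Only the current fact survives.
```

The append-only store now holds two contradictory facts; the curated store holds one current one. That divergence is exactly what compounds over dozens of sessions.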
For multi-agent systems specifically — where a lead agent delegates to specialist sub-agents, each with their own model, prompts, and tools — Dreaming becomes the mechanism by which sub-agent learnings propagate back into the orchestration layer. A sub-agent that discovers a better approach to parsing a particular data format can encode that learning so the next invocation of that sub-agent starts with it pre-loaded.
What You Need Before You Start
Managed agents access. Dreaming is part of Anthropic’s managed agents platform, which launched in April with sandbox, state management, and error recovery. You need a Claude account with managed agents enabled. As of the Code with Claude event, this is available on Pro, Max, Team, and Enterprise plans.
An existing agent workflow. Dreaming works on sessions that have already happened. If you’re starting from scratch, you need at least a few completed sessions before the scheduled review has anything to analyze. Build your agent first, run it a handful of times, then configure Dreaming.
A clear sense of what “good” looks like for your task. This is the part most people skip. Dreaming extracts patterns from sessions, but it needs signal about which patterns are worth encoding. If you haven’t thought about what successful session behavior looks like, you’ll get memory that’s just a statistical average of everything your agent did — including the bad runs.
Optional but useful: familiarity with the Outcomes feature. Dreaming and Outcomes are designed to work together. Outcomes lets you write a rubric for task success; a separate grading agent scores output against that rubric. The grading signal feeds into what Dreaming treats as “good” sessions worth learning from. Anthropic’s internal benchmarks showed an 8.4% improvement in Word document quality and 10.1% improvement in PowerPoint quality just from adding the grading agent — no model change, no prompt rewrite. If you’re not using Outcomes yet, Dreaming will still work, but you’re leaving signal on the table.
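To make the Outcomes idea concrete, here is a minimal sketch of what a weighted rubric and session grade could look like. The structure and names (`RubricCriterion`, `weighted_score`) are assumptions for illustration — the real Outcomes rubric format may differ:

```python
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    name: str
    description: str
    weight: float  # relative importance; weights should sum to 1.0

# Hypothetical rubric for a report-generation agent.
rubric = [
    RubricCriterion("accuracy", "All figures match the source data", 0.5),
    RubricCriterion("formatting", "Output follows the requested template", 0.3),
    RubricCriterion("completeness", "Every requested section is present", 0.2),
]

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0.0-1.0) into one session grade."""
    return sum(c.weight * scores[c.name] for c in rubric)

grade = weighted_score({"accuracy": 1.0, "formatting": 0.5, "completeness": 1.0})
# grade == 0.85: above a 0.8 threshold, this session counts as "good".
```

The point of the shape: a single scalar grade per session is enough signal for a downstream process to separate sessions worth learning from out of the rest.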
If you want to understand the broader memory architecture before configuring Dreaming, the Claude Code source leak analysis covering the three-layer memory system is worth reading — it explains how memory.md functions as a pointer index, which is directly relevant to how Dreaming writes and reads persistent state.
Configuring Dreaming: Step by Step
Step 1: Enable managed agents and verify session logging
Navigate to Claude Console (console.anthropic.com). Under your agent’s settings, confirm that session logging is enabled. Dreaming can only review sessions that were logged. If you’ve been running agents with logging off, those sessions are gone — Dreaming can’t retroactively analyze them.
Once logging is on, run your agent through at least 3–5 representative tasks. Don’t cherry-pick easy ones. You want the session log to include the full range of what your agent encounters, including the cases where it struggles or takes a suboptimal path.
Check: You should see session records in Claude Console with full tool call traces and reasoning steps. If you see only input/output pairs with no intermediate steps, logging is incomplete.
Step 2: Define your memory schema
This is the step most tutorials skip, and it’s where most Dreaming setups fail quietly.
Dreaming extracts patterns and writes them to your agent’s memory store. But “patterns” is vague. You need to tell the system what categories of information are worth remembering. Think of this as a schema for your memory store.
For a report generation agent, your schema might include:
- Data source quirks (e.g., “this API returns null for missing fields, not empty string”)
- User preferences that have been stated or inferred
- Formatting decisions that were accepted vs. revised
- Recurring error patterns and their resolutions
Write this schema as a structured prompt that gets prepended to the Dreaming review process. In Claude Console, this goes in the “Memory extraction instructions” field under your agent’s Dreaming configuration.
Be specific. “Remember useful things” is not a schema. “Extract any data formatting decisions the agent made that the user did not ask to revise, and store them as user preferences” is a schema.
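One way to keep a schema specific is to maintain it as structured data and generate the instruction text from it. The helper below is a sketch under assumptions — the “Memory extraction instructions” field name comes from this guide, but the generated format is mine:

```python
# Hypothetical schema for a report-generation agent: category name -> rule.
SCHEMA = {
    "data_source_quirks": "API behaviors that differ from their docs, "
                          "e.g. null vs. empty string for missing fields",
    "user_preferences": "Formatting or style decisions the user stated, "
                        "or that the agent made and the user never revised",
    "error_resolutions": "Recurring errors and the fix that actually worked",
}

def extraction_instructions(schema: dict[str, str]) -> str:
    """Render the schema as text for the memory extraction field."""
    lines = ["Extract memories ONLY in these categories:"]
    for name, rule in schema.items():
        lines.append(f"- {name}: {rule}")
    lines.append("Discard anything that does not fit a category above.")
    return "\n".join(lines)

print(extraction_instructions(SCHEMA))
```

Keeping the schema in code also gives you a natural place to version it, which matters once you start iterating on it in Step 5.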
Check: After your first Dreaming run, inspect the memory store output. If it’s a wall of undifferentiated text, your schema is too loose. If it’s empty, your instructions are too restrictive.
Step 3: Set the Dreaming schedule
Dreaming is a scheduled process. You configure when it runs — not on-demand during a session, but between sessions. The right cadence depends on your agent’s usage pattern.
For agents that run multiple times per day: daily Dreaming review is appropriate. The memory store will update frequently, and you’ll see compounding improvement relatively quickly.
For agents that run weekly or less: match the Dreaming schedule to your run cadence, or run it manually after each session batch. Running Dreaming on an empty session log wastes compute and can introduce noise if the system tries to extract patterns from insufficient data.
In Claude Console, set the schedule under “Dreaming cadence.” Options are hourly, daily, weekly, or manual trigger. For most production workflows, daily is the right default.
Check: Confirm the next scheduled run timestamp is visible in your agent’s configuration panel. If it shows “never,” the schedule didn’t save.
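The cadence rule of thumb above can be written down directly. The thresholds here are my assumptions, not official guidance — the point is to make the decision mechanical rather than ad hoc:

```python
def pick_cadence(runs_per_week: float) -> str:
    """Map agent usage frequency to a Dreaming cadence setting."""
    if runs_per_week >= 7:    # multiple runs per day on average
        return "daily"
    if runs_per_week >= 1:    # at least weekly
        return "weekly"
    return "manual"           # too sparse to schedule; trigger by hand

cadence = pick_cadence(20)   # a busy production agent -> "daily"
```

Revisit the choice when usage changes: an agent that drops from daily runs to occasional ones should drop to manual triggering for the reasons described above.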
Step 4: Configure memory persistence and retrieval
Dreaming writes to a memory store. Your agent needs to be configured to read from that store at session start. These are two separate settings, and it’s easy to configure one without the other.
In your agent’s system prompt or managed agent configuration, add a memory retrieval step at the top of the session initialization. The pattern looks like this:
At the start of each session, load the current memory store and treat its contents as established context. Do not re-derive information that is already encoded in memory. If memory contains a preference or learned behavior, apply it without asking the user to restate it.
The key phrase is “do not re-derive.” Without this instruction, agents will often ignore memory and start fresh anyway — not because the memory isn’t loaded, but because the agent’s default behavior is to reason from first principles rather than from stored state.
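Mechanically, the retrieval side amounts to prepending the memory store to the system prompt at session start. This sketch assumes a simple key/value store layout (the real format isn’t documented here); the “do not re-derive” phrasing follows the pattern quoted above:

```python
def build_system_prompt(base_prompt: str, memories: dict[str, str]) -> str:
    """Prepend stored memories to the agent's system prompt."""
    if not memories:
        return base_prompt
    memory_block = "\n".join(f"- [{key}] {value}" for key, value in memories.items())
    return (
        f"{base_prompt}\n\n"
        "Established memory (treat as known context; do NOT re-derive or "
        "ask the user to restate anything below):\n"
        f"{memory_block}"
    )

prompt = build_system_prompt(
    "You are a report-generation agent.",
    {"user_preferences.format": "prose sections, no bullet points"},
)
```

Note the empty-store branch: a session with no memory should get the unmodified base prompt, not an empty “Established memory” header that invites the model to invent one.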
For multi-agent setups where a lead agent delegates to sub-agents, you need to decide whether sub-agents get access to the full memory store or a filtered view. Full access is simpler to configure but can cause sub-agents to act on memory that’s irrelevant to their specific task. Filtered views require more setup but produce cleaner behavior. Start with full access, then narrow it once you see what sub-agents are actually using.
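A filtered view can be as simple as namespacing memory keys by task domain and giving each sub-agent only its prefixes. The namespaced-key convention is an assumption of this sketch, not Dreaming’s documented store format:

```python
MEMORY = {
    "parsing.csv.delimiter": "vendor exports use semicolons, not commas",
    "parsing.dates": "timestamps are local time without a timezone",
    "reporting.style": "user prefers the executive summary first",
}

def memory_view(store: dict[str, str], prefixes: tuple[str, ...]) -> dict[str, str]:
    """Return only memories whose keys fall under the given namespaces."""
    return {k: v for k, v in store.items() if k.startswith(prefixes)}

parser_view = memory_view(MEMORY, ("parsing.",))
# The parsing sub-agent never sees reporting preferences it can't act on.
```

Starting with full access and narrowing later, as suggested above, then becomes a one-line change: pass a wider or narrower prefix tuple per sub-agent.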
Check: Run a session, then run another session. In the second session, ask the agent to recall something specific from the first. If it can’t, memory retrieval isn’t working. If it can, you’re done with this step.
Step 5: Validate the memory quality
After your first Dreaming cycle completes, read the memory store output. Actually read it. This is not optional.
You’re looking for three things:
- Signal-to-noise ratio. Is the memory store full of specific, actionable information? Or is it vague summaries that could apply to any session?
- Accuracy. Does the stored memory accurately reflect what happened in the sessions? Dreaming can hallucinate patterns that weren’t really there, especially with small session samples.
- Staleness handling. If you’ve updated your workflow or the user’s preferences have changed, is the old memory getting overwritten or just accumulated alongside the new? Accumulation without curation is how memory stores become unreliable.
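Part of this review can be automated. The heuristics below are a rough sanity pass — not a substitute for reading the store — that catches two mechanical failure modes: entries too vague to act on, and multiple entries competing for the same key, which suggests accumulation without curation. The marker list is an assumption you should tune for your own store:

```python
VAGUE_MARKERS = ("various", "several things", "in general", "etc.")

def flag_vague(store: dict[str, str]) -> list[str]:
    """Flag entries that are too short or hedged to be actionable."""
    return [k for k, v in store.items()
            if len(v) < 15 or any(m in v.lower() for m in VAGUE_MARKERS)]

def flag_duplicate_topics(entries: list[tuple[str, str]]) -> set[str]:
    """Keys appearing more than once suggest stale entries were never pruned."""
    seen, dupes = set(), set()
    for key, _ in entries:
        (dupes if key in seen else seen).add(key)
    return dupes

vague = flag_vague({
    "pref": "the user likes various formatting choices in general",
    "api": "returns null for missing fields, not empty string",
})
dupes = flag_duplicate_topics([("api", "null"), ("api", "empty"), ("style", "prose")])
```

Anything these checks flag is a candidate for tightening the Step 2 schema, which is the actual fix.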
If the memory quality is poor, go back to Step 2 and tighten your schema. This is an iterative process. The first Dreaming cycle is almost never the best one.
Check: You should be able to read the memory store and immediately understand what the agent has learned. If you can’t, the agent can’t either.
When Dreaming Breaks (And Why)
The agent ignores memory. Most common failure mode. Usually caused by memory retrieval not being explicitly instructed in the system prompt. The memory is loaded but the agent doesn’t treat it as authoritative. Fix: add explicit instructions to apply memory without re-deriving.
Memory accumulates noise. Dreaming runs on all sessions, including bad ones. If you don’t have Outcomes configured, there’s no quality filter on which sessions are worth learning from. The memory store ends up encoding mistakes as learned behaviors. Fix: configure Outcomes with a rubric before relying on Dreaming for quality-sensitive tasks.
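The quality gate described here reduces to a threshold filter between the session log and the Dreaming review. The session and grade shapes below are assumptions for illustration, not the platform’s API:

```python
def sessions_worth_learning_from(
    sessions: list[dict], threshold: float = 0.8
) -> list[dict]:
    """Keep only sessions the grading agent scored at or above threshold."""
    return [s for s in sessions if s.get("grade", 0.0) >= threshold]

batch = [
    {"id": "s1", "grade": 0.92},
    {"id": "s2", "grade": 0.41},   # a bad run: excluded, so its mistakes
    {"id": "s3", "grade": 0.85},   # are never encoded as learned behavior
]
good = sessions_worth_learning_from(batch)
```

Note the default of 0.0 for ungraded sessions: without a score, a session is excluded rather than trusted, which is the conservative choice for quality-sensitive tasks.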
Memory conflicts with user instructions. The agent has learned that User A prefers bullet points. User B asks for prose. The agent defaults to bullets. This happens when memory is too user-agnostic. Fix: scope memories to specific users or contexts in your schema.
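Scoping, in practice, means resolving preferences per user with team-wide defaults as a fallback, so one user’s habit never leaks into another’s sessions. The store layout here is an assumption of the sketch:

```python
def preference(store: dict, user: str, key: str, default=None):
    """Resolve a preference: user-specific first, then team default."""
    user_prefs = store.get("users", {}).get(user, {})
    team_prefs = store.get("team", {})
    return user_prefs.get(key, team_prefs.get(key, default))

STORE = {
    "team": {"output.format": "prose"},
    "users": {"alice": {"output.format": "bullet points"}},
}

alice_fmt = preference(STORE, "alice", "output.format")  # her own preference
bob_fmt = preference(STORE, "bob", "output.format")      # falls back to team
```

The same two-level lookup generalizes to any context dimension your schema scopes on: project, client, or task type instead of user.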
Dreaming runs on empty sessions. If your agent hasn’t run since the last Dreaming cycle, the review process has nothing to analyze. This is harmless but wasteful. Fix: switch to manual trigger or match schedule to actual usage cadence.
Sub-agent memory contamination. In multi-agent orchestration, a sub-agent encodes a learned behavior that’s specific to one task type, and that behavior bleeds into other task types. Fix: use separate memory stores for sub-agents with distinct task domains, rather than a single shared store.
For teams building more complex memory architectures, the approach described in building a self-evolving Claude Code memory system with Obsidian and hooks is worth studying — it’s a manual implementation of the same pattern Dreaming automates, and understanding the manual version helps you debug the automated one.
Where to Take This Further
The obvious next step is connecting Dreaming to Outcomes. Write a rubric for your task. Let the grading agent score sessions. Feed that signal into Dreaming so it learns from good sessions rather than all sessions. The 10.1% PowerPoint quality improvement Anthropic reported came from adding the grading agent alone — the memory system compounds that improvement over time.
For teams building financial workflows, the Claude Finance cookbook (released alongside the Code with Claude event) includes pre-built agents — pitch builder, meeting preparer, market researcher, evaluation reviewer, month-end closer — that are already configured for managed agents deployment. These are worth examining as reference implementations of how memory and task structure interact in production.
If you’re building multi-agent systems where a lead agent coordinates specialist sub-agents, the five Claude Code workflow patterns for agentic systems covers the orchestration patterns that Dreaming is designed to work with. Understanding how parallel sub-agent execution and shared file systems work will help you design memory schemas that make sense for your specific topology.
For teams that want to go further with agent orchestration without writing all the plumbing from scratch, MindStudio offers a visual builder for chaining agents and workflows across 200+ models and 1,000+ integrations — useful when the memory architecture is solid but the surrounding infrastructure is still being assembled.
The open-source ecosystem got here first, by the way. Hermes Agent shipped persistent cross-session memory and skill extraction from experience nearly a year before Anthropic’s managed agents research preview. If you want to understand the design space more deeply — or if you need something that works today without waiting for managed agents access — Hermes Agent’s built-in learning loop is worth understanding on its own terms.
One opinion: the most underrated part of Dreaming isn’t the memory extraction — it’s the curation. Every naive memory system I’ve seen eventually becomes a liability because it accumulates everything without pruning anything. The explicit goal of keeping memory “high signal as it evolves” is the hard part, and it’s the part that determines whether your agent gets smarter over time or just noisier. Get the schema right in Step 2, and the rest follows. Get it wrong, and you’ll spend more time debugging memory than you saved by having it.
The session limit problem that’s been plaguing Claude users for months is also getting addressed in parallel — doubled rate limits, removed peak-hour throttling, and substantially higher API throughput for Opus models. That matters for Dreaming specifically because scheduled memory reviews are compute-intensive. Running Dreaming on a constrained account was previously impractical for high-volume workflows. That constraint is now significantly relaxed.
When you’re building the application layer on top of a memory-enabled agent — say, a dashboard that surfaces what the agent has learned, or a spec-driven interface for configuring memory schemas — tools like Remy take a different approach than hand-coding it: you write an annotated markdown spec, and it compiles into a complete TypeScript backend, database, auth, and deployment. The spec stays as the source of truth; the generated code is derived output. That’s a reasonable fit when the agent infrastructure is the interesting part and you don’t want to spend cycles on the surrounding app scaffolding.
The architecture most teams have been trying to build manually — agents that learn from past sessions and iterate until they hit quality thresholds — is now a configuration problem rather than an engineering problem. That’s a meaningful shift. The question is whether you configure it carefully enough to actually get the compounding improvement, or just add another layer of complexity that breaks in subtle ways.
Set the schedule. Read the memory. Tighten the schema. Repeat.