How to Build an Agentic Operating System Inside Claude Code
Replace OpenClaw and Hermes with a custom Claude Code setup: persistent memory layers, self-improving skills, scheduled workflows, and business context.
Why OpenClaw and Hermes Aren’t the Answer Anymore
If you’ve been building multi-agent systems with Claude Code, you’ve probably run into OpenClaw or its successor Hermes. Both tools tried to solve the same problem: Claude Code is powerful, but raw, and someone needed to wrap it in scaffolding that made it behave like a real operating system.
The problem is that neither approach ages well. Anthropic’s OpenClaw ban made it clear that third-party harnesses piggybacking on Claude subscriptions aren’t a stable foundation. And while Hermes added a built-in learning loop on top of OpenClaw’s base architecture, it’s still a dependency you don’t control.
The better path is building the agentic OS yourself, natively inside Claude Code. You get the same capabilities — persistent memory, self-improving skills, scheduled automation, and shared business context — without relying on tools that can be blocked or deprecated overnight.
This guide walks through how to do that from scratch.
What an Agentic Operating System Actually Means
The term gets used loosely, so let’s be specific. An agentic OS inside Claude Code is a folder-based system that gives your AI agent four things it doesn’t have by default:
- Persistent memory — so it knows your business across sessions
- Modular skills — discrete, callable units of work
- Self-improvement — a feedback loop that makes each skill better over time
- Scheduled execution — so agents act proactively, not just reactively
Without these four layers, Claude Code is still a conversational tool. You prompt it, it responds, the session ends. Nothing carries forward. Nothing improves. Nothing happens unless you ask.
With all four layers in place, it becomes something closer to a system that runs your business logic continuously. Understanding the full agentic OS architecture — how context, memory, collaboration, and self-learning stack together — is the foundation everything else builds on.
Layer 1: Persistent Memory
The Two-File Memory Model
Claude Code doesn’t have native persistent memory. Every session starts blank. To fix this, you need two files that serve different purposes.
The first is your shared brand context — a markdown file (typically brand.md or business-brain.md) that lives at the root of your project and contains everything the agent should always know: your company name, voice, target customers, product positioning, pricing, and any rules that apply across every task.
The second is a context folder — a directory of task-specific files. Where brand context is evergreen, context folder files are operational. Things like campaign briefs, current project state, recent decisions, and work-in-progress documents.
The distinction between shared brand context and context folders matters because they have different update cadences. Brand context changes infrequently and should be reviewed deliberately. Context folder files change constantly and should be written to by skills after every run.
What to Put in Brand Context
A good brand context file answers:
- What does the company do and for whom?
- What’s the tone of voice for all communications?
- What are the core products, prices, and positioning?
- Who are the main competitors and how do you differentiate?
- What are the non-negotiable rules? (e.g., “Never promise delivery in less than 5 business days,” “Always include a disclaimer on financial content”)
This file gets loaded at the top of every skill prompt. It’s the reason an agent writing a blog post and an agent processing customer refunds both “sound like” your company without you repeating yourself in every instruction.
Building a shared business brain for Claude Code skills is one of the highest-leverage things you can do early in the setup. Get this right and every other layer benefits from it automatically.
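As a sketch, a minimal brand context file answering those questions might look like this (the company, products, and prices are invented placeholders):

```markdown
# Brand Context

## Company
Acme Outfitters sells modular camping gear to weekend hikers in North America.

## Voice
Plainspoken and practical. Second person. No exclamation marks.

## Products
- Trailpack 40 ($189): flagship modular backpack
- Basecamp Stove ($79): compact two-burner stove

## Rules
- Never promise delivery in less than 5 business days.
- Always include a disclaimer on financial content.
```

Every skill prompt loads this file first, so these rules apply everywhere without being restated.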
Structuring the Context Folder
Your context folder should be organized by function:
/context
  /brand
    brand.md
    voice-guidelines.md
  /products
    product-catalog.md
  /campaigns
    current-campaign.md
  /decisions
    recent-decisions.md
  /state
    last-run.json
The /state subfolder is particularly important. After each skill run, the skill should write a brief summary of what it did, what it decided, and what it found. This gives future runs continuity without requiring you to explain context manually.
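A sketch of that state write in Python, assuming the `/state` layout above (the function name and fields are illustrative, not a fixed schema):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_run_state(state_dir: str, skill: str, summary: str, decisions: list[str]) -> Path:
    """Record what a skill run did and decided, so the next run has continuity."""
    state_path = Path(state_dir) / "last-run.json"
    entry = {
        "skill": skill,
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "summary": summary,
        "decisions": decisions,
    }
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(entry, indent=2))
    return state_path
```

A skill would call this as its final step; the next run reads the file back instead of asking you what happened last time.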
Layer 2: The Skills Architecture
What a Skill Is
A skill is a self-contained Claude Code task. It has a clear input, a clear output, and it knows exactly what files to read and write. Think of it as a function: isolated, testable, reusable.
Claude Code skills are typically defined as markdown prompt files inside a /skills directory. Each file contains:
- The task description
- Which context files to load
- What the expected output is
- Where to write results
A skill for writing a weekly email newsletter, for example, would load brand.md, current-campaign.md, and last-newsletter.md. It would produce a draft email and write it to /outputs/newsletters/[date].md. It might also update last-run.json with a summary of what was produced.
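A sketch of what that newsletter skill file could contain (the headings and paths are illustrative, not a required format):

```markdown
# Skill: Weekly Newsletter

## Context to load
- /context/brand/brand.md
- /context/campaigns/current-campaign.md
- /outputs/newsletters/last-newsletter.md

## Task
Write a weekly email newsletter (400–600 words) promoting the current campaign,
in the brand voice, without repeating last week's angle.

## Output
Write the draft to /outputs/newsletters/[date].md.
Update /context/state/last-run.json with a one-paragraph summary.
```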
Designing Skills for Reuse
The most common mistake when building skills is making them too broad. A skill called “do content marketing” is not a skill — it’s a project. Good skills are narrow enough to be reliable and broad enough to be worth automating.
Aim for skills that:
- Complete in one focused session
- Have clear success criteria
- Produce a file or structured output
- Can be called by other skills
That last point matters for the next layer. Chaining skills into end-to-end workflows is what turns a collection of useful automations into an actual system. A research skill feeds a draft skill feeds an edit skill feeds a publish skill — each one small, each one calling the next.
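Assuming each skill is invoked with the same `claude -p skills/<file>.md` call the cron examples in this guide use, the chain can be sketched as a short shell script (the path is a placeholder):

```shell
#!/usr/bin/env bash
set -euo pipefail   # stop the chain if any skill fails
cd /path/to/project

claude -p skills/research-brief.md
claude -p skills/write-draft.md
claude -p skills/edit-and-finalize.md
claude -p skills/publish-to-cms.md
```

Each step reads the previous step's output from the context folder, so the script itself stays trivial.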
A Practical Skill Directory Structure
/skills
  research-brief.md
  write-draft.md
  edit-and-finalize.md
  publish-to-cms.md
  weekly-summary.md
  heartbeat.md
Each file is a prompt. Claude Code reads it, loads the referenced context, executes the task, and writes output. The simplicity is the point. You’re not writing code — you’re writing instructions that Claude Code can follow reliably.
For a concrete example of how this plays out at scale, the 5-skill content marketing workflow shows how these pieces connect in a real business context.
Layer 3: Self-Improving Skills
Why Skills Degrade Without Feedback
A skill that runs identically every time isn’t learning from what works. The first version of any prompt is a hypothesis. Some parts will work great. Others will produce outputs that are technically correct but not quite right — the tone is off, the format is wrong, the focus is too broad.
Without a feedback mechanism, you fix these issues manually and forget you fixed them. The same mistake happens three months later.
The solution is a learnings loop: a structured way for each skill run to capture what worked, what didn’t, and what should be changed.
How to Build the Learnings Loop
Each skill should append to a learnings.json file in its output directory. After every run, the skill writes:
{
  "date": "2026-04-21",
  "skill": "write-draft",
  "what_worked": "Opening with a specific stat increased engagement in past reviews",
  "what_didnt": "3-section structure felt thin — 5 sections performed better",
  "rule_change": "Default to 5 sections unless brief specifies otherwise"
}
The next run of that skill loads the learnings file and applies the accumulated rules. Over time, the skill gets better without you manually rewriting its prompt.
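The append-and-reload cycle can be sketched in a few lines of Python, assuming learnings.json holds an array of entries like the one above (the function names are illustrative):

```python
import json
from pathlib import Path

def append_learning(output_dir: str, entry: dict) -> list[dict]:
    """Append one run's learnings and return the full history."""
    path = Path(output_dir) / "learnings.json"
    history = json.loads(path.read_text()) if path.exists() else []
    history.append(entry)
    path.write_text(json.dumps(history, indent=2))
    return history

def active_rules(output_dir: str) -> list[str]:
    """Collect every non-empty rule_change so the skill prompt can apply accumulated rules."""
    path = Path(output_dir) / "learnings.json"
    if not path.exists():
        return []
    return [e["rule_change"] for e in json.loads(path.read_text()) if e.get("rule_change")]
```

At the top of each run, the skill loads `active_rules()` and prepends them to its prompt as standing instructions.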
This is the compounding knowledge loop in action: each run makes the next run marginally better, and marginal improvements stack.
Using eval.json for Structured Quality Checks
For skills where output quality is measurable, add an eval.json file that defines what “good” looks like. This might include:
- Word count ranges
- Required sections or headings
- Tone checks (“must not use first person,” “must include a CTA”)
- Format rules (“output must be valid JSON”)
Building a self-improving skill with eval.json lets the agent evaluate its own output against criteria before writing the final file. If it fails the eval, it revises. If it passes, it writes the learnings file with notes on what it did to pass.
This isn’t magic — it’s just structured self-correction. But it makes a real difference in output consistency, especially for skills that run unsupervised.
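As a sketch of that self-check, a small validator could compare a draft against an eval.json spec before the final write (field names like `word_count` and `required_sections` are assumptions, not a fixed schema):

```python
def passes_eval(text: str, eval_spec: dict) -> list[str]:
    """Return a list of failures; an empty list means the draft passes the eval."""
    failures = []
    words = len(text.split())
    lo, hi = eval_spec.get("word_count", [0, float("inf")])
    if not lo <= words <= hi:
        failures.append(f"word count {words} outside [{lo}, {hi}]")
    for heading in eval_spec.get("required_sections", []):
        if heading not in text:
            failures.append(f"missing section: {heading}")
    for phrase in eval_spec.get("forbidden_phrases", []):
        if phrase.lower() in text.lower():
            failures.append(f"forbidden phrase: {phrase}")
    return failures
```

The skill runs this before writing output: if the failure list is non-empty, it revises and re-checks; if empty, it writes the file and logs what it did to pass.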
Layer 4: Scheduled Workflows
The Problem with On-Demand Agents
An agent that only acts when you ask it something is useful but limited. Business operations don’t wait for you to open a terminal. Competitors move, metrics shift, content needs to go out, and reports need to be ready for Monday morning whether or not you remembered to prompt anything.
The agentic OS becomes genuinely autonomous when it runs on a schedule.
Cron-Based Scheduling with Claude Code
The most direct approach is a cron job that triggers a Claude Code skill at a set interval. A basic setup looks like this:
# Run daily summary at 7am
0 7 * * * cd /path/to/project && claude -p skills/daily-summary.md
# Run weekly report on Fridays at 5pm
0 17 * * 5 cd /path/to/project && claude -p skills/weekly-report.md
Each scheduled task is just a skill call. The skill loads its context, does its work, writes its output, and updates its learnings file. No manual intervention needed.
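When tasks run unattended, it helps to put a thin wrapper between cron and the skill call so output and failures land in files instead of vanishing. A sketch in Python (the command list would be the `claude` invocation from the crontab; here it is generic):

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def run_skill(cmd: list[str], log_dir: str) -> bool:
    """Run a scheduled skill command, capture stdout, and route failures to errors.log."""
    logs = Path(log_dir)
    logs.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    result = subprocess.run(cmd, capture_output=True, text=True)
    (logs / f"{stamp}.log").write_text(result.stdout)
    if result.returncode != 0:
        with (logs / "errors.log").open("a") as f:
            f.write(f"{stamp} {' '.join(cmd)} exit={result.returncode}\n{result.stderr}\n")
        return False
    return True
```

The crontab then calls this wrapper instead of `claude` directly, and the heartbeat skill can read errors.log later.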
Building scheduled AI agents with Claude Code covers the full setup including how to handle errors, logging, and output routing when tasks run unattended.
The Heartbeat Pattern
Beyond cron jobs, there’s a more sophisticated pattern called the heartbeat. Instead of running isolated tasks on a schedule, a heartbeat skill runs frequently (every 15–60 minutes) and acts as a lightweight orchestrator. It checks:
- What tasks are overdue?
- Have any inputs changed that should trigger a skill?
- Is there anything in the queue that should be processed now?
The heartbeat doesn’t do the work itself — it delegates to other skills when conditions are met. The agentic OS heartbeat pattern explains how to implement this so your system stays proactive without running expensive full-skill executions constantly.
The heartbeat is what makes the difference between “an agent that runs on a schedule” and “an agent that behaves like it’s always paying attention.”
Wiring It Together: Multi-Agent and Command Center
When to Use Multiple Agents
Single-agent systems work well for linear workflows. But some business processes are parallel — content and analytics, sales and support, product and marketing. Running these through a single sequential skill chain is slow and creates bottlenecks.
The answer is parallel agents sharing a task list. Each agent takes items off the same queue, completes them, and writes results back to a shared output folder. Claude Code agent teams use a shared tasks.json file to coordinate without needing a central controller.
This pattern scales well because you add capacity by adding agents, not by rewriting your workflow.
Managing by Goals, Not Terminals
As soon as you have multiple skills and multiple agents, you need a way to see what’s happening without opening every output file manually. The solution is a command center: a single dashboard skill that reads all output logs, summarizes what ran, what’s pending, and what needs attention.
A basic command center skill reads:
- All last-run.json state files across skills
- The current task queue
- Any error logs from the last 24 hours
- Key metrics from scheduled reports
It outputs a single status.md that gives you a clear picture in seconds. Managing multiple Claude Code agents through a command center shows how to build this so you’re managing outcomes, not processes.
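A command center skill could be backed by a small aggregator like this sketch, which assumes each skill writes last-run.json into its own outputs/<skill>/ folder and errors land in a root-level errors.log (both layout choices are assumptions):

```python
import json
from pathlib import Path

def build_status(project_root: str) -> str:
    """Roll every skill's last-run.json into one status.md,
    so you review outcomes instead of opening terminals."""
    root = Path(project_root)
    lines = ["# Status", ""]
    for state_file in sorted(root.glob("outputs/*/last-run.json")):
        run = json.loads(state_file.read_text())
        lines.append(f"- **{state_file.parent.name}**: {run.get('summary', 'no summary')}")
    errors = root / "errors.log"
    if errors.exists() and errors.read_text().strip():
        lines += ["", "## Needs attention", errors.read_text().strip()]
    report = "\n".join(lines) + "\n"
    (root / "status.md").write_text(report)
    return report
```

Run it on a schedule (or from the heartbeat) and status.md is always the one file you open in the morning.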
Where Remy Fits
The agentic OS pattern described here — persistent memory, modular skills, self-improvement loops, scheduled execution — maps cleanly onto what Remy does at the application layer.
Remy compiles annotated specs into full-stack applications: real backends, real databases, real auth. If you want to surface your agentic OS outputs as a proper web application — a dashboard that shows agent status, a CMS that skills can write to directly, or an interface for reviewing and approving agent outputs before publishing — Remy is the most direct path.
Instead of stitching together frontend frameworks, backend APIs, and database schemas by hand, you write a spec describing what the app should do and Remy handles the rest. The agent outputs from your Claude Code skills become structured data that a real application can read, display, and act on.
You can try Remy at mindstudio.ai/remy.
Frequently Asked Questions
What’s the difference between an agentic OS and just using Claude Code with good prompts?
A well-crafted prompt makes one task better. An agentic OS makes every task better over time. The difference is persistence: shared context that carries across sessions, skills that learn from feedback, and scheduled execution that doesn’t depend on you showing up. Good prompts are table stakes. The OS is what turns those prompts into a system.
Do I need to know how to code to build this?
No, but you need to be comfortable with file structures, markdown, and basic command-line operations. The skills themselves are markdown files. The scheduling is cron syntax. The learnings files are JSON. None of this requires writing code — it requires thinking carefully about inputs, outputs, and state.
How is this different from what OpenClaw or Hermes offered?
OpenClaw and Hermes were third-party wrappers around Claude. They added orchestration on top of Claude’s API, which meant they were dependent on Anthropic not changing the rules — which they did. Building an OpenClaw-like agent without OpenClaw is exactly what this architecture does: it replicates the useful patterns natively inside Claude Code, using only first-party tooling that won’t get blocked.
How many skills should I start with?
Start with three: one skill that does the core work you want to automate, one heartbeat skill that checks conditions and triggers others, and one summary skill that reports on what happened. Three skills is enough to see the system working. You add more as gaps become obvious.
Can skills from different projects share context?
Yes, if they point to the same context files. The simplest approach is a shared /context folder at a level above your individual skill projects. Each skill’s prompt references the shared brand context by relative path. This means updates to brand context automatically propagate to every skill that reads it.
What’s the best way to handle skill failures?
Write errors to a dedicated errors.log file in each skill’s output directory. Have your heartbeat skill check this file and flag anything that’s been unresolved for more than 24 hours. For critical skills, add a retry instruction at the end of the prompt: “If output does not meet the eval criteria after two revisions, write a failure note to errors.log with the specific issue.” This gives you visibility without requiring you to monitor constantly.
Key Takeaways
- An agentic OS in Claude Code requires four layers: persistent memory, modular skills, a self-improvement loop, and scheduled execution.
- Shared brand context and context folders serve different purposes — one is evergreen, the other is operational. Both are essential.
- Skills should be narrow, testable, and chainable. Start with three and add more as gaps appear.
- The learnings loop and eval.json pattern turn a one-time prompt into a system that improves with every run.
- Scheduled tasks and the heartbeat pattern are what make the difference between an on-demand tool and a system that runs your business logic continuously.
- If you want to surface agent outputs as a real web application, Remy handles the full-stack layer without requiring you to build and maintain it manually.