How to Build an Agentic Operating System Inside Claude Code
Replace OpenClaw and Hermes with a Claude Code setup that includes persistent memory, self-improving skills, and scheduled workflows.
Why OpenClaw and Hermes Aren’t the Answer Anymore
If you’ve been building automation workflows on top of Claude, you’ve probably already run into the wall. OpenClaw got blocked. Hermes filled the gap for a while. And now there’s a growing camp of builders who’ve realized that chasing third-party harnesses is the wrong approach entirely.
The better path is native Claude Code — configured carefully so it behaves like a proper agentic operating system: persistent memory, self-improving skills, multi-agent coordination, and scheduled workflows that run without you babysitting them.
This guide walks through how to build that system from scratch. It covers the architecture, the file structures that make it work, the patterns for chaining Claude Code skills into automated workflows, and the scheduling layer that keeps everything running around the clock.
The Problem with Third-Party Harnesses
When Anthropic blocked third-party harnesses from Claude subscriptions, it shouldn’t have come as a surprise. Tools that scraped OAuth sessions or proxied Claude through unofficial API pathways were always fragile. They worked until they didn’t.
OpenClaw and tools like it gave developers a way to run persistent, multi-turn agent loops on top of Claude — but they did so by working around the platform rather than with it. When the ban landed, anyone who’d built a production workflow on top of these tools had to scramble.
Hermes Agent emerged as an alternative with a built-in learning loop — genuinely useful, but still fundamentally a workaround. You’re still dependent on a layer that sits between you and Claude, adds complexity, and can break when Anthropic updates its policies or token handling.
Native Claude Code doesn’t have this problem. It’s the supported surface. Anthropic builds it, maintains it, and extends it. When you invest in a Claude Code setup, you’re building on ground that isn’t going to shift underneath you.
What an Agentic OS Actually Is
Before getting into the build, it’s worth being precise about what “agentic operating system” means here — because it’s used loosely in a lot of places.
An agentic OS inside Claude Code is a structured system that gives your AI agent:
- Persistent memory — context that survives between sessions, so the agent knows your business, your preferences, and what it’s already learned
- Modular skills — discrete, reusable capabilities that can be chained together into workflows
- A self-improvement loop — a mechanism for the agent to record what worked, what didn’t, and apply that learning in future runs
- Scheduled execution — the ability to run workflows on a timer, without requiring manual triggering
- Multi-agent coordination — multiple specialized agents working on different parts of a workflow simultaneously
This is meaningfully different from just using Claude Code as a smarter terminal. The agentic OS architecture treats Claude not as a one-off assistant but as an operating layer for your entire business workflow.
The Core File Structure
Everything starts with how you organize your Claude Code workspace. The file structure is what makes the system persistent and composable.
Here’s the baseline layout:
/project-root
CLAUDE.md ← The business brain: brand, voice, goals, context
LEARNINGS.md ← Accumulated lessons from every skill run
/skills
research.md ← Individual skill definitions
write.md
publish.md
review.md
/memory
recent-outputs.md ← Log of recent runs and outputs
decisions.md ← Key decisions the agent has made or been told to make
/schedules
heartbeat.md ← Defines recurring tasks and their cadence
wrap-up.md ← End-of-session consolidation skill
This isn’t arbitrary. Each file serves a specific role in making Claude Code behave like a system rather than a session.
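If you want to bootstrap this layout quickly, a short script can do it. Here is a minimal Python sketch; the file names come from the layout above, while the seed headings are placeholders of my own:

```python
from pathlib import Path

# Workspace layout from the guide; seed contents are placeholders.
LAYOUT = {
    "CLAUDE.md": "# Business Brain\n",
    "LEARNINGS.md": "# Learnings\n",
    "skills/research.md": "# Skill: Research\n",
    "skills/write.md": "# Skill: Write\n",
    "skills/publish.md": "# Skill: Publish\n",
    "skills/review.md": "# Skill: Review\n",
    "memory/recent-outputs.md": "# Recent Outputs\n",
    "memory/decisions.md": "# Decisions\n",
    "schedules/heartbeat.md": "# Heartbeat Schedule\n",
    "schedules/wrap-up.md": "# Wrap-Up\n",
}

def scaffold(root: str) -> list[str]:
    """Create any missing files in the layout; return what was created."""
    created = []
    for rel, seed in LAYOUT.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never clobber an existing file
            path.write_text(seed)
            created.append(rel)
    return created
```

Running it twice is safe: existing files are left alone, so you can re-run it whenever you add a new skill to the layout.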
CLAUDE.md: The Business Brain
CLAUDE.md is the always-loaded context file. Every Claude Code session starts by reading it. This is where you put everything the agent should always know:
- What your business does
- Your brand voice and communication style
- Your current goals and priorities
- Rules the agent must always follow
- Pointers to other key files
Think of it as the equivalent of a thorough onboarding document — except Claude reads it fresh every single session. Sharing brand context across all skills through a single authoritative file is what keeps your agent consistent across dozens of different tasks.
LEARNINGS.md: The Memory Layer
This is where the self-improvement loop lives. After every significant task, the agent appends what it learned — what worked, what failed, edge cases it discovered, shortcuts it found.
A typical entry looks like this:
## 2026-04-18: Research Skill Run
- LinkedIn URLs frequently block scraping. Use Exa or Perplexity instead.
- Industry reports from Gartner require login. Route these to manual review queue.
- When topic has <5 search results, flag for human review before proceeding.
Over time, LEARNINGS.md becomes a dense knowledge base that makes every skill run better than the last. This is how Claude Code skills improve from your feedback — not through fine-tuning, but through accumulated context that compounds with every session.
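Entries in this format can be appended by hand or by the agent itself, but a small helper keeps the format consistent. A Python sketch, where the function name and argument shapes are illustrative rather than anything Claude Code provides:

```python
import datetime

def append_learning(path: str, skill: str, lessons: list[str]) -> str:
    """Append a dated entry to LEARNINGS.md in the format shown above."""
    today = datetime.date.today().isoformat()
    entry = f"\n## {today}: {skill} Skill Run\n"
    entry += "".join(f"- {lesson}\n" for lesson in lessons)
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```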
Building Self-Improving Skills
A Claude Code skill is a markdown file that defines a specific capability: what it does, what inputs it takes, what output it produces, and what rules it follows.
Here’s a minimal example for a research skill:
# Skill: Research
## Purpose
Research a given topic and produce a structured brief with sources, key findings, and open questions.
## Inputs
- topic: string
- depth: shallow | deep (default: shallow)
## Output format
- Summary (2-3 sentences)
- Key findings (bulleted list)
- Sources (with URLs)
- Open questions (what still needs answering)
## Rules
- Always check LEARNINGS.md before starting — apply any relevant lessons
- Flag ambiguous topics for human clarification before proceeding
- Never fabricate sources
- Append any new lessons to LEARNINGS.md after completing the task
The last rule is the self-improvement mechanism. Every skill is instructed to both read LEARNINGS.md at the start and write to it at the end. This creates a compounding knowledge loop where each run informs the next.
Adding Eval-Based Quality Control
For skills where output quality matters a lot, you can add an eval.json file that defines what “good” looks like:
{
"skill": "research",
"criteria": [
"Contains at least 3 distinct sources",
"Summary is under 100 words",
"Open questions section is non-empty",
"No fabricated URLs"
],
"auto_retry_on_fail": true,
"max_retries": 2
}
When this file exists alongside a skill, Claude Code can evaluate its own output against the criteria before handing it back. Building self-improving AI skills with eval.json adds an automatic quality gate to every run.
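Because the criteria are natural-language statements, the judging step is itself a model call in practice. Here is a hedged Python sketch of the retry loop, with the skill runner and the evaluator passed in as callables; both are hypothetical stand-ins, not Claude Code APIs:

```python
import json

def run_with_eval(run_skill, evaluate, eval_path: str):
    """Run a skill and gate its output on the eval.json criteria.

    run_skill() -> str produces the output (e.g. a Claude Code call).
    evaluate(output, criterion) -> bool judges one criterion; in
    practice this is itself a model call, stubbed here.
    Returns (output, failed_criteria).
    """
    with open(eval_path) as f:
        cfg = json.load(f)
    retries = cfg.get("max_retries", 0) if cfg.get("auto_retry_on_fail") else 0
    for _ in range(1 + retries):
        output = run_skill()
        failed = [c for c in cfg["criteria"] if not evaluate(output, c)]
        if not failed:
            break
    return output, failed
```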
Chaining Skills Into Workflows
Individual skills are useful. Chained skills are where the system starts to feel like an OS.
A workflow chains multiple skills together, passing outputs from one as inputs to the next. Claude Code skill collaboration works by defining the handoff points explicitly.
Here’s a content marketing workflow as a concrete example:
1. Research skill → topic brief
2. Outline skill → structured outline (takes topic brief as input)
3. Write skill → draft article (takes outline as input)
4. Review skill → edited draft (flags issues, suggests cuts)
5. Publish skill → formats and posts to CMS
Each skill only needs to know its own job and what its input looks like. The workflow file defines the sequence and the data flow:
# Workflow: Content Marketing
## Steps
1. Run research with {topic}
2. Pass research brief to outline skill
3. Pass outline to write skill
4. Pass draft to review skill
5. If review passes quality threshold, run publish skill
6. If review fails, return to write skill with review notes
## Success criteria
- Article is live in CMS
- Word count is within 10% of target
- No review flags remain unaddressed
## On completion
- Log output URL to memory/recent-outputs.md
- Append lessons to LEARNINGS.md
This 5-skill workflow pattern for content marketing is a practical model you can adapt for almost any multi-step business process.
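One way to drive a chain like this from a script is Claude Code's headless mode. A Python sketch, assuming the `claude` CLI's `-p` (print) flag and the skill-file paths used in this guide; the prompt wording and the `FAIL` marker are illustrative assumptions:

```python
import subprocess

def run_skill(skill: str, input_text: str) -> str:
    """Run one skill headlessly via the `claude` CLI's -p flag."""
    prompt = f"Follow the skill in /skills/{skill}.md. Input:\n{input_text}"
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout

def content_workflow(topic: str, run=run_skill) -> str:
    """Chain the five skills, piping each output into the next step."""
    brief = run("research", topic)
    outline = run("outline", brief)
    draft = run("write", outline)
    review = run("review", draft)
    if "FAIL" in review:  # illustrative quality-threshold marker
        draft = run("write", draft + "\n\nReview notes:\n" + review)
    return run("publish", draft)
```

Making the runner injectable (the `run` parameter) lets you dry-run the whole chain with a stub before spending real model calls.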
Scheduled Workflows and the Heartbeat Pattern
The biggest limitation of most Claude Code setups is that they’re reactive. You ask, it answers. You trigger, it runs. That works fine for on-demand tasks, but an agentic OS should also be proactive — running tasks on schedule, monitoring for conditions, and taking action without waiting to be asked.
That’s what the heartbeat pattern is for.
What the Heartbeat Is
A heartbeat is a regularly scheduled skill run — typically every 15, 30, or 60 minutes — that checks on your system’s state and triggers actions based on what it finds.
A typical heartbeat skill does things like:
- Check a monitored inbox for new messages that need routing
- Scan a shared folder for new documents that need processing
- Review a queue of pending tasks and prioritize them
- Check whether any time-sensitive workflows need to kick off
The heartbeat doesn’t do heavy lifting itself. It’s a lightweight dispatcher that reads state and hands off to other skills when conditions are met.
Setting Up the Schedule
If you’re running Claude Code on a server or cloud VM, you can use a cron job or a simple wrapper script. Here’s what the schedule file in /schedules/heartbeat.md might look like:
# Heartbeat Schedule
## Cadence
Every 30 minutes
## Tasks
1. Check /memory/recent-outputs.md for tasks older than 24 hours with no follow-up
2. Check monitored inbox for unrouted messages
3. Check current date against scheduled workflow calendar
4. Trigger relevant workflows if conditions are met
## Escalation
If any task fails 3 times consecutively, append to /memory/decisions.md and notify via email
For keeping Claude Code running 24/7 without local hardware, the cleanest approach is to run your agent on a cloud instance — a small VPS or managed compute environment — rather than relying on your laptop being open.
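A cron entry such as `*/30 * * * * cd /project-root && python3 heartbeat.py` can drive a small wrapper. Here is a Python sketch of that wrapper implementing the escalation rule from the schedule file above; the failure-count file path and the `claude -p` invocation are assumptions of mine:

```python
import subprocess
import sys
from pathlib import Path

FAIL_COUNT = Path("memory/heartbeat-failures.txt")  # hypothetical state file

def run_heartbeat() -> bool:
    """One tick: hand the schedule file to Claude Code headlessly."""
    result = subprocess.run(
        ["claude", "-p", "Execute the tasks in /schedules/heartbeat.md"],
        capture_output=True, text=True,
    )
    return result.returncode == 0

def record_and_escalate(ok: bool, threshold: int = 3) -> int:
    """Track consecutive failures; escalate after `threshold` in a row."""
    fails = int(FAIL_COUNT.read_text()) if FAIL_COUNT.exists() else 0
    fails = 0 if ok else fails + 1
    FAIL_COUNT.parent.mkdir(parents=True, exist_ok=True)
    FAIL_COUNT.write_text(str(fails))
    if fails >= threshold:
        # Escalation hook: append to memory/decisions.md, send email, etc.
        print("heartbeat failing repeatedly; escalating", file=sys.stderr)
    return fails
```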
Wrap-Up Skills
Paired with the heartbeat is the wrap-up skill — a routine that runs at the end of each major workflow or at the end of a working day. It consolidates learnings, cleans up temporary files, updates the decision log, and prepares the agent’s context for the next session.
Building a self-maintaining AI system with heartbeat and wrap-up skills is what separates a system that needs daily maintenance from one that runs itself.
Multi-Agent Coordination
For complex workflows, a single agent serializing through every step is too slow. The solution is to run multiple specialized agents in parallel, each handling a different part of the workflow, with an orchestrator coordinating the work.
In Claude Code, this means running multiple instances simultaneously — each loaded with a different skill context — and defining how they hand off results to each other.
A simple two-agent setup for research and writing:
- Research agent — Runs in one terminal, processes research tasks from a shared queue file
- Writing agent — Monitors the same queue, picks up completed research briefs, produces drafts
The queue is just a markdown file:
# Task Queue
## Pending Research
- [ ] Topic: Q2 competitive landscape analysis | Priority: high
## Completed Research / Pending Writing
- [x] Topic: AI agent frameworks comparison | File: /memory/research-ai-frameworks.md
## Completed Drafts / Pending Review
- [x] Draft: AI agent frameworks article | File: /memory/draft-ai-frameworks.md
Both agents check this file. Neither needs to know what the other is doing — they just read and write to the shared state. Agent orchestration at this level doesn’t require a complex framework. A well-structured shared file system is often enough.
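Each agent needs only a few lines of parsing to work against a queue file in this format. A Python sketch, where the section names follow the example above and the function names are my own:

```python
def pending_tasks(queue_text: str, section: str) -> list[str]:
    """Return unchecked '- [ ]' items under the given '## section'."""
    tasks, in_section = [], False
    for line in queue_text.splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip() == section
        elif in_section and line.startswith("- [ ]"):
            tasks.append(line[5:].strip())
    return tasks

def claim_task(queue_text: str, task: str) -> str:
    """Flip a task's checkbox so another agent doesn't pick it up."""
    return queue_text.replace(f"- [ ] {task}", f"- [x] {task}", 1)
```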
For more complex multi-agent setups where you need proper role separation and goal-based task assignment, managing agents by goals instead of terminals gives you a cleaner model for how to think about coordination.
Memory Consolidation and AutoDream
One underappreciated problem with persistent memory systems is that LEARNINGS.md gets long. After a few weeks of active use, it can easily grow to thousands of lines — and a long context file is an inefficient context file.
The solution is periodic memory consolidation: a scheduled skill that reviews the full learnings file, identifies redundant entries, merges related insights, and produces a compressed, higher-quality version. The pattern is sometimes called AutoDream, by analogy with the way sleep consolidates human memory.
A consolidation skill might run weekly and do the following:
- Read all entries in LEARNINGS.md
- Group entries by theme
- Merge duplicates and near-duplicates
- Promote the most important lessons to a “Core Learnings” section at the top
- Archive older, lower-value entries to /memory/archive/
- Write the consolidated file back to LEARNINGS.md
This keeps your memory layer lean and high-signal, which directly improves the quality of every subsequent skill run.
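The grouping and merging steps are judgment calls best left to the model, but the mechanical parts, splitting the file into entries and archiving the old ones, can be sketched in Python. The date-based archive threshold here is an illustrative choice, and the entry format matches the dated headings shown earlier:

```python
import datetime
import re

def split_entries(text: str) -> list[str]:
    """Split LEARNINGS.md into its '## YYYY-MM-DD: ...' entries."""
    parts = re.split(r"(?m)^(?=## \d{4}-\d{2}-\d{2})", text)
    return [p for p in parts if p.startswith("## ")]

def consolidate(text: str, today: datetime.date, keep_days: int = 30):
    """Partition entries into (kept, archived) by age.
    Merging near-duplicates is a model pass, not shown here."""
    kept, archived = [], []
    for entry in split_entries(text):
        date = datetime.date.fromisoformat(entry[3:13])
        (kept if (today - date).days <= keep_days else archived).append(entry)
    return kept, archived
```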
Where Remy Fits
Everything described above — the skill files, the workflow chains, the memory system, the scheduling layer — requires you to design and maintain it yourself. That’s not a complaint; it’s genuinely powerful work. But it also has a cost: every new skill is a file to write and maintain. Every workflow is a chain you have to define and debug. Every new piece of business context needs to be explicitly added to CLAUDE.md.
Remy takes a different approach. Instead of building your agentic system from individual skill files and coordination logic, you describe what your application or workflow does in a spec — annotated markdown that carries both the readable intent and the precise rules. Remy compiles that into a working full-stack system: backend, database, auth, and all the infrastructure that makes a real application run.
If you’re building a business tool that needs an agent layer, a user-facing interface, a database of accumulated knowledge, and scheduled automation — building it as a Remy app means the spec is the source of truth. You’re not maintaining a collection of markdown files and hoping they stay coherent. The spec compiles into a consistent, running system.
For teams already deep in Claude Code who want to keep the agentic OS approach but add a proper application layer around it, Remy and the patterns described in this guide aren’t competing ideas. They work at different levels of abstraction.
You can try Remy at mindstudio.ai/remy.
FAQ
What’s the difference between an agentic OS and just using Claude Code normally?
Normal Claude Code usage is interactive — you give it a task, it completes it, the session ends. An agentic OS adds persistence (memory that survives sessions), modularity (discrete reusable skills), scheduling (tasks that run on a timer), and coordination (multiple agents working together). The distinction is between a tool you use and a system that runs.
Do I need special infrastructure to run scheduled workflows in Claude Code?
For lightweight scheduling, you can use a cron job on any Linux server or VPS. For anything more complex — parallel agents, cloud-native scheduling, or workflows that need to run reliably without manual intervention — you’ll want to move off local hardware. Running Claude Code routines without keeping your laptop open covers the practical options.
How do I replace OpenClaw or Hermes with a native Claude Code setup?
The core functions of OpenClaw (persistent agent loops, multi-turn memory, tool use) can be replicated natively using CLAUDE.md for persistent context, skill files for modular capabilities, and scheduled heartbeat runs for proactive behavior. The main thing you lose is the harness itself — which, given the policy changes, is more of a gain than a loss. Building an OpenClaw-like agent without installing OpenClaw walks through the direct substitution in more detail.
How many skills should my agentic OS have?
Start small. Three to five well-defined skills that cover your highest-value workflows will beat fifteen loosely defined ones. Skills are easy to add once your memory and scheduling systems are working reliably. Premature skill proliferation just adds coordination complexity before you’ve validated your core system.
How does the self-improvement loop actually work in practice?
Each skill is instructed to read LEARNINGS.md before starting a task and write to it when the task completes. Over time, the file accumulates specific, actionable knowledge about your domain: which data sources are reliable, which edge cases cause failures, which output formats work best for downstream skills. The agent applies this knowledge automatically in future runs — no retraining required, just better context. How Claude Code skills improve from your feedback explains the mechanism in depth.
Can this setup handle multi-agent workflows without a framework like LangGraph or CrewAI?
Yes, for most use cases. Shared state files (markdown queues, output logs, decision records) are enough to coordinate multiple Claude Code instances for sequential and parallel workflows. You don’t need a dedicated orchestration framework unless your workflow has complex branching logic, dynamic agent spawning, or real-time inter-agent communication. For those cases, the bigger picture of agent orchestration challenges is worth reading before you pick a tool.
Key Takeaways
- Third-party Claude harnesses like OpenClaw and Hermes are fragile. Building on native Claude Code is more stable and supported.
- An agentic OS in Claude Code is built on four layers: persistent memory (CLAUDE.md + LEARNINGS.md), modular skills, self-improvement loops, and scheduled workflows.
- The heartbeat pattern — a lightweight skill that runs on a timer and dispatches other skills — is how you make the system proactive instead of purely reactive.
- Memory consolidation prevents your LEARNINGS.md from becoming unwieldy over time. Schedule a consolidation run weekly.
- Multi-agent coordination doesn’t require a framework. Shared state files are enough for most workflows.
- For teams who want a full application around their agentic system, Remy compiles specs into full-stack apps with backend, database, and auth included.