How to Build an Agentic Operating System Inside Claude Code
Replace OpenClaw and Hermes with a custom Claude Code setup. Learn the five layers—memory, skills, interaction, scheduling, and business context.
Why OpenClaw and Hermes Aren’t the Answer Anymore
A lot of teams discovered Claude Code’s agentic capabilities through third-party harnesses. OpenClaw was the popular one — an open-source wrapper that made Claude behave like an autonomous agent with persistent memory and tool access. Hermes came along with a built-in learning loop. Both worked well enough, until they didn’t.
Anthropic blocked third-party OAuth harnesses from Claude subscriptions, which effectively ended the convenient “install and run” era for tools like OpenClaw. Teams that had built workflows around these frameworks had to scramble.
But here’s the thing: the underlying capability was never in OpenClaw. It was always in Claude Code. The harness was just a convenience layer. If you understand the five architectural layers that make an agentic system work — memory, skills, interaction, scheduling, and business context — you can build something cleaner, more durable, and more tailored to your business than any pre-packaged framework offers.
This guide walks through exactly how to do that.
What an Agentic Operating System Actually Means
Before getting into layers, it’s worth being precise about what “agentic operating system” means here, because the term gets used loosely.
A regular AI workflow is reactive. You send a prompt, you get a response. Done. An agentic OS is different in three ways:
- It persists. The system remembers what happened, stores what it learned, and carries that context forward.
- It’s composable. Individual capabilities exist as discrete units that can be chained together or run in parallel.
- It’s proactive. The system can act on a schedule or in response to conditions — not just when a human sends a prompt.
This is what OpenClaw and Hermes were approximating. The difference between using a pre-built harness and building your own is control. When you build the five layers yourself inside Claude Code, you decide exactly how memory works, what skills exist, how agents communicate, when things run, and what your business context looks like.
Understanding the full agentic OS architecture before writing a line of code will save you significant rework later. The five layers described here map to the core patterns that production agentic systems use.
Layer 1: Memory
Memory is the layer most people underestimate. Without it, every conversation starts from scratch. Your agents can’t learn, can’t reference prior work, and can’t build on what they’ve done before.
There are two distinct memory problems to solve.
Short-Term Context
Short-term context is what’s active in a single session or task. This is Claude’s context window — what the agent can “see” right now. The challenge is that context windows have limits, and long-running jobs regularly hit them.
The AI agent memory wall is a real problem: agents fail mid-task because they’ve consumed their context budget on earlier reasoning steps. The fix is to treat context as a managed resource — summarizing and compressing earlier steps rather than retaining them verbatim, and writing intermediate results to files rather than keeping them in-memory.
In practice, this means your agent should:
- Write progress summaries to a progress.md or state.json file after each major step
- Load only the relevant portions of prior context when resuming
- Use structured formats (JSON, YAML) for state that needs to be read programmatically
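The write-and-resume cycle above can be sketched in a few lines of shell. The `progress.jsonl` file name and the record fields are illustrative, not a fixed convention:

```shell
#!/bin/sh
# Persist a one-line summary after each step so a resumed run can
# reload only the tail of the history, not the full transcript.
cd "$(mktemp -d)"        # scratch dir for the demo
mkdir -p state

record_step() {
  step="$1"; summary="$2"
  printf '{"step":"%s","summary":"%s","ts":"%s"}\n' \
    "$step" "$summary" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> state/progress.jsonl
}

record_step "research" "collected 12 sources on the topic"
record_step "outline"  "produced a 5-section outline"

# On resume, load only the most recent records.
tail -n 3 state/progress.jsonl
```

Appending one structured line per step keeps the file cheap to write and lets a resuming agent read only the last few entries.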
Long-Term Memory
Long-term memory is what persists across sessions. This is where dedicated memory infrastructure becomes important.
The standard approach is a memory file system — a directory of markdown or JSON files that agents can read from and write to. A memory/ folder with structured subfolders (memory/clients/, memory/decisions/, memory/lessons-learned/) gives you a simple, version-controlled long-term store.
For more sophisticated setups, external memory systems like Mem0 offer semantic search across stored memories, which is meaningfully better than file-system lookup when you have large memory stores. These tools can outperform built-in memory by a significant margin on retrieval tasks.
The key principle: memory should be explicit and structured. If you can’t inspect what your agent remembers, you can’t debug it.
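As a sketch of the file-based store, assuming the memory/ layout described above (retrieval here is plain grep; a semantic-search tool becomes worthwhile once the store outgrows simple text matching):

```shell
#!/bin/sh
# File-based long-term memory: agents append dated entries and load
# only the files relevant to the task at hand.
cd "$(mktemp -d)"        # scratch dir for the demo
mkdir -p memory/clients memory/decisions memory/lessons-learned

# Write: append a structured, dated entry.
cat >> memory/lessons-learned/2025-01.md <<'EOF'
## 2025-01-14 (draft skill)
Long intros hurt engagement; lead with the concrete claim.
EOF

# Read: find which files mention the current task before loading them.
grep -rl "draft skill" memory/lessons-learned/
```

Because everything is plain text in a directory, the store is inspectable, diffable, and version-controllable, which is exactly the "explicit and structured" property above.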
Layer 2: Skills
Skills are the unit of capability in an agentic OS. A skill is a discrete, reusable function that an agent can call — something like “research a topic,” “draft a blog post,” “analyze a CSV,” or “send a Slack message.”
The reason skills matter is composability. A system built around skills can:
- Chain multiple skills into a workflow
- Run skills in parallel across multiple agent instances
- Reuse the same skill across different workflows
- Test and iterate on skills independently
Building a Skill
A Claude Code skill is typically defined by a prompt file and an optional set of tools. The prompt file specifies what the skill does, what inputs it expects, and what output format to use.
Here’s a minimal structure:
```
skills/
  research/
    prompt.md
    tools.json
  draft/
    prompt.md
    tools.json
  publish/
    prompt.md
    tools.json
```
Each prompt.md contains the skill’s instructions. Each tools.json specifies which tools that skill has access to — web search, file read/write, API calls, and so on.
The discipline is keeping skills narrow. A skill that does “research and drafting and publishing” is actually three skills collapsed into one. That collapse makes the skill harder to reuse and harder to debug.
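A narrow skill can be scaffolded like this. Note that Claude Code doesn't mandate a tools.json schema; the "allowed" list here is a convention your own workflow scripts would read, so treat the shape as illustrative:

```shell
#!/bin/sh
# Scaffold one narrow skill: instructions in prompt.md, tool access
# in tools.json (a convention for your scripts, not a Claude Code file).
cd "$(mktemp -d)"        # scratch dir for the demo
mkdir -p skills/research

cat > skills/research/prompt.md <<'EOF'
You are a research skill. Input: a topic string.
Output: a markdown brief with sources, key claims, and open questions.
Research only; do not draft prose.
EOF

cat > skills/research/tools.json <<'EOF'
{ "allowed": ["web_search", "file_read", "file_write"] }
EOF

ls skills/research/
```

The prompt's last line is the narrowness discipline in action: it tells the skill what it must not do, which keeps research from bleeding into drafting.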
Chaining Skills Into Workflows
Chaining skills into end-to-end workflows is where the real value shows up. A content marketing workflow might chain: keyword research → outline generation → draft → edit → format → publish. Each step is a separate skill. The output of one becomes the input of the next.
There are two ways to chain:
- Sequential chains — each skill runs after the previous one completes. Simple and reliable, good for workflows where each step depends on the last.
- Parallel chains — multiple skills run simultaneously and their outputs are merged. Good for tasks that are independent, like generating five social media variations at once.
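Both styles reduce to a few lines of shell. The `claude` function below is a stub so the sketch runs standalone; in practice it would be the real CLI in print mode:

```shell
#!/bin/sh
cd "$(mktemp -d)"; mkdir -p state
claude() { shift; echo "output for: $*"; }   # stub for the real CLI

# Sequential: each skill consumes the previous skill's output file.
claude --print "research topic X" > state/research.md
claude --print "draft from: $(cat state/research.md)" > state/draft.md

# Parallel: independent variations run at once, then get merged.
for i in 1 2 3; do
  claude --print "social variation $i of the draft" > "state/social-$i.md" &
done
wait
cat state/social-*.md > state/social-all.md
```

Files are the chain's interface: the output of one skill becomes the input of the next, and `wait` is all the synchronization the parallel case needs.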
See how a 5-skill content marketing workflow comes together if you want a concrete end-to-end example.
Layer 3: Interaction
The interaction layer is how your agents communicate — with humans, with each other, and with external systems.
Human-in-the-Loop vs. Fully Autonomous
Not every workflow should be fully autonomous. Some decisions benefit from human review before the agent proceeds. Claude Code supports both patterns.
For human-in-the-loop workflows, you structure the agent to pause and surface a decision point: write the proposed action to a file, notify via Slack or email, and wait for an approval file to appear before continuing. This is manual but extremely reliable.
For fully autonomous workflows, you need confidence in the skill’s reliability and a clear error-handling path for when things go wrong.
The practical approach is to start human-in-the-loop and remove checkpoints as you gain confidence in each skill’s output quality.
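The pause-and-surface checkpoint described above is just a polling loop on an approval file. The auto-approve line stands in for the human; in a real run, a person (or a Slack action hitting a webhook) would create state/approved:

```shell
#!/bin/sh
cd "$(mktemp -d)"; mkdir -p state

# Surface the decision point, then notify a human out-of-band.
echo "Proposed: send follow-up email to Acme" > state/proposed-action.md
echo "(notify via Slack or email here)"

( sleep 1; touch state/approved ) &   # demo only: auto-approve after 1s

# Poll for the approval file; time out rather than hang forever.
approved=no
for _ in $(seq 1 30); do
  if [ -f state/approved ]; then approved=yes; break; fi
  sleep 1
done
echo "approval status: $approved"
```

The timeout matters: a checkpoint that can hang forever is a checkpoint that silently stalls your whole workflow.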
Multi-Agent Coordination
When a single agent isn’t enough — because the task is too long, too broad, or benefits from multiple perspectives — you move to multi-agent patterns.
Claude Code agent teams allow multiple agents to share a task list in real time. One agent picks up item A while another handles item B, and they write results to a shared location. This is the parallel pattern applied at the agent level.
A more sophisticated pattern is agent chat rooms — where multiple agents with different roles debate a question or review each other’s outputs before a final answer is produced. A “critic” agent and a “creator” agent working together consistently produce better outputs than a single agent working alone, because the critic catches errors the creator’s context has normalized.
The implementation is straightforward: each agent writes its contribution to a shared markdown file in a structured format, the orchestrator reads all contributions and synthesizes them, and the result is written to an output file.
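A minimal orchestrator for this pattern: spawn one headless process per task, wait, then synthesize. As before, `claude` is stubbed so the sketch runs standalone:

```shell
#!/bin/sh
cd "$(mktemp -d)"; mkdir -p state
claude() { shift; echo "result for: $*"; }   # stub for the real CLI

printf 'task A\ntask B\ntask C\n' > state/tasks.txt

# Spawn one agent per task; each writes its own contribution file.
n=0
while IFS= read -r task; do
  n=$((n + 1))
  claude --print "do: $task" > "state/out-$n.md" &
done < state/tasks.txt
wait

# Orchestrator step: combine all contributions for synthesis.
cat state/out-*.md > state/combined.md
```

Giving each agent its own output file sidesteps write contention entirely; only the orchestrator ever touches the combined file.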
Layer 4: Scheduling
Scheduling is what makes an agentic OS proactive rather than reactive. Without scheduling, your agents only run when you manually trigger them.
The Heartbeat Pattern
The heartbeat pattern is the simplest form of scheduling. A heartbeat keeps your agent proactive 24/7 by running a lightweight check on a fixed interval — every 15 minutes, every hour, every morning.
The heartbeat script does three things:
- Checks a condition or reads a trigger file
- Decides whether any action is needed
- Calls the appropriate skill if the condition is met
A concrete example: every morning at 7am, a heartbeat script reads your pipeline CRM data, checks whether any deals have gone stale, and drafts follow-up messages for any deal that hasn’t had contact in seven days.
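The stale-deal example maps directly onto the three steps. The CSV layout and the seven-day threshold are illustrative, and `claude` is stubbed so the script runs standalone:

```shell
#!/bin/sh
cd "$(mktemp -d)"; mkdir -p state
claude() { shift; echo "drafted: $*"; }   # stub for the real CLI

# Demo data: deal name, days since last contact.
printf 'acme,10\nglobex,2\n' > state/deals.csv

# 1) check the condition  2) decide  3) call the skill only when needed
while IFS=, read -r deal days; do
  if [ "$days" -ge 7 ]; then
    claude --print "draft follow-up for $deal" >> state/followups.md
  fi
done < state/deals.csv

cat state/followups.md
```

Only the stale deal triggers a skill call, which is the point of a heartbeat: the check runs every time, the expensive work runs only when the condition is met.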
Headless Mode
Claude Code headless mode is how you run agents without a terminal open. The agent runs as a background process, triggered by your scheduler, and writes output to files rather than to a terminal.
The key flag is --print (or -p), which runs Claude non-interactively: it executes the prompt, writes the result to stdout, and exits, making the agent suitable for cron jobs, CI/CD pipelines, or any environment where there’s no human watching.

```
claude --print "$(cat skills/daily-report/prompt.md)"
```
That command runs the daily-report skill, prints the result, and exits. Wrap it in a cron job and you have a scheduled agent.
Choosing Between Loops and Scheduled Tasks
The difference between a Claude Code loop and scheduled tasks matters for reliability. A loop keeps Claude running continuously, which is simpler to set up but more fragile — if the loop crashes, nothing runs until you restart it. Scheduled tasks (via cron or a task scheduler) start fresh on each invocation, which is more resilient and easier to debug.
For production use, scheduled tasks are the right choice. Use loops only for genuinely interactive or session-based workflows.
Layer 5: Business Context
The business context layer is what most frameworks skip entirely, and it’s the reason most agentic setups feel generic even when they’re technically sophisticated.
Business context is the shared knowledge that all your agents need to operate as if they’re part of your company, not just as generic AI assistants.
What Goes in Business Context
At minimum, business context should include:
- Brand voice and tone — how your company communicates, what vocabulary it uses, what it avoids
- Products and services — what you sell, how you price it, what makes it different
- Customer personas — who you sell to, what they care about, common objections
- Process and policy — how decisions get made, what requires escalation, standard operating procedures
- Current priorities — what the business is focused on right now
This isn’t a one-time document. It’s a living file that gets updated as the business evolves.
The Business Brain Pattern
The business brain pattern is the standard way to implement this. You create a single brand-context.md file (or a business-brain/ folder for larger organizations) that every skill loads at the start of each task.
The structure looks like this:
```
business-brain/
  brand-voice.md
  products.md
  customer-personas.md
  processes.md
  current-priorities.md
```
Each skill’s prompt includes a line like: “Before beginning, read all files in /business-brain/ and apply this context to all outputs.”
This is deceptively powerful. An agent that knows your brand voice will write content that sounds like you. An agent that knows your pricing will answer customer questions accurately. An agent that knows your current priorities will focus on what matters, not just what’s technically possible.
Two-Layer Memory: Shared Brand Context vs. Task Context
Shared brand context and task-specific context serve different purposes and should be kept separate. Brand context is global — it applies to everything. Task context is local — it’s specific to the current workflow or project.
A content marketing workflow might load:
- Global: business-brain/brand-voice.md
- Task-specific: projects/q2-campaign/brief.md
Keeping these separate makes it easy to swap out task context without touching the global layer, and vice versa.
Wiring the Five Layers Together
The individual layers are straightforward once you understand them. The harder part is wiring them into a coherent system.
Here’s a minimal directory structure for a functional agentic OS:
```
agentic-os/
  business-brain/
    brand-voice.md
    products.md
    personas.md
  memory/
    sessions/
    lessons-learned/
    client-context/
  skills/
    research/
    draft/
    review/
    publish/
  workflows/
    content-marketing.sh
    daily-report.sh
    pipeline-review.sh
  scheduling/
    heartbeat.sh
    crontab.txt
  state/
    active-tasks.json
    completed-tasks.json
```
A workflow script ties the layers together:
```bash
#!/bin/bash
# content-marketing.sh
TOPIC="$1"

# Layer 5: Load business context
BRAND_CONTEXT=$(cat business-brain/brand-voice.md)
PERSONAS=$(cat business-brain/personas.md)

# Layer 2: Run research skill
claude --print "$(cat skills/research/prompt.md) Topic: $TOPIC" > state/research-output.md

# Layer 2: Run draft skill with research output and business context
claude --print "$(cat skills/draft/prompt.md) Brand voice: $BRAND_CONTEXT Personas: $PERSONAS Research: $(cat state/research-output.md)" > state/draft-output.md

# Layer 1: Write to memory
echo "$(date): Completed content for $TOPIC" >> memory/sessions/$(date +%Y-%m-%d).md

echo "Complete. Draft at state/draft-output.md"
```
That’s the basic pattern. Each workflow script orchestrates skills, loads context, manages state, and writes to memory.
For more complex setups — with parallel agents, conditional logic, and external integrations — see the full architecture and setup guide which goes deeper on the implementation details.
Managing Multiple Agents
Once you have more than two or three workflows running, you need a way to see what’s happening without checking log files manually.
An AI command center for managing multiple Claude Code agents gives you a centralized view of agent status, recent outputs, and upcoming scheduled tasks. The simplest version is a dashboard script that reads your state files and prints a summary. More sophisticated versions include a web UI.
The principle that matters here is managing by goals, not terminals. You shouldn’t need to watch a terminal to know if your agents are working. You should be able to specify a goal — “publish three blog posts per week” — and have the system handle execution while surfacing only the decisions that need your judgment.
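The simplest dashboard really is just a script over the state files. Assuming the active-tasks.json and completed-tasks.json layout from the directory structure earlier (counting with grep to avoid a jq dependency):

```shell
#!/bin/sh
cd "$(mktemp -d)"; mkdir -p state
# Demo state files in the layout from the directory structure above.
echo '[{"task":"q2-campaign","status":"drafting"}]' > state/active-tasks.json
echo '[{"task":"daily-report","status":"done"}]'    > state/completed-tasks.json

# Crude counts: one "task" key per entry; swap in jq for real parsing.
active=$(grep -o '"task"' state/active-tasks.json | wc -l)
done_count=$(grep -o '"task"' state/completed-tasks.json | wc -l)

echo "=== Agent command center ==="
echo "active tasks:    $active"
echo "completed tasks: $done_count"
```

Run it from cron or on demand; because it only reads state files, it never interferes with the agents that write them.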
The Compounding Effect
One reason to build this yourself rather than using a pre-packaged framework is that a custom agentic OS compounds over time.
Every workflow you build adds to your skills library. Every run adds to your memory store. Every lesson learned makes your business context more precise. The compounding knowledge loop means the system gets materially better the longer it runs — not because the model changes, but because the context it operates within gets richer.
This doesn’t happen with a generic harness. A harness gives you a container. The agentic OS is the content inside the container, and that content is specific to your business.
How Remy Fits Into This Picture
Building the five layers yourself takes real work. The architecture is clear, but implementing it — writing workflow scripts, managing state files, setting up scheduling, keeping business context current — is ongoing maintenance.
Remy takes a different approach to the same problem. Instead of manually wiring together shell scripts and markdown files, you write a spec — a structured document that describes what your system does — and Remy compiles it into a full-stack application with a real backend, database, and deployment.
For agentic OS use cases, that means the memory layer becomes a real SQL database with proper queries, the interaction layer becomes an actual API, and the scheduling layer becomes managed background jobs rather than cron scripts. The five layers are still there, but they’re built from a spec rather than assembled by hand.
If you’re technical and want full control over every layer, the DIY approach in this guide is the right path. If you’d rather describe what you want and have the infrastructure generated from that description, try Remy at mindstudio.ai/remy.
FAQ
What is an agentic operating system in the context of Claude Code?
An agentic OS is a custom system built on top of Claude Code that gives AI agents persistent memory, composable skills, coordination capabilities, scheduling, and shared business context. It’s the difference between a one-off Claude prompt and an AI system that operates continuously, remembers what it’s done, and improves over time. Unlike pre-packaged frameworks like OpenClaw, an agentic OS built directly in Claude Code gives you full control over each layer.
Do I need to replace OpenClaw or Hermes to build an agentic OS?
Not necessarily replace — more accurately, you can achieve the same capabilities (and more) without depending on third-party harnesses. Given Anthropic’s restrictions on third-party OAuth harnesses, building directly on Claude Code’s native capabilities is more durable. The five layers described in this guide cover everything those frameworks provided, plus the ability to customize each layer to your specific business.
How do I handle memory in a Claude Code agentic system?
Use two memory layers. Short-term context management involves writing intermediate results to state files rather than keeping everything in the context window — this prevents hitting the context limit on long-running tasks. Long-term memory is a structured directory of markdown or JSON files that agents read at the start of tasks and write to on completion. For advanced setups, external memory tools with semantic search capabilities significantly improve retrieval accuracy over flat file lookups.
What’s the best way to schedule Claude Code agents?
For production use, scheduled tasks (cron jobs) are more reliable than continuous loops. Use Claude Code’s --print (-p) flag to run agents without an interactive terminal. A heartbeat script running on a schedule is a good starting pattern — it checks conditions and calls skills only when needed, which keeps costs manageable and behavior predictable. If you need more sophisticated scheduling with dependencies between tasks, a lightweight task queue is worth adding.
How do I share business context across multiple agents and skills?
Create a business-brain/ folder with structured markdown files covering brand voice, products, customer personas, and current priorities. Every skill loads this folder at the start of each run. This is the “business brain pattern” — a single source of truth for company context that all agents reference, so outputs stay consistent regardless of which skill or workflow is running. Update these files as your business changes; the agents will reflect those changes on the next run.
Can I build a multi-agent system without a framework?
Yes. Multi-agent coordination in Claude Code works through shared state files. One agent writes to a shared task list, others pick up tasks from that list, and all write results to a shared output location. No framework required. The orchestrator is typically a shell script that spawns multiple headless Claude processes, waits for their outputs, and combines the results. More sophisticated setups use a structured message format in the shared files to handle agent communication patterns like critique-and-revise.
Key Takeaways
- The five layers of an agentic OS — memory, skills, interaction, scheduling, and business context — cover everything OpenClaw and Hermes provided, plus the flexibility to build exactly what your business needs.
- Memory has two distinct problems: short-term context management (avoid hitting limits on long tasks) and long-term persistence (structured files or external memory tools).
- Skills should be narrow and composable. A skill that does too much is hard to reuse and hard to debug.
- The business context layer — your brand voice, products, personas, and priorities — is what makes the difference between a generic AI system and one that operates as if it’s part of your company.
- Claude Code’s headless mode and scheduled tasks are the right tools for production scheduling. Loops are for interactive use only.
- The system compounds over time. Every run makes the memory richer, every workflow adds to the skills library, every lesson makes the business context more precise.
If you want the full-stack version of this without writing every layer by hand, try Remy at mindstudio.ai/remy.