Hermes Agent's 5-Pillar Architecture: How It Learns, Schedules, and Improves Itself Over Time
Hermes Agent is built on five pillars: memory, skills, soul, crons, and a self-improving loop. Here's how each one works and why the combination matters.
Hermes Agent Has 140,000 GitHub Stars — Here’s the Architecture Behind Why It Keeps Getting Smarter
Hermes Agent crossed 140,000 GitHub stars faster than almost any open-source project in recent memory. That number is a signal worth paying attention to, but the more interesting question is why — what is it about this agent that makes people keep coming back and starring it?
The answer is architectural. Hermes is built on five pillars: memory (user.md + memory.md), skills, soul, crons, and a self-improving loop. Each pillar solves a specific failure mode that makes most AI agents frustrating to use over time. Understanding how they fit together tells you something useful about how to build agents that actually compound in value rather than plateau.
You’ve probably felt the plateau. You set up an agent, it works well for a week, and then you’re back to re-explaining context you’ve already given it, correcting the same mistakes, and wondering why it doesn’t feel like it’s getting better. Hermes’s architecture is a direct answer to that problem.
The Five Pillars, Explained Without the Marketing
Pillar 1: Memory — Two Files That Solve the Stateless Problem
Every AI agent wakes up stateless. If you’ve seen the film Memento, you already understand the core problem: the agent has no persistent recollection of who you are, what you’re working on, or what you’ve told it before. The naive fix is to dump everything into a system prompt. The Hermes fix is more surgical.
There are two memory files. user.md holds who you are — your name, your communication style, your preferences, the things you never want the agent to do. memory.md holds the environment — active projects, business context, key relationships, ongoing work. Both files load at session start, so the agent always has a working model of you before you type a single word.
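To make the session-start behavior concrete, here is a minimal sketch of how two memory files could be folded into a single preamble. The helper name `build_session_context`, the directory layout, and the `##` section labels are assumptions made for illustration; the article does not describe how Hermes assembles its context internally.

```python
from pathlib import Path

# File names come from the article; everything else here is hypothetical.
MEMORY_FILES = ("user.md", "memory.md")


def build_session_context(agent_dir: Path) -> str:
    """Concatenate the memory files into one preamble for a new session."""
    sections = []
    for name in MEMORY_FILES:
        path = agent_dir / name
        # A missing file is treated as empty so a brand-new agent still starts.
        text = path.read_text(encoding="utf-8").strip() if path.exists() else ""
        sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

Keeping the two files as separate labeled sections mirrors the split described above: who you are in one place, the working environment in the other.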
The practical implication: you stop repeating yourself. The agent extracts facts from your conversations and updates these files automatically. You can also direct it explicitly — “chuck that in memory” or “make sure you never do this again, add it to user.md” — but the passive accumulation happens without you managing it consciously.
One nuance worth flagging: stale memory is the number one cause of weird agent behavior. If your Hermes starts acting oddly, the first thing to check is memory.md. Old project context that no longer applies will confuse the agent just as much as no context at all. The three-layer memory architecture exposed by the Claude Code source leak showed a similar pattern, with memory files acting as pointer indexes rather than raw dumps, and the same discipline applies here.
Pillar 2: Skills — Procedural Memory as Reusable Recipes
If memory answers “what does this agent know about me,” skills answer “how does this agent do things consistently.”
A skill is a markdown file with YAML front matter. The front matter is the key — it tells the agent what the skill is for, so the agent can decide whether to invoke it without loading the full skill into context. This is called progressive disclosure, and it matters because loading every skill into every session would bloat your context window with procedures you’re not using.
When the agent decides a skill is relevant, it reads the full file and follows it. Think of it as a recipe: the pancakes come out the same every time because the recipe doesn’t change, even if the cook’s memory does.
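Progressive disclosure can be sketched in a few lines: read only the front matter of each skill to build a cheap index, and load the full file only once a skill is chosen. This is an illustrative sketch, not Hermes's loader. The function names are hypothetical, the field names in the usage note (`name`, `description`) are assumed, and the parser handles only flat `key: value` pairs rather than full YAML.

```python
from pathlib import Path


def read_front_matter(skill_path: Path) -> dict[str, str]:
    """Read only the block between the leading '---' lines of a skill file."""
    meta: dict[str, str] = {}
    with skill_path.open(encoding="utf-8") as f:
        if f.readline().strip() != "---":
            return meta  # no front matter present
        for line in f:
            if line.strip() == "---":
                break  # stop before the skill body is read
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def build_skill_index(skills_dir: Path) -> dict[str, dict[str, str]]:
    """Map each skill file name to its front matter, without loading bodies."""
    return {p.name: read_front_matter(p) for p in sorted(skills_dir.glob("*.md"))}


def load_skill_body(skill_path: Path) -> str:
    """Load the full skill only after it has been judged relevant."""
    return skill_path.read_text(encoding="utf-8")
```

With a skill file whose front matter contains `name: nightly-sync` and a one-line `description`, the index holds just those two fields, and the steps themselves stay out of context until `load_skill_body` is called.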
Hermes ships with 91 built-in skills on install. There’s also a community skills hub with 520+ additional skills, including 16 official Anthropic-contributed skills covering things like canvas design and front-end workflows. You can install a skill by dropping its URL into the chat and saying “install this.” The agent handles the rest.
The more interesting behavior is skill creation. If you do something complex and repeatable, Hermes will analyze the workflow and propose turning it into a skill. If you give feedback on how a skill performed, it updates the skill. The skill library is not static — it’s a living document that reflects how you actually work.
For builders thinking about how skills compare to other agent primitives, the Claude Code skills vs plugins breakdown covers the distinction well: skills are specialist playbooks you write once; plugins are bundles others have packaged. Hermes uses the former model almost exclusively.
Pillar 3: Soul — Personality as Infrastructure
soul.md is the file that shapes the agent’s personality and tone. If you have multiple Hermes agents — one for customer-facing work, one for internal research, one for social media — each gets its own soul file. They share the same underlying model but behave differently.
This matters more than it sounds. An agent that’s commenting on your YouTube videos on your behalf needs a different register than one that’s auditing your server security. The soul file is how you encode that distinction without re-prompting every session.
The soul file also evolves. As you give feedback — “too verbose,” “wrong tone,” “stop being so formal” — the agent updates the file. You can inspect it at any time: “read me your soul file” returns exactly what’s in there.
Pillar 4: Crons — Natural Language Scheduling
This is where Hermes separates itself most clearly from terminal-first agents like Claude Code.
You can say, in plain English: “Every night at 12am Central time, push changes to this GitHub repo.” Hermes creates both a skill and a cron job from that single instruction. No YAML configuration, no crontab syntax, no infrastructure setup beyond the initial VPS.
When a cron fires, it spins up a fresh isolated session. It doesn’t inherit your current conversation context. It runs the skill, sends the result back to your chat, and updates any relevant files. The sessions are self-contained by design — cron sessions cannot recursively create more cron jobs, which prevents runaway automation.
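The two guarantees described here, a fresh session per run and no cron creation from inside a cron, can be modeled roughly as follows. All names (`Session`, `fire_cron`, `create_cron`, `CronCreationNotAllowed`) are hypothetical and the cron expression in the usage note is only an example; this is a sketch of the rule, not the Hermes runtime.

```python
from dataclasses import dataclass, field


class CronCreationNotAllowed(Exception):
    """Raised when a cron-originated session tries to schedule another cron."""


@dataclass
class Session:
    origin: str  # "chat" or "cron"
    history: list[str] = field(default_factory=list)


def fire_cron(skill: str) -> Session:
    """Start a run for `skill` in a fresh session with empty history."""
    # Nothing from the user's current conversation is copied in.
    return Session(origin="cron")


def create_cron(session: Session, schedule: str, skill: str,
                registry: list[tuple[str, str]]) -> None:
    """Register a cron job unless the request comes from a cron session."""
    if session.origin == "cron":
        raise CronCreationNotAllowed("cron sessions cannot create cron jobs")
    registry.append((schedule, skill))
```

A chat session can call `create_cron(chat, "0 0 * * *", "nightly-sync", registry)`, while the same call from the session returned by `fire_cron` raises, which is what stops automation from multiplying itself.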
The practical range here is wide. You can schedule daily news briefings, nightly GitHub syncs, YouTube comment monitoring runs, server health checks, or community engagement tasks. You can also set time-bounded crons: “for the next 12 hours, check comments every 10 minutes, then stop.” The agent handles the scheduling logic and kills the cron when the window closes.
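The time-bounded case reduces to simple arithmetic: a fixed interval repeated until a window closes. A minimal sketch, assuming the window is half-open (the start time is included, the closing time is not) and using a hypothetical helper name:

```python
from datetime import datetime, timedelta


def bounded_run_times(start: datetime, every: timedelta,
                      window: timedelta) -> list[datetime]:
    """Return every run time in the half-open range [start, start + window)."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    current = start
    end = start + window
    while current < end:
        runs.append(current)
        current += every
    return runs
```

For "the next 12 hours, every 10 minutes", `bounded_run_times(start, timedelta(minutes=10), timedelta(hours=12))` yields 72 run times, the last one 11 hours and 50 minutes after `start`. Once the list is exhausted, the job has nothing left to do and can be removed.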
For builders who’ve worked with Claude Code’s agentic workflow patterns, crons are roughly analogous to scheduled loops — but without requiring you to leave a terminal session open or manage dispatch infrastructure. The scheduling lives inside the agent’s own runtime.
Pillar 5: The Self-Improving Loop
The fifth pillar is less a discrete feature and more the emergent property of the other four working together.
The loop works like this: you do work, the agent learns from it, you save things to memory or turn repeatable steps into skills, and the agent searches past sessions when old context becomes relevant. Each iteration makes the agent more accurate and more efficient for your specific workflows.
The honest caveat from the source material is worth quoting directly: “automatic does not mean magic.” The loop works best when you actively correct mistakes, ask the agent to save things to memory, and let it create and update skills after complex work. Passive use produces some improvement. Active use produces compounding improvement.
The self-improving loop is also why the agent’s soul file evolves — every piece of feedback you give is a data point that gets encoded somewhere persistent. The agent isn’t just executing tasks; it’s building a model of how you want tasks executed.
What the Architecture Actually Prevents
The five pillars aren’t just features. Each one is a direct response to a specific failure mode in agent design.
Memory prevents context amnesia — the frustrating experience of re-explaining yourself every session. Skills prevent inconsistency — the same task done differently each time because the agent is improvising rather than following a procedure. Soul prevents tone drift — an agent that gradually becomes more verbose or less aligned with your communication style because there’s no persistent personality definition. Crons prevent the agent from being purely reactive — you get proactive, scheduled work without managing infrastructure. The self-improving loop prevents plateau — the agent gets better the more you use it, rather than staying static.
The Docker container isolation model reinforces all of this. Each Hermes agent gets its own .env file, its own API keys, its own memory. A marketing agent and a finance agent running on the same VPS don’t share credentials or context. This is the least-privilege principle applied to agent architecture: give each agent only the credentials and tools needed for its specific job.
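The per-agent isolation can be illustrated with a small loader that reads credentials only from the named agent's own .env file. The `agents_root/<agent>/.env` layout and both function names are assumptions for this sketch, and the parser covers only plain `KEY=VALUE` lines and `#` comments, not every .env dialect.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def load_agent_env(agents_root: Path, agent: str) -> dict[str, str]:
    """Load credentials for one agent only (hypothetical directory layout)."""
    return parse_env((agents_root / agent / ".env").read_text(encoding="utf-8"))
```

Because each agent's environment is built from its own file, a key set for the finance agent is simply absent from the dictionary the marketing agent receives.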
The Non-Obvious Detail: API Key Security and the Config Pattern
Most people, when they get a GitHub token, drop it into the chat. That’s the intuitive move — you’re already talking to the agent, just give it the key.
Hermes has a better pattern. You run hermes config set GITHUB_TOKEN [token] directly in the terminal. This writes the token to /opt/data/.env inside the Docker container, so it never appears in the conversation history. The agent can then reference GITHUB_TOKEN in skills and crons without the key being stored in a session log that might sync to an external model’s servers.
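On the consuming side, the habit looks like this: code looks the token up by name in the environment and fails loudly if it is missing, so the value never needs to be typed into a chat. This is a generic sketch rather than Hermes code. The helper name is hypothetical, and it assumes the token has already been exported into the process environment.

```python
import os
from collections.abc import Mapping
from typing import Optional


def github_auth_header(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build a GitHub Authorization header from GITHUB_TOKEN in the environment."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN")
    if not token:
        # Fail with a message that never echoes a secret value.
        raise RuntimeError("GITHUB_TOKEN is not set; configure it outside the chat")
    return {"Authorization": f"Bearer {token}"}
```

Passing `env` explicitly makes the function easy to test without touching real credentials.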
This matters more if you’re using a cloud-hosted inference provider (like the OpenAI Codex OAuth path, which uses your ChatGPT subscription instead of API keys). With a fully local open-source model, the risk is lower. But the pattern is worth building as a habit regardless — it’s the same discipline you’d apply to any production system.
The Codex OAuth path itself is worth a note: you can authenticate Hermes using your existing $20/month ChatGPT subscription rather than paying for separate API access. For builders who are already paying for ChatGPT and want to experiment with a self-hosted agent without adding another billing line, this is a meaningful cost reduction.
When thinking about how agents handle credentials and orchestration at scale, platforms like MindStudio take a different approach — 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — which trades the self-hosted control model for managed infrastructure. Both patterns have their place depending on how much you want to own the runtime.
Scaling the Architecture: When to Split Agents
One Hermes agent can handle a lot. The temptation to spin up dedicated agents for every function is real but premature.
The decision tree from the source material is practical: Does the new function need different permissions, secrets, or tools? If yes, new agent. Does it need separate long-term memory? If yes, new agent. Is it ongoing repeated work with a distinct audience? If yes, new agent. Otherwise, keep it in your main personal agent.
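The three questions translate directly into a small predicate. The class and field names below are made up for this sketch; the logic is just the decision tree as stated, where any single "yes" means a new agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewFunction:
    """Answers to the three questions for a candidate function (hypothetical)."""
    needs_different_permissions: bool = False  # permissions, secrets, or tools
    needs_separate_memory: bool = False        # separate long-term memory
    ongoing_distinct_audience: bool = False    # repeated work, distinct audience


def should_split(fn: NewFunction) -> bool:
    """Return True if the function warrants its own agent."""
    return (
        fn.needs_different_permissions
        or fn.needs_separate_memory
        or fn.ongoing_distinct_audience
    )
```

A function that answers "no" to all three, such as an occasional one-off research task, stays in the main personal agent.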
The anti-pattern to avoid is one mega-agent with all the API keys, all the skills, and dozens of crons running simultaneously. That creates high confusion, high debugging cost, and high risk if something goes wrong. The better pattern is agents as containers — each with its own memory, its own credentials, its own scope.
The skills and crons that live as markdown files make migration easy. When you’re ready to split a finance function into its own agent, you copy the relevant skill files over. There’s no complex state transfer, no database migration. The agent’s knowledge is in files you can move.
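Since the knowledge lives in files, a migration can be as plain as a file copy. A minimal sketch, assuming each agent keeps its skills in a directory of markdown files; the directory names in the usage note and the helper name are illustrative, not Hermes's actual layout.

```python
import shutil
from pathlib import Path


def copy_skills(src_dir: Path, dst_dir: Path, names: list[str]) -> list[Path]:
    """Copy the named skill files from one agent's skills directory to another."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = src_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"skill not found: {src}")
        # copy2 also preserves timestamps; the source agent keeps its copy.
        copied.append(Path(shutil.copy2(src, dst_dir / name)))
    return copied
```

For example, `copy_skills(Path("main/skills"), Path("finance/skills"), ["invoice-report.md"])` would seed a new finance agent with one existing skill. Copying rather than moving leaves the original agent working until the new one has been verified.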
If you’re building toward a multi-agent architecture where agents delegate to each other, the multi-agent company pattern with heartbeat scheduling covers how to wire up agent hierarchies — the same principles apply whether you’re using Paperclip or Hermes containers.
What to Build This Week
If you’re setting up Hermes for the first time, the sequence that produces the fastest compounding return is:
First, connect it to a GitHub repo immediately — before you start building skills or giving it context. The nightly sync cron (“every night at 12am Central time, push changes to this GitHub repo”) is a single sentence that produces a skill, a cron job, and a safety net for everything you build afterward.
Second, spend 10 minutes on onboarding. Tell it your name, your goals, your team structure, what you’re working on. This populates user.md and memory.md with enough context that every subsequent session starts with a working model of you.
Third, watch what it does. When it invokes a skill, notice which one. When it doesn’t invoke a skill and should have, tell it to update the YAML front matter so it recognizes the trigger next time. The feedback loop is the product.
Fourth, give each agent its own accounts and API keys. Not your personal Gmail, not your main API key with full scope. Treat it like a new hire: scoped access, named credentials, auditable usage.
The self-improving skills pattern from AutoResearch is worth reading alongside this — the overnight improvement loop Karpathy describes maps cleanly onto how Hermes’s skill update cycle works. The underlying principle is the same: encode what works, measure against what doesn’t, iterate.
The architecture is designed to reward consistent use. The more you correct it, the better it gets. The more you build skills, the more consistent it becomes. The more crons you set up, the more proactive it becomes. That’s not a marketing claim — it’s just what happens when you build persistent memory, procedural knowledge, and scheduled execution into the same system.
Tools like Remy take a related approach to persistence and precision in a different domain: you write a spec — annotated markdown — and a complete full-stack TypeScript application gets compiled from it, with the spec as the source of truth rather than the code. The underlying idea is similar to Hermes’s skill files: structured, readable documents that encode intent precisely enough for a machine to execute reliably.
The five pillars aren’t complicated. They’re just consistent. And consistency, applied over weeks and months of real use, is what separates an agent that helps from one that frustrates.