Claude Code Skills Architecture: 4 Layers That Keep Your AI Agent Fast and Focused
The .claude/skills/ folder uses progressive context loading — only ~100 tokens read at search time — to keep Claude Code lightweight across dozens of SOPs.
Four Layers Inside .claude/skills/ That Keep Claude Code From Drowning in Its Own Context
Most people who build Claude Code skills make the same mistake: they dump everything into one enormous markdown file and wonder why the agent gets slow, expensive, and confused. The fix is already baked into the skills architecture — you just have to understand the four layers it uses, and why each one exists.
The core mechanic is progressive context loading. When Claude Code searches for a skill, it reads only the YAML front matter — roughly 100 tokens — not the full file. That means you can have 30 skills in your .claude/skills/ folder and the search phase costs almost nothing. Only when Claude identifies the right skill does it load the full skill.md. Only when the skill actually needs a reference file does that file get pulled in. Three levels, each triggered on demand.
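As a rough sketch of how those three levels map onto files on disk (the folder and file names here are illustrative, not required):

```
.claude/skills/
  pulse-check/
    skill.md            # Level 1: front matter (~100 tokens) scanned during skill search
                        # Level 2: the full body loads only if this skill matches
    references/
      clickup-api.md    # Level 3: read only when the skill actually needs it
```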
This architecture is what separates a skills folder that stays fast at scale from one that grinds to a halt by week three.
Layer 1: The YAML Front Matter (~100 Tokens, Always Read)
Every skill.md file starts with a YAML block between two sets of triple dashes. At minimum, it needs two fields: name and description. That’s it. That’s the entire search index.
---
name: pulse-check
description: Used when the user asks for a project status update, wants to check on commitments, or says 'how are we doing'
---
The description is doing real work here. Claude Code reads this block during skill discovery and decides whether this skill matches the current request. If you write a vague description — “does stuff with projects” — you’ll get missed triggers and false positives. If you write a specific one that includes the natural language someone would actually use, the matching gets reliable fast.
The 100-token budget is a hard constraint to design around. Keep the front matter tight. The name and description should be enough to answer: “Is this the right skill for this job?” Nothing else belongs here.
You can also add optional fields to the front matter: allowed_tools to restrict which tools the skill can invoke, a specific model to use (the Claude Design skill, for instance, pins Opus 4.7 because its self-QA screenshot loop needs a stronger vision model), or disable_model_invocation to force explicit slash-command triggering only. These are power-user additions. Start with just name and description.
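A hedged sketch of what a front matter block with those optional fields might look like. allowed_tools and disable_model_invocation are named above; the model field name, the values, and the exact syntax for tool lists and model identifiers are assumptions to verify against the current Anthropic docs:

```
---
name: design-review
description: Used when the user asks for a UI design review or a screenshot-based QA pass
allowed_tools: Read, Bash          # restrict which tools the skill may invoke (assumed format)
model: claude-opus-4               # pin a stronger model for vision-heavy work (assumed identifier)
disable_model_invocation: true     # only fire via the explicit /design-review slash command
---
```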
Layer 2: The skill.md Body (Full SOP, Under 500 Lines)
Once Claude identifies the right skill, it loads the full skill.md. This is where your SOP lives.
The Anthropic docs are explicit: keep skill.md under 500 lines. Move detailed reference material to separate files. This isn’t arbitrary — it’s the threshold where Claude can reliably recall the full instruction set without degradation.
The structure that works is simple: a goal statement, then numbered phases, then rules. Here’s what the anatomy looks like in practice, using the /audit skill from the AIOS template as a reference point:
---
name: audit
description: Grades the current AIOS on the four C's (Context, Connections, Capabilities, Cadence). Run when the user asks for an audit or wants to know their score.
---
## Goal
Produce a scored audit of the current project against the four C's framework. Output a score out of 100 with specific gaps and next steps.
## Phase 1: Scan
Read the project structure. Check context folder for about_me.md, about_business.md, priorities.md. Check .env for active connections. List all skills in .claude/skills/.
## Phase 2: Grade
Score each C out of 25...
## Rules
- Never ask the user questions during the audit. Work from what exists.
- If a file is missing, count it as a gap, not an error.
The key discipline is separating what Claude needs to know from what Claude needs to do. Instructions go in the body. Reference material — API docs, brand guidelines, example outputs — goes in Layer 3.
One pattern worth stealing from the pulse-check skill example: hardcode stable values directly in the skill file. If your skill always queries the same three ClickUp list IDs, put those IDs in the skill. Before they were hardcoded, every run spent tokens rediscovering them dynamically; hardcoding removed that overhead entirely.
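A minimal sketch of that pattern inside a skill body; the list names and IDs below are invented for illustration:

```
## Data sources (hardcoded, do not rediscover)
- ClickUp list "Client Projects": 901300001234
- ClickUp list "Internal Ops": 901300005678
- ClickUp list "Content Pipeline": 901300009012

Query these IDs directly. Never call the API to look up list IDs.
```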
Layer 3: Reference Files (Loaded On Demand)
This is where most of the token savings actually come from.
Reference files are anything the skill might need but doesn’t always need: API endpoint documentation, tone-of-voice guides, competitor lists, brand assets, example outputs. They live either nested inside the skill folder or elsewhere in the project — it doesn’t matter, as long as the skill.md points to the right path.
The two common layouts:
Self-contained — everything nested under the skill:
.claude/skills/linkedin-post/
  skill.md
  references/brand-voice.md
  references/post-examples.md
Shared references — files live in a central location:
.claude/skills/linkedin-post/
  skill.md ← references ../../references/brand-voice.md
references/
  brand-voice.md ← shared across multiple skills
The shared approach is usually better for anything that multiple skills use — brand voice, API documentation, company context. Update it once, and every skill that references it picks up the change.
The ClickUp API reference strategy is a good concrete example of this pattern working at scale. Rather than having Claude search the web every time it needs an endpoint, you scrape the entire ClickUp API documentation once, save it as a .md file in your references folder, and point skills at it. Processing a local markdown file is dramatically cheaper than making HTTP requests and crawling documentation pages. The same logic applies to any external API your skills use regularly.
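Inside a skill, that strategy is nothing more than a pointer to the local file plus a rule not to go to the web; the path below is an example, not a required layout:

```
## Reference
- ClickUp endpoints, parameters, and auth: references/clickup-api.md

## Rules
- Never search the web for ClickUp API documentation. Use the local reference file.
```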
For context on how this connects to broader token management, the 18 Claude Code token management hacks post covers the full range of techniques — reference file strategy is one of the highest-leverage ones.
Layer 4: The claude.md Skill Registry (The Dispatcher)
There’s a fourth layer that most tutorials skip: the claude.md file at the project root needs to know your skills exist.
The claude.md is the master prompt for the project. It gets read at the start of every session. If it doesn’t mention your skills — what they’re called, when to invoke them — Claude will sometimes miss them entirely, especially for natural language triggers.
The relevant section looks like this:
## Your Skills
You have the following skills available. Invoke them when the user's request matches.
- /onboard — Run when the user wants to set up or initialize the AIOS
- /audit — Run when the user asks for an audit, a score, or wants to know gaps
- /level-up — Run when the user wants automation ideas or feels stuck
- /pulse-check — Run when the user asks for a project status update
This is the dispatcher layer. It’s how Claude knows to look in .claude/skills/ at all, and it’s how natural language triggers get matched to the right skill before the YAML search even runs.
Keep this section updated as you add skills. A skill that isn’t registered in claude.md is a skill that will get missed half the time.
Building Your First SOP Skill From Scratch
The six-step framework that actually works in practice (a skeleton example follows the list):
1. Name and trigger. What’s it called? What natural language would fire it? Be specific in both.
2. Goal statement. One sentence: what will exist at the end of this skill that didn’t exist before?
3. Step-by-step process. Write it the way you’d explain it to a new employee. What do you look at? In what order? What decisions do you make?
4. Reference files. What context does the skill need that isn’t in the instructions? Brand voice, API docs, example outputs — list them and point to their paths.
5. Rules and guardrails. What could go wrong? What should never happen? Add explicit constraints.
6. Feedback loop. After you run it the first few times, watch what it does. When it makes a mistake, update the skill file. The skill improves with use.
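Stitched together, a first draft that follows those six steps can be this small. Everything below is placeholder content, a skeleton to adapt rather than a working skill:

```
---
name: weekly-report
description: Run when the user asks for the weekly client report or says 'send the Friday update'
---

## Goal
Produce a one-page weekly report draft in reports/, ready for human review.

## Phase 1: Gather
Read references/report-template.md and this week's completed tasks.

## Phase 2: Draft
Fill the template. Keep it under one page. Match the tone in references/brand-voice.md.

## Rules
- Draft only. Never send anything.
- If data is missing, flag the gap instead of guessing.
```

Step six, the feedback loop, lives outside the file: every edge case you patch later becomes another line under ## Rules.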
The feedback loop is where most of the quality gains come from. The first run of a new skill is rarely good. The tenth run, after you’ve patched the edge cases, usually is.
One practical note on the onboarding flow: the template includes an onboard skill that runs a seven-question interview and scaffolds three context files — about_me.md, about_business.md, and priorities.md — into the context folder. Running this before you build any custom skills gives every subsequent skill a foundation to reference. The /level-up skill, which asks five diagnostic questions to surface automation opportunities, works much better when those context files exist.
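The resulting layout is just three markdown files in the context folder (the one-line summaries are paraphrased, not the template's exact wording):

```
context/
  about_me.md        # who you are and how you work
  about_business.md  # what the business does and who it serves
  priorities.md      # what the skills should currently optimize for
```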
If you’re building skills that chain together into longer workflows — a content pipeline where skill A produces a transcript, skill B writes a draft, skill C formats for LinkedIn — the Claude Code content marketing skill system post covers how to structure those pipelines so each skill stays modular and reusable across different chains.
The Onboard → Audit → Level-Up Loop
The three built-in skills in the AIOS template are designed to work as a system, not in isolation.
/onboard runs once at setup. It interviews you, creates the context files, and gives the project enough grounding to be useful from day one.
/audit grades the project against the four C’s framework — Context, Connections, Capabilities, Cadence — and scores it out of 100. Run it weekly. The score tells you where to invest next. A project with strong context but no connections scores well on the first C and poorly on the second; that’s a clear signal to wire up your next API integration.
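A hypothetical shape of the audit output, with invented scores, just to show how the per-C breakdown points at the next investment:

```
Context        22 / 25   gap: priorities.md is stale
Connections     9 / 25   gap: no CRM or calendar API wired up
Capabilities   15 / 25   gap: two skills, no content pipeline
Cadence        10 / 25   gap: nothing runs on a schedule
Total          56 / 100
```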
/level-up asks five questions: what did you do three or more times this week, what felt manual and boring, what could a smart intern handle, what would break if you got 500 new customers, and what would give you 500 more customers if it ran on autopilot. Answer those honestly and you will always have a clear next skill to build.
The loop is: onboard once, audit weekly, level-up whenever you’re stuck. This is the maintenance cadence for an AIOS that actually improves over time rather than stagnating after the initial setup.
For the cadence layer — getting skills to run on a schedule while your laptop is closed — the architecture shifts to Claude Code remote routines, which run on Anthropic’s cloud against a cloned GitHub repo. The key gotcha there is that .env files never reach the cloud environment; API keys have to be set in the Cloud Environment settings panel as environment variables. That’s a separate topic, but the Claude Code skills vs plugins post covers how skills fit into the broader automation picture.
Where Skills Break (And How to Fix Them)
A few failure modes come up repeatedly:
Wrong skill triggers. The description in the YAML front matter isn’t specific enough. Fix: add the exact natural language phrases someone would use to trigger it.
Skill doesn’t trigger at all. Either the YAML is malformed, or the skill isn’t registered in claude.md. Fix: check both.
Same mistake every run. The skill is missing a rule that prevents it. Fix: add an explicit constraint to the rules section. “Never do X” is more reliable than hoping the model infers it.
Skill works but is slow. It’s loading reference files it doesn’t always need, or it’s doing dynamic lookups that could be hardcoded. Fix: audit what the skill is actually reading on each run and move stable values into the skill file directly.
Skill triggers too often. It’s matching requests it shouldn’t. Fix: tighten the description, or use disable_model_invocation: true in the front matter and require explicit slash-command triggering.
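For that last fix, the front matter change is one line; as with the earlier example, treat the exact field syntax as something to verify against the docs:

```
---
name: deploy-prod
description: Deploys to production. Intended to run only via the explicit /deploy-prod command.
disable_model_invocation: true
---
```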
The Claude Code autoresearch self-improving skills post covers a more systematic approach to this iteration loop — using an automated feedback mechanism to improve skill quality over time without manual intervention.
The Portability Argument
One thing worth stating directly: skills are just markdown files. They work in Claude Code, Cursor, Codex, and any other tool that reads a .claude/ folder or equivalent. The YAML front matter format is simple enough that most agent frameworks can parse it.
This matters because the tools will keep changing. The skills you build today in Claude Code can migrate to whatever comes next with minimal friction. The SOP logic you encode in a skill.md is yours — it’s not locked into any particular platform’s proprietary format.
That portability is also why the progressive context loading architecture is worth understanding at the file level rather than just using it as a black box. When you know that only the YAML front matter gets read at search time, you write better descriptions. When you know the 500-line limit on skill.md, you move reference material to separate files instead of cramming everything in. When you know the claude.md dispatcher needs to know about your skills, you keep it updated.
Platforms like MindStudio take a different approach to this orchestration problem — 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows — which is useful when you want to connect skills to external tools without writing the API integration code yourself.
The four-layer architecture isn’t complicated. YAML front matter for discovery, skill.md body for instructions, reference files for context, claude.md for dispatch. Each layer has a job. Keep them clean and the whole system stays fast.
The builders who get the most out of Claude Code skills aren’t the ones who write the most elaborate prompts. They’re the ones who understand which layer a given piece of information belongs in — and put it there instead of somewhere else. If you’re thinking about how this kind of structured spec-writing scales to full application development, tools like Remy apply a similar discipline: you write annotated markdown as the source of truth, and the full-stack app — TypeScript backend, database, auth, deployment — gets compiled from it. The spec is precise; the generated output is derived.
The skills folder is a small system. Build it right and it compounds. Build it wrong and you’ll spend more time debugging your agent than using it.