Claude Code Skills: How to Build Self-Improving AI Workflows for Your Business
Claude Code skills are reusable process documents that get better with feedback. Here's how to create, refine, and chain them for business automation.
What Makes Claude Code Skills Different from Regular AI Prompts
Most people use AI the same way every time: type a prompt, get an output, tweak the prompt, repeat. It works, but it doesn’t compound. Each session starts fresh. The model doesn’t remember what worked last time. You’re the only one carrying forward what you learned.
Claude Code skills break that pattern. A Claude Code skill is a reusable process document — typically a markdown file — that tells Claude exactly how to execute a specific task. Not just what to do, but the precise steps, in order, with decision logic included. Think of it as a standard operating procedure your AI agent can actually follow.
The reason this matters for business automation is simple: skills persist. They sit in your project directory and get invoked every time that task runs. And with a learnings loop built in, they get better with each run rather than staying static. If you’re new to how skills are structured, this overview of what Claude Code skills are and how they work is a good place to start.
This guide covers how to build them, how to make them self-improving, and how to chain multiple skills into end-to-end business workflows.
The Anatomy of a Claude Code Skill
Before you build one, it helps to understand what a skill actually contains — and what it deliberately excludes.
The skill.md file
The core file is skill.md. This is where the process lives. It should contain:
- A clear description of what the skill does
- Numbered steps the agent follows in sequence
- Decision points and conditionals (“if X, do Y”)
- References to other files the agent should read
What it should not contain: raw brand context, example outputs, reference materials, or historical learnings. Those belong in separate files. Keeping skill.md focused only on process steps is what prevents the file from becoming bloated and degrading over time — a real problem called context rot that makes agents perform worse as files grow.
Supporting files
A well-structured skill directory typically looks like this:
```
/skills/
  /blog-writer/
    skill.md       ← process steps only
    learnings.md   ← feedback and improvements over time
    examples/      ← reference outputs for tone and format
    eval.json      ← test cases for scoring output quality
```
Each file has a specific job. The separation is intentional. When Claude reads skill.md, it should get clean, unambiguous instructions without wading through dozens of historical notes and examples.
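That layout is easy to stamp out with a short script. A minimal sketch in Python, assuming the file names above (the `scaffold_skill` helper is illustrative, not part of Claude Code itself):

```python
from pathlib import Path

def scaffold_skill(root: str, name: str) -> Path:
    """Create the standard skill layout: process file, learnings, examples, evals."""
    skill_dir = Path(root) / name
    (skill_dir / "examples").mkdir(parents=True, exist_ok=True)
    # skill.md holds process steps only; learnings.md starts empty on purpose
    (skill_dir / "skill.md").touch()
    (skill_dir / "learnings.md").touch()
    (skill_dir / "eval.json").write_text('{"evals": []}\n')
    return skill_dir
```

Calling `scaffold_skill("skills", "blog-writer")` produces the directory structure shown above.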
What goes in learnings.md
The learnings.md file is where the self-improvement happens. After each skill run, the agent (or you) appends structured notes about what worked, what didn’t, and what edge cases appeared. Over time, this file becomes a knowledge base the agent references before executing the skill.
A simple entry might look like:
```markdown
## 2026-04-15 — Blog post run

- Headlines with numbers performed better than question-based headlines
- Client asked to avoid passive voice; apply to all future runs
- Intro paragraphs over 80 words were flagged as too long
```
The skill picks up these notes on the next run and adjusts accordingly. That’s the basic feedback loop.
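Appending these entries can itself be scripted, which keeps the format consistent across runs. A sketch, assuming the dated-heading-plus-bullets format above (`append_learning` is a hypothetical helper name):

```python
from datetime import date
from pathlib import Path

def append_learning(path: str, run_label: str, notes: list[str]) -> None:
    """Append a dated, bulleted entry to learnings.md."""
    lines = [f"\n## {date.today().isoformat()} — {run_label}"]
    lines += [f"- {note}" for note in notes]
    with Path(path).open("a") as f:
        f.write("\n".join(lines) + "\n")
```

Because it only ever appends, the function is safe to call after every run; pruning stays a separate, deliberate step.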
How to Create Your First Claude Code Skill
Here’s a practical walkthrough. This example builds a blog post writing skill, but the pattern applies to any repeatable business task.
Step 1: Define the task boundaries
Before writing any files, answer three questions:
- What is the input to this skill? (e.g., a topic, a brief, a URL)
- What is the output? (e.g., a markdown file, a JSON object, a drafted email)
- What are the non-negotiable rules? (e.g., word count, tone, format)
Clarity here prevents ambiguous instructions later. If you can’t define the boundaries cleanly, the skill will be inconsistent.
Step 2: Write the process steps
Open skill.md and write numbered steps. Be specific. “Write a good intro” is not a step — “Write a 60–80 word introduction that states the problem, names the target reader, and ends with a clear statement of what this post covers” is a step.
Example structure:
```markdown
# Blog Writing Skill

## Purpose

Write a complete blog post draft from a brief.

## Inputs

- topic (string)
- target audience (string)
- target word count (number)

## Steps

1. Read learnings.md and note any active rules or patterns to apply
2. Read the brief and identify the primary keyword
3. Write an outline with 5–7 H2 sections
4. Write the introduction (60–80 words, problem-first framing)
5. Write each section following the outline
6. Write a conclusion with 3–5 bullet takeaways
7. Run a self-check: confirm keyword appears in first 100 words, word count is within 10% of target
8. Output the completed draft in markdown
```
That’s a real skill. It’s explicit, sequential, and gives Claude clear success criteria.
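Step 7's success criteria are concrete enough to check mechanically, which is exactly what makes them good criteria. A minimal sketch of that self-check in Python (the function name and return shape are illustrative):

```python
def self_check(draft: str, keyword: str, target_words: int) -> dict[str, bool]:
    """Mechanical version of step 7: keyword placement and word count tolerance."""
    words = draft.split()
    intro = " ".join(words[:100]).lower()
    return {
        "keyword_in_first_100_words": keyword.lower() in intro,
        "word_count_within_10_percent": abs(len(words) - target_words) <= 0.10 * target_words,
    }
```

If either check returns False, the agent revises before outputting, rather than shipping a draft that silently misses the brief.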
Step 3: Create an empty learnings.md
Even if it’s blank to start, create the file. The skill references it in Step 1. Having it present — even empty — means the skill doesn’t error out looking for it.
Step 4: Add a few seed examples
Drop one or two good example outputs into an /examples folder. These aren’t required, but they help calibrate tone and format from the first run. Claude uses them as reference, not templates to copy.
Step 5: Test it manually
Run the skill on a real task. Review the output critically. What was off? What worked well? Write your first learnings.md entry based on that review. You now have a working skill with one iteration of feedback already baked in.
The Learnings Loop: How Skills Get Better Over Time
A skill you never update is just a static prompt. The learnings loop is what turns it into something that compounds.
The learnings loop works in three phases:
Phase 1 — Run the skill. The agent executes the process steps and produces an output.
Phase 2 — Evaluate the output. This can be manual (you review and score it), semi-automated (a wrap-up skill scores it against criteria), or fully automated using eval.json test cases that score outputs against binary pass/fail criteria.
Phase 3 — Update learnings.md. New patterns, corrections, and edge cases get appended. On the next run, the skill reads these and adjusts.
Using eval.json for automated scoring
For teams running skills at volume, manual review doesn’t scale. That’s where eval.json comes in. You define a set of test cases with expected outputs or binary criteria. After each run, the eval scores the output against those criteria and logs the result.
A simple eval.json might look like:
```json
{
  "evals": [
    {
      "name": "keyword_in_intro",
      "description": "Primary keyword appears in first 100 words",
      "type": "binary"
    },
    {
      "name": "word_count_within_range",
      "description": "Word count is between 2400 and 3600",
      "type": "binary"
    },
    {
      "name": "no_passive_voice_in_headers",
      "description": "All H2 and H3 headers use active voice",
      "type": "binary"
    }
  ]
}
```
Each eval either passes or fails. Failed evals trigger automatic notes in learnings.md. Building self-improving skills with binary evals is a more detailed approach to this pattern — worth reading if you’re running skills repeatedly.
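An eval runner that does this can be small. The sketch below assumes the eval.json shape shown above; the `CHECKS` registry and its lambda implementations are illustrative stand-ins for however you actually verify each criterion:

```python
import json
from datetime import date
from pathlib import Path

# Map eval names to check functions. Each returns True (pass) or False (fail).
# These implementations are illustrative, not part of any Claude Code API.
CHECKS = {
    "keyword_in_intro": lambda out: "claude code skills" in " ".join(out.split()[:100]).lower(),
    "word_count_within_range": lambda out: 2400 <= len(out.split()) <= 3600,
}

def run_evals(output: str, eval_path: str, learnings_path: str) -> dict[str, bool]:
    """Score output against eval.json and log failures to learnings.md."""
    evals = json.loads(Path(eval_path).read_text())["evals"]
    results = {e["name"]: CHECKS[e["name"]](output) for e in evals if e["name"] in CHECKS}
    failures = [name for name, passed in results.items() if not passed]
    if failures:
        with Path(learnings_path).open("a") as f:
            f.write(f"\n## {date.today().isoformat()} — eval failures\n")
            for name in failures:
                f.write(f"- Eval '{name}' failed; review the output against its criteria\n")
    return results
```

The key design choice is that failures land in learnings.md automatically, so the next run sees them without any manual bookkeeping.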
What good learnings.md entries look like
The format matters. Vague notes don’t help the agent. Good entries are:
- Dated — so you can track when a rule was introduced
- Specific — “avoid passive voice” is better than “write better”
- Scoped — “apply to blog posts only” prevents cross-contamination with other skills
Over time, learnings.md becomes a living document that encodes everything you’ve learned about running this task well. Building a self-learning skill with a learnings.md file covers the formatting conventions in more detail.
Chaining Skills Into Full Business Workflows
Individual skills are useful. Chained skills are where the real automation happens.
The pattern is straightforward: the output of one skill becomes the input of the next. A research skill feeds a brief-writing skill. The brief feeds a draft-writing skill. The draft feeds an editing skill. The edited draft feeds a publishing skill.
Each skill in the chain does one job well. None of them tries to do everything.
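At its core, a sequential chain is just function composition over text. A minimal sketch, where the three lambdas are illustrative stand-ins for real skill invocations:

```python
from typing import Callable

def run_chain(skills: list[Callable[[str], str]], initial_input: str) -> str:
    """Pipe each skill's output into the next skill in sequence."""
    data = initial_input
    for skill in skills:
        data = skill(data)
    return data

# Illustrative stand-ins for real skill runs
research = lambda topic: f"brief: {topic}"
outline = lambda brief: f"outline({brief})"
draft = lambda o: f"draft({o})"
```

Running `run_chain([research, outline, draft], "some topic")` threads the topic through all three stages, mirroring the research-to-brief-to-draft flow described above.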
A content marketing workflow example
Here’s a five-skill chain for a content marketing pipeline:
- Research skill — Takes a topic, searches for relevant sources, outputs a structured brief with key points, stats, and questions to answer
- Outline skill — Takes the brief, produces a section-by-section outline with word count targets per section
- Draft skill — Takes the outline, writes the full post section by section
- Edit skill — Takes the draft, checks for clarity, passive voice, keyword density, and brand voice alignment
- Format skill — Takes the edited draft, formats it for the publishing platform (adds metadata, internal links, formatting tags)
Each skill has its own skill.md, its own learnings.md, and its own eval criteria. They improve independently. The output quality of the whole pipeline compounds as each component gets better.
Building a 5-skill agent workflow for content marketing walks through this pattern in detail, including how to pass structured data between skills.
Coordination: sequential vs. parallel
Not all skills need to run in sequence. Some can run in parallel and merge results. A research skill might run three sub-queries simultaneously, then a synthesis skill combines them. This is faster and often produces better results for complex research tasks.
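The fan-out-then-merge step can be sketched with a thread pool, assuming each sub-query is an independent call (the `run_query` callable here is a placeholder for whatever actually executes a research sub-query):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_research(sub_queries: list[str], run_query) -> str:
    """Run research sub-queries concurrently, then merge results for synthesis."""
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        results = list(pool.map(run_query, sub_queries))  # preserves input order
    return "\n\n".join(results)
```

The merged string then becomes the input to a synthesis skill, exactly as in the sequential pattern.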
The five agentic workflow patterns — from simple sequential chains to fully autonomous orchestration — are worth understanding before you design a multi-skill system. The pattern you choose depends on how much the output of each step depends on the previous one.
Shared brand context across skills
One problem with skill chains is keeping brand voice consistent across multiple agents. If each skill reinvents the wheel on tone and style, outputs feel fragmented.
The solution is a shared brand context file — sometimes called a “business brain” file — that sits at the root level and every skill reads at the start of its run. It contains:
- Brand voice guidelines
- Audience definition
- Terminology standards (words to use, words to avoid)
- Formatting defaults
Sharing brand context across all skills through a central file means you update brand guidelines in one place and every skill in your system picks them up automatically.
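Mechanically, "every skill reads it at the start of its run" can be as simple as prepending the brand file to each skill's process steps. A sketch, with hypothetical file paths:

```python
from pathlib import Path

def build_context(brand_path: str, skill_path: str) -> str:
    """Prepend the shared brand file to a skill's process steps."""
    brand = Path(brand_path).read_text()
    steps = Path(skill_path).read_text()
    return f"{brand}\n\n---\n\n{steps}"
```

Because every skill builds its context this way, editing the single brand file changes the behavior of the whole system at once.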
Making Your Skills Smarter: Advanced Self-Improvement Patterns
Once you have a working skill and a basic learnings loop running, there are a few patterns that significantly accelerate improvement.
The wrap-up skill
Instead of manually updating learnings.md after every run, you can build a dedicated wrap-up skill. After the main skill completes, the wrap-up skill:
- Reads the output
- Scores it against your eval criteria
- Identifies what changed from the previous run
- Appends structured notes to learnings.md automatically
This makes improvement continuous rather than dependent on you remembering to update things. Building a self-learning AI skill system with a wrap-up skill covers the implementation in detail.
A/B testing skill variants
Skills 2.0 introduced built-in evaluation and A/B testing. Instead of guessing which version of a skill performs better, you run both versions against the same input and let the eval scores decide. Claude Code Skills 2.0’s evaluation and A/B testing features make this systematic rather than manual.
This is especially useful for skills where output quality is subjective — like tone or persuasiveness. Binary evals force you to define what “good” actually means, which makes improvement measurable.
Watching for context rot
As learnings.md grows, there’s a real risk of degradation. The file gets long, contradictory notes accumulate, and the agent spends more time parsing historical context than executing the actual skill. This is context rot — and it’s more common than people expect.
Signs your skill is experiencing context rot:
- Output quality is declining despite more learnings being added
- The agent is taking longer to run
- Outputs are contradicting earlier learnings rather than building on them
The fix is periodic pruning. Review learnings.md quarterly, consolidate redundant entries, and remove rules that have been superseded by better ones. Keep the file under 500 lines as a rough guideline.
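A line-budget check makes the guideline enforceable instead of aspirational. A trivial sketch (the 500-line budget comes from the rule of thumb above; tune it for your skills):

```python
from pathlib import Path

LINE_BUDGET = 500  # rough guideline; adjust per skill

def needs_pruning(learnings_path: str) -> bool:
    """Flag learnings.md once it outgrows the line budget."""
    line_count = len(Path(learnings_path).read_text().splitlines())
    return line_count > LINE_BUDGET
```

Run it as part of a wrap-up step so the system tells you when it is time to consolidate, rather than relying on a calendar reminder alone.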
Building Skills for Common Business Tasks
Claude Code skills work for any repeatable, describable process. Here are a few business functions where they show consistent results:
Content operations
Content is the most common entry point because it’s high-volume, repeatable, and easy to evaluate. Skills for blog drafting, social media repurposing, and email sequences all follow the same basic pattern. Automating social media content repurposing with Claude Code Skills is a good concrete example of a production-ready skill in this category.
Standard operating procedures
Any process you currently document in a wiki or runbook can become a skill. The difference is that a skill isn’t just documentation — it’s executable. The agent reads the steps and does the work, not just references them. Building standard operating procedures as Claude Code Skills explains how to convert existing process docs into agent-ready skill files.
Research and analysis
Research tasks benefit enormously from skill structure because they’re hard to prompt consistently. A research skill defines exactly what sources to check, how to structure findings, what format the output takes, and how to cite sources. Output quality becomes predictable rather than variable.
How Remy Connects to Spec-Driven Skill Development
Claude Code skills are, fundamentally, a documentation layer — a structured way of describing what an agent should do. That’s not coincidentally similar to how Remy works.
Remy’s core idea is that the spec is the source of truth, and code is derived from it. You describe your application in an annotated markdown spec — what it does, its rules, its data types — and Remy compiles that into a full-stack application: backend, database, auth, deployment.
The parallel to skills is direct. In both cases, the structured document is the real work. The execution (whether that’s an AI agent running steps or Remy generating TypeScript) is derived from it.
If you’re building internal tools — a skill management dashboard, a content pipeline tracker, a review and approval workflow — Remy can build the full-stack application that wraps around your Claude Code skill system. You describe the app in a spec, and Remy handles the backend, database, and frontend.
You can try Remy at mindstudio.ai/remy and have a working full-stack application from a spec in a single session.
Common Mistakes to Avoid
Most skill systems that underperform have the same issues. Here are the ones worth watching for.
Overloading skill.md
The most common mistake: cramming everything into one file. Brand guidelines, examples, historical notes, process steps — all mixed together. The agent can’t distinguish signal from noise. The three most common Claude Code skill mistakes all trace back to file structure problems.
Keep skill.md to process steps only. Everything else has its own file.
Skipping the eval layer
Running a skill without evals means you have no objective measure of whether it’s improving. You’re just guessing. Even simple binary evals (does the output meet minimum word count? does it include a call to action?) give you something to track.
Building skills that are too broad
A skill that does “marketing content” is too broad. A skill that writes LinkedIn posts in under 200 words using your brand voice is specific enough to execute reliably. Narrow scope leads to consistent output.
Never pruning learnings.md
Adding notes forever without reviewing them is how you end up with context rot. Set a reminder to review learnings.md after every 20–30 runs. Remove what’s no longer relevant, consolidate duplicates, and simplify.
Frequently Asked Questions
What is a Claude Code skill, exactly?
A Claude Code skill is a structured markdown document that describes how to execute a specific task, step by step. When Claude reads a skill file, it follows the process as written rather than improvising from scratch. This makes outputs consistent and repeatable. Skills typically include a process file (skill.md), a learnings file (learnings.md), and optionally examples and eval criteria.
How is a Claude Code skill different from a prompt?
A prompt is a one-time instruction. A skill is a persistent, version-controlled process document that gets better over time. Prompts start fresh every session. Skills accumulate feedback through a learnings loop and apply what they’ve learned to future runs. The distinction matters a lot at scale — a skill running 50 times is significantly better than a prompt running 50 times independently.
Can Claude Code skills work together in a workflow?
Yes — this is one of the main reasons to use them. Skills can be chained so the output of one becomes the input of the next. A research skill feeds a writing skill. A writing skill feeds an editing skill. Each skill in the chain improves independently, and the overall pipeline quality compounds. Chaining skills into end-to-end workflows is a well-documented pattern with specific implementation guidance.
How does the self-improvement mechanism actually work?
After each skill run, observations about what worked and what didn’t get appended to learnings.md. On the next run, the skill reads this file before executing its steps and adjusts accordingly. This can be manual (you write the notes), semi-automated (a wrap-up skill writes them), or fully automated using eval.json test cases that score outputs and trigger notes automatically. The compounding knowledge loop this creates is what makes skills more valuable over time.
How many skills does a typical business workflow need?
Most functional workflows use between 3 and 7 skills. Fewer than 3 usually means skills are doing too much. More than 7 often means the workflow needs restructuring, not more skills. Content pipelines typically run 4–6 skills. Research workflows run 3–4. Start with the minimum number that produces reliable output, then add skills as the workflow grows in complexity.
What file format do skills use?
Skills are written in markdown. The core file is skill.md. Supporting files (learnings.md, eval.json, examples) use markdown and JSON respectively. There’s no proprietary format — everything is plain text and version-controllable in git.
Key Takeaways
- A Claude Code skill is a reusable process document, not a prompt. The distinction determines whether your automation compounds or stays flat.
- Keep skill.md to process steps only. Brand context, examples, and learnings belong in separate files.
- The learnings loop — run, evaluate, update learnings.md, repeat — is what makes skills self-improving. Automate it with a wrap-up skill or eval.json for consistency.
- Chain skills into workflows where each skill does one job well. The output of one becomes the input of the next.
- Watch for context rot. Prune learnings.md regularly to keep performance from degrading as the file grows.
- Start narrow. A skill that does one thing reliably is more useful than a skill that attempts to do everything inconsistently.
If you’re building internal tools to manage or extend your skill-based workflows, try Remy — describe what you want to build in a spec and get a full-stack application back, backend and all.