How to Reverse-Engineer a Claude Code Skill from a Winning Output
Find your best AI-generated output, extract the prompt, and turn it into a reusable skill that produces consistent results every time you run it.
When You Get a Perfect Output Once and Never Again
That’s the frustrating part of working with AI. You ask Claude something, it produces exactly what you needed — the right tone, the right structure, the right level of detail. You think, “That’s it.” Then you close the tab, and when you try to recreate it tomorrow, you get something entirely different.
The solution isn’t to keep prompting until lightning strikes again. It’s to work backward from that winning output and turn the implicit logic behind it into an explicit, reusable Claude Code skill. This process — reverse-engineering a Claude Code skill from a great result — is one of the highest-leverage things you can do to get consistent outputs from your AI workflows.
This guide walks through the full process: identifying what made a particular output work, extracting the underlying prompt structure, encoding it into a reusable skill, and testing it until it reliably performs.
What “Reverse-Engineering a Skill” Actually Means
Before the steps, it helps to be precise about what we’re doing.
When you prompt Claude Code without any structure, you’re asking it to make hundreds of micro-decisions: what format to use, how long to go, what level of detail to include, which perspective to take, what to prioritize. Sometimes those decisions align perfectly with what you need. Most of the time, they don’t.
A Claude Code skill is a structured prompt — often stored as a custom slash command in .claude/commands/ or as a section in your CLAUDE.md project file — that pre-makes those decisions for Claude. It constrains the model’s output space so it consistently produces a specific type of result.
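In a typical project, that looks like this on disk (the command filenames here are just illustrative):

```
your-project/
├── CLAUDE.md                    # persistent, project-wide instructions
└── .claude/
    └── commands/
        ├── write-pr-summary.md  # invoked as /write-pr-summary
        └── review-schema.md     # invoked as /review-schema
```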
Reverse-engineering means you already have the result you want. You’re not designing the skill from scratch. You’re looking at the output, figuring out what decisions produced it, and making those decisions explicit.
This is different from prompt engineering in the conventional sense. You’re not guessing what might work. You’re working backward from something you know already does.
Step 1: Find and Document Your Winning Output
The process starts with the output itself. Pull it up and save it somewhere you can reference. You need the complete, unedited response — not a paraphrase.
What counts as a “winning” output?
Look for outputs that:
- Hit the right length without being padded or truncated
- Used a structure you’d want to replicate exactly (specific headers, bullet format, section order)
- Matched a specific tone or voice on the first try
- Solved a recurring problem well enough that you thought, “I wish it always did this”
- Got approval or positive feedback from someone else without revision
If you can’t point to a specific output, that’s a signal you haven’t found your winning example yet. Keep a running document where you save strong Claude outputs as you get them. Over a few days of normal use, you’ll accumulate several candidates.
Reconstruct the context
Find the original conversation or prompt thread if you can. You need to know:
- What you said to Claude (the exact prompt, not your memory of it)
- What context Claude had access to (files, code, instructions in your CLAUDE.md)
- Whether you were in a fresh session or had prior conversation context
- Any follow-up prompts that shaped the final output
If you can’t find the original prompt, that’s fine — you can reconstruct it. But the output is non-negotiable. That’s your north star for this whole process.
Step 2: Analyze the Output for Structural Patterns
This is the core analytical step. Read the output slowly and annotate it. You’re looking for implicit decisions that Claude made.
Identify the output’s anatomy
Break the winning output into its components. For a piece of writing, that might be:
- Opening hook style (statement, question, statistic, scenario)
- Number and type of sections
- Average section length
- How transitions work
- Closing structure
For a code output:
- File and function naming conventions used
- Comment density and style
- Error handling approach
- Whether tests were included and how they were structured
- Import organization
For a data analysis or structured document:
- The hierarchy of information (what came first and why)
- How caveats or uncertainty were handled
- The level of explanation vs. raw output
Ask: what would have to be in the prompt for this to appear in the output?
This is the key question. Go element by element through the output and ask what instruction or context would have been necessary to produce it.
If the output used exactly three bullet points per section, Claude was either told to do so or inferred it from some constraint. If the code always checked for null values early, there was likely either an explicit instruction or a matching code pattern in context. If the tone was formal but not stiff, something in the prompt or CLAUDE.md shaped that.
Write down each inference. You’re building a list of the implicit rules that governed the output.
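As a sketch of what that list might look like for a PR-summary output (the specifics are hypothetical; only the shape matters):

```
Observed in the output             →  Inferred rule for the prompt
Three bullets per section          →  "Use exactly three bullets per section."
Null checks at the top of each fn  →  "Validate inputs before any other logic."
No adjectives in headings          →  "Keep headings plain and descriptive."
```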
Compare against a bad output for contrast
If you have a weaker version of the same task — something Claude produced that missed the mark — compare them side by side. The delta between bad and good often shows you exactly what made the winning version work.
Common differences you’ll notice:
- The good output followed a specific structure the bad one didn’t
- The good output had a specific constraint (word count, depth level, format) that was absent in the bad one
- The good output was generated with more context available
Step 3: Extract the Implicit Prompt Logic
Now translate your annotations into explicit instructions. You’re writing the prompt that should have produced this output — or the one that reliably will going forward.
Write out your constraints explicitly
Take each implicit rule you identified and write it as a direct instruction. Be specific. Vague instructions produce vague compliance.
Weak: “Write in a clear tone.”

Strong: “Write at a level a non-technical product manager could read without looking anything up. Avoid jargon. If a technical term is necessary, define it in plain language immediately after using it.”

Weak: “Format the output well.”

Strong: “Structure the output with an H2 for each major section, followed by 2–3 short paragraphs (3 sentences max each). Use a bullet list only when listing 3 or more discrete items. No nested bullets.”
The more concrete the instruction, the more reliably Claude follows it.
Identify the task frame
Every good prompt has a clear task frame: who Claude is acting as, what the deliverable is, and who the audience is. Extract these from your winning output.
- Role: Was Claude acting as a senior developer? A UX copywriter? A technical reviewer?
- Deliverable: A function? A brief? A summary with action items?
- Audience: Who is the output for? What do they know? What do they need?
If the winning output implicitly answered these, make them explicit in your skill.
Capture the input variables
Your skill needs to work across multiple inputs, not just the original one. Identify what changes between uses.
In Claude Code custom commands, variables are passed using $ARGUMENTS or more structured input blocks. Map out:
- What the user will provide each time (the variable parts)
- What stays constant (the fixed instructions, format rules, role framing)
A well-structured skill has a clear separation between the two. The constant parts define the behavior; the variable parts define the specific instance.
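Laid out in a command file, the separation is visible at a glance. In this hypothetical /summarize-incident command, only $ARGUMENTS varies between runs; everything else is fixed behavior:

```markdown
You are an SRE writing a post-incident note for a non-technical audience.  <!-- constant: role -->

Summarize the following incident log: $ARGUMENTS                           <!-- variable: per-run input -->

Always include: user impact, root cause, and one follow-up action.         <!-- constant: format rules -->
```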
Step 4: Write the Claude Code Skill
Now you’re ready to encode what you’ve extracted into an actual reusable skill.
Using custom slash commands in Claude Code
Claude Code lets you define custom slash commands by adding Markdown files to .claude/commands/ in your project. The file name becomes the command name. A command called write-pr-summary.md is invoked as /write-pr-summary inside Claude Code.
The file contains the prompt that Claude runs when you invoke the command. You can reference $ARGUMENTS to pass in dynamic values.
Here’s a simple example structure:
```markdown
You are a senior software engineer reviewing a pull request for a team that values concise, readable code.

Write a PR summary for the following diff: $ARGUMENTS

Format:
- **What changed:** One sentence. No jargon.
- **Why it changed:** One sentence. Focus on the problem solved, not the implementation.
- **Testing:** What was tested and how. Be specific.
- **Risks:** Any edge cases or potential regressions to watch for. If none, say "None identified."

Tone: Direct, not conversational. Write as if the reader has 30 seconds.
```
That’s it. The format rules, the role framing, the output structure — all explicit, all derived from your analysis in the previous steps.
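Invoking it is just the command name plus whatever $ARGUMENTS should receive:

```
/write-pr-summary <paste the diff or a description of the change>
```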
Using CLAUDE.md for project-wide skills
If a skill applies across an entire project — not just a single command — put it in your CLAUDE.md file. Claude Code reads this file at the start of every session and uses it as persistent context.
Use CLAUDE.md for things like:
- Code style and naming conventions specific to this codebase
- Testing standards (what to always include, what to skip)
- Domain knowledge (how this system works, key terminology)
- Constraints that should apply to every interaction (e.g., “Never modify the database schema directly”)
The distinction is: custom commands are task-specific skills you invoke on demand. CLAUDE.md is ambient knowledge that shapes every interaction.
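A minimal CLAUDE.md covering those categories might look like this (every rule below is a placeholder; yours should come from your own codebase and winning outputs):

```markdown
# Code conventions
- Functions use snake_case; test files mirror source file names.

# Testing standards
- Every new function gets at least one unit test. Skip integration tests for pure helpers.

# Domain notes
- "Order" in this system means a confirmed purchase, never a cart.

# Hard constraints
- Never modify the database schema directly.
```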
Templating for multiple variations
If you have several related skills, consider templating them. Extract the shared structure into a base prompt and vary only the task-specific parts. This keeps your skills consistent and easier to maintain.
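For example, assuming a family of hypothetical summary commands, the shared frame stays identical across files and only the task line changes:

```markdown
<!-- Shared base, identical in write-pr-summary.md, write-release-note.md, write-standup.md: -->
You are a senior engineer writing for readers with 30 seconds of attention.
Format: What changed / Why / Risks. One sentence each.

<!-- Task-specific line, the only part that varies per file: -->
Write a PR summary for the following diff: $ARGUMENTS
```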
Step 5: Test Against New Inputs
Don’t trust a skill based on one successful run. Test it against inputs it hasn’t seen.
The three-input test
Run your skill against at least three different inputs:
- A typical case — similar to the original winning output
- An edge case — something unusual or extreme (very long input, very short, ambiguous)
- A failure-prone case — the type of input that previously produced bad outputs
Compare each result against your winning output. You’re not looking for identical outputs — you’re checking that the structural and qualitative properties hold. Did it use the right format? Did it maintain the right tone? Did it handle the edge case reasonably?
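For the PR-summary skill above, an illustrative test set might be:

```
Typical:        a 40-line diff touching two files
Edge:           a 2,000-line diff of generated code
Failure-prone:  a plain-language description with no diff at all
```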
Iterate on specificity
When a test fails, the fix is almost always to be more specific. Look at where the output diverged from your expectations and add or tighten the relevant instruction.
Common failure modes and their fixes:
| What went wrong | What to add to the prompt |
|---|---|
| Output was too long | Specify a hard length limit or section count |
| Wrong tone | Add a specific tone example or negative example (“Don’t sound like…”) |
| Wrong structure | Add an explicit example of the desired format |
| Missing information | Specify what must always be included |
| Irrelevant information | Specify what to exclude |
Each iteration gets you closer to a skill that reliably performs.
Step 6: Version and Document the Skill
A skill you can’t find or remember how to use isn’t useful. Before moving on, document what the skill does and when to use it.
Add a comment block at the top of each custom command file:
```markdown
<!--
SKILL: write-pr-summary
PURPOSE: Generate a concise PR summary from a diff or description.
USE WHEN: Submitting any PR that another team member needs to review.
INPUT: Paste the diff or a plain-language description of the change.
LAST UPDATED: [date]
DERIVED FROM: PR #284 summary (saved in /docs/good-examples/)
-->
```
That last line — noting which winning output the skill was derived from — is underrated. When the skill starts drifting or needs updating, you can go back to the source.
Keep your good examples. They’re the ground truth that your skills are trying to reliably reproduce.
How MindStudio Extends Claude Code Skills Into Full Workflows
Building a Claude Code skill that reliably produces a specific output is valuable. But often the output doesn’t live in isolation — it’s one step in a larger process. That’s where connecting Claude Code to external infrastructure becomes useful.
MindStudio’s Agent Skills Plugin (@mindstudio-ai/agent) is an npm SDK that lets Claude Code call over 120 typed capabilities as simple method calls. Once your Claude Code skill produces the output you need, you can chain it into downstream actions without writing custom infrastructure.
For example: your Claude Code skill generates a PR summary → agent.sendEmail() routes it to the right reviewer → agent.runWorkflow() logs it to your project tracker in Airtable. The skill handles the reasoning and content generation; MindStudio handles the operational layer — rate limiting, retries, auth, integrations.
This matters because the most common next step after “I got a great output” is “now what do I do with it?” MindStudio gives Claude Code a way to act on that output without you stitching together APIs manually.
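Here is a minimal sketch of that chain in TypeScript. The sendEmail and runWorkflow method names come from the description above, but the import shape, parameter names, and workflow identifier are assumptions; check the plugin's documentation for the actual signatures:

```typescript
import agent from "@mindstudio-ai/agent"; // assumed default export

// Output produced by your /write-pr-summary skill (plain text here)
const summary = "What changed: ...\nWhy it changed: ...";

// Route the summary to a reviewer; parameter names are assumptions
await agent.sendEmail({
  to: "reviewer@example.com",
  subject: "PR summary for review",
  body: summary,
});

// Log it to a project tracker via a pre-built workflow; the
// "log-to-airtable" identifier is hypothetical
await agent.runWorkflow("log-to-airtable", { summary });
```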
You can try MindStudio free at mindstudio.ai.
Common Mistakes When Reverse-Engineering Skills
Extracting style instead of structure
It’s easy to copy the surface-level qualities of a good output (word choice, paragraph length) without capturing the structural decisions that produced them. Style follows structure. Make sure your skill encodes the underlying structure first.
Being too general
A skill that tries to do too many things well usually does all of them poorly. Narrow scope is good scope. One skill for PR summaries, another for commit messages, another for code review comments. Don’t combine them into a “write all dev docs” mega-prompt.
Not saving the original example
The winning output that started this process is evidence. Keep it. When your skill drifts — and it will, as you modify it — you need something to compare against.
Skipping the edge case tests
Most skills look fine on typical inputs. Edge cases expose the gaps. Test with the weird stuff early, before you’ve committed to the skill’s structure.
Over-engineering the prompt
Longer prompts don’t always produce better outputs. If you’re adding instructions and the output isn’t improving, you may be adding noise. Keep only the constraints that demonstrably change the output.
FAQ
What is a Claude Code skill, exactly?
A Claude Code skill is a reusable, structured prompt that encodes specific instructions, format rules, and constraints for a recurring task. In practice, it lives as a custom slash command in .claude/commands/ or as persistent instructions in your CLAUDE.md file. The goal is to pre-make all the micro-decisions Claude would otherwise make arbitrarily, so outputs are consistent across different inputs and sessions.
How do I find the original prompt if I didn’t save it?
Check your Claude Code session history — depending on your setup, sessions may be logged. If not, you can reconstruct it by working backward from the output. Look at what context was available (open files, CLAUDE.md contents, conversation history) and what you were trying to accomplish. The output itself is the more important artifact; the original prompt is just a reference point for your reconstruction.
How specific do prompt instructions need to be?
Specific enough that there’s only one reasonable interpretation. If an instruction could be followed in two different ways, Claude will choose one — and it might not be the one you want. Test your instructions by asking: “If someone read this cold with no context, would they understand exactly what I’m asking for?” If not, tighten it.
Can I use the same skill across different projects?
Yes, and this is one of the main advantages of storing skills as files. You can copy custom command files between project directories, or maintain a personal library of commands in a shared location. Skills that don’t depend on project-specific context (like PR summary formats or code review templates) transfer cleanly. Skills that depend on domain knowledge specific to one codebase may need to be adapted.
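Mechanically, sharing a skill is just copying the file (paths here are illustrative):

```
cp app-one/.claude/commands/write-pr-summary.md app-two/.claude/commands/
```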
How often should I update a skill?
Update a skill when its outputs consistently drift from your target, when the task itself changes, or when you find a new winning output that’s better than the one you originally modeled. Don’t update too frequently — each change resets your calibration. Make changes deliberately and test after each one.
Does this approach work with models other than Claude?
The core technique — identifying winning outputs, extracting implicit prompt logic, encoding it as explicit instructions — works with any instruction-following model. The specific implementation (custom slash commands, CLAUDE.md) is Claude Code-specific. Other environments have analogues: system prompts, agent instructions, workflow templates. The method transfers; the syntax doesn’t.
Key Takeaways
- Start with a winning output you can point to — not a vague idea of what “good” looks like.
- Analyze the output element by element to identify the implicit decisions Claude made.
- Translate those decisions into explicit, specific instructions that leave no room for ambiguity.
- Store the result as a Claude Code custom command or CLAUDE.md entry — not just a saved prompt you have to paste manually.
- Test against typical, edge, and failure-prone inputs before treating the skill as reliable.
- Keep the original winning output as your calibration reference throughout.
The biggest leverage in prompt engineering isn’t writing better first-time prompts. It’s capturing what works when you accidentally stumble onto it — and making it repeatable. Reverse-engineering a Claude Code skill is how you stop losing your best results and start building on them.
If you want to take those skills further and connect them to real workflows — email, Slack, databases, image generation — MindStudio’s Agent Skills Plugin gives Claude Code the integrations to act on its outputs, not just produce them.