What Makes a Good App Spec? A Framework for Technical Builders

The Difference Between a Spec That Compiles and One That Doesn’t

Most specs fail before they’re ever executed. Not because the idea was bad, but because the document wasn’t doing what a spec is supposed to do.

A good spec is the source of truth for your application. It carries enough information that a compiler — or an AI agent — can produce a working, complete, full-stack app from it. A bad spec is just a description of vibes: a feature list, some wireframe notes, a paragraph explaining what users should “feel.” Useful for conversations. Useless as a build artifact.

If you’re working in spec-driven development, this distinction matters enormously. The spec isn’t documentation you write after the code. It’s the code. Everything downstream — the backend logic, the database schema, the auth flows, the UI — is compiled output. So if the spec is fuzzy, the output is fuzzy. If the spec has gaps, the compiler fills them with guesses.

This article covers what makes a spec actually good: structure, annotations, edge case coverage, scope discipline, and a few common patterns that separate specs that compile cleanly from ones that produce a mess.

Structure: What Every Good Spec Needs

A spec without structure is just a document. Structure is what makes it readable by both humans and agents — and that dual readability is the whole point.

Think of a good spec like a legal contract. The plain English says what the app does. The fine print says exactly how it does it. Both layers need to be present.

A Header That Sets Context

The first thing in your spec should establish what the application is, who it’s for, and what problem it solves. Not in five pages — in three to five sentences.

This isn’t a mission statement. It’s context for the compiler. If the agent knows the app is a B2B invoice tracker for small accountancy firms, it makes different decisions than if it thinks it’s building a general personal finance tool. That context bleeds into schema design, auth assumptions, terminology, and error messages.

What to include:

One sentence describing the application category (e.g., “A multi-user inventory management system for small retail stores”)
The primary user type (admin, end user, anonymous visitor)
Any critical constraints upfront (e.g., “All data is tenant-isolated”)

A Flat, Named Section List

After the header, list your application’s major sections or modules by name. Don’t describe them yet — just name them. This gives the compiler a map before it reads the territory.

A good section list looks something like:

## Sections
- Authentication
- Dashboard
- Inventory
- Orders
- Reports
- Admin Panel
- Settings

Each of these becomes a dedicated section below with its own behaviors, data models, and logic described in full. The flat list at the top is just the index.

Section Descriptions That Follow a Pattern

Each section should cover four things, roughly in this order:

What it is — one or two sentences describing the section’s purpose
What data it works with — the entities and fields involved
What actions are available — what a user can do here
What the rules are — validation, permissions, edge cases

You don’t need to use these as explicit headers within each section. But if a section description is missing one of these, the spec has a gap.
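One way to make the four-part pattern checkable is to treat each section as a small record and flag whatever is still empty. The sketch below is illustrative only: `SectionSpec` and `missing_parts` are hypothetical names invented for this example, not part of any spec format.

```python
from dataclasses import dataclass, field


@dataclass
class SectionSpec:
    """Hypothetical container for one section of a spec."""
    name: str
    purpose: str = ""                                  # what it is
    data: list[str] = field(default_factory=list)      # entities and fields
    actions: list[str] = field(default_factory=list)   # what a user can do
    rules: list[str] = field(default_factory=list)     # validation, permissions, edge cases


def missing_parts(section: SectionSpec) -> list[str]:
    """Return the names of the four parts that are still empty."""
    parts = {
        "purpose": section.purpose.strip(),
        "data": section.data,
        "actions": section.actions,
        "rules": section.rules,
    }
    return [name for name, value in parts.items() if not value]


inventory = SectionSpec(
    name="Inventory",
    purpose="Track stock levels for each product.",
    data=["product.name: string", "product.quantity: integer, min 0"],
    actions=["add product", "adjust quantity"],
)
print(missing_parts(inventory))  # ['rules']
```

The Inventory section here reads fine as prose, but the check shows it never states its rules, which is exactly the kind of gap a compiler would fill with a guess.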

Annotations: Where Precision Lives

The biggest mistake technical builders make with specs is treating them as prose documents. Prose is ambiguous. “The user can filter results” tells you almost nothing useful. Filter by what fields? In what order? Are filters additive or exclusive? Does the filter state persist across sessions?

Annotations are where you inject that precision. In Remy’s spec format, annotations sit alongside readable prose — adding data types, constraints, and rules that the prose can’t carry on its own.

Specification precision is increasingly the skill that separates builders who ship clean apps from those who spend hours debugging AI-generated code that almost works. Annotations are how you achieve that precision.

Type Annotations

Every piece of data your application handles should have an explicit type. Don’t say “the user enters their age.” Say age: integer, min 0, max 120. Don’t say “the product has a price.” Say price: decimal(10,2), non-negative.

Types aren’t just for database schemas. They apply to:

Form inputs and validation
API parameters
Computed values
Status fields (enums are your friend here)

An enum is especially powerful in a spec because it eliminates ambiguity about valid values. Instead of “the order can be in various states,” you write status: enum(pending, processing, shipped, delivered, cancelled). The compiler knows exactly which values are valid and which status badges to render. The enum alone does not say which transitions are allowed, so state those as rules (for example, a delivered order cannot return to pending).
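To see how little room these annotations leave for interpretation, here is roughly what they translate to in code. This is a minimal sketch in Python, not the output of any particular compiler; the function names are made up for illustration, and reading decimal(10,2) as "at most 8 digits before the point and 2 after" is the usual SQL interpretation, assumed here.

```python
from decimal import Decimal
from enum import Enum


class OrderStatus(Enum):
    """status: enum(pending, processing, shipped, delivered, cancelled)"""
    PENDING = "pending"
    PROCESSING = "processing"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


def validate_age(age: int) -> int:
    """age: integer, min 0, max 120"""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(age, bool) or not isinstance(age, int):
        raise TypeError("age must be an integer")
    if not 0 <= age <= 120:
        raise ValueError("age must be between 0 and 120")
    return age


def validate_price(price: Decimal) -> Decimal:
    """price: decimal(10,2), non-negative"""
    if price < 0:
        raise ValueError("price must be non-negative")
    if price >= Decimal("100000000"):
        # decimal(10,2) leaves 8 digits before the decimal point
        raise ValueError("price exceeds the decimal(10,2) range")
    if price != price.quantize(Decimal("0.01")):
        raise ValueError("price must have at most 2 decimal places")
    return price


print(OrderStatus("shipped"))            # OrderStatus.SHIPPED
print(validate_price(Decimal("19.99")))  # 19.99
```

Every branch in these functions corresponds to a word in the annotation. Remove "non-negative" from the spec and the first check in `validate_price` has no source.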

Behavioral Annotations

Types tell you what the data looks like. Behavioral annotations tell you what the system does with it.

These come in a few forms:

Triggers: “When an order status changes to shipped, send a confirmation email to the customer”
Computed fields: “Total is calculated as quantity × unit_price, rounded to 2 decimal places”
Side effects: “Deleting a product removes it from all active cart sessions”
Cascades: “Archiving a team removes all pending invitations for that team”

Behavioral annotations are the hardest part of a spec to write well, because they require you to think through the system’s logic explicitly. That effort pays off. If the spec has the logic, the compiler generates it correctly. If the logic is missing, the compiler invents it — and what it invents may not be what you wanted.
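Two of the annotations above, the computed total and the shipped trigger, are small enough to sketch directly. Treat this as an illustration under stated assumptions: the rounding mode (half up) is not given in the rule, the order is a plain dictionary, and `send_email` is a stand-in callback rather than a real mail API.

```python
from decimal import Decimal, ROUND_HALF_UP


def order_total(quantity: int, unit_price: Decimal) -> Decimal:
    """Computed field: total = quantity × unit_price, rounded to 2 decimal places."""
    # ROUND_HALF_UP is an assumption; the rule does not name a rounding mode
    return (quantity * unit_price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def on_status_change(order: dict, new_status: str, send_email) -> None:
    """Trigger: when status changes to shipped, email the customer once."""
    previous = order["status"]
    order["status"] = new_status
    # Fire only on the change itself, not when the status is already shipped
    if new_status == "shipped" and previous != "shipped":
        send_email(order["customer_email"], "Your order has shipped")


sent = []


def record_email(to: str, subject: str) -> None:
    """Stand-in for a real email sender; it only records the call."""
    sent.append((to, subject))


order = {"status": "processing", "customer_email": "pat@example.com"}
on_status_change(order, "shipped", record_email)
on_status_change(order, "shipped", record_email)  # no second email

print(order_total(3, Decimal("19.995")))  # 59.99
print(len(sent))                          # 1
```

Note how many decisions the sketch had to make that the one-line annotations left open: the rounding mode, and whether re-saving a shipped order sends the email again. Those are worth writing into the spec.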

Permission Annotations

Every action in your spec should have a clear permission model. Who can do this? Under what conditions?

A weak spec says “admins can manage users.” A good spec says:

- Admins can invite new users by email
- Admins can deactivate users (not delete)
- Admins cannot modify their own role
- Only the account owner can delete the account
- Deactivated users cannot log in but their data is preserved

This level of specificity isn’t bureaucratic. It’s what prevents the compiler from making reasonable-but-wrong assumptions about access control. When AI app builders struggle with databases and auth, the root cause is often a spec problem: the permissions weren’t explicit enough.
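The five rules in the good version can be read almost line for line as a permission check. The sketch below is one possible reading, not a prescribed implementation: the `User` fields, the action names, and the choice to deny anything not listed are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    role: str               # "admin" or "member" (hypothetical role names)
    is_owner: bool = False  # assumption: ownership is a flag, separate from role
    active: bool = True


def can(actor: User, action: str, target: Optional[User] = None) -> bool:
    """Return True if the actor may perform the action (optionally on a target user)."""
    if not actor.active:
        return False  # deactivated users cannot log in, so they can do nothing
    if action == "delete_account":
        return actor.is_owner  # only the account owner can delete the account
    if action in ("invite_user", "deactivate_user"):
        return actor.role == "admin"
    if action == "change_role":
        # admins cannot modify their own role
        return actor.role == "admin" and target is not None and target.id != actor.id
    # Anything not listed is denied, including "delete_user":
    # the spec allows deactivation, not deletion
    return False


admin = User(id=1, role="admin")
member = User(id=2, role="member")

print(can(admin, "invite_user"))          # True
print(can(admin, "change_role", admin))   # False
print(can(admin, "change_role", member))  # True
print(can(admin, "delete_account"))       # False
```

The weak version, “admins can manage users,” would leave every branch of this function to inference.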

Edge Cases: The Spec’s Hidden Load-Bearing Work

Edge cases are where most specs fall apart — and where most AI-generated apps break in production.

It’s easy to describe the happy path. User signs up, enters their info, creates a project, invites a teammate. What happens when the email is already in use? What happens when the invited teammate is already a member? What happens when the account is at its seat limit?

If your spec doesn’t answer those questions, the compiler will. It will answer them with whatever it thinks makes sense, which may not match your product decisions at all.

How to Find Your Edge Cases

Work backwards from each action in your spec. For every action a user can take, ask:

What happens if this action is taken twice?
What happens if the required data doesn’t exist?
What happens if the user doesn’t have permission?
What happens if the action partially succeeds?
What happens if the action fails mid-way?

This sounds exhausting, but most applications have the same twenty edge cases recycled across different features. Once you recognize the patterns, you start writing them into specs automatically.

Common edge cases worth always specifying:

Empty states (no records, no results, no activity yet)
Duplicate submissions (form submitted twice, double-clicking confirm)
Concurrent modification (two users editing the same record)
Deletion of records that other records depend on
Expired sessions and tokens
Rate limiting behavior
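Duplicate submissions are a good example of an edge case that needs a product decision before it needs code. The sketch below assumes one such decision: a repeated submission returns the result of the first one instead of creating a second record or raising an error. `OrderService` and the idempotency key are hypothetical; the class is an in-memory stand-in for a real backend.

```python
class OrderService:
    """In-memory stand-in for a backend that deduplicates submissions."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict] = {}
        self._next_id = 1

    def submit(self, idempotency_key: str, payload: dict) -> dict:
        """Create an order once per key; repeats return the original order."""
        if idempotency_key in self._by_key:
            # Duplicate submission (double click, retried request):
            # return the first result, create nothing new
            return self._by_key[idempotency_key]
        order = {"id": self._next_id, **payload}
        self._next_id += 1
        self._by_key[idempotency_key] = order
        return order


service = OrderService()
first = service.submit("form-abc", {"sku": "BW-1", "quantity": 2})
second = service.submit("form-abc", {"sku": "BW-1", "quantity": 2})

print(first["id"], second["id"])  # 1 1
```

In spec terms this is a single sentence, something like “submitting the same form twice creates one order.” Without that sentence, returning an error or creating two orders would be equally defensible guesses.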

Edge Cases vs. Error Messages

A spec should cover error cases, but it doesn’t need to write the UI copy for every one. “If the form submission fails due to a duplicate email, show an inline error on the email field indicating that the account already exists” is enough. You don’t need to write “That email is already registered. Please log in or use a different email address.” The compiler handles copy. You handle logic.

The distinction matters because it keeps the spec focused on behavior rather than presentation. Specs that drift into UI copy get long, hard to maintain, and often inconsistent with what actually gets built.

Scope: Knowing What to Put In and What to Leave Out

Over-scoped specs are almost as problematic as under-scoped ones. A spec that tries to describe everything produces an overwhelming build that’s hard to iterate on. A spec that’s too sparse produces ambiguous output that needs constant correction.

The right scope for a spec is: everything the compiler needs to make a correct decision, and nothing more.

What Belongs in a Spec

Data models: All entities, fields, types, and relationships
Business logic: Rules, calculations, triggers, side effects
User flows: The sequence of steps for core tasks
Permissions: Who can do what, under what conditions
Edge cases: What happens when things go wrong or behave unexpectedly
Integration points: External services, APIs, webhooks (with enough detail to implement them)

What Doesn’t Belong in a Spec

Visual design details: Color hex codes, exact font sizes, pixel-level spacing. Leave those to a design system or component library. If you have specific visual requirements, note the design token or component name, not the raw value.
Implementation preferences: “Use Redis for caching” or “Write this as a stored procedure.” The compiler chooses implementation. If you have a hard constraint, annotate it — but don’t spec into implementation unless it’s actually a requirement.
Copy and microcopy: Button labels, placeholder text, help tooltips. These come from the app’s voice and can be iterated separately.
Aspirational features: Don’t spec what you hope to add someday. Only spec what this version needs to build. The spec should describe a complete, shippable application — not a roadmap.

Scope and the Iteration Cycle

One of the advantages of spec-driven development is that iteration is clean. You update the spec, recompile, and the changes propagate. But this only works if the spec scope is disciplined.

If you’ve specced a complete v1, adding a feature means adding to the spec. That’s a small, legible change. If the spec was already sprawling and half-finished, adding to it produces chaotic output — because the compiler is trying to reconcile contradictory or incomplete instructions.

Good scope discipline now means faster, cleaner iteration later. The process of going from idea to deployed app is much more predictable when the spec has a clear boundary.

Common Spec Mistakes Technical Builders Make

Even experienced builders make these mistakes when writing their first few specs.

Treating the Spec Like a Product Requirements Document

PRDs are written for people. They’re meant to communicate intent, get alignment, and document decisions. Specs are written for compilers. They carry behavior.

A PRD might say “Users should be able to manage their team easily.” A spec says: “Users with the admin role can: invite new members by email, remove existing members, change a member’s role between member and admin, and transfer ownership to another member. Admins cannot remove themselves from a team if they are the only admin.”

The difference is specificity. The PRD gives intent. The spec gives instruction.

Writing Features Instead of Behaviors

“The app will have a search feature” is a feature description. “Users can search by product name, SKU, or category. Search is case-insensitive and matches partial strings. Results are sorted by relevance, then by date added descending. A maximum of 50 results are shown per page.” is a behavior description.

Features tell you what exists. Behaviors tell you what it does. Specs need behaviors.
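The behavior description above is specific enough to implement almost mechanically, which is the point. Here is a rough sketch of it. Two things are assumptions because the description does not settle them: “relevance” is scored as the number of fields that match, and an empty query returns no results. The `Product` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

PAGE_SIZE = 50  # "a maximum of 50 results are shown per page"


@dataclass
class Product:
    name: str
    sku: str
    category: str
    date_added: date


def relevance(product: Product, query: str) -> int:
    """Assumed relevance score: how many of name, SKU, category contain the query."""
    fields = (product.name, product.sku, product.category)
    return sum(query in f.casefold() for f in fields)


def search(products: list[Product], query: str, page: int = 1) -> list[Product]:
    """Case-insensitive partial match; sort by relevance, then date added descending."""
    q = query.casefold()
    if not q:
        return []  # assumption: the description does not say what an empty query does
    scored = [(relevance(p, q), p) for p in products]
    matches = [(score, p) for score, p in scored if score > 0]
    matches.sort(key=lambda sp: (-sp[0], -sp[1].date_added.toordinal()))
    start = (page - 1) * PAGE_SIZE
    return [p for _, p in matches[start:start + PAGE_SIZE]]


catalog = [
    Product("Blue Widget", "BW-1", "Widgets", date(2024, 1, 1)),
    Product("Red Widget", "RW-1", "Widgets", date(2024, 3, 1)),
    Product("Gadget", "WIDGET-X", "Gadgets", date(2024, 5, 1)),
    Product("Lamp", "L-1", "Lighting", date(2024, 6, 1)),
]
print([p.name for p in search(catalog, "WIDGET")])
# ['Red Widget', 'Blue Widget', 'Gadget']
```

Even a good behavior description leaves a couple of holes. Writing the sketch surfaced two of them, and both belong back in the spec as explicit rules.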

Leaving the Data Model Implicit

It’s tempting to describe what users see without defining the underlying data. “The dashboard shows recent activity” — but what is activity? What table is it stored in? What fields does it have? How is “recent” defined?

When the data model is implicit, the compiler infers it. Sometimes the inference is reasonable. Often it isn’t. Explicitly defining your entities, their fields, and their relationships is one of the highest-leverage things you can do in a spec.
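For the dashboard example, making the model explicit means answering each of those questions in writing. The sketch below shows one set of answers, all of them assumptions chosen for illustration: the fields on `Activity`, “recent” meaning the last 7 days, and a cap of 20 items.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(days=7)  # assumption: "recent" means the last 7 days
RECENT_LIMIT = 20                  # assumption: the dashboard shows at most 20 items


@dataclass
class Activity:
    """Hypothetical activity record; one row per user action."""
    id: int
    actor_id: int
    action: str         # e.g. "created", "updated", "deleted"
    subject_type: str   # e.g. "product", "order"
    subject_id: int
    created_at: datetime


def recent_activity(events: list[Activity], now: datetime) -> list[Activity]:
    """Return events inside the recent window, newest first, capped at the limit."""
    cutoff = now - RECENT_WINDOW
    recent = [e for e in events if e.created_at >= cutoff]
    recent.sort(key=lambda e: e.created_at, reverse=True)
    return recent[:RECENT_LIMIT]


now = datetime(2024, 6, 15, tzinfo=timezone.utc)
events = [
    Activity(1, 7, "created", "product", 10, datetime(2024, 6, 1, tzinfo=timezone.utc)),
    Activity(2, 7, "updated", "product", 10, datetime(2024, 6, 10, tzinfo=timezone.utc)),
    Activity(3, 8, "created", "order", 55, datetime(2024, 6, 14, tzinfo=timezone.utc)),
]
print([e.id for e in recent_activity(events, now)])  # [3, 2]
```

A different window or a different set of fields would be just as valid. What matters is that the spec picks one.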

Describing Only the Happy Path

Specs that only cover what happens when everything works create apps that fail badly when anything doesn’t. This is one of the most common reasons AI-generated apps fail in production — the spec was too optimistic about user behavior and system conditions.

Write the happy path first, then systematically write the sad paths. They’re usually shorter — “if X fails, show an error and allow retry” is a complete spec for many failure modes.

Conflating Scope With Ambition

A spec that describes three versions of a feature — “we could do it this way, or maybe this way, or eventually like this” — is not a spec. It’s a planning document. Make the decision, write it down, and move on. You can always update the spec when the decision changes.

How Remy Handles Spec Compilation

Remy is built specifically for this pattern. The spec is the source of truth — not the code. When you write a spec in Remy’s format, it compiles into a full-stack application: backend, database, frontend, auth, tests, deployment. The code is derived output. The spec is what you maintain.

This matters because it changes what “fixing the app” means. If the generated code doesn’t behave correctly, you don’t grep through TypeScript to find the bug. You look at the spec, find the gap or ambiguity, fix it, and recompile. The code updates to match.

As AI models improve, the compiled output gets better automatically — without you changing a line. Your spec stays the same. The compiler gets smarter. That’s a very different relationship with your codebase than traditional development. It’s the same shift described in the abstraction ladder: each new level of abstraction doesn’t eliminate what’s underneath, it just changes where you work.

Remy works best with specs that follow the principles in this article. Structured, annotated, edge-case-aware, scope-disciplined. When those elements are present, compilation is clean and the resulting app is close to production-ready. When the spec is vague, Remy fills the gaps the same way any compiler would — with reasonable defaults that may not match your intent.

The precision is on you. Remy handles the rest.

You can try it at mindstudio.ai/remy.

Practical Spec Writing: A Quick Checklist

Before you send a spec to a compiler, run through this:

Structure

Header establishes app category, user type, and key constraints
All sections are named in a flat index at the top
Each section covers: what it is, what data it uses, what actions are available, and what the rules are

Annotations

Every field has an explicit type
Status fields and categories use enums
Triggers, computed fields, and side effects are named explicitly
Every action has a clear permission model

Edge Cases

Empty states are handled for all list views
Duplicate submissions are addressed
Dependent data deletion is specified (what cascades, what blocks)
Session expiry and auth failure flows are described

Scope

No aspirational features that aren’t part of this version
No implementation preferences unless they’re hard requirements
No copy or microcopy (unless it’s core to a business rule)
All external integrations are named and their contracts described

If you can check all of these, the spec is ready to compile. If you can’t, the spec has gaps — and the compiler will fill them.

Frequently Asked Questions

How long should a good app spec be?

It depends on the application’s complexity, but a useful heuristic: a well-annotated spec for a moderately complex web app (auth, CRUD, a few user roles, some reporting) usually runs 500–1,500 words. A simpler app can be 300 words. A complex multi-tenant SaaS might run 3,000+. Length isn’t the goal — completeness is. If every section covers data, actions, rules, and edge cases, the length follows naturally.

Can I write a spec without knowing how to code?

Yes, but you need to know how the system should behave, which is different from knowing how it’s implemented. Specifying that a field is a string of at most 255 characters, or that only admins can delete records, doesn’t require coding knowledge. It requires product clarity. The domain-expert builder model rests on exactly this: people who know their domain deeply can write excellent specs because they know how the system should behave.

What’s the difference between a spec and a prompt?

A prompt is a one-shot instruction to an AI. A spec is a persistent document that describes an application completely enough to build and rebuild it. Prompts are consumed and discarded. Specs are maintained, versioned, and updated as the application evolves. When you write a software spec, you’re creating a living document — not a message to an AI.

How detailed do edge cases need to be?

Detailed enough that a reasonable engineer reading the spec would make the same implementation decision you’d make. “If the user tries to delete a product that exists in an active order, show an error and prevent deletion” is enough. You don’t need to specify the HTTP status code or the exact error message text. The goal is behavioral clarity, not implementation instruction.
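As a sanity check on that level of detail, here is what the product-deletion rule might look like once implemented. It is a sketch under assumptions: orders and products are plain in-memory structures, `ProductInUseError` is a made-up exception, and the set of statuses that count as “active” is a guess the spec would need to confirm.

```python
class ProductInUseError(Exception):
    """Raised when a product cannot be deleted because an active order uses it."""


# Assumption: which statuses count as "active" is a product decision
ACTIVE_STATUSES = {"pending", "processing", "shipped"}


def delete_product(product_id: int, products: dict, orders: list) -> None:
    """Delete a product unless it appears in an active order."""
    for order in orders:
        if order["status"] in ACTIVE_STATUSES and product_id in order["product_ids"]:
            raise ProductInUseError(f"product {product_id} is in an active order")
    products.pop(product_id, None)


products = {1: "Blue Widget", 2: "Red Widget"}
orders = [
    {"id": 100, "status": "processing", "product_ids": [1]},
    {"id": 101, "status": "delivered", "product_ids": [2]},
]

delete_product(2, products, orders)      # allowed: its only order is delivered
try:
    delete_product(1, products, orders)  # blocked: order 100 is still processing
except ProductInUseError as err:
    print(err)                           # product 1 is in an active order

print(sorted(products))                  # [1]
```

The one-sentence rule was enough to write the function, but not enough to define “active.” That is the signal to add one more line to the spec, not to start specifying status codes.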

Should I update the spec when I make changes to the app?

Yes. That’s the core discipline of spec-driven development, and it’s why the source of truth in software development is changing: the spec is the source, not the code. If you update the code without updating the spec, the spec becomes stale and loses its value as a compile target. Every change to the application should start with a change to the spec.

How do I spec something I’m not sure about yet?

Make a decision, note it explicitly, and mark it as provisional if you want to. “Assuming single-tenant for now; multi-tenant support is out of scope for v1” is a complete spec statement. What you want to avoid is leaving the question open — because the compiler will answer it, and you might not like the answer.

Key Takeaways

A good spec has two layers: readable prose that describes what the app does, and annotations that carry the precision (types, rules, edge cases, permissions).
Structure matters: start with a context header, a flat section index, then detailed sections that each cover data, actions, and rules.
Annotations are where specs earn their value — types, enums, triggers, computed fields, and permission models eliminate the ambiguity that causes bad compilation.
Edge cases aren’t optional. Specs that only describe the happy path produce apps that fail under real user behavior.
Scope discipline means speccing what this version needs to build — nothing more. Aspirational features, implementation preferences, and UI copy don’t belong in a spec.
The spec is the source of truth. Every change to the application should start with a change to the spec.

If you’re ready to put a well-structured spec to work, try Remy — it compiles annotated specs into full-stack applications, and the better your spec, the better the output.
