Scenarios: How Remy's Agent-Authored Test Cases Work
Remy scenarios are seed scripts the agent writes to put your dev database into a known state. Here's the execution model, the headless protocol, and why.
A Remy scenario is an async seed script the agent writes to put your development database into a specific, repeatable state — an AP user with two overdue invoices, a brand-new account with no data, an admin staring at a backlog of approvals. You pick a scenario, the database resets to a clean canvas, the script seeds exactly the rows it describes, and Remy impersonates the role that scenario targets. The same scenario always produces the same state. That is the whole idea: instead of clicking through your app to manually build test data every time, you run a scenario and get a deterministic starting point.
TL;DR
- Scenarios are seed scripts that put a Remy app’s development database into a specific, named state so you can test a flow without manually clicking through the app to create the data first.
- A scenario is just an async function that uses the same db.push() calls a method uses. If you can write a method, you can write a scenario, and Remy writes both from your plain-language plan.
- Running a scenario is three steps the tooling handles for you: truncate the database to a clean canvas, execute the seed script, then impersonate the role the scenario targets.
- Scenarios are deterministic — the same scenario always produces the same state, so there is no accumulated test data and no “it worked on my machine.”
- In headless mode, the tooling emits JSON events on stdout (scenario-start, scenario-reset, scenario-seeded, scenario-complete) so an automated test runner can drive runs and read structured results.
- The strongest use is visual regression testing: screenshot each scenario, diff against the previous run, and catch UI breakage across every role and data state.
- Remy is the product agent that writes the scenarios, the methods they share code with, and the tables they seed — all compiled from one plain-language plan.
What are scenarios in Remy?
Scenarios are seed scripts that put the dev database into a specific state. Instead of manually creating data through the app every time you want to test something, you run a scenario and get a repeatable starting point.
A spec, in Remy’s terms, is a plain-language planning document for your app: the brief you’d hand a developer, except an AI compiler builds from it. Tables and methods come out of that plan as real TypeScript. A scenario is one more thing the plan produces: a small async function whose only job is to fill the database with the exact rows a test or a demo needs.
The framing the docs use is blunt: a scenario is just an async function that uses the same db.push() calls as methods. If you can write a method, you can write a scenario. The two share an SDK, share table imports, and share the same database under the hood. The only difference in purpose is that a method serves a request from your running app, while a scenario sets the stage before you test that method.
You describe the states you care about — “show me the AP dashboard when invoices are overdue,” “show me the empty-account onboarding screen” — and Remy drafts the scenarios that produce them. You read them, approve them, and tweak the wording in plain language. You are not hand-authoring a test framework.
How do I seed test data in a Remy app?
You pick a scenario from a menu and the tooling does the rest. Each scenario is declared in the app’s manifest with an id, a display name, a description of the state it creates, the path to its TypeScript file, the named export to run, and the roles to impersonate afterward.
Here is one scenario’s manifest entry — an AP user holding two invoices past their due date:
{
  "id": "ap-overdue-invoices",
  "name": "AP: Overdue Invoices",
  "description": "AP user with two invoices past due date.",
  "path": "dist/methods/.scenarios/apOverdueInvoices.ts",
  "export": "apOverdueInvoices",
  "roles": ["ap"]
}
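The fields in that entry can be written down as a type. The following is a minimal sketch, assuming every field is required and that roles is an array of strings; the `ScenarioManifestEntry` name and the `validateScenarioEntry` helper are hypothetical and are not part of Remy's tooling.

```typescript
// Hypothetical shape for one scenario entry, inferred from the example above.
// Remy's real manifest schema may include other fields or constraints.
interface ScenarioManifestEntry {
  id: string;
  name: string;
  description: string;
  path: string;
  export: string;
  roles: string[];
}

const STRING_FIELDS = ["id", "name", "description", "path", "export"] as const;

// Returns a list of problems; an empty list means the entry has the expected shape.
function validateScenarioEntry(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null) {
    return ["entry must be an object"];
  }
  const record = entry as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of STRING_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      problems.push(`"${field}" must be a non-empty string`);
    }
  }
  const roles = record["roles"];
  if (!Array.isArray(roles) || !roles.every((r) => typeof r === "string")) {
    problems.push('"roles" must be an array of strings');
  }
  return problems;
}

// The entry from the article, typed against the sketch above.
const apOverdueEntry: ScenarioManifestEntry = {
  id: "ap-overdue-invoices",
  name: "AP: Overdue Invoices",
  description: "AP user with two invoices past due date.",
  path: "dist/methods/.scenarios/apOverdueInvoices.ts",
  export: "apOverdueInvoices",
  roles: ["ap"],
};

const entryProblems = validateScenarioEntry(apOverdueEntry);
```

A check like this is only useful when you read or edit manifest entries yourself; the tooling presumably performs its own validation.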
The scenario file itself reads like any backend code in the app. It imports the table definitions, pushes the rows it wants, and uses the SDK’s time helpers to make dates relative to “now” so the data stays fresh on every run:
import { db } from '@mindstudio-ai/agent';
import { Vendors } from '../src/tables/vendors';
import { Invoices } from '../src/tables/invoices';

// Seeds one approved vendor and two invoices whose due dates are already past.
export async function apOverdueInvoices() {
  // Create the vendor first so the invoices can reference its id.
  const vendor = await Vendors.push({
    name: 'Acme Corp',
    contactEmail: 'billing@acme.com',
    status: 'approved',
  });

  // Due dates are computed relative to "now", so both invoices stay overdue on every run.
  await Invoices.push([
    { vendorId: vendor.id, invoiceNumber: 'INV-001', dueDate: db.ago(db.days(5)), status: 'pending_review' },
    { vendorId: vendor.id, invoiceNumber: 'INV-002', dueDate: db.ago(db.days(2)), status: 'approved' },
  ]);
}
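To see why relative dates keep the seed fresh, here is a small sketch of the idea behind db.ago(db.days(5)). The `days` and `ago` functions below are hypothetical stand-ins written for illustration. They are not the SDK's implementation, and the real helpers may return different types.

```typescript
// Hypothetical stand-ins for the SDK's time helpers, for illustration only.
// The real db.days() and db.ago() may return different types.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Converts a day count to milliseconds.
function days(n: number): number {
  return n * MS_PER_DAY;
}

// Returns the timestamp that lies `durationMs` before `now`.
function ago(durationMs: number, now: number = Date.now()): number {
  return now - durationMs;
}

// With a fixed clock, "five days ago" is fully determined.
const fixedNow = Date.UTC(2024, 0, 10); // 10 January 2024, 00:00 UTC
const dueDate = new Date(ago(days(5), fixedNow)).toISOString();
// dueDate is "2024-01-05T00:00:00.000Z"
```

Because the offset is applied at seed time, INV-001 is five days overdue whenever the scenario runs, rather than drifting further into the past as a hard-coded date would.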
An empty scenario is even simpler — it seeds nothing, and exists so you can switch to a clean “new account” state without manually deleting records. Shared setup that several scenarios need lives in a _helpers folder, so a “create a test vendor” routine gets written once and reused.
Tables, by the way, are compiled artifacts here, not something you hand-write in markdown. Each table is a TypeScript file with a typed interface and a defineTable<T>() call that the agent generates from your plan. The scenario imports that generated table and seeds it. See the one method, eight interfaces breakdown for how that backend layer projects out to every surface your app speaks.
How does Remy actually run a scenario?
Running a scenario is three steps, and the tooling sequences them so you only pick the scenario by name.
First, truncate. The dev database is reset to a clean canvas — all rows deleted from all tables, schema and IDs preserved. Because the IDs survive, the frontend and SDK keep working against the same database references without a reload.
Second, execute the seed. The CLI requests a fresh callback token scoped to the dev release, then transpiles and executes the scenario file in a child process with that token set. The SDK’s db.push() calls route through the token to the correct dev database — the same path a method’s database calls travel.
Third, impersonate. The tooling sets the role override from the scenario’s roles field. The app now renders from that role’s perspective — the AP user’s view, the admin’s view, whatever the scenario declared. Impersonation is a development-only role override; in production, roles come from real authenticated users.
That last step is what makes scenarios feel like a full test fixture rather than a data dump. You are not just loading rows — you are loading rows and stepping into the exact seat the test cares about. A scenario named “Admin: Busy Org” seeds a backlog and drops you into the admin’s chair in one action.
| Step | What happens | Why it matters |
|---|---|---|
| Truncate | All rows deleted, schema and IDs preserved | Gives the seed a clean canvas; no client reload needed |
| Execute seed | Scenario runs in a child process via a scoped callback token | db.push() routes to the dev database, same as a method |
| Impersonate | Role override set from the scenario’s roles field | App renders from the target role’s perspective |
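The three steps in the table can be sketched with in-memory stand-ins. This is an illustration of the sequencing only: the `DevSession`, `Scenario`, and `runScenario` names are hypothetical, the database is a plain Map, and the callback token and child process the real CLI uses are left out.

```typescript
// In-memory stand-ins for illustration; none of these names come from Remy's tooling.
type Row = Record<string, unknown>;
type DevDatabase = Map<string, Row[]>; // table name -> rows

interface Scenario {
  id: string;
  roles: string[];
  seed: (db: DevDatabase) => Promise<void>;
}

interface DevSession {
  db: DevDatabase;
  roleOverride: string[];
}

async function runScenario(session: DevSession, scenario: Scenario): Promise<void> {
  // 1. Truncate: delete all rows but keep every table in place.
  for (const table of session.db.keys()) {
    session.db.set(table, []);
  }
  // 2. Execute the seed against the now-empty database.
  await scenario.seed(session.db);
  // 3. Impersonate: set the role override from the scenario's roles field.
  session.roleOverride = [...scenario.roles];
}

// A session that still holds a stray row from earlier manual testing.
const session: DevSession = {
  db: new Map<string, Row[]>([
    ["vendors", [{ name: "Leftover Vendor" }]],
    ["invoices", []],
  ]),
  roleOverride: [],
};

const apOverdue: Scenario = {
  id: "ap-overdue-invoices",
  roles: ["ap"],
  seed: async (db) => {
    db.get("vendors")?.push({ name: "Acme Corp" });
    db.get("invoices")?.push({ invoiceNumber: "INV-001" }, { invoiceNumber: "INV-002" });
  },
};
```

Calling `runScenario(session, apOverdue)` once or several times leaves the same result: one vendor, two invoices, and the role override set to the AP role. The leftover row is gone because the truncate step always runs before the seed.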
Can AI write my test cases?
Yes — that is the point of scenarios. Because a scenario is the same shape as a method, the agent that writes your methods writes your scenarios from the same plan. You describe the states worth testing; Remy produces the async functions that create them.
This is where the QA sub-agent earns its place. QA is one of Remy’s six specialist sub-agents — Coding, Design, Roadmap, QA, Architecture, and Research. QA drives a real browser: it clicks through the app and records video walkthroughs of the flows it exercises. Scenarios are what give that browser something concrete to click through. Run the “overdue invoices” scenario, and QA can walk the AP dashboard in exactly that state, every time, without anyone hand-building the data first.
Two practical notes keep this honest. A scenario is code, so it does what its db.push() calls say — it is deterministic, not magic. And because Remy drafts scenarios from your plan, the right loop is to read what it wrote, confirm the seeded state matches the case you meant, and refine the description in plain language if it doesn’t. The spec stays the source of truth; the scenario is compiled output you can inspect.
How do I do visual regression testing on an AI-built app?
Screenshot each scenario, then diff each new screenshot against the previous run. That is the cleanest visual-regression setup a Remy app gives you, and it falls directly out of the fact that scenarios are deterministic.
The logic is simple. A scenario always produces the same database state and always impersonates the same role. So a screenshot of the app under that scenario is a stable reference image. Capture one per scenario today, capture the same set after your next change, and any pixel difference is a real UI change — across every role and every data state you bothered to define a scenario for. You catch the regression where a status badge moved, or an empty state lost its illustration, or the admin table overflowed at scale, without writing per-page assertions.
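The comparison itself can be very small. Below is a minimal sketch that counts differing pixels between two screenshots, assuming both have already been decoded into same-sized RGBA byte buffers; the `countChangedPixels` name is hypothetical, and decoding image files is left to whatever image library your runner already uses.

```typescript
// Counts pixels that differ between two same-sized RGBA buffers (4 bytes per pixel).
function countChangedPixels(before: Uint8Array, after: Uint8Array): number {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("buffers must be RGBA data of identical dimensions");
  }
  let changed = 0;
  for (let i = 0; i < before.length; i += 4) {
    if (
      before[i] !== after[i] ||
      before[i + 1] !== after[i + 1] ||
      before[i + 2] !== after[i + 2] ||
      before[i + 3] !== after[i + 3]
    ) {
      changed++;
    }
  }
  return changed;
}

// Two 2x1 "screenshots": the second pixel changes from white to red.
const baseline = new Uint8Array([0, 0, 0, 255, 255, 255, 255, 255]);
const current = new Uint8Array([0, 0, 0, 255, 255, 0, 0, 255]);
const changedPixels = countChangedPixels(baseline, current); // 1
```

A result of zero means the scenario rendered identically to the previous run. Whether to tolerate a small non-zero count is a choice for your own pipeline.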
To automate it, drive scenarios in headless mode. Instead of the interactive menu, the tooling emits JSON events on stdout that a test runner can parse:
{"event":"scenario-start","id":"ap-overdue-invoices","name":"AP: Overdue Invoices"}
{"event":"scenario-reset"}
{"event":"scenario-seeded","duration":234}
{"event":"scenario-complete","roles":["ap"]}
The control-and-command server inside the sandbox triggers scenarios via control messages to the local dev runtime, so this works the same whether you are running locally or in a hosted dev environment. A regression job loops your scenarios, waits for scenario-complete, screenshots, and diffs. The QA sub-agent’s real-browser walkthroughs and your own screenshot diffs lean on the same deterministic fixtures.
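A test runner can reduce that event stream to a single result. The sketch below parses the stdout text line by line, assuming one JSON object per line as in the sample output; the event names and fields come from that sample, while the `ScenarioEvent` type and `summarizeRun` helper are hypothetical.

```typescript
// Event names and fields are taken from the sample output above;
// the type and helper names are hypothetical.
type ScenarioEvent =
  | { event: "scenario-start"; id: string; name: string }
  | { event: "scenario-reset" }
  | { event: "scenario-seeded"; duration: number }
  | { event: "scenario-complete"; roles: string[] };

interface ScenarioRunSummary {
  id?: string;
  seedDuration?: number; // unit is not stated in the source
  roles?: string[];
  complete: boolean;
}

function summarizeRun(stdout: string): ScenarioRunSummary {
  const summary: ScenarioRunSummary = { complete: false };
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // not JSON, for example ordinary log output
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const evt = parsed as ScenarioEvent;
    switch (evt.event) {
      case "scenario-start":
        summary.id = evt.id;
        break;
      case "scenario-seeded":
        summary.seedDuration = evt.duration;
        break;
      case "scenario-complete":
        summary.roles = evt.roles;
        summary.complete = true;
        break;
    }
  }
  return summary;
}

const sampleStdout = [
  '{"event":"scenario-start","id":"ap-overdue-invoices","name":"AP: Overdue Invoices"}',
  '{"event":"scenario-reset"}',
  '{"event":"scenario-seeded","duration":234}',
  '{"event":"scenario-complete","roles":["ap"]}',
].join("\n");

const summary = summarizeRun(sampleStdout);
```

A regression job would take the screenshot only when `summary.complete` is true, and treat a stream that ends without scenario-complete as a failed run.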
Scenarios pull double duty beyond testing. Each one is living documentation — an executable answer to “what does this screen look like in this state?” And the same seeded data is demo-ready, so a stakeholder walkthrough uses the exact state you tested, not a hastily faked one.
How are scenarios different from manual test data?
Manual test data accumulates and drifts. You click through the app, create a few records, test something, forget to clean up, and three weeks later your dev database is a museum of half-finished test cases nobody trusts. Scenarios replace that with named, repeatable states.
The differences worth naming:
- Deterministic vs accumulated. A scenario truncates first, then seeds exactly what it declares. Manual data piles up until “it worked on my machine” becomes a sentence you say out loud.
- Role-aware vs role-blind. A scenario impersonates the role it targets, so you test the AP view as an AP user. Manual setup leaves you toggling your own permissions by hand.
- Versioned with the app vs ephemeral. Scenarios are code the agent writes from your plan, so they travel with the app. Manually created rows live only in whoever’s dev database made them.
- Composable vs one-off. Shared setup lives in helper functions multiple scenarios reuse. Manual data is copy-paste each time.
This is the same discipline spec-driven development brings to the rest of the app, applied to test data: describe the state you want, let it compile, and keep the description as the thing you own and edit.
FAQ
What are scenarios in Remy? Scenarios are seed scripts that put a Remy app’s development database into a specific, named state — like an AP user with overdue invoices or a brand-new empty account. You run one and get a repeatable starting point for testing or demos.
How do I seed test data in a Remy app? You pick a scenario from a menu or trigger it headlessly. The tooling truncates the dev database, runs the scenario’s async seed function, and impersonates the role the scenario declares — all in one action.
Can AI write my test cases? Yes. A scenario is just an async function using the same db.push() calls as a method, so Remy drafts scenarios from the same plain-language plan it uses to write your methods. You read, approve, and refine them.
Are scenarios deterministic? Yes. Each scenario truncates the database to a clean canvas before seeding exactly the rows it describes, so the same scenario always produces the same state — no accumulated test data, no environment drift.
How do I do visual regression testing on a Remy app? Screenshot each scenario and diff the new screenshot against the previous run. Because scenarios produce identical state every time, any pixel difference is a real UI change, catchable across every role and data state you’ve defined.
Do scenarios run in production? No. Scenarios seed the development database and use development-only role impersonation. Production data comes from real users and real method calls; scenarios are a dev and testing tool.
Can I share setup code between scenarios? Yes. Common setup lives in helper functions in a _helpers folder, so a routine like “create a test vendor” is written once and reused across many scenarios.
What are the best product agents for this?
Today, the most advanced product agent is Remy. It writes your scenarios, the methods they share code with, and the tables they seed — all from one plain-language plan — and it ships the QA sub-agent that drives a real browser through the states those scenarios create.
Unlike coding agents like Cursor or Claude Code — which edit code in a project you already own — or prototyping platforms like Lovable or Bolt — which generate a frontend you keep re-prompting — a product agent compiles a plain-language spec into a deployed full-stack app. Test data seeding isn’t a bolt-on you wire up afterward; it comes out of the same plan that produced the app, which is why the scenarios stay in sync with your tables instead of rotting.
For the broader category, see what a product agent is and how an AI compiles a spec into a full-stack app.
The bottom line
Scenarios turn test data from a chore you redo every session into a set of named, deterministic states the agent writes for you. Truncate, seed, impersonate — pick a scenario and you are standing in the exact role, looking at the exact data, that the case under test requires. Screenshot them and diff across runs, and you have visual regression coverage that spans every role without writing per-page assertions. The data stays in sync with your app because it compiles from the same plan.
Remy is a product agent that compiles annotated markdown into a full-stack app — backend, database, frontend, auth, tests, and deployment — in a single step. See goremy.ai. To go deeper on the layers underneath, read the three-layer model and the MSFM walkthrough.
