OpenAI's Symphony Spec: How Using Linear as an Agent Control Plane Drove a 500% PR Increase
OpenAI's open-source Symphony spec uses a Linear board to orchestrate autonomous coding agents — and internal teams saw 500% more landed pull requests.
OpenAI’s internal teams didn’t just ship more code when they deployed autonomous coding agents. They restructured how work gets assigned, tracked, and closed — and the result, documented in what OpenAI calls the Symphony spec, was a 500% increase in landed pull requests. That number is specific enough to be worth taking seriously.
The Symphony spec isn’t a product you can download from the App Store. It’s an architectural pattern — a way of wiring autonomous coding agents to a Linear board so that the issue tracker becomes the control plane for the entire system. If you’re building agent infrastructure, or thinking about where autonomous coding fits into your engineering workflow, this is the most concrete public signal we have about what actually works at scale.
You’ve probably seen the demos. Agents that write code, open PRs, run tests. What you haven’t seen as often is the operational layer underneath — the thing that decides which agent picks up which task, how work gets queued, what “done” means, and how a human stays in the loop without becoming the bottleneck. Symphony is an answer to that question.
Why an Issue Tracker Makes a Better Control Plane Than a Chat Window
The instinct when building agent systems is to make the interface conversational. You talk to the agent, it does things, it reports back. This works for one-off tasks. It breaks down at scale.
The problem is state. A conversation has no durable state outside the context window. If you’re running multiple agents in parallel — which is the whole point of autonomous coding at scale — you need somewhere to track what’s in progress, what’s blocked, what’s done, and what’s waiting for review. A chat thread is not that place.
Linear, on the other hand, is already designed for exactly this. Issues have statuses. They have assignees. They have priorities, labels, and relationships to other issues. Every engineering team that uses Linear already has a mental model for how work flows through it. Symphony’s insight is that you don’t need to build a new orchestration layer — you can use the one your team already has.
The control plane pattern works like this: agents pull work from the Linear board the same way a human engineer would. An issue in “Todo” gets picked up, moved to “In Progress,” and the agent works on it. When it opens a PR, the issue moves to “In Review.” A human reviews the PR, merges it, and the issue closes. The agent never needs to be told what to do next — it reads the board.
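In sketch form, that lifecycle is a small state machine. Here is a minimal TypeScript illustration, assuming Linear's four default statuses (yours may differ); the extra In Review → In Progress edge covers the CI-failure path described later:

```typescript
// Minimal sketch of the issue lifecycle Symphony drives off the board.
// Status names are assumptions based on Linear's defaults; map them to
// whatever workflow states your team actually uses.
type Status = "Todo" | "In Progress" | "In Review" | "Done";

const transitions: Record<Status, Status[]> = {
  "Todo": ["In Progress"],              // agent claims the issue
  "In Progress": ["In Review"],         // agent opens a PR
  "In Review": ["Done", "In Progress"], // human merges, or work bounces back
  "Done": [],                           // terminal: PR merged, issue closed
};

function canMove(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}
```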
This is a meaningful architectural decision. It means the system’s state is always visible to humans in a tool they already use. It means you can mix human and agent work on the same board without a separate interface. And it means the control plane is auditable — you can look at the history of any issue and see exactly what happened.
For teams thinking about multi-agent system architecture, this is the key distinction: Symphony doesn’t add a new layer of tooling on top of your workflow. It makes your existing workflow the orchestration layer.
What You Need Before You Can Run This
The Symphony spec assumes a few things are already in place. Getting them right matters more than the agent configuration itself.
A Linear workspace with a defined workflow. The spec uses Linear’s status system as the signal for agent state transitions. If your Linear board is a mess of custom statuses and ad-hoc labels, the agents will have trouble reading it reliably. Before you wire anything up, clean up your workflow states: Todo, In Progress, In Review, Done. That’s the minimum. You can add more, but every additional status is a decision the agent has to make.
A coding agent with GitHub integration. Symphony is built around agents that can read a codebase, write code, run tests, and open pull requests. OpenAI’s Codex is the obvious choice given this is an OpenAI spec, but the pattern is model-agnostic. What matters is that your agent has authenticated access to your GitHub repo and can open PRs programmatically. The Cursor SDK, which handles harness, sandboxing, and GitHub integration out of the box, is another viable foundation here — more on why that matters in a moment.
Clear issue hygiene. This is the one that teams underestimate. Agents are good at executing well-specified tasks. They are bad at figuring out what a vague issue means. If your Linear issues say things like “fix the thing on the dashboard” or “look into the performance problem,” your agents will either fail silently or produce work that doesn’t match intent. Symphony works best when issues are written with enough specificity that a junior engineer could pick one up without asking questions.
A review process that doesn’t require the agent to be perfect. The 500% PR increase isn’t because agents write perfect code. It’s because the system generates more attempts, and humans review and merge the ones that are good. If your review process assumes every PR is production-ready, you’ll spend more time reviewing than you save on writing. If your review process is lightweight for small, well-scoped changes, the math works in your favor.
Setting Up the Control Plane
Step 1: Define your agent’s Linear identity.
Create a dedicated Linear user or bot account for your agent. This matters for two reasons. First, it makes the board readable — you can see at a glance which issues are being worked on by humans and which by agents. Second, it gives you a clean audit trail. When something goes wrong (and it will), you want to be able to filter the history by agent activity.
Assign this account the same permissions as a regular team member. The agent needs to be able to move issues between statuses, add comments, and link to PRs. Now you have a named agent identity on your board.
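As a sanity check before wiring anything else up, confirm the credentials actually resolve to the agent identity. A minimal sketch using Linear's official TypeScript SDK (@linear/sdk); the environment variable name is an assumption, not part of the spec:

```typescript
import { LinearClient } from "@linear/sdk";

// Authenticate with an API key issued for the dedicated agent account.
// LINEAR_AGENT_API_KEY is a hypothetical env var; use your own secrets setup.
const linear = new LinearClient({ apiKey: process.env.LINEAR_AGENT_API_KEY! });

// Confirm the key belongs to the agent identity, not a human account.
const me = await linear.viewer;
console.log(`Operating on the board as ${me.name} (${me.email})`);
```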
Step 2: Write a polling loop that reads from Linear.
The agent needs to check the board periodically and pick up work. The simplest version of this is a cron job that runs every few minutes, queries Linear’s API for issues assigned to the agent account with status “Todo,” and kicks off a coding session for each one.
Linear’s API is well-documented and straightforward. The query you want is something like: issues where assignee is [agent account] and status is “Todo,” ordered by priority. Pull the top N issues — start with one or two until you trust the system — and hand them off to your coding agent.
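A hedged sketch of one polling pass with @linear/sdk follows. The state names, env vars, and the startCodingSession handoff are placeholders standing in for your own setup:

```typescript
import { Issue, LinearClient } from "@linear/sdk";

const linear = new LinearClient({ apiKey: process.env.LINEAR_AGENT_API_KEY! });

// Workflow state UUID for "In Progress", from your Linear team settings.
const IN_PROGRESS_STATE_ID = process.env.LINEAR_IN_PROGRESS_STATE_ID!;

// Placeholder for your coding agent's entry point (Codex, Cursor SDK, etc.).
async function startCodingSession(issue: Issue): Promise<void> {
  console.log(`Would start a session for ${issue.identifier}: ${issue.title}`);
}

// One polling pass: fetch the agent's Todo issues, take the highest-priority
// ones, claim them, and hand them off.
async function pollOnce(agentUserId: string, batchSize = 1): Promise<void> {
  const { nodes } = await linear.issues({
    filter: {
      assignee: { id: { eq: agentUserId } },
      state: { name: { eq: "Todo" } },
    },
    first: 50,
  });

  // Linear encodes priority as 1 (urgent) through 4 (low); 0 means unset,
  // so treat unset as lowest when sorting.
  const queue = nodes
    .sort((a, b) => (a.priority || 5) - (b.priority || 5))
    .slice(0, batchSize); // start with 1-2 until you trust the system

  for (const issue of queue) {
    // Claim the issue first so the next poll doesn't pick it up again.
    await linear.updateIssue(issue.id, { stateId: IN_PROGRESS_STATE_ID });
    await startCodingSession(issue);
  }
}

// The simplest scheduler is a timer; a cron job works just as well.
setInterval(() => pollOnce(process.env.AGENT_USER_ID!), 5 * 60 * 1000);
```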
Now you have a queue that the agent reads from automatically.
Step 3: Wire the agent to GitHub.
When the agent finishes working on an issue, it should open a PR and link it back to the Linear issue. Most coding agents can do this natively. The important thing is that the PR description includes the Linear issue ID — this is what lets you close the loop automatically when the PR merges.
Set up a GitHub webhook or action that moves the Linear issue to “Done” when the linked PR is merged. This is the step that makes the system feel autonomous rather than just automated. The board updates itself.
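A minimal handler in that spirit, sketched with Express and @linear/sdk. The route, env vars, and the regex that pulls an identifier like ENG-123 out of the PR body are assumptions; production code should also verify GitHub's webhook signature:

```typescript
import express from "express";
import { LinearClient } from "@linear/sdk";

const app = express();
app.use(express.json());

const linear = new LinearClient({ apiKey: process.env.LINEAR_AGENT_API_KEY! });

// The "Done" workflow state UUID from your Linear team settings (assumption).
const DONE_STATE_ID = process.env.LINEAR_DONE_STATE_ID!;

// GitHub pull_request webhook: when a PR merges, close the Linear issue.
app.post("/webhooks/github", async (req, res) => {
  const { action, pull_request: pr } = req.body;

  if (action === "closed" && pr?.merged) {
    // Assumes the agent put the Linear identifier (e.g. ENG-123) in the
    // PR description, as described above.
    const match = pr.body?.match(/\b([A-Z][A-Z0-9]*-\d+)\b/);
    if (match) {
      // Linear's issue query accepts the human-readable identifier.
      const issue = await linear.issue(match[1]);
      await linear.updateIssue(issue.id, { stateId: DONE_STATE_ID });
    }
  }
  res.sendStatus(200);
});

app.listen(3000);
```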
Now you have a closed loop: issue assigned → agent picks it up → PR opened → human reviews → PR merged → issue closes.
Step 4: Add a human checkpoint before “In Review.”
One failure mode in autonomous coding systems is agents that open PRs for work that’s clearly wrong, wasting reviewer time. Add a lightweight checkpoint: before the agent moves an issue to “In Review,” have it post a comment summarizing what it did and why. This takes ten seconds to read and catches the obvious failures before they become PR noise.
This is also where you can add automated test runs. If your CI pipeline fails, the issue should move back to “In Progress” rather than “In Review.” The agent should be able to read the test output and attempt a fix before escalating to a human.
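A sketch of that gate, again using @linear/sdk. The summary text and CI verdict would come from your agent harness and CI system; the state IDs are assumptions from your team settings:

```typescript
import { LinearClient } from "@linear/sdk";

const linear = new LinearClient({ apiKey: process.env.LINEAR_AGENT_API_KEY! });

// Workflow state UUIDs from your Linear team settings (assumptions).
const IN_REVIEW_STATE_ID = process.env.LINEAR_IN_REVIEW_STATE_ID!;
const IN_PROGRESS_STATE_ID = process.env.LINEAR_IN_PROGRESS_STATE_ID!;

// Before handing off to a human, post the agent's summary and advance the
// issue only if CI passed. `summary` and `ciPassed` are inputs from your
// agent harness and CI pipeline, not from Linear.
async function checkpoint(issueId: string, summary: string, ciPassed: boolean) {
  await linear.createComment({
    issueId,
    body: `**Agent summary**\n\n${summary}`,
  });

  if (ciPassed) {
    await linear.updateIssue(issueId, { stateId: IN_REVIEW_STATE_ID });
  } else {
    // Failed CI goes back to In Progress so the agent can read the test
    // output and retry before a human ever sees the PR.
    await linear.updateIssue(issueId, { stateId: IN_PROGRESS_STATE_ID });
  }
}
```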
Now you have a system with a built-in quality gate.
Step 5: Tune issue quality before scaling.
Before you add more agents or increase the polling frequency, spend a week reviewing the issues that agents struggled with. You’ll find patterns: issues that were too vague, issues that required context that wasn’t in the ticket, issues that touched parts of the codebase the agent didn’t have enough context on.
Fix those issues at the source — either by improving how your team writes tickets, or by giving the agent better codebase context through documentation or a project-level system prompt.
Now you have a feedback loop that makes the system better over time.
Where This Breaks (and How to Know Before It Does)
The most common failure mode is issue quality degradation over time. Teams start with well-written issues, see good results, and then start writing worse issues because “the agent will figure it out.” It won’t. The 500% PR increase is contingent on the input quality staying high.
The second failure mode is context drift. Coding agents work best when they have a clear picture of the codebase conventions, the testing requirements, and the deployment constraints. If your codebase is evolving quickly and the agent’s context isn’t keeping up, you’ll see PRs that are technically correct but architecturally wrong — they solve the issue but introduce new problems.
The third failure mode is review bottleneck. If you scale the agent to open 20 PRs a day but your team can only review 5, you haven’t solved the throughput problem — you’ve moved it. Symphony works because it generates more attempts, not because it eliminates human judgment. Size your agent throughput to match your review capacity, not the other way around.
For teams building more complex orchestration — multiple agents, parallel workstreams, dependencies between issues — the Claude Code workflow patterns for agentic systems are worth studying alongside Symphony. The patterns are different but complementary.
One thing worth flagging: the harness matters as much as the model. The Endor Labs benchmark found that GPT-5.5 running in Cursor’s harness scored 87.2% on functionality — versus 61.5% in its native Codex harness. That’s a 26-point swing from the runtime environment, not the model weights. If you’re choosing a coding agent for Symphony, don’t just evaluate the model. Evaluate the harness. Platforms like MindStudio take a similar orchestration-first view — 200+ models, 1,000+ integrations, and a visual builder for chaining agents — which is useful when you’re deciding how much of the control plane to build yourself versus compose from existing infrastructure.
The Broader Pattern: Spec-Driven Work as the New Interface
Symphony is interesting not just as a workflow hack but as a signal about where agent interfaces are heading.
The dominant interface for AI today is conversational. You type, the model responds. This is fine for exploration and one-off tasks. It’s a poor fit for ongoing, parallel, auditable work — which is what most of engineering actually is.
Issue trackers are already the interface for engineering work. They encode priority, context, dependencies, and status in a structured format that both humans and, now, agents can read. Symphony’s bet is that the right move isn’t to build a new agent-native interface — it’s to make existing engineering infrastructure agent-readable.
This is a different bet than, say, building a chat-first agent that you describe tasks to in natural language. It’s more constrained, but that constraint is the point. The structure of a Linear issue is a form of specification. When you write a good issue, you’re writing a spec that both a human engineer and an agent can execute against.
This connects to a broader shift in how software gets built. Tools like Remy take the spec-as-source-of-truth idea further: you write an annotated markdown spec, and a complete TypeScript full-stack application — backend, database, auth, deployment — gets compiled from it. The code is derived output; the spec is what you maintain. Symphony applies a version of this logic to task management: the issue is the spec, and the agent’s job is to compile it into a PR.
The teams that will get the most out of Symphony aren’t the ones with the best coding agents. They’re the ones with the best issue hygiene — the discipline to write tickets that are specific enough to be executable. That’s a human skill, and it turns out to be the binding constraint.
For teams already running autonomous agents and looking to extend the pattern beyond coding, the Claude Code Dispatch approach for remote agent control is a useful adjacent read — it applies similar thinking to mobile-first agent orchestration.
The 500% PR increase is real, but it’s not magic. It’s what happens when you give agents a clear queue, a clean feedback loop, and humans who stay in the review seat rather than the task-assignment seat. Symphony is a spec for that system. The Linear board is just where the state lives.