
Why a 1998 Mozilla Bug Tracker Accidentally Designed the Perfect AI Agent Substrate

Bugzilla's 1998 primitives — durable state, state machine, ownership, audit history — are exactly what AI agents need. Jira and Linear inherited them all.

MindStudio Team

Terry Weissman Accidentally Built the Perfect AI Agent Runtime in 1998

Terry Weissman wrote Bugzilla for Mozilla in 1998 to solve one narrow problem: when a lot of people are building software asynchronously, how do you make sure bugs don’t disappear into the ether? He wasn’t thinking about AI. There was no AI in any relevant sense. He was thinking about humans forgetting things across time zones and release trains.

What he produced — almost by accident — were four primitives that define what a good AI agent substrate looks like today. Durable state outside any one person’s head. A state machine with legal transitions. An explicit ownership field. And a queryable audit history. Those Bugzilla 1998 primitives are still the load-bearing structure underneath Jira, Linear, and every serious issue tracker built since. And they turn out to be exactly what agents need.

That’s the thing you have to sit with for a moment. We didn’t design these systems for agents. We designed them to compensate for human weaknesses — forgetting, miscommunication, dropped handoffs, unclear ownership. And it turns out agent weaknesses rhyme almost perfectly with human weaknesses.

The Problem Agents Have That Nobody Talks About Enough


The hard part of building a working agentic system is not making the model smarter. You can throw a better model at almost any problem and get marginal gains. The hard part is giving the agent a place to find work, understand who owns it, know what state it’s in, see what changed, and hand the result back.

Context windows are not a solution to this. A context window is ephemeral. It drifts. It gets truncated. It resets between runs. If your work spans multiple agents, multiple days, or multiple systems, the state cannot live inside the model. It has to live somewhere external, durable, and queryable.

This is the thing that separates a demo from a production agentic system. The demo works because the agent holds everything in one context window for one session. The production system breaks because the state evaporates the moment the session ends.

What you need is something that looks suspiciously like an issue tracker.

An agent needs to read the current state of a task at the start of a run. It needs to write back what happened at the end. The next agent — or the next run of the same agent — needs to pick up exactly where things left off without relying on the previous conversation. That’s not a novel requirement. That’s what Bugzilla was solving in 1998 for humans working across time zones.
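That read-then-write contract is small enough to sketch. The names here (`load_task`, `save_task`, the in-memory `STORE`) are hypothetical stand-ins for whatever tracker API you actually use; the point is that two runs share nothing except the durable store:

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a tracker's durable store.
# In production this would be an issue tracker API, not a dict.
STORE = {}

@dataclass
class Task:
    id: str
    state: str = "todo"
    notes: list = field(default_factory=list)

def load_task(task_id):
    """Start of a run: read current state from the durable store."""
    return STORE[task_id]

def save_task(task):
    """End of a run: write back what happened."""
    STORE[task.id] = task

def agent_run(task_id, work):
    task = load_task(task_id)   # pick up exactly where the last run left off
    result = work(task)         # do this session's work
    task.notes.append(result)   # record what happened
    task.state = "in_progress"
    save_task(task)             # persist outside any context window

# Two separate "sessions" with no shared memory except the store.
STORE["T-1"] = Task("T-1")
agent_run("T-1", lambda t: "drafted fix")
agent_run("T-1", lambda t: "added tests")
```

The second run never sees the first run's conversation. It sees the first run's writes, which is the whole point.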

The constraints are nearly identical. Humans forget context; agents lose context. Humans need handoffs; agents need handoffs. Humans need accountability; agents need observability. Humans need permissions; agents need scoped access, arguably more strictly. The system we built to compensate for human coordination failures compensates for agent coordination failures almost one-for-one.

How the Evidence Assembled

The clearest recent proof is OpenAI’s Symphony spec. Symphony is an open-source Codex orchestration spec whose central design decision is to use a project management board — specifically Linear — as the control plane for autonomous coding agents. It defines polling intervals, per-issue workspaces, active and terminal states, retry logic, observability hooks, concurrency limits, and handoff states including human review. The issue tracker in Symphony is not a UI for humans to look at. It’s the data layer agents operate against.
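A hedged sketch of what a control loop in that shape might look like. The field names and values below are illustrative, not Symphony's actual schema; they just show how polling, active versus terminal states, retries, and a concurrency cap compose into one loop:

```python
# Illustrative config; the real Symphony spec defines its own names and values.
CONFIG = {
    "poll_interval_s": 30,
    "max_concurrent": 4,
    "max_retries": 2,
    "active_states": {"queued", "in_progress"},
    "terminal_states": {"done", "needs_human_review", "failed"},
}

def control_loop(board, run_agent, max_ticks=10):
    """Poll the board, dispatch active issues, stop when all are terminal."""
    for _ in range(max_ticks):
        active = [i for i in board if i["state"] in CONFIG["active_states"]]
        if not active:
            return board  # nothing left to do: every issue is terminal
        for issue in active[: CONFIG["max_concurrent"]]:
            try:
                issue["state"] = run_agent(issue)  # agent returns the next state
            except Exception:
                issue["retries"] = issue.get("retries", 0) + 1
                if issue["retries"] > CONFIG["max_retries"]:
                    issue["state"] = "failed"  # give up after repeated crashes
    return board
```

Notice that the board itself carries all the coordination state. The loop holds nothing between ticks, which is exactly what makes the tracker, not the orchestrator, the source of truth.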

OpenAI’s internal teams reported a 500% increase in landed pull requests when using the Symphony model. That number is striking, but the mechanism is more interesting than the headline. Symphony works because it gives agents a structured substrate to operate through rather than asking them to invent coordination from scratch.

The timing here is almost too perfect. On March 24, 2026, Linear’s CEO Karri Saarinen published an essay called “Issue Tracking is Dead.” The argument was reasonable: issue trackers were built for a handoff model where humans manually translate messy reality into tickets, and agents can read more of the underlying context directly, so the human translation ceremony should shrink. He’s right about that part. Weeks later, OpenAI published Symphony, which uses Linear as its central control plane for autonomous agents. The issue tracker didn’t die. It got promoted from human UI to agent infrastructure.

These aren’t contradictory positions if you separate the interface from the substrate. The human ceremony around tickets — the manual grooming, the translation, the status theater — that’s under pressure. The underlying primitives Weissman established in 1998 are becoming more valuable, not less.


Cursor’s work with long-running agents adds another data point. Running hundreds of agents on large coding projects, they found that flat agent organizations develop coordination problems at scale. Agents hold locks too long. They become risk-averse and pick easy tasks instead of hard end-to-end work. They wait on each other without clear resolution paths. These are exactly the problems issue trackers were built to solve for humans. The unit of work, claiming, status, blockers, priority, visibility — all of that already exists in a well-run tracker. The agent system doesn’t have to invent a coordination layer from scratch. If you want to see this pattern in practice, running an AI engineering team with heartbeats in Paperclip demonstrates exactly how claiming, status updates, and handoffs map onto issue-tracker primitives at the agent level.
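"Claiming" in particular maps onto a very old primitive: a compare-and-set on the assignee field, so two agents can never grab the same issue. A toy version, with hypothetical names, assuming a single shared board:

```python
import threading

class Board:
    """Toy tracker: 'claiming' is a compare-and-set on the assignee field."""
    def __init__(self):
        self._lock = threading.Lock()
        self.issues = {}

    def add(self, issue_id):
        self.issues[issue_id] = {"assignee": None, "state": "todo"}

    def claim(self, issue_id, agent_id):
        """Atomically claim an unassigned issue; return False if already taken."""
        with self._lock:
            issue = self.issues[issue_id]
            if issue["assignee"] is not None:
                return False          # someone else owns it; pick another task
            issue["assignee"] = agent_id
            issue["state"] = "in_progress"
            return True

board = Board()
board.add("BUG-7")
first = board.claim("BUG-7", "agent-a")   # succeeds
second = board.claim("BUG-7", "agent-b")  # rejected: already owned
```

The same check-and-set that stopped two engineers from fixing the same bug in 1998 stops two agents from fixing it now.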

Atlassian’s trajectory tells the same story from the enterprise side. In May 2025, Atlassian introduced its remote MCP server in beta, with Claude as the first official partner and Cloudflare infrastructure underneath. By February 2026, the Rovo MCP server was generally available — supporting search and summarization across Jira, Confluence, and Compass, creating and updating issues, using OAuth, and respecting existing permission models. This is Atlassian making Jira and Confluence agent-readable and agent-writable through a controlled interface. The same pattern Symphony assumes with Linear, just at enterprise scale with 26 years of accumulated work state underneath it.

The rumored Anthropic acquisition of Atlassian — no formal announcement, no SEC filing, treat it as speculation — is interesting precisely because the logic is now obvious enough to take seriously. A few years ago, an AI lab buying an issue tracker company would have sounded bizarre. Now the reasoning is clear: Jira is a map of how work happens inside enterprises. It knows the projects, the dependencies, the owners, the history, the approvals. That’s exactly the context agents need to do real work inside a business.

Why Linear’s UX Win Became an Agent Data Win

There’s a subtler point here that’s easy to miss. Linear didn’t invent a new substrate. The basic data model — issue, state, assignee, priority, dependency, history — is the same one Bugzilla established. Linear’s innovation was making that model pleasant enough that people use it voluntarily and consistently.

That matters for agents in a way that’s counterintuitive. Good UX produces cleaner data. When people hate a tool, they work around it. They leave fields blank. They put important decisions in Slack. They use fake statuses. They create tickets after the work is done. The tracker becomes a compliance artifact rather than a reflection of reality.

When people like the tool, more of the real work ends up in the system. State is cleaner. Descriptions are better. Ownership is current. Dependencies are less made up. The audit history is actually useful. An agent operating against a well-maintained Linear board is operating against real work state. An agent operating against a neglected Jira instance is operating against organizational theater.

This is a genuine argument for caring about human UX in 2026, even as agents take on more work. The best agent substrate may not be the tool with the most AI features. It may be the tool your team has been using cleanly for years because they actually like it. The data quality compounds over time.

For teams building multi-agent systems today, MindStudio handles the orchestration layer — 200+ models, 1,000+ integrations, a visual builder for chaining agents and workflows — but the substrate those agents operate against still has to be clean. The orchestration is only as good as the state it reads.

The Five-Question Diagnostic


Once you see why issue trackers work for agents, you can apply the same lens to every tool in your stack. The question is not “does this product have an AI chatbot.” The question is “can an agent safely understand and change the state of work inside this product.”

Five questions determine the answer.

Does the tool have records or just content? Records are discrete, addressable, persistent. Content is a pile of text. A Jira issue is a record. A Slack thread is content.

Does it have a state machine or just labels? A state machine has legal transitions. Labels are informal tags. “In Progress → In Review → Done” is a state machine. A tag called “in-progress” is a label.

Is ownership an explicit field or something people infer from conversation? The assignee field in Bugzilla was explicit. Ownership in email is inferred from who sent what to whom.

Are the verbs structural or conversational? Structural verbs: create, assign, resolve, reopen, block, approve. Conversational verbs: reply, forward, archive. Structural verbs map to agent actions. Conversational verbs require interpretation.

Is the history queryable or just visible? Queryable means you can ask “what changed, when, and from what to what.” Visible means you can scroll through it. Agents need queryable.
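Four of those properties fit in a few lines of code. Here is a toy record that would pass the diagnostic, with illustrative state names: explicit ownership, legal transitions enforced rather than implied, a structural verb, and a history you can query rather than merely scroll:

```python
from datetime import datetime, timezone

# Legal transitions: this is what makes it a state machine, not labels.
TRANSITIONS = {
    "open": {"in_progress"},
    "in_progress": {"in_review", "open"},
    "in_review": {"done", "in_progress"},
    "done": set(),
}

class Issue:
    """A record: discrete, addressable, persistent, with queryable history."""
    def __init__(self, issue_id, assignee):
        self.id = issue_id
        self.state = "open"
        self.assignee = assignee   # ownership is an explicit field
        self.history = []          # what changed, when, from what to what

    def transition(self, new_state):   # a structural verb, not a reply
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal: {self.state} -> {new_state}")
        self.history.append((datetime.now(timezone.utc), self.state, new_state))
        self.state = new_state

issue = Issue("PROJ-42", assignee="agent-a")
issue.transition("in_progress")
issue.transition("in_review")
# issue.transition("open") would raise: in_review -> open is not legal
```

An agent reading this record never has to guess. The state is a field, the legal next moves are enumerable, and the history answers "what changed and when" as data.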

Email fails most of these. The verbs are too weak. State is implied, not encoded. Slack is similar — enormous amounts of context, but the structure is transcript structure. The state of work is usually inferred from conversation rather than encoded in fields. Agents can read Slack and extract useful signal, but Slack is not a clean control plane.

CRMs pass the test. Salesforce and HubSpot are issue trackers for revenue. A deal moves through stages with owners and history and permissions. That’s agent substrate. Service desks pass too — Zendesk, ServiceNow, Jira Service Management are issue trackers for customer problems, with tickets and SLAs and escalation paths already encoded. ERPs are the most extreme version: SAP, Oracle, Workday encode how money and people and inventory move through the business. Boring, yes. Agent-readable, absolutely.

Spreadsheets are the interesting edge case. They can be incredibly structured if the human designed them well, and completely opaque if they’re a personal scratch pad. The schema is user-defined and often implicit. An agent has to infer the schema before it can act, which is expensive and fragile. Spreadsheets are not the easy case.
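The inference step is worth making concrete, because it is the step a tracker never requires. A minimal sketch, assuming a header row and string cells, of the guessing an agent has to do before it can safely touch a spreadsheet:

```python
def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def infer_schema(rows):
    """Guess column types from a header row plus data rows.
    A tracker declares its schema; a spreadsheet makes you infer it."""
    header, *data = rows
    schema = {}
    for i, name in enumerate(header):
        values = [row[i] for row in data if row[i] != ""]
        schema[name] = "number" if values and all(map(_is_number, values)) else "text"
    return schema

sheet = [
    ["owner", "amount", "status"],
    ["dana", "1200", "open"],
    ["lee", "950", "won"],
]
# infer_schema(sheet) -> {"owner": "text", "amount": "number", "status": "text"}
```

And that is the well-designed case. A merged-cell scratch pad with notes in the margins defeats even this, which is why spreadsheets sit on the fragile edge of the diagnostic.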

The pattern generalizes: if a system was built to coordinate people asynchronously around important work, it probably has the bones of an agent substrate. The 30-year accumulation of human coordination infrastructure is not going to disappear because agents arrived. It’s going to become the surface agents consume.

This is relevant for anyone building production agent systems today. If you’re building something like a multi-agent engineering team with Claude Code and Paperclip, the coordination layer you choose matters as much as the models you use. The agents need somewhere to find work, claim it, update state, and hand off results. You can invent that layer from scratch, or you can use the one your company already trusts.

What This Means for Builders


Your data model is a strategic surface. If you’re building a product that agents will eventually use, the right move is not to bolt a chat interface onto the UI. Start by making the underlying state clean. Expose your records. Define your verbs. Make ownership explicit. Preserve history. Build permissions into the model. Make the important actions available through a real API or MCP server.

If your product is opaque, agents will scrape the UI or guess at intent. That’s fragile. If your product exposes clean state and clean verbs, agents can operate through it reliably. That’s the difference between “we added AI for the demo” and “we became part of the agent stack.”
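What "clean state and clean verbs" means in practice is roughly this shape — a hypothetical agent-facing surface, not any particular product's API, where every action is a named structural verb operating on a record and leaving an audit trail:

```python
# Illustrative agent-facing surface: explicit records and structural verbs,
# rather than a chat box bolted on top. All names here are hypothetical.
VERBS = {"assign", "resolve", "reopen", "block"}

class AgentAPI:
    def __init__(self, store):
        self.store = store  # the product's real records, not a scraped UI

    def call(self, verb, record_id, **args):
        if verb not in VERBS:
            raise ValueError(f"unknown verb: {verb}")  # no freeform intent-guessing
        record = self.store[record_id]
        record["history"].append((verb, args))         # every action is auditable
        if verb == "assign":
            record["assignee"] = args["to"]
        elif verb == "resolve":
            record["state"] = "resolved"
        elif verb == "reopen":
            record["state"] = "open"
        elif verb == "block":
            record["state"] = "blocked"
        return record
```

A conversational verb like "reply" simply doesn't exist here, so the agent can't express an ambiguous action. That constraint is the feature.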

For teams, the work tracking choice is now the agent infrastructure choice. The Jira versus Linear decision used to be about UX and workflow fit. Now there’s another question: which substrate do you want your agents to run on? If your work data is clean, your agents get a head start. If your work state is spread across Slack threads and half-filled tickets and mystery spreadsheets, agents will struggle in exactly the places you want them to help.

Messy operations used to be a human tax. People could compensate with meetings and memory and relationships. Agents are worse at those compensations. Agents need the business to be legible. Cleaning up workflows, consolidating systems, enforcing fields, keeping ownership current — that’s not just good hygiene. That’s AI readiness.

For anyone thinking about the spec-driven approach to building software, Remy takes a similar philosophy to the substrate question: you write your application as an annotated markdown spec, and it compiles into a complete TypeScript backend, database, frontend, auth, and deployment. The spec is the source of truth; the generated code is derived output. The underlying principle is the same — clean, explicit state and structure compound in value as automation increases.

The strategic lesson from issue trackers is that owning the substrate is better than sitting on top of someone else’s substrate. Atlassian, Salesforce, ServiceNow, Microsoft — these companies own systems of record. They may not have the most impressive demos. But they own the map agents are going to navigate. Wrappers can be valuable businesses, but they borrow their foundation from the incumbent systems underneath.

The three-layer memory architecture that Claude Code uses — with persistent memory files as a pointer index — is solving the same problem from the model side: how do you maintain durable state across runs when the context window resets? The issue tracker solves it from the infrastructure side. Both are necessary. Neither is sufficient alone. The hidden features exposed in the Claude Code source leak reinforce the same point: the most durable capabilities are the ones built around persistent, structured state rather than ephemeral context.

The Accident That Keeps Paying Off

Weissman wasn’t trying to build agent infrastructure. He was trying to make sure Mozilla bugs didn’t get lost. The primitives he established — durable state, state machine, ownership, audit history — were the minimum viable solution to a human coordination problem.

Those same primitives are now the minimum viable solution to an agent coordination problem. Not because we planned it that way. Because the constraints rhyme. Humans forget; agents lose context. Humans need handoffs; agents need handoffs. Humans need accountability; agents need observability.


The boring tools win. Not because they’re exciting, not because they were built for AI, but because they encoded coordination. And coordination turns out to be the hard part whether you’re building for humans or for agents.

The next time you look at a tool that feels like process overhead, run the five questions. Records or content? State machine or labels? Explicit ownership? Structural verbs? Queryable history? If it scores well, that tool is probably more important than it looks. If it scores poorly, someone is going to build the real substrate around it — and the difference between owning that substrate and sitting on top of it is going to matter a lot.

Weissman shipped Bugzilla in April 1998. He solved the problem he had. Twenty-seven years later, the solution still fits.

Presented by MindStudio
