
The 5-Question Test: Is Your Enterprise Software Ready to Be Agent Infrastructure?

Does your tool have records, a state machine, explicit ownership, structural verbs, and queryable history? Use this framework to evaluate your entire stack.

MindStudio Team

Why Most Enterprise Software Fails as Agent Infrastructure

Enterprise AI is failing in a very specific way right now, and it’s not the AI’s fault.

Companies are deploying multi-agent systems against their existing software stack, and the agents keep breaking — not because the models are bad, but because the software underneath wasn’t built to be operated by anything other than a human clicking buttons. The tools work fine for people. They fall apart under autonomous control.

The question isn’t whether your CRM, ticketing system, or ERP is “AI-compatible” in some marketing sense. The question is whether it can function as agent infrastructure — a substrate that autonomous software can read from, write to, reason about, and act on without constant human supervision.

This article gives you a five-question test to find out. Run every tool in your stack through it. The answers will tell you which systems can support multi-agent automation today and which ones will become expensive bottlenecks the moment you try.


What “Agent Infrastructure” Actually Means

An AI agent isn’t a user. It doesn’t browse, doesn’t interpret visual layouts, and doesn’t know what a button looks like. It reads structured data, calls functions, checks state, and makes decisions based on what it receives back.

When a human uses Salesforce, they scan a page, notice the deal stage, read the notes, and act. When an agent tries to do the same thing, it needs:

  • A structured record it can query programmatically
  • A clear state model it can check and update
  • Defined actions it can invoke without ambiguity
  • Ownership logic it can respect without asking a human
  • A history it can trace to understand what happened before

Software built for humans often satisfies some of these, accidentally. Software built for agents needs to satisfy all five, deliberately.

The difference between the two is what separates an enterprise tool that accelerates your AI program from one that quietly kills it.


The 5-Question Test

Question 1: Does It Have Records?

A record is a discrete, addressable unit of data. A deal in a CRM is a record. A ticket in a helpdesk is a record. An invoice in an ERP is a record.

The test isn’t just whether records exist — it’s whether they’re:

  • Uniquely identifiable — every record has a stable ID that doesn’t change
  • Structurally consistent — the same fields exist in the same format across all instances
  • Independently accessible — you can fetch a single record without loading the whole system
  • Writable in isolation — you can update one field on one record without touching others

A lot of enterprise software has records in the visual sense but not the operational sense. Wiki tools, for example, often store everything as nested pages with no stable identifier. Communication platforms store threads that are hard to reference without human context. Document systems store files that look like records but are actually unstructured blobs.

If an agent can’t reliably fetch record X, update field Y on it, and confirm the change — that system isn’t agent-ready, regardless of what the vendor claims.

What to look for: A documented API with per-record endpoints, consistent field schemas, and stable unique identifiers. If the API docs describe how to “search for” something but not how to “get” it by ID, that’s a warning sign.
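
To make the check concrete, here is a minimal TypeScript sketch of the per-record contract an agent needs. The base URL, endpoint paths, and DealRecord fields are hypothetical stand-ins for your system's real API; the pattern is what matters: get by stable ID, write one field in isolation, read back to confirm.

    // Minimal sketch of the per-record contract an agent needs.
    // The base URL, paths, and field names are hypothetical.
    interface DealRecord {
      id: string;      // stable unique identifier
      stage: string;
      ownerId: string;
      amount: number;
    }

    const BASE = "https://api.example-crm.com/v1";

    // Fetch a single record by ID, not by search.
    async function getDeal(id: string): Promise<DealRecord> {
      const res = await fetch(`${BASE}/deals/${id}`);
      if (!res.ok) throw new Error(`GET /deals/${id} failed: ${res.status}`);
      return res.json() as Promise<DealRecord>;
    }

    // Update one field on one record, then read back to confirm the change.
    async function setDealAmount(id: string, amount: number): Promise<DealRecord> {
      const res = await fetch(`${BASE}/deals/${id}`, {
        method: "PATCH",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ amount }),
      });
      if (!res.ok) throw new Error(`PATCH /deals/${id} failed: ${res.status}`);
      return getDeal(id); // confirm the write landed
    }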


Question 2: Does It Have a State Machine?

Enterprise processes have states. A deal is Prospecting → Qualified → Proposal → Negotiation → Closed. A ticket is Open → In Progress → Pending → Resolved → Closed. An order is Draft → Submitted → Approved → Fulfilled.

A state machine is a formal model of what states exist, what transitions are allowed, and what conditions trigger each transition. The key word is “formal” — it has to be explicit and enforced by the system, not implied in someone’s head.

For agents, this matters enormously. An agent navigating a multi-step process needs to know:

  • What state is this record currently in?
  • What transitions are legal from here?
  • What will happen — to this record and others — when a transition fires?

Without a real state machine, agents make bad guesses. They try to approve things that aren’t ready. They skip states that have downstream dependencies. They close tickets that have open child tasks. This produces errors that look like AI failures but are actually infrastructure failures.

Ask these questions about each tool in your stack:

  • Can I programmatically query the current state of any record?
  • Can I get a list of valid next states from any given state?
  • Does the system enforce state transitions, or just let you write any value to a status field?


The last point is critical. A “Status” field with values like “Open,” “In Progress,” and “Closed” that you can set to anything at any time is not a state machine. It’s a dropdown. Agents will abuse it in ways humans never would.
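
For contrast, here is a minimal TypeScript sketch of what a real state machine looks like: an explicit transition map plus the two queries an agent needs before acting. The states and transitions are illustrative, not a prescription for your process.

    type TicketState = "Open" | "InProgress" | "Pending" | "Resolved" | "Closed";

    // The formal model: which transitions are legal from each state.
    const TRANSITIONS: Record<TicketState, TicketState[]> = {
      Open:       ["InProgress"],
      InProgress: ["Pending", "Resolved"],
      Pending:    ["InProgress"],
      Resolved:   ["Closed", "InProgress"], // reopening is an explicit path
      Closed:     [],                       // terminal state
    };

    // "What transitions are legal from here?"
    function validNextStates(current: TicketState): TicketState[] {
      return TRANSITIONS[current];
    }

    // Enforce the model instead of letting any value into a status field.
    function transition(current: TicketState, next: TicketState): TicketState {
      if (!TRANSITIONS[current].includes(next)) {
        throw new Error(`Illegal transition: ${current} -> ${next}`);
      }
      return next; // a real system would persist this and fire downstream effects
    }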


Question 3: Does It Enforce Explicit Ownership?

Every record in an enterprise system belongs to someone or something. A deal has an owner. A ticket has an assignee. A document has an author. An approval has a gatekeeper.

Explicit ownership means the system knows who owns what, enforces that ownership in its logic, and exposes it programmatically. It’s not enough for ownership to be visible — it has to be structural.

This matters for agents because multi-agent systems divide work. One agent identifies a lead, another enriches it, a third qualifies it, a fourth writes the outreach. Each handoff has to be clean. The system has to know who’s “holding” a record at every moment, and it has to prevent two agents from working on the same record simultaneously.

Without explicit ownership, you get:

  • Race conditions — two agents update the same record at the same time, and the second write overwrites the first
  • Abandoned records — an agent “picks up” a record, fails mid-task, and leaves no clear indication of where it got to
  • Undefined handoffs — one agent completes its work, but there’s no mechanism to formally transfer responsibility to the next

The enterprise systems that handle this best treat ownership as a first-class concept with its own API surface: get owner, set owner, lock record, release record. The ones that handle it worst treat ownership as a display field with no enforcement logic at all.

What to look for: Locking APIs, assignment APIs, conflict detection on concurrent writes, and audit logs that show who (or what) last touched a record.
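
Here is a minimal sketch of that first-class surface, using an in-memory map as a stand-in for whatever your system actually persists. The version check on write is what catches the race condition described above; the explicit release is the formal handoff point.

    interface OwnedRecord {
      id: string;
      ownerId: string | null; // who currently holds the record
      version: number;        // bumped on every successful write
      data: Record<string, unknown>;
    }

    const store = new Map<string, OwnedRecord>();

    // Lock: no silent takeover of a record another agent holds.
    function lock(recordId: string, agentId: string): void {
      const rec = store.get(recordId);
      if (!rec) throw new Error(`No record ${recordId}`);
      if (rec.ownerId && rec.ownerId !== agentId) {
        throw new Error(`Record held by ${rec.ownerId}`);
      }
      rec.ownerId = agentId;
    }

    // Write only if the caller holds the record AND saw the latest version;
    // the version check rejects the second of two concurrent writes.
    function write(recordId: string, agentId: string, seenVersion: number,
                   patch: Record<string, unknown>): void {
      const rec = store.get(recordId);
      if (!rec) throw new Error(`No record ${recordId}`);
      if (rec.ownerId !== agentId) throw new Error("Caller does not own record");
      if (rec.version !== seenVersion) throw new Error("Stale write rejected");
      Object.assign(rec.data, patch);
      rec.version += 1;
    }

    // Release: the explicit handoff point between agents.
    function release(recordId: string, agentId: string): void {
      const rec = store.get(recordId);
      if (rec && rec.ownerId === agentId) rec.ownerId = null;
    }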


Question 4: Does It Expose Structural Verbs?

This is where most systems fail, even technically sophisticated ones.

A structural verb is an explicit, named action that the system exposes as a callable function. Examples:

  • approveInvoice(invoiceId, approverId)
  • escalateTicket(ticketId, reason, targetTeam)
  • closeOpportunity(opportunityId, outcome, closeDate)
  • mergeDuplicateContacts(primaryId, secondaryId)

Structural verbs matter because agents don’t understand intent — they understand instructions. If you tell an agent to “close a deal,” it needs a function called something like closeDeal() that does exactly that. It cannot infer from context that closing a deal means setting Status to “Closed Won,” filling in a Close Date, triggering a revenue forecast update, and creating a post-sale onboarding task.

A human knows all that from experience. An agent has to be told explicitly, or it will do something incomplete and wrong.

The problem is that most enterprise software exposes field-level APIs rather than action-level APIs. You can write status = "Closed Won", but there's no closeOpportunity() method. Every field update that should happen atomically has to be composed by hand, and if the agent misses one, the downstream systems break in ways that are very hard to debug.

Ask about every system in your stack:

  • Does it expose domain-specific actions as API endpoints, not just CRUD operations?
  • Are those actions transactional (all-or-nothing)?
  • Are there webhooks or events that fire when actions complete, so other agents can react?


Systems that only expose row-level read/write are harder to use for agent infrastructure. You can work around it — but you’ll spend a lot of time building the action layer yourself.
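
If you do build that action layer yourself, the shape usually looks like the sketch below: one named verb that composes the field-level writes a bare CRUD API leaves to the caller. The CrmClient interface, field names, and onboarding-task payload are hypothetical.

    interface CrmClient {
      update(recordId: string, fields: Record<string, unknown>): Promise<void>;
      create(type: string, fields: Record<string, unknown>): Promise<string>;
    }

    async function closeOpportunity(
      crm: CrmClient,
      opportunityId: string,
      outcome: "ClosedWon" | "ClosedLost",
      closeDate: string,
    ): Promise<void> {
      // Every step a human "just knows" has to be spelled out here.
      await crm.update(opportunityId, { stage: outcome, closeDate });
      if (outcome === "ClosedWon") {
        // The downstream effect a bare field write would silently skip.
        await crm.create("OnboardingTask", { opportunityId, dueInDays: 7 });
      }
      // Without a transactional API underneath, a failure between these
      // calls leaves partial state, which is exactly the hard-to-debug
      // breakage described above. Production code needs rollback or
      // compensation logic here.
    }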


Question 5: Does It Have Queryable History?

Agents are stateless by nature. Each time a task runs, the agent has to reconstruct context from what's available in the systems it touches. The richer that history, the better its decisions.

Queryable history means the system maintains a complete, structured log of what happened to each record — not just the current state, but the full sequence of changes, who made them, when, and why.

This is different from a general audit log. A queryable history is:

  • Accessible per record — you can ask “show me everything that happened to ticket #4521”
  • Structured — events have types, timestamps, actors, and payloads — not just a text description
  • Filterable — you can query by event type, date range, actor, or outcome
  • Causally linked — where possible, events reference what triggered them

Without this, agents are flying blind. They can see the current state but not how it got there. They’ll make decisions that would seem obviously wrong to anyone who knew the history — but the agent doesn’t know the history because it isn’t queryable.

This also matters for failure recovery. When an agent fails mid-task, a supervisor agent needs to figure out what was done, what wasn’t, and where to pick back up. Without queryable history, recovery means human intervention every time.

What to look for: A changelog or activity feed accessible via API, with structured event types and actor identification. Bonus points for systems that distinguish between human actions and API/automated actions — that lets agents know whether a change was made by a person (and should probably be left alone) or by another process (and may be safe to supersede).
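
As a reference point, here is a minimal sketch of the event shape and per-record query those four properties imply. The event types, actor categories, and in-memory store are illustrative.

    interface HistoryEvent {
      recordId: string;
      type: string;                  // e.g. "state_changed", "owner_assigned"
      timestamp: string;             // ISO 8601
      actor: { id: string; kind: "human" | "agent" | "automation" };
      payload: Record<string, unknown>;
      causedBy?: string;             // id of the event that triggered this one
    }

    const events: HistoryEvent[] = []; // stand-in for the system's event store

    // "Show me everything that happened to ticket #4521," with filters.
    function history(
      recordId: string,
      filter?: { type?: string; since?: string;
                 actorKind?: "human" | "agent" | "automation" },
    ): HistoryEvent[] {
      return events.filter(e =>
        e.recordId === recordId &&
        (!filter?.type || e.type === filter.type) &&
        (!filter?.since || e.timestamp >= filter.since) &&
        (!filter?.actorKind || e.actor.kind === filter.actorKind));
    }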


Scoring Your Stack

Run every major tool in your enterprise stack through these five questions. Score each one pass/fail per question. Here’s how to interpret the results:

Score    Readiness Level    What It Means
5/5      Agent-native       Can be integrated directly into multi-agent workflows
3–4/5    Agent-capable      Needs some wrapper logic but workable
1–2/5    Agent-hostile      Will create friction; consider alternatives or abstraction layers
0/5      Not viable         Should not be in the critical path of any agent workflow

Most enterprise stacks will have a mix. Your CRM might score 4/5. Your document management system might score 1/5. Your ERP might score 2/5 on a good day.

The goal isn’t to replace everything that scores low. It’s to:

  1. Know where you can build directly
  2. Know where you need to build abstraction layers
  3. Know where agent automation isn’t feasible yet without significant investment
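
If it helps to run the exercise systematically, here is a small sketch that maps the five questions onto the readiness levels in the table above. The field names are just the five questions restated.

    interface StackScore {
      tool: string;
      records: boolean;
      stateMachine: boolean;
      ownership: boolean;
      structuralVerbs: boolean;
      queryableHistory: boolean;
    }

    function readiness(s: StackScore): string {
      const score = [s.records, s.stateMachine, s.ownership,
                     s.structuralVerbs, s.queryableHistory].filter(Boolean).length;
      if (score === 5) return "Agent-native";
      if (score >= 3) return "Agent-capable";
      if (score >= 1) return "Agent-hostile";
      return "Not viable";
    }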

How Common Enterprise Tools Score

Here’s an honest look at how several categories of enterprise software typically perform against these five questions. This isn’t an exhaustive survey — your specific instance, configuration, and API tier all affect the real score.

CRM Systems


Modern CRM platforms like Salesforce and HubSpot generally score well on records (stable IDs, consistent schemas) and ownership (assignment APIs, locking in some tiers). They’re weaker on state machines — most surface “stage” as a field rather than a formally enforced sequence. Structural verbs vary widely by configuration: out-of-the-box you get CRUD, but Salesforce Flow and HubSpot Workflows can expose more action-oriented endpoints. History is usually strong — both have robust activity logging accessible via API.

Typical score: 3–4/5. Workable, but the state machine gap will cause problems in complex approval or pipeline workflows.

Ticketing and ITSM Platforms

ServiceNow, Jira Service Management, and similar tools were designed with workflow in mind, which helps. They typically have real state machines (transition guards, approval gates) and decent structural verb exposure. Records are strong. History is usually excellent. The weak spot is often ownership — especially in complex multi-team setups where “assigned to” means something different in different contexts.

Typical score: 3–4/5. Generally solid for agent integration, especially for IT automation use cases.

ERP Systems

Legacy ERP systems often score poorly. Records exist but may not be individually addressable via API. State machines are enforced but rarely exposed programmatically. Structural verbs may exist in a proprietary API format that predates REST. History is often fragmented across modules. Modern ERP systems (and cloud variants of legacy ones) are improving, but this is still the category most likely to require a custom integration layer.

Typical score: 1–3/5. Often requires significant abstraction work before agents can interact reliably.

Collaboration and Messaging Tools

Slack, Teams, and similar platforms are almost entirely agent-hostile for workflow purposes. Messages aren’t records in any meaningful sense. There’s no state machine. Ownership is implicit at best. Structural verbs barely exist. History is present but not structured in a way that supports process reasoning.

Typical score: 0–1/5. Useful for agent-to-human notifications and input collection. Should not be in the critical path of any autonomous workflow.

Project Management Tools

Asana, Linear, Monday.com, and similar tools vary widely. The newer generation (Linear especially) is closer to agent-native — stable IDs, state machines, decent APIs. Older or more flexible tools that let users define arbitrary fields and workflows tend to score lower because there’s no structural consistency to rely on.

Typical score: 2–4/5. Depends heavily on how the tool has been configured and which specific platform you’re using.


Where MindStudio Fits in This Picture

Once you’ve evaluated your stack and know which systems score well, the next challenge is connecting them — building the agent workflows that span multiple tools, each with different readiness levels.

This is exactly what MindStudio is built for. When you’re building multi-agent workflows in MindStudio, you’re not just connecting APIs — you’re defining the logic that compensates for the gaps your 5-question test reveals.

For systems with weak state machines, you build the state enforcement into the agent workflow itself. For systems that only expose field-level writes instead of structural verbs, you compose the multi-step atomic actions in the workflow logic. For systems with poor history, you build a side-channel logging layer that the agent maintains separately.


MindStudio’s 1,000+ pre-built integrations include most of the enterprise platforms mentioned above — Salesforce, HubSpot, Jira, ServiceNow, and more — with native connectors that handle auth, rate limiting, and retry logic. You can build an agent that reads from your CRM, checks ticket state in your ITSM, updates records in both, and hands off to a downstream agent — all in a visual workflow that non-engineers can read and modify.

The platform also supports custom JavaScript and Python functions, which matters here: when a system only exposes CRUD and you need action-level semantics, you can write that composition logic as a function inside the workflow rather than building a separate service.

For enterprises evaluating where to start, the practical approach is to pick your highest-scoring systems — the 4s and 5s — and build your first agent workflows there. Use MindStudio to connect them, prove out the pattern, and then extend to the harder systems with abstraction layers as you learn what works.

You can try MindStudio free at mindstudio.ai — no API keys or separate accounts required, and most agent builds take under an hour.


Building the Abstraction Layer for Low-Scoring Systems

You won’t always have the option to replace a system that scores 1 or 2. ERPs and legacy platforms are expensive to change. But you can build an abstraction layer that makes them behave like higher-scoring systems from the agent’s perspective.

The abstraction layer pattern works like this:

  1. Define a canonical schema — create a normalized data model for the records you need agents to work with, independent of what the underlying system uses
  2. Build state machine wrappers — write functions that enforce valid state transitions before writing to the system, even if the system itself won’t enforce them
  3. Create structural verb endpoints — write action functions (approveInvoice, escalateTicket) that compose the underlying field writes atomically, then expose them as callable tools to your agents
  4. Maintain a side-channel history — log every agent interaction with the system to a separate store that’s queryable with structured event types
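
Here is a minimal sketch of steps 2 through 4 combined, wrapping a hypothetical CRUD-only legacy client: a structural verb the underlying system never exposed, with state enforcement and side-channel logging built in.

    interface LegacyClient {
      readField(recordId: string, field: string): Promise<string>;
      writeField(recordId: string, field: string, value: string): Promise<void>;
    }

    // Step 2: the state machine the legacy system won't enforce.
    const INVOICE_TRANSITIONS: Record<string, string[]> = {
      Draft: ["Submitted"],
      Submitted: ["Approved", "Rejected"],
      Approved: [],
      Rejected: ["Draft"],
    };

    // Step 4: agent-side history, queryable even if the system's isn't.
    const sideChannelLog: Record<string, unknown>[] = [];

    // Step 3: a structural verb composed from raw field writes.
    async function approveInvoice(
      legacy: LegacyClient,
      invoiceId: string,
      approverId: string,
    ): Promise<void> {
      const current = await legacy.readField(invoiceId, "status");
      if (!INVOICE_TRANSITIONS[current]?.includes("Approved")) {
        throw new Error(`Cannot approve invoice in state ${current}`);
      }
      await legacy.writeField(invoiceId, "status", "Approved");
      await legacy.writeField(invoiceId, "approvedBy", approverId);
      sideChannelLog.push({
        recordId: invoiceId,
        type: "invoice_approved",
        actor: { id: approverId, kind: "agent" },
        timestamp: new Date().toISOString(),
      });
    }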

This is more work upfront, but it pays off quickly. Once you have the abstraction layer, you can swap the underlying system — upgrade from legacy ERP to a modern one, migrate CRM vendors — without rewriting any of your agent logic.

The abstraction layer also makes your agent workflows more resilient. If the underlying system is down or rate-limited, the layer can queue writes and replay them. If the system’s API changes, you update the layer, not every workflow that touches it.


What Changes When Your Stack Is Agent-Ready

Companies that have done this work — scored their stack, addressed the gaps, built abstraction layers where needed — describe a qualitative shift in how automation works.

The biggest change is failure mode. Legacy automation (rules-based, trigger-action) fails in obvious ways: a trigger fires, the action doesn’t complete, something breaks. Agent-based automation on solid infrastructure fails in subtler, more recoverable ways: an agent makes a suboptimal decision, a supervisor catches it, a correction runs. The difference between brittle and resilient automation often traces back to infrastructure quality.

The second change is scope. On weak infrastructure, agents are confined to simple, read-only tasks — summarizing, classifying, suggesting. On strong infrastructure, agents can act: create records, trigger transitions, assign ownership, escalate, close. The range of work that’s actually automatable expands significantly.

The third change is trust. Teams that see agents completing multi-step tasks accurately, handling edge cases, and recovering cleanly from errors start extending autonomy in places they wouldn’t have considered. That expansion of trust is how AI programs grow from narrow experiments to genuine operational capability.

None of that happens if the underlying systems don’t support it.


Frequently Asked Questions

What makes enterprise software “agent-ready”?

Agent-ready software has five properties: addressable records with stable IDs, a formal state machine that enforces valid transitions, explicit ownership that can be queried and updated programmatically, structural verbs (action-level API endpoints, not just field-level CRUD), and queryable history with structured event logs per record. Software that satisfies all five can be directly integrated into multi-agent workflows. Software that doesn’t will require abstraction layers or will limit what agents can reliably do.

Can existing enterprise tools work with AI agents, or do you need to replace them?

Most existing enterprise tools can work with AI agents, but the level of effort varies significantly based on how they score on the five criteria. High-scoring tools (4–5/5) can be integrated directly. Lower-scoring tools require abstraction layers that compensate for missing capabilities — typically custom action wrappers, side-channel history logging, and state enforcement logic. Full replacement is rarely necessary in the short term, though it may make sense as part of longer-term modernization.

What is a state machine, and why does it matter for AI agents?

A state machine is a formal model that defines what states a record can be in, what transitions between states are allowed, and what conditions must be true for a transition to fire. For AI agents, state machines matter because agents need to know what actions are legal at any given moment. Without a formal state machine, agents may attempt invalid transitions — trying to approve something that isn’t ready, or skipping stages that have downstream dependencies — producing errors that are hard to diagnose and fix.

How do race conditions affect multi-agent systems in enterprise software?

Race conditions occur when two agents attempt to modify the same record simultaneously, and the second write overwrites the first without any awareness of the conflict. In multi-agent enterprise systems, this is a significant risk when ownership isn’t enforced — two agents “pick up” the same task, work in parallel, and produce inconsistent state. Proper ownership and locking mechanisms prevent this by ensuring only one actor holds a record at a time, and that handoffs are explicit and logged.

What’s the difference between a CRUD API and structural verbs?

A CRUD (Create, Read, Update, Delete) API exposes raw field-level operations — you can read a record, update a field, create a new row. Structural verbs are domain-level action endpoints that encapsulate the full business logic of an operation: closeOpportunity(), approveInvoice(), escalateTicket(). A structural verb fires all the necessary field updates atomically, triggers downstream effects, and returns a structured result. Agents built against CRUD-only APIs have to compose these actions themselves, which means any missing step produces broken state that’s hard to detect.

How do you prioritize which systems to upgrade or wrap first?

Start with the systems that are on the critical path of your highest-value automation use cases. Score each one using the five-question test. For systems that score 3–4/5, direct integration is viable — start there. For systems that score 1–2/5 but can’t be replaced, build abstraction layers before building agent workflows against them. Avoid building production agent workflows directly against systems that score 0/5; the integration will be too brittle to trust at any meaningful scale.


Key Takeaways

  • Enterprise AI agents fail most often because of infrastructure gaps, not model quality.
  • The five-question test — records, state machine, explicit ownership, structural verbs, queryable history — gives you a concrete way to evaluate any tool in your stack.
  • Most enterprise software scores 2–4/5; very few tools are agent-native out of the box.
  • Low-scoring systems can be wrapped with abstraction layers rather than replaced.
  • Once infrastructure is solid, agent workflows become more reliable, more autonomous, and more trusted by the teams using them.

The work of building agent infrastructure is unglamorous. It involves API docs, schema reviews, and long conversations about what “ownership” means in your specific context. But it’s also the work that determines whether your AI investment produces real outcomes or expensive demos.

Start with the test. The rest follows from honest answers.

Presented by MindStudio