How to Understand the AI Enterprise Business Model Shift Before Your Competitors Do
Anthropic's inference margins jumped from 38% to 70% in one year. Here's what the subscription-to-deployment shift means for builders and buyers.
The Margin Story Nobody Is Telling About AI Enterprise Pricing
Anthropic’s inference margins jumped from 38% to 70% in roughly one year. That single number — reported by SemiAnalysis, a firm widely regarded as well-sourced on infrastructure economics — tells you more about where enterprise AI is heading than any press release about a new model.
If you are building on top of these models, or selling into enterprises that are, you need to understand what that margin expansion means. Not because it’s interesting trivia, but because it changes the math on every pricing conversation you will have in the next 18 months.
Here is the short version: the AI business model has shifted from selling seats to selling tokens, and the companies that internalize this shift early will price, build, and sell differently from those that don’t. The ones that don’t will wonder why their enterprise pilots never convert.
What “Seats to Tokens” Actually Means for Your Business
The old SaaS model was legible. You had a product. You had users. You charged per seat, maybe with some usage tiers on top. A CFO could look at a contract and understand it immediately: 500 seats at $40/month is $20,000/month. Done.
The token model is different in a way that sounds subtle but isn’t. In the seat model, there is a natural ceiling — the number of humans in the organization. In the token model, there is effectively no ceiling. A single engineer running an agentic coding workflow can consume thousands of dollars of tokens per month. The same engineer on a seat-based plan costs $20.
This is why Anthropic’s ARR reportedly went from $9 billion to over $44 billion in 2026 — nearly quintupling in a single year. Analyst Ming Li calculated that Anthropic was adding $96 million in ARR per day at that pace. AWS took 13 years to reach $35 billion in annual revenue. Salesforce took over 20 years to pass $20 billion. The old software valuation framework, as Ming put it, no longer fits.
The mechanism is not magic. It is the agentic shift. When AI moves from answering questions to completing tasks — writing code, processing documents, running workflows autonomously — token consumption per user goes from hundreds per day to potentially millions. The ceiling disappears.
For builders, this has a direct implication: if you are building an AI-powered product and you are still thinking in seats, you are leaving money on the table and probably also mispricing your cost structure.
What You Need to Understand Before You Price Anything
Before you can price correctly in a token-based world, you need to understand three things: your token consumption profile, your inference cost trajectory, and your customer’s value realization curve.
Token consumption profile means knowing how many tokens your application actually uses per unit of work. This is not the same as how many tokens you think it uses. Run your workflows, instrument them, and measure. The difference between a naive RAG implementation and a well-structured one can be 10x in token consumption for the same quality output. Atlassian’s CEO Mike Cannon-Brookes made exactly this point about their Rovo AI search tool: because Rovo can tap into Jira’s existing knowledge graph rather than doing token-hungry vector search, it uses far fewer tokens to get to the same answer. That efficiency is a competitive moat, not just a cost saving.
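To make the "instrument and measure" point concrete, here is a minimal sketch of per-step token metering. The class and method names are illustrative, not from any specific library; in practice you would feed in the token counts your model API returns.

```python
# Hypothetical sketch: accumulating token consumption per workflow step,
# so cost can be attributed to units of work rather than raw API calls.
from collections import defaultdict

class TokenMeter:
    """Accumulates prompt/completion tokens per named workflow step."""
    def __init__(self):
        self.usage = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, step: str, prompt_tokens: int, completion_tokens: int):
        self.usage[step]["prompt"] += prompt_tokens
        self.usage[step]["completion"] += completion_tokens

    def cost(self, price_per_1k_prompt: float, price_per_1k_completion: float) -> float:
        """Total dollar cost across all steps at the given per-1k-token rates."""
        return sum(
            u["prompt"] / 1000 * price_per_1k_prompt
            + u["completion"] / 1000 * price_per_1k_completion
            for u in self.usage.values()
        )

# Illustrative numbers for one run of a three-step workflow:
meter = TokenMeter()
meter.record("retrieve", prompt_tokens=4_000, completion_tokens=500)
meter.record("draft", prompt_tokens=2_000, completion_tokens=1_500)
meter.record("review", prompt_tokens=3_000, completion_tokens=400)
total = meter.cost(price_per_1k_prompt=0.003, price_per_1k_completion=0.015)
```

The payoff of this granularity is exactly the Rovo lesson: once you can see that "retrieve" dominates your token bill, a structured lookup that replaces brute-force retrieval shows up directly in the numbers.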
Inference cost trajectory means understanding that the cost of running these models is falling fast, and that the margin expansion Anthropic is experiencing (38% to 70% in one year) is partly a function of that. If you lock in pricing assumptions based on today’s inference costs, you may be underpricing in ways that hurt you, or overpricing in ways that lose you deals. Build in a model for how your unit economics change as inference gets cheaper.
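A back-of-envelope model of that trajectory can be sketched in a few lines. The price, cost, and decline-rate figures below are made-up illustrations (not Anthropic's actuals), chosen only to show how a fixed price with falling inference cost produces exactly this kind of margin expansion.

```python
# Hedged sketch: projecting gross margin as inference cost declines at a
# constant annual rate. All numbers are illustrative assumptions.
def margin_over_time(price_per_task, cost_today, annual_cost_decline, years):
    """Gross margin per task for each year, as inference cost decays."""
    margins = []
    cost = cost_today
    for _ in range(years + 1):
        margins.append((price_per_task - cost) / price_per_task)
        cost *= (1 - annual_cost_decline)
    return margins

# $0.50 price per task, $0.31 inference cost today, costs falling ~40%/year:
m = margin_over_time(price_per_task=0.50, cost_today=0.31,
                     annual_cost_decline=0.40, years=2)
# m[0] is a 38% margin today; by year two the same price yields ~78%.
```

The point is not the specific numbers but the shape: if you price against today's cost and hold price constant, margin expands automatically as inference gets cheaper, and your pricing model should anticipate that.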
Value realization curve is the hardest one. Enterprises don’t pay for tokens. They pay for outcomes. Your job is to connect the token consumption to a business result the customer cares about — deals closed, hours saved, errors reduced — and price against that outcome, not against the underlying compute. The companies that figure this out will have pricing power. The ones that just pass through API costs will be commoditized.
If you are building agents or workflows that chain multiple models together, platforms like MindStudio handle the orchestration layer — 200+ models, 1,000+ integrations, and a visual builder — which means you can focus on the value layer rather than the plumbing. The token consumption still happens; you just have more control over where it goes.
The Forward-Deployed Engineer Model and Why It Changes Enterprise Sales
There is a structural reason why AI deployment has been slower than the technology’s capability would suggest. The people who understand how to make these models work don’t understand the customer’s business. The people who understand the customer’s business don’t know how to make the models work. You need both, and right now there aren’t many people who have both.
Palantir figured this out years ago with what they called the Forward Deployed Engineer model. Instead of building a product, handing it to sales, and letting the customer figure out installation, Palantir embedded their best engineers directly into customer organizations. These weren’t account managers writing documentation. They were shipping real code, building the harness around the model, making the thing actually work in the specific weird environment of that specific customer.
Palantir’s stock traded around $19 after its 2020 direct listing, dropped to $6 in 2022, and then returned 640% over five years. The FDE model is a significant part of why.
Now both Anthropic and OpenAI are explicitly copying this playbook. Anthropic’s new joint venture — valued at $1.5 billion, with a $300 million founding commitment from Anthropic, Blackstone, and Hellman & Friedman, and backed by Apollo Global Management, General Atlantic, GIC, Leonard Green, and Suko Capital — is essentially a formalized FDE machine targeting financial services. OpenAI is raising $4 billion from 19 investors at a $10 billion valuation for a similar vehicle, apparently with zero investor overlap with Anthropic’s group.
What this means for you as a builder: the deployment gap is real, and it is the primary constraint on AI adoption right now, not model capability. If you can close that gap — if you can be the person who understands both the model and the customer’s business — you have pricing power that most software companies never see.
The practical implication is that your go-to-market motion for enterprise AI probably needs to look more like professional services than traditional SaaS, at least initially. You get embedded, you make it work, and then the system becomes sticky because the customer depends on you for updates and maintenance. The token consumption that follows is recurring, high-margin, and hard to rip out.
The Real Failure Modes in Token-Based Enterprise Deals
Most enterprise AI pilots fail for one of three reasons, and none of them are model quality.
Failure mode one: pricing the pilot wrong. Enterprises expect pilots to be cheap or free. Token-based pilots can get expensive fast if you don’t cap them. Set hard token budgets for pilots, instrument everything, and make sure the customer sees the value-per-token story before the invoice arrives. If they see a big number without context, they will kill the project regardless of the results.
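A hard token budget for a pilot can be as simple as a counter that refuses calls past the cap. This is an illustrative sketch — the class names and cap are hypothetical — but the principle is the one above: the cap is enforced in code, not in a spreadsheet after the invoice arrives.

```python
# Hedged sketch: a hard token cap for a pilot, so spend cannot silently
# run past what was agreed. Names and numbers are illustrative.
class PilotBudgetExceeded(Exception):
    pass

class PilotBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int):
        """Record usage; refuse any call that would exceed the pilot cap."""
        if self.used + tokens > self.max_tokens:
            raise PilotBudgetExceeded(
                f"pilot cap of {self.max_tokens} tokens reached ({self.used} used)"
            )
        self.used += tokens

budget = PilotBudget(max_tokens=10_000)
budget.charge(6_000)      # fine
budget.charge(3_500)      # fine — 9,500 used
blocked = False
try:
    budget.charge(1_000)  # would exceed the cap; the call is refused
except PilotBudgetExceeded:
    blocked = True
```

In a real pilot you would wire this into the same instrumentation that produces your value-per-token story, so the customer sees the cap and the outcome side by side.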
Failure mode two: building the wrong harness. The model is not the product. The harness — the scaffolding, the tools, the databases, the structure around the model — is the product. A lot of failed enterprise AI implementations failed because someone plugged a model into a workflow without building the harness that makes it reliable. This is the Minecraft Voyager lesson: the agent that worked wasn’t just a model, it was a model plus a structured environment plus a skill library plus multiple specialized instances. Enterprise deployments need the same architecture thinking.

For teams building toward production, tools like Remy take a spec-first approach — you write annotated markdown describing what the application should do, and it compiles a complete TypeScript backend, database, auth, and deployment from that spec. The spec is the source of truth; the generated code is derived output. That kind of structured precision matters when you’re handing something to an enterprise customer who will ask hard questions about what the system actually does.
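The harness idea can be sketched in miniature. This is a toy illustration, not any real framework's API: the model call is one small component, and the structure around it — supplied tools, output validation, retries — is what makes the result reliable.

```python
# Illustrative sketch of "harness" thinking: the model call is one piece;
# tools, validation, and retry logic around it are the actual product.
# All names here are hypothetical.
def run_with_harness(model_call, tools, validate, max_retries=2):
    """Run a model step inside a harness: supply tools, validate, retry."""
    for attempt in range(max_retries + 1):
        output = model_call(tools)   # the model is just one component
        if validate(output):         # structured check, not vibes
            return output
    raise RuntimeError("output failed validation after retries")

# Toy usage: a 'model' that consults a tool, validated for non-emptiness.
tools = {"lookup": lambda key: {"ticket": "JIRA-123"}.get(key, "")}
result = run_with_harness(
    model_call=lambda t: t["lookup"]("ticket"),
    tools=tools,
    validate=lambda out: bool(out),
)
```

A production harness adds much more — structured environments, skill libraries, specialized instances, as in the Voyager example — but the shape is the same: the model sits inside scaffolding that constrains and verifies it.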
Failure mode three: not connecting to a measurable outcome. Enterprises have been burned by technology vendors promising transformation and delivering dashboards. If you cannot point to a specific, measurable business outcome within 90 days of deployment, the pilot will not convert. Pick one metric. Make it real. Make it visible.
The companies that are winning enterprise AI deals right now are not winning on model benchmarks. They are winning on deployment speed, outcome clarity, and the ability to make the thing actually work in the customer’s environment.
The CapEx Signal You Should Be Watching
There is a macro signal that most builders are ignoring because it feels abstract. Morgan Stanley raised its hyperscaler CapEx forecast to $805 billion for 2026 and $1.1 trillion for 2027. The Mag 7 companies spent over $400 billion in CapEx in Q1 2026 alone, with a reported and projected backlog of around $1.3 trillion.
The backlog number is the important one. Demand for compute capacity is running more than three times the current annual spend. That is not a bubble signal. That is a supply constraint signal. The infrastructure is being built as fast as it can be built, and it still isn’t fast enough to meet demand.
For builders, this means two things. First, token costs will continue to fall as supply catches up, which improves your unit economics over time. Second, in the near term, token efficiency matters more than it will in two years. Atlassian’s Rovo story is instructive here: customers using Rovo were growing their own ARR at twice the pace of those who weren’t, and a significant part of that advantage was token efficiency through structured knowledge graphs rather than brute-force retrieval. In a constrained supply environment, the builder who uses tokens more efficiently has a real advantage.
The Anthropic compute shortage is a real constraint right now, and it affects how you should think about model selection and fallback strategies in production systems. Understanding which models to use for which tasks — and when to route to alternatives — is increasingly a core engineering skill, not an afterthought.
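Routing and fallback can start as something very simple: an ordered preference list per task type, consulted against whatever is currently available. The model names and the availability set below are placeholders, not real APIs or real status checks.

```python
# Illustrative routing sketch: pick a model per task tier and fall back
# when the primary is unavailable (e.g. during a compute shortage).
# Model names and the availability set are placeholders.
ROUTES = {
    "complex_reasoning": ["primary-large-model", "fallback-large-model"],
    "bulk_extraction":   ["small-cheap-model", "fallback-small-model"],
}

def route(task_type: str, available: set) -> str:
    """Return the first available model for the task type, in preference order."""
    for model in ROUTES[task_type]:
        if model in available:
            return model
    raise RuntimeError(f"no model available for {task_type}")

# The primary large model is down; the router falls back for reasoning tasks:
up = {"fallback-large-model", "small-cheap-model"}
chosen = route("complex_reasoning", up)
```

In production the availability set would come from health checks or rate-limit signals, and the routing table would encode cost as well as capability — but even this skeleton makes model selection an explicit, testable engineering decision rather than a hardcoded string.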
What to Do With This Information This Week
The shift from seat-based to token-based pricing is not coming. It is here. Anthropic’s margin expansion from 38% to 70% in one year is evidence that the economics of this model are working, and working fast. The companies building the deployment infrastructure — the joint ventures, the FDE machines, the enterprise scaffolding — are betting that the constraint is not model quality but deployment capability.
If you are building AI products for enterprise customers, three things are worth doing immediately.
Instrument your token consumption at the workflow level, not just at the API level. You need to know where tokens are going before you can price or optimize intelligently. Most teams don’t have this visibility and are flying blind on their cost structure.
Build a value-per-token story for your most important use cases. What business outcome does each major workflow produce? What does it cost in tokens? What is the ratio? If you can answer this, you can have a pricing conversation with a CFO. If you can’t, you are selling on vibes.
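The ratio itself is trivial arithmetic, which is exactly why it belongs in the pricing conversation. The dollar figure and token count below are made-up illustrations:

```python
# Hedged sketch of a value-per-token calculation. All numbers are
# illustrative assumptions, not measured figures.
def value_per_token(outcome_value_usd: float, tokens_consumed: int) -> float:
    """Dollars of business outcome produced per 1M tokens consumed."""
    return outcome_value_usd / tokens_consumed * 1_000_000

# A workflow that saves ~$2,000 of analyst time per run on ~4M tokens
# produces $500 of outcome per million tokens:
ratio = value_per_token(outcome_value_usd=2_000, tokens_consumed=4_000_000)
```

If that $500-per-million-tokens figure comfortably exceeds what the tokens cost you, you have a margin story a CFO can follow; if you cannot compute it at all, you are selling on vibes.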
Look at your deployment motion and ask honestly whether it is set up to close the deployment gap. If your answer to “how does the customer get this working in their environment” is “they figure it out,” you are leaving deals on the table. The FDE model exists because that answer doesn’t work for high-stakes enterprise customers.
The Claude Code source code leak revealed a lot about how Anthropic thinks about agentic deployment architecture — worth reading if you are building harnesses for enterprise workflows. And if you are thinking about which models to put inside those harnesses, the GPT-5.4 vs Claude Opus 4.6 comparison and the Qwen 3.6 Plus review are both useful for understanding the current capability landscape across providers.
The margin story is the tell. When inference margins go from 38% to 70% in a year, it means the unit economics of AI deployment are working. The question is whether your business model is positioned to capture that value, or whether you are still thinking in seats while the market moves to tokens.