GitHub Copilot Is Moving to Usage-Based Billing — And Satya Nadella Says Every Microsoft Product Will Follow
GitHub's CPO called flat-rate AI pricing 'no longer sustainable.' Satya Nadella confirmed on earnings: every per-user business becomes per-user-and-usage.
GitHub Copilot Just Ended the AI Flat-Rate Era
GitHub CPO Mario Rodriguez said the quiet part out loud this quarter: “A quick chat question in a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable.” Then Satya Nadella confirmed on the Microsoft earnings call that this isn’t a Copilot-specific fix — it’s a company-wide direction: “Any per user business of ours, whether it’s productivity or coding or security, will become a per user and usage business.”
That’s a significant statement. Microsoft has tens of millions of seats across GitHub, M365, and Azure. Every one of those products is now on a path toward consumption pricing.
If you’re building on top of AI APIs, deploying internal tools, or advising a company on its AI stack, this shift changes your cost model in ways that aren’t obvious from the headline.
Why Flat-Rate AI Pricing Was Always a Bet, Not a Business Model
The flat-rate model made sense as a land-grab. You charge $19/month, people try the product, some use it heavily, most don’t, and the heavy users subsidize the light ones while the lab builds market share. Classic SaaS cross-subsidy.
The problem is that AI inference doesn’t behave like cloud storage or CRM seats. Storage costs fall predictably. Inference costs are tied to model size, context length, and the number of tokens generated — and all three of those have been moving in the wrong direction for a flat-rate provider.
When Copilot launched, a “premium request” meant a short autocomplete or a quick inline suggestion. Now it means a multi-hour autonomous coding session where the model is reading your entire codebase, writing tests, running them, fixing failures, and iterating. The token count on that session is orders of magnitude higher than a chat question. Charging the same $19/month for both is not a pricing model — it’s a subsidy program.
GitHub absorbed that subsidy for a while because it was buying adoption. That phase is over.
The Demand Signal That Made This Inevitable
The pricing change didn’t happen in isolation. It happened against a backdrop of AI infrastructure demand that is genuinely unprecedented.
Google Cloud grew 63% year-over-year in Q1 2026, and the stock posted the second-biggest single-day market-cap jump in Google’s history on the report. Analyst Joseph Carlson looked at the backlog chart and wrote: “This is so crazy, it literally looks fake.” Azure grew 40% year-over-year. AWS grew 28%, its best performance since climbing out of a trough in 2021. On Trainium demand, Andy Jassy said: “We have such demand right now from various companies who will consume as much as we make. I expect over time there’s a good chance we’re going to sell racks.”
OpenAI CFO Sarah Friar described token demand as “a vertical wall of demand with compute being the bottleneck.”
Dylan Patel from SemiAnalysis, speaking on Patrick O’Shaughnessy’s podcast, put it plainly: even tier-two or tier-three labs are going to be sold out of tokens. The analysis of who’s in first or second place on model benchmarks is almost irrelevant in a world where every token that can be produced will be sold.
This is the context in which GitHub’s pricing decision makes sense. It’s not that GitHub suddenly got greedy. It’s that the underlying cost of serving an agentic coding session has risen sharply, and the flat-rate model was never designed to absorb that.
Even consumer hardware is feeling it. Mac minis are sold out for at least several months — Tim Cook addressed it on Apple’s earnings call. The compute constraint isn’t abstract. It’s showing up in every layer of the stack, from data center racks to the device sitting on your desk.
What “Per User and Usage” Actually Means for Your Budget
The shift Nadella described — from per-user to per-user-and-usage — sounds simple. In practice it means your AI spend becomes variable in a way that SaaS spend historically hasn’t been.
With a seat license, your finance team can model costs precisely. 200 engineers × $19/month = $3,800/month. Done. With usage-based billing, that same team of 200 engineers might generate wildly different costs depending on how aggressively they’re using agentic features. An engineer running Copilot for a quick autocomplete is a different cost center than one running a multi-hour autonomous refactor session.
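To make the contrast concrete, here is a minimal sketch of the two cost models. All prices and token counts are illustrative assumptions, not any vendor’s actual rates:

```python
# Hypothetical comparison of flat-rate vs. usage-based monthly cost
# for a 200-engineer team. Rates and usage figures are made up.

SEAT_PRICE = 19.00          # flat-rate $/user/month
PRICE_PER_1K_TOKENS = 0.01  # assumed blended usage rate

def flat_rate_cost(seats: int) -> float:
    # Seat pricing: cost is a pure function of headcount.
    return seats * SEAT_PRICE

def usage_cost(monthly_tokens_per_user: list[int]) -> float:
    # Usage pricing: cost is a function of what people actually do.
    total_tokens = sum(monthly_tokens_per_user)
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# 180 light users (autocomplete-style) and 20 heavy agentic users
light = [200_000] * 180      # ~200k tokens/month each
heavy = [20_000_000] * 20    # ~20M tokens/month each

print(flat_rate_cost(200))        # 3800.0 -- predictable
print(usage_cost(light + heavy))  # dominated by the 20 heavy users
```

The point of the sketch: under the usage model, a tenth of the team can drive the majority of the bill, which is exactly the forecasting problem finance teams haven’t had to solve for SaaS seats.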
This creates a new class of problem for engineering managers and finance teams: AI cost governance. You need to understand which workflows are generating the most token consumption, whether that consumption is producing proportional value, and how to set guardrails without killing productivity.
The answer isn’t to restrict usage — that defeats the purpose. The answer is to get smarter about which model tier you’re using for which task. Rodriguez’s framing implies exactly this: not all requests need premium inference. A quick chat question doesn’t need the same model as a multi-hour coding session. The companies that figure out how to route intelligently — premium models for high-stakes tasks, cheaper models for routine ones — will have a structural cost advantage over companies that just let everything hit the most expensive endpoint.
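A routing layer like the one described above can be sketched in a few lines. The tier names, prices, and the task-kind heuristic below are assumptions for illustration; a production router might also weigh context length, accuracy requirements, or latency budget:

```python
# Illustrative model router: low-stakes tasks go to a cheap tier,
# high-stakes tasks to a premium tier. Tier names and per-token
# prices are hypothetical, not any vendor's catalog.

from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float

CHEAP = ModelTier("small-fast", 0.0005)
PREMIUM = ModelTier("large-reasoning", 0.015)

# Simple heuristic: route on the kind of task being performed.
HIGH_STAKES = {"autonomous_refactor", "security_review", "test_generation"}

def route(task_kind: str) -> ModelTier:
    return PREMIUM if task_kind in HIGH_STAKES else CHEAP

print(route("quick_chat").name)           # small-fast
print(route("autonomous_refactor").name)  # large-reasoning
```

Even a heuristic this crude captures the structural advantage: the default path is the cheap one, and premium inference has to be earned by the task.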
This is where orchestration tooling becomes genuinely important rather than just nice-to-have. Platforms like MindStudio handle this kind of routing across 200+ models and 1,000+ integrations, letting you build workflows that direct tasks to the right model tier rather than defaulting everything to the most expensive option. That’s not a theoretical optimization — in a usage-based world, it’s the difference between a predictable AI budget and an unpleasant surprise at month-end.
The Broader Cascade: Every Per-User Product Is on This Path
Nadella’s statement deserves to be read carefully. He didn’t say “GitHub Copilot is moving to usage-based billing.” He said any per-user business will become a per-user-and-usage business. That’s M365 Copilot. That’s Security Copilot. That’s Dynamics. That’s the entire Microsoft commercial stack.
The same logic applies beyond Microsoft. Anthropic has been doing everything it can to avoid a purely usage-based model — its decisions around how third-party products like OpenClaw use its APIs reflect that tension. But the economics are the same for every lab. When demand exceeds supply and inference costs are rising, flat-rate pricing is a bet against your own unit economics.
For builders, this means the era of “just use the API, it’s cheap enough” is ending. Token-based pricing for AI models has always been the underlying reality — what’s changing is that the flat-rate wrappers that obscured that reality are being peeled back. You’re going to be closer to the raw economics whether you want to be or not.
The companies that will navigate this best are the ones that have already built model-agnostic infrastructure. If your application is tightly coupled to a single model at a single price point, a pricing change is a crisis. If you’ve built with abstraction layers that let you swap models based on cost and capability, a pricing change is just a configuration update.
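What an abstraction layer like that looks like in miniature: one interface, interchangeable providers, and a registry so swapping models is a configuration change rather than a rewrite. The provider names and call shapes here are hypothetical:

```python
# Sketch of a thin model-agnostic layer. When pricing shifts, you
# change the registry entry, not the call sites. Providers are fakes.

from typing import Protocol

class Completion(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderA:
    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"

class ProviderB:
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"

# The registry is the single point of coupling to specific models.
REGISTRY: dict[str, Completion] = {
    "default": ProviderA(),
    "cheap": ProviderB(),
}

def complete(prompt: str, tier: str = "default") -> str:
    return REGISTRY[tier].complete(prompt)

print(complete("summarize this diff"))
print(complete("summarize this diff", tier="cheap"))
```

Application code calls `complete()` and never names a vendor; when a provider reprices, the crisis shrinks to editing one dictionary.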
The Agentic Multiplier
There’s a specific reason this is happening now rather than two years ago: agents.
When AI was primarily a chat interface or an autocomplete tool, token consumption per user was bounded. A conversation has natural limits. Autocomplete suggestions are short. The variance between a light user and a heavy user was maybe 5-10x.
Agentic workflows break that bound entirely. An agent running a multi-hour coding session, reading a large codebase, writing and running tests, and iterating on failures can consume tokens at a rate that’s 100x or 1000x a simple chat interaction. The variance between a user who runs one agent session per day and one who runs ten is enormous.
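The back-of-envelope math behind that multiplier, with token counts that are illustrative assumptions rather than measured figures:

```python
# Rough token arithmetic for the chat-vs-agent gap. All counts are
# illustrative assumptions, not measurements.

CHAT_TOKENS = 2_000               # short Q&A: prompt plus answer
AGENT_SESSION_TOKENS = 2_000_000  # multi-hour agent: repo reads, test loops

multiplier = AGENT_SESSION_TOKENS / CHAT_TOKENS
print(multiplier)  # 1000.0 -- per-interaction gap

# User-to-user variance stacks on top: one agent session per workday
# vs. ten, over ~22 workdays in a month.
light_user = 1 * AGENT_SESSION_TOKENS * 22
heavy_user = 10 * AGENT_SESSION_TOKENS * 22
print(heavy_user / light_user)  # 10.0 -- on top of the 1000x base
```

Multiply the two gaps together and the spread between your lightest and heaviest seat is four orders of magnitude, which is why a single seat price can no longer average it out.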
This is what Rodriguez meant when he said the current model isn’t sustainable. It’s not that Copilot users are using it more — it’s that the nature of usage has changed. The product that GitHub sold a flat-rate license for in 2023 is not the same product it’s selling today. The 2023 product was an autocomplete assistant. The 2025 product is an autonomous coding agent. Pricing the latter like the former was always going to end.
The same dynamic is playing out across every AI product that has added agentic capabilities. AI agents for product managers, autonomous research tools, document processing pipelines — anything that runs multi-step workflows is generating token consumption that flat-rate pricing can’t absorb indefinitely.
What You Should Actually Do This Week
The practical response to this shift isn’t complicated, but it does require some deliberate work.
First, audit your current AI spend by workflow type. Separate the tasks where you’re using premium models because you need premium output from the tasks where you’re using premium models because that’s the default. The latter category is where you have immediate cost optimization opportunity.
Second, build or adopt routing logic before you need it. In a usage-based world, having a system that automatically directs low-stakes tasks to cheaper models isn’t premature optimization — it’s basic cost hygiene. If you’re building production applications that chain multiple AI calls, the routing decision should be explicit in your architecture, not an afterthought.
Third, if you’re building full-stack applications that incorporate AI workflows, think carefully about where the AI cost lives in your architecture. Tools like Remy take a spec-driven approach — you write annotated markdown describing your application, and it compiles into a complete TypeScript backend, database, auth, and deployment. When your AI cost model changes, you update the spec and recompile rather than hunting through generated code. That kind of abstraction becomes more valuable, not less, as the underlying pricing landscape shifts.
Fourth, watch how Anthropic handles this. They’re the most visible holdout on pure usage-based pricing, and the decisions they make about third-party access and model tiers will signal whether there’s a viable alternative to the direction Microsoft just announced. The Claude Mythos situation — where compute constraints are already shaping access decisions — suggests that even Anthropic’s resistance to usage-based pricing has limits imposed by physical reality.
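The first step above, auditing spend by workflow type, can be as simple as tagging each request and aggregating. The usage records and per-token rates below are invented for illustration:

```python
# Minimal spend audit by workflow type. Records and rates are made up;
# in practice these would come from your provider's usage export.

from collections import defaultdict

usage_log = [
    {"workflow": "autocomplete",   "tokens": 50_000,    "tier": "premium"},
    {"workflow": "chat",           "tokens": 120_000,   "tier": "premium"},
    {"workflow": "agent_refactor", "tokens": 4_000_000, "tier": "premium"},
]

RATE = {"premium": 0.015, "cheap": 0.0005}  # $/1k tokens, assumed

spend: dict[str, float] = defaultdict(float)
for rec in usage_log:
    spend[rec["workflow"]] += rec["tokens"] / 1000 * RATE[rec["tier"]]

# Rank workflows by cost; anything routine sitting on premium is the
# immediate optimization target from step one.
for wf, dollars in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{wf}: ${dollars:.2f}")
```

Note that everything in this log runs on the premium tier by default; rerunning the audit after moving `autocomplete` and `chat` to a cheap tier is how you measure what the routing from step two actually saves.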
The Subsidy Era Is Over
The flat-rate AI subscription was a product of a specific moment: labs needed adoption, compute costs were falling, and the use cases were bounded enough that cross-subsidization worked. All three of those conditions have changed.
Adoption is no longer the primary goal — monetization is. Compute costs are rising relative to demand. And the use cases have expanded into agentic territory where token consumption is effectively unbounded.
Rodriguez and Nadella aren’t announcing a pricing change. They’re announcing the end of a phase. The question for builders and operators isn’t whether to accept this — it’s how quickly you can build the cost discipline and routing infrastructure to operate profitably in the new model.
The companies that treat this as a crisis will spend the next year scrambling to cut AI usage. The companies that treat it as a design constraint will spend the next year building systems that are smarter about how they use AI — and they’ll come out ahead on both cost and capability.
That’s the actual opportunity in this announcement, if you’re willing to see it that way.