2026 AI Lab Power Rankings: 9-Category Scorecard Puts Google and OpenAI in a Dead Heat, With One Big Surprise
Google and OpenAI tie at 74/100 on a 9-category framework. Anthropic leads enterprise at 14/15. Google scores only 3/10 on momentum. Full breakdown inside.
Google and OpenAI Tie at 74/100 — But the Scorecard Tells a More Complicated Story
A 9-category scoring framework just put Google and OpenAI in a dead heat at 74 points each, with Anthropic close behind at 70 and Amazon trailing at 64. That headline number is interesting. What’s underneath it is more interesting.
The framework, built by the AI Daily Brief's Nathaniel Whittemore, weights nine categories: compute and infrastructure (20 points), enterprise positioning (15), platform and ecosystem control (15), consumer positioning (10), model leverage (10), momentum (10), branded narrative (10), wedge (5), and X-factor (5). The enterprise scores alone are worth a long look: Anthropic 14/15, Microsoft 14/15, OpenAI 10/15, Google 8/15. If you're an enterprise buyer or an engineer building on top of these platforms, those numbers should recalibrate how you're thinking about the field.
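For readers who want to poke at the arithmetic, here is a minimal sketch of the framework as a data structure. The weights are the ones listed above; the shortened category identifiers are mine, and the partial scorecard in the usage line uses only the Google numbers quoted in this article:

```typescript
// Category weights from the framework; shortened identifiers are mine.
const WEIGHTS = {
  compute: 20,
  enterprise: 15,
  platform: 15,
  consumer: 10,
  models: 10,
  momentum: 10,
  narrative: 10,
  wedge: 5,
  xFactor: 5,
} as const;

type Category = keyof typeof WEIGHTS;

// Each category score is already denominated in that category's weight
// (e.g. compute is out of 20), so an overall score is a plain sum.
type Scorecard = Partial<Record<Category, number>>;

function overall(scores: Scorecard): number {
  return Object.values(scores).reduce((sum, s) => sum + (s ?? 0), 0);
}

// Sanity check: the nine weights total 100 points.
console.log(Object.values(WEIGHTS).reduce((a: number, b) => a + b, 0)); // 100

// Partial scorecard from the scores quoted in this article:
// 28 of Google's 74 points come from these three categories alone.
console.log(overall({ compute: 17, enterprise: 8, momentum: 3 })); // 28
```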
The AI consensus ranking, for comparison, had Google at 91.4, OpenAI at 85.4, Microsoft at 84.9, Anthropic at 83.1, and Amazon at 80.4 — all five above 80. The human scorer was considerably harsher. Only three labs broke 70. That gap between AI assessment and human assessment is itself a data point worth sitting with.
The Scorecard in Full
The methodology starts with compute and infrastructure as the heaviest single category at 20 points. That weighting reflects a specific moment in the industry: we are in a period of genuine token scarcity. Dylan Patel of SemiAnalysis made the point recently on the Invest Like the Best podcast that it doesn’t really matter who the leading lab is right now — “even tier two or tier three labs are going to be sold out of tokens.” The economic value that capable models can deliver is growing faster than the infrastructure to serve it.
Given that framing, the compute scores carry real weight. Google gets 17/20. OpenAI gets 12/20. Anthropic gets 10/20.
The gap between Google and OpenAI here is intentional. OpenAI has been aggressive about securing compute deals over the past year, but being dependent on financing-backed partnerships with third parties is structurally different from owning a significant portion of your infrastructure in-house. Google owns TPUs. Google owns the data centers. That is a different kind of asset. The argument that Anthropic should be even further behind OpenAI on compute is reasonable — and if you’ve been watching Anthropic’s compute shortage tighten Claude’s rate limits, you’ve seen the real-world consequences of that gap play out in production.
Enterprise positioning is where the scorecard gets genuinely surprising. Anthropic at 14/15 is the same score as Microsoft. That will strike some readers as absurd. Microsoft has decades of enterprise relationships, a 27% equity stake in OpenAI, and Azure as the dominant cloud for AI workloads. How does a company that was a startup five years ago match that score?
The argument is that enterprise buyers are treating AI adoption differently than they've treated previous software decisions. This isn't "pick a new vendor for your CRM." Companies are making bets on which model labs will define how work gets done. And a meaningful number of them are going direct to the source, to Anthropic and OpenAI, rather than routing through a cloud intermediary. Microsoft's distribution is real, but it ultimately serves other companies' models. Some buyers want that flexibility. Others want the primary relationship.
Google’s 8/15 on enterprise is the score that will generate the most argument. Google has enormous enterprise surface area — Workspace, Drive, Gmail, Sheets — and companies that aren’t locked into the Microsoft ecosystem often default to Google’s tools. But Google has historically struggled to convert that footprint into the highest-tier enterprise relationships. That pattern has followed them into AI. Gemini’s enterprise traction has been weaker than the infrastructure would suggest it should be.
Why the Compute-Enterprise Split Matters to You
If you’re making decisions about which models to build on, the compute-enterprise split tells you something specific about risk profiles.
High compute, lower enterprise traction (Google) means you’re betting on a lab that can serve you at scale but may not have the enterprise-grade support motion you need for a production deployment. Low compute, high enterprise traction (Anthropic) means you’re betting on a lab that understands your procurement process and security requirements but may struggle to serve demand — which, again, is already happening.
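One practical consequence of that second profile: client code should treat rate limits as a routing signal, not just a retry signal. A minimal sketch, assuming a hypothetical ModelProvider interface rather than any real SDK:

```typescript
// Hypothetical provider interface; no real SDK is referenced.
interface ModelProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Treat capacity errors (HTTP 429, overloaded) as a routing signal:
// reroute to a fallback provider instead of retrying the constrained one.
async function completeWithFallback(
  primary: ModelProvider,
  fallback: ModelProvider,
  prompt: string,
): Promise<string> {
  try {
    return await primary.complete(prompt);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    if (/429|rate limit|overloaded/i.test(msg)) {
      console.warn(`${primary.name} at capacity; rerouting to ${fallback.name}`);
      return fallback.complete(prompt);
    }
    throw err; // anything else is a genuine request error
  }
}
```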
OpenAI sits in the middle on both dimensions, which is either reassuring or concerning depending on your read. Their 10/15 on enterprise is described as “a little aspirational” — enterprise is clearly growing in importance for them, but compared to how historically central it’s been to Anthropic’s strategy, OpenAI is still catching up. Their 12/20 on compute reflects real progress on deal-making without the in-house ownership that Google has.
For teams building multi-model workflows — where you’re routing different tasks to different models based on cost, latency, and capability — the compute and enterprise scores matter differently than they do for teams going deep on a single provider. Platforms like MindStudio handle this orchestration layer: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows, which means the underlying compute constraints of any single lab become less of a single point of failure.
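To make task-based routing concrete, here is a minimal sketch of the underlying pattern: a hand-rolled version of the routing table an orchestration platform would otherwise own. The task categories, model names, and per-token costs are illustrative placeholders, not real endpoints or prices:

```typescript
// Static routing table: task type -> model plus a cost guardrail.
type Task = "coding" | "summarization" | "extraction" | "chat";

interface Route {
  model: string;              // which model serves this task
  maxCostPer1kTokens: number; // budget ceiling for the task
}

const ROUTES: Record<Task, Route> = {
  coding:        { model: "frontier-coding-model", maxCostPer1kTokens: 0.03 },
  summarization: { model: "mid-tier-fast-model",   maxCostPer1kTokens: 0.002 },
  extraction:    { model: "small-cheap-model",     maxCostPer1kTokens: 0.0005 },
  chat:          { model: "general-purpose-model", maxCostPer1kTokens: 0.01 },
};

// With per-task routing, one lab's rate limits or outage only affects
// the tasks mapped to its models, not the whole application.
function route(task: Task): Route {
  return ROUTES[task];
}

console.log(route("coding").model); // "frontier-coding-model"
```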
What’s Buried in the Momentum Numbers
The most counterintuitive number in the entire scorecard is Google’s momentum score: 3/10. The highest-scoring lab overall, on the category most likely to predict near-term trajectory, scores near the bottom.
The explanation is specific: 2026 has been dominated by agentic use cases and coding-based workflows. When developers are choosing which model to reach for when building agents, writing code, or running automated pipelines, they are not reaching for Gemini. They’re reaching for GPT-5.5 or Claude. That behavioral pattern is the momentum signal, and it’s hard to argue with.
OpenAI gets 10/10 on momentum, reflecting the recent reception to GPT-5.5 and the rapid adoption of Codex. Anthropic gets 8/10, which reflects the ARR growth story across all of 2026 — though the score acknowledges a very recent softening as some behavior shifts toward Codex at a moment when Anthropic is struggling to keep up with its own demand. Amazon gets 6/10, which is probably undercounted by most observers: they have the cash and compute to throw weight around, and the OpenAI partnership announcement — GPT-5.4 now available as a limited preview on AWS Bedrock, with 5.5 coming within weeks — is a meaningful move.
Google I/O is the obvious catalyst to watch. The Sergey Brin-led strike team working on coding models is the right response to the right problem. But if Google comes out of I/O without a credible answer on coding-based use cases, the momentum gap will persist regardless of what else they ship. The comparison between Anthropic, OpenAI, and Google's different strategic bets on agents is worth reading alongside these momentum scores; the strategic divergence explains a lot of why the momentum numbers look the way they do.
The Outliers: xAI, Meta, and the X-Factor Problem
Two labs outside the top four are worth a closer look.
xAI gets an X-factor score of 8/5, above the maximum, because of Elon Musk. Whatever your view of him personally, the business track record over 20 years is hard to dismiss. xAI also scores highly on compute, which is a leading indicator for everything else. Their model score of 5/10 is described as a "stronger five" than Amazon's or Microsoft's 5/10: xAI's five reflects having capable but not state-of-the-art proprietary models, whereas Amazon's and Microsoft's fives reflect having access to all the models without owning any of them. The room to rise is different.
Meta is the strangest entry. Strong compute, interesting consumer wedge with the Ray-Bans, meaningful open-source leverage through Llama. But the restructuring efforts of the past six months haven’t produced visible outcomes yet, and the scorecard reflects that.
The model leverage category is where some of the most interesting near-term movement will happen. The author had OpenAI and Anthropic tied at 9/10 on models, with the caveat that Claude Mythos is somewhere in the pipeline. If you’ve been following what Claude Mythos promises on coding benchmarks — 93.9% on SWE-bench — you understand why that caveat matters. A model at that capability level changes the enterprise and momentum scores simultaneously.
The GPT-5.4 vs Claude Opus 4.6 comparison is one data point in the current model landscape, but the scores in this framework are explicitly forward-looking. The author kept OpenAI and Anthropic tied on models precisely because the near-term pipeline is uncertain enough that a one-point lead could flip within weeks.
The Non-Zero-Sum Caveat That Changes the Analysis
Miles Brundage’s observation deserves more weight than it usually gets: “There is a lot of implicit zero-sum thinking around the AI race, i.e. that only one of OpenAI, Anthropic, Google, etc. will succeed and that one’s growth comes at the expense of the other. Mostly though, there is just a rapidly expanding pie.”
This matters for how you use a power ranking. If you’re reading this as a horse race — pick the winner, bet accordingly — you’re probably making a mistake. The more useful frame is: which labs have structural advantages in the dimensions that matter for your specific use case?
If you’re building enterprise software that needs to clear procurement and security review, Anthropic’s 14/15 enterprise score is the signal. If you’re building infrastructure-heavy applications where token costs and availability are the constraint, Google’s 17/20 compute score is the signal. If you’re building developer tools where momentum and ecosystem matter, OpenAI’s 10/10 momentum score is the signal.
The scorecard is also a useful forcing function for thinking about what you’d weight differently. The author explicitly flagged that compute at 20 points might be too low — several of the AI models that ran the same exercise suggested it should be higher. Momentum at 10 points might be too low for people paying close attention to the daily news cycle, but probably too high for anyone making a three-year infrastructure bet.
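A quick way to test the author's own worry about the compute weighting: rescale the two known compute scores under a heavier weight and see whether the headline tie survives. A minimal sketch, assuming every other category score stays fixed:

```typescript
// Rescale a category score when its weight changes,
// e.g. Google's 17/20 on compute becomes 21.25 out of 25.
function rescale(score: number, oldMax: number, newMax: number): number {
  return (score / oldMax) * newMax;
}

// Sensitivity check on the one category the author flagged: compute.
// Scores are the ones quoted above (Google 17/20, OpenAI 12/20).
const gapAt20 = 17 - 12;                                   // 5.00 points
const gapAt25 = rescale(17, 20, 25) - rescale(12, 20, 25); // 6.25 points
console.log(gapAt20, gapAt25);
// Holding every other category fixed, raising compute's weight from
// 20 to 25 widens Google's lead in that category by 1.25 points,
// enough on its own to break the 74-74 tie.
```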
When you're building production applications on top of these models, the abstraction level matters. Tools like Remy take a different approach to that abstraction: you write a spec in annotated markdown, and the full-stack application is compiled from it, TypeScript backend, SQLite database, auth, deployment, all of it. The spec is the source of truth; the generated code is derived output. The point is that as the model landscape shifts, the layer you're building at determines how much the underlying lab competition actually affects you.
What to Watch in the Next 30 Days
Google I/O is the most important near-term event in this ranking. A strong showing on coding-based use cases would move their momentum score from 3 to something meaningfully higher, which would likely push their overall score above the current 74 tie with OpenAI. A weak showing, or a strong showing on everything except coding, probably leaves the momentum gap intact through at least Q3.
The Anthropic compute situation is the second thing to watch. Their 14/15 enterprise score and 8/10 momentum score are both real, but both are constrained by their 10/20 compute score. If they close that gap — through new infrastructure deals or the AWS partnership maturing — the overall ranking shifts. If they don’t, the enterprise score starts to erode as buyers who want to go direct to the source find that the source can’t reliably serve them.
Amazon is probably the most undercounted lab in the current ranking. Their 64 overall score and 6/10 momentum score both have room to move. The OpenAI partnership on Bedrock is a real signal — not just a distribution deal, but a statement about where enterprise AI buying is heading. AWS CEO Matt Garman’s comment that “their production applications run in AWS, their data is in AWS, they trust the security of AWS” is the enterprise buyer psychology in one sentence.
If you want to build your own scorecard and see how your weights compare to the author’s or to the AI consensus, the tool is at aipowerrank.ai. The community rankings will be more interesting than any single person’s take, including this one.
The ranking that matters most is the one that reflects your actual use case. A lab that scores 74 overall but 8/15 on enterprise is a different bet than a lab that scores 70 overall but 14/15 on enterprise, depending entirely on what you’re building and for whom.