OpenAI on AWS Bedrock vs Claude on Bedrock — What the New Competitive Landscape Means for Enterprise AI Buyers

OpenAI models are now on AWS Bedrock. Companies that defaulted to Claude because they were already on Bedrock now have a direct alternative.

MindStudio Team

If You’re on AWS Bedrock, You Now Have a Real Choice

Until April 28, 2026, the decision was mostly made for you. If your infrastructure ran on AWS and you wanted a frontier model through Bedrock, Anthropic’s Claude was the obvious path. OpenAI’s models required Azure. That friction was real, and it quietly shaped a generation of enterprise AI decisions.

That changed overnight. OpenAI models, Codex, and managed agents became available on AWS Bedrock on April 28 — one day after Microsoft and OpenAI restructured their partnership. The timing was not a coincidence. The original Microsoft deal contained an exclusivity clause that prevented OpenAI from distributing through competing clouds. Once that clause was removed, the AWS deal went live within 24 hours. It had clearly been waiting.

The Signal quote that circulated after the announcement put it plainly: “People underestimate how big of a deal it is that OpenAI models are now on Bedrock. I’ve met so many companies that defaulted to Anthropic and Claude because they were already on Bedrock, and for a long time that was basically the path of least resistance.”

That path of least resistance now has a fork. If you’re an enterprise AI buyer evaluating OpenAI on AWS Bedrock vs Anthropic Claude on Bedrock, this post is the comparison you need to make that call.


What Actually Changed in the Microsoft Deal (and Why It Unlocked This)


The restructured Microsoft-OpenAI agreement, announced April 27, 2026, had four meaningful changes. Two of them directly created the conditions for the AWS deal.

First, the AGI clause was removed. The original agreement contained a provision that would have terminated Microsoft’s access to OpenAI models the moment OpenAI declared it had achieved AGI. That clause is now gone, replaced by a fixed end date: Microsoft’s license runs through 2032.

Second, and more consequentially for this comparison, Microsoft’s license became non-exclusive. Previously, OpenAI was effectively locked into Azure as its primary distribution channel. Non-exclusive means OpenAI can now sell through AWS, Google Cloud, or anyone else.

Under the new terms, Microsoft also stopped paying a revenue share to OpenAI. Satya Nadella framed what Microsoft got in exchange as a good deal: “We have a frontier model royalty-free with all the IP rights that we will have access to all the way to 32, and we fully plan to exploit it.”

The subtext is that Microsoft locked in its position through 2032 while OpenAI gained the freedom to expand distribution. Both sides got something. The side effect for enterprise buyers is that the model landscape on Bedrock just got significantly more competitive.


The Dimensions That Actually Matter for Enterprise Buyers

Before comparing the two options head-to-head, it’s worth being precise about what enterprise buyers are actually evaluating. Not every dimension matters equally for every use case.

Cloud infrastructure fit. If you’re already on AWS, Bedrock is your managed inference layer. Both Claude and OpenAI models are now available through it. The question shifts from “which cloud” to “which model family.”

Model capability per task type. Claude and GPT have different strengths. Claude Opus 4.7 has historically led on long-context document tasks and instruction-following precision. GPT-5.5 has shown strong coding performance, particularly inside harnesses — Endor Labs found that GPT-5.5 in Cursor’s harness scored 23.5% on their security correctness benchmark, slightly ahead of Opus 4.7 in the same harness at 22.9%. The real-world coding performance comparison between GPT-5.5 and Claude Opus 4.7 goes deeper on what those differences look like in practice.

Pricing at scale. Token costs compound fast at enterprise volumes. GPT-5.5 runs $5 per million input tokens and $30 per million output tokens. Claude Opus 4.7 is $5 per million input and $25 per million output. For output-heavy workloads, that $5 difference per million tokens adds up. DeepSeek V4, for comparison, is $1.74 input and $3.48 output — nearly state-of-the-art capability at a fraction of the cost, which is why some enterprises are starting to look at open-weight models for cost-sensitive pipelines.
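To see how fast that gap compounds, here is a back-of-the-envelope comparison using the list prices quoted above. The workload numbers (1M calls per month, 2,000 input and 1,000 output tokens per call) are illustrative assumptions, and Bedrock’s managed layer may add a markup on top of these base rates:

```python
# Back-of-the-envelope monthly cost comparison at the list prices above.
# Prices are $ per 1M tokens; verify current Bedrock pricing before committing.
PRICES = {
    "gpt-5.5":         {"input": 5.00, "output": 30.00},
    "claude-opus-4.7": {"input": 5.00, "output": 25.00},
    "deepseek-v4":     {"input": 1.74, "output": 3.48},
}

def monthly_cost(model: str, calls: int, in_tok: int, out_tok: int) -> float:
    """Dollar cost for `calls` requests of in_tok input / out_tok output tokens each."""
    p = PRICES[model]
    return (calls * in_tok * p["input"] + calls * out_tok * p["output"]) / 1_000_000

# Hypothetical workload: 1M calls/month, 2,000 input + 1,000 output tokens per call.
# At these rates: GPT-5.5 ~$40,000, Opus 4.7 ~$35,000, DeepSeek V4 ~$6,960.
for model in PRICES:
    print(model, monthly_cost(model, 1_000_000, 2_000, 1_000))
```

For an output-heavy workload like this one, the $5 output-price gap alone is worth $5,000 a month, and the open-weight option is roughly a fifth of either frontier bill.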

Compliance and data residency. Bedrock gives you AWS’s compliance posture for both model families. That’s a meaningful advantage over calling OpenAI or Anthropic APIs directly if you’re in a regulated industry.

Vendor relationship and reliability. Anthropic’s recent billing controversy — where Claude Code was detecting keywords like hermes.md in git commit messages and either refusing service or charging extra — raised legitimate questions about how Anthropic handles third-party harness usage. Anthropic did issue refunds and credited affected users after the incident went viral, but the underlying detection logic was the real concern. The fact that they were scanning code for harness keywords in the first place is the kind of thing that makes enterprise procurement teams nervous.


Claude on Bedrock: The Established Option

Claude has been on Bedrock for long enough that the integration is mature. If you’ve been running Claude through Bedrock for the past year, you have existing IAM roles, existing VPC configurations, existing CloudWatch logging. That operational familiarity is worth something.
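In practice, “Claude through Bedrock” means the same `bedrock-runtime` call pattern as any other Bedrock model, which is exactly why the switching question now reduces to a model ID. A minimal sketch using boto3’s Converse API — the model ID is a placeholder, not a confirmed identifier, and the call assumes IAM credentials with Bedrock invoke permissions:

```python
# Placeholder model ID -- confirm the exact identifier in your region's
# Bedrock model catalog before use.
MODEL_ID = "anthropic.claude-placeholder-model-id"

def build_converse_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build the kwargs dict for bedrock-runtime's Converse API."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

def ask(prompt: str) -> str:
    """Send one prompt through Bedrock's managed inference layer."""
    import boto3  # AWS SDK; region and credentials come from your AWS config
    client = boto3.client("bedrock-runtime")
    resp = client.converse(**build_converse_request(prompt))
    return resp["output"]["message"]["content"][0]["text"]
```

The point of the sketch is the shape of the integration: IAM, VPC endpoints, and CloudWatch logging all sit around this one call, which is the operational familiarity the paragraph above is describing.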

Claude’s strengths are well-documented at this point. Long-context tasks — processing large documents, synthesizing research across many sources, maintaining coherence over extended conversations — are where Claude has historically performed best. The GPT-5.4 vs Claude Opus 4.6 comparison found Claude leading on document processing and instruction-following precision, though GPT has closed the gap in recent model generations.

Claude Haiku remains one of the better sub-agent models for cost-sensitive pipelines. If you’re building multi-step workflows where a frontier model handles orchestration and cheaper models handle routine subtasks, Haiku is worth benchmarking. The GPT-5.4 Mini vs Claude Haiku sub-agent comparison covers that tradeoff in detail.

The concern with Claude on Bedrock right now is less about model capability and more about Anthropic’s posture toward enterprise customers. The harness detection incident suggests Anthropic is actively trying to steer users toward its own tooling (Claude Code, Claude Cowork) and away from third-party frameworks. For enterprises that have built workflows on top of frameworks like Hermes or Open Claw, that’s a real operational risk. If your code mentions the wrong string in a commit message, you might get throttled or billed incorrectly.

Anthropic is also navigating a complicated moment with the US government. The company was designated a supply chain risk, and while the White House was reportedly working to unwind that designation, the situation remains unresolved. For government-adjacent enterprises, that uncertainty matters.


OpenAI on Bedrock: The New Entrant with Existing Credibility

OpenAI isn’t new to enterprise. What’s new is the ability to access OpenAI models through Bedrock’s managed inference layer rather than calling the OpenAI API directly. That distinction matters for enterprises that have standardized on AWS for compliance, billing consolidation, and data governance.

The models available through Bedrock now include OpenAI’s current lineup plus Codex and managed agents. Codex in particular is worth paying attention to. OpenAI has been actively expanding Codex beyond developer use cases — the recent update added task-type personalization (finance, product, marketing, operations, sales, data science, design) and a simplified UI for non-technical users. The bet OpenAI is making is that one interface for everyone is better than splitting technical and non-technical work into separate products, which is the approach Anthropic took with Claude Code vs Claude Cowork.

For coding-heavy workloads, the harness performance data is relevant. The Endor Labs benchmark found that GPT-5.5 in Cursor’s harness outperformed both GPT-5.5 in its native Codex harness and Opus 4.7 in its native Claude Code harness on security correctness. The implication is that model choice and harness choice are increasingly intertwined — you can’t evaluate GPT-5.5 in isolation from the runtime it’s operating in. For teams building on top of Bedrock’s managed agent infrastructure, this is worth testing empirically rather than assuming.


The pricing picture for OpenAI on Bedrock will depend on what AWS charges for managed inference on top of OpenAI’s base rates. Historically, Bedrock adds a small markup for the managed layer. That’s a known cost for Claude; for OpenAI it’s worth confirming before committing to a production architecture.

One thing that’s genuinely uncertain: OpenAI’s distribution strategy is now explicitly multi-cloud. The AWS deal came within 24 hours of the Microsoft exclusivity clause being removed. Don’t be surprised if OpenAI models appear on Google Cloud next. For enterprises evaluating long-term vendor relationships, OpenAI’s willingness to distribute broadly is a positive signal — it reduces the risk of being locked into a single cloud provider’s preferred model.


Where Agent Infrastructure Fits Into This Decision

Both Claude and OpenAI now offer managed agents on Bedrock. This is worth treating as a separate evaluation from the base models.

The shift toward harness-as-a-service — where the agent loop, tool dispatch, sandboxing, and error handling are pre-built and managed — changes the calculus for enterprise buyers. You’re no longer just choosing a model; you’re choosing a runtime. Anthropic’s managed agents and OpenAI’s managed agents on Bedrock will have different capabilities, different pricing, and different integration patterns with the rest of AWS.

For teams building multi-model workflows, platforms like MindStudio handle this orchestration layer: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows. That kind of model-agnostic approach becomes more valuable as the number of viable frontier models on Bedrock increases — you can swap the underlying model without rebuilding the workflow.

The Anthropic vs OpenAI vs Google agent strategy comparison covers how each lab is thinking about the agentic layer differently, which is relevant context for evaluating their managed agent offerings on Bedrock.


Verdict: Which to Choose and When

Use Claude on Bedrock if:

  • You have existing Bedrock infrastructure and operational runbooks built around Claude. The migration cost of switching is real.
  • Your primary use cases are long-context document processing, research synthesis, or tasks requiring precise instruction-following over extended context windows.
  • You’re using Claude Haiku as a sub-agent model in cost-sensitive pipelines — Haiku’s price-to-capability ratio for routine tasks remains strong.
  • You’re not using third-party harnesses like Hermes or Open Claw in your Claude Code workflows, which avoids the harness detection issue entirely.

Use OpenAI on Bedrock if:

  • You’ve been using OpenAI’s API directly and want to consolidate billing and compliance under AWS without changing your model family.
  • Your primary use cases are coding, code review, or agentic tasks where GPT-5.5’s performance in coding harnesses is relevant.
  • You want access to Codex’s expanding non-developer capabilities (finance, operations, marketing workflows) through a managed AWS interface.
  • You’re building new infrastructure from scratch and don’t have existing Claude integrations to preserve.

Use both if:

  • You’re running multi-model workflows where different models handle different task types. The GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro benchmark comparison shows that no single model dominates across all task categories. A routing layer that sends document tasks to Claude and coding tasks to GPT is a legitimate architecture.

Consider open-weight models if:

  • Token costs are a primary constraint. DeepSeek V4 at $1.74/$3.48 per million tokens is close enough to frontier capability for many enterprise use cases — document summarization, pattern extraction, customer support — that the cost difference is hard to justify for high-volume pipelines.
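The routing layer described under “Use both” can start as something very simple: a task-type map in front of a single Bedrock endpoint. The model IDs below are placeholders, and the task-to-model split reflects this post’s recommendations rather than any benchmark result:

```python
# Minimal task-type router in front of Bedrock.
# Model IDs are placeholders -- substitute real identifiers from your
# Bedrock model catalog.
ROUTES = {
    "document": "anthropic.claude-opus-placeholder",   # long-context doc work
    "research": "anthropic.claude-opus-placeholder",   # synthesis across sources
    "coding":   "openai.gpt-placeholder",              # harness-heavy coding tasks
    "subtask":  "anthropic.claude-haiku-placeholder",  # cheap routine sub-agent steps
}
DEFAULT_MODEL = "anthropic.claude-opus-placeholder"

def route(task_type: str) -> str:
    """Return the Bedrock model ID for a task type, falling back to a default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

Because both model families now sit behind the same managed inference layer, swapping a route means changing one string, not re-plumbing authentication or compliance controls.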


The Operational Question Nobody Asks Until It’s Too Late

Model capability comparisons are useful, but enterprise AI decisions often hinge on operational factors that don’t show up in benchmarks.

When you’re building production applications that compile a spec into a full-stack deployment — the approach tools like Remy take, where annotated markdown becomes a TypeScript backend, SQLite database, auth, and live deployment — the model you use for code generation matters less than the reliability of the inference endpoint and the predictability of the billing. A model that’s 5% better on SWE-bench but has unpredictable rate limits or billing surprises is worse for production use than a slightly less capable model with consistent behavior.

That’s the real reason the Anthropic harness detection incident matters beyond the immediate refunds. Enterprise buyers need to trust that their inference provider won’t change behavior based on what’s in their codebase. Bedrock’s managed layer adds a degree of insulation from that kind of provider-side behavior change, which is one argument for using either model through Bedrock rather than direct API access.

The competitive pressure is also worth tracking. AWS reported 28% year-over-year growth for Q1 2026, and Andy Jassy noted that demand for Trainium is so high they’re considering selling racks. In a token-scarce environment, having multiple frontier models available through a single managed inference layer gives enterprises more flexibility to route workloads based on availability and cost — not just capability.
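The availability argument is concrete: with two frontier families behind one endpoint, a throttled primary can fail over to a secondary in a few lines. A sketch with the invoke function injected, so the fallback logic stays provider-agnostic — model IDs are placeholders, and `ThrottledError` stands in for the real throttling exception your SDK raises:

```python
from typing import Callable

# Placeholder model IDs -- substitute real Bedrock identifiers.
PRIMARY = "anthropic.claude-placeholder"
SECONDARY = "openai.gpt-placeholder"

class ThrottledError(Exception):
    """Stand-in for a provider-side throttling exception."""

def invoke_with_fallback(invoke: Callable[[str, str], str], prompt: str) -> str:
    """Try the primary model; on throttling, retry the same prompt on the secondary."""
    try:
        return invoke(PRIMARY, prompt)
    except ThrottledError:
        return invoke(SECONDARY, prompt)
```

In production you would map this onto the actual throttling error from your client library and add backoff, but the structural point stands: multi-model availability on one managed layer turns capacity shortages into a routing decision instead of an outage.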

The default choice for Bedrock users was Claude. That default is gone now. Whether that’s good or bad depends entirely on what you’re building.

Presented by MindStudio
