Claude API Token Limits Just Jumped 10x — Every Tier's New Numbers Explained
Tier 1 input tokens jumped from 30k to 500k per minute. Here's the full breakdown of every Claude API tier's new limits.
The Numbers You Actually Need: Claude API Rate Limits, Every Tier, Right Now
Tier 1 input tokens per minute just went from 30,000 to 500,000. That’s not a gradual rollout or a beta — it’s effective immediately, and every tier got a substantial jump: Tier 2 moved from 450k to 2M, Tier 3 from 800k to 5M, and Tier 4 from 2M to 10M. If you’ve been architecting around the old limits, your assumptions are wrong in the best possible way.
The output side is where things get dramatic for certain workloads. Tier 1 output tokens per minute went from 8,000 to 80,000 — a 10x increase in a single day. If you were running parallel agents and hitting output ceilings constantly, that wall just moved.
This post is specifically about the API rate limit changes. The Claude Code session limit doubling and the peak-hours throttling removal are real and worth knowing about, but they’re covered elsewhere. What matters here is what changed for API callers — the people building production systems, running batch jobs, and orchestrating multi-agent pipelines against the Opus endpoints.
What These Numbers Mean for Real Workloads
Before the change, Tier 1’s 30k input tokens per minute meant roughly 20–22 pages of context per minute. At 500k, you’re looking at something closer to 370 pages per minute. That’s not a marginal improvement — it’s a different category of what’s possible.
The output limit increase matters even more for certain patterns. At 8,000 output tokens per minute on Tier 1, running five agents in parallel each generating substantial responses was essentially impossible without aggressive rate-limit handling and queuing. At 80,000, you can actually do it. Multi-agent workflows that were theoretically correct but practically unusable become viable.
For Tier 4 API users — typically large enterprise accounts — the jump from 2M to 10M input tokens per minute means you can now saturate Opus with context at a scale that previously required careful batching and scheduling. If you’re running document analysis pipelines or large-scale code review systems, the throughput ceiling just moved by 5x.
The reason this happened at all is worth understanding. Anthropic’s head of growth Amal Avisari explained the prioritization directly: only a very small percentage of users were hitting weekly limits, but a much larger portion were hitting the 5-hour session limit, so the session limit got fixed first. The API rate limit increase came alongside the xAI compute deal — Anthropic now has full use of xAI’s Colossus 1 data center in Memphis, Tennessee: 220,000 Nvidia GPUs (mostly H100s) running at 300 MW capacity. That’s the compute that made these numbers possible.
What You Need to Check Before Rebuilding Anything
You need to know your current tier before you redesign anything around the new limits.
Your tier is determined by your account’s API usage history and spend. Anthropic’s usage tiers documentation defines the thresholds. Tier 1 is where new API accounts start. Tier 4 requires significant historical spend. If you’re not sure which tier you’re on, check the Anthropic Console under your account settings — it’s listed there.
You also need to be on an Opus model endpoint for these specific limits to apply. The announcement specifies these rate limit increases are for Claude Opus models. If you’re calling claude-sonnet-4-5 or claude-haiku-3-5, the new numbers don’t apply — each model family has its own rate limits.
One more prerequisite: if you’ve been using OpenRouter instead of calling Anthropic’s API directly, these rate limit changes only apply to direct Anthropic API calls. OpenRouter applies its own rate limiting layer on top.
Mapping the New Limits to Your Architecture
Step 1: Audit your current rate limit handling code.
If you built retry logic and backoff around the old limits, you probably have conservative assumptions baked in. Search your codebase for 429 handling, rate_limit_error catches, and any hardcoded delays. You now have significantly more headroom before hitting those paths.
A common pattern that becomes worth revisiting: if you were serializing requests that could have been parallel because you were worried about hitting 30k input tokens per minute on Tier 1, you can now parallelize more aggressively. Five concurrent requests each sending 50k tokens of context is now well within Tier 1 limits. Before today, that would have immediately rate-limited you.
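As a concrete starting point, here’s a minimal sketch of the pattern worth auditing, using the anthropic Python SDK. The model id is a placeholder (substitute whichever Opus model you actually call), and the prompts are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-6"       # placeholder -- substitute your Opus model id

def call_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """One request with exponential backoff on 429s."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = client.messages.create(
                model=MODEL,
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        except anthropic.RateLimitError:
            time.sleep(delay)
            delay *= 2  # the old limits made this path routine; now it should be rare
    raise RuntimeError("still rate limited after retries")

# Parallelism that used to trip the 30k/min input ceiling is now viable on Tier 1.
prompts = [f"Summarize document {i}" for i in range(5)]
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(call_with_backoff, prompts))
```

If your existing code serializes calls like these behind a queue, that queue may now be pure overhead.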
Now you have a list of places in your code where you were artificially conservative.
Step 2: Recalculate your parallel agent capacity.
The output limit increase is the binding constraint for most multi-agent patterns. At the old 8,000 output tokens per minute on Tier 1, if each agent response averaged 400 tokens, you could sustain about 20 completions per minute. At 80,000, that’s 200 completions per minute — assuming your input tokens also fit, which at 500k/min they almost certainly do for most agent patterns.
For Tier 4, the math gets more interesting. 10M input tokens per minute means you could theoretically feed 10,000 requests of 1,000 tokens each in a single minute. The practical limit becomes your application architecture and network, not Anthropic’s API.
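The arithmetic is simple enough to encode directly. A small helper (hypothetical, using the numbers from this post) makes the ceiling easy to re-run as limits or your token profile change:

```python
def completions_per_minute(input_tpm: int, output_tpm: int,
                           avg_input: int, avg_output: int) -> int:
    """Sustainable completions/min: whichever token budget binds first."""
    return min(input_tpm // avg_input, output_tpm // avg_output)

# Tier 1, agent calls averaging 2k tokens in / 400 tokens out
old = completions_per_minute(30_000, 8_000, 2_000, 400)    # -> 15, input-bound
new = completions_per_minute(500_000, 80_000, 2_000, 400)  # -> 200, output-bound
```

Note which budget binds: under the old limits this profile was input-bound; under the new ones the output limit is the constraint, which matches the multi-agent experience described above.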
Now you have a realistic ceiling for your parallel workload.
Step 3: Revisit workflows you abandoned because of rate limits.
If you tried building an Opus-based pipeline six months ago and gave up because you kept hitting limits, the constraint may genuinely be gone. This is worth a direct retest rather than assuming the old experience still applies.
Specifically worth revisiting: anything using the 1M context window. At the old limits, a single 1M-token request exceeded the entire per-minute input budget of every tier below Tier 4, making large-context workflows impractical for anything that needed more than one call per minute. At 500k/min on Tier 1 and 10M/min on Tier 4, they become much more usable. The context management patterns that matter for Claude Code apply here too — compacting and managing context efficiently still matters even with higher throughput limits.
Now you have a list of previously-abandoned workflows worth retesting.
Step 4: Update your cost and throughput projections.
Higher rate limits don’t change per-token pricing, but they do change what’s achievable within a given time window. If you were previously rate-limited to processing X documents per hour, you can now project a significantly higher throughput ceiling. Reprice your cost-per-unit estimates accordingly — the economics of some batch processing jobs that looked marginal may now look different.
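A sketch of the reprojection, with placeholder prices (plug in current Opus pricing; per-token rates are unchanged by this announcement):

```python
def throughput_per_hour(input_tpm: int, tokens_per_doc: int) -> int:
    """Docs/hour if the input-token budget is the only constraint."""
    return (input_tpm // tokens_per_doc) * 60

def cost_per_doc(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Prices are per million tokens; values are placeholders, not quotes."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# Tier 1, 20k-token documents: the throughput ceiling went from 60/hr to 1,500/hr.
assert throughput_per_hour(30_000, 20_000) == 60
assert throughput_per_hour(500_000, 20_000) == 1_500
```

The cost per document is the same as before; what changed is how many of them you can push through per hour, which is what moves a marginal batch job into viable territory.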
For teams building production agents on Opus, the comparison between Opus 4.7 and 4.6 is worth reading alongside this — higher rate limits are only useful if you’re on the right model version for your task.
Now you have updated projections that reflect actual current limits.
Step 5: Decide whether to stay on Tier 1 or push toward higher tiers.
If you’re on Tier 1 and the new 500k input / 80k output limits are sufficient for your use case, you may not need to do anything. If you’re building something that needs Tier 4’s 10M input tokens per minute, you need to have the API spend history to qualify — that’s not something you can shortcut.
The practical path to higher tiers is straightforward: use the API, spend money, wait for Anthropic to automatically upgrade your tier based on usage. There’s no application process. If you need to accelerate this, the only lever is increasing your API usage volume.
Now you know which tier you’re targeting and why.
Where Things Still Break
The rate limits increased, but a few failure modes are worth knowing about.
You can still hit context window limits independently of rate limits. Opus 4’s context window is 200k tokens. If you’re sending 500k tokens of input per minute, you’re sending multiple requests — each individual request still can’t exceed the context window. This is a different constraint from rate limits and hasn’t changed.
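To make the distinction concrete, here’s a hedged sketch: the per-minute budget governs how many requests you can send, while the window caps each one. The headroom constant is an assumption for illustration:

```python
CONTEXT_WINDOW = 200_000   # per-request cap (Opus 4, per this post)
RESERVED = 8_000           # assumed headroom for system prompt + output

def window_chunks(doc_tokens: list[int], limit: int = CONTEXT_WINDOW - RESERVED):
    """Split a tokenized document into request-sized chunks.

    500k input tokens/min on Tier 1 funds roughly 2.5 of these chunks
    per minute, but no single request may exceed the window."""
    for i in range(0, len(doc_tokens), limit):
        yield doc_tokens[i : i + limit]
```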
Burst behavior is different from sustained throughput. Rate limits are typically enforced as rolling windows. Sending 500k tokens in the first second of a minute and then nothing for the remaining 59 seconds may still trigger rate limiting depending on how Anthropic implements the window. If you’re seeing unexpected 429s despite being under the per-minute limit, add a small delay between requests to smooth out burst patterns.
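One way to smooth bursts is a client-side token bucket that releases input-token budget evenly across the minute instead of all at once. A minimal sketch, assuming you can estimate token counts per request before sending:

```python
import time

class TokenPacer:
    """Release token budget at a steady per-second rate instead of bursting."""

    def __init__(self, tokens_per_minute: int, max_burst_seconds: float = 10.0):
        self.rate = tokens_per_minute / 60.0           # tokens per second
        self.max_burst = self.rate * max_burst_seconds  # cap bursts at ~10s of budget
        self.allowance = self.max_burst
        self.last = time.monotonic()

    def wait(self, tokens: int) -> None:
        now = time.monotonic()
        self.allowance = min(self.allowance + (now - self.last) * self.rate,
                             self.max_burst)
        if tokens > self.allowance:
            # Sleep until enough budget has accrued, then spend it all.
            time.sleep((tokens - self.allowance) / self.rate)
            self.allowance = 0.0
        else:
            self.allowance -= tokens
        self.last = time.monotonic()

# Pace against the Tier 1 input budget before each request:
pacer = TokenPacer(tokens_per_minute=500_000)
# pacer.wait(estimated_input_tokens)  # hypothetical estimate, then fire the request
```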
Tier upgrades aren’t instant. If you just created an API account and you’re expecting Tier 4 limits, you won’t have them. Tier assignment is based on historical usage. New accounts start at Tier 1 regardless of the new higher limits at that tier.
The output limit is still a real constraint for very long responses. At 80,000 output tokens per minute on Tier 1, a single response that generates 80,000 tokens (roughly 60,000 words) would consume your entire per-minute output budget. This is an edge case for most workloads, but if you’re generating very long documents or code files, it’s worth knowing.
These limits apply to Opus endpoints specifically. If your application routes between models dynamically — say, using Opus for complex reasoning and Haiku for simple classification — the rate limits for each model endpoint are separate. Don’t assume the Opus limit increase applies to your Haiku calls.
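If you route dynamically, it’s worth reading each model’s rate-limit state off real responses rather than assuming. The SDK’s raw-response mode exposes the headers; the exact header name below is an assumption, so inspect a live response to confirm what yours carry:

```python
import anthropic

client = anthropic.Anthropic()

def remaining_input_tokens(model: str, prompt: str):
    """Read per-model rate-limit headroom from a response.

    The header name here is an assumption -- dump raw.headers on a live
    call to verify what your responses actually include."""
    raw = client.messages.with_raw_response.create(
        model=model,
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    message = raw.parse()  # the normal Message object
    remaining = raw.headers.get("anthropic-ratelimit-input-tokens-remaining")
    return message, remaining

# Track Opus and Haiku separately; one model's headroom says nothing about the other's.
```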
If you’re building orchestration that spans multiple models and need a visual way to manage those routing decisions, MindStudio handles this kind of multi-model coordination: 200+ models with a visual builder for chaining agents and workflows. That’s useful when you’re trying to optimize which model handles which part of a pipeline without writing all the routing logic yourself.
Where to Take This Further
The rate limit increase is a symptom of a larger shift. Dario Amodei said at the Code with Claude event that Anthropic planned for 10x growth per year and saw 80x annualized growth in Q1 2026. The compute deals that followed — Colossus 1, plus agreements with Amazon (up to 5 GW), Google/Broadcom (5 GW beginning 2027, reported at $200B over five years), and Microsoft/Nvidia ($30B Azure capacity) — are Anthropic’s response to that gap between projected and actual demand.
The practical implication: more rate limit increases are likely as that compute comes online. Amal Avisari said explicitly that the weekly limits are next on the list to address once the compute is available. If you’re building systems today, design them to take advantage of higher limits rather than working around lower ones.
For teams building on top of Opus specifically, the Anthropic compute shortage context is useful background — understanding why the limits were where they were helps you understand how durable the new limits are likely to be.
One pattern worth building toward: if you’ve been using Opus plan mode to save tokens as a workaround for rate limits, you can now be less conservative about that. The workaround made sense when limits were tight. With 10x more output throughput on Tier 1, the calculus changes.
The broader question for teams building production systems on Claude is whether to treat these limits as a stable foundation or a moving target. My read is that they’re a moving target in the upward direction — Anthropic has committed to significant compute capacity coming online through 2027, and the pattern of the last few months has been limits going up, not down. Build for that trajectory.
For teams using Opus in full-stack applications — where the AI is one component of a larger system with a database, auth, and frontend — the abstraction level of how you build that surrounding infrastructure matters. Remy takes a spec-driven approach: you write annotated markdown describing your application, and it compiles a complete TypeScript backend, SQLite database, auth, and deployment from that spec. The spec is the source of truth; the generated code is derived output. That’s a different model than scaffolding by hand, and it pairs well with AI components that are themselves getting more capable and higher-throughput.
The rate limit numbers are the news. The real story is that Anthropic now has the compute to back them up.