xAI Is Training 7 Models Right Now — Here’s the Full Grok Roadmap Through Grok 5
Elon Musk just laid out the most specific AI release schedule he's ever published, and you should pay attention to it. Grok 4.4 arrives in roughly two to three weeks at 1 trillion parameters. Grok 4.5 follows at 1.5 trillion parameters in four to five weeks. And sitting at the end of that runway: Grok 5, with variants at 6 trillion and 10 trillion parameters. All told, seven models are training simultaneously on Colossus 2 right now.
That’s not a product vision. That’s a production schedule.
The context matters here. Grok 4.2, the model most people have actually used, runs on 500 billion parameters. Musk himself described it as “just 0.5t” and said it’s “missing some important training data.” He’s not exactly bullish on his own current product — which tells you something about where he thinks the value is.
The value, apparently, is about six months away.
Grok 4.3 Beta: The Starting Gun Nobody Noticed
Grok 4.3 beta is already live. xAI didn’t announce it with any fanfare, and most users haven’t touched it — it sits behind the Grok heavy tier, which starts at $300 per month.
That price point alone explains the low adoption. But the more important detail is what 4.3 actually represents: not a finished model, but a checkpoint. Musk noted that supplemental training has been added post-beta, meaning the model is still being improved after initial release.
Think of 4.3 as the first commit in a new branch. The real releases are what come next.
Grok 4.4: 1 Trillion Parameters in Two to Three Weeks
Grok 4.4 is the first major scale jump in this sequence. At 1 trillion parameters, it’s twice the size of the current public model. Musk gave a specific window: roughly two to three weeks from when he posted, with training data through early April.
Twice the parameters doesn’t automatically mean twice the capability — anyone who’s followed scaling law debates knows the relationship is more complicated than that. But the jump from 500B to 1T is meaningful, especially when paired with fresher training data.
The 4.4 release is also the first real test of xAI’s claimed velocity. If it ships on schedule, the rest of the roadmap becomes credible. If it slips, the whole timeline needs to be re-evaluated. For context on how xAI’s image generation side has been developing in parallel, see what Grok Imagine actually is and how it works.
Grok 4.5: 1.5 Trillion Parameters by Late May
Four to five weeks out sits Grok 4.5, at 1.5 trillion parameters. That’s three times the size of the current 4.2 model, arriving within roughly a month and a half of today.
The pace here is aggressive by any standard. Most frontier labs take months between major releases, and they’re typically not tripling parameter counts in consecutive drops. Musk’s framing — “some catching up to do” — suggests xAI views this sprint as a deficit-reduction exercise, not a victory lap.
Whether 4.5 actually closes the gap with GPT-5.4 or Claude Opus 4.6 depends on more than parameter count. Post-training, alignment work, and inference optimization all shape what a model actually does in practice. But the raw scale is real.
Colossus 2: Seven Models in Training Simultaneously
Here’s the detail that reframes everything else. xAI’s Colossus 2 cluster isn’t training one model at a time. It’s running seven concurrent training jobs right now.
The breakdown: Imagine V2 (the video model), two variants at 1 trillion parameters, two at 1.5 trillion, a 6 trillion parameter model, and a 10 trillion parameter model. That's the full Grok 4.x series and both Grok 5 variants, all in training at the same time.
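For a rough sense of what that queue adds up to, here is a back-of-envelope sketch using the standard dense-transformer estimate of about 6·N·D training FLOPs. Parameter counts are the ones listed above; the token budgets (a Chinchilla-style 20 tokens per parameter) are assumptions, and Imagine V2 is left out because the heuristic only applies to the language models.

```python
# The six language-model runs in the queue, with rough compute
# estimates via C ~ 6 * N * D. Parameter counts come from the
# article; the D ~ 20 * N token budgets are illustrative guesses.
# Imagine V2 (video) is excluded: 6*N*D is an LLM heuristic.

runs = {  # name -> parameter count
    "1T variant A": 1.0e12, "1T variant B": 1.0e12,
    "1.5T variant A": 1.5e12, "1.5T variant B": 1.5e12,
    "Grok 5 (6T)": 6e12, "Grok 5 (10T)": 10e12,
}
total_flops = sum(6 * n * (20 * n) for n in runs.values())
print(f"{len(runs)} language-model runs (plus Imagine V2)")
print(f"combined training compute under these assumptions: {total_flops:.2e} FLOPs")
```

The notable feature is how lopsided the sum is: under these assumptions, the two Grok 5 runs account for well over 90 percent of the total.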
This is a different kind of infrastructure bet than most labs are making. Anthropic and OpenAI have both faced compute constraints that forced program cuts and capacity rationing. Musk’s position is structurally different: Tesla’s GPU clusters, X’s data and infrastructure, SpaceX’s engineering talent, and Colossus — which was built in months, not years — all feed into xAI’s training capacity. The parallel training runs aren’t a flex. They’re the strategy.
For builders thinking about how to chain multiple models together across different capability tiers, MindStudio handles that orchestration layer — an enterprise AI platform with 200+ models, 1,000+ integrations, and a visual builder for composing agents and workflows without writing the plumbing code yourself.
The Two Grok 5 Variants: 6T and 10T
Grok 5 isn’t one model. It’s two.
The 6 trillion parameter variant and the 10 trillion parameter variant are both in training on Colossus 2 right now. The 10T model’s pre-training phase alone is approximately two months. That’s just pre-training — post-training, alignment, safety evaluations, inference optimization, and product integration all come after.
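Applying the same 6·N·D heuristic to the 10T run alone shows what that two-month window implies. Only the parameter count and the duration come from the article; the token budget and everything derived from it are assumptions.

```python
# What a two-month pre-training window implies for the 10T model,
# using C ~ 6 * N * D. N and the window are the article's figures;
# the Chinchilla-style token budget is an assumption.

N = 10e12                  # parameters (from the roadmap)
D = 20 * N                 # tokens, assumed, not disclosed
C = 6 * N * D              # total training FLOPs
seconds = 60 * 24 * 3600   # roughly two months of wall clock
print(f"total compute: {C:.2e} FLOPs")
print(f"required sustained throughput: {C / seconds:.2e} FLOP/s")
```

That works out to sustained throughput on the order of 10^21 FLOP/s, a number that gives some sense of the cluster scale the claim presupposes. If the actual token budget is smaller, the requirement drops proportionally.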
The 10T number is worth sitting with for a moment. Grok 4.2 is at 500B. Grok 5’s larger variant targets 10T. That’s a 20x scale increase from the model that’s currently in users’ hands. For comparison, the jump from GPT-3 to GPT-4 was roughly 10x by most estimates. A 20x jump in a single generation is not a routine upgrade.
Musk has been consistent about what he expects from Grok 5. In October 2025, he said it “will be indistinguishable from AGI.” When someone asked him directly whether one of these models would achieve AGI, he replied with two words: “Grok 5.”
That’s either the most confident product claim in AI history, or it’s going to be the most cited example of overpromising. There’s not much middle ground.
What “AGI” Actually Requires (And Why That Matters Here)
Musk’s AGI claim deserves scrutiny, and Google has actually done some of the work to make that scrutiny rigorous. Their paper, Measuring Progress Towards AGI, argues that AGI shouldn’t be treated as a single finish line that a company declares crossed. Instead, it should be measured by a broad cognitive profile: reasoning, memory, learning, attention, and problem-solving — all of them, consistently, at human-comparable levels.
Under that definition, a 10 trillion parameter model that dominates coding benchmarks but struggles with sustained reasoning or novel problem-solving doesn’t qualify. Scale is necessary but not sufficient.
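A hypothetical sketch makes the aggregation point concrete: judged by the weakest dimension rather than the average, the lopsided model described above scores poorly. Every score below is invented for illustration.

```python
# Hypothetical cognitive profile for a model that dominates coding
# benchmarks but lags elsewhere. All scores are made up.
profile = {
    "reasoning": 0.62, "memory": 0.71, "learning": 0.55,
    "attention": 0.80, "problem_solving": 0.58, "coding": 0.97,
}

mean_score = sum(profile.values()) / len(profile)   # flattering average
floor_score = min(profile.values())                 # weakest-link view
print(f"mean: {mean_score:.2f}, weakest dimension: {floor_score:.2f}")
```

The headline benchmark says 0.97; the profile says 0.55. Google's framework, as described above, is effectively an argument for reading the second number.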
This is the honest version of the question. The issue isn’t whether Grok 5 will be impressive — at 10T parameters with two months of pre-training compute behind it, it will almost certainly be impressive. The issue is whether “impressive” and “AGI” are the same thing.
Google’s framework says they’re not. And that framework is probably more useful than Musk’s two-word answer when you’re trying to evaluate what Grok 5 will actually be capable of. For a sense of where the current benchmark ceiling sits among frontier models, Claude Mythos’s 93.9% SWE-Bench score is a useful reference point for what Grok 5 would need to surpass.
The Release Timeline, Assembled
Here’s the full picture as it currently stands:
Now: Grok 4.3 beta is live, 500B parameters, $300/month heavy tier, supplemental training ongoing.
~2–3 weeks: Grok 4.4, 1 trillion parameters, training data through early April.
~4–5 weeks: Grok 4.5, 1.5 trillion parameters.
~2 months: Grok 5 10T variant completes pre-training. Post-training, alignment, and deployment work follows, putting a realistic public release window somewhere in late 2025 or early 2026.
Parallel: Grok 5 6T variant also in training on Colossus 2, timeline not separately specified.
The 4.4 and 4.5 releases are the near-term signal. If xAI ships them on schedule and they benchmark competitively against what OpenAI’s next frontier model delivers, the Grok 5 timeline becomes much more credible. If the 4.x releases slip or underperform, the 10T claims start to look like projection.
Why the Parallel Training Strategy Changes the Calculus
Most labs release models sequentially. Train, evaluate, release, repeat. xAI is doing something different: training across the full parameter range simultaneously, from 1T to 10T, with different use cases (the Imagine V2 video model sits alongside the language models in that seven-model queue).
The advantage of this approach is speed-to-capability. You don’t have to wait for 4.4 to finish before starting 4.5. You don’t have to wait for 4.5 to finish before starting Grok 5. The entire roadmap is in flight at once.
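The arithmetic behind that advantage is simple. A minimal sketch, with invented per-run durations:

```python
# Sequential vs parallel training wall clock. Durations (in weeks)
# are invented for illustration.
durations = {"4.4 (1T)": 6, "4.5 (1.5T)": 8, "Grok 5 (6T)": 12, "Grok 5 (10T)": 16}

print("sequential:", sum(durations.values()), "weeks")  # train, release, repeat
print("parallel:  ", max(durations.values()), "weeks")  # everything in flight at once
```

Sequential scheduling pays for every run in calendar time; parallel scheduling pays only for the longest one, with compute as the cost.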
The risk is coordination complexity. Seven concurrent training runs means seven sets of infrastructure dependencies, seven sets of potential failure modes, and seven sets of post-training pipelines that all need to be managed. Colossus was built fast, but “built fast” and “operates reliably at scale” aren’t the same thing.
That said, xAI’s infrastructure position is genuinely unusual. Building Colossus in months rather than years is a real operational achievement, whatever you think of the models it produces. The engineering talent pipeline from SpaceX, combined with Tesla’s GPU infrastructure and X’s data assets, gives xAI a resource base that most AI labs can’t replicate.
The Stakes for the Grok 4.x Releases
The 4.4 and 4.5 models matter more than their version numbers suggest. They’re not just incremental releases — they’re proof points for the entire Grok 5 narrative.
If Grok 4.4 ships in two to three weeks and posts competitive results against current frontier models, it validates the training velocity. If Grok 4.5 follows on schedule at 1.5T and shows meaningful capability gains over 4.4, it validates the scaling approach. At that point, the 10T Grok 5 claim stops being a Musk tweet and starts being a credible development trajectory.
The inverse is also true. xAI has relatively low market share right now. Grok 4.2 at 500B parameters, with missing training data by Musk’s own admission, hasn’t moved the needle against GPT-5.4 or Claude. The 4.x series is xAI’s chance to demonstrate that the compute advantage translates into model quality, not just model size.
For developers building on top of these models, the parameter count is less important than the capability profile. A 1T model that reasons well and follows instructions reliably is more useful than a 1.5T model that’s inconsistent. The benchmarks for 4.4 and 4.5 will tell you more than the parameter counts.
When those models do arrive, the question of how to integrate them into production workflows becomes practical. Tools like Remy approach that integration problem from the spec layer — you write your application as annotated markdown, and the full-stack app (TypeScript backend, database, auth, deployment) gets compiled from it. The model powering the app becomes a configuration choice rather than an architectural dependency.
What Happens If Grok 5 Actually Delivers
Musk has set a high bar. “Indistinguishable from AGI” is not a modest claim. And the two-word answer to the AGI question — “Grok 5” — is the kind of statement that gets screenshotted and revisited.
If Grok 5 launches and it’s genuinely beyond current frontier models on the cognitive profile that Google’s framework describes — broad reasoning, memory, learning, problem-solving, all of it — then xAI’s infrastructure bet looks prescient. Colossus 2 looks like a serious moat. The parallel training strategy looks like the right call.
If Grok 5 launches and it’s a strong but conventional language model, the narrative collapses. Not because the model would be bad, but because the expectation has been set so high that anything short of a qualitative leap reads as a miss.
The honest answer is that nobody outside xAI knows which of those outcomes is more likely. What we do know is the timeline: 4.4 in weeks, 4.5 shortly after, Grok 5 pre-training completing in roughly two months, with everything else that follows. The schedule is specific enough to hold xAI accountable to it.
That’s more than most AI roadmaps give you. Whether it’s enough to deliver what Musk is promising is a different question entirely — one that gets answered sometime before the end of 2025, if the timeline holds.