Text Generation Model

Grok 4

xAI's most powerful reasoning model, trained with massive-scale reinforcement learning to achieve world-leading performance on the hardest academic and scientific benchmarks.

Publisher

X.ai

Type Text

Context Window 256,000 tokens

Training Data July 2025

Input $3.00/MTok

Output $15.00/MTok

About Grok 4

Large-scale reasoning across science, math, and code

Grok 4 is a text generation model developed by xAI, released on July 9, 2025, and trained using reinforcement learning on xAI's 200,000-GPU Colossus cluster. It features a 256,000-token context window and was built with a 6x improvement in compute efficiency over its predecessor, with verifiable training data expanded well beyond mathematics and coding. The model is designed for tasks requiring deep reasoning, including expert-level problems in science, mathematics, and software development.

What distinguishes Grok 4 is its native tool use: it was trained to autonomously operate a code interpreter and web browser, selecting its own search queries to produce thorough answers. It also integrates real-time web search and X (Twitter) search, including keyword, semantic, and media search. A variant called Grok 4 Heavy runs multiple reasoning agents in parallel at inference time to handle the most demanding problems; according to xAI, Grok 4 Heavy was the first model to score above 50% on the Humanity's Last Exam benchmark. Grok 4 is available to SuperGrok and Premium+ subscribers on grok.com and through the xAI API.

Capabilities

What Grok 4 supports

Advanced Reasoning

Applies multi-step reasoning to expert-level problems in science, mathematics, and coding. The model was trained via reinforcement learning at scale on xAI's 200,000-GPU Colossus cluster.

Native Tool Use

Autonomously selects and operates tools such as a code interpreter and web browser, choosing its own search queries to construct thorough, grounded answers.

Real-Time Web Search

Integrates live web search and X (Twitter) search — including keyword, semantic, and media search — to retrieve up-to-date information during a response.

Long Context Window

Supports a 256,000-token context window, enabling processing of lengthy documents, codebases, or multi-turn conversations in a single request.
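
As a rough illustration of what a 256,000-token window means in practice, the sketch below estimates whether a piece of text is likely to fit in a single request. The 4-characters-per-token ratio, the reserved output budget, and the function names are all assumptions made for this example; they do not reflect Grok 4's actual tokenizer, and the sketch assumes that output tokens share the same window as the prompt.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # context window size listed for Grok 4
CHARS_PER_TOKEN = 4              # assumption: rough average for English text, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """Return True if the estimated prompt plus a reserved output budget fits the window.

    reserved_for_output is a hypothetical budget chosen for illustration.
    """
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

For an exact count, the estimate would need to be replaced with the token usage reported by the provider for a real request.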

Math & Science Problem Solving

Grok 4 Heavy achieves 61.9% on USAMO 2025 olympiad math proofs and 50.7% on the Humanity's Last Exam text-only subset, as reported by xAI.

Parallel Agent Reasoning

Grok 4 Heavy runs multiple reasoning agents simultaneously at test time, allowing it to tackle problems that benefit from parallel exploration of solution paths.
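
The general idea of parallel exploration can be sketched in a few lines. The example below is a generic illustration only: it runs several independent "agents" (plain Python callables standing in for model calls) on the same problem and returns the most common answer. The majority-vote step and the name solve_in_parallel are assumptions made for this sketch; the page does not describe how Grok 4 Heavy actually coordinates its agents or combines their results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def solve_in_parallel(problem: str, agents: Sequence[Callable[[str], str]]) -> str:
    """Run every agent on the same problem concurrently and return the most common answer.

    Each agent is a hypothetical stand-in for an independent reasoning run.
    At least one agent must be supplied.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of the agents in the result list
        answers = list(pool.map(lambda agent: agent(problem), agents))
    # Ties are broken in favor of the answer that appears first
    return Counter(answers).most_common(1)[0][0]
```

With three stand-in agents that return "42", "41", and "42", the function returns "42". A real system might instead compare or merge full reasoning traces rather than vote on final answers.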

Code Generation

Generates, debugs, and explains code across common programming languages, with access to a built-in code interpreter for execution and verification.

Agentic Task Execution

Handles multi-step autonomous tasks, achieving high scores on Vending-Bench, a benchmark designed to evaluate agentic decision-making over extended task sequences.

Performance

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark	What it tests	Score
MMLU-Pro	Expert knowledge across 14 academic disciplines	86.6%
GPQA Diamond	PhD-level science questions (biology, physics, chemistry)	87.7%
MATH-500	Undergraduate and competition-level math problems	99.0%
AIME 2024	American Invitational Mathematics Examination problems (2024)	94.3%
LiveCodeBench	Real-world coding tasks from recent competitions	81.9%
Humanity's Last Exam (HLE)	Questions that challenge frontier models across many domains	23.9%
SciCode	Scientific research coding and numerical methods	45.7%
AIME 2025	American Invitational Mathematics Examination problems (2025)	91.7%
SWE-bench Verified	Real GitHub issues requiring multi-file code fixes	69.1%

FAQ

Common questions about Grok 4

What is the context window for Grok 4?

Grok 4 supports a context window of 256,000 tokens, allowing it to process long documents, extended conversations, or large codebases in a single request.

How can I access Grok 4?

Grok 4 is available to SuperGrok and Premium+ subscribers on grok.com. Developers can also access it programmatically through the xAI API. Pricing details are listed on the xAI API Models & Pricing page.
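
To get a feel for API costs, the listed prices of $3.00 per million input tokens and $15.00 per million output tokens can be turned into a simple estimate. The sketch below assumes flat per-token pricing with no other terms; the function name is hypothetical, and the Models & Pricing page remains the authoritative source.

```python
INPUT_USD_PER_MTOK = 3.00    # listed input price per million tokens
OUTPUT_USD_PER_MTOK = 15.00  # listed output price per million tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars, assuming flat per-token pricing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000
```

For example, a request with 200,000 input tokens and 4,000 output tokens comes to about $0.60 for input plus $0.06 for output, or roughly $0.66 in total under these assumptions.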

What is Grok 4 Heavy and how does it differ from standard Grok 4?

Grok 4 Heavy is a variant that runs multiple reasoning agents in parallel at inference time, using additional compute to tackle the most demanding problems. It achieved 61.9% on USAMO 2025 and was the first model to exceed 50% on Humanity's Last Exam, according to xAI's published benchmarks.

What is the training data cutoff for Grok 4?

According to the listed metadata, Grok 4's training data extends to July 2025. Because the model also integrates real-time web search and X (Twitter) search, it can retrieve information from after that cutoff during a conversation.

What hardware was used to train Grok 4?

Grok 4 was trained on xAI's Colossus cluster, which consists of 200,000 GPUs. xAI reports a 6x improvement in compute efficiency compared to prior training runs.

What benchmarks has Grok 4 been evaluated on?

According to xAI, Grok 4 scored 15.9% on ARC-AGI V2, and its Grok 4 Heavy configuration scored 50.7% on the Humanity's Last Exam text-only subset and 61.9% on USAMO 2025 math proofs. Grok 4 also achieved high scores on Vending-Bench for agentic task performance.

Community Discussion

What people think about Grok 4

Community discussion ahead of Grok 4's release was largely anticipatory, with threads citing xAI engineers who claimed it would be a bigger jump over Grok 3 than Grok 3 was over Grok 2. After release, discussion focused on benchmark results, particularly its performance on Humanity's Last Exam and ARC-AGI V2.

A widely circulated thread raised concerns about content moderation after users reported the model producing inappropriate outputs, reflecting ongoing community scrutiny of safety guardrails. Discussions around Grok 4 Fast benchmarks and a rumored future Grok 4.20 release also generated engagement, indicating active interest in xAI's development roadmap.

r/singularity 330 pts 385 comments

xAI employee bragging about upcoming release of grok 4

r/singularity 257 pts 284 comments

xAI Engineer: "Grok 4 is coming, and its going to be a bigger jump from grok 3 than grok 3 was from 2."

r/singularity 237 pts 98 comments

xAI releases details and performance benchmarks for Grok 4 Fast

r/singularity 138 pts 88 comments

xAI to launch Grok 4.20 by Christmas

r/ChatGPT 5,222 pts 217 comments

Grok says its surname is Hitler

Resources