Video Generation Model

Grok Imagine

X.ai's fast, native text-to-video and image-to-video generation model with built-in audio, multiple aspect ratios, and flexible creative modes.

Publisher: X.ai

Type: Video

Context Window: 5,000 tokens

Training Data: August 2025

Price: $0.05/second

About Grok Imagine

Text and image to video with native audio

Grok Imagine Video is a video generation model developed by X.ai, capable of converting text prompts or static images into short video clips with synchronized audio. It launched in August 2025 and reached a major 1.0 release in February 2026. The model runs on X.ai's proprietary Aurora autoregressive engine, trained on 110,000 NVIDIA GB200 GPUs, and generates 720p video at 24 fps with clip lengths between 6 and 15 seconds.

What sets Grok Imagine Video apart is its built-in audio generation, which produces character dialogue, background music, and sound effects alongside the visuals without requiring separate post-production. It supports seven aspect ratios — including 16:9, 9:16, and 1:1 — and offers three creative modes: Normal, Fun, and Spicy. Generation typically completes in around 30 seconds, making it well suited for social media creators, marketers, and content teams that need fast turnaround on short-form video.

Capabilities

What Grok Imagine supports

Text-to-Video

Generates short video clips from a text prompt, producing 720p output at 24 fps with clip lengths ranging from 6 to 15 seconds.

Image-to-Video

Animates a static input image into a video clip, accepting image URLs as a direct input type.

Native Audio Generation

Automatically generates synchronized audio — including dialogue, background music, and sound effects — as part of the video output without separate editing.

Multiple Aspect Ratios

Supports seven aspect ratios (16:9, 9:16, 4:3, 3:4, 2:3, 3:2, and 1:1), chosen through a select-type input.
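
As a rough illustration of what these ratios could mean in pixels, the sketch below derives a frame size for each one. It assumes that "720p" means the shorter side is 720 pixels; this page does not state the exact output dimensions, so the real sizes may differ. The function names are hypothetical helpers, not part of any X.ai API.

```python
from fractions import Fraction

# Aspect ratios listed on this page.
SUPPORTED_RATIOS = ("16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "1:1")


def parse_ratio(label: str) -> Fraction:
    """Turn a label such as "16:9" into width/height as an exact fraction."""
    if label not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {label!r}")
    width, height = label.split(":")
    return Fraction(int(width), int(height))


def frame_size(label: str, short_side: int = 720) -> tuple[int, int]:
    """Return (width, height) in pixels.

    Assumption: the shorter side is `short_side` pixels (our reading of "720p").
    """
    ratio = parse_ratio(label)
    if ratio >= 1:
        # Landscape or square: height is the short side.
        return round(short_side * ratio), short_side
    # Portrait: width is the short side.
    return short_side, round(short_side / ratio)


if __name__ == "__main__":
    for label in SUPPORTED_RATIOS:
        print(label, frame_size(label))
```

Under that assumption, 16:9 works out to 1280 by 720 and 9:16 to 720 by 1280.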

Creative Mode Selection

Offers three generation modes — Normal, Fun, and Spicy — allowing users to tune tone and content style per request.

Fast Generation Speed

Produces video clips in approximately 30 seconds per generation, enabling high-volume content workflows.
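
To get a feel for what roughly 30 seconds per generation means for a batch, the sketch below estimates throughput. The 30-second figure comes from this page; the idea that several requests can run in parallel without slowing each other down is an assumption, since nothing here describes concurrency or rate limits.

```python
import math


def clips_per_hour(seconds_per_clip: float = 30.0, parallel_requests: int = 1) -> int:
    """Estimate how many clips finish in one hour.

    Assumption: every generation takes `seconds_per_clip` and parallel
    requests do not slow each other down.
    """
    if seconds_per_clip <= 0 or parallel_requests < 1:
        raise ValueError("seconds_per_clip must be > 0 and parallel_requests >= 1")
    return math.floor(3600 / seconds_per_clip) * parallel_requests


def batch_wall_time(num_clips: int, seconds_per_clip: float = 30.0,
                    parallel_requests: int = 1) -> float:
    """Estimate the wall-clock seconds needed to generate `num_clips` clips."""
    if num_clips < 0 or seconds_per_clip <= 0 or parallel_requests < 1:
        raise ValueError("invalid batch parameters")
    rounds = math.ceil(num_clips / parallel_requests)
    return rounds * seconds_per_clip


if __name__ == "__main__":
    print(clips_per_hour())                           # one request at a time
    print(batch_wall_time(100, parallel_requests=4))  # 100 clips, 4 at a time
```

With one request at a time this gives about 120 clips per hour; actual generation time will vary with load and clip length.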

Video URL Input

Accepts video URLs as a direct input type, enabling workflows that reference or build on existing video assets.

FAQ

Common questions about Grok Imagine

What is the context window for Grok Imagine Video?

The model has a context window of 5,000 tokens, which governs the length and detail of text prompts it can process.

What resolution and frame rate does the model output?

Grok Imagine Video generates clips at 720p resolution and 24 frames per second. It does not currently support 1080p or 4K output.

How long are the video clips it produces?

Generated clips range from 6 to 15 seconds in length.
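
Combined with the 24 fps frame rate, that range fixes the number of frames in a clip. The sketch below does the arithmetic; whether every duration between 6 and 15 seconds can actually be requested is an assumption, and `frame_count` is a hypothetical helper.

```python
FPS = 24                            # frame rate stated on this page
MIN_SECONDS, MAX_SECONDS = 6, 15    # clip length range stated on this page


def frame_count(duration_seconds: float, fps: int = FPS) -> int:
    """Return the number of frames in a clip of the given length."""
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    return round(duration_seconds * fps)


if __name__ == "__main__":
    for seconds in (MIN_SECONDS, 10, MAX_SECONDS):
        print(seconds, "s ->", frame_count(seconds), "frames")
```

A 6-second clip therefore holds 144 frames and a 15-second clip holds 360.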

Where can I find pricing information for this model?

This listing gives a price of $0.05/second. Full pricing details are available on the X.ai models and pricing page at https://docs.x.ai/developers/models.
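
For budgeting, the $0.05/second price listed on this page can be turned into a per-clip estimate. The sketch below assumes billing is per second of generated video with no minimum charge, rounding, or extra fees; none of that is confirmed here, so treat the numbers as estimates and check the pricing page for the actual terms.

```python
from decimal import Decimal
from typing import Iterable

# USD per second, taken from the listing on this page.
PRICE_PER_SECOND = Decimal("0.05")


def clip_cost(duration_seconds: int) -> Decimal:
    """Estimate the cost of one clip.

    Assumption: billing is per second of generated video, with no minimums.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return PRICE_PER_SECOND * duration_seconds


def batch_cost(durations: Iterable[int]) -> Decimal:
    """Estimate the total cost of several clips."""
    return sum((clip_cost(d) for d in durations), Decimal("0"))


if __name__ == "__main__":
    print(clip_cost(6))              # shortest clip
    print(clip_cost(15))             # longest clip
    print(batch_cost([6, 10, 15]))   # a small mixed batch
```

Under these assumptions a clip costs between $0.30 (6 seconds) and $0.75 (15 seconds).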

What is the training data cutoff for Grok Imagine Video?

The model's metadata lists its training data date as August 2025.

What input types does the model accept?

The model accepts image URLs, video URLs, select inputs (for options like aspect ratio and creative mode), and numeric inputs.
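
A client could check these inputs before sending a request. The sketch below is illustrative only: the field names (`prompt`, `image_url`, `video_url`, `aspect_ratio`, `mode`, `duration_seconds`), the lowercase mode values, and the guess that the numeric input is a clip duration are all assumptions, not the documented X.ai API. Only the allowed ratios, the three mode names, and the 6 to 15 second range come from this page.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "1:1"}
MODES = {"normal", "fun", "spicy"}  # hypothetical lowercase spellings


def _is_http_url(url: str) -> bool:
    """Accept only absolute http(s) URLs."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


@dataclass
class VideoRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    image_url: Optional[str] = None
    video_url: Optional[str] = None
    aspect_ratio: str = "16:9"
    mode: str = "normal"
    duration_seconds: int = 6  # assumed meaning of the numeric input

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if self.mode not in MODES:
            raise ValueError(f"unsupported mode: {self.mode!r}")
        if not 6 <= self.duration_seconds <= 15:
            raise ValueError("duration_seconds must be between 6 and 15")
        for name in ("image_url", "video_url"):
            url = getattr(self, name)
            if url is not None and not _is_http_url(url):
                raise ValueError(f"{name} must be an http(s) URL")

    def to_payload(self) -> dict:
        """Validate, then build a plain dict that omits unset URLs."""
        self.validate()
        payload = {
            "prompt": self.prompt,
            "aspect_ratio": self.aspect_ratio,
            "mode": self.mode,
            "duration_seconds": self.duration_seconds,
        }
        if self.image_url:
            payload["image_url"] = self.image_url
        if self.video_url:
            payload["video_url"] = self.video_url
        return payload


if __name__ == "__main__":
    request = VideoRequest(prompt="a lighthouse at dusk", aspect_ratio="9:16")
    print(request.to_payload())
```

Validating locally catches an unsupported ratio or a malformed URL before any generation is attempted.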

Community Discussion

What people think about Grok Imagine

Community discussion around Grok Imagine Video has been generally positive, with users noting its entry into the public API as a notable milestone and discussing its placement on benchmark leaderboards.

Some commenters have focused on its speed and accessibility relative to other video generation tools, while others have raised questions about output quality and use cases for short-form content creation.

r/singularity, 104 pts, 46 comments

xAI Grok Imagine enters public API as major benchmarks update leaderboards today

Resources