Video Generation Model

Gemini Omni Flash

Google's high-performance multimodal model for fast video generation, conversational editing, and cinematic control with native audio synthesis.

Publisher: Google

Type: Video

Context Window: 5,000 tokens

Training Data: June 2026

Price: Free

Tags: Latest, Multimodal, Editing

Overview

**Gemini Omni Flash** is Google's high-performance multimodal model designed for fast video generation, editing, and cinematic control. Unlike traditional video models, Omni Flash is natively multimodal. It processes text, image, audio, and video together to produce cohesive, consistent, and controllable output.

### Key Capabilities

- **Native Multimodality**: Processes text, image, audio, and video inputs simultaneously for richer, more coherent generations.
- **Conversational Editing**: Iteratively refine and edit videos through natural-language conversation using the Interactions API. The model preserves elements you don't mention while applying targeted edits.
- **World Knowledge**: Combines an understanding of physics with Gemini's broad knowledge of history, science, and culture, pairing photorealism with meaningful storytelling.
- **Text-to-Video**: Generate videos with synchronized audio from text descriptions that specify scene details, camera movement, lighting, and mood.
- **Image-to-Video**: Bring product shots, illustrations, or photographs to life with motion.
- **Subject References**: Use reference images to incorporate specific subjects, characters, or styles into generated videos.
- **Stateful Editing**: Build on an earlier generation with `previous_interaction_id` to maintain video context across turns.
- **Video Upload Editing**: Upload your own videos through the Files API and apply transformations such as style changes, object additions, or scene modifications.
- **Timing Control**: Use natural language or timecode syntax (e.g., `[0-3s]`) to control when events occur within a video.
- **Text Rendering**: Render readable, correctly spelled text within generated videos.

### Technical Details

All generated videos include **SynthID watermarking**, an invisible watermark used for provenance verification. The model fully supports English. Output quality may vary for other languages. Supported aspect ratios are 16:9 (landscape) and 9:16 (portrait).
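The timecode syntax listed under Timing Control can also be assembled programmatically when a shot has several timed events. The sketch below is a hypothetical helper and is not part of any Google SDK. It assumes that segments are written in the same `[start-ends]` form as the `[0-3s]` example and uses whole-second boundaries. It also assumes that placing one segment per line after the scene description is an acceptable prompt layout. The `Segment` type and `build_timed_prompt` function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timed event in a shot (hypothetical helper type)."""
    start_s: int
    end_s: int
    description: str


def build_timed_prompt(scene: str, segments: list[Segment]) -> str:
    """Join a scene description with timecoded lines such as "[0-3s] ...".

    Assumes whole-second boundaries and the "[start-ends]" form from the
    Timing Control example. Segments must be ordered and non-overlapping.
    """
    lines = [scene.strip()]
    prev_end = 0
    for seg in segments:
        # Reject empty, reversed, or overlapping ranges before building text.
        if seg.start_s < prev_end or seg.end_s <= seg.start_s:
            raise ValueError(f"invalid or overlapping segment: {seg}")
        lines.append(f"[{seg.start_s}-{seg.end_s}s] {seg.description.strip()}")
        prev_end = seg.end_s
    return "\n".join(lines)


prompt = build_timed_prompt(
    "A lighthouse on a rocky coast at dusk, slow aerial push-in.",
    [
        Segment(0, 3, "Waves crash against the rocks."),
        Segment(3, 6, "The lighthouse beam sweeps across the water."),
    ],
)
print(prompt)
```

Validating ranges locally catches overlapping timecodes before any request is sent. A single-line prompt with the same bracketed timecodes may work just as well, because the line layout here is a readability choice rather than a documented requirement.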
Gemini Omni Flash is ideal for creators, marketers, and developers who need fast, controllable video generation and the ability to refine results iteratively through conversation, without re-uploading or re-describing entire scenes.
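Iterative refinement depends on threading `previous_interaction_id` from one turn to the next. It can be sketched as a small session object that remembers the last interaction ID and attaches it to the following request. This is a minimal sketch under stated assumptions and does not use any real Google SDK. `EditSession`, `FakeTransport`, the payload keys `model`, `input`, and `id`, and the model ID string are all hypothetical placeholders. Only `previous_interaction_id` comes from the model description. The transport is injected so that the sketch runs offline.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: takes a request payload, returns a response dict.
# A real version would wrap whatever client calls the Interactions API.
Transport = Callable[[dict[str, Any]], dict[str, Any]]


class EditSession:
    """Carries previous_interaction_id across successive edit turns.

    Assumption: responses expose the new interaction's ID under "id";
    the "model" and "input" payload keys are placeholders as well.
    """

    def __init__(self, send: Transport, model: str) -> None:
        self._send = send
        self._model = model
        self.last_id: Optional[str] = None

    def turn(self, instruction: str) -> dict[str, Any]:
        payload: dict[str, Any] = {"model": self._model, "input": instruction}
        if self.last_id is not None:
            # Continue from the previous generation instead of starting over.
            payload["previous_interaction_id"] = self.last_id
        response = self._send(payload)
        self.last_id = response["id"]
        return response


class FakeTransport:
    """Offline stand-in: records payloads and returns sequential IDs."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def __call__(self, payload: dict[str, Any]) -> dict[str, Any]:
        self.calls.append(payload)
        return {"id": f"interaction-{len(self.calls)}"}


transport = FakeTransport()
session = EditSession(transport, model="gemini-omni-flash")  # hypothetical ID
session.turn("A red vintage car driving along a coastal road at sunset.")
session.turn("Make the car blue; keep everything else the same.")
print(transport.calls[1]["previous_interaction_id"])  # interaction-1
```

The first request carries no `previous_interaction_id`. Each later request points at the turn before it, so the second instruction only needs to describe the change rather than the whole scene.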
