Video Generation Model

Gemini Omni Flash

Google's high-performance multimodal model for fast video generation, conversational editing, and cinematic control with native audio synthesis.

- **Publisher**: Google
- **Type**: Video
- **Context Window**: 5,000 tokens
- **Training Data**: June 2026
- **Price**: Free/second
- **Tags**: Latest, Multimodal, Editing

**Gemini Omni Flash** is Google's high-performance multimodal model designed for high-speed video generation, editing, and cinematic control. Unlike traditional video models, Omni Flash is built on native multimodality, processing text, image, audio, and video simultaneously to deliver cohesive, consistent, and controllable output.

### Key Capabilities

- **Native Multimodality**: Processes text, image, audio, and video inputs simultaneously for richer, more coherent generations
- **Conversational Editing**: Iteratively refine and edit videos through natural language conversation using the Interactions API. The model preserves elements you don't mention while applying targeted edits
- **World Knowledge**: Combines an understanding of physics with Gemini's broad knowledge of history, science, and cultural context, bridging photorealism with meaningful storytelling
- **Text-to-Video**: Generate videos with synchronized audio from text descriptions, including scene details, camera movement, lighting, and mood
- **Image-to-Video**: Bring product shots, illustrations, or photographs to life with motion
- **Subject References**: Use reference images to incorporate specific subjects, characters, or styles into generated videos
- **Stateful Editing**: Build on previous generations using `previous_interaction_id` to maintain video context across turns
- **Video Upload Editing**: Upload your own videos via the Files API to apply transformations like style changes, object additions, or scene modifications
- **Timing Control**: Use natural language or timecode syntax (e.g., `[0-3s]`) to control event timing within videos
- **Text Rendering**: Render readable, correctly spelled text within generated videos

### Technical Details

All generated videos include **SynthID watermarking** for invisible provenance verification. The model supports English fully, with results varying for other languages. Aspect ratios supported are 16:9 (landscape) and 9:16 (portrait).

Gemini Omni Flash is ideal for creators, marketers, and developers who need fast, controllable video generation with the ability to iteratively refine results through conversation — without re-uploading or re-describing entire scenes.
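
The Stateful Editing and Timing Control capabilities described above can be pictured with a small sketch. It only builds request payloads as plain dictionaries and makes no network calls. Apart from `previous_interaction_id` and the `[0-3s]` timecode form, which come from the description, every name here (`timed_prompt`, `Turn`, the `input` key, the placeholder ID) is a hypothetical chosen for illustration and is not taken from the actual API.

```python
from dataclasses import dataclass
from typing import Optional


def timed_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Join (start_s, end_s, description) segments into one prompt string.

    Each segment gets a timecode prefix such as `[0-3s]`. Joining the
    segments with a single space is an assumption of this sketch.
    """
    parts = []
    for start, end, description in segments:
        if start < 0 or end <= start:
            raise ValueError(f"invalid segment: {start}-{end}")
        parts.append(f"[{start}-{end}s] {description}")
    return " ".join(parts)


@dataclass
class Turn:
    """One conversational turn (hypothetical shape, not the real API schema)."""
    prompt: str
    previous_interaction_id: Optional[str] = None

    def to_payload(self) -> dict:
        # The "input" key is an assumed name for the prompt field.
        payload = {"input": self.prompt}
        # Only follow-up turns carry the ID of the turn they build on.
        if self.previous_interaction_id is not None:
            payload["previous_interaction_id"] = self.previous_interaction_id
        return payload


# First turn: a fresh generation with two timed events.
first = Turn(timed_prompt([
    (0, 3, "A paper boat drifts down a rainy gutter."),
    (3, 6, "The camera tilts up to a neon sign."),
]))

# Follow-up turn: a targeted edit that mentions only what should change.
# "interaction-123" stands in for the ID a real response would return.
second = Turn("Make the neon sign blue.", previous_interaction_id="interaction-123")
```

The follow-up payload carries only the new instruction plus the earlier turn's ID, which mirrors the idea that the scene does not need to be described again on each edit.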

Parameters & options

- **Mode** (Select). Default: `text-to-video`. Options: Text to Video, Image to Video, Reference to Video, Edit Video
- **Aspect Ratio** (Toggle Group). Default: 16:9
- **Input Image** (Image URL). An image to use as the starting frame or guide for the video.
- **Reference Images** (Image URL Array). Provide reference images of subjects, styles, or objects to include in the video.
- **Input Video** (Video URL). Upload a video to edit.
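
As a rough illustration of how the options listed above relate to each other, the sketch below models them as a Python dataclass with a basic consistency check. Several things are assumptions: the mode identifiers other than `text-to-video` are inferred by analogy with the documented default, the field names are hypothetical, and the rule that each mode needs its matching input is inferred from the option descriptions rather than stated on the page.

```python
from dataclasses import dataclass, field
from typing import Optional

# Only "text-to-video" appears verbatim as a default; the other three
# identifiers are assumed spellings of the listed mode labels.
MODES = ("text-to-video", "image-to-video", "reference-to-video", "edit-video")
ASPECT_RATIOS = ("16:9", "9:16")  # landscape and portrait


@dataclass
class VideoOptions:
    """Hypothetical container for the page's parameters."""
    mode: str = "text-to-video"
    aspect_ratio: str = "16:9"
    input_image: Optional[str] = None          # Image URL
    reference_images: list[str] = field(default_factory=list)  # Image URL array
    input_video: Optional[str] = None          # Video URL

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if self.mode not in MODES:
            issues.append(f"unknown mode: {self.mode}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unsupported aspect_ratio: {self.aspect_ratio}")
        # Assumed pairing of each mode with the input it relies on.
        if self.mode == "image-to-video" and not self.input_image:
            issues.append("image-to-video mode needs input_image")
        if self.mode == "reference-to-video" and not self.reference_images:
            issues.append("reference-to-video mode needs reference_images")
        if self.mode == "edit-video" and not self.input_video:
            issues.append("edit-video mode needs input_video")
        return issues
```

For example, `VideoOptions().problems()` returns an empty list because the defaults are self-consistent, while `VideoOptions(mode="edit-video").problems()` reports the missing `input_video`.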

Start building with Gemini Omni Flash

No API keys required. Create AI-powered workflows with Gemini Omni Flash in minutes — free.