Text Generation Model

Qwen3.5 Omni Flash

Qwen's latest omni-modal model supporting text, image, video, and audio understanding with 10+ hours of audio comprehension, 60+ input languages, and fast, fluent multimodal interaction.

Publisher: Qwen
Type: Text
Context Window: 256,000 tokens
Training Data: March 2026
Input: $0.40/MTok
Output: $2.20/MTok
Provider: Alibaba Cloud
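
The listed rates make a per-request cost easy to estimate. The following is a minimal sketch, assuming every token (including any tokens derived from audio or video input) is billed at the listed per-million rates; the function name and constants are illustrative and are not part of any MindStudio or Alibaba Cloud API.

```python
# Rates and limits copied from the model card above.
INPUT_USD_PER_MTOK = 0.40
OUTPUT_USD_PER_MTOK = 2.20
CONTEXT_WINDOW_TOKENS = 256_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: input and output tokens are billed linearly at the
    listed rates, with no minimum charge or modality surcharge.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def fits_context_window(input_tokens: int, output_tokens: int) -> bool:
    """Check the request against the 256,000-token context window.

    Assumption: input and output share the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(100_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost 5.5 times as much as input tokens, so long responses dominate the bill more quickly than long prompts.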

**Qwen3.5 Omni Flash** is the latest generation of Qwen's multimodal large model, supporting text, image, audio, and audio-visual understanding and interaction. As a comprehensive evolution of Qwen3-Omni, it represents a major step forward in unified multimodal AI.

### Key Capabilities

- **Extended audio understanding**: Supports over 10 hours of audio understanding in a single session
- **Audio-visual comprehension**: Handles over 400 seconds of 720P (1 FPS) audio-visual understanding and dialogue
- **Broad language coverage**: Accepts audio input in 60+ languages and can produce speech output in 30+ languages
- **Structured audio-visual understanding**: Extracts structured information from multimedia content
- **Function calling & structured output**: Supports tool use, structured (JSON) outputs, and web search
- **Flexible output modalities**: Can respond with text alone or with combined text and audio

### Best Use Cases

Qwen3.5 Omni Flash is well-suited for text creation, voice assistants, multimedia analysis, video/audio content understanding, and any application requiring natural, fluent multimodal understanding and interaction.

This version is functionally equivalent to the snapshot `qwen3.5-omni-flash-2026-03-15`.
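
The audio-visual figure quoted in the description (over 400 seconds of 720P video at 1 FPS) can be turned into a rough pre-flight check. The sketch below assumes the model samples one frame per second and treats 400 seconds as a conservative documented floor, since the exact upper limit is not stated; the helper names are hypothetical.

```python
# Figures taken from the capability description above.
DOCUMENTED_VIDEO_SECONDS = 400  # "over 400 seconds" is stated; true cap unknown
SAMPLE_FPS = 1                  # frames per second quoted for 720P input


def sampled_frame_count(duration_seconds: float, fps: float = SAMPLE_FPS) -> int:
    """Number of whole frames sampled from a clip at the given rate."""
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    return int(duration_seconds * fps)


def within_documented_floor(duration_seconds: float) -> bool:
    """True when the clip is no longer than the documented 400 seconds.

    A False result does not mean the clip is unsupported, only that it
    exceeds the figure the description explicitly states.
    """
    return duration_seconds <= DOCUMENTED_VIDEO_SECONDS


if __name__ == "__main__":
    for seconds in (90.5, 400, 600):
        print(seconds, sampled_frame_count(seconds), within_documented_floor(seconds))
```

Clips longer than the documented figure could be split into segments before submission, though whether that is necessary depends on the actual limit.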

Parameters & options

Max Temperature: 2
Max Response Size: 64,000 tokens
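
These two limits can be enforced client-side before a request is sent. The following is a minimal sketch, assuming a minimum temperature of 0 and a minimum response size of 1 token (neither is stated on this page); the setting names `temperature` and `max_tokens` are illustrative and may not match the names used by MindStudio or the provider.

```python
# Upper bounds copied from the parameters listed above.
MAX_TEMPERATURE = 2.0
MAX_RESPONSE_TOKENS = 64_000

# Lower bounds are assumptions, not documented values.
MIN_TEMPERATURE = 0.0
MIN_RESPONSE_TOKENS = 1


def clamp_settings(temperature: float, max_tokens: int) -> dict:
    """Clamp generation settings into the documented ranges."""
    return {
        "temperature": min(max(temperature, MIN_TEMPERATURE), MAX_TEMPERATURE),
        "max_tokens": min(max(max_tokens, MIN_RESPONSE_TOKENS), MAX_RESPONSE_TOKENS),
    }


if __name__ == "__main__":
    # Out-of-range values are pulled back to the nearest allowed value.
    print(clamp_settings(3.5, 100_000))
    print(clamp_settings(0.7, 1024))
```

Clamping silently is a design choice; a stricter caller might prefer to raise an error so that out-of-range settings are noticed.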

Start building with Qwen3.5 Omni Flash

No API keys required. Create AI-powered workflows with Qwen3.5 Omni Flash in minutes — free.