Text Generation Model

Qwen3.5 Omni Flash

Qwen's latest omni-modal model supporting text, image, video, and audio understanding with 10+ hours of audio comprehension, 60+ input languages, and fast, fluent multimodal interaction.

- Publisher: Qwen
- Type: Text
- Context Window: 256,000 tokens
- Training Data: March 2026
- Input: $0.40/MTok
- Output: $2.20/MTok
- Provider: Alibaba Cloud
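The listed per-million-token prices make it easy to estimate the cost of a request. The snippet below is a minimal sketch of that arithmetic, not an official billing formula. The function name `estimate_cost_usd` is hypothetical, and it assumes the two listed rates apply uniformly to every input and output token. Actual billing, especially for audio or video input, may differ.

```python
from decimal import Decimal

# Prices copied from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = Decimal("0.40")
OUTPUT_USD_PER_MTOK = Decimal("2.20")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: a flat rate per token for input and for output,
    with no modality-specific pricing, discounts, or minimum charges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 5,000-token reply.
    print(estimate_cost_usd(200_000, 5_000))
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding error when summed over many requests.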

Overview

**Qwen3.5 Omni Flash** is the latest generation of Qwen's multimodal large model, supporting text, image, audio, and audio-visual understanding and interaction. As a comprehensive evolution of Qwen3-Omni, it represents a major step forward in unified multimodal AI.

### Key Capabilities

- **Extended audio understanding**: Supports over 10 hours of audio understanding in a single session
- **Audio-visual comprehension**: Handles over 400 seconds of 720P (1 FPS) audio-visual understanding and dialogue
- **Broad language coverage**: Accepts audio input in 60+ languages and can produce speech output in 30+ languages
- **Structured audio-visual understanding**: Powerful capabilities for extracting structured information from multimedia content
- **Function calling & structured output**: Supports tool use, structured (JSON) outputs, and web search
- **Flexible output modalities**: Can respond with text alone or combined text and audio

### Best Use Cases

Qwen3.5 Omni Flash is well-suited for text creation, voice assistants, multimedia analysis, video/audio content understanding, and any application requiring natural, fluent multimodal understanding and interaction.

This version is functionally equivalent to the snapshot `qwen3.5-omni-flash-2026-03-15`.
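The audio-visual figure above (over 400 seconds at 1 FPS) implies a simple frame budget. The sketch below shows the arithmetic under stated assumptions. The helper names are hypothetical, the 400-second value is treated as a conservative threshold because the page only says "over 400 seconds", and the model's actual frame sampling may work differently.

```python
import math

# Figures taken from the capability list; used here as conservative thresholds.
DOCUMENTED_SECONDS = 400
DOCUMENTED_FPS = 1.0


def sampled_frames(duration_s: float, fps: float = DOCUMENTED_FPS) -> int:
    """Number of frames produced when sampling a clip at a fixed rate."""
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    return math.ceil(duration_s * fps)


def within_documented_length(duration_s: float) -> bool:
    """True if the clip is no longer than the documented 400-second figure."""
    return duration_s <= DOCUMENTED_SECONDS


if __name__ == "__main__":
    clip_seconds = 372.5  # hypothetical clip length
    print(sampled_frames(clip_seconds), within_documented_length(clip_seconds))
```

A check like this can run before upload to decide whether a longer recording should be split into segments first.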

Resources