Video Generation Model

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p clips with fully synchronized audio — dialogue, ambient sound, and music — all in a single step.

Publisher: Wan
Type: Video
Context Window: 2,000 tokens
Training Data Cutoff: September 2025
Price: Free/second
Provider: WaveSpeed
Inputs: Source Image, Source Audio

Open-source video generation with synchronized audio

Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It generates videos up to 10 seconds long at resolutions ranging from 480p to 1080p HD, with native 4K available in preview, all rendered at 24 frames per second. The model's defining characteristic is its ability to generate audio and video simultaneously in a single step — producing character dialogue with lip-sync, environmental ambient sounds, and background music directly from a text or image prompt, without requiring separate post-production audio work. It supports multiple input modes including text-to-video, image-to-video, audio-to-video, and video-to-video refinement.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need production-ready video with synchronized audio. It supports cinematic camera controls such as dolly, tracking, and crane movements, as well as lighting styles, depth of field, and particle effects like rain and fire. The model handles photorealistic, anime, illustrated, and stylized visual aesthetics, and processes prompts in at least 8 languages with matching audio generation. Its open-source nature makes it accessible for local deployment and integration into custom pipelines.

What Wan 2.5 supports

Text-to-Video

Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24fps.

Image-to-Video

Animates a source image into a video clip, using the provided image URL as the visual starting point for generation.

Synchronized Audio Generation

Produces dialogue with lip-sync, ambient environmental sounds, and background music in a single generation step alongside the video.

Cinematic Camera Controls

Supports named camera movements including dolly, tracking, and crane shots, as well as depth of field and color grading settings specified in the prompt.

Multilingual Prompt Input

Accepts prompts in at least 8 languages and generates matching audio output in the corresponding language.

Seed Control

Accepts a seed value as an input parameter, allowing reproducible generation results for a given prompt and settings combination.

Style Flexibility

Handles photorealistic, anime, illustrated, and other stylized visual aesthetics based on prompt instructions.

Video-to-Video Refinement

Accepts an existing video as input and applies prompt-guided modifications or style changes to produce a refined output.

Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which governs the length and detail of the text prompt it can process for a single generation request.

What video resolutions and durations does Wan 2.5 support?

Wan 2.5 generates videos at 480p, 720p, or 1080p HD resolutions, with native 4K available in preview. Videos can be up to 10 seconds long at 24 frames per second, although the Duration parameter listed on this page offers only 5-second and 8-second options.
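
As a quick check on what those numbers mean in practice, the frame count of a clip follows directly from its duration and the stated 24 fps. The sketch below is only illustrative arithmetic: the helper name `frame_count` is hypothetical and is not part of any Wan 2.5 or MindStudio API.

```python
FPS = 24  # frame rate stated for Wan 2.5 output


def frame_count(duration_seconds: int, fps: int = FPS) -> int:
    """Return the number of frames in a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return duration_seconds * fps


# 5 and 8 seconds are the selectable durations; 10 seconds is the stated maximum.
for seconds in (5, 8, 10):
    print(f"{seconds} s -> {frame_count(seconds)} frames")
```

At 24 fps, a 5-second clip is 120 frames, an 8-second clip is 192 frames, and a 10-second clip is 240 frames.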

Does Wan 2.5 generate audio automatically, or does it require a separate step?

Audio generation is native and simultaneous — dialogue with lip-sync, ambient sounds, and background music are all produced in a single generation step alongside the video, with no separate post-production required.

What input types does Wan 2.5 accept?

Wan 2.5 accepts text prompts, image URLs (for image-to-video), audio inputs, selectable configuration parameters (resolution and duration), a negative prompt, and a seed value for reproducible outputs.

Is Wan 2.5 open source, and when was it trained?

Yes, Wan 2.5 is open source and was developed by Alibaba's DAMO Academy. Its training data has a cutoff of September 2025.

What languages does Wan 2.5 support for prompts?

Wan 2.5 processes prompts in at least 8 languages and generates audio output that matches the language used in the prompt.

What people think about Wan 2.5

Community discussion around Wan 2.5 on r/StableDiffusion has been largely enthusiastic, with users expressing strong interest in the model's video quality and its native audio generation capability. Several threads accumulated hundreds of upvotes, reflecting significant anticipation for the model's open-weight release.

A recurring concern in the threads is the availability of open weights, with multiple posts specifically calling for or awaiting the release of downloadable model files for local use. Some discussion also touched on how Wan 2.5 relates to or replaces other expected releases in the Wan model line, such as VACE 2.2.

Parameters & options

Resolution (Select)
Default: 720p
Options: 1080p, 720p, 480p

Duration (Select)
Default: 5 seconds
Options: 5 seconds, 8 seconds

Negative Prompt (Text)
Description of what to exclude from the video.

Seed (Seed)
A specific value that is used to guide the 'randomness' of the generation.
Range: -1 to 2147483647
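
The options above can be checked before a generation request is submitted. The following is a minimal sketch, assuming the settings are collected in a plain Python object. The class name `WanSettings` and its field names are hypothetical and do not correspond to a documented MindStudio or WaveSpeed API; only the allowed values, defaults, and seed range come from the list above. The page does not say what a seed of -1 means, so the sketch only checks that the seed falls within the stated range.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the parameter list above.
RESOLUTIONS = ("480p", "720p", "1080p")
DURATIONS_SECONDS = (5, 8)
SEED_MIN, SEED_MAX = -1, 2147483647


@dataclass
class WanSettings:
    """Hypothetical settings container; not an official API type."""
    resolution: str = "720p"       # default from the parameter list
    duration: int = 5              # default from the parameter list, in seconds
    negative_prompt: str = ""      # what to exclude from the video
    seed: Optional[int] = None     # None means no seed was supplied

    def validate(self) -> None:
        """Raise ValueError if any field is outside the listed options."""
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        if self.duration not in DURATIONS_SECONDS:
            raise ValueError(f"duration must be one of {DURATIONS_SECONDS}")
        if self.seed is not None and not SEED_MIN <= self.seed <= SEED_MAX:
            raise ValueError(f"seed must be between {SEED_MIN} and {SEED_MAX}")


settings = WanSettings(resolution="1080p", duration=8, seed=42)
settings.validate()  # passes silently for in-range values
```

Validating locally like this catches an out-of-range seed or an unsupported duration before a request is sent, which can be useful when settings come from user input.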
