Video Generation Model

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p clips with fully synchronized audio — dialogue, ambient sound, and music — all in a single step.


Publisher Wan

Type Video

Context Window 2,000 tokens

Training Data Cutoff September 2025

Price $0.05-$0.15/second

Provider WaveSpeed

Source Image, Source Audio
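
As a rough guide to the listed price of $0.05-$0.15 per second, the sketch below estimates the cost range for a single clip. It assumes billing scales linearly with clip duration; the page does not say what determines where in the range a given request falls, so the function returns both bounds. The function and constant names are illustrative only.

```python
from decimal import Decimal

# Listed price range per second of generated video (from the figures above).
PRICE_PER_SECOND_LOW = Decimal("0.05")
PRICE_PER_SECOND_HIGH = Decimal("0.15")
MAX_DURATION_SECONDS = 10  # maximum clip length stated for Wan 2.5


def estimate_cost_range(duration_seconds: int) -> tuple[Decimal, Decimal]:
    """Return (low, high) cost in USD, ASSUMING billing is linear in duration."""
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    return (
        PRICE_PER_SECOND_LOW * duration_seconds,
        PRICE_PER_SECOND_HIGH * duration_seconds,
    )


low, high = estimate_cost_range(10)
print(f"10-second clip: ${low:.2f} to ${high:.2f}")
```

Under that linear assumption, a maximum-length 10-second clip would cost between $0.50 and $1.50.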


About Wan 2.5

Open-source video generation with synchronized audio

Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It generates videos up to 10 seconds long at resolutions ranging from 480p to 1080p HD, with native 4K available in preview, all rendered at 24 frames per second. The model's defining characteristic is its ability to generate audio and video simultaneously in a single step — producing character dialogue with lip-sync, environmental ambient sounds, and background music directly from a text or image prompt, without requiring separate post-production audio work. It supports multiple input modes including text-to-video, image-to-video, audio-to-video, and video-to-video refinement.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need production-ready video with synchronized audio. It supports cinematic camera controls such as dolly, tracking, and crane movements, as well as lighting styles, depth of field, and particle effects like rain and fire. The model handles photorealistic, anime, illustrated, and stylized visual aesthetics, and processes prompts in at least 8 languages with matching audio generation. Its open-source nature makes it accessible for local deployment and integration into custom pipelines.

Capabilities

What Wan 2.5 supports

Text-to-Video

Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24fps.

Image-to-Video

Animates a source image into a video clip, using the image at the provided URL as the visual starting point for generation.

Synchronized Audio Generation

Produces dialogue with lip-sync, ambient environmental sounds, and background music in a single generation step alongside the video.

Cinematic Camera Controls

Supports named camera movements including dolly, tracking, and crane shots, as well as depth of field and color grading settings specified in the prompt.

Multilingual Prompt Input

Accepts prompts in at least 8 languages and generates matching audio output in the corresponding language.

Seed Control

Accepts a seed value as an input parameter, allowing reproducible generation results for a given prompt and settings combination.

Style Flexibility

Handles photorealistic, anime, illustrated, and other stylized visual aesthetics based on prompt instructions.

Video-to-Video Refinement

Accepts an existing video as input and applies prompt-guided modifications or style changes to produce a refined output.
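
To show how the inputs described above might fit together, here is a minimal sketch that assembles and validates a generation request before it is sent. The field names (`prompt`, `resolution`, `duration_seconds`, `image_url`, `audio_url`, `seed`) and the default values are hypothetical; the actual provider API may name or structure them differently. Only the limits (480p, 720p, or 1080p; up to 10 seconds) come from this page, and no network call is made.

```python
from typing import Optional

# Limits taken from the capability descriptions above.
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_DURATION_SECONDS = 10


def build_request(
    prompt: str,
    resolution: str = "720p",      # default is an assumption
    duration_seconds: int = 5,     # default is an assumption
    image_url: Optional[str] = None,
    audio_url: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict:
    """Assemble a request payload; every field name here is hypothetical."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )

    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "duration_seconds": duration_seconds,
    }
    # Optional inputs are included only when the caller supplies them.
    if image_url is not None:
        payload["image_url"] = image_url
    if audio_url is not None:
        payload["audio_url"] = audio_url
    if seed is not None:
        payload["seed"] = seed
    return payload


request = build_request(
    "Slow dolly shot through a rainy neon street",
    resolution="1080p",
    duration_seconds=10,
    seed=42,
)
print(request)
```

Validating on the client side catches out-of-range settings before a paid request is made, and passing the same seed with the same prompt and settings is what the Seed Control capability relies on for reproducible results.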

FAQ

Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which governs the length and detail of the text prompt it can process for a single generation request.

What video resolutions and durations does Wan 2.5 support?

Wan 2.5 generates videos at 480p, 720p, or 1080p HD resolutions, with native 4K available in preview. Videos can be up to 10 seconds long at 24 frames per second.

Does Wan 2.5 generate audio automatically, or does it require a separate step?

Audio generation is native and simultaneous — dialogue with lip-sync, ambient sounds, and background music are all produced in a single generation step alongside the video, with no separate post-production required.

What input types does Wan 2.5 accept?

Wan 2.5 accepts text prompts, image URLs (for image-to-video), audio inputs, configuration options chosen from fixed lists of values, and a seed value for reproducible outputs.

Is Wan 2.5 open source, and when was it trained?

Yes, Wan 2.5 is open source and was developed by Alibaba's DAMO Academy. Its training data has a cutoff of September 2025.

What languages does Wan 2.5 support for prompts?

Wan 2.5 processes prompts in at least 8 languages and generates audio output that matches the language used in the prompt.
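
The duration and frame-rate figures in the answers above translate directly into frame counts. The short sketch below works through that arithmetic; the function name is illustrative, and the only inputs are the 24 fps frame rate and 10-second maximum stated on this page.

```python
FRAMES_PER_SECOND = 24     # frame rate stated for Wan 2.5
MAX_DURATION_SECONDS = 10  # maximum clip length stated for Wan 2.5


def frame_count(duration_seconds: int) -> int:
    """Number of frames in a clip of the given length at 24 fps."""
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    return duration_seconds * FRAMES_PER_SECOND


for seconds in (5, 10):
    print(f"{seconds} s -> {frame_count(seconds)} frames")
```

A maximum-length clip therefore contains 240 frames.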

Community Discussion

What people think about Wan 2.5

Community discussion around Wan 2.5 on r/StableDiffusion has been largely enthusiastic, with users expressing strong interest in the model's video quality and its native audio generation capability. Several threads accumulated hundreds of upvotes, reflecting significant anticipation for the model's open-weight release.

A recurring concern in the threads is the availability of open weights, with multiple posts specifically calling for or awaiting the release of downloadable model files for local use. Some discussion also touched on how Wan 2.5 relates to or replaces other expected releases in the Wan model line, such as VACE 2.2.

r/StableDiffusion 234 pts 219 comments

Wan 2.5

r/StableDiffusion 287 pts 132 comments

Ask nicely for Wan 2.5 to be open source

r/StableDiffusion 82 pts 106 comments

VACE 2.2 might not come instead WAN 2.5

r/StableDiffusion 92 pts 60 comments

There was a time when I used to wait for the release of a newly announced game or the next season of my favorite series — but now, more than anything in the world, I’m waiting for the open weights of Wan 2.5.


Resources