Image Generation Model

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p video clips with fully synchronized audio — including dialogue, ambient sound, and music — all in a single step.


Publisher Wan

Type Image

Context Window 2,000 tokens

Training Data September 2025

Price $0.03/image

Provider WaveSpeed

Source Image


About Wan 2.5

Open-source video generation with synchronized audio

Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It produces video clips up to 10 seconds long at resolutions up to 1080p, and generates synchronized audio — including dialogue with lip-sync, ambient sound effects, and background music — alongside the visuals in a single generation step. The model accepts text prompts, still images, audio tracks, or existing video clips as input, and supports cinematic controls such as camera movement types, lighting styles, and depth of field specified directly in the prompt.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need video output with accompanying audio without separate post-production workflows. It supports prompts and generated dialogue in at least 8 languages, and offers 480p, 720p, and 1080p as standard output resolutions with native 4K available in preview. Compared to its predecessor Wan 2.2, this version doubles the maximum video duration from 5 to 10 seconds, raises the standard resolution from 720p to 1080p, and introduces the audio generation system as an entirely new feature.

Capabilities

What Wan 2.5 supports

Image-to-Video

Animates a source image into a video clip up to 10 seconds long at resolutions up to 1080p. Accepts image URLs as direct input.

Text-to-Video

Generates video clips from natural language prompts, supporting cinematic controls like dolly shots, crane movements, and color grading specified inline.
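
As an illustration of specifying cinematic controls inline, the sketch below assembles a prompt string from a scene description plus optional camera, lighting, and grading phrases. The function name `build_prompt` and its parameters are hypothetical helpers, not part of any Wan 2.5 interface; the model only receives the resulting natural-language text, and the phrasing it responds to best is not documented on this page.

```python
def build_prompt(scene, camera=None, lighting=None, grading=None):
    """Join a scene description with optional cinematic directions.

    All parameter names are illustrative; the model only sees the final string.
    """
    # Drop a trailing period so the directions can be appended with commas.
    parts = [scene.strip().rstrip(".")]
    if camera:
        parts.append(f"{camera} shot")
    if lighting:
        parts.append(f"{lighting} lighting")
    if grading:
        parts.append(f"{grading} color grading")
    return ", ".join(parts) + "."


prompt = build_prompt(
    "A lighthouse keeper climbs a spiral staircase",
    camera="slow dolly",
    lighting="warm lantern",
    grading="teal and orange",
)
print(prompt)
```

Keeping the directions as separate arguments makes it easy to vary one control at a time while the scene description stays fixed.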

Synchronized Audio Generation

Produces dialogue with lip-sync, environmental sound effects, and background music simultaneously with the video in a single generation step.

Multilingual Prompting

Accepts prompts and generates dialogue across at least 8 languages, enabling localized video content without separate translation workflows.

Seed Control

Accepts a numeric seed value to make generations reproducible, allowing consistent outputs when iterating on a prompt.

Resolution Selection

Supports 480p, 720p, and 1080p as standard output resolutions, with native 4K available in preview, configurable via numeric parameters.
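
A minimal sketch of validating a resolution label before submitting a job might look like the following. The pixel dimensions are the conventional 16:9 sizes for each label and are an assumption; the page does not state the exact frame sizes Wan 2.5 produces, and they may vary with aspect ratio. The 4K preview mode is left out because its parameters are not described here.

```python
# Assumed 16:9 frame sizes for each standard label; actual output sizes
# may differ depending on aspect ratio or provider settings.
STANDARD_RESOLUTIONS = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def frame_size(label):
    """Return (width, height) for a standard resolution label."""
    try:
        return STANDARD_RESOLUTIONS[label]
    except KeyError:
        raise ValueError(f"unsupported resolution: {label!r}") from None


width, height = frame_size("1080p")
print(width, height)
```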


FAQ

Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which applies to the text prompt input used to guide video generation.
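
To guard against overlong prompts, a client could run a rough length check before sending a request. The sketch below uses a whitespace word count as a crude stand-in for tokens; the tokenizer Wan 2.5 uses is not documented on this page, so the real count may be higher or lower, and the helper names are hypothetical.

```python
MAX_PROMPT_TOKENS = 2000  # context window stated on this page


def estimate_tokens(prompt):
    """Crude proxy: count whitespace-separated words.

    The real tokenizer is not documented here, so the true count may differ.
    """
    return len(prompt.split())


def fits_context(prompt, limit=MAX_PROMPT_TOKENS):
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(prompt) <= limit


print(fits_context("A red kite drifts over a quiet harbor at dawn."))
```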

What input types does Wan 2.5 accept?

Wan 2.5 accepts text prompts, arrays of image URLs, numeric parameters (such as resolution and duration settings), and a seed value for reproducibility.
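
The input types listed above could be combined into a single request body along these lines. This is a sketch under stated assumptions: the field names (`prompt`, `image_urls`, `resolution`, `duration`, `seed`) and the `build_request` helper are hypothetical, and the provider's actual request schema should be checked before use. Only the 10-second maximum and the three standard resolutions come from this page.

```python
import json

ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_DURATION_SECONDS = 10  # maximum clip length stated on this page


def build_request(prompt, image_urls=None, resolution="1080p", duration=5, seed=None):
    """Assemble a JSON-serializable request body (field names are hypothetical)."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_SECONDS}] seconds")
    body = {"prompt": prompt, "resolution": resolution, "duration": duration}
    if image_urls:
        # Image-to-video: pass source images as a list of URLs.
        body["image_urls"] = list(image_urls)
    if seed is not None:
        # A fixed seed is what makes a generation reproducible.
        body["seed"] = int(seed)
    return body


payload = build_request(
    "The statue slowly turns its head toward the camera",
    image_urls=["https://example.com/frame.png"],
    duration=10,
    seed=42,
)
print(json.dumps(payload, indent=2))
```

Reusing the same seed while editing only the prompt is one way to compare prompt variations under otherwise identical settings.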

Does Wan 2.5 generate audio as well as video?

Yes. Wan 2.5 generates synchronized audio — including dialogue with lip-sync, ambient sound effects, and background music — alongside the video in a single generation step, with no separate audio recording or post-production required.

What resolutions does Wan 2.5 support?

Standard output resolutions are 480p, 720p, and 1080p. Native 4K output is available in preview.

What is the training data cutoff for Wan 2.5?

The available metadata lists September 2025 as the training data date for Wan 2.5. No more precise cutoff is given.

Is Wan 2.5 open source?

Wan 2.5 is described as an open-source model developed by Alibaba's DAMO Academy. Community discussion on Reddit indicates that open weights availability was a topic of active interest around the time of its announcement.

Community Discussion

What people think about Wan 2.5

Reddit discussions around Wan 2.5 in the r/StableDiffusion community show considerable enthusiasm, with threads accumulating hundreds of upvotes and comments shortly after the model's announcement. Users frequently praised the model's video quality and the introduction of native audio generation as notable additions to the open-source video generation landscape.

A recurring concern in the community was the availability of open weights, with multiple threads specifically dedicated to requesting or anticipating their release. Some discussion also touched on how Wan 2.5 relates to other models in the Wan lineage, including speculation about whether VACE 2.2 would be superseded by this release.

r/StableDiffusion 234 pts 219 comments

Wan 2.5

r/StableDiffusion 290 pts 132 comments

Ask nicely for Wan 2.5 to be open source

r/StableDiffusion 84 pts 106 comments

VACE 2.2 might not come instead WAN 2.5

r/StableDiffusion 91 pts 60 comments

There was a time when I used to wait for the release of a newly announced game or the next season of my favorite series — but now, more than anything in the world, I’m waiting for the open weights of Wan 2.5.


Resources