MindStudio
Video Generation Model

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p video clips with fully synchronized audio — including dialogue, ambient sound, and music — all in a single step.

Publisher: Wan
Type: Image
Context Window: 2,000 tokens
Training Data: September 2025
Price: Free/image
Provider: WaveSpeed
Source: Image

Open-source video generation with synchronized audio

Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It produces video clips up to 10 seconds long at resolutions up to 1080p, and generates synchronized audio — including dialogue with lip-sync, ambient sound effects, and background music — alongside the visuals in a single generation step. The model accepts text prompts, still images, audio tracks, or existing video clips as input, and supports cinematic controls such as camera movement types, lighting styles, and depth of field specified directly in the prompt.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need video output with accompanying audio without separate post-production workflows. It supports prompts and generated dialogue in at least 8 languages, and offers 480p, 720p, and 1080p as standard output resolutions with native 4K available in preview. Compared to its predecessor Wan 2.2, this version doubles the maximum video duration from 5 to 10 seconds, raises the standard resolution from 720p to 1080p, and introduces the audio generation system as an entirely new feature.

What Wan 2.5 supports

Image-to-Video

Animates a source image into a video clip up to 10 seconds long at resolutions up to 1080p. Accepts image URLs as direct input.

Text-to-Video

Generates video clips from natural language prompts, supporting cinematic controls like dolly shots, crane movements, and color grading specified inline.

Synchronized Audio Generation

Produces dialogue with lip-sync, environmental sound effects, and background music simultaneously with the video in a single generation step.

Multilingual Prompting

Accepts prompts and generates dialogue across at least 8 languages, enabling localized video content without separate translation workflows.

Seed Control

Accepts a numeric seed value to make generations reproducible, allowing consistent outputs when iterating on a prompt.
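The idea behind seed control can be illustrated with a small sketch. It does not call Wan 2.5; it uses Python's standard `random` module as a stand-in for a model's internal noise source, and `sample_noise` is a hypothetical helper introduced only for this illustration. The point is that the same seed yields the same pseudo-random starting values, which is what makes a generation repeatable.

```python
import random


def sample_noise(seed: int, n: int = 4) -> list[float]:
    """Return n pseudo-random values from a generator seeded with `seed`.

    Stand-in for a model's noise source; not part of any Wan 2.5 API.
    """
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.random() for _ in range(n)]


first_run = sample_noise(1234)
second_run = sample_noise(1234)   # same seed as first_run
other_seed = sample_noise(1235)   # different seed

print(first_run == second_run)    # same seed: the two lists match
print(first_run == other_seed)    # different seed: the lists differ
```

In practice this means a prompt can be reworded while the seed stays fixed, so that changes in the output come from the prompt rather than from fresh randomness.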

Resolution Selection

Supports 480p, 720p, and 1080p as standard output resolutions, with native 4K available in preview, configurable via numeric parameters.


Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which applies to the text prompt input used to guide video generation.

What input types does Wan 2.5 accept?

Wan 2.5 accepts image URL arrays, text prompts, numeric parameters (such as resolution and duration settings), and a seed value for reproducibility.
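As a rough illustration of how those inputs might be grouped into a single request, the sketch below assembles them into a plain dictionary. The function `build_request` and every field name (`prompt`, `image_urls`, `width`, `height`, `negative_prompt`, `seed`) are assumptions made for this example, not a documented Wan 2.5 or provider schema. The 1024 defaults are taken from the parameter listing on this page.

```python
from typing import Optional


def build_request(
    prompt: str,
    image_urls: list[str],
    width: int = 1024,
    height: int = 1024,
    negative_prompt: str = "",
    seed: Optional[int] = None,
) -> dict:
    """Assemble a hypothetical generation request from the listed input types."""
    payload = {
        "prompt": prompt,                # text prompt guiding the generation
        "image_urls": list(image_urls),  # array of source image URLs
        "width": width,                  # numeric parameter
        "height": height,                # numeric parameter
    }
    # Optional inputs are included only when the caller supplies them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


request = build_request(
    prompt="A slow dolly shot through a rainy street at night",
    image_urls=["https://example.com/street.png"],  # placeholder URL
    seed=42,
)
print(sorted(request))
```

Leaving optional fields out of the payload when they are not set keeps the request minimal and lets whatever service receives it apply its own defaults.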

Does Wan 2.5 generate audio as well as video?

Yes. Wan 2.5 generates synchronized audio — including dialogue with lip-sync, ambient sound effects, and background music — alongside the video in a single generation step, with no separate audio recording or post-production required.

What resolutions does Wan 2.5 support?

Standard output resolutions are 480p, 720p, and 1080p. Native 4K output is available in preview.

What is the training data cutoff for Wan 2.5?

According to the available metadata, Wan 2.5's training data date is listed as September 2025.

Is Wan 2.5 open source?

Wan 2.5 is described as an open-source model developed by Alibaba's DAMO Academy. Reddit discussion from around the time of its announcement shows that the availability of open weights was a topic of active interest.

What people think about Wan 2.5

Reddit discussions around Wan 2.5 in the r/StableDiffusion community show considerable enthusiasm, with threads accumulating hundreds of upvotes and comments shortly after the model's announcement. Users frequently praised the model's video quality and the introduction of native audio generation as notable additions to the open-source video generation landscape.

A recurring concern in the community was the availability of open weights, with multiple threads specifically dedicated to requesting or anticipating their release. Some discussion also touched on how Wan 2.5 relates to other models in the Wan lineage, including speculation about whether VACE 2.2 would be superseded by this release.


Parameters & options

Width (Number)
Default: 1024. Range: 768 to 1440.

Height (Number)
Default: 1024. Range: 768 to 1440.

Negative Prompt (Text)
Description of what to exclude from the video.

Seed (Seed)
A specific value that is used to guide the 'randomness' of the generation. Range: -1 to 2147483647.
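The ranges listed above can be checked before a request is sent. The following is a minimal sketch, assuming integer inputs; `validate_params` is a hypothetical helper, not part of any published SDK. Using -1 as the default seed is also an assumption: the listing gives -1 as the lower bound but does not state what that value means or what the default is.

```python
# Inclusive ranges copied from the parameter listing.
RANGES = {
    "width": (768, 1440),
    "height": (768, 1440),
    "seed": (-1, 2147483647),
}


def validate_params(width: int = 1024, height: int = 1024, seed: int = -1) -> list[str]:
    """Return a list of human-readable errors; an empty list means all values fit."""
    errors = []
    values = {"width": width, "height": height, "seed": seed}
    for name, value in values.items():
        low, high = RANGES[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer, got {value!r}")
        elif not low <= value <= high:
            errors.append(f"{name}={value} is outside {low} to {high}")
    return errors


print(validate_params())                    # defaults fall inside every range
print(validate_params(width=500, seed=-2))  # two values fall outside their ranges
```

Returning all errors at once, rather than raising on the first one, lets a caller report every out-of-range value in a single pass.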
