Wan 2.5
Alibaba's open-source AI video model that generates cinematic 1080p clips with fully synchronized audio — dialogue, ambient sound, and music — all in a single step.
Open-source video generation with synchronized audio
Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It generates videos up to 10 seconds long at resolutions ranging from 480p to 1080p HD, with native 4K available in preview, all rendered at 24 frames per second. The model's defining characteristic is its ability to generate audio and video simultaneously in a single step — producing character dialogue with lip-sync, environmental ambient sounds, and background music directly from a text or image prompt, without requiring separate post-production audio work. It supports multiple input modes including text-to-video, image-to-video, audio-to-video, and video-to-video refinement.
Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need production-ready video with synchronized audio. It supports cinematic camera controls such as dolly, tracking, and crane movements, as well as lighting styles, depth of field, and particle effects like rain and fire. The model handles photorealistic, anime, illustrated, and stylized visual aesthetics, and processes prompts in at least 8 languages with matching audio generation. Its open-source nature makes it accessible for local deployment and integration into custom pipelines.
What Wan 2.5 supports
Text-to-Video
Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24fps.
Image-to-Video
Animates a source image into a video clip, using the provided image URL as the visual starting point for generation.
Synchronized Audio Generation
Produces dialogue with lip-sync, ambient environmental sounds, and background music in a single generation step alongside the video.
Cinematic Camera Controls
Supports named camera movements including dolly, tracking, and crane shots, as well as depth of field and color grading settings specified in the prompt.
Multilingual Prompt Input
Accepts prompts in at least 8 languages and generates matching audio output in the corresponding language.
Seed Control
Accepts a seed value as an input parameter, allowing reproducible generation results for a given prompt and settings combination.
Style Flexibility
Handles photorealistic, anime, illustrated, and other stylized visual aesthetics based on prompt instructions.
Video-to-Video Refinement
Accepts an existing video as input and applies prompt-guided modifications or style changes to produce a refined output.
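As a rough illustration of the limits listed above (480p, 720p, or 1080p; up to 10 seconds; 24fps), the sketch below checks a set of clip settings and works out how many frames a clip would contain. The `ClipSettings` class and its method names are hypothetical helpers written for this example and are not part of any official Wan 2.5 SDK.

```python
from dataclasses import dataclass

# Values taken from the feature list above; the 4K preview mode is left out on purpose.
SUPPORTED_RESOLUTIONS = ("480p", "720p", "1080p")
MAX_DURATION_SECONDS = 10
FRAMES_PER_SECOND = 24


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for clip settings (not an official Wan 2.5 type)."""

    resolution: str
    duration_seconds: float

    def validate(self) -> None:
        # Reject anything outside the documented resolution and duration limits.
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError("duration must be greater than 0 and at most 10 seconds")

    def frame_count(self) -> int:
        # Frames = duration in seconds multiplied by the fixed 24fps frame rate.
        self.validate()
        return round(self.duration_seconds * FRAMES_PER_SECOND)


if __name__ == "__main__":
    settings = ClipSettings(resolution="1080p", duration_seconds=10)
    print(settings.frame_count())  # 240
```

A full-length 10-second clip at 24fps therefore contains 240 frames, which is a useful upper bound when estimating render time or storage.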
Common questions about Wan 2.5
What is the context window for Wan 2.5?
Wan 2.5 has a context window of 2,000 tokens, which governs the length and detail of the text prompt it can process for a single generation request.
What video resolutions and durations does Wan 2.5 support?
Wan 2.5 generates videos at 480p, 720p, or 1080p HD resolutions, with native 4K available in preview. Videos can be up to 10 seconds long at 24 frames per second.
Does Wan 2.5 generate audio automatically, or does it require a separate step?
Audio generation is native and simultaneous — dialogue with lip-sync, ambient sounds, and background music are all produced in a single generation step alongside the video, with no separate post-production required.
What input types does Wan 2.5 accept?
Wan 2.5 accepts text prompts, image URLs (for image-to-video), audio inputs, configuration options chosen from preset values, and a seed value for reproducible outputs.
Is Wan 2.5 open source, and when was it trained?
Yes, Wan 2.5 is open source and was developed by Alibaba's DAMO Academy. Its training data has a cutoff of September 2025.
What languages does Wan 2.5 support for prompts?
Wan 2.5 processes prompts in at least 8 languages and generates audio output that matches the language used in the prompt.
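To make the input types from the questions above more concrete, here is a minimal sketch of how a request might be assembled before being sent to whatever service hosts the model. The function `build_request` and every field name in the payload (`mode`, `prompt`, `image_url`, `audio_url`, `seed`) are assumptions made for illustration; the actual request schema depends on the platform being used.

```python
from typing import Any, Dict, Optional


def build_request(
    prompt: str,
    image_url: Optional[str] = None,
    audio_url: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a hypothetical request payload; field names are illustrative only."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")

    # An image URL switches the request from text-to-video to image-to-video.
    mode = "image-to-video" if image_url is not None else "text-to-video"
    payload: Dict[str, Any] = {"mode": mode, "prompt": prompt.strip()}

    # Optional inputs are only included when the caller supplies them.
    if image_url is not None:
        payload["image_url"] = image_url
    if audio_url is not None:
        payload["audio_url"] = audio_url
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    print(build_request("A crane shot over a rainy street at night", seed=7))
```

Leaving unset inputs out of the payload, rather than sending empty values, keeps the request unambiguous about which input mode is intended.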
What people think about Wan 2.5
Community discussion around Wan 2.5 on r/StableDiffusion has been largely enthusiastic, with users expressing strong interest in the model's video quality and its native audio generation capability. Several threads accumulated hundreds of upvotes, reflecting significant anticipation for the model's open-weight release.
A recurring concern in the threads is the availability of open weights, with multiple posts specifically calling for or awaiting the release of downloadable model files for local use. Some discussion also touched on how Wan 2.5 relates to or replaces other expected releases in the Wan model line, such as VACE 2.2.
Wan 2.5
Ask nicely for Wan 2.5 to be open source
VACE 2.2 might not come instead WAN 2.5
There was a time when I used to wait for the release of a newly announced game or the next season of my favorite series — but now, more than anything in the world, I’m waiting for the open weights of Wan 2.5.
Parameters & options
Negative prompt
Description of what to exclude from the video.
Seed
A value that steers the randomness of the generation. Reusing the same seed with the same prompt and settings gives reproducible results.
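The role of a seed can be shown with a small analogy. The sketch below uses Python's standard `random` module, not Wan 2.5 itself, to draw "noise" values from a seeded generator. The helper `sample_noise` is hypothetical and only illustrates the general idea that a fixed seed makes a random process repeatable.

```python
import random
from typing import List


def sample_noise(seed: int, size: int = 4) -> List[float]:
    """Draw `size` Gaussian values from a generator initialised with `seed`."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    # Same seed: the two draws are identical.
    print(sample_noise(42) == sample_noise(42))
    # Different seed: the draws differ.
    print(sample_noise(42) == sample_noise(43))
```

In the same way, keeping the seed fixed while editing the prompt makes it easier to see which differences in the output come from the prompt change rather than from random variation.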