Wan 2.7
Alibaba's unified video generation model supporting text-to-video, image-to-video, reference-to-video, and natural-language video editing — with multi-shot storytelling, native audio, and up to 15 seconds of 1080p output.
**Wan 2.7** is Alibaba's latest video generation model family, available through the DashScope API. It unifies four video creation workflows in a single system: text-to-video, image-to-video, reference-to-video, and natural-language video editing. It also provides intelligent shot scheduling, native audio generation, and highly realistic dynamic rendering.

### Generation Modes

- **Text to Video**: Generates videos directly from text prompts. Intelligent shot scheduling supports multi-shot storytelling, producing narrative videos of up to 15 seconds with consistent subjects, scenes, and atmosphere, along with strong instruction following and improved visual fidelity.
- **Image to Video**: Animates a provided first-frame image guided by a text prompt. Supports stable multi-speaker dialogue with natural, realistic vocal timbres and multi-camera storytelling.
- **Reference to Video**: Uses a specified person or object as a reference and maintains consistency of appearance and voice. Multiple character references can be combined for joint performances.
- **Video Edit**: Performs localized or global edits to existing videos using natural-language prompts. Can replace elements using image references and replicate complex dynamic processes, including motion, special effects, and camera movements.
### Key Capabilities

- **Native audio generation**: Synchronized dialogue, vocals, and sound effects generated directly alongside the video, including lip-synced speech and singing
- **Multi-shot storytelling**: A single prompt can produce multi-scene narratives with automatic camera transitions and consistent characters across shots
- **Output quality**: Generates video at **720p or 1080p** resolution in 16:9 or 9:16 aspect ratios, with clips up to **15 seconds** long
- **Prompt enhancement**: Optional AI-powered prompt expansion for improved output quality

Wan 2.7 is well suited for content creators, marketers, filmmakers, and developers who need a single flexible model for generating new videos with audio from scratch, animating still images, maintaining consistent characters across shots, or making precise edits to existing footage.
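The output limits listed under Key Capabilities (720p or 1080p, 16:9 or 9:16, clips up to 15 seconds) can be checked on the client before a job is submitted. The following is a minimal sketch only: `OutputSettings`, its field names, and `validate_output` are hypothetical and are not part of the DashScope API, and the requirement that the duration be positive is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the capability list on this page.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_DURATION_SECONDS = 15


@dataclass(frozen=True)
class OutputSettings:
    """Hypothetical container for output options (names are illustrative)."""
    resolution: str
    aspect_ratio: str
    duration_seconds: float


def validate_output(settings: OutputSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if settings.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {settings.resolution}")
    if settings.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    # Assumption: a clip must have a positive length.
    if not 0 < settings.duration_seconds <= MAX_DURATION_SECONDS:
        problems.append(f"duration out of range: {settings.duration_seconds}s")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid option at once.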
### Parameters & options
- Image used as the first frame of the generated video.
- The video to edit. Describe your edits in the prompt using natural language instructions. Supports both localized and global edits.
- Provide reference images of people or objects to keep their appearance (and voice) consistent in the generated video. Reference them as "Character 1", "Character 2", etc. in the prompt. Supports multi-character joint performances.
- Optionally provide reference images to seamlessly replace elements in the video.
- Generate synchronized audio including dialogue, vocals, and sound effects alongside the video.
- Multi-shot uses intelligent shot scheduling to generate multi-shot narrative videos with consistent subjects, scenes, and atmosphere.
- Automatically rewrite and expand the prompt for improved output quality.
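One way to think about these options is that the supplied inputs determine which generation mode applies. The sketch below illustrates that idea under stated assumptions: `WanRequest`, its field names, the mode strings, and the precedence order used when several inputs are present are all hypothetical and do not come from the DashScope API. Only the "Character 1", "Character 2" prompt convention is taken from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WanRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    first_frame_image: Optional[str] = None   # image-to-video input
    source_video: Optional[str] = None        # video-edit input
    reference_images: List[str] = field(default_factory=list)
    audio: bool = False
    multi_shot: bool = False
    enhance_prompt: bool = False


def select_mode(req: WanRequest) -> str:
    """Pick a mode from the inputs present (precedence is an assumption)."""
    if req.source_video is not None:
        # Reference images may accompany an edit to replace elements.
        return "video-edit"
    if req.first_frame_image is not None:
        return "image-to-video"
    if req.reference_images:
        return "reference-to-video"
    return "text-to-video"


def missing_character_mentions(req: WanRequest) -> List[str]:
    """List the 'Character N' labels that the prompt never mentions."""
    missing = []
    for i in range(1, len(req.reference_images) + 1):
        label = f"Character {i}"
        # Word boundary so 'Character 1' does not match 'Character 10'.
        if not re.search(rf"\b{label}\b", req.prompt):
            missing.append(label)
    return missing
```

For example, a request with two reference images and the prompt "Character 1 waves at Character 2" would resolve to the reference-to-video mode with no missing labels.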