Video Generation Model

Wan 2.2

Wan 2.2 is an open-source video generation model from Alibaba's Tongyi Lab that uses a pioneering Mixture-of-Experts architecture to deliver cinematic-quality text-to-video and image-to-video results.

Publisher: Wan
Type: Video
Context Window: 1,000 tokens
Training Data: July 2025
Price: Free per video
Provider: WaveSpeed
Inputs: LoRA, Source Image

MoE-based open-source text and image to video

Wan 2.2 is a multimodal video generation model developed by Alibaba's Tongyi Laboratory and released in July 2025 under the Apache 2.0 license. It is the first video diffusion model to apply a Mixture-of-Experts (MoE) architecture: a high-noise expert handles the early denoising steps that set overall layout and composition, while a low-noise expert handles the later steps that refine fine detail. The model supports both text-to-video and image-to-video generation, with native bilingual prompting in English and Chinese. It is available in a 5B-parameter variant suited to consumer hardware and a 14B-parameter variant for higher-quality output.
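The MoE split is easiest to picture as routing over the diffusion timestep. Below is a minimal PyTorch sketch under stated assumptions: the class name, the boundary timestep, and the expert interfaces are hypothetical placeholders, not Wan 2.2's actual implementation.

```python
import torch
import torch.nn as nn

class NoiseRoutedMoE(nn.Module):
    """Two-expert denoiser routed by diffusion timestep.

    Illustrative only: the boundary value and expert internals are
    placeholders, not Wan 2.2's released architecture.
    """

    def __init__(self, high_noise_expert: nn.Module,
                 low_noise_expert: nn.Module,
                 boundary_timestep: int = 500):
        super().__init__()
        self.high = high_noise_expert      # shapes global layout and composition
        self.low = low_noise_expert        # refines fine detail
        self.boundary = boundary_timestep  # hypothetical switch point

    def forward(self, latents: torch.Tensor, timestep: int,
                cond: torch.Tensor) -> torch.Tensor:
        # Early, high-noise steps route to the layout expert; late,
        # low-noise steps route to the detail expert. Only one expert
        # runs per step, so per-step compute stays that of one expert.
        expert = self.high if timestep >= self.boundary else self.low
        return expert(latents, timestep, cond)
```

A design note: because the router picks exactly one expert per denoising step, the model can carry more total parameters than it ever activates at once, which is the usual motivation for MoE in diffusion.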

Wan 2.2 was trained on a dataset expanded significantly from its predecessor, with image data increasing by 65.6% and video data by 83.2%. It includes a dedicated aesthetic fine-tuning stage informed by film industry standards, further refined through reinforcement learning to align with human visual preferences. Specialized modules — Wan-Animate and Wan-Move — allow users to animate a character from a single image or transfer motion from one video to another subject. The model is natively supported by ComfyUI and accepts LoRA adapters and source images as inputs alongside text prompts.

What Wan 2.2 supports

Text-to-Video Generation

Generates video clips from written text prompts, supporting both English and Chinese input natively. The 14B parameter variant targets higher visual fidelity while the 5B variant is optimized for consumer hardware.

Image-to-Video Generation

Animates a static reference image into a dynamic video clip using the I2V pipeline. Accepts an image URL as input alongside a text prompt to guide motion and style.
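As a concrete illustration of the I2V inputs, here is a hypothetical request payload. The field names (image_url, duration, and so on) are placeholders, not a documented WaveSpeed or MindStudio schema.

```python
# Hypothetical image-to-video request; field names are illustrative.
i2v_request = {
    "model": "wan-2.2",
    "prompt": "The dancer spins slowly as paper lanterns drift upward",
    "image_url": "https://example.com/reference.png",  # still frame to animate
    "resolution": "720p",   # per the parameters listed below
    "duration": 5,          # seconds
}
```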

LoRA Support

Accepts LoRA adapter weights to customize the model's visual style or subject matter without full retraining. LoRA inputs are specified directly in the generation request.
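A sketch of how LoRA entries might sit in a request body follows. The "path" and "scale" field names are illustrative; the three-adapter limit comes from the parameters section below.

```python
# Hypothetical request with LoRA adapters; at most 3 are accepted.
lora_request = {
    "model": "wan-2.2",
    "prompt": "A rainy neon street rendered in the adapter's house style",
    "loras": [
        {"path": "https://example.com/style-a.safetensors", "scale": 0.8},
        {"path": "https://example.com/film-grain.safetensors", "scale": 0.4},
    ],
}
```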

Character Animation

The Wan-Animate module animates a character from a single source image, producing a video with natural motion from a still photo.

Motion Transfer

The Wan-Move module transfers motion patterns from one video onto a different subject, enabling pose and movement replication across subjects.

Cinematic Aesthetic Control

Provides control over lighting, color grading, lens composition, and camera movement through text prompts. Aesthetic fine-tuning was informed by film industry standards and refined with reinforcement learning.
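For example, a prompt might stack these cinematic controls directly. The vocabulary the model responds to best is a matter of experimentation, so treat this as a starting point rather than a canonical recipe.

```python
# Illustrative cinematic prompt combining lighting, lens, grading,
# and camera-movement cues in one description.
prompt = (
    "A lone lighthouse at dusk, warm tungsten key light with teal shadows, "
    "anamorphic 35mm lens, shallow depth of field, slow dolly-in, "
    "light rain, filmic color grading"
)
```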

Seed-Based Reproducibility

Accepts a seed value as an input parameter, allowing users to reproduce identical outputs or systematically explore variations from a fixed starting point.
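A minimal sketch of both uses, assuming a hypothetical generate() client call (defined as a stub here so the example runs):

```python
def generate(prompt: str, seed: int, resolution: str = "720p") -> dict:
    """Stand-in for a provider's generation call (hypothetical)."""
    return {"prompt": prompt, "seed": seed, "resolution": resolution}

base_prompt = "A fox leaps over a frozen creek at dawn"

# Same prompt + same seed + same settings => reproducible output.
clip_a = generate(base_prompt, seed=42)
clip_b = generate(base_prompt, seed=42)  # matches clip_a

# Stepping the seed explores controlled variations from a fixed start.
variants = [generate(base_prompt, seed=s) for s in range(100, 104)]
```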

MoE Architecture

Uses a Mixture-of-Experts architecture that routes work between high-noise experts for layout and low-noise experts for detail refinement within a single diffusion model.

Ready to build with Wan 2.2?

Get Started Free

Common questions about Wan 2.2

What is the context window for Wan 2.2?

Wan 2.2 has a context window of 1,000 tokens, which governs the length and complexity of text prompts it can process in a single generation request.

What model sizes are available for Wan 2.2?

Wan 2.2 is available in two sizes: a 5B parameter version designed for efficient use on consumer hardware and a 14B parameter version intended for higher-quality output. Both are available on Hugging Face under the Apache 2.0 license.

Is Wan 2.2 free to use commercially?

Yes. Wan 2.2 is released under the Apache 2.0 license, which permits free commercial use. The model weights are publicly available on Hugging Face.

What input types does Wan 2.2 accept?

Wan 2.2 accepts text prompts, image URLs (for image-to-video generation), LoRA adapter weights, configurable generation options such as resolution and duration, and a seed value for reproducibility.

When was Wan 2.2 trained and released?

Wan 2.2 was released in July 2025 by Alibaba's Tongyi Laboratory. Its training data includes an image dataset 65.6% larger and a video dataset 83.2% larger than those used for its predecessor, Wan 2.1.

Does Wan 2.2 work with ComfyUI?

Yes. Wan 2.2 has native support in ComfyUI. Official tutorials and workflow documentation are available at docs.comfy.org.

What people think about Wan 2.2

Community reception to Wan 2.2 on Reddit has been notably positive, with users highlighting the model's motion quality and its compatibility with third-party tools like SVI 2.0 Pro and Time-to-Move. The most upvoted thread, with over 6,000 upvotes, showcases motion transfer results, while a thread about seamless long-form video generation at 1280x720 over 20 seconds attracted over 2,100 upvotes and nearly 400 comments.

Users frequently discuss Wan 2.2 in the context of open-source workflows and ComfyUI pipelines, with particular interest in the Wan-Animate module for single-image character animation. No significant technical limitations or quality concerns dominate the threads, though discussions reflect that generation times and hardware requirements remain relevant considerations for longer or higher-resolution outputs.


Parameters & options

Resolution (Select)
Default: 720p. Options: 720p, 480p.

Duration (Select)
Default: 5 seconds. Options: 5 seconds, 8 seconds.

LoRAs (LoRA)
Up to 3 LoRAs.

Negative Prompt (Text)
Description of what to exclude from the video.

Seed (Seed)
A specific value used to guide the randomness of the generation. Range: -1 to 2147483647.
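Taken together, a single generation request might bundle these options as follows. The field names are illustrative, but the values mirror the defaults and ranges listed above.

```python
# Hypothetical combined request using the documented options.
request = {
    "model": "wan-2.2",
    "prompt": "Handheld tracking shot through a crowded night market",
    "negative_prompt": "text overlays, watermark, distorted faces",
    "resolution": "720p",   # or "480p"
    "duration": 5,          # seconds; 5 or 8
    "loras": [              # up to 3 adapters
        {"path": "https://example.com/doc-style.safetensors", "scale": 0.7},
    ],
    "seed": 1234,           # -1 to 2147483647; -1 often means "random"
}
```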

Start building with Wan 2.2

No API keys required. Create AI-powered workflows with Wan 2.2 in minutes — free.