Image Generation Model

Qwen Image

Qwen-Image is a state-of-the-art image generation and editing model with exceptional text rendering capabilities, including complex Chinese character generation.


Publisher: Qwen

Type: Image

Context Window: 10,000 tokens

Training Data: August 2025

Price: $0.025/image

Provider: WaveSpeed

LoRA, Source Image
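Because pricing is per image, a batch cost estimate is a single multiplication. The following is a minimal sketch that assumes the listed $0.025/image rate applies uniformly to every image; the function name is hypothetical, and any volume discounts or provider-specific fees are not modeled.

```python
from decimal import Decimal

# Listed price per generated image, in USD (taken from the listing above).
PRICE_PER_IMAGE = Decimal("0.025")


def estimate_cost(num_images: int, price_per_image: Decimal = PRICE_PER_IMAGE) -> Decimal:
    """Return the estimated USD cost of generating `num_images` images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Decimal avoids the rounding noise that binary floats add to currency math.
    return price_per_image * num_images


if __name__ == "__main__":
    for n in (1, 40, 1000):
        print(f"{n:>5} images -> ${estimate_cost(n)}")
```

At the listed rate, 1,000 images come to $25.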


About Qwen Image

Image generation and editing with precise text rendering

Qwen-Image is an image generation and editing model developed by Alibaba's Qwen team. It accepts text prompts and source images as input and supports both text-to-image generation and a wide range of image editing tasks, including style transfer, object addition and removal, background changes, and pose manipulation. The model uses a dual-encoding architecture that processes images through both Qwen2.5-VL for semantic understanding and a VAE encoder for visual fidelity, feeding into an MMDiT backbone.

What distinguishes Qwen-Image from many other generation models is its ability to render complex text accurately within images, including multi-line layouts and logographic scripts such as Chinese characters. This capability is built using a curriculum learning strategy that progressively scales from simple to complex text rendering tasks during training. The model has been evaluated on benchmarks covering image generation, image editing, and text rendering, including GenEval, DPG, GEdit, LongText-Bench, ChineseWord, and CVTG-2K. It is well-suited for workflows that require accurate in-image typography, multilingual text, or detailed image editing from a source image.
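The curriculum idea described above, moving from simple to complex text rendering tasks as training progresses, can be illustrated with a toy sampling schedule. This is a sketch of the general technique only, not Qwen-Image's actual training recipe: the stage names, the equal-width unlock thresholds, and the uniform mixing over unlocked stages are all assumptions made for illustration.

```python
import random

# Hypothetical task stages, ordered from simple to complex text rendering.
STAGES = ["single_word", "single_line", "multi_line", "paragraph"]


def stage_weights(progress: float) -> list[float]:
    """Return sampling weights over STAGES for a training progress in [0, 1].

    Stage i unlocks once progress reaches i / len(STAGES). Earlier stages stay
    in the mix, and all unlocked stages are sampled uniformly.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    unlocked = min(len(STAGES), int(progress * len(STAGES)) + 1)
    return [1.0 / unlocked if i < unlocked else 0.0 for i in range(len(STAGES))]


def sample_stage(progress: float, rng: random.Random) -> str:
    """Pick the task stage for the next training example."""
    return rng.choices(STAGES, weights=stage_weights(progress), k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for progress in (0.0, 0.3, 0.6, 1.0):
        print(progress, stage_weights(progress), sample_stage(progress, rng))
```

Keeping the earlier stages in the sampling mix is one common design choice for curricula, since it limits forgetting of the simpler tasks; whether Qwen-Image does this is not stated in the source.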

Capabilities

What Qwen Image supports

Text-to-Image Generation

Generates images from text prompts across a wide range of artistic styles, evaluated on benchmarks including GenEval and DPG.

Image Editing

Edits a source image supplied through the imageUrl input, supporting style transfer, background changes, pose manipulation, and object addition, removal, and replacement.

Complex Text Rendering

Renders multi-line, paragraph-level, and logographic text (including Chinese characters) within generated images, benchmarked on LongText-Bench, ChineseWord, and CVTG-2K.

LoRA Support

Accepts LoRA adapters as an input parameter, allowing fine-tuned style or subject customization to be applied at inference time.

Seed Control

Accepts a numeric seed input to enable reproducible image outputs across generation runs.

Image Understanding Tasks

Supports detection, segmentation, depth estimation, novel view synthesis, and super-resolution as part of its unified architecture.
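For the LoRA support listed above, it may help to see what applying an adapter at inference time means in general. In the standard LoRA formulation, an adapter stores two small matrices B and A, and the adapted weight is W + scale * (B @ A). The sketch below shows that update on plain Python lists; the function names are hypothetical, and how the provider actually loads and applies adapters for Qwen-Image is not described in the source.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float = 1.0) -> Matrix:
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs.

    weight is (out x in), lora_b is (out x r), lora_a is (r x in), where the
    rank r is typically much smaller than out and in.
    """
    delta = matmul(lora_b, lora_a)
    if len(delta) != len(weight) or len(delta[0]) != len(weight[0]):
        raise ValueError("adapter shape does not match the base weight")
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0], [2.0]]  # rank-1 adapter, shape 2 x 1
    a = [[3.0, 4.0]]    # shape 1 x 2
    print(apply_lora(base, b, a, scale=0.5))
```

Setting scale to 0 returns the base weights unchanged, which is why adapter strength is usually exposed as a tunable parameter.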


FAQ

Common questions about Qwen Image

What is the context window for Qwen-Image?

The model has a context window of 10,000 tokens, as listed in the model metadata.

What input types does Qwen-Image accept?

Qwen-Image accepts a text prompt, an image URL pointing to a source image (for editing), numeric parameters such as output dimensions, LoRA adapter configurations, and a numeric seed for reproducibility.

What makes Qwen-Image's text rendering distinct?

The model uses a curriculum learning strategy that trains progressively from simple to complex text tasks, enabling accurate rendering of multi-line text and logographic scripts like Chinese characters within generated images.

What benchmarks has Qwen-Image been evaluated on?

Qwen-Image has been evaluated on GenEval and DPG for image generation; GEdit, ImgEdit, and GSO for image editing; and LongText-Bench, ChineseWord, and CVTG-2K for text rendering.

What is the training data cutoff for Qwen-Image?

The model metadata lists the training data date as August 2025.
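The input types listed in the FAQ above can be gathered into a single request body. The sketch below only builds and prints such a body; it does not call any service. Apart from imageUrl, which this page names, every field name here (prompt, seed, loras, path, scale, width, height) is an assumption made for illustration, so check the provider's API reference for the real schema.

```python
import json
from typing import Optional


def build_request(
    prompt: str,
    image_url: Optional[str] = None,
    seed: Optional[int] = None,
    loras: Optional[list] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> dict:
    """Assemble a request body for generation or editing (hypothetical schema)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"prompt": prompt}
    if image_url is not None:
        # A source image switches the request from generation to editing.
        payload["imageUrl"] = image_url
    if seed is not None:
        # Reusing the same seed with the same inputs is what enables reproducible output.
        payload["seed"] = int(seed)
    if loras:
        # Each adapter entry is assumed to carry a location and a strength.
        payload["loras"] = [
            {"path": entry["path"], "scale": float(entry.get("scale", 1.0))}
            for entry in loras
        ]
    if width is not None and height is not None:
        payload["width"], payload["height"] = int(width), int(height)
    return payload


if __name__ == "__main__":
    body = build_request(
        "A shop sign that reads 'OPEN 24 HOURS'",
        seed=42,
        width=1024,
        height=1024,
    )
    print(json.dumps(body, indent=2))
```

Omitting image_url yields a text-to-image request, while passing it yields an editing request against the given source image.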

Community Discussion

What people think about Qwen Image

Community reception for Qwen-Image has been notably positive, with users on r/StableDiffusion and r/LocalLLaMA praising its realism, text rendering accuracy, and unified generation and editing capabilities. A thread highlighting realism examples accumulated over 1,000 upvotes, and the release of version 2.0 generated significant discussion around its 7B parameter size and native 2K resolution support.

Some community members have asked about the availability of smaller or quantized variants, with one thread asking when a 7B version would be released. Content moderation behavior and NSFW output have also been discussed, and several threads reflect interest in running the model locally.

r/StableDiffusion 36 pts 85 comments

Qwen Image 2 is amazing, any idea when 7b is coming ?

r/StableDiffusion 130 pts 103 comments

Are we having another WAN moment with Qwen Image 2.0?

r/StableDiffusion 81 pts 66 comments

QWEN IMAGE EDIT 2511 can do (N)SFW by itself

r/StableDiffusion 1,012 pts 196 comments

Qwen-Image2512 is a severely underrated model (realism examples)

r/LocalLLaMA 522 pts 111 comments

Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering


Resources