Z Image Turbo
Z-Image, from Alibaba's Tongyi Lab, is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.
Fast diffusion image generation with LoRA support
Z Image Turbo is a text-to-image generation model developed by Alibaba's Tongyi-MAI lab, built on a single-stream diffusion transformer architecture with 6 billion parameters. It is the distilled, few-step variant of the Z-Image foundation model, designed to produce high-quality images at faster inference speeds without significant quality degradation. The model incorporates a Reinforcement Learning from Human Feedback (RLHF) pipeline using DPO and GRPO stages to align outputs with human aesthetic preferences, and includes a built-in prompt enhancer with a reasoning chain to improve results from short or simple prompts.
Z Image Turbo accepts text prompts, source images, LoRA weights, and a seed value as inputs, making it suitable for both text-to-image and image editing workflows. Its training data infrastructure includes a Data Profiling Engine, Cross-modal Vector Engine, and a multi-level image captioning system covering OCR, world knowledge, and editing difference captions. The model is well-suited for creative professionals, developers building image generation pipelines, and researchers working with efficient diffusion transformer architectures.
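As a rough illustration of a basic text-to-image call, the sketch below posts a prompt and a seed to a hypothetical HTTP endpoint. The endpoint URL, field names, and response shape are assumptions for illustration, not the platform's documented API.

```python
import requests

# Hypothetical endpoint -- substitute the actual URL from your platform or workflow tool.
ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"

payload = {
    "prompt": "a lighthouse on a rocky coast at dusk, volumetric fog, 35mm photo",
    "width": 1024,   # assumed numeric parameters (dimensions, step counts) per the inputs list
    "height": 1024,
    "seed": 42,      # fixed seed so the run can be reproduced later
}

resp = requests.post(ENDPOINT, json=payload, timeout=120)
resp.raise_for_status()

# Assumes the response JSON carries a URL pointing at the generated image.
print(resp.json()["image_url"])
```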
What Z Image Turbo supports
Text-to-Image Generation
Generates images from text prompts using a single-stream diffusion transformer with 6 billion parameters, with a context window of up to 10,000 tokens for prompt input.
Fast Turbo Inference
Uses few-step distillation to reduce the number of diffusion steps required at inference time, enabling faster image generation with minimal quality loss.
LoRA Support
Accepts LoRA weight inputs directly, allowing users to apply fine-tuned character, style, or concept adapters on top of the base model.
Source Image Input
Accepts an existing image URL as a source input, enabling instruction-driven image editing via a dedicated continued training pipeline.
Prompt Enhancement
Includes a built-in prompt enhancer with a reasoning chain that expands short or simple prompts to improve output coherence and detail.
Seed-Based Reproducibility
Accepts a seed value as an explicit input, allowing users to reproduce identical outputs or make controlled variations across generation runs.
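A minimal sketch of seed-based reproducibility, reusing the same hypothetical endpoint and field names as above: repeating a request with an identical prompt and seed should return the same image, while changing only the seed gives a controlled variation.

```python
import requests

ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"  # hypothetical URL

def generate(prompt: str, seed: int) -> str:
    """Run one generation and return the (assumed) image URL from the response."""
    resp = requests.post(ENDPOINT, json={"prompt": prompt, "seed": seed}, timeout=120)
    resp.raise_for_status()
    return resp.json()["image_url"]

prompt = "an isometric cutaway of a tiny bakery, warm lighting"
first = generate(prompt, seed=1234)    # same prompt + same seed -> identical output
second = generate(prompt, seed=1234)
variant = generate(prompt, seed=1235)  # change only the seed for a controlled variation
```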
RLHF Alignment
Trained with a Reinforcement Learning from Human Feedback pipeline using DPO and GRPO stages to align generated images with human aesthetic preferences.
Common questions about Z Image Turbo
What is the context window for Z Image Turbo?
Z Image Turbo has a context window of 10,000 tokens, which applies to the text prompt input used to guide image generation.
What inputs does Z Image Turbo accept?
The model accepts a text prompt, a source image URL, LoRA weights, numeric parameters (such as dimensions or step counts), and a seed value for reproducibility.
What is the difference between Z Image and Z Image Turbo?
Z Image is the full foundation model with 6 billion parameters. Z Image Turbo is a distilled variant trained to produce comparable quality images in fewer diffusion steps, resulting in faster inference times.
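To make that trade-off concrete, the hypothetical request bodies below differ only in their step budget: the distilled Turbo variant targets a handful of diffusion steps, while the base foundation model would typically run many more. The "num_steps" parameter name and the specific values are illustrative assumptions, not published defaults.

```python
# Illustrative only: "num_steps" is an assumed parameter name, and the values
# are placeholders rather than documented defaults for either model.
prompt = "macro shot of dew beading on a spider web at sunrise"

turbo_request = {"model": "z-image-turbo", "prompt": prompt, "num_steps": 8}
base_request = {"model": "z-image", "prompt": prompt, "num_steps": 30}
```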
Who developed Z Image Turbo and when was it trained?
Z Image Turbo was developed by Alibaba's Tongyi-MAI lab. The training data has a cutoff of November 2024.
Does Z Image Turbo support image editing, or only text-to-image generation?
The model supports both text-to-image generation and image editing. Image editing is enabled through a dedicated continued training pipeline that accepts a source image URL alongside an instruction prompt.
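As a sketch of an instruction-driven edit under the same assumed API shape: a source image URL is supplied alongside the instruction prompt, and a strength value (see the parameters list below) controls how far the output may drift from the input. Field names here are assumptions.

```python
import requests

ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"  # hypothetical URL

edit_payload = {
    "prompt": "replace the overcast sky with a warm sunset, keep the buildings unchanged",
    "image_url": "https://example.com/street-photo.jpg",  # source image to edit
    "strength": 0.55,  # lower values stay closer to the source image
    "seed": 99,
}

resp = requests.post(ENDPOINT, json=edit_payload, timeout=120)
resp.raise_for_status()
print(resp.json()["image_url"])
```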
Can I use custom LoRA weights with Z Image Turbo?
Yes. The model accepts LoRA weights as a direct input, allowing you to apply fine-tuned adapters for specific characters, styles, or concepts during generation.
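A sketch of how up to three LoRA adapters might be attached to a request, assuming an illustrative "loras" field with a weights URL and a scale per adapter; the page only states that LoRA weights are accepted as a direct input and that up to three can be applied.

```python
# Illustrative request body only -- the "loras", "weights_url", and "scale"
# field names are assumptions, not documented parameters.
payload = {
    "prompt": "the character walking through a rain-soaked neon alley, cinematic framing",
    "loras": [
        {"weights_url": "https://example.com/loras/character-v2.safetensors", "scale": 0.9},
        {"weights_url": "https://example.com/loras/film-grain-style.safetensors", "scale": 0.6},
    ],
    "seed": 7,
}
```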
What people think about Z Image Turbo
Community discussions around Z Image Turbo are active and generally positive, with users frequently praising its prompt-following accuracy and image quality relative to other locally runnable models. Threads comparing it to models like FLUX Dev, Qwen-Image-2512, and GLM-Image have attracted hundreds of upvotes, indicating strong interest in how it performs in direct side-by-side evaluations.
A recurring use case in community threads is LoRA training and character consistency, with users sharing comparisons of character LoRAs trained on Z Image Turbo versus other models. Some discussions also focus on post-processing pipelines that combine Z Image Turbo with upscaling tools like SeedVR2 and face enhancement, suggesting users sometimes find the base output benefits from additional refinement steps.
Z-Image-Turbo vs Qwen Image 2512
🚀 ⚡ Z-Image-Turbo-Boosted 🔥 — One-Click Ultra-Clean Images (SeedVR2 + FlashVSR + Face Upscale + Qwen-VL)
Same prompt, same seed, 6 models — Chroma vs Flux Dev vs Qwen vs Klein 4B vs Z-Image Turbo vs SDXL
Comparison: Trained the same character LoRAs on Z-Image Turbo vs Qwen 2512
[Pt2] Local Comparison: GLM-Image vs Flux.2 Dev vs Z-Image Turbo vs Qwen-Image-2512, All BF16
Documentation & links
Parameters & options
LoRA weights: Up to 3 LoRAs can be applied per generation.
Strength: Controls how strongly the output is transformed when an input image is provided. Higher values produce outputs that differ more from the input image.
Negative prompt: Describes what to exclude from the generated image.
Seed: A specific value used to guide the 'randomness' of the generation, so runs can be reproduced.
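Tying these parameters together, the illustrative payload below combines a negative prompt with a moderate strength for an image-guided run; as elsewhere on this page, the field names ("negative_prompt", "strength", "image_url") are assumptions rather than confirmed parameter names.

```python
payload = {
    "prompt": "studio product shot of a ceramic teapot on slate, softbox lighting",
    "negative_prompt": "text, watermark, extra handles, blurry edges",  # what to exclude
    "image_url": "https://example.com/teapot-sketch.png",  # optional source image
    "strength": 0.7,  # higher -> output departs further from the source image
    "seed": 2024,
}
```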
Explore similar models
Start building with Z Image Turbo
No API keys required. Create AI-powered workflows with Z Image Turbo in minutes — free.