Z Image Turbo
Z-Image, from Alibaba's Tongyi Lab, is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.
Fast diffusion image generation with LoRA support
Z Image Turbo is a text-to-image generation model developed by Alibaba's Tongyi-MAI lab, built on a single-stream diffusion transformer architecture with 6 billion parameters. It is the distilled, few-step variant of the Z-Image foundation model, designed to produce high-quality images at faster inference speeds without significant quality degradation. The model incorporates a Reinforcement Learning from Human Feedback (RLHF) pipeline using DPO and GRPO stages to align outputs with human aesthetic preferences, and includes a built-in prompt enhancer with a reasoning chain to improve results from short or simple prompts.
Z Image Turbo accepts text prompts, source images, LoRA weights, and a seed value as inputs, making it suitable for both text-to-image and image editing workflows. Its training data infrastructure includes a Data Profiling Engine, Cross-modal Vector Engine, and a multi-level image captioning system covering OCR, world knowledge, and editing difference captions. The model is well-suited for creative professionals, developers building image generation pipelines, and researchers working with efficient diffusion transformer architectures.
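As a rough illustration of a basic text-to-image call, the sketch below posts a prompt and a seed to a hypothetical HTTP endpoint. The endpoint URL, field names, and response shape are assumptions for illustration, not the platform's documented API.

```python
import requests

# Hypothetical endpoint -- substitute the actual URL from your platform or workflow tool.
ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"

payload = {
    "prompt": "a lighthouse on a rocky coast at dusk, volumetric fog, 35mm photo",
    "width": 1024,   # assumed numeric parameters (dimensions, step counts) per the inputs list
    "height": 1024,
    "seed": 42,      # fixed seed so the run can be reproduced later
}

resp = requests.post(ENDPOINT, json=payload, timeout=120)
resp.raise_for_status()

# Assumes the response JSON carries a URL pointing at the generated image.
print(resp.json()["image_url"])
```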
What Z Image Turbo supports
Text-to-Image Generation
Generates images from text prompts using a single-stream diffusion transformer with 6 billion parameters, with a context window of up to 10,000 tokens for prompt input.
Fast Turbo Inference
Uses few-step distillation to reduce the number of diffusion steps required at inference time, enabling faster image generation with minimal quality loss.
LoRA Support
Accepts LoRA weight inputs directly, allowing users to apply fine-tuned character, style, or concept adapters on top of the base model.
Source Image Input
Accepts an existing image URL as a source input, enabling instruction-driven image editing via a dedicated continued training pipeline.
Prompt Enhancement
Includes a built-in prompt enhancer with a reasoning chain that expands short or simple prompts to improve output coherence and detail.
Seed-Based Reproducibility
Accepts a seed value as an explicit input, allowing users to reproduce identical outputs or make controlled variations across generation runs.
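A minimal sketch of seed-based reproducibility, reusing the same hypothetical endpoint and field names as above: repeating a request with an identical prompt and seed should return the same image, while changing only the seed gives a controlled variation.

```python
import requests

ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"  # hypothetical URL

def generate(prompt: str, seed: int) -> str:
    """Run one generation and return the (assumed) image URL from the response."""
    resp = requests.post(ENDPOINT, json={"prompt": prompt, "seed": seed}, timeout=120)
    resp.raise_for_status()
    return resp.json()["image_url"]

prompt = "an isometric cutaway of a tiny bakery, warm lighting"
first = generate(prompt, seed=1234)    # same prompt + same seed -> identical output
second = generate(prompt, seed=1234)
variant = generate(prompt, seed=1235)  # change only the seed for a controlled variation
```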
RLHF Alignment
Trained with a Reinforcement Learning from Human Feedback pipeline using DPO and GRPO stages to align generated images with human aesthetic preferences.
Common questions about Z Image Turbo
What is the context window for Z Image Turbo?
Z Image Turbo has a context window of 10,000 tokens, which applies to the text prompt input used to guide image generation.
What inputs does Z Image Turbo accept?
The model accepts a text prompt, a source image URL, LoRA weights, numeric parameters (such as dimensions or step counts), and a seed value for reproducibility.
What is the difference between Z Image and Z Image Turbo?
Z Image is the full foundation model with 6 billion parameters. Z Image Turbo is a distilled variant trained to produce comparable quality images in fewer diffusion steps, resulting in faster inference times.
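To make that trade-off concrete, the hypothetical request bodies below differ only in their step budget: the distilled Turbo variant targets a handful of diffusion steps, while the base foundation model would typically run many more. The "num_steps" parameter name and the specific values are illustrative assumptions, not published defaults.

```python
# Illustrative only: "num_steps" is an assumed parameter name, and the values
# are placeholders rather than documented defaults for either model.
prompt = "macro shot of dew beading on a spider web at sunrise"

turbo_request = {"model": "z-image-turbo", "prompt": prompt, "num_steps": 8}
base_request = {"model": "z-image", "prompt": prompt, "num_steps": 30}
```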
Who developed Z Image Turbo and when was it trained?
Z Image Turbo was developed by Alibaba's Tongyi-MAI lab. The training data has a cutoff of November 2024.
Does Z Image Turbo support image editing, or only text-to-image generation?
The model supports both text-to-image generation and image editing. Image editing is enabled through a dedicated continued training pipeline that accepts a source image URL alongside an instruction prompt.
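As a sketch of an instruction-driven edit under the same assumed API shape: a source image URL is supplied alongside the instruction prompt, and a strength value (see the parameters list below) controls how far the output may drift from the input. Field names here are assumptions.

```python
import requests

ENDPOINT = "https://api.example.com/v1/models/z-image-turbo/generate"  # hypothetical URL

edit_payload = {
    "prompt": "replace the overcast sky with a warm sunset, keep the buildings unchanged",
    "image_url": "https://example.com/street-photo.jpg",  # source image to edit
    "strength": 0.55,  # lower values stay closer to the source image
    "seed": 99,
}

resp = requests.post(ENDPOINT, json=edit_payload, timeout=120)
resp.raise_for_status()
print(resp.json()["image_url"])
```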
Can I use custom LoRA weights with Z Image Turbo?
Yes. The model accepts LoRA weights as a direct input, allowing you to apply fine-tuned adapters for specific characters, styles, or concepts during generation.
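A sketch of how up to three LoRA adapters might be attached to a request, assuming an illustrative "loras" field with a weights URL and a scale per adapter; the page only states that LoRA weights are accepted as a direct input and that up to three can be applied.

```python
# Illustrative request body only -- the "loras", "weights_url", and "scale"
# field names are assumptions, not documented parameters.
payload = {
    "prompt": "the character walking through a rain-soaked neon alley, cinematic framing",
    "loras": [
        {"weights_url": "https://example.com/loras/character-v2.safetensors", "scale": 0.9},
        {"weights_url": "https://example.com/loras/film-grain-style.safetensors", "scale": 0.6},
    ],
    "seed": 7,
}
```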
What people think about Z Image Turbo
Community discussions around Z Image Turbo are active and generally positive, with users frequently praising its prompt-following accuracy and image quality relative to other locally runnable models. Threads comparing it to models like FLUX Dev, Qwen-Image-2512, and GLM-Image have attracted hundreds of upvotes, indicating strong interest in how it performs in direct side-by-side evaluations.
A recurring use case in community threads is LoRA training and character consistency, with users sharing comparisons of character LoRAs trained on Z Image Turbo versus other models. Some discussions also focus on post-processing pipelines that combine Z Image Turbo with upscaling tools like SeedVR2 and face enhancement, suggesting users sometimes find the base output benefits from additional refinement steps.
Z-Image-Turbo vs Qwen Image 2512
🚀 ⚡ Z-Image-Turbo-Boosted 🔥 — One-Click Ultra-Clean Images (SeedVR2 + FlashVSR + Face Upscale + Qwen-VL)
Same prompt, same seed, 6 models — Chroma vs Flux Dev vs Qwen vs Klein 4B vs Z-Image Turbo vs SDXL
Comparison: Trained the same character LoRAs on Z-Image Turbo vs Qwen 2512
[Pt2] Local Comparison: GLM-Image vs Flux.2 Dev vs Z-Image Turbo vs Qwen-Image-2512, All BF16
Documentation & links
Parameters & options
LoRA weights: Up to 3 LoRAs can be applied per generation.
Strength: Controls how strongly the output is transformed when an input image is provided. Higher values produce outputs that differ more from the input image.
Negative prompt: Describes what to exclude from the generated image.
Seed: A specific value used to guide the 'randomness' of the generation, so runs can be reproduced.
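Tying these parameters together, the illustrative payload below combines a negative prompt with a moderate strength for an image-guided run; as elsewhere on this page, the field names ("negative_prompt", "strength", "image_url") are assumptions rather than confirmed parameter names.

```python
payload = {
    "prompt": "studio product shot of a ceramic teapot on slate, softbox lighting",
    "negative_prompt": "text, watermark, extra handles, blurry edges",  # what to exclude
    "image_url": "https://example.com/teapot-sketch.png",  # optional source image
    "strength": 0.7,  # higher -> output departs further from the source image
    "seed": 2024,
}
```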
Explore similar models
Start building with Z Image Turbo
No API keys required. Create AI-powered workflows with Z Image Turbo in minutes — free.