MindStudio
Image Generation Model

Z Image Turbo

Z-Image, from Alibaba's Tongyi Lab, is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.

Publisher Qwen
Type Image
Context Window 10,000 tokens
Training Data November 2024
Price Free/image
Provider WaveSpeed
Inputs LoRA, Source Image

Fast diffusion image generation with LoRA support

Z Image Turbo is a text-to-image generation model developed by Alibaba's Tongyi-MAI lab, built on a single-stream diffusion transformer architecture with 6 billion parameters. It is the distilled, few-step variant of the Z-Image foundation model, designed to produce high-quality images at faster inference speeds without significant quality degradation. The model incorporates a Reinforcement Learning from Human Feedback (RLHF) pipeline using DPO and GRPO stages to align outputs with human aesthetic preferences, and includes a built-in prompt enhancer with a reasoning chain to improve results from short or simple prompts.

Z Image Turbo accepts text prompts, source images, LoRA weights, and a seed value as inputs, making it suitable for both text-to-image and image editing workflows. Its training data infrastructure includes a Data Profiling Engine, Cross-modal Vector Engine, and a multi-level image captioning system covering OCR, world knowledge, and editing difference captions. The model is well-suited for creative professionals, developers building image generation pipelines, and researchers working with efficient diffusion transformer architectures.
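Putting the inputs above together, a request body for a generation call might look like the following sketch. The field names (`prompt`, `image`, `loras`, `seed`) are illustrative assumptions for clarity, not the documented WaveSpeed schema:

```python
def build_request(prompt, *, image_url=None, loras=None, seed=-1,
                  width=1024, height=1024):
    """Assemble a JSON-serializable request body for Z Image Turbo.

    Field names here are assumptions, not the provider's actual schema.
    """
    body = {"prompt": prompt, "width": width, "height": height, "seed": seed}
    if image_url is not None:
        body["image"] = image_url        # source image for editing workflows
    if loras:
        body["loras"] = list(loras)[:3]  # parameters table caps this at 3
    return body

# Text-to-image call with one style adapter applied
request = build_request("a watercolor fox in autumn leaves",
                        loras=["watercolor-style.safetensors"], seed=42)
```

The same helper covers editing workflows: pass `image_url` and an instruction-style prompt instead of a pure description.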

What Z Image Turbo supports

Text-to-Image Generation

Generates images from text prompts using a single-stream diffusion transformer with 6 billion parameters, with a context window of up to 10,000 tokens for prompt input.

Fast Turbo Inference

Uses few-step distillation to reduce the number of diffusion steps required at inference time, enabling faster image generation with minimal quality loss.

LoRA Support

Accepts LoRA weight inputs directly, allowing users to apply fine-tuned character, style, or concept adapters on top of the base model.

Source Image Input

Accepts an existing image URL as a source input, enabling instruction-driven image editing via a dedicated continued training pipeline.

Prompt Enhancement

Includes a built-in prompt enhancer with a reasoning chain that expands short or simple prompts to improve output coherence and detail.

Seed-Based Reproducibility

Accepts a seed value as an explicit input, allowing users to reproduce identical outputs or make controlled variations across generation runs.
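A common convention on hosted image APIs (an assumption here, not confirmed for this provider) is that a seed of -1 asks the service to pick a random seed, while any non-negative value pins the generation deterministically. A minimal sketch of that convention, using the seed range listed in the parameters table below:

```python
import random

MAX_SEED = 2_147_483_647  # upper bound listed for the seed parameter

def resolve_seed(seed: int) -> int:
    """Resolve a user-supplied seed: -1 means 'pick one at random'
    (a common API convention, assumed here); any other value must
    fall inside the documented range and is passed through unchanged."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or in [0, {MAX_SEED}]")
    return seed
```

Reusing the resolved seed with identical prompt and parameters should reproduce the same image, which makes it practical to iterate on a prompt while holding composition fixed.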

RLHF Alignment

Trained with a Reinforcement Learning from Human Feedback pipeline using DPO and GRPO stages to align generated images with human aesthetic preferences.

Ready to build with Z Image Turbo?

Get Started Free

Common questions about Z Image Turbo

What is the context window for Z Image Turbo?

Z Image Turbo has a context window of 10,000 tokens, which applies to the text prompt input used to guide image generation.

What inputs does Z Image Turbo accept?

The model accepts a text prompt, a source image URL, LoRA weights, numeric parameters (such as dimensions or step counts), and a seed value for reproducibility.

What is the difference between Z Image and Z Image Turbo?

Z Image is the full foundation model with 6 billion parameters. Z Image Turbo is a distilled variant trained to produce comparable quality images in fewer diffusion steps, resulting in faster inference times.

Who developed Z Image Turbo and when was it trained?

Z Image Turbo was developed by Alibaba's Tongyi-MAI lab. The training data has a cutoff of November 2024.

Does Z Image Turbo support image editing, or only text-to-image generation?

The model supports both text-to-image generation and image editing. Image editing is enabled through a dedicated continued training pipeline that accepts a source image URL alongside an instruction prompt.

Can I use custom LoRA weights with Z Image Turbo?

Yes. The model accepts LoRA weights as a direct input, allowing you to apply fine-tuned adapters for specific characters, styles, or concepts during generation.
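In practice a LoRA input usually pairs an adapter reference with a strength multiplier. The `path`/`scale` field names below are assumptions for illustration; the three-adapter cap comes from the parameters table on this page:

```python
def add_lora(loras, path, scale=1.0):
    """Append a LoRA adapter reference (file path or URL plus strength)
    to a request's adapter list, enforcing the 3-adapter limit noted in
    the parameters table. Field names are illustrative assumptions."""
    if len(loras) >= 3:
        raise ValueError("Z Image Turbo accepts at most 3 LoRAs")
    loras.append({"path": path, "scale": scale})
    return loras

# Stack a character adapter at full strength with a lighter style adapter
adapters = []
add_lora(adapters, "my-character.safetensors")
add_lora(adapters, "film-grain-style.safetensors", scale=0.6)
```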

What people think about Z Image Turbo

Community discussions around Z Image Turbo are active and generally positive, with users frequently praising its prompt-following accuracy and image quality relative to other locally-runnable models. Threads comparing it to models like FLUX Dev, Qwen-Image-2512, and GLM-Image have attracted hundreds of upvotes, indicating strong interest in how it performs in direct side-by-side evaluations.

A recurring use case in community threads is LoRA training and character consistency, with users sharing comparisons of character LoRAs trained on Z Image Turbo versus other models. Some discussions also focus on post-processing pipelines that combine Z Image Turbo with upscaling tools like SeedVR2 and face enhancement, suggesting users sometimes find the base output benefits from additional refinement steps.


Parameters & options

Width Number
Default: 1024 Range: 256–1536
Height Number
Default: 1024 Range: 256–1536
LoRAs LoRA

Up to 3 LoRAs.

Input Image Strength Number

Controls the strength of the transformation when an input image is provided. Higher values produce outputs that diverge more from the input image.

Default: 0.5 Range: 0–1 (step 0.1)
Negative Prompt Text

Description of what to exclude from the generated image.

Seed Seed

A fixed value used to control the randomness of generation; reusing the same seed with the same inputs reproduces the same output.

Range: -1–2147483647
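A small validation sketch for the ranges in the table above can catch bad parameters before a request is sent. This checks width/height against the listed 256–1536 range and snaps the input-image strength to its documented 0.1 step (the snapping behavior is an assumption about how the service treats off-step values):

```python
def validate_params(width=1024, height=1024, strength=0.5):
    """Check generation parameters against the ranges listed in the
    parameters table (width/height 256-1536, strength 0-1, step 0.1)."""
    for name, value in (("width", width), ("height", height)):
        if not 256 <= value <= 1536:
            raise ValueError(f"{name} must be in [256, 1536], got {value}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    # Snap strength to the nearest 0.1 step, matching the listed step size
    strength = round(strength * 10) / 10
    return {"width": width, "height": height, "strength": strength}
```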

Start building with Z Image Turbo

No API keys required. Create AI-powered workflows with Z Image Turbo in minutes — free.