Image Generation Model

Z Image Turbo Controlnet

Z-Image, from Alibaba's Tongyi Lab, is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.

Publisher: Qwen
Type: Image
Context Window: 10,000 tokens
Training Data: November 2024
Price: $0.0001/image
Provider: WaveSpeed
Source: Image

Fast image generation with ControlNet guidance

Z Image Turbo Controlnet is an image generation model developed by Alibaba's Tongyi-MAI lab, built on a single-stream diffusion transformer architecture with 6 billion parameters. It uses a few-step distillation approach (the Turbo variant) to accelerate inference while preserving output quality, and incorporates ControlNet to allow structural guidance from a source image. The model was trained with a multi-level captioning system and a data infrastructure that includes a Cross-modal Vector Engine and World Knowledge Topological Graph to improve semantic alignment between prompts and outputs.

This model is well-suited for workflows that require both speed and structural control over generated images, such as guided creative generation, image editing pipelines, and rapid prototyping. It accepts image URLs as source inputs alongside configurable parameters including seed values for reproducibility. An RLHF alignment pipeline using DPO and GRPO stages was applied to bring outputs closer to human aesthetic preferences, and a built-in prompt enhancer with reasoning chain helps produce better results from short or underspecified prompts.

What Z Image Turbo Controlnet supports

ControlNet Guidance

Accepts a source image URL to provide structural or compositional control over the generated output, enabling guided image generation from a reference.

Turbo Inference

Uses few-step distillation to reduce the number of diffusion steps required at inference time, producing results faster without significant quality degradation.

Text-to-Image Generation

Generates images from text prompts using a 6-billion-parameter single-stream diffusion transformer, with a built-in prompt enhancer that applies a reasoning chain to improve results from short inputs.

Seed-Based Reproducibility

Accepts a numeric seed input so that generation results can be reproduced exactly across multiple runs with the same parameters.
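The effect of a fixed seed can be illustrated with any pseudo-random generator. The toy Python sketch below is not the model's actual sampler; it only demonstrates the principle that the same seed with the same parameters reproduces identical output, while a different seed diverges.

```python
import random

def toy_generate(seed: int, steps: int = 8) -> list:
    """Toy stand-in for a diffusion sampler: seeding the RNG makes
    every pseudo-random draw, and hence the output, deterministic."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]

a = toy_generate(seed=42)
b = toy_generate(seed=42)   # same seed, same parameters
c = toy_generate(seed=7)    # different seed

print(a == b)  # True: identical runs are reproducible
print(a == c)  # False: a new seed gives a new result
```

The model's seed input works the same way at the API level: pass the same seed (and identical prompt, reference image, and settings) to regenerate an image exactly.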

RLHF Alignment

Trained with a reinforcement learning from human feedback pipeline using DPO and GRPO stages to align generated images with human aesthetic preferences.

Configurable Generation Parameters

Exposes multiple select-type inputs allowing users to configure generation options such as ControlNet mode or output size directly within the request.

Ready to build with Z Image Turbo Controlnet?

Get Started Free

Common questions about Z Image Turbo Controlnet

What is the context window for this model?

The model has a context window of 10,000 tokens, as specified in its metadata.

Who developed Z Image Turbo Controlnet?

It was developed by Alibaba's Tongyi-MAI lab and is published under the Qwen publisher on MindStudio.

What inputs does this model accept?

The model accepts an image URL (for ControlNet source guidance), two select inputs (ControlNet mode and output size), a numeric strength parameter, and a seed value for reproducibility.

What is the training cutoff date for this model?

According to the metadata, the model's training date is November 2024.

How does the Turbo variant differ from the base Z-Image model?

The Turbo variant applies few-step distillation to the base 6-billion-parameter Z-Image model, reducing the number of diffusion steps needed at inference time for faster generation while aiming to preserve output quality.

Do I need to provide an API key to use this model on MindStudio?

No API key is required. You can use Z Image Turbo Controlnet directly through MindStudio without managing separate API credentials.

Parameters & options

Reference Image Image URL

Reference image URL for ControlNet to extract structural guidance from.

ControlNet Mode Select

ControlNet mode: 'depth' for depth map guidance, 'canny' for edge detection, 'pose' for human pose estimation, 'none' for no control.

Default: depth
Options: Depth, Canny, Pose, None
Size Select

Output image size in pixels (width*height).

Default: 1024*1024
Options: 1024×1024 (Square), 1024×1536 (Portrait), 1536×1024 (Landscape), 768×1024, 1024×768, 768×1344, 1344×768, 512×512
Strength Number

Controls how strongly the ControlNet guidance affects the output. Higher values follow the control signal more strictly.

Default: 0.7
Range: 0–1 (step 0.05)
Seed Seed

Random seed for reproducible generation. Use -1 for a random seed.

Range: -1 to 2147483647
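Taken together, the parameters above can be assembled into a single request payload. The field names and the `build_payload` helper below are illustrative assumptions, not MindStudio's documented API; the validation simply mirrors the ranges listed above.

```python
def build_payload(reference_image: str,
                  controlnet_mode: str = "depth",
                  size: str = "1024*1024",
                  strength: float = 0.7,
                  seed: int = -1) -> dict:
    """Assemble a hypothetical generation request, enforcing the
    documented parameter ranges before anything is sent."""
    if controlnet_mode not in {"depth", "canny", "pose", "none"}:
        raise ValueError(f"unknown ControlNet mode: {controlnet_mode}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    if not -1 <= seed <= 2147483647:
        raise ValueError("seed must be -1 (random) or fit in a 32-bit signed integer")
    return {
        "reference_image": reference_image,
        "controlnet_mode": controlnet_mode,
        "size": size,
        "strength": strength,
        "seed": seed,
    }

payload = build_payload("https://example.com/source.png",
                        controlnet_mode="canny",
                        strength=0.85,
                        seed=1234)
print(payload["controlnet_mode"])  # canny
```

Setting a non-negative seed here is what makes the request reproducible; leaving it at -1 lets the provider pick a random one per run.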

Start building with Z Image Turbo Controlnet

No API keys required. Create AI-powered workflows with Z Image Turbo Controlnet in minutes — free.