Image Generation Model

Z Image Turbo Controlnet

Z-Image, from Alibaba's Tongyi Lab, is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.

Publisher: Qwen
Type: Image
Context Window: 10,000 tokens
Training Data: November 2024
Price: $0.0001/image
Provider: WaveSpeed
Source: Image

Fast image generation with ControlNet guidance

Z Image Turbo Controlnet is an image generation model developed by Alibaba's Tongyi-MAI lab, built on a single-stream diffusion transformer architecture with 6 billion parameters. It uses a few-step distillation approach (the Turbo variant) to accelerate inference while preserving output quality, and incorporates ControlNet to allow structural guidance from a source image. The model was trained with a multi-level captioning system and a data infrastructure that includes a Cross-modal Vector Engine and World Knowledge Topological Graph to improve semantic alignment between prompts and outputs.

This model is well-suited for workflows that require both speed and structural control over generated images, such as guided creative generation, image editing pipelines, and rapid prototyping. It accepts image URLs as source inputs alongside configurable parameters including seed values for reproducibility. An RLHF alignment pipeline using DPO and GRPO stages was applied to bring outputs closer to human aesthetic preferences, and a built-in prompt enhancer with reasoning chain helps produce better results from short or underspecified prompts.

What Z Image Turbo Controlnet supports

ControlNet Guidance

Accepts a source image URL to provide structural or compositional control over the generated output, enabling guided image generation from a reference.

Turbo Inference

Uses few-step distillation to reduce the number of diffusion steps required at inference time, producing results faster without significant quality degradation.

Text-to-Image Generation

Generates images from text prompts using a 6-billion-parameter single-stream diffusion transformer, with a built-in prompt enhancer that applies a reasoning chain to improve results from short inputs.

Seed-Based Reproducibility

Accepts a numeric seed input so that generation results can be reproduced exactly across multiple runs with the same parameters.
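The effect of a fixed seed can be illustrated with any pseudo-random generator. The toy Python sketch below is not the model's actual sampler; it only demonstrates the principle that the same seed with the same parameters reproduces identical output, while a different seed diverges.

```python
import random

def toy_generate(seed: int, steps: int = 8) -> list:
    """Toy stand-in for a diffusion sampler: seeding the RNG makes
    every pseudo-random draw, and hence the output, deterministic."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]

a = toy_generate(seed=42)
b = toy_generate(seed=42)   # same seed, same parameters
c = toy_generate(seed=7)    # different seed

print(a == b)  # True: identical runs are reproducible
print(a == c)  # False: a new seed gives a new result
```

The model's seed input works the same way at the API level: pass the same seed (and identical prompt, reference image, and settings) to regenerate an image exactly.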

RLHF Alignment

Trained with a reinforcement learning from human feedback pipeline using DPO and GRPO stages to align generated images with human aesthetic preferences.

Configurable Generation Parameters

Exposes multiple select-type inputs allowing users to configure generation options such as ControlNet mode or output size directly within the request.

Ready to build with Z Image Turbo Controlnet?

Get Started Free

Common questions about Z Image Turbo Controlnet

What is the context window for this model?

The model has a context window of 10,000 tokens, as specified in its metadata.

Who developed Z Image Turbo Controlnet?

It was developed by Alibaba's Tongyi-MAI lab and is published under the Qwen publisher on MindStudio.

What inputs does this model accept?

The model accepts an image URL (for ControlNet source guidance), two select inputs (ControlNet mode and output size), a numeric strength parameter, and a seed value for reproducibility.

What is the training cutoff date for this model?

According to the metadata, the model's training date is November 2024.

How does the Turbo variant differ from the base Z-Image model?

The Turbo variant applies few-step distillation to the base 6-billion-parameter Z-Image model, reducing the number of diffusion steps needed at inference time for faster generation while aiming to preserve output quality.

Do I need to provide an API key to use this model on MindStudio?

No API key is required. You can use Z Image Turbo Controlnet directly through MindStudio without managing separate API credentials.

Parameters & options

Reference Image Image URL

Reference image URL for ControlNet to extract structural guidance from.

ControlNet Mode Select

ControlNet mode: 'depth' for depth map guidance, 'canny' for edge detection, 'pose' for human pose estimation, 'none' for no control.

Default: depth
Options: Depth, Canny, Pose, None
Size Select

Output image size in pixels (width*height).

Default: 1024*1024
Options: 1024×1024 (Square), 1024×1536 (Portrait), 1536×1024 (Landscape), 768×1024, 1024×768, 768×1344, 1344×768, 512×512
Strength Number

Controls how strongly the ControlNet guidance affects the output. Higher values follow the control signal more strictly.

Default: 0.7
Range: 0–1 (step 0.05)
Seed Seed

Random seed for reproducible generation. Use -1 for a random seed.

Range: -1 to 2147483647
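Taken together, the parameters above can be assembled into a single request payload. The field names and the `build_payload` helper below are illustrative assumptions, not MindStudio's documented API; the validation simply mirrors the ranges listed above.

```python
def build_payload(reference_image: str,
                  controlnet_mode: str = "depth",
                  size: str = "1024*1024",
                  strength: float = 0.7,
                  seed: int = -1) -> dict:
    """Assemble a hypothetical generation request, enforcing the
    documented parameter ranges before anything is sent."""
    if controlnet_mode not in {"depth", "canny", "pose", "none"}:
        raise ValueError(f"unknown ControlNet mode: {controlnet_mode}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    if not -1 <= seed <= 2147483647:
        raise ValueError("seed must be -1 (random) or fit in a 32-bit signed integer")
    return {
        "reference_image": reference_image,
        "controlnet_mode": controlnet_mode,
        "size": size,
        "strength": strength,
        "seed": seed,
    }

payload = build_payload("https://example.com/source.png",
                        controlnet_mode="canny",
                        strength=0.85,
                        seed=1234)
print(payload["controlnet_mode"])  # canny
```

Setting a non-negative seed here is what makes the request reproducible; leaving it at -1 lets the provider pick a random one per run.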

Start building with Z Image Turbo Controlnet

No API keys required. Create AI-powered workflows with Z Image Turbo Controlnet in minutes — free.