Video Generation Model

LTX-2 19B

LTX-2 19B is Lightricks' open-source AI model that simultaneously generates cinematic 4K video and synchronized audio from text or image inputs in a single unified workflow.

Publisher: Lightricks
Type: Video
Context Window: 1,000 tokens
Training Data: January 2026
Price: Free per video
Provider: WaveSpeed
Source: Image, LoRA

Unified 4K video and audio generation from text

LTX-2 19B is an open-source video generation model developed by Lightricks and released on January 6, 2026. It uses an asymmetric dual-stream Diffusion Transformer architecture to generate video and synchronized audio together in a single unified process, rather than producing silent video and adding audio as a separate step. The model accepts text prompts, reference images, or existing video clips as input and outputs native 4K video with flexible frame-rate control and support for extended clip durations.

What distinguishes LTX-2 19B is its simultaneous audiovisual output, where ambient sound, environmental effects, and speech synchronization are generated alongside the video frames. The model supports LoRA fine-tuning for camera motion control and custom stylization, and offers NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x. A distilled 8-step fast generation mode runs 5–6 times faster than the full model, and on an RTX 4090 with NVFP4 quantization an 8-second 720p clip can be produced in approximately 25 seconds. It is well suited for film-style storytelling, advertising production, and any workflow requiring tight audiovisual coherence.
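
For readers who want to try this locally, the sketch below shows a minimal text-to-video run. It assumes LTX-2 19B can be driven through a diffusers-style pipeline like the original LTX-Video release; LTXPipeline is the documented diffusers interface for LTX-Video, while the Lightricks/LTX-2-19B repo id is a placeholder, not a confirmed checkpoint name. The sketch covers the video stream only, since the unified audio output has no standard diffusers export path.

```python
# Minimal local text-to-video sketch. Assumes LTX-2 19B exposes a
# diffusers-style pipeline like the original LTX-Video; the repo id
# below is a placeholder, not a confirmed checkpoint name.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-2-19B",            # placeholder repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

frames = pipe(
    prompt="A slow dolly shot through a rain-soaked neon alley at night",
    width=704, height=480,             # sample size; the model scales to native 4K
    num_frames=193,                    # ~8 s at 24 fps
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "alley.mp4", fps=24)  # video only; audio export is model-specific
```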

What LTX-2 19B supports

Unified AV Generation

Generates video and scene-aware audio simultaneously in one pass using a dual-stream Diffusion Transformer, eliminating the sync issues common in separate audio-video pipelines.

Native 4K Output

Produces video at native 4K resolution with flexible frame-rate control and support for extended clip durations beyond standard short-form outputs.

Image-to-Video

Accepts a reference image URL as input and animates it into a video clip, preserving visual content from the source image across generated frames.
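
A corresponding image-to-video sketch, under the same assumptions as above: diffusers ships an LTXImageToVideoPipeline for the original LTX-Video, and whether LTX-2 reuses it is unconfirmed; the reference URL and repo id are placeholders.

```python
# Image-to-video sketch under the same assumptions as the earlier example.
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2-19B", torch_dtype=torch.bfloat16  # placeholder repo id
).to("cuda")

image = load_image("https://example.com/reference.jpg")  # placeholder URL
frames = pipe(
    image=image,
    prompt="The camera slowly pushes in as autumn leaves drift across the frame",
    num_frames=121,                    # LTX-Video expects (n - 1) divisible by 8
).frames[0]
export_to_video(frames, "animated.mp4", fps=24)
```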

LoRA Camera Control

Supports Low-Rank Adaptation (LoRA) modules for precise camera motion control, enabling film-style cinematography directions such as pans, zooms, and tracking shots.
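
Continuing the pipeline sketch above, a camera-motion LoRA would plug in through diffusers' generic adapter API; the adapter repo id and prompt phrasing here are invented for illustration, and it is an assumption that this API applies to LTX-2.

```python
# Hypothetical camera-control LoRA. The adapter repo id and prompt
# wording are invented for illustration; the generic diffusers
# load_lora_weights/set_adapters API is assumed to apply to LTX-2.
pipe.load_lora_weights("some-user/ltx2-dolly-lora", adapter_name="dolly")
pipe.set_adapters(["dolly"], adapter_weights=[0.9])

frames = pipe(
    prompt="dolly-in on a chessboard mid-game, shallow depth of field",
    num_frames=121,
).frames[0]
```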

Quantized Inference

Supports NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x compared to full-precision inference.
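
As a back-of-envelope check on why this matters for a 19B-parameter model, weight memory alone scales with bytes per parameter; activations, latents, and the text encoder add more on top, which is where an overall figure like "up to 60%" comes from.

```python
# Rough weight-memory arithmetic for 19B parameters. Real VRAM use is
# higher (activations, latents, text encoder), so the quoted "up to 60%"
# refers to overall usage rather than weights alone.
params = 19e9
for name, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("NVFP4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name:>5}: ~{gib:.1f} GiB of weights")
# BF16: ~35.4 GiB   FP8: ~17.7 GiB   NVFP4: ~8.8 GiB
```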

Fast Distilled Mode

Offers an 8-step distilled generation mode that runs 5–6x faster than the full model, producing an 8-second 720p clip in approximately 25 seconds on an RTX 4090 with NVFP4.
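
In pipeline terms, the fast mode is simply a distilled checkpoint run for 8 steps instead of ~50; the checkpoint id below is a placeholder.

```python
# Distilled fast-mode sketch: same interface, far fewer steps.
# The repo id is a placeholder, not a confirmed checkpoint name.
import torch
from diffusers import LTXPipeline

fast = LTXPipeline.from_pretrained(
    "Lightricks/LTX-2-19B-distilled",  # placeholder repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

frames = fast(
    prompt="Waves crashing against a lighthouse at dawn",
    num_frames=193,                    # ~8 s at 24 fps
    num_inference_steps=8,             # distilled fast mode
).frames[0]
```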

Text-to-Video

Generates video directly from text prompts, translating scene descriptions into temporally stable video clips with synchronized audio.

Seed Control

Accepts a manual seed value as input, allowing reproducible generation runs and controlled variation across outputs.
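
With the diffusers-style sketch above, reproducibility would come from passing a fixed torch.Generator; this is standard diffusers behavior, assumed here to carry over to LTX-2.

```python
# Seed-control sketch: an identical seed plus identical settings should
# reproduce the same clip on the same hardware and software stack.
import torch

gen = torch.Generator(device="cuda").manual_seed(1234)
frames_a = pipe(prompt="a paper boat drifting down a stream", generator=gen).frames[0]

gen = torch.Generator(device="cuda").manual_seed(1234)
frames_b = pipe(prompt="a paper boat drifting down a stream", generator=gen).frames[0]
# frames_a and frames_b should match for the same config
```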

Ready to build with LTX-2 19B?

Get Started Free

Common questions about LTX-2 19B

What is the context window for LTX-2 19B?

LTX-2 19B has a context window of 1,000 tokens, as specified in the model metadata.

Is LTX-2 19B open source and can it be run locally?

Yes, LTX-2 19B is fully open source. It can be deployed locally without any cloud dependency, and model files are available on Hugging Face. It is also compatible with ComfyUI via community integrations.

What hardware is required to run LTX-2 19B locally?

The model supports NVFP4 and FP8 quantization, which reduce VRAM requirements by up to 60%. With NVFP4 quantization on an RTX 4090, an 8-second 720p clip can be generated in approximately 25 seconds. Exact minimum VRAM requirements depend on the quantization format and output resolution chosen.

Does LTX-2 19B generate audio as well as video?

Yes. LTX-2 19B generates video and synchronized audio together in a single unified process. The audio output includes ambient sound, environmental effects, and speech synchronization that correspond to the on-screen action.

What input types does LTX-2 19B accept?

The model accepts text prompts, reference image URLs, and existing video clips as inputs. It also supports LoRA configuration, numeric parameters, toggle group settings, and a manual seed value for reproducibility.

When was LTX-2 19B released and who developed it?

LTX-2 19B was developed by Lightricks and released on January 6, 2026. It was added to MindStudio on January 13, 2026.

What people think about LTX-2 19B

Community reception on r/StableDiffusion has been notably positive, with users sharing multi-clip demonstrations of LTX-2's audio-synced image-to-video outputs, including stitched 20-second sequences set to full music tracks. The most upvoted threads focus on the model's ability to synchronize generated video with external MP3 audio, with one workflow comparison post reaching 972 upvotes and 216 comments.

Users have also explored GGUF quantized variants for text-to-video use cases and shared readable workflow configurations for ComfyUI, indicating active community effort around local deployment and workflow optimization. Discussions around distilled LoRA quality settings suggest users are actively tuning the tradeoff between generation speed and output fidelity.


Parameters & options

Resolution (Select)
Default: 720p. Options: 1080p, 720p, 480p

Duration (Number)
Default: 5. Range: 5–20 seconds

LoRAs (LoRA)
Up to 3 LoRAs.

Aspect Ratio (Toggle Group)
Default: 16:9

Seed (Seed)
A specific value used to guide the randomness of the generation.
Range: -1 to 2147483647
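
To make the option set concrete, a request payload exercising these parameters might look like the sketch below; the endpoint and field names are hypothetical, mirroring the options above rather than a documented MindStudio or WaveSpeed schema.

```python
# Illustrative payload only: field names mirror the options above, but
# the endpoint and exact schema are hypothetical, not a documented API.
import requests

payload = {
    "model": "ltx-2-19b",
    "prompt": "Aerial flyover of terraced rice fields at golden hour",
    "resolution": "720p",            # one of: 1080p, 720p, 480p
    "duration": 8,                   # seconds, range 5–20
    "aspect_ratio": "16:9",          # default 16:9
    "loras": [                       # up to 3
        {"path": "some-user/ltx2-dolly-lora", "scale": 0.9},  # placeholder
    ],
    "seed": 1234,                    # -1 for a random seed
}

resp = requests.post("https://api.example.com/v1/video/generate", json=payload)  # placeholder endpoint
resp.raise_for_status()
print(resp.json())
```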

Start building with LTX-2 19B

No API keys required. Create AI-powered workflows with LTX-2 19B in minutes, for free.