LTX-2 19B
LTX-2 19B is Lightricks' open-source AI model that simultaneously generates cinematic 4K video and synchronized audio from text or image inputs in a single unified workflow.
Unified 4K video and audio generation from text
LTX-2 19B is an open-source video generation model developed by Lightricks and released on January 6, 2026. It uses an asymmetric dual-stream Diffusion Transformer architecture to generate video and synchronized audio together in a single unified process, rather than producing silent video and adding audio as a separate step. The model accepts text prompts, reference images, or existing video clips as input and outputs native 4K video with flexible frame-rate control and support for extended clip durations.
What distinguishes LTX-2 19B is its simultaneous audiovisual output, where ambient sound, environmental effects, and speech synchronization are generated alongside the video frames. The model supports LoRA fine-tuning for camera motion control and custom stylization, and offers NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x. A distilled 8-step fast generation mode runs 5–6 times faster than the full model, and on an RTX 4090 with NVFP4 quantization an 8-second 720p clip can be produced in approximately 25 seconds. It is well suited for film-style storytelling, advertising production, and any workflow requiring tight audiovisual coherence.
What LTX-2 19B supports
Unified AV Generation
Generates video and scene-aware audio simultaneously in one pass using a dual-stream Diffusion Transformer, eliminating the sync issues common in separate audio-video pipelines.
Native 4K Output
Produces video at native 4K resolution with flexible frame-rate control and support for extended clip durations beyond standard short-form outputs.
Image-to-Video
Accepts a reference image URL as input and animates it into a video clip, preserving visual content from the source image across generated frames.
LoRA Camera Control
Supports Low-Rank Adaptation (LoRA) modules for precise camera motion control, enabling film-style cinematography directions such as pans, zooms, and tracking shots.
Quantized Inference
Supports NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x compared to full-precision inference.
Fast Distilled Mode
Offers an 8-step distilled generation mode that runs 5–6x faster than the full model, producing an 8-second 720p clip in approximately 25 seconds on an RTX 4090 with NVFP4.
Text-to-Video
Generates video directly from text prompts, translating scene descriptions into temporally stable video clips with synchronized audio.
Seed Control
Accepts a manual seed value as input, allowing reproducible generation runs and controlled variation across outputs.
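The input surface described above (text prompt, optional reference image, up to three LoRAs, and a manual seed) can be sketched as a request payload. This is an illustrative sketch only: the `build_request` helper and its field names are hypothetical, not the actual LTX-2 or MindStudio API, and are shown simply to make the documented limits concrete.

```python
# Illustrative sketch of assembling an LTX-2 19B generation request.
# NOTE: the build_request helper and all field names here are hypothetical;
# consult the real LTX-2 / MindStudio documentation for actual parameters.

def build_request(prompt, image_url=None, loras=None, seed=None):
    """Assemble a generation request dict, enforcing the documented limits."""
    loras = loras or []
    if len(loras) > 3:  # the model accepts up to 3 LoRAs per generation
        raise ValueError("LTX-2 19B supports at most 3 LoRAs per request")
    request = {"prompt": prompt}
    if image_url is not None:   # presence of an image selects image-to-video
        request["image_url"] = image_url
    if loras:
        request["loras"] = loras
    if seed is not None:        # fixed seed -> reproducible generation runs
        request["seed"] = seed
    return request

req = build_request(
    "A slow dolly shot through a rain-soaked neon street",
    loras=[{"name": "camera_pan", "strength": 0.6}],  # hypothetical LoRA spec
    seed=42,
)
print(req)
```

Fixing the seed while varying nothing else is what makes runs reproducible; omitting it lets the backend pick a random one per generation.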
Common questions about LTX-2 19B
What is the context window for LTX-2 19B?
LTX-2 19B has a context window of 1,000 tokens, as specified in the model metadata.
Is LTX-2 19B open source and can it be run locally?
Yes, LTX-2 19B is fully open source. It can be deployed locally without any cloud dependency, and model files are available on Hugging Face. It is also compatible with ComfyUI via community integrations.
What hardware is required to run LTX-2 19B locally?
The model supports NVFP4 and FP8 quantization, which reduce VRAM requirements by up to 60%. With NVFP4 quantization on an RTX 4090, an 8-second 720p clip can be generated in approximately 25 seconds. Exact minimum VRAM requirements depend on the quantization format and output resolution chosen.
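A rough back-of-the-envelope check shows what the quoted "up to 60%" reduction implies for the 19B-parameter weights alone. This is an illustrative estimate, not an official figure: the 16-bit baseline is an assumption, and real VRAM use also includes activations, the VAE, and the audio stream.

```python
# Back-of-the-envelope VRAM estimate for LTX-2 19B's weights alone.
# Assumptions (not official figures): a 16-bit baseline (2 bytes/param)
# and the quoted "up to 60%" reduction from NVFP4 quantization.

PARAMS = 19e9        # 19 billion parameters
BYTES_FP16 = 2       # assumed 16-bit baseline precision
GIB = 1024 ** 3

baseline_gib = PARAMS * BYTES_FP16 / GIB
quantized_gib = baseline_gib * (1 - 0.60)  # apply the 60% reduction

print(f"baseline ~{baseline_gib:.1f} GiB, quantized ~{quantized_gib:.1f} GiB")
```

Under these assumptions the quantized weights come in well under the 24 GB of an RTX 4090, which is consistent with the quoted 8-second 720p generation times on that card.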
Does LTX-2 19B generate audio as well as video?
Yes. LTX-2 19B generates video and synchronized audio together in a single unified process. The audio output includes ambient sound, environmental effects, and speech synchronization that correspond to the on-screen action.
What input types does LTX-2 19B accept?
The model accepts text prompts, reference image URLs, and existing video clips as inputs. It also supports LoRA configuration, numeric parameters, toggle group settings, and a manual seed value for reproducibility.
When was LTX-2 19B released and who developed it?
LTX-2 19B was developed by Lightricks and released on January 6, 2026. It was added to MindStudio on January 13, 2026.
What people think about LTX-2 19B
Community reception on r/StableDiffusion has been notably positive, with users sharing multi-clip demonstrations of LTX-2's audio-synced image-to-video outputs, including stitched 20-second sequences set to full music tracks. The most upvoted threads focus on the model's ability to synchronize generated video with external MP3 audio, with one workflow comparison post reaching 972 upvotes and 216 comments.
Users have also explored GGUF quantized variants for text-to-video use cases and shared readable workflow configurations for ComfyUI, indicating active community effort around local deployment and workflow optimization. Discussions around distilled LoRA quality settings suggest users are actively tuning the tradeoff between generation speed and output fidelity.
LTX-2 I2V synced to an MP3: Distill Lora Quality STR 1 vs .6 - New Workflow Version 2.
LTX-2 Audio Synced to added MP3 i2v - 6 examples 3 realistic 3 animated - Non Distilled - 20s clips stitched together (Music: Dido's "Thank You")
LTX-2 audio input and i2v video. 4x 20 sec clips stitched together (Music: Dog Days are Over)
Using GGUF models for LTX-2 in T2V
LTX-2 readable(?) workflows — some improvements + new workflows added
Parameters & options
LoRAs: up to 3 LoRA modules can be applied per generation.
Seed: a specific value used to guide the randomness of the generation, enabling reproducible runs.
Start building with LTX-2 19B
No API keys required. Create AI-powered workflows with LTX-2 19B in minutes, for free.