Video Generation Model

LTX-2.3

LTX-2.3 is Lightricks' open-source 22-billion-parameter multimodal model that generates synchronized audio and video in a single pass at up to 4K resolution.

Publisher: Lightricks

Type: Video

Context Window: 1,000 tokens

Training Data: March 2026

Price: $0.10-$0.80/video

Provider: WaveSpeed

Text to Video · Image to Video

About LTX-2.3

Open-source 4K video and audio generation

LTX-2.3 is a multimodal video generation model developed by Lightricks and released in March 2026. Built on a Diffusion Transformer architecture with 22 billion parameters, it generates synchronized audio and video in a single forward pass at resolutions up to 4K at 50 frames per second, for clips up to 20 seconds long. It is available as open-source software with open weights under a permissive license, and can be run locally, accessed via API, or deployed on-premises.

The model introduces several architectural updates over its predecessor, including a rebuilt variational autoencoder for sharper texture and edge detail, a gated attention text connector for improved prompt adherence, and an upgraded vocoder trained on filtered audio data for cleaner output. It supports native portrait-mode output at 1080×1920 and ships in four checkpoint variants — dev, distilled, fast, and pro — with the distilled variant completing generation in as few as 8 denoising steps. LTX-2.3 is aimed at independent creators, small studios, and developers who need a production-ready open-source foundation for video creation without licensing fees.

Capabilities

What LTX-2.3 supports

Text to Video

Generates video clips from text prompts at up to 4K resolution and 50 FPS, with clips up to 20 seconds long.

Image to Video

Animates a provided image into a video clip, using an imageUrl input to anchor the first frame of generation.

Synchronized Audio Output

Produces audio and video together in a single forward pass, eliminating the need for separate audio post-processing.

Portrait Mode Support

Generates video natively at 1080×1920 resolution without cropping from a landscape output.

Fast Distilled Generation

The distilled checkpoint variant completes video generation in as few as 8 denoising steps for rapid iteration.

Configurable Generation Parameters

Accepts numeric inputs, toggle groups, and seed values to control resolution, duration, and reproducibility of outputs.

Multiple Checkpoint Variants

Ships in four variants — dev, distilled, fast, and pro — allowing users to trade generation speed against output quality.
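The capabilities above can be sketched as a single request object. This is a minimal illustration, not the provider's actual API: only the imageUrl input, the four variant names, the 20-second and 50 FPS limits, and the existence of a seed come from this page. Every other field name (prompt, duration, fps, width, height, variant), the default values, and the reading of "4K" as 3840×2160 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Limits stated on this page.
MAX_DURATION_SECONDS = 20
MAX_FPS = 50
VARIANTS = ("dev", "distilled", "fast", "pro")
# Assumption: "4K" is read as UHD, 3840x2160 pixels per frame.
MAX_PIXELS = 3840 * 2160


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape; only `imageUrl` is named on this page."""
    prompt: str
    duration_seconds: int = 5          # assumed default
    fps: int = 25                      # assumed default
    width: int = 1920                  # assumed default
    height: int = 1080                 # assumed default
    variant: str = "distilled"
    seed: Optional[int] = None         # fix the seed to reproduce an output
    image_url: Optional[str] = None    # set this for image-to-video

    def validate(self) -> None:
        """Raise ValueError if a field exceeds the limits listed above."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError("duration must be between 1 and 20 seconds")
        if not 1 <= self.fps <= MAX_FPS:
            raise ValueError("fps must be between 1 and 50")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        if self.width * self.height > MAX_PIXELS:
            raise ValueError("resolution exceeds the assumed 4K limit")
        if self.variant not in VARIANTS:
            raise ValueError(f"variant must be one of {VARIANTS}")

    def to_payload(self) -> dict:
        """Validate, then build a JSON-serializable dict (field names assumed)."""
        self.validate()
        payload = {
            "prompt": self.prompt,
            "duration": self.duration_seconds,
            "fps": self.fps,
            "width": self.width,
            "height": self.height,
            "variant": self.variant,
        }
        if self.seed is not None:
            payload["seed"] = self.seed
        if self.image_url is not None:
            payload["imageUrl"] = self.image_url
        return payload


# Portrait image-to-video request with a fixed seed.
portrait = GenerationRequest(
    prompt="a paper boat drifting down a rainy street",
    width=1080,
    height=1920,
    seed=42,
    image_url="https://example.com/first-frame.png",
)
payload = portrait.to_payload()
```

Validating on the client before sending keeps out-of-range values, such as a 21-second duration, from reaching the provider at all. Check the provider's own documentation for the real field names before adapting this.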

FAQ

Common questions about LTX-2.3

What is the context window for LTX-2.3?

LTX-2.3 has a context window of 1,000 tokens, which limits how long and detailed a text prompt can be.

What is the maximum video resolution and length LTX-2.3 can produce?

LTX-2.3 can generate video at up to 4K resolution at 50 frames per second, for clips up to 20 seconds in duration.
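To put those maximums in perspective, the arithmetic can be sketched in a few lines. The frame rate and duration come from the answer above; the 3840×2160 frame size and the 3 bytes per pixel (uncompressed 8-bit RGB) are assumptions used only for this estimate, and real output files are compressed and far smaller.

```python
def frame_count(fps: int, duration_seconds: float) -> int:
    """Number of frames in a clip of the given length."""
    return round(fps * duration_seconds)


def raw_rgb_bytes(width: int, height: int, frames: int, bytes_per_pixel: int = 3) -> int:
    """Size of the clip as uncompressed 8-bit RGB frames."""
    return width * height * frames * bytes_per_pixel


# Maximum clip described on this page: 50 FPS for 20 seconds.
frames = frame_count(50, 20)                   # 1000 frames
# Assumption: "4K" means 3840x2160.
raw_size = raw_rgb_bytes(3840, 2160, frames)   # 24,883,200,000 bytes, about 24.9 GB
```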

Is LTX-2.3 open source, and can I run it locally?

Yes. LTX-2.3 is released as open-source software with open weights under a permissive license. It can be run locally via LTX Desktop, accessed through the Lightricks API, or deployed on-premises using the published weights on Hugging Face.

What checkpoint variants are available?

LTX-2.3 ships in four checkpoint variants: dev, distilled, fast, and pro. The distilled variant is optimized for speed and can complete generation in as few as 8 denoising steps.

When was LTX-2.3 trained and released?

LTX-2.3 was released in March 2026. The model metadata also lists March 2026 as the training data cutoff.

Does LTX-2.3 generate audio as well as video?

Yes. LTX-2.3 generates synchronized audio and video in a single forward pass. Audio quality was improved in this version through filtered training data and an upgraded vocoder.

Community Discussion

What people think about LTX-2.3

Community reception on r/StableDiffusion has been largely positive, with the launch announcement thread accumulating over 700 upvotes and 147 comments. Users highlighted the rebuilt VAE, native portrait mode, improved image-to-video quality, and the new vocoder as meaningful upgrades over the previous version.

Some community members raised concerns about whether LTX remains viable for professional filmmaking workflows, with one thread arguing it falls short of production requirements. Discussions also covered the availability of an NV FP4 quantized variant and updates to the LTX Desktop application.

r/StableDiffusion · 188 pts · 58 comments

Lightricks/LTX-2.3 · Hugging Face

r/StableDiffusion · 727 pts · 147 comments

LTX-2.3 is live: rebuilt VAE, improved I2V, new vocoder, native portrait mode, and more

r/StableDiffusion · 122 pts · 142 comments

I’m sorry, but LTX still isn’t a professionally viable filmmaking tool

r/StableDiffusion · 238 pts · 98 comments

LTX Desktop update: what we shipped, what's coming, and where we're headed

r/StableDiffusion · 125 pts · 93 comments

Official LTX-2.3-nvfp4 model is available

Resources