TTS
OpenAI's fast text-to-speech model, optimized for low-latency voice generation across a wide range of applications.
TTS (tts-1) is OpenAI's text-to-speech model designed for speed and responsiveness. It converts written text into natural-sounding audio and is optimized to minimize the delay between text input and audio output. The model accepts up to 4096 characters of input per request and is accessible through the OpenAI API, making it straightforward to integrate into existing applications and workflows.
TTS is well-suited for use cases where timely audio delivery matters, such as interactive voice assistants, customer service systems, educational tools, and entertainment applications. OpenAI also offers a sibling model, tts-1-hd, which prioritizes audio fidelity over speed. Developers who need the fastest possible voice response times will find tts-1 the appropriate choice, while those who can tolerate slightly higher latency in exchange for higher audio quality may opt for tts-1-hd.
What TTS supports
Low-Latency Speech
Generates audio from text with minimal delay, making it suitable for near real-time voice applications like interactive assistants.
Natural Voice Output
Produces fluid, human-like speech from written text across a range of supported voices including alloy, echo, fable, onyx, nova, and shimmer.
Multiple Audio Formats
Outputs audio in several formats including MP3, Opus, AAC, and FLAC, allowing developers to choose the format that fits their delivery requirements.
Text Input Processing
Accepts plain text input up to 4096 characters per request and converts it to spoken audio in a single API call.
API Integration
Available via the OpenAI REST API, enabling scalable voice output that can be embedded into products, pipelines, and third-party platforms.
Speed Control
Supports a configurable speech speed parameter ranging from 0.25x to 4.0x, giving developers control over the pacing of generated audio.
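The features above map directly onto the request parameters of OpenAI's speech endpoint (`POST https://api.openai.com/v1/audio/speech`). The sketch below only builds and validates a request payload against the voices, formats, and speed range listed here; actually sending it requires an API key and an HTTP client, which are omitted.

```python
# Sketch: construct a JSON payload for OpenAI's speech endpoint using
# the tts-1 parameters described above. Validation mirrors the documented
# voice list, output formats, and 0.25x-4.0x speed range.

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}

def build_speech_request(text: str, voice: str = "alloy",
                         response_format: str = "mp3",
                         speed: float = 1.0) -> dict:
    """Return a JSON-serializable payload for a tts-1 request."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }

payload = build_speech_request("Hello, world!", voice="nova", speed=1.25)
```

With the official `openai` Python SDK, the equivalent call is `client.audio.speech.create(model="tts-1", voice="nova", input="Hello, world!")`, which returns the audio bytes directly.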
Common questions about TTS
What is the maximum input length for tts-1?
The model accepts up to 4096 characters of input per request, which is the maximum amount of text that can be converted to speech in a single API call. Longer text must be split across multiple requests.
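Because a single request is capped, longer documents need to be split before synthesis. The helper below is a rough sketch that splits text on sentence boundaries so each chunk stays under a configurable character cap (4096 by default, matching OpenAI's documented input limit); the splitting heuristic is an illustration, not part of the API.

```python
# Sketch: split long text into chunks under a per-request character cap,
# preferring sentence boundaries so synthesized audio breaks naturally.
import re

def chunk_text(text: str, max_chars: int = 4096) -> list[str]:
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence longer than the cap.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio segments concatenated in order.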
How is tts-1 priced?
OpenAI prices tts-1 based on the number of characters in the input text. Current pricing details are available on the OpenAI pricing page at platform.openai.com/pricing.
What voices are available with tts-1?
tts-1 supports six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and style, but no custom voice cloning is supported natively through this model.
What audio formats does tts-1 output?
The model can output audio in MP3, Opus, AAC, and FLAC formats. MP3 is the default format returned by the API.
What is the difference between tts-1 and tts-1-hd?
tts-1 is optimized for low latency and faster audio delivery, while tts-1-hd trades some speed for higher audio quality. Both models share the same voices and input format.
What is the training data cutoff for tts-1?
OpenAI does not publish a training data cutoff for tts-1; the model's listed training date is November 2024. As a speech synthesis model, a text-knowledge cutoff is less relevant than it is for language models.
What people think about TTS
Community discussion around OpenAI's TTS models is largely focused on comparisons with open-source and locally-run alternatives, with threads highlighting projects like Qwen3-TTS and Orpheus-FastAPI that offer OpenAI-compatible endpoints. Users frequently discuss latency benchmarks and voice quality as key evaluation criteria when choosing between hosted and self-hosted TTS solutions.
A notable thread on r/singularity flagged that OpenAI quietly released updated "2025-12-15" versions of its TTS and other audio models, suggesting ongoing iteration on the model family. Most of these threads concern the broader TTS landscape rather than tts-1 specifically, so direct community sentiment about this particular model is limited.
[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API
Orpheus-FastAPI: Local TTS with 8 Voices & Emotion Tags (OpenAI Endpoint Compatible)
OpenAI just stealth-dropped new "2025-12-15" versions of their Realtime, TTS and Transcribe models in the API.
Thanks to you guys, Soprano TTS now supports OpenAI-compatible endpoint, ONNX, ComfyUI, WebUI, and CLI on CUDA, MPS, ROCm, and CPU!