TTS
OpenAI's fast text-to-speech model, optimized for low-latency voice generation across a wide range of applications.
TTS (tts-1) is OpenAI's text-to-speech model designed for speed and responsiveness. It converts written text into natural-sounding audio and is optimized to minimize the delay between text input and audio output. The model accepts up to 4096 characters of input per request and is accessible through the OpenAI API, making it straightforward to integrate into existing applications and workflows.
TTS is well-suited for use cases where timely audio delivery matters, such as interactive voice assistants, customer service systems, educational tools, and entertainment applications. OpenAI also offers a sibling model, tts-1-hd, which prioritizes audio fidelity over speed. Developers who need the fastest possible voice response times will find tts-1 the appropriate choice, while those who can tolerate slightly higher latency in exchange for higher audio quality may opt for tts-1-hd.
What TTS supports
Low-Latency Speech
Generates audio from text with minimal delay, making it suitable for near real-time voice applications like interactive assistants.
Natural Voice Output
Produces fluid, human-like speech from written text across a range of supported voices including alloy, echo, fable, onyx, nova, and shimmer.
Multiple Audio Formats
Outputs audio in several formats including MP3, Opus, AAC, and FLAC, allowing developers to choose the format that fits their delivery requirements.
Text Input Processing
Accepts plain text input up to 4096 characters per request and converts it to spoken audio in a single API call.
API Integration
Available via the OpenAI REST API, enabling scalable voice output that can be embedded into products, pipelines, and third-party platforms.
Speed Control
Supports a configurable speech speed parameter ranging from 0.25x to 4.0x, giving developers control over the pacing of generated audio.
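The features above map directly onto the request parameters of OpenAI's speech endpoint (`POST https://api.openai.com/v1/audio/speech`). The sketch below only builds and validates a request payload against the voices, formats, and speed range listed here; actually sending it requires an API key and an HTTP client, which are omitted.

```python
# Sketch: construct a JSON payload for OpenAI's speech endpoint using
# the tts-1 parameters described above. Validation mirrors the documented
# voice list, output formats, and 0.25x-4.0x speed range.

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}

def build_speech_request(text: str, voice: str = "alloy",
                         response_format: str = "mp3",
                         speed: float = 1.0) -> dict:
    """Return a JSON-serializable payload for a tts-1 request."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }

payload = build_speech_request("Hello, world!", voice="nova", speed=1.25)
```

With the official `openai` Python SDK, the equivalent call is `client.audio.speech.create(model="tts-1", voice="nova", input="Hello, world!")`, which returns the audio bytes directly.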
Common questions about TTS
What is the maximum input length for tts-1?
The model accepts up to 4096 characters of input per request, which is the maximum amount of text that can be converted to speech in a single API call. Longer text must be split across multiple requests.
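Because a single request is capped, longer documents need to be split before synthesis. The helper below is a rough sketch that splits text on sentence boundaries so each chunk stays under a configurable character cap (4096 by default, matching OpenAI's documented input limit); the splitting heuristic is an illustration, not part of the API.

```python
# Sketch: split long text into chunks under a per-request character cap,
# preferring sentence boundaries so synthesized audio breaks naturally.
import re

def chunk_text(text: str, max_chars: int = 4096) -> list[str]:
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence longer than the cap.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio segments concatenated in order.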
How is tts-1 priced?
OpenAI prices tts-1 based on the number of characters in the input text. Current pricing details are available on the OpenAI pricing page at platform.openai.com/pricing.
What voices are available with tts-1?
tts-1 supports six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and style, but no custom voice cloning is supported natively through this model.
What audio formats does tts-1 output?
The model can output audio in MP3, Opus, AAC, and FLAC formats. MP3 is the default format returned by the API.
What is the difference between tts-1 and tts-1-hd?
tts-1 is optimized for low latency and faster audio delivery, while tts-1-hd trades some speed for higher audio quality. Both models share the same voices and input format.
What is the training data cutoff for tts-1?
OpenAI does not publish a training data cutoff for tts-1; the model's listed training date is November 2024. As a speech synthesis model, a text-knowledge cutoff is less relevant than it is for language models.
What people think about TTS
Community discussion around OpenAI's TTS models is largely focused on comparisons with open-source and locally-run alternatives, with threads highlighting projects like Qwen3-TTS and Orpheus-FastAPI that offer OpenAI-compatible endpoints. Users frequently discuss latency benchmarks and voice quality as key evaluation criteria when choosing between hosted and self-hosted TTS solutions.
A notable thread on r/singularity flagged that OpenAI quietly released updated "2025-12-15" versions of its TTS and other audio models, suggesting ongoing iteration on the model family. Most of these threads concern the broader TTS landscape rather than tts-1 specifically, so direct community sentiment about this particular model is limited.
[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API
Orpheus-FastAPI: Local TTS with 8 Voices & Emotion Tags (OpenAI Endpoint Compatible)
OpenAI just stealth-dropped new "2025-12-15" versions of their Realtime, TTS and Transcribe models in the API.
Thanks to you guys, Soprano TTS now supports OpenAI-compatible endpoint, ONNX, ComfyUI, WebUI, and CLI on CUDA, MPS, ROCm, and CPU!