MindStudio
Text to Speech Model

TTS

OpenAI's fast text-to-speech model, optimized for low-latency voice generation across a wide range of applications.

Publisher OpenAI
Type Text to Speech
Max Input 4,096 characters
Training Data November 2024
Price $0.03/1K chars

Low-latency text-to-speech from OpenAI

TTS (tts-1) is OpenAI's text-to-speech model designed for speed and responsiveness. It converts written text into natural-sounding audio and is optimized to minimize the delay between text input and audio output. It accepts up to 4,096 characters of input per request and is available through the OpenAI API, which makes it straightforward to integrate into existing applications and workflows.

TTS is well suited to use cases where timely audio delivery matters, such as interactive voice assistants, customer service systems, educational tools, and entertainment applications. OpenAI also offers a sibling model, tts-1-hd, which prioritizes audio fidelity over speed. Developers who need the fastest response times should choose tts-1, while those who can accept slightly higher latency in exchange for better audio quality may prefer tts-1-hd.

What TTS supports

Low-Latency Speech

Generates audio from text with minimal delay, making it suitable for near real-time voice applications like interactive assistants.

Natural Voice Output

Produces fluid, human-like speech from written text across a range of supported voices including alloy, echo, fable, onyx, nova, and shimmer.

Multiple Audio Formats

Outputs audio in several formats including MP3, Opus, AAC, and FLAC, allowing developers to choose the format that fits their delivery requirements.

Text Input Processing

Accepts plain text input of up to 4,096 characters per request and converts it to spoken audio in a single API call.

API Integration

Available via the OpenAI REST API, enabling scalable voice output that can be embedded into products, pipelines, and third-party platforms.
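A minimal sketch of a raw HTTP call using only Python's standard library, to show the shape of a request. It assumes the `POST https://api.openai.com/v1/audio/speech` endpoint with a JSON body of `model`, `input`, and `voice`, and an API key in the `OPENAI_API_KEY` environment variable. `build_speech_request` and `synthesize` are hypothetical helper names, not part of any OpenAI library; check the current API reference before relying on this.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed endpoint for OpenAI's speech API; verify against the current API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, voice: str = "alloy",
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request asking tts-1 to speak `text`."""
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, voice: str = "alloy") -> None:
    """Send the request and write the returned audio bytes (MP3 by default) to disk."""
    with urllib.request.urlopen(build_speech_request(text, voice)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

For example, `synthesize("Hello from tts-1.", "hello.mp3", voice="nova")` would write an MP3 file, given network access and a valid key. OpenAI's official SDKs wrap the same endpoint; the standard-library version simply makes the request visible.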

Speed Control

Supports a configurable speech speed parameter ranging from 0.25x to 4.0x, giving developers control over the pacing of generated audio.
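The voices, output formats, and speed range described above can be checked client-side before a request is sent. A minimal sketch, assuming the allowed values are exactly the ones listed on this page and a default speed of 1.0; the live API may accept additional values, and `validate_speech_params` is a hypothetical helper.

```python
# Allowed values as listed on this page (assumption: the API may accept more).
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_speech_params(voice: str = "alloy", response_format: str = "mp3",
                           speed: float = 1.0) -> dict:
    """Return request parameters, raising ValueError for unsupported values."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    # Speed is a playback multiplier; 1.0 means normal pacing.
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {"voice": voice, "response_format": response_format, "speed": speed}
```

Rejecting bad values locally gives a clearer error than a failed API call, at the cost of having to keep these sets in sync with OpenAI's documentation.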


Common questions about TTS

What is the maximum input length for tts-1?

Each request accepts up to 4,096 characters of input text, which is the maximum amount of text that can be converted to speech in a single API call. Longer text has to be split across multiple calls.
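Text longer than the per-request limit can be split into pieces before synthesis. A minimal sketch that breaks text at whitespace so that no chunk exceeds a character limit, assuming the limit is counted in characters (as OpenAI's API reference states it) and defaulting to 4,096. `chunk_text` is a hypothetical helper; it collapses runs of whitespace, and splitting at sentence boundaries would usually sound more natural when the chunks are played back to back.

```python
from typing import List


def chunk_text(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking at whitespace."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split into limit-sized pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio files played or concatenated in order.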

How is tts-1 priced?

OpenAI prices tts-1 based on the number of characters in the input text. Current pricing details are available on the OpenAI pricing page at platform.openai.com/pricing.
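Because billing is per input character, the cost of a request can be estimated before it is sent. A minimal sketch that uses the $0.03 per 1,000 characters listed in this page's metadata as an assumed default rate; OpenAI's own rate may differ, so pass the current value from the pricing page. `estimate_cost` is a hypothetical helper, and `len()` counts Unicode code points, which is assumed to match how characters are billed.

```python
def estimate_cost(text: str, usd_per_1k_chars: float = 0.03) -> float:
    """Estimate the USD cost of synthesizing `text`, billed per input character."""
    return len(text) / 1000 * usd_per_1k_chars
```

At the assumed rate, a maximum-length 4,096-character request works out to 4.096 × $0.03 ≈ $0.123.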

What voices are available with tts-1?

tts-1 supports six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and style, but no custom voice cloning is supported natively through this model.

What audio formats does tts-1 output?

The model can output audio in MP3, Opus, AAC, and FLAC formats. MP3 is the default format returned by the API.
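When saving responses to disk, the file extension should match the requested format, falling back to MP3 when no format is specified because MP3 is the default. A minimal sketch; `output_filename` is a hypothetical helper, and the format-to-extension mapping is an assumption based on common file naming.

```python
from typing import Optional

# Assumed file extensions for the formats listed on this page.
EXTENSIONS = {"mp3": ".mp3", "opus": ".opus", "aac": ".aac", "flac": ".flac"}


def output_filename(stem: str, response_format: Optional[str] = None) -> str:
    """Return a filename for saved audio; an omitted format means the MP3 default."""
    fmt = response_format or "mp3"
    if fmt not in EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return stem + EXTENSIONS[fmt]
```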

What is the difference between tts-1 and tts-1-hd?

tts-1 is optimized for low latency and faster audio delivery, while tts-1-hd trades some speed for higher audio quality. Both models share the same voices and input format.

What is the training data cutoff for tts-1?

The model's training data date is listed as November 2024.

What people think about TTS

Community discussion around OpenAI's TTS models is largely focused on comparisons with open-source and locally-run alternatives, with threads highlighting projects like Qwen3-TTS and Orpheus-FastAPI that offer OpenAI-compatible endpoints. Users frequently discuss latency benchmarks and voice quality as key evaluation criteria when choosing between hosted and self-hosted TTS solutions.

A notable thread on r/singularity reported that OpenAI quietly released updated versions of its TTS and other audio models dated 2025-12-15, suggesting ongoing iteration on the model family. The Reddit threads surveyed are not exclusively about tts-1, so direct community sentiment about this specific model is limited.


Parameters & options

Voice (select)

The voice used to generate speech.

Default: alloy
Options: Alloy, Echo, Fable, Onyx, Nova, Shimmer
