Text to Speech Model

TTS HD

OpenAI's quality-optimized text-to-speech model (TTS-1-HD), converting text into high-quality, natural-sounding speech.

Publisher: OpenAI

Type: Text to Speech

Context Window: 4,096 characters of input text per request

Training Data: n/a

Price: $30.00 / 1M characters

About TTS HD

High-quality text to natural speech conversion

TTS HD (model ID: tts-1-hd) is a text-to-speech model developed by OpenAI that converts written text into natural-sounding spoken audio. It accepts up to 4,096 characters of input text per request and can speak it in any of several built-in voices. TTS-1-HD is the quality-optimized variant in OpenAI's TTS model family: it produces higher-fidelity audio than the standard TTS-1 model, which is optimized for lower latency instead.

The model is well-suited for applications that require clear, natural-sounding voice output, such as voice assistants, audiobook narration, accessibility tools, and content creation workflows. It supports multiple built-in voices and can output audio in formats including MP3, Opus, AAC, and FLAC. Developers access the model through OpenAI's API, and it is available on MindStudio without requiring separate API key management.

Capabilities

What TTS HD supports

Text to Speech

Converts written text into spoken audio output. Accepts up to 4,096 characters of input text per request.

Multiple Voice Options

Supports a selection of built-in voices (e.g., alloy, echo, fable, onyx, nova, shimmer) to vary the tone and style of generated speech.

Audio Format Support

Outputs audio in multiple formats including MP3, Opus, AAC, and FLAC to suit different playback and storage requirements.
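A client may want to catch a misspelled voice or format name before spending a request on it. A minimal Python sketch, assuming only the six voices and four formats named on this page; `SUPPORTED_VOICES`, `SUPPORTED_FORMATS`, and `validate_options` are hypothetical names, and OpenAI may accept values not listed here:

```python
# Values named on this page; OpenAI may accept others (hypothetical constants).
SUPPORTED_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
SUPPORTED_FORMATS = frozenset({"mp3", "opus", "aac", "flac"})


def validate_options(voice: str, response_format: str = "mp3") -> tuple[str, str]:
    """Normalize voice/format names and reject ones not in the sets above."""
    voice = voice.strip().lower()
    response_format = response_format.strip().lower()
    if voice not in SUPPORTED_VOICES:
        raise ValueError(
            f"unsupported voice {voice!r}; expected one of {sorted(SUPPORTED_VOICES)}"
        )
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return voice, response_format
```

For example, `validate_options(" Nova ", "FLAC")` returns `("nova", "flac")`, and the normalized format name can double as the output file extension (e.g. `speech.flac`).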

Quality-Optimized Output

The HD variant is tuned for audio quality rather than latency, producing higher-fidelity audio with fewer artifacts (such as static) than the standard TTS-1 model.

API Integration

Accessible via OpenAI's REST API, allowing developers to integrate speech synthesis directly into applications and pipelines.
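A speech request is a JSON POST to OpenAI's `/v1/audio/speech` endpoint carrying `model`, `input`, `voice`, and `response_format` fields, and the response body is the raw audio. A standard-library Python sketch; `build_speech_request` and `synthesize` are hypothetical helper names, and the API key is assumed to live in the `OPENAI_API_KEY` environment variable:

```python
import json
import os
import urllib.request
from typing import Optional

# OpenAI's text-to-speech REST endpoint.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(
    text: str,
    voice: str = "alloy",
    response_format: str = "mp3",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking tts-1-hd to speak `text`."""
    # Falls back to the OPENAI_API_KEY environment variable (KeyError if unset).
    key = api_key if api_key is not None else os.environ["OPENAI_API_KEY"]
    body = json.dumps(
        {
            "model": "tts-1-hd",
            "input": text,
            "voice": voice,
            "response_format": response_format,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, path: str, **options) -> None:
    """Send the request and save the binary audio response to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize("Chapter one.", "chapter1.mp3", voice="onyx")` would write an MP3 file. Keeping the builder separate from the sender makes the payload easy to inspect without a network call; OpenAI's official Python SDK wraps the same endpoint as `client.audio.speech.create(...)`.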

FAQ

Common questions about TTS HD

What is the maximum input length for TTS HD?

TTS HD accepts up to 4,096 characters of input text per request, which is the maximum amount of text that can be converted to speech in a single API call. Longer text has to be split across multiple requests.
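One way to handle longer text is to split it on sentence boundaries and send each piece as its own request. A minimal Python sketch, assuming the 4,096-character limit; `chunk_text` is a hypothetical helper, and its regex is a rough sentence heuristic rather than a real tokenizer:

```python
import re

# Assumed per-request input limit for TTS HD.
MAX_INPUT_CHARS = 4096


def chunk_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split `text` into pieces of at most `limit` characters.

    Prefers breaking after '.', '!' or '?'; a single sentence longer than
    `limit` is hard-split into fixed-size slices.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the audio files played or concatenated in order. Breaking at sentence ends avoids cutting words mid-utterance, which would otherwise be audible at chunk boundaries.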

What is the difference between TTS-1 and TTS-1-HD?

TTS-1-HD is the quality-optimized variant of OpenAI's text-to-speech model family. It is designed to produce higher-fidelity audio output, while TTS-1 is optimized for lower latency at the cost of some audio quality.

What audio formats does TTS HD support?

TTS HD can output audio in MP3, Opus, AAC, and FLAC formats, as documented in OpenAI's text-to-speech guide.

What voices are available with TTS HD?

OpenAI provides six built-in voices for TTS HD: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and character.

Does TTS HD have a knowledge cutoff date?

TTS HD is a speech synthesis model and does not rely on a training knowledge cutoff in the same way language models do. Accordingly, its training data is listed as n/a on this page.

How is TTS HD priced?

Pricing for TTS HD is set by OpenAI and is based on the number of characters processed; the rate listed on this page is $30.00 per 1M characters. Refer to OpenAI's official pricing page for current rates, as pricing may change over time.
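Because billing is per character, the cost of a request can be estimated before it is sent. A sketch assuming the $30.00 per 1M characters rate listed on this page and that billable characters match Python's `len()` count (an assumption; OpenAI's exact counting rules may differ). `estimate_cost_usd` is a hypothetical helper, and `Decimal` avoids floating-point rounding in the currency math:

```python
from decimal import Decimal

# Rate listed on this page; confirm against OpenAI's pricing page before relying on it.
PRICE_PER_MILLION_CHARS = Decimal("30.00")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate cost in USD, assuming every character of `text` is billed."""
    return PRICE_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)
```

At this rate, a single maximum-length request of 4,096 characters costs about $0.12, and a 500,000-character manuscript about $15.00.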