Text to Speech Model

MiniMax Speech 2.8 HD

MiniMax Speech 2.8 HD is a studio-quality text-to-speech model that delivers broadcast-ready, emotionally expressive audio rivaling professional voice actors.

Publisher: MiniMax

Type: Text to Speech

Context Window: 50,000 tokens

Training Data: January 2026

Price: $0.007/run

Provider: WaveSpeed

About MiniMax Speech 2.8 HD

Studio-quality text-to-speech with emotional expression

MiniMax Speech 2.8 HD is a high-definition text-to-speech model developed by MiniMax, built on an autoregressive Transformer architecture with a Flow-VAE decoder. Instead of using traditional mel-spectrogram vocoders, it models speech in a learned latent space, which produces audio with natural cadence, proper intonation, and emotional depth. The model accepts up to 50,000 tokens of input text and was trained through January 2026.

The model offers at least 17 expressive voice presets spanning different genders, ages, and speaking styles, and it supports natural interjections such as laughs, sighs, and gasps embedded directly in the input text. Users can control emotion, speed, volume, pitch, sample rate, bitrate, channel configuration, and output format. These features make it well suited to audiobook production, video voiceovers, podcast creation, e-learning narration, accessibility applications, and game development.

Capabilities

What MiniMax Speech 2.8 HD supports

Voice Presets

Provides at least 17 built-in voice options spanning different genders, ages, and speaking styles, selectable via a dropdown input.

Emotion Control

Sets the emotional tone of synthesized speech, such as happy or sad, to match the content.

Natural Interjections

Supports embedding over 20 human sounds like (laughs), (sighs), and (gasps) directly in input text for lifelike delivery.
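As a rough illustration of how interjection tags sit inside input text, the sketch below scans a script for parenthesized tags and separates the three named above from any others. The helper name and the regular expression are assumptions made for this example; the full list of 20+ supported sounds is not given here, so unrecognized tags are only flagged for review, not rejected.

```python
import re

# Only the three tags named in the text above. The complete set of
# supported sounds is not listed on this page, so this set is partial.
KNOWN_TAGS = {"laughs", "sighs", "gasps"}

# Hypothetical pattern: a lowercase word or phrase wrapped in parentheses.
TAG_PATTERN = re.compile(r"\(([a-z][a-z\- ]*)\)")


def split_interjections(text):
    """Return (known, unknown) lists of parenthesized lowercase tags in text."""
    known, unknown = [], []
    for match in TAG_PATTERN.finditer(text):
        tag = match.group(1)
        (known if tag in KNOWN_TAGS else unknown).append(tag)
    return known, unknown


script = "Well (sighs) that went better than expected (laughs). (cheers)"
known, unknown = split_interjections(script)
print(known)    # tags that match the documented examples
print(unknown)  # tags to check against the provider's full list before sending
```

Ordinary lowercase parentheticals in a script would also land in the unknown list, which is usually the safer outcome when proofreading text before synthesis.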

Audio Format Control

Exposes configurable parameters for sample rate, bitrate, channel configuration, and output format through dedicated select inputs.

Speech Rate & Pitch

Accepts numeric inputs to adjust playback speed, volume level, and pitch independently for fine-grained audio tuning.

Custom Pronunciation

Supports a custom pronunciation dictionary to handle brand names, acronyms, and specialized terminology with precise phonetic control.
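The format of the model's pronunciation dictionary is not described on this page. To show the general idea only, the sketch below applies a client-side respelling map to text before it is sent for synthesis. The function name and the example entries are hypothetical, and this is a stand-in technique, not the model's own dictionary feature.

```python
import re


def apply_respellings(text, respellings):
    """Replace whole-word terms with phonetic respellings, longest terms first.

    Client-side illustration only; it does not use the model's dictionary input.
    """
    if not respellings:
        return text
    # Longest first so a multi-word term wins over a shorter term inside it.
    terms = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(1)], text)


# Hypothetical entries for acronyms a voice might otherwise spell out.
respellings = {"SQL": "sequel", "GIF": "jif"}
print(apply_respellings("Ask SQL about GIF files.", respellings))
```

Matching is case-sensitive and bounded by word boundaries, so "SQL" is rewritten while "SQLite" is left alone.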

Large Text Input

Accepts up to 50,000 tokens of input text in a single request, enabling long-form content like full audiobook chapters.
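For scripts that might exceed the limit, one possible approach is to group sentences into chunks that each stay within a token budget. The sketch below assumes a crude estimate of one token per whitespace-separated word; the model's actual tokenizer is not documented here, so treat the counts as approximate and leave headroom. The function names are illustrative.

```python
import re


def rough_token_count(text):
    # Assumption: one whitespace-separated word is about one token.
    # Real tokenizers differ, so this is only an estimate.
    return len(text.split())


def split_for_requests(text, max_tokens=50_000, count=rough_token_count):
    """Group sentences into chunks whose estimated size stays within max_tokens.

    A single sentence larger than max_tokens is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, used = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        size = count(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and used + size > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += size
    if current:
        chunks.append(" ".join(current))
    return chunks


# A tiny budget makes the grouping visible.
print(split_for_requests("One two. Three four. Five six.", max_tokens=4))
```

Splitting at sentence boundaries keeps each chunk self-contained, which tends to avoid mid-sentence breaks in the stitched audio.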

FAQ

Common questions about MiniMax Speech 2.8 HD

What is the maximum input length for MiniMax Speech 2.8 HD?

The model supports a context window of 50,000 tokens, which allows for long-form content such as full chapters or extended scripts in a single request.

What audio output formats and quality settings are available?

Users can configure sample rate, bitrate, channel (mono or stereo), and output format through dedicated select inputs, giving full control over the final audio file.

Can I control how the voice sounds beyond just selecting a preset?

Yes. In addition to choosing from at least 17 voice presets, you can adjust speed, volume, pitch, and emotional tone, and embed natural interjections like (laughs) or (sighs) directly in the input text.

What is the training data cutoff for this model?

The training data cutoff is listed as January 2026.

What types of applications is MiniMax Speech 2.8 HD best suited for?

The model is designed for use cases that require high-fidelity, human-sounding audio, including audiobook production, video voiceovers, podcast creation, e-learning narration, accessibility tools, and game development.

Resources

Documentation & links

Announcement Blog Post (Announcements)

Model Page on WaveSpeedAI (Playground)

MiniMax Official Website (Other)

MiniMax API Documentation (Documentation)

Configuration

Parameters & options

Voice (Select)

Voice preset to use for speech synthesis.

Default: Friendly_Person

Options: Wise Woman, Friendly Person, Inspirational Girl, Deep Voice Man, Calm Woman, Casual Guy, Lively Girl, Patient Man, Young Knight, Determined Man, Lovely Girl, Decent Boy, Imposing Manner, Elegant Man, Abbess, Sweet Girl 2, Exuberant Girl

Speed (Number)

Speech speed multiplier.

Default: 1

Volume (Number)

Volume level.

Default: 1

Pitch (Number)

Pitch adjustment.

Emotion (Select)

Emotional tone of the speech delivery.

Options: Happy, Sad, Angry, Fearful, Disgusted, Surprised, Neutral

Sample Rate (Select)

Audio sample rate in Hz.

Default: 44100

Options: 16,000 Hz, 24,000 Hz, 32,000 Hz, 44,100 Hz (default)

Bitrate (Select)

Audio bitrate in bits per second.

Default: 128000

Options: 32,000, 64,000, 128,000 (default), 256,000
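To make the sample rate and bitrate settings more concrete, the sketch below estimates output size for a given duration. It assumes constant-bitrate encoding for compressed formats and 16-bit samples for raw PCM; neither is stated on this page, and container overhead is ignored, so the numbers are ballpark figures only.

```python
def compressed_size_bytes(bitrate_bps, duration_s):
    """Approximate size of constant-bitrate audio, ignoring container overhead."""
    # Bitrate is in bits per second, so divide by 8 to get bytes.
    return bitrate_bps * duration_s / 8


def pcm_size_bytes(sample_rate_hz, channels, duration_s, bytes_per_sample=2):
    """Size of raw PCM audio; bytes_per_sample=2 assumes 16-bit samples."""
    return sample_rate_hz * channels * bytes_per_sample * duration_s


one_minute = 60
print(compressed_size_bytes(128_000, one_minute))  # default bitrate
print(pcm_size_bytes(44_100, 1, one_minute))       # default sample rate, mono
```

Under these assumptions, a minute at the default bitrate is a little under 1 MB, while a minute of mono PCM at the default sample rate is roughly 5 MB.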

Channel (Select)

Audio channel configuration.

Options: Mono, Stereo

Format (Select)

Output audio format.

Options: MP3, WAV, FLAC, OGG, PCM

Language Boost (Select)

Boost recognition for a specific language.

Options: Auto, Afrikaans, Arabic, Bulgarian, Catalan, Chinese, Chinese (Yue), Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Vietnamese

English Normalization (Toggle Group)

Improves number-reading performance in English text (dates, currencies, etc.).
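The options listed in this section can be checked before a request is made. The sketch below is one way to do that, assuming a plain dictionary of settings with hypothetical snake_case keys; the real request field names and value identifiers are not given on this page. The voice default is shown above as Friendly_Person, so display labels may differ from the identifiers the API expects.

```python
# Option lists copied from the parameter reference above.
# The dictionary keys are hypothetical names chosen for this sketch.
OPTIONS = {
    "voice": {
        "Wise Woman", "Friendly Person", "Inspirational Girl", "Deep Voice Man",
        "Calm Woman", "Casual Guy", "Lively Girl", "Patient Man", "Young Knight",
        "Determined Man", "Lovely Girl", "Decent Boy", "Imposing Manner",
        "Elegant Man", "Abbess", "Sweet Girl 2", "Exuberant Girl",
    },
    "emotion": {"Happy", "Sad", "Angry", "Fearful", "Disgusted", "Surprised", "Neutral"},
    "sample_rate": {16000, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {"Mono", "Stereo"},
    "format": {"MP3", "WAV", "FLAC", "OGG", "PCM"},
}

NUMERIC_KEYS = ("speed", "volume", "pitch")


def validate_settings(settings):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for key, value in settings.items():
        allowed = OPTIONS.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not a listed option")
    for key in NUMERIC_KEYS:
        if key in settings and not isinstance(settings[key], (int, float)):
            problems.append(f"{key}: expected a number")
    return problems


print(validate_settings({"voice": "Calm Woman", "emotion": "Happy", "speed": 1}))
print(validate_settings({"emotion": "Calm", "bitrate": 96000}))
```

Valid numeric ranges for speed, volume, and pitch are not listed here, so the sketch checks only that those values are numbers.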
