Text to Speech Model

Gemini 3.1 Flash TTS

The Gemini 3.1 Flash TTS Preview model generates natural-sounding speech from text with low latency. Delivery can be steered with natural-language prompts and with new expressive audio tags that give precise control over narration.


Publisher: Google

Type: Text to Speech

Context Window: 16,384 tokens

Training Data: April 2026

Input: $1.00 per million tokens (MTok)

Output: $20.00 per million tokens (MTok)
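As a rough worked example of the listed rates, the sketch below estimates the cost of one request. It assumes billing is purely linear in tokens at $1.00 per million input tokens and $20.00 per million output tokens. How many output tokens a given length of audio consumes is not stated on this page, so the token counts are inputs you would take from the API's usage metadata. `estimate_cost_usd` is a hypothetical helper name.

```python
# Listed preview rates, in US dollars per million tokens (MTok).
INPUT_USD_PER_MTOK = 1.00
OUTPUT_USD_PER_MTOK = 20.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear per-token billing.

    Token counts should come from the API's reported usage; this sketch does not
    know how many tokens a given stretch of audio consumes.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 500-token script whose audio output is billed as 10,000 tokens.
print(f"${estimate_cost_usd(500, 10_000):.4f}")  # prints "$0.2005"
```

Because output is priced 20 times higher than input, the output side dominates the bill for almost any realistic script.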

Configuration

Parameters & options

Max Response Size 16,384 tokens

Voice

The prebuilt voice preset used for synthesis.

Default: Kore

Available voices: Zephyr (bright), Puck (upbeat), Charon (informative), Kore (firm), Fenrir (excitable), Leda (youthful), Orus (firm), Aoede (breezy), Callirhoe (easy-going), Autonoe (bright), Enceladus (breathy), Iapetus (clear), Umbriel (easy-going), Algieba (smooth), Despina (smooth), Erinome (clear), Algenib (gravelly), Rasalgethi (informative), Laomedeia (upbeat), Achernar (soft), Alnilam (firm), Schedar (even), Gacrux (mature), Pulcherrima (forward), Achird (friendly), Zubenelgenubi (casual), Vindemiatrix (gentle), Sadachbia (lively), Sadaltager (knowledgeable), Sulafat (warm)
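A client that exposes this setting might validate the user's choice against the preset list and fall back to the documented default, Kore, when nothing is chosen. The sketch below is hypothetical: `VOICES`, `resolve_voice`, and `voices_with_style` are illustrative names, not part of any SDK, and the case-insensitive matching is a client-side convenience, not documented service behavior.

```python
from typing import Optional

# Prebuilt voice presets and their style descriptors, as listed on this page.
VOICES = {
    "Zephyr": "bright", "Puck": "upbeat", "Charon": "informative", "Kore": "firm",
    "Fenrir": "excitable", "Leda": "youthful", "Orus": "firm", "Aoede": "breezy",
    "Callirhoe": "easy-going", "Autonoe": "bright", "Enceladus": "breathy",
    "Iapetus": "clear", "Umbriel": "easy-going", "Algieba": "smooth",
    "Despina": "smooth", "Erinome": "clear", "Algenib": "gravelly",
    "Rasalgethi": "informative", "Laomedeia": "upbeat", "Achernar": "soft",
    "Alnilam": "firm", "Schedar": "even", "Gacrux": "mature",
    "Pulcherrima": "forward", "Achird": "friendly", "Zubenelgenubi": "casual",
    "Vindemiatrix": "gentle", "Sadachbia": "lively", "Sadaltager": "knowledgeable",
    "Sulafat": "warm",
}

DEFAULT_VOICE = "Kore"


def resolve_voice(name: Optional[str] = None) -> str:
    """Return a canonical preset name, defaulting to Kore when none is given.

    Unknown names raise ValueError instead of silently falling back,
    so typos surface before a request is sent.
    """
    if name is None or not name.strip():
        return DEFAULT_VOICE
    wanted = name.strip().lower()
    for voice in VOICES:
        if voice.lower() == wanted:
            return voice
    raise ValueError(f"unknown voice {name!r}; choose one of: {', '.join(VOICES)}")


def voices_with_style(style: str) -> list[str]:
    """List every preset whose descriptor equals `style` (e.g. "firm")."""
    return [voice for voice, desc in VOICES.items() if desc == style]


print(resolve_voice())            # prints "Kore"
print(voices_with_style("firm"))  # prints "['Kore', 'Orus', 'Alnilam']"
```

Raising on unknown names rather than defaulting is a deliberate choice here: a silent fallback to Kore would make a misspelled voice hard to notice.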

Style Instruction Prompt

An optional natural-language direction for delivery (e.g., "Say cheerfully:", "Whisper softly:", "Narrate dramatically:"). It is prepended to the input text before synthesis; leave it blank for a neutral read. You can also embed expressive audio tags such as [happy], [whisper], or [laughing] directly in the input text.
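Because the style instruction is simply prepended to the input, the text that gets synthesized can be previewed client-side. A minimal sketch follows, assuming a single space joins the instruction and the text (the exact joining the service uses is not documented here). It also lists the bracketed audio tags found in the text; the tag pattern is an assumption inferred from the examples [happy], [whisper], and [laughing], not a published grammar. `build_tts_input` and `audio_tags` are hypothetical names.

```python
import re

# Assumed tag syntax: lowercase words in square brackets, inferred from the
# documented examples [happy], [whisper], [laughing].
AUDIO_TAG = re.compile(r"\[([a-z][a-z \-]*)\]")


def build_tts_input(text: str, style: str = "") -> str:
    """Prepend an optional style instruction; a blank style yields a neutral read."""
    style = style.strip()
    text = text.strip()
    return f"{style} {text}" if style else text


def audio_tags(text: str) -> list[str]:
    """Return the audio tags embedded in the text, in order of appearance."""
    return AUDIO_TAG.findall(text)


prompt = build_tts_input(
    "[whisper] The vault is open. [laughing] We did it!",
    "Narrate dramatically:",
)
print(prompt)              # prints "Narrate dramatically: [whisper] The vault is open. [laughing] We did it!"
print(audio_tags(prompt))  # prints "['whisper', 'laughing']"
```

The two mechanisms are complementary: the style instruction sets the overall delivery, while inline tags change it at specific points in the text.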