Text to Speech

The Text to Speech block takes a string of text and generates an audio file from it using a speech model.

Overview

Convert text into a generated audio file

The block's primary input is the text field, which contains the exact words to be spoken. Optionally, set destinationVar to store the resulting audio URL in a variable, and set the intermediateAsset flag to hide the generated asset from the gallery. The block also accepts an optional speech model override (speechModelOverride); if none is provided, it falls back to the workflow's default speech model settings.
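
To make the inputs concrete, here is a minimal sketch of how the block's settings could be represented and checked before a run. Only the four field names (text, destinationVar, intermediateAsset, speechModelOverride) come from this page. The TextToSpeechConfig type, the validate_config helper, the rejection of empty text, and the unknown-field check are illustrative assumptions, not the platform's actual API.

```python
from typing import Any, TypedDict


class TextToSpeechConfig(TypedDict, total=False):
    """Hypothetical settings shape; only the field names come from this page."""
    text: str                  # required: the exact words to be spoken
    destinationVar: str        # optional: variable that receives the audio URL
    intermediateAsset: bool    # optional: hide the generated asset from the gallery
    speechModelOverride: Any   # optional: per-block model settings (format not documented here)


KNOWN_FIELDS = {"text", "destinationVar", "intermediateAsset", "speechModelOverride"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    text = config.get("text")
    # Assumption: an empty or whitespace-only string is treated as missing.
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    for key in config:
        if key not in KNOWN_FIELDS:
            problems.append(f"unknown field: {key}")
    return problems
```

For example, validate_config({"text": "Welcome back."}) returns an empty list, while a config with no text field reports the missing required input.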

Use cases

What you can build

Real-world workflows powered by the Text to Speech block.

Narrated Article Delivery

Convert a written article or blog post into spoken audio so users can listen to content instead of reading it.

Voice-Enabled Chatbot Responses

Take text responses generated by an AI model and convert them to audio for a voice-based conversational interface.

Language Learning Pronunciation

Generate spoken audio from vocabulary words or phrases to help users hear correct pronunciation in a language learning workflow.

Accessibility Audio Output

Produce audio versions of text-based content to make workflows accessible to users who prefer or require audio output.

Automated Podcast Production

Feed a script into the block to generate a spoken audio file as part of an automated podcast or audio content pipeline.

Notification Audio Alerts

Convert short notification or alert messages into audio files that can be played back to users within an application workflow.

FAQ

Common questions about Text to Speech

What are the required parameters for this block?

The only required parameter is the text field, which contains the exact string of words to be spoken. All other fields (destinationVar, intermediateAsset, and speechModelOverride) are optional.

What does the block return?

The block outputs an audioUrl field, which is a string containing the URL of the generated audio file.

How do I access the generated audio URL in later steps of my workflow?

You can specify a variable name in the destinationVar field. The block will save the audio URL to that variable, making it available for use in subsequent blocks.
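
As a rough illustration of this hand-off, the sketch below copies the block's audioUrl output into whichever variable destinationVar names. The store_audio_url helper and the plain dictionaries standing in for workflow variables are hypothetical; the platform performs this step itself, and its internals are not described on this page.

```python
def store_audio_url(block_output: dict, config: dict, variables: dict) -> dict:
    """Copy audioUrl into the variable named by destinationVar, if one is set."""
    updated = dict(variables)  # work on a copy so the caller's variables are untouched
    destination = config.get("destinationVar")
    if destination:
        updated[destination] = block_output["audioUrl"]
    return updated


# Placeholder URL for illustration only.
output = {"audioUrl": "https://example.com/audio/narration.mp3"}
variables = store_audio_url(output, {"text": "Hello", "destinationVar": "narrationUrl"}, {})
```

A later block could then read the narrationUrl variable. When destinationVar is not set, this sketch leaves the variables unchanged.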

Which speech model does the block use?

By default, the block uses the speech model configured at the workflow level. You can override this on a per-block basis using the optional speechModelOverride parameter.
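
The fallback described here can be sketched as a single lookup. The resolve_speech_model function and the dictionary format used for model settings are assumptions made for illustration; this page does not document how speech model settings are actually structured.

```python
def resolve_speech_model(workflow_default, speech_model_override=None):
    """Prefer the per-block override; otherwise use the workflow-level default."""
    if speech_model_override is not None:
        return speech_model_override
    return workflow_default


# Hypothetical settings values, shown only to demonstrate the precedence.
workflow_default = {"model": "workflow-default-voice"}
per_block = {"model": "narrator-voice"}

chosen_without_override = resolve_speech_model(workflow_default)
chosen_with_override = resolve_speech_model(workflow_default, per_block)
```

Here chosen_without_override is the workflow default, and chosen_with_override is the per-block setting.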

What kinds of workflows commonly use this block?

This block fits into workflows that produce audio output for users, such as voice assistants, narrated content delivery, language learning tools, and accessibility-focused applications. It can also be used in background processing pipelines where the audio URL is stored in a variable for later use.

Related capabilities

User Message

Generate Image

Generate Video

Generate Chart

Generate Music

Generate LipSync