Transcribe Audio
Convert audio files to text using AI transcription
The Transcribe Audio block takes an audio file and converts its spoken content into text using a transcription model. It accepts a URL pointing to the audio file (the audioUrl field) and returns a single output field, text, containing the transcribed content as a string. An optional prompt field lets you supply context, such as the language being spoken, speaker names, or subject matter, to help the model produce more accurate results.
The block also supports an optional transcription model override, allowing you to specify a particular model configuration rather than relying on the workflow's default. If no override is provided, the block falls back to whatever transcription model is set at the workflow level. The transcribed text can be saved directly to a variable using the destinationVar field, making it straightforward to pass the output into downstream blocks.
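The data flow described above can be sketched in a few lines of Python. This is an illustrative model only, not the block's implementation: run_block and fake_transcribe are hypothetical names, the variables dictionary stands in for workflow variables, and the example URL is a placeholder. Only the field names audioUrl, prompt, destinationVar, and text come from this page.

```python
from typing import Any, Callable, Dict, Optional


def fake_transcribe(audio_url: str, prompt: Optional[str]) -> str:
    """Hypothetical stand-in for the real transcription call; it only echoes the URL."""
    return f"[transcript of {audio_url}]"


def run_block(
    config: Dict[str, Any],
    variables: Dict[str, str],
    transcribe: Callable[[str, Optional[str]], str] = fake_transcribe,
) -> Dict[str, str]:
    """Model the block's data flow: URL in, {"text": ...} out, optional variable write."""
    text = transcribe(config["audioUrl"], config.get("prompt"))
    output = {"text": text}

    # If destinationVar is set, also store the transcript under that variable name.
    dest = config.get("destinationVar")
    if dest:
        variables[dest] = text
    return output


variables: Dict[str, str] = {}
result = run_block(
    {"audioUrl": "https://example.com/meeting.mp3", "destinationVar": "transcript"},
    variables,
)
```

After this runs, result holds the block-style output with its single text field, and variables["transcript"] holds the same string for a downstream step to read.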
This block fits into workflows that need to process spoken audio — for example, summarizing recorded meetings, extracting information from voice memos, generating captions for audio content, or feeding transcribed speech into further AI analysis steps. It sits naturally at the beginning of a pipeline wherever audio input needs to be converted into text before additional processing.
What you can build
Real-world workflows powered by the Transcribe Audio block.
Meeting Summary Automation
Transcribe a recorded meeting audio file and pass the text to a summarization block to generate concise meeting notes automatically.
Voice Memo Processing
Convert voice memos from a URL into text so the content can be searched, tagged, or stored in a database.
Podcast Content Extraction
Transcribe podcast episodes to extract quotes, topics, or key points for content repurposing workflows.
Customer Call Analysis
Transcribe recorded customer support calls and route the text through sentiment analysis or issue classification blocks.
Audio Caption Generation
Convert audio content to text as a first step in generating captions or subtitles for video and audio media.
Multilingual Audio Intake
Use the prompt field to specify the spoken language, then transcribe audio submissions from users in different languages for downstream translation or analysis.
Common questions about Transcribe Audio
What parameters are required to use this block?
The only required parameter is audioUrl, which must be a URL pointing to the audio file you want to transcribe. The prompt, destinationVar, and transcriptionModelOverride fields are all optional.
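As a rough illustration of that rule, the check below accepts a configuration only when audioUrl is present and looks like a URL. The function name validate_config, the error messages, and the restriction to http and https schemes are assumptions made for this sketch; the page does not say which URL schemes the block accepts.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Optional fields named on this page; they are not checked by this sketch.
OPTIONAL_FIELDS = ("prompt", "destinationVar", "transcriptionModelOverride")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    errors: List[str] = []
    audio_url = config.get("audioUrl")

    if not isinstance(audio_url, str) or not audio_url:
        errors.append("audioUrl is required")
    else:
        parsed = urlparse(audio_url)
        # Assumption: only http(s) URLs with a host are treated as valid here.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("audioUrl must be an http(s) URL")
    return errors
```

A configuration containing only audioUrl passes, which mirrors the statement that the other three fields can be left out.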
What does the block return?
The block returns an object with a single field, text, which is a string containing the transcribed content of the audio file.
What is the prompt field used for?
The prompt field accepts an optional string that provides context to the transcription model, such as the language being spoken, speaker names, or domain-specific terminology, to help improve transcription accuracy.
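Because the prompt is a free-form string, one way to assemble it from the kinds of context listed above is a small helper like the following. The helper name build_prompt and the "Language: ... Speakers: ..." layout are assumptions for illustration; the page does not prescribe any particular prompt format.

```python
from typing import List, Optional, Sequence


def build_prompt(
    language: Optional[str] = None,
    speakers: Sequence[str] = (),
    terms: Sequence[str] = (),
) -> str:
    """Join whichever context hints are available into one prompt string."""
    parts: List[str] = []
    if language:
        parts.append(f"Language: {language}.")
    if speakers:
        parts.append("Speakers: " + ", ".join(speakers) + ".")
    if terms:
        parts.append("Terminology: " + ", ".join(terms) + ".")
    return " ".join(parts)


prompt = build_prompt("Spanish", ["Ana", "Luis"], ["SLA"])
```

Hints that are not supplied are simply left out, so calling build_prompt with no arguments yields an empty string, which corresponds to leaving the prompt field unset.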
Can I choose which transcription model is used?
Yes. The transcriptionModelOverride field lets you specify a particular transcription model configuration. If you leave it empty, the block uses the default transcription model set at the workflow level.
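The fallback rule amounts to "use the override when one is given, otherwise use the workflow default". A minimal sketch, assuming a hypothetical resolve_model helper and made-up model identifiers (the page does not specify what a model configuration looks like):

```python
from typing import Any, Optional


def resolve_model(override: Optional[Any], workflow_default: Any) -> Any:
    """Pick the block-level override if set, else the workflow-level default."""
    # None, an empty string, and an empty dict all count as "left empty" here.
    return override if override else workflow_default


# Hypothetical identifiers, used only to show both branches.
chosen_default = resolve_model(None, "workflow-default-model")
chosen_override = resolve_model({"model": "custom-model"}, "workflow-default-model")
```

Here chosen_default is the workflow-level value and chosen_override is the block-level configuration.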
What kinds of workflows commonly use this block?
This block is commonly used at the start of workflows that process spoken audio, such as meeting summarization, customer call analysis, voice memo extraction, and any pipeline where audio content needs to be converted to text before further AI processing.