Speech to Text Model

Scribe v1

ElevenLabs' first-generation speech-to-text transcription model, offering accurate audio transcription across multiple languages.


Publisher: ElevenLabs

Type: Transcription

Price: $0.007/min
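
As a rough illustration of what the listed per-minute price implies, the sketch below estimates the cost of transcribing a recording. It assumes cost scales linearly with audio duration with no minimum charge or rounding to whole minutes, which this page does not confirm; `estimate_cost_usd` is a hypothetical helper, not part of any ElevenLabs SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed price for Scribe v1 on this page, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = Decimal("0.007")


def estimate_cost_usd(duration_seconds: float) -> Decimal:
    """Estimate transcription cost for a recording of the given length.

    Assumption: billing is prorated linearly by duration. Actual billing
    rules (minimums, rounding, plan discounts) may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    cost = minutes * PRICE_PER_MINUTE_USD
    # Round to a hundredth of a cent for display.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A one-hour recording under the linear-billing assumption.
    print(estimate_cost_usd(3600))
```

Under that assumption, an hour of audio comes to 60 × $0.007 = $0.42.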


About Scribe v1

First-generation speech-to-text transcription from ElevenLabs

Scribe v1 is ElevenLabs' original speech-to-text model, designed to convert spoken audio into written transcripts. Built as the foundation of ElevenLabs' transcription offering, it enables developers and creators to automatically transcribe audio and video content through the ElevenLabs API. The model supports transcription across multiple languages, making it usable in multilingual workflows and automation pipelines.

Scribe v1 has been deployed in use cases ranging from voice note capture to content production tooling. It has since been succeeded by Scribe v2, which adds features such as support for 90+ languages, speaker diarization for up to 32 speakers, word-level timestamps, and entity detection. ElevenLabs directs developers starting new projects to Scribe v2, while Scribe v1 remains available for existing integrations.

Capabilities

What Scribe v1 supports

Audio Transcription

Converts speech in audio and video files into written transcripts. Accessible via the ElevenLabs API for use in automated pipelines.

Multilingual Support

Transcribes speech across a range of languages, enabling use in multilingual content workflows.

API Access

Available through the ElevenLabs API, allowing integration into developer workflows, automation pipelines, and third-party applications.
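
A minimal sketch of what an API call might look like, using only the Python standard library. The endpoint URL, the `xi-api-key` header, the `model_id` and `file` form fields, and the `scribe_v1` identifier are assumptions not stated on this page; check them against the ElevenLabs API reference before relying on them. The helper `build_transcription_request` is hypothetical and only constructs the request without sending it.

```python
import urllib.request
import uuid

# Assumed values; verify against the official ElevenLabs API reference.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"
MODEL_ID = "scribe_v1"


def build_transcription_request(
    audio_bytes: bytes, filename: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the audio."""
    boundary = uuid.uuid4().hex
    # Text part for the model identifier, then the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model_id"\r\n\r\n'
        f"{MODEL_ID}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; keeping construction separate from sending makes the request easy to inspect or log inside an automation pipeline.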

Transcript Output

Returns transcription results as structured text output suitable for downstream processing, storage, or display.
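
To illustrate downstream processing, the sketch below pulls the transcript out of a response body. It assumes the response is a JSON object with a top-level `text` field holding the transcript; this page does not document the response schema, so treat the field name and the sample payload as hypothetical.

```python
import json


def extract_transcript(response_body: str) -> str:
    """Return the transcript string from a JSON response body.

    Assumption: the body is a JSON object with a string field named "text".
    """
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    text = payload.get("text")
    if not isinstance(text, str):
        raise ValueError("response has no string 'text' field")
    return text.strip()


if __name__ == "__main__":
    # Hypothetical payload for illustration only.
    sample = '{"text": " Remember to call the clinic. "}'
    print(extract_transcript(sample))
```

Validating the shape before use keeps a schema mismatch from silently propagating an empty transcript into storage or display.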


FAQ

Common questions about Scribe v1

What is Scribe v1 used for?

Scribe v1 is used to transcribe speech in audio and video files into written text. It has been used in workflows such as voice note capture, content production, and automated transcription pipelines via the ElevenLabs API.

Does Scribe v1 support multiple languages?

Yes, Scribe v1 supports transcription across multiple languages, making it suitable for multilingual workflows. However, its successor Scribe v2 expands this to 90+ languages.

What is the context window for Scribe v1?

No context window size is specified in the available metadata for Scribe v1, as it is a speech-to-text transcription model rather than a language model.

Has Scribe v1 been replaced by a newer model?

Yes. ElevenLabs has released Scribe v2, which adds speaker diarization for up to 32 speakers, support for 90+ languages, word-level timestamps, keyterm prompting, and entity detection. ElevenLabs recommends Scribe v2 for new applications.

How is Scribe v1 accessed?

Scribe v1 is accessible via the ElevenLabs API. It can be integrated into developer workflows and automation pipelines for audio and video transcription tasks.

Community Discussion

What people think about Scribe v1

Community discussions referencing Scribe v1 appear primarily in the context of broader speech-to-text benchmarking threads on r/LocalLLaMA. Participants in these threads evaluated ElevenLabs' transcription models alongside other cloud and local STT options, particularly for long-form and medical dialogue use cases.

A recurring theme in these discussions is the use of medical dialogue as a benchmark domain, where transcription accuracy and handling of specialized terminology are key concerns. The threads do not focus exclusively on Scribe v1 and cover a wide range of competing models, suggesting users evaluate it as one option among many rather than a standalone solution.

r/LocalLLaMA · 78 pts · 25 comments

I benchmarked 26 local + cloud Speech-to-Text models on long-form medical dialogue and ranked them + open-sourced the full eval

r/LocalLLaMA · 28 pts · 16 comments

Benchmark: 15 STT models on long-form medical dialogue


Resources