AI Audio: Voice, Speech & Music

GPT & OpenAI, Workflows, Automation

How to Build a Voice Agent with Real-Time Translation Using OpenAI GPT Realtime 2

OpenAI GPT Realtime 2 supports live translation across 70 languages. Learn how to build a real-time translation voice agent using the API and agentic tools.

LLMs & Models, AI Concepts, Use Cases

What Is IBM Granite Speech 4.1? Three ASR Models and When to Use Each

IBM Granite Speech 4.1 offers three ASR models: a base model, a Plus model with diarization, and a non-auto-regressive model for ultra-fast bulk transcription.

GPT & OpenAI, Gemini, Comparisons

OpenAI GPT Realtime 2 vs Google Gemini TTS: Which AI Voice API Wins?

Compare OpenAI GPT Realtime 2 and Google Gemini TTS on expressiveness, speed, language support, and agentic capabilities to choose the right voice API.

Workflows, Automation, AI Concepts

How to Add Speaker Diarization to Your AI Transcription Workflow

Speaker diarization identifies who said what in audio. Learn how IBM Granite Speech 4.1 Plus adds speaker labels, word timestamps, and incremental decoding.

May 12, 2026

GPT Realtime 2 vs GPT Realtime Translate: Which Voice Model Do You Need?

OpenAI's new voice models serve different use cases. Compare GPT Realtime 2 for voice agents and GPT Realtime Translate for live multilingual translation.

GPT & OpenAI, LLMs & Models, Comparisons

May 12, 2026

What Is Speaker Diarization? How IBM Granite Speech 4.1 Plus Identifies Speakers

Speaker diarization labels who said what in a transcript. Learn how IBM Granite Speech 4.1 Plus handles speaker attribution and word-level timestamps.

LLMs & Models, Workflows, AI Concepts

May 11, 2026

How to Build a Live Translation Voice Agent with OpenAI's GPT Realtime API

GPT Realtime Translate supports 70+ input languages with real-time speech translation. Learn how to build a live translation agent using the API.

GPT & OpenAI, Workflows, Integrations

May 11, 2026

GPT Realtime 2 vs GPT Realtime Translate vs Whisper: Which Voice Model Do You Need?

OpenAI released three new realtime voice models. Compare GPT Realtime 2, Translate, and Whisper to find the right one for your voice agent.

GPT & OpenAI, LLMs & Models, Comparisons

GPT & OpenAI, Multi-Agent, LLMs & Models

GPT Realtime 2 Can Stay Silent on Command and Keep Listening — Here's Why That Changes Voice Agents

GPT Realtime 2 can be told to go silent, listen to a side conversation, and re-engage on command — solving the biggest friction point in live voice agents.

Comparisons, GPT & OpenAI, LLMs & Models

GPT Realtime Translate vs Traditional Real-Time Translation APIs — Is OpenAI's Pace-Matched Approach Worth It?

GPT Realtime Translate waits for verb-position keywords before translating, producing more natural dialogue. Here's how it stacks up against existing solutions.

GPT & OpenAI, LLMs & Models, AI Concepts

GPT Realtime Voice Models: GPT Realtime 2, Translate, and Whisper Explained

OpenAI released three new realtime voice models with GPT-5 reasoning, live translation across 70 languages, and streaming speech-to-text. Here's what each does.

GPT & OpenAI, Workflows, Automation

How to Build a Voice Agent with OpenAI's Realtime API: Step-by-Step Setup Guide

OpenAI's Realtime API now supports reasoning, tool calls, and interruption handling. Here's how to set up your first voice agent from scratch.

GPT & OpenAI, LLMs & Models, Workflows

OpenAI Launches 3 New Realtime Voice API Models: What Builders Need to Know Right Now

OpenAI dropped three new realtime voice API models at once: a reasoning voice agent, a live translator, and a streaming transcription model. Here's what's new.

GPT & OpenAI, Automation, Workflows

How to Build a Production Voice Agent with GPT Realtime 2 API: Step-by-Step Setup Guide

GPT Realtime 2 supports reasoning and parallel tool calls during voice. Here's how to set it up via API and avoid the silence problem with preambles.

GPT & OpenAI, Workflows, Integrations

How to Build a Voice Agent with Real-Time Translation Using OpenAI's API

GPT Realtime Translate supports 70+ input languages with live speech translation. Learn how to build a multilingual voice agent using OpenAI's new API.

GPT & OpenAI, LLMs & Models, Automation

GPT Realtime 2's 'Stay Quiet' Command Is a New Voice AI Primitive — Here's What It Unlocks

You can now tell GPT Realtime 2 to listen silently while you have a side conversation. This single feature changes how voice agents handle real meetings.

GPT & OpenAI, LLMs & Models, Comparisons

GPT Realtime Translate vs Traditional Interpretation: Is 70-Language Live AI Translation Ready for Production?

GPT Realtime Translate handles 70+ languages and maintains speaker pace. Here's how it compares to traditional interpretation pipelines for real use cases.

GPT & OpenAI, LLMs & Models, AI Concepts

GPT Realtime Voice Models Explained: GPT Realtime 2, Translate, and Whisper

OpenAI released three new realtime voice models via API. Here's what GPT Realtime 2, Realtime Translate, and Realtime Whisper do and when to use each.

LLMs & Models, Automation, AI Concepts

IBM Granite Speech 4.1 Transcribes an Hour of Audio in 2 Seconds: 5 Things That Make It Different

IBM's Granite Speech 4.1 hits 1820x real-time speed and leads the Hugging Face ASR leaderboard at 5.33% WER. Here's what makes the architecture different.

LLMs & Models, Comparisons, Optimization

IBM Granite Speech 4.1 vs Whisper X: Should You Switch Your Transcription Pipeline?

Granite Speech 4.1 Plus beats customized Whisper X on word-level timestamps and leads the open ASR leaderboard. Here's when to switch and when to stay.