Topic

AI Audio: Voice, Speech & Music

AI for audio — real-time voice agents (Pika Me-style), text-to-speech, voice cloning (ElevenLabs), music generation (Suno, Udio), sound effects, audio editing, transcription. Anything where the output or input is audio.

June 9, 2026

What Is NVIDIA Nemotron 3.5 ASR? The Streaming Speech-to-Text Model for AI Agents

NVIDIA Nemotron 3.5 ASR is a 600M-parameter streaming model supporting 40 languages. Learn how cache-aware streaming and word boosting make it agent-ready.

LLMs & Models, Integrations, AI Concepts

June 8, 2026

Cache-Aware Streaming ASR: How NVIDIA Nemotron 3.5 Cuts Transcription Latency

Cache-aware streaming reuses encoder states instead of reprocessing audio chunks, cutting latency by up to 17x. Here's how it works for real-time transcription.

LLMs & Models, AI Concepts, Workflows

June 8, 2026

What Is Miso One? The Open-Source Voice Model That Sounds Like a Real Human

Miso One is an open-weight TTS model that produces highly emotive, human-sounding speech. Here's what it can do and how it compares to closed voice models.

LLMs & Models, AI Concepts, Content Creation

June 8, 2026

What Is NVIDIA Nemotron 3.5 ASR? The Streaming Speech-to-Text Model Explained

NVIDIA Nemotron 3.5 ASR is a 600M-parameter streaming model supporting 40 languages with a cache-aware architecture. Learn how it works and when to use it.

LLMs & Models, Workflows, AI Concepts

June 8, 2026

Word Boosting in AI Transcription: How to Fix Product Names and Rare Vocabulary

Word boosting lets you inject custom vocabulary into ASR models at decode time—no fine-tuning needed. Here's how it works and when to use it.

Workflows, Automation, AI Concepts

June 7, 2026

MAI Transcribe 1.5: Is Microsoft's New Model the Best Transcription AI?

MAI Transcribe 1.5 claims to be the world's most accurate transcription model and 5x faster than competitors. Here's what the data shows.

LLMs & Models, AI Concepts, Comparisons

June 7, 2026

Miso One Voice Model: The Open-Source TTS That Sounds Like a Real Human

Miso One is an open-weight voice model that claims to be the most emotive TTS available. Learn how it compares and how to run it locally.

LLMs & Models, AI Concepts, Content Creation

June 6, 2026

MAI Transcribe 1.5: Is Microsoft's New Model Really the Best Transcription AI?

MAI Transcribe 1.5 claims to be the world's most accurate and fastest transcription model—5x faster than competitors. Here's what the benchmarks show.

LLMs & Models, Comparisons, AI Concepts

June 2, 2026

How to Use ElevenLabs Dubbing V2 to Localize AI-Generated Content at Scale

ElevenLabs Dubbing V2 preserves your voice and emotion across 175 languages. Learn how to use it to localize videos for global audiences.

Integrations, Content Creation, Workflows

June 2, 2026

How to Use ElevenLabs Music V2 for Commercial Content: Licensed AI Music Explained

ElevenLabs Music V2 is trained on licensed data and cleared for commercial use. Learn how to generate original music for your AI workflows and content.

Integrations, Content Creation, Use Cases

June 1, 2026

ElevenLabs Dubbing V2: How to Dub Videos While Preserving Your Voice and Emotion

ElevenLabs Dubbing V2 translates videos into 175+ languages while keeping your original voice, emotion, and facial expressions. Here's how it works.

Integrations, Content Creation, Use Cases

May 30, 2026

How to Use ElevenLabs Voice Cloning to Replace AI-Generated Voices in Video

Seedance 2.0 often generates Rick and Morty-style voices. Learn how to use ElevenLabs voice cloning to replace them with original characters in your AI videos.

Content Creation, Video Generation, Use Cases

May 29, 2026

ElevenLabs Music V2 vs Suno AI: Which AI Music Generator Wins in 2026?

ElevenLabs Music V2 and Suno AI take different approaches to AI music. Compare voice quality, genre performance, multilingual support, and pricing.

LLMs & Models, Content Creation, Comparisons

May 28, 2026

ElevenLabs Music V2 vs Suno AI: Which AI Music Generator Is Better?

Compare ElevenLabs Music V2 and Suno AI on voice quality, genre performance, token efficiency, and pricing to find the best AI music tool for your needs.

Comparisons, Content Creation, AI Concepts

May 28, 2026

What Is ElevenLabs Music V2? AI Music Generation with Multilingual Support

ElevenLabs Music V2 is a major upgrade for AI music generation. Learn its strengths, weaknesses, pricing, and how it compares to Suno and Stable Audio.

Content Creation, AI Concepts, Use Cases

May 26, 2026

How to Use Voice Agents for Business: ElevenLabs, RAG, and Calendar Booking

Voice agents are finally production-ready. Learn how to build a voice agent that handles customer questions, books appointments, and integrates with your CRM.

Automation, Integrations, Use Cases

May 25, 2026

Stable Audio 3.0: What Open-Weight AI Music Generation Means for Content Creators

Stability AI's Stable Audio 3.0 generates 6-minute songs with open weights. Learn what it can do, how it compares to Suno, and how to use it in workflows.

Content Creation, AI Concepts, Use Cases

May 24, 2026

How to Build a Voice Agent That Books Meetings Using ElevenLabs and Calendly

Learn how to build a no-code voice agent that answers questions, checks availability, and books meetings directly on your calendar using ElevenLabs.

Integrations, Automation, Use Cases

May 24, 2026

What Is Stable Audio 3.0? Stability AI's Open-Weight Music Generation Model

Stable Audio 3.0 generates up to 6-minute songs and sound effects with open weights. Learn what it can do and how it compares to Suno and Udio.

Stable Diffusion, AI Concepts, Content Creation

May 18, 2026

What Is DramaBox by Resemble AI? Open-Source Emotional Text-to-Speech Explained

DramaBox generates text-to-speech with emotional arcs, breath control, and voice cloning from 10 seconds of audio. Here's how it works and how to try it.

AI Concepts, Content Creation, Use Cases

May 18, 2026

What Is LipDub? Open-Source Multilingual Lip-Sync for AI-Generated Video

LipDub is an open-source tool built on LTX that replaces dialogue in video with new speech in any language while preserving the original performance.

Video Generation, Content Creation, AI Concepts

May 17, 2026

What Is LipDub? Open-Source Multilingual Lip-Sync for AI Video Explained

LipDub is an LTX-based in-context LoRA that replaces what characters say in video while preserving the original performance, camera movement, and expression.

Video Generation, AI Concepts, Content Creation

May 15, 2026

DramaBox by Resemble AI: Open-Source Text-to-Speech with Emotional Acting

DramaBox is an open-source TTS model that generates speech with pacing, breath control, and emotional arcs. Learn how to run it locally for free.

LLMs & Models, AI Concepts, Use Cases

May 15, 2026

What Is LipDub? Multilingual Lip-Sync for AI-Generated Video Explained

LipDub is an in-context LoRA for LTX that replaces dialogue in existing videos while preserving original performance and camera movement.

Video Generation, AI Concepts, LLMs & Models