ElevenLabs Dialogue v3
Generate multi-speaker dialogue audio from text inputs with precise voice segment timing using the Turbo v3 model. Ideal for podcasts, conversations, and character dialogues.
Overview
ElevenLabs Dialogue v3 generates multi-speaker dialogue audio from a sequence of text turns, each paired with a voice ID. Use it for podcast-style exchanges, character conversations, multi-speaker ad reads, or scripted product conversations that need speaker timing.
Use cases
- Turn a two- or multi-speaker script into generated conversation audio.
- Create a podcast intro, customer-style scene, or character dialogue draft for a campaign asset.
- Generate speaker timing data for editing, captions, or downstream audio/video synchronization.
- Test voices, language settings, stability, and text normalization before a final recording.
Input tips
- Add 1-50 dialogue turns; each turn needs text and a voice_id.
- Use different voice IDs for different speakers, and keep speaker turns clearly separated.
- Use supported audio tags such as [laughing], [whispering], [pause], or emotion cues only when they fit the script.
- Set language only when you need the model to enforce a specific ISO language code.
- Adjust stability when you need more emotional range or more consistent delivery.
- Choose output_format only when a specific handoff format matters; otherwise use the default.
- Use seed for repeatability, but do not treat it as guaranteed.
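The input tips above can be checked before a run. The following is a minimal sketch, not the tool's actual request schema: the field names `text`, `voice_id`, `language`, `stability`, `output_format`, and `seed` come from this page, while the function name `build_dialogue_payload` and the top-level `inputs` key are assumptions made for illustration.

```python
from typing import Any, Optional

MAX_TURNS = 50  # this page states that 1-50 dialogue turns are accepted


def build_dialogue_payload(
    turns: list[dict[str, str]],
    language: Optional[str] = None,
    stability: Optional[float] = None,
    output_format: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Validate dialogue turns and assemble a request body.

    The payload shape (an "inputs" list plus optional settings) is a
    hypothetical layout, not a documented schema.
    """
    if not 1 <= len(turns) <= MAX_TURNS:
        raise ValueError(f"expected 1-{MAX_TURNS} turns, got {len(turns)}")

    for index, turn in enumerate(turns, start=1):
        for field in ("text", "voice_id"):
            value = turn.get(field)
            # Each turn needs a non-empty string for both fields.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"turn {index} is missing '{field}'")

    payload: dict[str, Any] = {"inputs": [dict(turn) for turn in turns]}

    # Only send optional settings that were explicitly chosen, so the
    # tool's defaults apply to everything else.
    optional = {
        "language": language,
        "stability": stability,
        "output_format": output_format,
        "seed": seed,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_dialogue_payload([{"text": "[laughing] Hi there.", "voice_id": "voice_a"}], seed=7)` would return a payload with one turn and a `seed`, leaving `language`, `stability`, and `output_format` unset. The voice ID shown is a placeholder.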
Expected output
The AI Tool returns one generated multi-speaker audio file with a downloadable URL, content type, optional file size, alignment JSON URLs when available, a voice-segments JSON URL, and cost metadata. The output view renders an audio player and, when speaker data is available, a speaker timeline with downloadable JSON.
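Once the voice-segments JSON has been downloaded, it can be summarized for editing or caption work. The sketch below assumes a layout this page does not confirm: a list of segment objects with hypothetical keys `voice_id`, `start_time_seconds`, and `end_time_seconds`. Check the real file and adjust the key names before relying on it.

```python
import json
from collections import defaultdict


def speaking_time_by_voice(segments_json: str) -> dict[str, float]:
    """Sum the seconds spoken per voice from a voice-segments JSON string.

    Assumes (hypothetically) that the JSON is a list of objects with
    "voice_id", "start_time_seconds", and "end_time_seconds" keys.
    """
    segments = json.loads(segments_json)
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        duration = segment["end_time_seconds"] - segment["start_time_seconds"]
        if duration < 0:
            raise ValueError(f"segment ends before it starts: {segment}")
        totals[segment["voice_id"]] += duration
    return dict(totals)


# Sample data in the assumed layout, with placeholder voice IDs.
sample = json.dumps(
    [
        {"voice_id": "voice_a", "start_time_seconds": 0.0, "end_time_seconds": 1.5},
        {"voice_id": "voice_b", "start_time_seconds": 1.5, "end_time_seconds": 4.0},
        {"voice_id": "voice_a", "start_time_seconds": 4.0, "end_time_seconds": 5.0},
    ]
)
totals = speaking_time_by_voice(sample)
```

A per-voice total like this gives a quick check on whether speaker turns were attributed as the script intended before a full listening review.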
Caveats
- Voice IDs must be valid and usable for the selected dialogue.
- Generated voices, timing, emotional tags, and pronunciation need listening review before publishing.
- Seeded generation is best effort; exact determinism is not guaranteed.
- Speaker timeline and alignment files are auxiliary JSON downloads, not a full transcript editor.
- Long, ambiguous, or poorly separated turns can make speaker attribution harder to review.
- This AI Tool generates audio only; it does not create video or avatars.
Related AI Tools

ElevenLabs TTS v3
Generate high-quality speech from text with character-level timing using the Turbo v3 model. Fast generation with support for 29 languages.

ElevenLabs TTS Multilingual v2
Generate high-quality speech from text with character-level timing using the Multilingual v2 model. Supports style exaggeration and speaker boost for enhanced voice quality.

Minimax Speech v2.8
Generate high-quality, natural speech audio from text using Minimax Speech v2.8 models with expressive voice options and emotion control (up to 10K characters).