Text to Speech

ElevenLabs TTS Multilingual v2

Generate high-quality speech from text, with character-level timing, using the Multilingual v2 model. Supports style exaggeration and speaker boost for enhanced voice quality.

Overview

ElevenLabs TTS Multilingual v2 converts text into speech audio in the voice identified by a selected voice ID, using the ElevenLabs Multilingual v2 model. Use it for multilingual voiceover drafts, ad narration, product demos, podcasts, or explainer scripts, especially when you need control over style exaggeration, speaker boost, or speed, continuity text between sections, or character-level timing.

Use cases

Generate voiceover audio for a demo, ad, podcast segment, or explainer script.
Use style exaggeration and speaker boost to test more expressive or more voice-faithful delivery.
Generate character-level timing JSON for captions, lip-sync prep, or audio-text synchronization.
Use previous_text and next_text to improve continuity across separately generated script sections.
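
The continuity use case can be sketched as follows. This is a minimal illustration, not the tool's own client: it assumes the inputs are passed as a dictionary keyed by the field names used on this page (text, voice_id, previous_text, next_text), and the helper name build_continuity_requests and the placeholder voice ID are hypothetical.

```python
from typing import Dict, List, Optional


def build_continuity_requests(
    sections: List[str], voice_id: str
) -> List[Dict[str, Optional[str]]]:
    """Pair each script section with its neighbours as continuity context.

    Hypothetical helper: the dictionary shape is an assumption based on the
    input names mentioned on this page, not a documented request schema.
    """
    requests = []
    for i, text in enumerate(sections):
        requests.append({
            "text": text,
            "voice_id": voice_id,
            # Section spoken just before this one, if any.
            "previous_text": sections[i - 1] if i > 0 else None,
            # Section spoken just after this one, if any.
            "next_text": sections[i + 1] if i + 1 < len(sections) else None,
        })
    return requests


sections = [
    "Welcome to the demo.",
    "First, open the dashboard.",
    "That's all for today.",
]
requests = build_continuity_requests(sections, voice_id="YOUR_VOICE_ID")
```

Each section is still generated separately; the neighbouring text only gives the model context for intonation at the section boundaries.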

Input tips

Keep text under 10,000 characters.
Provide a valid ElevenLabs voice_id from available default or custom voices.
Set language only when you need Multilingual v2 to enforce a specific language.
Use stability, similarity_boost, style, speed, and speaker boost to shape delivery.
Choose output_format only when a specific audio handoff format matters; otherwise use the default.
Use SSML break tags (up to 3 seconds each), dashes, or ellipses to add pauses and hesitation.
Use seed for repeatability, but treat it as best effort.
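
Some of these tips can be checked before a request is sent. The sketch below is illustrative only: check_tts_input is a hypothetical helper, the dictionary keys are assumed from the input names on this page, and the break-tag pattern only recognizes the seconds form such as <break time="1.5s" />.

```python
import re
from typing import Any, Dict, List

MAX_TEXT_CHARS = 10_000   # "Keep text under 10,000 characters"
MAX_BREAK_SECONDS = 3.0   # "SSML break tags up to 3 seconds"

# Matches tags such as <break time="1.5s" /> and captures the seconds value.
# Assumption: other duration units (for example milliseconds) are not handled.
BREAK_TAG = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)s"\s*/>')


def check_tts_input(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    text = payload.get("text", "")

    if not text:
        problems.append("text is empty")
    if len(text) >= MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters; keep it under {MAX_TEXT_CHARS}")
    if not payload.get("voice_id"):
        problems.append("voice_id is missing")

    # Flag every break tag that asks for a pause longer than the limit.
    for match in BREAK_TAG.finditer(text):
        seconds = float(match.group(1))
        if seconds > MAX_BREAK_SECONDS:
            problems.append(f"break of {seconds}s exceeds {MAX_BREAK_SECONDS}s")

    return problems


ok = check_tts_input({
    "text": 'Hello... <break time="1.5s" /> and welcome.',
    "voice_id": "YOUR_VOICE_ID",
})
too_long_pause = check_tts_input({
    "text": 'Wait for it <break time="4s" /> now.',
    "voice_id": "YOUR_VOICE_ID",
})
```

A check like this catches only the limits stated above; whether a voice ID is valid and permitted can only be confirmed by the service itself.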

Expected output

The AI Tool returns one generated speech audio file with a downloadable URL, content type, optional file size, alignment JSON URLs when available, and cost metadata. The shared ElevenLabs TTS view renders an audio player and, when present, download links for the character-level timing and normalized timing files.
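
For captions, character-level timing usually has to be grouped into words. The sketch below assumes the downloaded timing JSON has three parallel arrays named characters, character_start_times_seconds, and character_end_times_seconds; that layout is an assumption, so check the actual file before relying on it. The helper name words_from_alignment is hypothetical.

```python
from typing import Any, Dict, List


def words_from_alignment(alignment: Dict[str, list]) -> List[Dict[str, Any]]:
    """Group per-character timings into per-word timings.

    Assumes three parallel lists of equal length (an assumed layout,
    not confirmed by this page).
    """
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]

    words: List[Dict[str, Any]] = []
    current, word_start, word_end = "", None, None

    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word being built, if there is one.
            if current:
                words.append({"word": current, "start": word_start, "end": word_end})
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = end          # keep extending to the latest character

    if current:
        words.append({"word": current, "start": word_start, "end": word_end})
    return words


sample = {
    "characters": list("Hi all"),
    "character_start_times_seconds": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "character_end_times_seconds": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}
words = words_from_alignment(sample)
```

Punctuation stays attached to the word it follows, which is usually what caption formats expect.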

Caveats

Voice IDs must be valid and permitted for use.
Generated speech should be reviewed for pronunciation, tone, pacing, and brand fit.
Style, similarity, speaker boost, speed, and latency settings can change delivery and generation time.
Seeded generation is best effort; exact determinism is not guaranteed.
Timing JSON is auxiliary synchronization data, not a full transcript editor.
Some requested formats or language settings may fail if unsupported by the selected model.

Related AI Tools

ElevenLabs TTS v3

ElevenLabs Dialogue v3

Minimax Speech v2.8