Text to Speech

Minimax Speech-02 Async

Generate high-quality, natural-sounding speech audio from long text (up to 50,000 characters) using Minimax Speech-02 async models, with expressive voice options and emotion control.

Overview

Minimax Speech-02 Async turns long text into natural-sounding speech audio with Speech-02 async models, selectable voices, emotion controls, language boost, pronunciation overrides, and broad audio-format options. Use it for long-form narration, explainers, training content, and script batches up to 50,000 characters.

Use cases

Generate longer narration for explainer, training, podcast, or campaign-review scripts.
Create one long audio draft instead of splitting content into smaller sync runs.
Use Speech-02 HD or Turbo for long scripts that need standard MiniMax voice controls.
Export mp3, wav, flac, pcm, or aac files for different review or handoff needs.

Input tips

Keep text within the 50,000-character limit, and listen through the result before sharing.
Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
Choose speech-02-hd for the default quality path or speech-02-turbo when speed matters.
Use <#x#> pause markers only when custom pauses are needed.
Set language_boost to a target language or auto, and add pronunciation overrides for custom terms.
Choose mp3, wav, pcm, flac, or aac output; set sample rate, bitrate, and mono/stereo channels as needed.
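
The tips above can be checked before a run is submitted. The following is a minimal sketch, not the tool's own validation: the function name check_inputs and its parameter names are hypothetical, the character count uses Python's len (which may not match how the service counts billed characters), and the pause-marker check assumes the x in <#x#> is a plain number.

```python
import re

# Values taken from the input tips above.
MAX_CHARS = 50_000
MODELS = {"speech-02-hd", "speech-02-turbo"}
FORMATS = {"mp3", "wav", "pcm", "flac", "aac"}

# Assumption: a pause marker looks like <#x#>, where x is a plain number.
PAUSE_MARKER = re.compile(r"<#(.*?)#>")
NUMBER = re.compile(r"\d+(\.\d+)?")


def check_inputs(text, model="speech-02-hd", audio_format="mp3"):
    """Return a list of problems; an empty list means the inputs look fine."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if len(text) > MAX_CHARS:
        problems.append(f"text has {len(text)} characters; limit is {MAX_CHARS}")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if audio_format not in FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    for match in PAUSE_MARKER.finditer(text):
        if not NUMBER.fullmatch(match.group(1)):
            problems.append(f"malformed pause marker: {match.group(0)}")
    return problems
```

Calling check_inputs("Welcome. <#0.5#> Let's begin.") returns an empty list, while an over-length script or an unknown format produces one message per problem, so all issues can be fixed in a single pass.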

Expected output

The AI Tool returns one generated audio file with a downloadable URL, content type, file name, file size, async task ID, file ID, billed-character count, status metadata, and cost metadata. The shared Speech-02 output view renders audio playback and character-count metadata when available.
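
Because some metadata is only present when available, code that consumes the result may want to read it defensively. The sketch below assumes the result arrives as a dictionary; every key name in it (file_name, content_type, file_size, url, task_id, billed_characters, and the optional keys) is hypothetical and should be replaced with the names the tool actually returns.

```python
# Hypothetical key names; the real output schema may differ.
OPTIONAL_KEYS = ("duration", "sample_rate", "bitrate", "channels", "word_count")


def summarize_result(result):
    """Build a short text summary, skipping optional metadata that is absent."""
    lines = [
        f"file: {result['file_name']} ({result['content_type']}, "
        f"{result['file_size']} bytes)",
        f"url: {result['url']}",
        f"task: {result['task_id']}, billed characters: {result['billed_characters']}",
    ]
    for key in OPTIONAL_KEYS:
        # Optional fields are included only when the run reported them.
        if result.get(key) is not None:
            lines.append(f"{key}: {result[key]}")
    return "\n".join(lines)
```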

Caveats

Async runs can take longer than sync runs because each task must be created, polled, and retrieved before the audio is available for download.
Generated speech should be reviewed for pronunciation, pacing, emotion, and brand fit.
Invalid or unavailable system or custom voice IDs will fail validation.
Emotion choices are limited to the supported named emotions; fluent and whisper are not supported.
The async output schema does not guarantee duration, sample rate, bitrate, channels, or word count metadata.
Use the standard Minimax Speech-02 AI Tool for shorter scripts that do not need the 50K limit.
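
The create, poll, retrieve, and download sequence described in the first caveat can be sketched as a generic polling loop. This is an illustration under stated assumptions rather than the tool's implementation: the three callables (create_task, get_status, download) are hypothetical stand-ins for whatever client is in use, and the status strings "success" and "failed" are placeholders.

```python
import time


def wait_for_audio(create_task, get_status, download,
                   poll_seconds=5.0, timeout_seconds=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Create a task, poll until it finishes, then return the downloaded audio.

    create_task, get_status, and download are caller-supplied callables;
    the status strings checked below are assumed values.
    """
    task_id = create_task()
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(task_id)
        if status == "success":
            return download(task_id)
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in time")
        sleep(poll_seconds)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting, and the timeout bounds how long a long script can hold up a batch.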

Related AI Tools

Minimax Speech-02

Minimax Speech v2.8 Async

Minimax Voice Clone