Minimax Speech-02 Async
Generate high-quality natural speech audio from long text using Minimax Speech-02 async models with expressive voice options and emotion control (up to 50K characters)
Overview
Minimax Speech-02 Async turns long text into natural-sounding speech audio using the Speech-02 async models. It supports selectable voices, emotion controls, language boost, pronunciation overrides, and multiple output formats (mp3, wav, flac, pcm, aac). Use it for long-form narration, explainers, training content, and script batches of up to 50,000 characters.
Use cases
- Generate longer narration for explainer, training, podcast, or campaign-review scripts.
- Create one long audio draft instead of splitting content into smaller sync runs.
- Use Speech-02 HD or Turbo for long scripts that need standard MiniMax voice controls.
- Export mp3, wav, flac, pcm, or aac files for different review or handoff needs.
Input tips
- Keep text at or under 50,000 characters, and listen through the result before sharing.
- Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
- Choose speech-02-hd for the default quality path or speech-02-turbo when speed matters.
- Use <#x#> pause markers only when custom pauses are needed.
- Set language_boost to a target language or auto, and add pronunciation overrides for custom terms.
- Choose mp3, wav, flac, pcm, or aac output; set sample rate, bitrate, and mono/stereo channels as needed.
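The input tips above can be checked before a run is submitted. The following is a minimal sketch, not the tool's actual schema: the field names in the returned dict (`text`, `voice_id`, `model`, `audio_format`, `language_boost`) and the function name `build_inputs` are hypothetical, and the pause-marker pattern assumes that `x` in `<#x#>` is a number of seconds such as `<#1.5#>`. The character limit, model names, and format list come from this page.

```python
import re

MAX_CHARS = 50_000                                  # limit stated on this page
MODELS = {"speech-02-hd", "speech-02-turbo"}        # model names from this page
FORMATS = {"mp3", "wav", "flac", "pcm", "aac"}      # formats from this page
# Assumption: a pause marker is <#N#> with N a number of seconds, e.g. <#1.5#>.
PAUSE_MARKER = re.compile(r"<#\d+(?:\.\d+)?#>")


def build_inputs(text, voice_id, model="speech-02-hd",
                 audio_format="mp3", language_boost="auto"):
    """Validate inputs and return a dict with hypothetical field names."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if len(text) > MAX_CHARS:
        problems.append(f"text has {len(text)} characters; limit is {MAX_CHARS}")
    if not voice_id:
        problems.append("voice_id is required")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if audio_format not in FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    # Every "<#" should open a well-formed marker; anything else is likely a typo.
    if text.count("<#") != len(PAUSE_MARKER.findall(text)):
        problems.append("malformed pause marker")
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "audio_format": audio_format,
        "language_boost": language_boost,
    }
```

Collecting every problem before raising means a long script can be corrected in one pass instead of failing on one issue at a time. Whether pause markers count toward the character limit is not stated here, so the sketch counts the raw text length.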
Expected output
The AI Tool returns one generated audio file with a downloadable URL, content type, file name, file size, async task ID, file ID, billed-character count, status metadata, and cost metadata. The shared Speech-02 output view renders audio playback and character-count metadata when available.
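As a rough illustration of handling that output, the sketch below maps a result payload onto a small record type. The key names (`url`, `content_type`, `file_name`, `file_size`, `task_id`, `file_id`, `billed_characters`, `duration_seconds`) are hypothetical; the actual output schema is not shown on this page. Fields that may be absent are read with `.get` so that missing metadata does not raise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechResult:
    url: str
    content_type: str
    file_name: str
    file_size: int
    task_id: str
    file_id: str
    # Optional metadata: present only when the run reports it.
    billed_characters: Optional[int] = None
    duration_seconds: Optional[float] = None


def parse_result(payload: dict) -> SpeechResult:
    """Build a SpeechResult from a payload with hypothetical key names."""
    return SpeechResult(
        url=payload["url"],
        content_type=payload["content_type"],
        file_name=payload["file_name"],
        file_size=int(payload["file_size"]),
        task_id=str(payload["task_id"]),
        file_id=str(payload["file_id"]),
        billed_characters=payload.get("billed_characters"),
        duration_seconds=payload.get("duration_seconds"),
    )
```

Downstream code can then test `result.duration_seconds is None` instead of assuming the value exists.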
Caveats
- Async runs can take longer than sync runs because each task must be created, polled until it finishes, retrieved, and then made available for download.
- Generated speech should be reviewed for pronunciation, pacing, emotion, and brand fit.
- Invalid or unavailable system or custom voice IDs will fail validation.
- Emotion choices are limited to the supported named emotions; fluent and whisper are not supported.
- The async output schema does not guarantee duration, sample rate, bitrate, channels, or word count metadata.
- Use the standard Minimax Speech-02 AI Tool for shorter scripts that do not need the 50K limit.
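The create, poll, retrieve cycle mentioned in the first caveat can be sketched as a generic polling loop. This is an illustration under stated assumptions, not the tool's implementation: `fetch_status` is a hypothetical callable supplied by the caller, and the status strings `"success"` and `"failed"` are placeholders for whatever terminal states the service actually reports.

```python
import time


def wait_for_task(fetch_status, task_id, interval=5.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(task_id) until it reports a terminal state.

    fetch_status is a hypothetical callable returning a dict with a
    "status" key; the status values used here are assumptions.
    """
    deadline = clock() + timeout
    while True:
        state = fetch_status(task_id)
        if state["status"] == "success":
            return state
        if state["status"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting, and the timeout bounds how long a stuck task can block a caller.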
Related AI Tools

Minimax Speech-02
Generate high-quality natural speech audio from text using Minimax Speech-02 models with expressive voice options and emotion control (up to 10K characters)

Minimax Speech v2.8 Async
Generate high-quality natural speech audio from long text using Minimax Speech 2.8 async models with expressive voice options and English normalization for long-form input (up to 50K characters)

Minimax Voice Clone
Clone voices from audio samples for personalized text-to-speech synthesis using Minimax voice cloning technology