Text to Speech

Minimax Speech-02 Async

Generate high-quality, natural-sounding speech audio from long text (up to 50,000 characters) using Minimax Speech-02 async models, with expressive voice options and emotion control.

Overview

Minimax Speech-02 Async turns long text into natural-sounding speech audio with Speech-02 async models, selectable voices, emotion controls, language boost, pronunciation overrides, and broad audio-format options. Use it for long-form narration, explainers, training content, and script batches up to 50,000 characters.

Use cases

  • Generate longer narration for explainer, training, podcast, or campaign-review scripts.
  • Create one long audio draft instead of splitting content into smaller sync runs.
  • Use Speech-02 HD or Turbo for long scripts that need standard MiniMax voice controls.
  • Export mp3, wav, flac, pcm, or aac files for different review or handoff needs.

Input tips

  • Keep text within the 50,000-character limit and listen through the result before sharing.
  • Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
  • Choose speech-02-hd for the default quality path or speech-02-turbo when speed matters.
  • Use <#x#> pause markers only when custom pauses are needed.
  • Set language_boost to a target language or auto, and add pronunciation overrides for custom terms.
  • Choose mp3, wav, pcm, flac, or aac output; set sample rate, bitrate, and mono/stereo channels as needed.
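
The tips above can be turned into a quick pre-flight check before a long script is submitted. The following is a minimal sketch, not the tool's own validation: the function name `check_inputs` and its parameter names are hypothetical, and treating the `x` in `<#x#>` as a plain number is an assumption. Only the character limit, the two model names, and the five formats come from this page.

```python
import re

MAX_CHARS = 50_000  # limit stated on this page
MODELS = {"speech-02-hd", "speech-02-turbo"}  # model names from this page
FORMATS = {"mp3", "wav", "pcm", "flac", "aac"}  # output formats from this page

# Assumption: a valid pause marker is <#x#> where x is a plain number.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def check_inputs(text: str, model: str = "speech-02-hd", audio_format: str = "mp3") -> list[str]:
    """Return a list of problems; an empty list means the inputs look fine."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if len(text) > MAX_CHARS:
        problems.append(f"text has {len(text)} characters; the limit is {MAX_CHARS}")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if audio_format not in FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    # Flag anything shaped like a pause marker that is not numeric, e.g. "<#abc#>".
    for candidate in re.findall(r"<#.*?#>", text):
        if not PAUSE_MARKER.fullmatch(candidate):
            problems.append(f"malformed pause marker: {candidate}")
    return problems
```

A check like this only catches obvious mistakes locally. Voice IDs, emotion names, and how pause markers count toward the character limit are still decided by the tool itself.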

Expected output

The AI Tool returns one generated audio file with a downloadable URL, content type, file name, file size, async task ID, file ID, billed-character count, status metadata, and cost metadata. The shared Speech-02 output view renders audio playback and character-count metadata when available.
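
One way to consume a result like this is to separate the fields the description lists from metadata that may be absent. The sketch below is illustrative only: every key name (`url`, `task_id`, `billed_characters`, and so on) is hypothetical, because this page does not publish the actual output schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SpeechResult:
    # Fields the description says are returned (key names are hypothetical).
    url: str
    content_type: str
    file_name: str
    file_size: int
    task_id: str
    file_id: str
    # Character count is rendered "when available", so treat it as optional.
    billed_characters: Optional[int] = None


def parse_result(raw: dict[str, Any]) -> SpeechResult:
    """Build a SpeechResult, raising KeyError if a required field is missing."""
    return SpeechResult(
        url=raw["url"],
        content_type=raw["content_type"],
        file_name=raw["file_name"],
        file_size=int(raw["file_size"]),
        task_id=raw["task_id"],
        file_id=raw["file_id"],
        billed_characters=raw.get("billed_characters"),
    )
```

Reading optional values with `.get()` keeps downstream code from failing when a run returns no character-count metadata.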

Caveats

  • Async runs can take longer because the task is created, polled, retrieved, and made available for download.
  • Generated speech should be reviewed for pronunciation, pacing, emotion, and brand fit.
  • Invalid or unavailable system or custom voice IDs will fail validation.
  • Emotion choices are limited to the supported named emotions; fluent and whisper are not supported.
  • The async output schema does not guarantee duration, sample rate, bitrate, channels, or word count metadata.
  • Use the standard Minimax Speech-02 AI Tool for shorter scripts that do not need the 50K limit.
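
The create, poll, retrieve flow mentioned in the first caveat is why async runs take longer. A minimal sketch of the polling step follows, assuming a caller-supplied `get_status` function and the status strings `"success"` and `"failed"`. Those names, the interval, and the timeout are all hypothetical. The tool handles this step internally, so the code only illustrates where the waiting time goes.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Hypothetical terminal states; a real service may name them differently.
        if status in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

Passing `sleep` as a parameter lets the loop be exercised in tests without real delays. In practice the total wait is roughly the number of polls times `poll_interval`, plus the time needed to retrieve and download the file.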