Text to Speech

Minimax Speech v2.8 Async

Generate high-quality, natural speech audio from long text (up to 50K characters) using Minimax Speech 2.8 async models, with expressive voice options and English normalization for long-form input.

Overview

Minimax Speech v2.8 Async turns long text into natural-sounding speech audio using MiniMax Speech 2.8 async models. Use it for long-form narration, training content, podcast segments, and script batches up to 50,000 characters when the standard v2.8 AI Tool's 10,000-character limit is too small.
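
The choice between the two AI Tools comes down to script length, which can be sketched as a small helper. This is only an illustration: the function name and return labels are hypothetical, and it assumes characters are counted the way Python's len() counts them, which may differ from how the service counts billed characters.

```python
STANDARD_LIMIT = 10_000  # standard v2.8 AI Tool limit, per the overview
ASYNC_LIMIT = 50_000     # async AI Tool limit, per the overview


def pick_tool(text: str) -> str:
    """Return which AI Tool fits a script of this length (hypothetical helper)."""
    n = len(text)  # assumption: one Python character == one counted character
    if n == 0:
        raise ValueError("text is empty")
    if n <= STANDARD_LIMIT:
        return "standard"
    if n <= ASYNC_LIMIT:
        return "async"
    raise ValueError(f"text has {n} characters; split it to fit {ASYNC_LIMIT}")
```

A script of 10,001 characters would be routed to the async tool, while anything over 50,000 characters raises an error so the caller can split it.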

Use cases

  • Generate narration for longer product walkthroughs, lessons, or podcast-style segments.
  • Turn a long campaign script into one audio file without splitting it into 10K-character chunks.
  • Test HD versus Turbo model choices for longer voiceover drafts.
  • Use pronunciation overrides for product names, acronyms, or brand terms in long scripts.

Input tips

  • Keep text at or under 50,000 characters, and review it for pronunciation problems before generating audio.
  • Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
  • Choose speech-2.8-hd for default quality or speech-2.8-turbo when speed matters.
  • Use <#x#> pause markers only when custom pauses are needed.
  • Set language_boost to a target language or auto for automatic detection.
  • Choose mp3, pcm, or flac output, then set the sample rate, bitrate, and channel count (mono or stereo) to match what the downstream consumer of the audio expects.
  • Use emotion, speed, volume, pitch, and sound effects sparingly for long scripts.
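
The tips above can be checked before a run with a small validation step. The sketch below is not the AI Tool's actual input schema: only the model names, the output formats, the 50,000-character limit, language_boost, and the <#x#> marker come from this page. The other field names (text, voice_id, model, format), the helper names, and the reading of x as a pause length in seconds are assumptions.

```python
ALLOWED_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
ALLOWED_FORMATS = {"mp3", "pcm", "flac"}
MAX_CHARS = 50_000


def pause(seconds: float) -> str:
    """Render a <#x#> pause marker (assumes x is a pause length in seconds)."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f"<#{seconds:.2f}#>"


def build_inputs(text: str, voice_id: str, model: str = "speech-2.8-hd",
                 language_boost: str = "auto", audio_format: str = "mp3") -> dict:
    """Validate the tips above and return a dict of inputs (hypothetical keys)."""
    if not text or len(text) > MAX_CHARS:
        # Pause markers are counted here too, as a conservative assumption.
        raise ValueError(f"text must be 1 to {MAX_CHARS} characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "language_boost": language_boost,
        "format": audio_format,
    }
```

For example, build_inputs("Welcome." + pause(0.5) + "Let's begin.", "my-voice-id") would return a dict using the HD model, automatic language detection, and mp3 output. The voice ID here is a placeholder; whether a given ID is valid can only be confirmed by the AI Tool itself.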

Expected output

The AI Tool returns one generated audio file with a downloadable URL, content type, file name, file size, async task ID, file ID, billed-character count, status metadata, and cost metadata. The shared Minimax speech output view renders an audio player and character-count metadata when available.
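
When consuming this output in a script, it helps to treat everything except the audio URL as optional. The sketch below assumes the result arrives as a dictionary; every key name in it is hypothetical and stands in for whatever the real output schema uses. The second group of keys is metadata that the async output schema does not guarantee, per the Caveats section.

```python
from typing import Any, Mapping

# Hypothetical key names for the fields listed above.
REPORTED_KEYS = ("file_name", "content_type", "file_size",
                 "task_id", "file_id", "billed_characters")
# Metadata the async output schema does not guarantee.
OPTIONAL_KEYS = ("duration", "sample_rate", "bitrate", "channels", "word_count")


def summarize_output(result: Mapping[str, Any]) -> str:
    """Build a readable summary without assuming optional fields exist."""
    url = result.get("url")
    if not url:
        raise ValueError("result has no downloadable audio URL")
    lines = [f"audio: {url}"]
    for key in REPORTED_KEYS:
        if key in result:
            lines.append(f"{key}: {result[key]}")
    for key in OPTIONAL_KEYS:
        value = result.get(key)
        lines.append(f"{key}: {value if value is not None else 'not reported'}")
    return "\n".join(lines)
```

Reading fields with get() rather than direct indexing keeps downstream code from failing when a run omits duration or sample-rate metadata.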

Caveats

  • Async runs can take longer than standard runs because the task is created, polled, retrieved, and then saved.
  • Generated speech should be reviewed for pronunciation, pacing, emotion, and brand fit.
  • Invalid or unavailable system or custom voice IDs will fail validation.
  • The async output schema does not guarantee duration, sample rate, bitrate, channels, or word count metadata.
  • Pronunciation overrides and language boost help, but long scripts still need listening review.
  • Use the standard Minimax Speech v2.8 AI Tool for shorter scripts that do not need the 50K limit.
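
The create, poll, retrieve sequence mentioned in the first caveat is a common pattern for long-running tasks, and the polling step can be sketched generically. This is not the AI Tool's internal implementation: the status strings, the default interval and timeout, and the get_status callable are all assumptions, with get_status standing in for whatever call checks the task.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"success", "failed"}  # hypothetical status names


def wait_for_task(get_status: Callable[[], str],
                  interval: float = 5.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status until it reports a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout} seconds")
        sleep(interval)  # wait before checking the task again
```

Passing sleep and clock as parameters makes the loop easy to test without real waiting. The wait between checks is also part of why an async run can feel slower than a standard one.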