Minimax Speech-02
Generate high-quality natural speech audio from text using Minimax Speech-02 models with expressive voice options and emotion control (up to 10K characters)
Overview
Minimax Speech-02 turns text into speech audio with selectable voices, emotion control, language optimization, pronunciation guidance, and detailed audio settings. Use it to draft text-to-speech audio for ads, product videos, podcast segments, explainers, and narration, with up to 10,000 characters of text per generation.
Use cases
- Generate a spoken version of a product script before recording a final voiceover.
- Create narration audio for a social video, demo, tutorial, or podcast segment.
- Test voice IDs, emotion, speed, pitch, language boost, and audio formats for a production handoff.
Input tips
- Keep text within the 10,000-character limit and listen through the result before sharing.
- Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
- Choose speech-02-hd (the default) when quality matters most, or speech-02-turbo when speed matters.
- Use emotion, speed, volume, and pitch controls to tune delivery.
- Add pronunciation overrides for names, acronyms, or product terms.
- Choose mp3 for most previews; wav, pcm, flac, and aac are available when needed.
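The tips above can be turned into a pre-flight check that runs before a generation is requested. The following is a minimal sketch, not the tool's actual input schema: the function name `build_speech_inputs` and the keys `text`, `voice_id`, `model`, `format`, and `emotion` are assumptions chosen for illustration. Only the character limit, the two model names, the five audio formats, and the two unsupported emotions come from this page.

```python
# Limits and options taken from this page; key names below are hypothetical.
MAX_CHARS = 10_000
MODELS = {"speech-02-hd", "speech-02-turbo"}
FORMATS = {"mp3", "wav", "pcm", "flac", "aac"}
UNSUPPORTED_EMOTIONS = {"fluent", "whisper"}


def build_speech_inputs(text, voice_id, model="speech-02-hd",
                        audio_format="mp3", emotion=None):
    """Return a dict of inputs, or raise ValueError for a setting this page rules out."""
    if not text or not text.strip():
        raise ValueError("text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; the limit is {MAX_CHARS}")
    if not voice_id:
        raise ValueError("a voice ID is required")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    if audio_format not in FORMATS:
        raise ValueError(f"unknown format {audio_format!r}; expected one of {sorted(FORMATS)}")
    if emotion is not None and emotion.lower() in UNSUPPORTED_EMOTIONS:
        raise ValueError(f"emotion {emotion!r} is not supported")

    # Hypothetical payload shape; adapt the keys to the real input form.
    inputs = {"text": text, "voice_id": voice_id, "model": model, "format": audio_format}
    if emotion is not None:
        inputs["emotion"] = emotion
    return inputs
```

This check cannot confirm that a voice ID actually exists or that an emotion is on the supported list, since neither list appears on this page. The tool's own validation remains the final authority.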
Expected output
The AI Tool returns one generated speech audio file, along with a downloadable URL and metadata: content type, file name, file size, duration, sample rate, bitrate, audio format, channel count, word count, billed-character count, status, and cost. The Speech-02 template renders an audio player and the key technical details.
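For a production handoff, it can help to reduce the returned metadata to a one-line summary. The sketch below assumes the result arrives as a dictionary. The key names (`file_name`, `duration`, `file_size`, `audio_format`, `billed_characters`) and the units (seconds and bytes) are hypothetical, since this page lists the fields but not their exact names or units.

```python
def summarize_result(result: dict) -> str:
    """Format a short handoff summary from a result dictionary.

    All keys and units here are assumptions: duration in seconds,
    file_size in bytes. Adjust to the real response shape.
    """
    size_kb = result["file_size"] / 1024
    return (
        f"{result['file_name']}: {result['duration']:.1f}s, "
        f"{size_kb:.0f} KB, {result['audio_format']}, "
        f"{result['billed_characters']} billed characters"
    )
```

Comparing the billed-character count against the length of the submitted text is a quick way to confirm which text was actually billed.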
Caveats
- Generated speech should be reviewed for pronunciation, tone, pacing, and brand fit.
- Invalid or unavailable voice IDs will fail validation.
- Emotion choices are limited to the supported named emotions; fluent and whisper are not supported.
- Pronunciation overrides can help with custom terms but are not a substitute for listening review.
- Longer text and richer settings can increase generation time and cost.
Related AI Tools

Minimax Speech v2.8
Generate high-quality natural speech audio from text using Minimax Speech v2.8 models with expressive voice options and emotion control (up to 10K characters)

Minimax Voice Clone
Clone voices from audio samples for personalized text-to-speech synthesis using Minimax voice cloning technology

Minimax Voice Design
Design custom AI voices from text descriptions for personalized text-to-speech synthesis using Minimax voice design technology