Text to Speech

Minimax Speech-02

Generate high-quality, natural-sounding speech audio from text using Minimax Speech-02 models, with expressive voice options and emotion control (up to 10K characters).

Overview

Minimax Speech-02 turns text into speech audio with selectable voices, emotion control, language optimization, pronunciation guidance, and detailed audio settings. Use it for reliable text-to-speech drafts for ads, product videos, podcast segments, explainers, and narration up to 10,000 characters.

Use cases

Generate a spoken version of a product script before recording a final voiceover.
Create narration audio for a social video, demo, tutorial, or podcast segment.
Test voice IDs, emotion, speed, pitch, language boost, and audio formats for a production handoff.

Input tips

Keep text within the 10,000-character limit and listen through the result before sharing.
Use a built-in MiniMax voice ID or an approved custom cloned voice ID.
Choose speech-02-hd for the default quality path or speech-02-turbo when speed matters.
Use emotion, speed, volume, and pitch controls to tune delivery.
Add pronunciation overrides for names, acronyms, or product terms.
Choose mp3 for most previews; wav, pcm, flac, and aac are available when needed.
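
The tips above can be turned into a small pre-flight check before a request is submitted. The following is a minimal sketch, not the tool's actual API: the function name `build_request` and the dictionary keys are hypothetical, and the limit is treated as inclusive (10,000 characters allowed), which is an assumption. Only the model names, format names, and character limit come from this page.

```python
MAX_CHARS = 10_000  # limit stated on this page; treated as inclusive (assumption)
MODELS = ("speech-02-hd", "speech-02-turbo")
FORMATS = ("mp3", "wav", "pcm", "flac", "aac")


def build_request(text, voice_id, model="speech-02-hd", audio_format="mp3"):
    """Validate inputs and return a request dict (key names are hypothetical)."""
    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; limit is {MAX_CHARS}")
    if not voice_id:
        raise ValueError("voice_id is required")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if audio_format not in FORMATS:
        raise ValueError(f"unknown format {audio_format!r}; expected one of {FORMATS}")
    return {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "audio_format": audio_format,
    }


# Example: a quick preview request using the defaults suggested above.
request = build_request("Welcome to the product demo.", voice_id="example-voice-id")
```

A check like this only catches malformed input early; whether a given voice ID exists is still decided by the tool's own validation.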

Expected output

The AI Tool returns one generated speech audio file with a downloadable URL, content type, file name, file size, duration, sample rate, bitrate, audio format, channel count, word count, billed-character count, status metadata, and cost metadata. The Speech-02 template renders an audio player and key technical details.
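
For a production handoff it can help to carry the returned metadata in a typed record. The sketch below covers a subset of the fields listed above; the class name `SpeechResult`, the attribute names, and the units (bytes, seconds, hertz) are assumptions for illustration and are not taken from the tool's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class SpeechResult:
    # Attribute names and units are hypothetical; map them from the real response.
    url: str
    content_type: str
    file_name: str
    file_size_bytes: int
    duration_seconds: float
    sample_rate_hz: int
    audio_format: str
    billed_characters: int


def summarize(result):
    """Return a one-line summary suitable for a handoff note."""
    size_kib = result.file_size_bytes / 1024
    return (
        f"{result.file_name}: {result.duration_seconds:.1f} s, "
        f"{result.audio_format}, {result.sample_rate_hz} Hz, "
        f"{size_kib:.1f} KiB, {result.billed_characters} billed characters"
    )
```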

Caveats

Generated speech should be reviewed for pronunciation, tone, pacing, and brand fit.
Invalid or unavailable voice IDs will fail validation.
Emotion choices are limited to the supported named emotions; fluent and whisper are not supported.
Pronunciation overrides can help with custom terms but are not a substitute for listening review.
Longer text and richer settings can increase generation time and cost.
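
The emotion caveat can be checked before submission. This sketch only rejects the two values this page names as unsupported; it does not list the supported emotions, since they are not enumerated here. The function name `check_emotion` and the case-insensitive normalization are assumptions.

```python
UNSUPPORTED_EMOTIONS = {"fluent", "whisper"}  # named as unsupported on this page


def check_emotion(emotion):
    """Return a normalized emotion string, or None when no emotion is set."""
    if emotion is None:
        return None
    normalized = emotion.strip().lower()  # normalization is an assumption
    if normalized in UNSUPPORTED_EMOTIONS:
        raise ValueError(f"emotion {normalized!r} is not supported by Speech-02")
    return normalized
```

Passing this check does not guarantee the value is accepted; the tool's own validation remains the authority on supported emotions.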

Related AI Tools

Minimax Speech v2.8

Minimax Voice Clone

Minimax Voice Design