Minimax Voice Clone
Clone voices from audio samples for personalized text-to-speech synthesis using Minimax voice cloning technology
Overview
Minimax Voice Clone creates a reusable MiniMax voice from a real audio sample, then generates a short preview so you can check the result before using it in text-to-speech. Use it only for voices you have permission to clone, for purposes such as narration, ads, demos, or branded audio workflows.
Use cases
- Create an approved founder or spokesperson voice for future MiniMax speech generation.
- Prepare a reusable voice for product demos, ad mockups, podcast segments, or explainer narration.
- Clone a voice sample, listen to the preview, then use the voice ID in MiniMax TTS AI Tools.
Input tips
- Use only audio from a speaker who has approved the cloning use case.
- Provide a publicly accessible mainAudioUrl that points to an mp3, m4a, or wav file, 10 seconds to 5 minutes long and no larger than 20 MB.
- Give the voice a clear name so it is easy to select later.
- Add a voice description when you need usage notes or speaker context.
- Prompt audio is optional. If you provide it, it must be under 8 seconds and come with a clonePromptText that matches what is spoken in it.
- Use noise reduction or volume normalization when the sample needs cleanup.
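The limits in the tips above can be checked before submitting a sample. The following is a minimal sketch, not part of the AI Tool itself: the function name `check_clone_inputs` and its parameters are hypothetical, the duration and file size are assumed to be measured by you beforehand, the format is assumed to be recognizable from the URL's file extension, and "20 MB" is assumed to mean 20 MiB.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_MAIN_SECONDS = 10
MAX_MAIN_SECONDS = 5 * 60
MAX_MAIN_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB
MAX_PROMPT_SECONDS = 8


def check_clone_inputs(main_audio_url, main_seconds, main_bytes,
                       prompt_seconds=None, clone_prompt_text=None):
    """Return a list of problems; an empty list means the inputs fit the tips."""
    problems = []

    parsed = urlparse(main_audio_url)
    if parsed.scheme not in ("http", "https"):
        problems.append("mainAudioUrl should be a public http(s) URL")
    # Assumption: the audio format can be judged from the URL path's extension.
    if PurePosixPath(parsed.path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("main audio should be mp3, m4a, or wav")

    if main_seconds < MIN_MAIN_SECONDS:
        problems.append("main audio is shorter than 10 seconds")
    elif main_seconds > MAX_MAIN_SECONDS:
        problems.append("main audio is longer than 5 minutes")
    if main_bytes > MAX_MAIN_BYTES:
        problems.append("main audio is larger than 20 MB")

    # Prompt audio is optional, but when present it needs matching text.
    if prompt_seconds is not None:
        if prompt_seconds >= MAX_PROMPT_SECONDS:
            problems.append("prompt audio should be under 8 seconds")
        if not (clone_prompt_text or "").strip():
            problems.append("prompt audio needs matching clonePromptText")

    return problems
```

Calling `check_clone_inputs("https://example.com/voice.mp3", 45, 3_000_000)` returns an empty list, while a 5-second sample returns a single problem about the minimum length. The check cannot tell whether clonePromptText actually matches the spoken prompt; that still has to be verified by listening.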
Expected output
The AI Tool returns a cloned voice record with its name, MiniMax voice ID, provider label, creation time, and a generated sample-audio preview. The output view lets you copy the voice ID and play the preview before using the voice in MiniMax text-to-speech AI Tools.
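If you pass the result to other code, the record described above could be modeled roughly as follows. This is an illustrative sketch only: the class `ClonedVoice`, the helper `parse_cloned_voice`, and every key name in the payload are hypothetical, and the preview is assumed to arrive as a URL and the creation time as an ISO 8601 string. The actual output schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClonedVoice:
    name: str
    voice_id: str          # the ID to reuse in MiniMax text-to-speech AI Tools
    provider: str
    created_at: datetime
    sample_audio_url: str  # assumption: the preview is delivered as a URL


def parse_cloned_voice(payload: dict) -> ClonedVoice:
    """Build a ClonedVoice from a dict; all key names here are hypothetical."""
    return ClonedVoice(
        name=payload["name"],
        voice_id=payload["voiceId"],
        provider=payload["provider"],
        created_at=datetime.fromisoformat(payload["createdAt"]),
        sample_audio_url=payload["sampleAudioUrl"],
    )


# Example with made-up values.
voice = parse_cloned_voice({
    "name": "Founder voice",
    "voiceId": "voice_123",
    "provider": "MiniMax",
    "createdAt": "2024-01-01T12:00:00+00:00",
    "sampleAudioUrl": "https://example.com/preview.mp3",
})
```

Keeping the voice ID and the preview together in one record makes it easier to listen to the preview first and only then reuse the ID in a text-to-speech step.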
Caveats
- Do not clone voices without clear permission and an appropriate use case.
- Main audio shorter than 10 seconds fails; longer audio may be truncated for the clone.
- Prompt audio improves similarity and stability only when paired with accurate matching text.
- Noisy, compressed, overlapping, or music-backed samples can reduce clone quality.
- Always listen to the preview before using the voice in production drafts.
Related AI Tools

Minimax Speech-02
Generate high-quality natural speech audio from text using Minimax Speech-02 models with expressive voice options and emotion control (up to 10K characters)

Minimax Speech v2.8
Generate high-quality natural speech audio from text using Minimax Speech v2.8 models with expressive voice options and emotion control (up to 10K characters)

Minimax Voice Design
Design custom AI voices from text descriptions for personalized text-to-speech synthesis using Minimax voice design technology