InWorld Voice Clone
Clone voices from audio samples using InWorld AI for personalized text-to-speech synthesis with multilingual support
Overview
InWorld Voice Clone creates a reusable InWorld voice from a public audio sample, then generates sample audio so you can preview the result. Use it when you have permission to use a speaker's voice and want a voice ID for future InWorld text-to-speech drafts.
Use cases
- Create a cloned voice for recurring narration, ad reads, product demos, or explainer drafts.
- Generate a preview sample immediately after cloning to check whether the voice is usable.
- Save a voice ID that can be reused in InWorld Text-to-Speech runs.
- Provide a transcript to help the AI Tool process the source sample more accurately.
Input tips
- Provide a public audioSampleUrl that can be downloaded without login.
- Use mp3, m4a, or wav audio under 20 MB.
- Use clear speech; 10 seconds to 5 minutes is the useful sample range.
- Choose the language spoken in the sample.
- Keep voiceName short and distinctive; optional voiceDescription can add context.
- Provide audioTranscription when you have it; enable noise removal only for noticeably noisy samples.
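The input tips above can be checked before a run with a small pre-flight helper. This is only a sketch under stated assumptions: the function name, its parameters, and the rule that the URL path ends in the file extension are hypothetical and not part of the AI Tool. It also assumes the caller has already measured the file size and duration, and it reads "20 MB" as binary megabytes.

```python
from typing import List
from urllib.parse import urlparse
import posixpath

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024   # "under 20 MB"; binary megabytes is an assumption
MIN_SECONDS = 10               # lower end of the useful sample range
MAX_SECONDS = 5 * 60           # upper end of the useful sample range


def check_voice_clone_inputs(
    audio_sample_url: str,
    size_bytes: int,
    duration_seconds: float,
    voice_name: str,
    language: str,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical helper: it mirrors the input tips, not the tool's own validation.
    """
    problems: List[str] = []

    parsed = urlparse(audio_sample_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("audioSampleUrl should be a public http(s) URL")

    # Assumption: the file type is visible in the URL path.
    extension = posixpath.splitext(parsed.path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("audio should be mp3, m4a, or wav")

    if size_bytes >= MAX_BYTES:
        problems.append("audio should be under 20 MB")

    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("sample should be 10 seconds to 5 minutes long")

    if not voice_name.strip():
        problems.append("voiceName should not be empty")

    if not language.strip():
        problems.append("choose the language spoken in the sample")

    return problems
```

A local check like this cannot confirm that the URL downloads without login or that the speech is clear; it only catches the mechanical limits before a run is spent on an unusable sample.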
Expected output
The AI Tool returns metadata for the cloned voice: voice ID, name, provider, language, creation timestamp, and a preview URL for the sample audio. It also returns cost metadata and, when available, validation details such as warnings, errors, detected language, or transcription. The output view supports copying the voice ID and playing the preview sample.
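If the output is consumed programmatically, a small helper can pull out the voice ID and surface validation messages. The sketch below is illustrative only: the key names `voiceId`, `validation`, `warnings`, and `errors` are assumptions about the output shape, not a documented schema, and withholding the voice ID when errors are present is a design choice of this example.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_clone_result(result: Dict[str, Any]) -> Tuple[Optional[str], List[str]]:
    """Return (voice_id, notes) from a voice clone result dictionary.

    Key names are hypothetical. The voice ID is returned only when no
    validation errors are reported; warnings are passed through as notes.
    """
    validation = result.get("validation") or {}
    errors = list(validation.get("errors") or [])
    warnings = list(validation.get("warnings") or [])

    notes = [f"error: {e}" for e in errors] + [f"warning: {w}" for w in warnings]

    # Design choice of this sketch: do not reuse a voice that reported errors.
    voice_id = None if errors else result.get("voiceId")
    return voice_id, notes
```

For example, a result with one warning and no errors yields the voice ID plus a single note, which is a cue to listen to the preview sample before reusing the voice.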
Caveats
- Only clone voices you have rights or permission to use.
- Sample quality strongly affects clone quality; noisy, clipped, accented, or mixed-speaker audio may need review.
- Background-noise removal can help noisy samples but may reduce quality on clean audio.
- Validation warnings or errors may require a cleaner or longer sample.
- This AI Tool creates a reusable voice ID and preview audio; use InWorld Text-to-Speech to generate full scripts.
Related AI Tools

InWorld Text-to-Speech
Generate ultra-realistic speech from text using InWorld AI TTS models with rich expressive voices and word-level timestamp alignment

Minimax Voice Clone
Clone voices from audio samples for personalized text-to-speech synthesis using Minimax voice cloning technology

Minimax Voice Design
Design custom AI voices from text descriptions for personalized text-to-speech synthesis using Minimax voice design technology