InWorld Voice Clone

Clone voices from audio samples using InWorld AI for personalized text-to-speech synthesis with multilingual support

Overview

InWorld Voice Clone creates a reusable InWorld voice from a public audio sample, then generates sample audio so you can preview the result. Use it when you have permission to use a speaker's voice and want a voice ID for future InWorld text-to-speech drafts.

Use cases

  • Create a cloned voice for recurring narration, ad reads, product demos, or explainer drafts.
  • Generate a preview sample immediately after cloning to check whether the voice is usable.
  • Save a voice ID that can be reused in InWorld Text-to-Speech runs.
  • Provide a transcript to help the AI Tool process the source sample more accurately.

Input tips

  • Provide a public audioSampleUrl that can be downloaded without login.
  • Use mp3, m4a, or wav audio under 20 MB.
  • Use clear speech; 10 seconds to 5 minutes is the useful sample range.
  • Choose the language spoken in the sample.
  • Keep voiceName short and distinctive; optional voiceDescription can add context.
  • Provide audioTranscription when you have it; enable noise removal only for noticeably noisy samples.
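The tips above can be checked locally before a run is submitted. The following is a minimal sketch, not part of the AI Tool itself: the function name `check_clone_inputs` is hypothetical, the size and duration are assumed to be measured by the caller, and "20 MB" is read here as 20 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024          # assumption: "20 MB" interpreted as 20 MiB
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60  # useful sample range from the tips


def check_clone_inputs(audio_sample_url, size_bytes, duration_seconds, voice_name):
    """Return a list of human-readable problems; an empty list means the inputs look fine.

    Hypothetical helper: it only mirrors the input tips and does not call InWorld.
    """
    problems = []

    parsed = urlparse(audio_sample_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("audioSampleUrl should be a public http(s) URL")

    # Check the file extension on the URL path (query strings are ignored).
    extension = PurePosixPath(parsed.path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("audio should be mp3, m4a, or wav")

    if size_bytes >= MAX_BYTES:
        problems.append("audio should be under 20 MB")

    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("sample should be 10 seconds to 5 minutes long")

    if not voice_name.strip():
        problems.append("voiceName should not be empty")

    return problems
```

A URL check like this cannot confirm that the file downloads without login; it only catches obviously unusable values early.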

Expected output

The AI Tool returns cloned-voice metadata: voice ID, name, provider, language, creation timestamp, and a preview URL for the sample audio. It also returns cost metadata and, when available, validation details such as warnings, errors, detected language, or transcription. The output view supports copying the voice ID and playing the preview sample.
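If the output is consumed programmatically, it could be handled along these lines. This is a sketch under stated assumptions: the page does not document the actual field names, so the keys `voiceId`, `name`, `previewUrl`, and `validation` (with `warnings` and `errors`) are hypothetical placeholders, as are the names `CloneResult` and `parse_clone_result`.

```python
from dataclasses import dataclass, field


@dataclass
class CloneResult:
    """Subset of the cloned-voice metadata described above (field names are assumed)."""
    voice_id: str
    name: str
    preview_url: str
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    @property
    def needs_new_sample(self) -> bool:
        # Validation errors may call for a cleaner or longer sample.
        return bool(self.errors)


def parse_clone_result(payload: dict) -> CloneResult:
    """Build a CloneResult from a result dictionary using hypothetical key names."""
    validation = payload.get("validation") or {}
    return CloneResult(
        voice_id=payload["voiceId"],  # hypothetical key; required for later reuse
        name=payload.get("name", ""),
        preview_url=payload.get("previewUrl", ""),
        warnings=list(validation.get("warnings", [])),
        errors=list(validation.get("errors", [])),
    )
```

Keeping the voice ID as the only required field reflects that it is the value reused in later InWorld Text-to-Speech runs; everything else is treated as optional.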

Caveats

  • Only clone voices you have rights or permission to use.
  • Sample quality strongly affects clone quality; noisy, clipped, accented, or mixed-speaker audio may need review.
  • Background-noise removal can help noisy samples but may reduce quality on clean audio.
  • Validation warnings or errors may require a cleaner or longer sample.
  • This AI Tool creates a reusable voice ID and preview audio; use InWorld Text-to-Speech to generate full scripts.