MultiTalk Multi-Speaker Video

Generate talking-avatar videos of two speakers in conversation from one portrait image and two text inputs, using MultiTalk with dual voice synthesis.

Overview

MultiTalk Multi-Speaker Video turns one portrait image and two written speaker lines into a talking-avatar conversation with built-in voice synthesis. Use it for podcast-style snippets, interview drafts, dialogue explainers, and campaign review clips when you want to write both speakers' lines rather than supply audio.

Use cases

  • Create a two-speaker talking-avatar draft from one portrait and two script lines.
  • Prototype dialogue for an interview, podcast teaser, explainer, or campaign concept.
  • Assign different built-in voices to speaker one and speaker two.
  • Compare frame count, resolution, acceleration, and seed settings for review variants.
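Comparing settings across review variants amounts to sweeping a small grid of inputs. The following is a minimal Python sketch, assuming the inputs are sent as a flat dictionary. The key names `num_frames`, `resolution`, and `seed` are hypothetical stand-ins for the frame-count, resolution, and seed inputs, and acceleration is left out because its accepted values are not listed on this page.

```python
from itertools import product
from typing import Iterable, List


def variant_grid(
    base: dict,
    frame_counts: Iterable[int],
    resolutions: Iterable[str],
    seeds: Iterable[int],
) -> List[dict]:
    """Return one input dict per (frames, resolution, seed) combination.

    Key names are hypothetical. `base` holds the shared inputs
    (image, prompt, speaker lines, voices) and is never modified.
    """
    variants = []
    for frames, resolution, seed in product(frame_counts, resolutions, seeds):
        variant = dict(base)  # shallow copy so each variant is independent
        variant.update(num_frames=frames, resolution=resolution, seed=seed)
        variants.append(variant)
    return variants


if __name__ == "__main__":
    shared = {
        "first_text_input": "Welcome to the show.",
        "second_text_input": "Glad to be here.",
    }
    grid = variant_grid(shared, [121, 191], ["480p", "720p"], [1, 2])
    print(len(grid))  # 2 x 2 x 2 = 8 combinations
```

Fixing the seed while changing one other setting at a time makes it easier to attribute differences between variants to that setting.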

Input tips

  • Provide a public image_url that can be fetched without login.
  • Write a prompt describing the conversation setting, speaker behavior, and visual style.
  • Put the first speaker's line in first_text_input and the second speaker's line in second_text_input.
  • Choose voice1 and voice2 from the supported built-in voices; defaults are Sarah and Roger.
  • Keep both script lines concise enough for the intended clip length.
  • Choose a frame count from 41 to 241; the default is 191.
  • Set resolution to 480p or 720p, and vary acceleration and seed to produce draft variants.
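The input tips above can be collected into a small client-side check before a request is sent. This is a hedged Python sketch, not the tool's actual request schema: the field names `image_url`, `prompt`, `first_text_input`, `second_text_input`, `voice1`, and `voice2` come from this page, while `num_frames`, `resolution`, `acceleration`, and `seed` are hypothetical names for the remaining inputs. The limits and voice defaults mirror the tips.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Limits and defaults taken from the input tips on this page.
MIN_FRAMES, MAX_FRAMES, DEFAULT_FRAMES = 41, 241, 191
RESOLUTIONS = {"480p", "720p"}


@dataclass
class MultiTalkInputs:
    image_url: str
    prompt: str
    first_text_input: str
    second_text_input: str
    voice1: str = "Sarah"
    voice2: str = "Roger"
    num_frames: int = DEFAULT_FRAMES       # hypothetical field name
    resolution: Optional[str] = None       # None: let the tool apply its default
    acceleration: Optional[str] = None     # accepted values not documented here
    seed: Optional[int] = None

    def validate(self) -> None:
        """Raise ValueError for inputs outside the documented limits."""
        # A scheme check only; it cannot prove the URL is public or unexpired.
        if not self.image_url.startswith(("http://", "https://")):
            raise ValueError("image_url must be a public http(s) URL")
        if not self.first_text_input.strip() or not self.second_text_input.strip():
            raise ValueError("both speaker lines must be non-empty")
        if not MIN_FRAMES <= self.num_frames <= MAX_FRAMES:
            raise ValueError(f"num_frames must be between {MIN_FRAMES} and {MAX_FRAMES}")
        if self.resolution is not None and self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")

    def to_payload(self) -> dict:
        """Validate, then drop optional fields that were left unset."""
        self.validate()
        return {k: v for k, v in asdict(self).items() if v is not None}


if __name__ == "__main__":
    inputs = MultiTalkInputs(
        image_url="https://example.com/portrait.jpg",
        prompt="Two podcast hosts chat in a bright studio",
        first_text_input="Welcome back to the show.",
        second_text_input="Thanks, it's great to be here.",
        resolution="720p",
        seed=42,
    )
    print(inputs.to_payload())
```

Voice names are not validated because the full list of supported built-in voices is not given here; the sketch only carries over the documented defaults, Sarah and Roger.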

Expected output

The AI Tool returns one generated talking-avatar video with a downloadable URL, its duration in seconds, an optional content type, the file name, the file size, the seed used, and cost metadata. The MultiTalk output view plays the video and shows the model label and seed.
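When the result is consumed programmatically, it helps to map the returned fields onto a typed record. The Python sketch below assumes a JSON-style dictionary; the key names (`video`, `url`, `file_name`, `file_size`, `content_type`, `duration`, `seed`, `cost`) and the nesting are hypothetical, and only the set of fields comes from this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MultiTalkResult:
    url: str
    duration_s: float
    file_name: str
    file_size: int                      # assumed to be in bytes
    seed: int
    content_type: Optional[str] = None  # optional per this page
    cost: Optional[Dict[str, Any]] = None


def parse_result(raw: Dict[str, Any]) -> MultiTalkResult:
    """Map a hypothetical response dict onto MultiTalkResult.

    All key names are assumptions; adjust them to the real response.
    """
    video = raw["video"]
    return MultiTalkResult(
        url=video["url"],
        duration_s=float(raw["duration"]),
        file_name=video["file_name"],
        file_size=int(video["file_size"]),
        seed=int(raw["seed"]),
        content_type=video.get("content_type"),
        cost=raw.get("cost"),
    )
```

Recording the returned seed alongside each clip makes it possible to regenerate a variant that passed review.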

Caveats

  • This AI Tool synthesizes both voices from text; use MultiTalk Multi-Speaker Audio-to-Video when you have audio tracks.
  • Voice options are limited to the supported built-in list.
  • Because both speakers come from one portrait, they may not feel distinct; check that it is clear who is speaking.
  • Private, expired, or blocked image URLs will fail.
  • Generated voices and facial motion should be reviewed for realism, consent, brand fit, and policy fit.
  • Frame count, resolution, and acceleration settings guide generation but do not guarantee timing; review the output's timing before use.