Hunyuan Avatar

Generate avatar videos with natural, lip-synced speech animation from a portrait image and an audio track using Tencent's Hunyuan Avatar model, with turbo mode for faster runs

Overview

Hunyuan Avatar turns a portrait image and an audio track into a lip-synced avatar video using Tencent's Hunyuan Avatar model. Use it for founder or spokesperson clips, character explainers, social talking-head drafts, and campaign tests when you already have the image and the voice audio.

Use cases

  • Turn a founder or product narrator portrait and voiceover into a short talking-head video draft.
  • Create social or explainer avatar clips from a character image and premade audio.
  • Use frame count and inference steps to test timing and quality tradeoffs for short clips.
  • Use a seed when you want a repeatable follow-up run.

Input tips

  • Provide public image_url and audio_url values that can be fetched without login.
  • Use a clear portrait image with a visible face and clean speech audio.
  • Add the optional text field only when a scene description should guide the avatar video.
  • Choose 129-401 frames at 25 FPS (about 5.2-16 seconds); output length may be capped by the audio length.
  • Use 30-50 inference steps; higher values can take longer.
  • Leave turbo_mode on (the default) for faster generation; set seed to reproduce a run or to compare variants.
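The input tips above can be collected into a pre-submission check. The sketch below is a minimal Python illustration, not this AI Tool's actual API: image_url, audio_url, text, turbo_mode, and seed come from this page, while the field names num_frames and num_inference_steps, the AvatarRequest class, and the expected_duration helper are hypothetical. The ranges mirror the tips (129-401 frames, 30-50 steps, 25 FPS) and may not match the service's real limits, and the duration helper assumes the audio cap behaves like a simple minimum.

```python
from dataclasses import dataclass
from typing import Optional

FPS = 25                      # frame rate stated on this page
FRAME_RANGE = (129, 401)      # frame range from the input tips
STEP_RANGE = (30, 50)         # recommended inference-step range from the input tips


@dataclass
class AvatarRequest:
    """Hypothetical request model; only some field names come from the page."""
    image_url: str
    audio_url: str
    num_frames: int = 129             # hypothetical field name
    num_inference_steps: int = 30     # hypothetical field name
    turbo_mode: bool = True           # the page describes turbo mode as the default path
    text: Optional[str] = None
    seed: Optional[int] = None

    def validate(self) -> None:
        # Both media URLs must be fetchable without login, so require http(s).
        for name in ("image_url", "audio_url"):
            if not getattr(self, name).startswith(("http://", "https://")):
                raise ValueError(f"{name} must be a public http(s) URL")
        low, high = FRAME_RANGE
        if not low <= self.num_frames <= high:
            raise ValueError(f"num_frames must be between {low} and {high}")
        low, high = STEP_RANGE
        if not low <= self.num_inference_steps <= high:
            raise ValueError(f"num_inference_steps should be between {low} and {high}")

    def to_payload(self) -> dict:
        self.validate()
        payload = {
            "image_url": self.image_url,
            "audio_url": self.audio_url,
            "num_frames": self.num_frames,
            "num_inference_steps": self.num_inference_steps,
            "turbo_mode": self.turbo_mode,
        }
        # Optional fields are included only when set.
        if self.text:
            payload["text"] = self.text
        if self.seed is not None:
            payload["seed"] = self.seed
        return payload


def expected_duration(num_frames: int, audio_seconds: float) -> float:
    """Requested clip length in seconds, assuming the audio cap is a simple min()."""
    return min(num_frames / FPS, audio_seconds)
```

For example, 201 frames requests about 8.04 seconds (201 / 25), but with a 6-second voice track expected_duration(201, 6.0) returns 6.0, which is why longer frame counts may not produce longer clips.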

Expected output

The AI Tool returns one generated avatar video with a downloadable URL, duration in seconds, optional content type, file name, file size, and cost metadata. The shared video output view renders the video for playback, review, and download and shows duration when available.
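If you consume the result programmatically, a small typed wrapper keeps the optional fields explicit. This is a sketch under assumptions: the flat key names (url, duration, content_type, file_name, file_size, cost) and the AvatarVideoResult class are hypothetical, since this page lists the returned fields but not their exact names, types, or nesting.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AvatarVideoResult:
    """Hypothetical view of the returned video; key names are assumptions."""
    url: str
    duration_seconds: Optional[float]
    content_type: Optional[str]
    file_name: Optional[str]
    file_size: Optional[int]
    cost: Optional[Any]  # shape of the cost metadata is not documented here

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "AvatarVideoResult":
        duration = data.get("duration")
        return cls(
            url=data["url"],
            duration_seconds=float(duration) if duration is not None else None,
            content_type=data.get("content_type"),
            file_name=data.get("file_name"),
            file_size=data.get("file_size"),
            cost=data.get("cost"),
        )

    def summary(self) -> str:
        # Fall back to the last URL segment when no file name is returned,
        # and show the duration only when it is available.
        name = self.file_name or self.url.rsplit("/", 1)[-1]
        if self.duration_seconds is None:
            return name
        return f"{name} ({self.duration_seconds:.1f} s)"
```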

Caveats

  • Private, expired, or blocked image and audio URLs will fail.
  • Poor audio, cropped portraits, low-quality faces, or mismatched speech can reduce lip-sync quality.
  • Generated facial motion should be reviewed for realism, consent, brand fit, and policy fit.
  • This AI Tool uses premade audio; it does not create the voice track, transcript, captions, or script.
  • Frames are generated at 25 FPS, and the effective frame count may be capped by the input audio length.
  • Turbo mode and inference-step choices guide speed and quality but do not guarantee a specific result.
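Because inaccessible media URLs are the first failure listed, a quick anonymous request before submitting can catch them early. The function below is a hypothetical, best-effort sketch using only the Python standard library: a 2xx response to a HEAD request suggests the file is public, but some hosts reject HEAD or redirect to a login page that still returns 200, so a True result is not a guarantee that the AI Tool can fetch it.

```python
import urllib.request
from urllib.parse import urlparse


def is_publicly_fetchable(url: str, timeout: float = 10.0) -> bool:
    """Best-effort check that a URL answers an anonymous HEAD request with 2xx."""
    # Only http(s) URLs can be fetched by a remote service without local access.
    if urlparse(url).scheme not in ("http", "https"):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False
```

Running it on both image_url and audio_url before a long generation can save a failed run, at the cost of one extra request per file.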