Kling v2.6 Pro Text-to-Video

Generate high-quality videos with native audio from text prompts using Kuaishou's Kling v2.6 Pro model, with Chinese and English voice support.

Overview

Kling v2.6 Pro Text-to-Video turns a written prompt into a 5- or 10-second generated video using Kuaishou's Kling v2.6 Pro model, with native audio enabled by default. Use it for prompt-only launch visuals, short ad concepts, storyboard beats, and social clips when generated audio is part of the brief.

Use cases

  • Draft a short ad scene with generated motion and audio from one written brief.
  • Create vertical, square, or widescreen variations of the same campaign moment.
  • Test spoken or sound-led story ideas in Chinese or English before production review.
  • Turn off generate_audio when you only need silent motion exploration.

Input tips

  • Write a prompt that describes subject, setting, action, camera movement, mood, and any sound or dialogue.
  • Keep prompts under 2,500 characters.
  • Choose 5 or 10 seconds; 5 seconds is the default.
  • Choose 16:9, 9:16, or 1:1 aspect ratio.
  • Leave generate_audio on for native audio; turn it off for silent drafts.
  • Use cfg_scale and negative_prompt only when prompt adherence or unwanted details need tuning.
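
The tips above can be checked before a request is sent. The following is a minimal sketch, not the tool's actual client: `build_inputs` is a hypothetical helper, the field names `prompt`, `duration`, and `aspect_ratio` are assumptions (only `generate_audio`, `cfg_scale`, and `negative_prompt` are named on this page), and the integer type used for duration is also assumed.

```python
from typing import Any, Dict, Optional

# Limits taken from the input tips; everything else here is illustrative.
MAX_PROMPT_CHARS = 2500                       # "under 2,500 characters"
ALLOWED_DURATIONS = (5, 10)                   # seconds
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_inputs(
    prompt: str,
    aspect_ratio: str,
    duration: int = 5,                        # 5 seconds is the documented default
    generate_audio: bool = True,              # native audio is on by default
    cfg_scale: Optional[float] = None,
    negative_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and return an input dictionary.

    Hypothetical helper: the dictionary keys are assumed, not confirmed.
    """
    if not prompt.strip():
        # Assumption: an empty prompt is never useful for text-to-video.
        raise ValueError("prompt must not be empty")
    if len(prompt) >= MAX_PROMPT_CHARS:
        # Conservative reading of "under 2,500 characters".
        raise ValueError(f"prompt must be under {MAX_PROMPT_CHARS} characters")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")

    inputs: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    # Tuning fields are only included when the caller sets them.
    if cfg_scale is not None:
        inputs["cfg_scale"] = cfg_scale
    if negative_prompt:
        inputs["negative_prompt"] = negative_prompt
    return inputs


if __name__ == "__main__":
    print(build_inputs(
        "A red kite rises over a windy beach, slow pan up, waves and gulls audible",
        aspect_ratio="9:16",
    ))
```

Leaving `cfg_scale` and `negative_prompt` out of the dictionary unless they are set mirrors the advice to use them only when tuning is needed.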

Expected output

The AI Tool returns one generated video with a downloadable URL and optional content type, file name, file size, and cost metadata. The Kling v2.6 output view renders the video for playback, review, and download. Duration is a requested input and is not returned as output metadata.
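
As an illustration of that output shape, here is a hedged sketch of how a caller might read a result. The key names `url`, `content_type`, `file_name`, `file_size`, and `cost` are assumptions based on the description above, not a confirmed schema, and `parse_result` is a hypothetical helper. Because duration is not returned, the sketch carries the requested value alongside the result.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VideoResult:
    url: str                                  # the one required field
    content_type: Optional[str] = None        # optional metadata
    file_name: Optional[str] = None
    file_size: Optional[int] = None
    cost: Optional[float] = None
    requested_duration: Optional[int] = None  # copied from the request, not the output


def parse_result(
    raw: Mapping[str, Any],
    requested_duration: Optional[int] = None,
) -> VideoResult:
    """Build a VideoResult from a raw output mapping (assumed key names)."""
    url = raw.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("result has no video URL")
    return VideoResult(
        url=url,
        content_type=raw.get("content_type"),
        file_name=raw.get("file_name"),
        file_size=raw.get("file_size"),
        cost=raw.get("cost"),
        requested_duration=requested_duration,
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = {"url": "https://example.com/video.mp4", "content_type": "video/mp4"}
    print(parse_result(sample, requested_duration=5))
```

Treating every field except the URL as optional keeps the caller from failing when metadata is absent.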

Caveats

  • This text-to-video AI Tool does not use source images, end frames, reference images, or explicit voice IDs.
  • Generated motion, audio, dialogue, people, products, brand marks, and text should be reviewed before use.
  • Audio is generated from the prompt; the AI Tool does not accept a separate script or audio file.
  • The schema notes Chinese and English voice output; other languages may be translated to English.
  • Duration is a requested input and is not returned as video metadata.
  • Use image-to-video AI Tools when existing visuals must control the frame.