New Release

The Future of Video Diffusion is Here

Name: Unified Video Diffusion with Kling O1: Generate, Edit, and Extend Videos
Author: Kling O1 (Omni One)

Kling O1 redefines video diffusion by unifying text-to-video, image-to-video, and advanced editing into a single, powerful model.

Unified Workflow

Consistent Characters

Conversational Editing

Unleash Your Creative Vision with Kling O1

Experience unprecedented control and creative freedom in video generation and editing.

Unified Multi-Modal Engine

Kling O1's Multi-Modal Visual Language (MVL) framework handles text, images, videos, and subjects as combinable instructions within the same large model. No more switching between tools.

Consistent Characters & Scenes

With our universal reference system, keep character faces, clothing details, and props consistent across multiple shots using up to 5 reference images, even as camera angles change.

Conversational Video Editing

Edit videos using natural language commands. Remove unwanted objects, change the time of day, or apply style repainting, all without manual masking or keyframing.

Effortless Video Creation in 3 Steps

From initial concept to final cut, Kling O1 streamlines your video production workflow.

Input Your Vision

Start with text prompts, reference images, or existing video clips to define your desired outcome.

Refine with Conversation

Use natural language commands to edit, adjust, and refine your video in real time.

Generate & Share

Generate high-quality video outputs with precise duration control and native audio synchronization.

Frequently Asked Questions about Kling O1

Get answers to common questions about Kling O1 and its capabilities.

Kling O1 is the world's first unified multi-modal video foundation model, integrating text-to-video, image-to-video, video editing, style repainting, and shot extension into a single semantic space. This eliminates the need to switch between separate tools.

Kling O1's subject-based reference system uses up to 5 reference images to ensure consistent character faces, clothing, and props across multiple shots, even with changing camera angles.

Kling O1 enables conversational post-production: you can perform pixel-level semantic reconstruction using natural language instructions, without manual masking or keyframing.

Kling O1 generates clips of 3 to 10 seconds each and supports up to 2 minutes of continuous video with synchronized audio.
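
For readers scripting their own pre-flight checks, the limits quoted on this page (up to 5 reference images, 3 to 10 seconds per clip) can be sketched as a small validator. This is only an illustration: `ClipRequest`, its fields, and `validate` are hypothetical names invented here, not part of any Kling O1 API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits taken from the text on this page.
MAX_REFERENCE_IMAGES = 5
MIN_CLIP_SECONDS = 3
MAX_CLIP_SECONDS = 10


@dataclass
class ClipRequest:
    """Hypothetical request shape for illustration; not the actual Kling O1 API."""
    prompt: str
    duration_seconds: int
    reference_images: List[str] = field(default_factory=list)
    source_video: Optional[str] = None


def validate(request: ClipRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    has_input = (
        bool(request.prompt.strip())
        or bool(request.reference_images)
        or request.source_video is not None
    )
    if not has_input:
        problems.append("provide a text prompt, reference images, or a source video")
    if len(request.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    if not MIN_CLIP_SECONDS <= request.duration_seconds <= MAX_CLIP_SECONDS:
        problems.append(
            f"clip duration must be between {MIN_CLIP_SECONDS} and {MAX_CLIP_SECONDS} seconds"
        )
    return problems
```

Checking a request before submitting it gives immediate feedback on inputs that fall outside the documented ranges, whatever the real submission mechanism turns out to be.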