New Release

The Future of Video Diffusion is Here

Kling O1 redefines video diffusion by unifying text-to-video, image-to-video, and advanced editing into a single, powerful model.

Unified Workflow
Consistent Characters
Conversational Editing
Unleash Your Creative Vision with Kling O1

Experience unprecedented control and creative freedom in video generation and editing.

Unified Multi-Modal Engine

Kling O1's Multi-Modal Visual Language (MVL) framework handles text, images, videos, and subjects as combinable instructions within the same large model. No more switching between tools.

Consistent Characters & Scenes

The universal reference system uses up to 5 reference images to keep character faces, clothing details, and props consistent across multiple shots, even as camera angles change.

Conversational Video Editing

Edit videos using natural language commands. Remove unwanted objects, change the time of day, or apply style repainting, all without manual masking or keyframing.

Effortless Video Creation in 3 Steps

From initial concept to final cut, Kling O1 streamlines your video production workflow.

1. Input Your Vision

Start with text prompts, reference images, or existing video clips to define your desired outcome.

2. Refine with Conversation

Use natural language commands to edit, adjust, and refine your video in real time.

3. Generate & Share

Generate high-quality video outputs with precise duration control and native audio synchronization.

Frequently Asked Questions about Kling O1

Get answers to common questions about Kling O1 and its capabilities.

Kling O1 is the world's first unified multi-modal video foundation model, integrating text-to-video, image-to-video, video editing, style repainting, and shot extension into a single semantic space. This eliminates the need to switch between separate tools.

Kling O1's subject-based reference system uses up to 5 reference images to keep character faces, clothing, and props consistent across multiple shots, even as camera angles change.

Kling O1 supports conversational post-production: you can perform pixel-level semantic reconstruction using natural language instructions, without manual masking or keyframing.

Kling O1 generates clips of 3 to 10 seconds each, and supports up to 2 minutes of continuous video with synchronized audio.
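
The limits quoted above (up to 5 reference images, 3 to 10 seconds per clip) can be checked before a request is submitted. The following is a minimal sketch only: `ClipRequest` and `validate_clip_request` are hypothetical names invented for illustration and are not part of any Kling O1 API, and the check covers only the two limits stated on this page.

```python
from dataclasses import dataclass, field
from typing import List

# Limits taken from the text above; everything else here is an assumption.
MAX_REFERENCE_IMAGES = 5   # "up to 5 reference images"
MIN_CLIP_SECONDS = 3       # "3 to 10 seconds" per clip
MAX_CLIP_SECONDS = 10


@dataclass
class ClipRequest:
    """Hypothetical container for one clip request (not a Kling O1 type)."""
    prompt: str
    duration_seconds: float
    reference_images: List[str] = field(default_factory=list)


def validate_clip_request(request: ClipRequest) -> List[str]:
    """Return a list of problems; an empty list means the stated limits are met."""
    problems = []
    if not MIN_CLIP_SECONDS <= request.duration_seconds <= MAX_CLIP_SECONDS:
        problems.append(
            f"duration {request.duration_seconds}s is outside "
            f"{MIN_CLIP_SECONDS}-{MAX_CLIP_SECONDS}s"
        )
    if len(request.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(request.reference_images)} reference images exceeds "
            f"the limit of {MAX_REFERENCE_IMAGES}"
        )
    return problems
```

For example, a request for a 12-second clip with 6 reference images would come back with two problems, one per violated limit, while a 5-second clip with a single reference image would come back with an empty list.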

Ready to experience the future of video creation?

Join the Kling O1 revolution and unlock unprecedented creative possibilities.