New Release

The Future of Video Creation is Here: Multi-Modal Video Generation

Generate videos from text, images, and more, all within a unified multi-modal video engine. Experience unmatched creative flexibility with Kling O1.

Cutting-Edge AI
Unified Workflow
Limitless Creativity

Unleash Your Creativity with Our Multi-Modal Video Generator

Kling O1's unified video foundation model allows you to create videos in ways you never thought possible.

Unified Multi-Modal Video Engine

Generate videos from text, images, subjects, and more, all within one unified semantic space. No more switching between separate tools and plugins. Powered by Kuaishou's innovative Kling O1.

Conversational Video Editing

Edit your videos using natural language commands. Remove unwanted objects, change the time of day, or apply different styles without manual masking or keyframing. Effortless video post-production with Kling O1.

Consistent Characters Across Shots

Maintain consistent character faces, clothing, and props across multiple shots, even with changing camera angles. Use up to 5 reference images to build your subject. Perfect for creating engaging storylines using Kling O1.

Create Stunning Videos in Three Easy Steps

Leverage the power of Kling O1 to bring your visions to life.

1

Input Your Media

Enter a text prompt, or upload images or video clips. With the subject-based reference system, you can add up to 5 reference photos to keep a character's design consistent.

2

Customize Your Video

Use natural language commands to edit, style, and extend your video. Set each clip's duration, up to 10 seconds, and easily create transitions between clips.

3

Generate and Share

Generate your final video with synchronized audio. Share your creations with the world.

Frequently Asked Questions

Learn more about creating multi-modal videos with Kling O1.

You can use text prompts, images, and existing video clips as input. Kling O1 supports a wide range of formats.
Kling O1 uses a subject-based reference system, allowing you to upload up to 5 reference images to maintain consistent character features, clothing, and props across multiple shots.
You can perform pixel-level semantic reconstruction with natural language commands such as 'remove the passerby in the background', 'turn daytime into dusk', or 'change it to pixel art style', with no manual masking required.
Kling O1 generates clips of 3 to 10 seconds each and can produce up to 2 minutes of continuous video with synchronized audio.
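
If you are planning a longer piece, the duration limits above come down to simple arithmetic. The sketch below is a local planning helper only, not part of any Kling O1 API: the function name `plan_clips` and the constants are hypothetical, and the numbers (3 to 10 seconds per clip, 2 minutes total) are taken from this page. It splits a target duration into the fewest clips that fit those limits.

```python
import math

# Limits quoted on this page. The names are hypothetical, not a Kling O1 API.
MIN_CLIP_SECONDS = 3
MAX_CLIP_SECONDS = 10
MAX_TOTAL_SECONDS = 120  # "up to 2 minutes of continuous video"


def plan_clips(total_seconds: int) -> list[int]:
    """Split a target duration into clip lengths of 3 to 10 seconds each."""
    if not MIN_CLIP_SECONDS <= total_seconds <= MAX_TOTAL_SECONDS:
        raise ValueError("total duration must be between 3 and 120 seconds")
    # Fewest clips that can cover the duration at 10 seconds per clip.
    count = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    base, extra = divmod(total_seconds, count)
    # Spread the remainder so clip lengths differ by at most one second.
    return [base + 1 if i < extra else base for i in range(count)]
```

For example, `plan_clips(25)` returns three clip lengths that add up to 25 seconds, and anything longer than 120 seconds is rejected before you start generating.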

Ready to Create Stunning Videos with Kling O1?

Experience the power of unified multi-modal video generation.