Kling 1.6 Model through Kling API

PiAPI

December 20, 2024

Hi developers!

On December 19th, 2024, Kuaishou, the team behind Kling AI, released its new model, Kling 1.6, and announced it in a post on X. Here at PiAPI, the Kling 1.6 API is already available to our users!

A screenshot of the official announcement on X about Kling 1.6

With the release of Kling 1.6, Kuaishou claims that the new model offers improved prompt adherence, more consistent and dynamic results, and a 195% overall improvement rate compared with the Kling 1.5 model. Since Kling API was, even before the update, already among the top AI video generation tools alongside Pika API and Luma Labs Dream Machine API, these improvements could be a major update for img2vid prompting in AI movie creation.

However, at PiAPI, we ran our own comparisons of versions 1.5 and 1.6 for both text-to-video and image-to-video. The evaluation frameworks we used are described below, followed by the results.

Evaluation framework

Text-to-video

For the text-to-video comparisons between Kling 1.6 API and Kling 1.5 API, we use the comprehensive text-to-video evaluation framework from Labelbox, plus "text adherence". This is the same framework we used in our previous blog comparing Luma Dream Machine 1.0 vs 1.5, namely:

• Prompt Adherence
• Text Adherence
• Video Realism
• Artifacts

Image-to-video

For the image-to-video comparisons between Kling 1.6 API and Kling 1.5 API, we use AIGCBench (Artificial Intelligence Generated Content Bench). This is the same framework we used in our previous blog about using the Kling Motion Brush through Kling API, namely:

• Control-Video Alignment (Prompt adherence)
• Motion Effects
• Temporal Consistency
• Video Quality
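To keep reviews consistent across examples, the two criteria lists above can be written down as a small checklist. The following is a minimal sketch only: the criteria names come from the lists above, while the `Review` class, its fields, and its methods are hypothetical helpers invented for illustration and are not part of Labelbox, AIGCBench, or PiAPI.

```python
from dataclasses import dataclass, field

# Criteria names are taken from the two lists above.
TEXT_TO_VIDEO_CRITERIA = (
    "Prompt Adherence",
    "Text Adherence",
    "Video Realism",
    "Artifacts",
)
IMAGE_TO_VIDEO_CRITERIA = (
    "Control-Video Alignment",
    "Motion Effects",
    "Temporal Consistency",
    "Video Quality",
)


@dataclass
class Review:
    """Hypothetical record of free-text notes for one generated video."""

    model: str
    criteria: tuple
    notes: dict = field(default_factory=dict)

    def add_note(self, criterion, text):
        # Reject criteria that do not belong to the chosen framework.
        if criterion not in self.criteria:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes[criterion] = text

    def missing(self):
        # Criteria that have not been reviewed yet, in framework order.
        return [c for c in self.criteria if c not in self.notes]


review = Review("Kling 1.6", TEXT_TO_VIDEO_CRITERIA)
review.add_note("Artifacts", "no noticeable artifacts")
print(review.missing())
```

A checklist like this makes it harder to skip a criterion when reviewing one video but not the other.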

Text-to-video Comparison

For the text-to-video comparisons between Kling 1.6 API and Kling 1.5 API, we simply feed the exact same prompts into both models and then compare the resulting videos side by side.
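The idea of holding everything constant except the model version can be sketched as below. This is an illustration under stated assumptions, not the actual Kling API request schema: the field names (`model`, `version`, `task_type`, `prompt`) and the function `build_paired_requests` are hypothetical placeholders, and no network call is made.

```python
import json


def build_paired_requests(prompt, versions=("1.5", "1.6")):
    """Return one request body per version, identical except for the version.

    All field names here are hypothetical placeholders, not a documented schema.
    """
    return [
        {
            "model": "kling",
            "version": version,
            "task_type": "text_to_video",
            "prompt": prompt,
        }
        for version in versions
    ]


prompt = (
    "The camera rotates around a large, decorated Christmas tree as snow "
    "falls gently. It slowly zooms in on the glowing golden star at the top "
    "of the Christmas tree."
)
for body in build_paired_requests(prompt):
    print(json.dumps(body, indent=2))
```

Generating both request bodies from a single prompt string avoids accidental differences, such as a stray edit to one copy of the prompt, that would make the side-by-side comparison unfair.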

Example 1

A GIF of a Christmas tree generated by Kling API in both versions 1.5 and 1.6 — Prompt: "The camera rotates around a large, decorated Christmas tree as snow falls gently. It slowly zooms in on the glowing golden star at the top of the Christmas tree."

Both videos have some issues with adhering to the prompt. In the video generated by the Kling 1.5 API, the golden star is not placed on top of the Christmas tree. Meanwhile, in the video generated by the Kling 1.6 API, the camera fails to rotate around the tree as expected.

While both videos are quite realistic, the Kling 1.6 video loses some points due to the snow falling indoors, which doesn't follow real-world physics. Apart from that, neither video exhibits any noticeable artifacts.

Overall, for this particular example, Kling 1.5 performs slightly better than Kling 1.6.

Example 2

A GIF of a woman running on a track field holding a bottle of water generated by Kling API in both versions 1.5 and 1.6 — Prompt: "A woman is running on a track field, holding a bottle of water and drinking from it. She is wearing a shirt with the word 'NFL' clearly printed on it"

Both videos have a woman running on a track field and holding a bottle of water, but Kling 1.6 API does not have the woman drinking from the water bottle, whereas Kling 1.5 follows the prompt by including this detail. Both models exhibit poor text adherence, as the words on the woman's shirt in both videos do not resemble "NFL" in any way. Despite this, both videos are realistic and free of visible artifacts.

Overall, Kling AI 1.5 API performs better in adhering to the prompt compared to Kling AI 1.6 API.

Example 3

A GIF of a cartoon bird with the flu sneezing into a tissue, with a billboard behind it generated by Kling API in both versions 1.5 and 1.6 — Prompt: "A cartoon-style bird, visibly sick with the flu, sneezing into a tissue. Behind it, a billboard reads 'Bird Flu Medicine' in bold, clear text."

Both videos have a cartoon-style bird sneezing into something, but they differ in how they follow the prompt. Kling 1.6 adheres more closely to the prompt by having the bird sneeze into a tissue, while Kling 1.5 shows the bird sneezing into a pink towel. As for text adherence, neither video follows the prompt, as neither displays the words "Bird Flu Medicine" on the billboard behind the bird.

In terms of animation style, both videos look good within their respective cartoon animation styles. However, there are some visible artifacts in each. In the Kling 1.6 AI API video, the hand holding the tissue appears too humanoid, resembling a human hand rather than a bird wing. Additionally, the fingers seem to pass through the tissue. In the Kling 1.5 API video, a flickering artifact briefly appears in the bottom left corner of the screen during the middle of the video.

Overall, both videos are similar in quality; with a little bit more improvement, Kling 1.6 API could even be a good animated movie generator!

Image-to-video Comparison

For the image-to-video comparisons between Kling 1.6 API and Kling 1.5 API, we will first generate an image using Midjourney API. This image, along with an identical prompt, will then be used as input for both models.

Example 1

Below is the image generated using Midjourney API, which was then used as input for Kling API in this example.

An image of Superman flying in the sky generated by Midjourney — Prompt: "A hyper-realistic side view of Superman flying through a clear blue sky, his red cape flowing dramatically behind him, arms fully extended forward in a classic flying pose."

Now that we have an image generated by Midjourney API, we feed that image, together with a new prompt, into both Kling 1.5 API and Kling 1.6 API. The generated videos are shown below.

A GIF of superman flying in the sky generated by Kling API in both versions 1.5 and 1.6 — Prompt: "Superman is flying through the sky at high speed. The camera follows him, smoothly tracking his movement. As he continues flying, the camera gradually zooms in, focusing on his face until it fills the frame."

The videos generated by both Kling 1.5 API and Kling 1.6 API closely follow the prompt, having Superman flying with the camera zooming in on his face. Both videos exhibit a similar level of dynamism, particularly with the capes flowing in the background, and maintain consistent motion throughout. There are no noticeable artifacts in either video.

Overall, the quality of both videos is similar.

Example 2

Below is the image generated using Midjourney API, which was then used as input for Kling AI API in this example.

An image of a cat in Steve Harvey's hands generated by Midjourney — Prompt: "Steve Harvey standing behind the Family Feud podium, holding a fluffy white cat in his arms. The Family Feud logo is prominently displayed on the front of the podium."

Now that we have an image generated by Midjourney API, we feed that image, together with a new prompt, into both Kling 1.5 AI API and Kling 1.6 AI API. The generated videos are shown below.

A GIF of a cat jumping out of Steve Harvey's hands generated by Kling API in both versions 1.5 and 1.6 — Prompt: "A man is holding a cat in his hands. Suddenly, the cat leaps out of his hands, jumping toward the camera. The man looks shocked and surprised as the cat jumps away from him, moving quickly toward the viewer."

Kling 1.6 AI API adheres more closely to the prompt than Kling 1.5 AI API. In the video generated by Kling 1.6, the cat jumps directly toward the camera, while in the video created by Kling 1.5, the cat jumps to the left. Both videos are dynamic, but the Kling 1.6 video is a lot more realistic: the expression on Steve Harvey's face, his movements, and the cat's movements all look more natural. Additionally, the Kling 1.6 video is free of visible artifacts, unlike the Kling 1.5 video, where a noticeable artifact appears after the cat jumps from Steve Harvey's hands: a small, cat-like creature that seems to appear out of nowhere, which is clearly unrealistic.

Overall, the video produced by Kling 1.6 is far better in terms of quality when compared to the one generated by Kling 1.5.

Example 3

Below is the image generated using Midjourney API, which was then used as input for Kling API in this example.

An image of an anime-style picture of a girl fishing generated by Midjourney — Prompt: "Close-up of an anime-style girl with a focused expression, standing on a wooden dock over a calm lake, holding a fishing rod. The serene water and soft, scenic background fade out, focusing on her face and fishing action."

Now that we have an image generated by Midjourney API, which looks like it came straight out of an anime art AI generator library, we feed that image, together with a new prompt, into both Kling 1.5 API and Kling 1.6 API. The generated videos are shown below.

A GIF of an anime-style scene of a girl fishing generated by Kling API in both versions 1.5 and 1.6 — Prompt: "Anime-style video, A girl uses all her strength to reel in a big fish from the lake, then pumps her fist in the air in triumph after catching it."

Neither video fully adheres to the prompt. In the video generated by Kling 1.5 API, the girl fails to pump her fist in the air after catching the fish. Meanwhile, in the Kling 1.6 API video, there is no fish at all.

Both videos feature dynamic motion effects. The lake's water in the background moves realistically, and the girl moves dynamically in both clips. However, the animation in the Kling 1.6 video appears much smoother overall. In addition, the Kling 1.5 video has a very visible artifact: the girl's face undergoes unnatural morphing.

Overall, Kling 1.6 is slightly better than Kling 1.5 for this example.

Conclusion

Based on the six examples above, there are minimal differences between Kling 1.5 API and Kling 1.6 API for text-to-video generation. In fact, considering the three text-to-video examples we've analyzed, you could even argue that Kling 1.5 outperforms Kling 1.6 in this area.

However, when it comes to image-to-video generation, the Kling 1.6 API shows significant improvements, particularly in movement quality and in how well the model follows prompts related to movement. That said, we still don't believe that it is the 195% improvement over Kling 1.5 that Kuaishou claimed, but these improved aspects would make the tool very valuable for workflows such as creating a motion meme.
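For readers who want the per-example verdicts at a glance, they can be tallied with a few lines of Python. This is only a sketch: the verdicts are transcribed from the "Overall" line of each example above, with "similar in quality" recorded as a tie, and the `tally` helper is our own illustrative name. Three examples per task is far too few to draw statistical conclusions.

```python
from collections import Counter

# (task, example number, verdict) transcribed from each example's "Overall" line.
VERDICTS = [
    ("text-to-video", 1, "1.5"),
    ("text-to-video", 2, "1.5"),
    ("text-to-video", 3, "tie"),
    ("image-to-video", 1, "tie"),
    ("image-to-video", 2, "1.6"),
    ("image-to-video", 3, "1.6"),
]


def tally(verdicts):
    """Count verdicts per task; returns {task: Counter({verdict: count})}."""
    counts = {}
    for task, _example, winner in verdicts:
        counts.setdefault(task, Counter())[winner] += 1
    return counts


for task, counter in tally(VERDICTS).items():
    print(task, dict(counter))
```

The tally matches the summary above: Kling 1.5 comes out ahead in two of the three text-to-video examples, while Kling 1.6 comes out ahead in two of the three image-to-video examples, with one tie in each task.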

With this major leap, we can see that Kling 1.6 is on par with, or even exceeds, other tools on the market, such as the popular Sora video generator. It turns an image into a video very well, it can be a tool for artwork creation (e.g. if developers want to create an AI frame generator), and it can even be a great tool for lip sync AI.

We hope that you found our comparison useful! If you are interested, check out our other generative AI APIs from PiAPI!
