Kling 1.5 vs 1.0 - A Comparison through Kling API

A logo of PiAPI
PiAPI

The wait is finally over!

On September 19th 2024, Kling, the team behind one of the most advanced text/image-to-video generative AI models currently on the market, has just released its new model, Kling 1.5, making the announcement on their official website.

Kling 1.5 Release

A screenshot of Kling's announcement of Kling 1.5's release on their official website
Official announcement of Kling 1.5's release on their official website

With the new 1.5 version update, Kling promises improvements in image quality, dynamic quality, and prompt relevance. Kling has also added a new motion brush feature, where you can precisely define the movement of any element in your image, giving you unparalleled control over the motion and performance of your videos.

Their tests suggest a 95% performance increase, but we’ll be conducting our own comparison to confirm. Given PiAPI's position as the leading provider of generative APIs, including the Kling API, we are committed to extensively testing our products before launch and maintaining that quality through continuous testing post-release.

Evaluation Framework

Regarding the evaluation framework used for this comparison, we have taken the comprehensive text-to-video evaluation framework from Labelbox, while adding "Text Adherence" into the mix since judging from our user feedback, text adherence is an important aspect of generative video for it to become more prevalent as a productivity tool than something just used for entertainment. This is the same framework we had used in our previous blog comparing Luma Dream Machines 1.0 vs 1.5, namely:

  1. Prompt Adherence
  2. Text Adherence
  3. Video Realism
  4. Artifacts


For what each of these categories mean and how they could be rated, feel free to read our previous blog for more detail.

Kling 1.0 and Kling 1.5 Comparison

And now, let's see how the original Kling 1.0 model fairs against the new Kling 1.5 model.

Example 1: MLB Detroit Tigers Player Hitting a Baseball with a Baseball Bat

A GIF of a comparison of two different videos of an MLB baseball player hitting a baseball with a baseball bat, with the words "detroit tigers" written on his shirt, one generated by Kling 1.0, while the other is generated by Kling 1.5
Prompt: "An MLB baseball player hitting a baseball (ball) with a baseball bat, with the words "Detroit Tigers" writen on his shirt"

Prompt Adherence

The video generated by Kling 1.0 includes most of the elements from the prompt: the MLB player, the baseball bat, and the baseball, but it falls short of capturing the core action: the baseball player hitting the ball with his bat. Meanwhile, the video generated by Kling 1.5 includes all the elements, including the one missing from Kling 1.0, though the motion appears somewhat unnatural.

Text Adherence

Both outputs have low text adherence, as neither video has the words "Detroit Tigers" or anything close to it in text

Video Realism

Both videos show impressive realism, with shadows beneath the MLB players' caps adjusting as their heads moves. However, the video generated by Kling 1.5 stands out for its higher definition.

Artifacts

Both outputs display noticeable artifacts. In the video generated by Kling 1.0, a baseball player holds a ball at first, but it quickly morphs into a bat. While in Kling 1.5's generated video, the baseball’s movement toward the player and the bat’s contact with it feel noticeably unnatural.

The final artifact is present in both videos, although the prompt specified players wearing "Detroit Tigers" shirts, the players are wearing caps featuring the New York Yankees logo. We believe this error is present because the Yankees, as baseball’s most popular team, were the most frequently used in Kling’s training data for the term "baseball".

Overall, we see only a slight improvement from the video generated by Kling 1.5.

Example 2: AI Cat Playing a Red Electric Guitar in the Forest

A GIF of a comparison of two different videos of a an ai cat playing a red electric guitar in the forest, one generated by Kling 1.0, while the other is generated by Kling 1.5
prompt: "A cat playing a red electric guitar in the forest"

Prompt Adherence

Both videos display all the elements described in the prompt: a cat playing a red electric guitar, and a forest surrounding it.

Video Realism

The level of realism displayed in video generated by Kling 1.5 is very high due to the cat's dynamic movements, especially the movement of its' hands and head. The forest's reflection on his electric guitar is also of higher definition.

Artifacts

The video generated by Kling 1.0 contains a significant error. The left arm, intended to be the cat's paw, bears more of a resemblance to that of a human hand, with peach skin clearly visible. Meanwhile, in the video generated by Kling 1.5, the cat’s left hand has human-like fingers, but the black fur conceals this detail, making it far less noticeable than the skin-toned features seen in the other video.


Overall, we see a moderate improvement from the video generated by Kling 1.5.

Example 3: A Cartoon Monkey and Cartoon Dog Hugging Each Other

A GIF of a comparison of two different videos containing a cartoon monkey and a cartoon dog hugging beneath a large tree, one generated by Kling 1.0 and the other generated by Kling 1.5
Prompt: "A cartoon monkey and a cartoon dog hugging beneath a large tree"

Prompt Adherence

Both videos display the large tree. But only the video generated by Kling 1.5 displays both the cartoon monkey and the cartoon dog hugging. However, the video generated by Kling 1.0 shows two animal bodies embracing but with just one head, which resembles a monkey’s. Both bodies in the 1.0 video have monkey-like features, such as white hands with fingers, with no sign of dog-like paws.

Video Realism

Both versions show the core elements of the cartoon style, though the animation styles are different. This is likely caused by the lack of specificity in the prompt. But one thing of note is that the video generated by Kling 1.5 is more dynamic, as evident in the monkey's changing expressions and the dog's wagging tail.

Artifacts

The video from Kling 1.0 was riddled with artifacts: the unnatural distortion in both the characters, an absent dog, and vanishing hands. While in the video made by Kling 1.5 showed no such flaws.


For this prompt, it's evident that the video output produced by Kling 1.5 marks a significant improvement over its predecessor.

Example 4: Earth with a Moon and a Mini Moon

A GIF of a comparison of two different videos of a moon and a mini moon rotating around the earth, one generated by Kling 1.0, and the other generated by Kling 1.5
Prompt: "The earth, with 2 moons rotating around it"

Prompt Adherence

Both versions captured the prompt descriptions. Although the version generated by Kling 1.5 has a more accurate depiction of the Earth and its moons.

Video Realism

Both videos display an impressive amount of realism, but the version produced by Kling 1.5 provides a more convincing portrayal. The video generated by Kling 1.5 shows more dynamic camera movements and shifting shadows, adding a heightened sense of realism, especially as the Earth's shadow slowly covers the second moon.

Artifacts

Kling 1.0's video output presents a few noticeable artifacts: Earth's landmasses are fully white, and the moons having mismatched colors, with one being orange and the second moon having differently colored terrain. Meanwhile, the output from Kling 1.5 has no artifacts. In fact, it even has a shockingly accurate depiction of Earth's geography!


Overall, we see a general improvement in Kling 1.5's video output.

Example 5: Messi, Wearing a Shirt with the words "Champions League", Kicking a Soccer Ball into a Goal Post

A GIF of a comparison of two different videos of Lionel Messi, wearing a shirt with the words "Champions League", kicking a soccer ball into a goal psot
Prompt: "Messi kicking a soccer ball into a goal post, with the words "Champions League" written on his shirt"

Prompt Adherence

Both outputs show poor prompt adherence, but Kling 1.5 performs slightly better than Kling 1.0. While both models generate an image of a soccer player, neither one resembles Lionel Mess. Kling 1.0's output doesn't have a soccer ball, a goal post, or the visible action of Messi kicking the ball into a goal post. On the other hand, Kling 1.5's output does show a goalpost and a kicking motion, but the soccer ball itself is still missing

Text Adherence

Both videos display low text adherence, with neither video having the words "Champions League" or anything resembling the writing on their shirts.

Video Realism

Though both videos are realistic, Kling 1.5 outperforms its predecessor by a long shot. This is because Kling 1.0 merely produced a zoom-out of a static image, while Kling 1.5 renders Messi kicking a ball, with his hair swaying and his shirt subtly shifting with the motion.

Artifacts

The background of Kling 1.0's output is heavily blurred, especially when compared to the output of Kling 1.5

Conclusion

Based on the various examples provided above, it is clear that Kling 1.5 surpasses its predecessor, Kling 1.0 in terms of overall quality, prompt adherence, and video realism. Despite this, both models fall short in the text adherence category, frequently producing incoherent and inaccurate text.

We hope you found this comparison blog useful! If interested, please also check out our other generative AI APIs from PiAPI!


More Stories