Kling Elements through Kling API

PiAPI

Hi Developers!


On January 21st, 2025, the team behind Kling AI released Kling Elements, a major update for Kling, and announced it in a post on X. Here at PiAPI, Kling Elements is already available to our users through our Kling API!

Official announcement on X about Kling Elements

So how exactly does Kling Elements improve on the current image-to-video AI generation model?

Kling Elements allows users to upload up to 4 images as elements (such as people, animals, objects, or scenes) and describe their actions and interactions within the prompt. Kling API then generates a video based on the elements (the images used as input) and the prompt, keeping the style and appearance of the video consistent. This feature is especially useful for character and object design, since characters and objects stay faithful to the uploaded images in the generated video.
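To make the input side of this concrete, here is a minimal sketch of how a request for an Elements generation could be assembled and checked before sending. The exact request schema is not shown in this post, so the function `build_elements_request`, the field names (`model`, `task_type`, `input`, `elements`, `image_url`), the sample prompt, and the example URLs are all assumptions for illustration. Only the limit of 4 images comes from the description above.

```python
import json

MAX_ELEMENTS = 4  # the post states that up to 4 images can be supplied as elements


def build_elements_request(prompt, image_urls):
    """Assemble a hypothetical request body for a Kling Elements generation."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= len(image_urls) <= MAX_ELEMENTS:
        raise ValueError(
            f"expected 1 to {MAX_ELEMENTS} element images, got {len(image_urls)}"
        )
    # Field names are illustrative placeholders, not the documented schema.
    return {
        "model": "kling",
        "task_type": "video_generation",
        "input": {
            "prompt": prompt,
            "elements": [{"image_url": url} for url in image_urls],
        },
    }


# Placeholder prompt and URLs, used only to show the shape of the payload.
payload = build_elements_request(
    "A person walking a dog through a park",
    [
        "https://example.com/person.png",
        "https://example.com/dog.png",
        "https://example.com/park.png",
    ],
)
print(json.dumps(payload, indent=2))
```

Validating the element count on the client side is a design choice rather than a requirement: it simply surfaces an over-limit input before any request is made.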

Evaluation Framework

For the evaluation framework in this comparison, we use AIGCBench (Artificial Intelligence Generated Content Bench). Although this framework is designed for automated evaluation, we've adapted it into a manual process for human evaluation. This is the same framework we used in our previous blog about Kling Motion Brush, and it covers four categories:

  1. Control-Video Alignment (prompt adherence)
  2. Motion Effects
  3. Temporal Consistency
  4. Video Quality

For what each of these categories means and how they could be rated, feel free to read our previous blog for more detail.
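As a rough illustration of the manual process, the sketch below records a reviewer's notes per category and reports which categories still need a look. `ManualReview` and its methods are hypothetical helpers written for this post, not part of AIGCBench or Kling API, and the sample note is a placeholder.

```python
from dataclasses import dataclass, field

# The four categories used throughout this post.
CATEGORIES = (
    "Control-Video Alignment",
    "Motion Effects",
    "Temporal Consistency",
    "Video Quality",
)


@dataclass
class ManualReview:
    """Free-form reviewer notes for one generated video, grouped by category."""

    example: str
    notes: dict = field(default_factory=dict)

    def add_note(self, category, text):
        # Reject anything outside the four agreed categories.
        if category not in CATEGORIES:
            raise KeyError(f"unknown category: {category}")
        self.notes.setdefault(category, []).append(text)

    def missing(self):
        """Return the categories that have no notes yet, in rubric order."""
        return [c for c in CATEGORIES if c not in self.notes]


review = ManualReview("Sample clip")
review.add_note("Video Quality", "Face slightly blurred in the opening frames")
print(review.missing())
```

Keeping the notes as free text rather than numeric scores mirrors how the examples below are written up.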

Testing Kling Elements through Kling API

For the upcoming tests, we will first present the images that will be used as input into Kling API using the Kling Elements feature. Afterward, we’ll showcase the video generated, along with the corresponding prompt.

Example 1: Drake vs Kendrick

For our first example, we’ll focus on Drake and Kendrick Lamar, especially since Kendrick recently performed at the Super Bowl. Given that the topic of Drake vs Kendrick has been the talk of the town lately, we thought this would be an interesting choice.

Below is an image of the 4 Kling elements we are going to use as input into Kling API, alongside a description of each element:

Element 1: A picture of Kendrick Lamar

Element 2: A picture of Drake

Element 3: A picture of the Super Bowl stadium

Element 4: A picture of the Grammy award

The 4 Kling elements that are going to be used as input into Kling API for the Drake vs Kendrick example

And here is the output video, along with the prompt shown in the description.

A GIF of Kendrick Lamar and Drake fighting over a Grammy award generated by Kling API using Kling elements
Prompt: "Kendrick Lamar and Drake in the super bowl stadium fighting over a Grammy award"

Control-Video Alignment

For control-video alignment, the result closely follows both the prompt and the provided images, as both Kendrick Lamar and Drake have the same facial features and are wearing the exact clothes from the images used as input. However, it is far from perfect. One noticeable issue is the chain that Kendrick Lamar is wearing, which doesn’t resemble a lowercase "a" as seen in the image, but rather looks more like the number "8". Another major issue is the Super Bowl stadium, which appears more like a concert venue, as there is no visible football field.

Motion Effects

In terms of motion effects, the movements of both Kendrick Lamar and Drake appear quite realistic and dynamic. Kendrick turns towards Drake at the beginning, and Drake’s hands move noticeably as he shifts back and forth.

Temporal Consistency

For temporal consistency, there are two noticeable artifacts. The first is that the person standing behind Kendrick Lamar appears as a garbled, humanoid mess, making it unclear what it is supposed to be. Our best guess is that Kling was attempting to create a mascot character, possibly due to the mention of a "Super Bowl stadium" in the prompt. The second, less noticeable artifact occurs midway through the video, when Kendrick Lamar raises his left hand for the first time: he seems to be holding what looks like a microphone, but it suddenly disappears.

Video Quality

For video quality, there is a noticeable issue at the beginning of the video, where Kendrick Lamar's face and hands are heavily blurred before returning to normal later on.


Overall, even though the output for this example is very impressive for character consistency, there are still quite a few errors and artifacts present in the video.

Example 2: Duolingo Owl Death and Revival

For our second example, we’ll attempt to revive the Duolingo Owl, following the recent announcement of the Duolingo Owl's death on their official Twitter account. Using the Kling API, we’ll try to bring the owl back to life.

Below is an image of the two Kling elements we are going to use as input into Kling API, alongside a description of each element:

Element 1: A picture of the Duolingo Owl

Element 2: A picture of a cartoon graveyard

The 2 Kling elements that are going to be used as input into Kling API for the Duolingo Owl Death and Revival example

And below is the output video, alongside the prompt shown in the description.

GIF of the duolingo owl flying towards the viewer in a cartoon graveyard generated by Kling API using Kling elements
Prompt: "A cartoon-style green Duolingo owl rising from a grave in a graveyard like a zombie, first sticking a green wing out of the grave before fully popping out"

Control-Video Alignment

For control-video alignment, both the Duolingo owl and the cartoon graveyard are generated very impressively in the video. However, it doesn’t fully follow the prompt. The Duolingo owl is positioned atop the grave from the start, instead of rising like a zombie and sticking its green wing out of the grave as prompted.

Motion Effects

In terms of motion effects, the video is pretty dynamic, with the camera slowly zooming in and the Duolingo owl flapping its wings slightly before flying towards the viewer. However, the dust effects during its jump appear too realistic and don't match the cartoonish style of the Duolingo owl and the graveyard background.

Temporal Consistency

For this example, there appear to be no issues with temporal consistency.

Video Quality

Throughout the video, there are moments where the Duolingo Owl's eyes appear slightly blurred; aside from that, there are no issues.

Overall, this is similar to the last example: the character and background consistency is very impressive, but there are some minor quality issues.

Example 3: Donald Trump wearing a Philadelphia Eagles jersey

For our third example, we’ll have Donald Trump wearing a Philadelphia Eagles jersey, celebrating the Eagles' Super Bowl win in 2025. Given the Super Bowl season, or "Super Bowl time" as some would call it, we thought this would be a fun and fitting example.

Below is an image of the 4 Kling elements we are going to use as input into Kling API, alongside a description of each element:

Element 1: A picture of a Philadelphia Eagles jersey

Element 2: A picture of Donald Trump

Element 3: A picture of a bald eagle

Element 4: A picture of the Super Bowl stadium

The 4 Kling elements that are going to be used as input into Kling API for the Donald Trump wearing a Philadelphia Eagles jersey example

And below is the output video, alongside the prompt shown in the description.

GIF of Donald Trump wearing a green philadelphia eagles jersey with an eagle perched on his shoulder, gazing down at a stadium filled with cheering fans generated by Kling API using Kling elements
Prompt: "Donald Trump wearing a green Philadelphia Eagles jersey, with a bald eagle perched on his shoulder, gazing down at a Super Bowl stadium filled with cheering fans"

Control-Video Alignment

This example appears to have followed the prompt and input images very closely. The only issue is with Trump’s Philadelphia Eagles jersey: the text on it is not reproduced accurately, and the back of his shirt differs from the provided image.

Motion Effects

The motion effects are dynamic, with the bald eagle settling on Trump’s shoulder at the start of the video, followed by the camera circling from in front of him to behind.

Temporal Consistency

There is one minor artifact, though it's well-hidden. In the middle of the video, when showing Trump's side view, a man holding a camera appears behind him, but the camera has two lenses, which looks unnatural.

Video Quality

There are two issues with video quality: first, the eagle’s face is slightly blurred throughout, and second, the surrounding crowd is heavily blurred.


Overall, this is the best example of the Kling Elements feature among all the examples, and it’s very impressive in terms of what it generated.

Example 4: Captain America

For our fourth example, we’ll feature Chris Evans as Captain America battling the Red Hulk, given the recent buzz surrounding Captain America 4. Below is an image of the 4 Kling elements we are going to use as input into Kling API, alongside a description of each element:

Element 1: A picture of Chris Evans as Captain America

Element 2: A picture of a LEGO Captain America Shield

Element 3: A picture of the Red Hulk

Element 4: A picture of an asteroid

The 4 Kling elements that are going to be used as input into Kling API for the Captain America example

And below is the output video, alongside the prompt shown in the description.

GIF of captain america in space with a lego captain america shield, with an asteroid behind him generated by Kling API using Kling elements
Prompt: "Chris Evans Captain America, wielding a LEGO Captain America shield, battling the Red Hulk on a moving asteroid"

Control-Video Alignment

This example seems to be missing one element entirely, specifically element 3, the Red Hulk, which is not featured in the video. Additionally, it doesn't follow the prompt accurately, as Captain America is not fighting the Red Hulk. However, aside from that, it’s still quite impressive. The rest of the prompt and input images are followed closely, and if you look carefully, you can see that Captain America's shield is made of LEGO, exactly as instructed, and matches the LEGO Captain America shield from the provided image.

Motion Effects

In terms of motion effects, the video is quite dynamic, with the camera zooming in on Chris Evans as Captain America, while the asteroid flies in the background. By the end of the video, it appears that Captain America is preparing for battle.

Temporal Consistency

Regarding temporal consistency, everything seems to be in order, with no visible artifacts.

Video Quality

The video quality for this example is quite good, with minimal blurring.


Overall, the character consistency is quite good in this example, but it missed one element, which is a significant issue.

Example 5: Anime girl with a black cat

For our final example, we’ll test how well the Kling Elements feature works with the anime art style.

Below is an image of the 4 Kling elements we are going to use as input into Kling API, alongside a description of each element:

Element 1: An image of an AI generated goth anime girl

Element 2: An image of an anime black cat

Element 3: An image of a baby wearing a Christmas hat

Element 4: An image of an anime background depicting a rainy day in a traditional Japanese town

The 4 Kling elements that are going to be used as input into Kling API for the Anime girl example

And below is the output video, alongside the prompt shown in the description.

GIF of an AI-generated goth anime girl petting a cat wearing a Christmas hat and a red scarf, with a rainy traditional Japanese town background, generated by Kling API using Kling elements
Prompt: "Anime style, a goth girl petting an anime black cat wearing a Christmas hat in her lap, with everything— including the cat and the rainy traditional Japanese town— portrayed in 2D anime style."

Control-Video Alignment

For control-video alignment, the black cat is shown in a realistic style instead of the anime art style specified in the prompt. Aside from that, everything else is quite impressive. The AI-generated goth anime girl matches the provided image exactly, as do the background and the Christmas hat. Even though the cat is rendered realistically rather than in anime style, it still retains the red scarf seen in the image used as input.

Motion Effects

The video is dynamic, with the AI-generated goth anime girl's hand petting the cat and the cat's head moving in sync with her hand. However, the Christmas hat appears to be stuck to the girl’s hand rather than sitting on the cat; under real-world physics, it would have fallen off.

Temporal Consistency

There are two noticeable artifacts in the video. The first is the Christmas hat, which is on the AI-generated goth anime girl's hand instead of the cat, as mentioned under motion effects. The second is that the girl's hand, the one petting the cat, appears more realistic than anime style, which looks out of place.

Video Quality

The only issue with video quality in this example is that the black cat's face appears slightly blurred throughout the video.

Overall, this example is quite impressive, as the video maintains strong consistency with the images, despite a few issues.

Conclusion

Based on the five examples provided, the strengths of the Kling Elements feature are clear. Its character consistency is particularly impressive, with characters appearing exactly as they do in the images or as described in the prompt, including their clothing. The characters' faces also closely resemble those in the provided images. However, there are still some weaknesses. One is the blurring of animal faces: in all examples with animals, their faces were either slightly or heavily blurred throughout the video. Another is the artifacts, which, to be fair, are present in most AI-generated videos. A further big issue is that it doesn't always follow the prompt exactly, as seen in example 2, and it missed an element entirely in example 4.

In conclusion, the Kling Elements feature is still very impressive, and we look forward to Kuaishou further developing Kling and ironing out the kinks.

Also, if you are interested in other AI APIs that PiAPI provides, feel free to check them out!

