New AI turns voice into video from photos

Researchers at Alibaba Group's Institute for Intelligent Computing have presented a new artificial intelligence (AI) system that turns a photo of a person's face into an animated video in which the person appears to talk or sing.

The technology, called Emote Portrait Alive (EMO), combines a static image with audio of a person speaking or singing. One of the videos released with the presentation shows the Mona Lisa, the famous painting by Leonardo da Vinci, “speaking”.

AI combines photos and audio to create animated videos

  • Although the system is new, earlier research had already shown that photos of faces can be processed into semi-animated images;
  • However, the Alibaba team went further by adding sound;
  • Furthermore, they did so without using 3D models or facial references;
  • Instead, the researchers used a diffusion model trained on large datasets of audio and video files;
  • Around 250 hours of data were used to create EMO.

According to TechXplore, by automatically converting audio waves into video frames, the researchers created a tool that captures subtle human gestures, speech quirks, and other characteristics that identify an animated image of a face as human.

The videos recreate the likely mouth shapes and movements used to form words and sentences, along with the facial expressions typically associated with them.

On the team's GitHub, there are several other videos exemplifying the success of the tool. There, they also claimed that EMO surpasses other applications in terms of realism and expressiveness.

The team further noted that the length of the final video depends on the length of the audio track supplied to the tool. The demonstration videos show the original image side by side with the animated face, which speaks or sings in the voice recorded on the audio track.

They stress, however, that the use of EMO will need to be restricted or monitored to prevent its unethical use.

The group published the results of its tool and more details of its development on the preprint server arXiv.
