Google can use AI to isolate voices in a crowd – with impressive results

  • Google researchers have devised a deep learning model that can isolate individual voices in a video.
  • The model aims to replicate the ability of humans to isolate certain sounds.
  • The researchers hope that the tech will have a number of uses including improving hearing aids and automatic subtitles.

Google researchers have come up with a way of isolating the voice of a single speaker in a video from other voices and background noise. The method uses a deep learning model that can computationally produce videos in which the speech of specific people is enhanced.

It uses both the audio and visual signals of the speaker, such as the movement of the mouth, to replicate the ability of humans to effectively focus on one sound. This is a phenomenon also known as the cocktail party effect.

In a blog post, Google explains that in order to develop the method, the researchers gathered a collection of 100,000 high-quality videos of lectures and talks from YouTube. From these, they extracted around 2,000 hours of clips, each featuring a single person talking to the camera without any background interference.

Using these clips, Google then created what it calls “synthetic cocktail parties”: mixtures of face videos and their corresponding speech, taken from separate video sources, combined with non-speech background noise. It then trained the model to split these cocktail parties into a separate audio stream for each speaker in the video.
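
The mixing step can be sketched in a few lines of Python. This is only an illustration of the general idea, not Google's code: the function names `synth_tone` and `make_synthetic_mixture`, the sine waves standing in for clean speech, the 16 kHz sample rate, and the `noise_gain` value are all assumptions made for the example. The point is that because the mixture is built from known clean tracks, those tracks can serve as the targets the model learns to recover.

```python
import math
import random


def synth_tone(freq_hz, n_samples, sample_rate=16000):
    """Hypothetical stand-in for a clean speech track: a plain sine wave."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


def make_synthetic_mixture(clean_tracks, noise, noise_gain=0.3):
    """Sum clean tracks and scaled background noise, sample by sample.

    Returns (mixture, targets). The mixture would be the model input and
    the untouched clean tracks would be the per-speaker training targets.
    """
    # Trim everything to the shortest signal so the samples line up.
    n = min([len(noise)] + [len(track) for track in clean_tracks])
    mixture = [
        sum(track[i] for track in clean_tracks) + noise_gain * noise[i]
        for i in range(n)
    ]
    targets = [track[:n] for track in clean_tracks]
    return mixture, targets


# Two "speakers" plus random background noise (seeded for repeatability).
rng = random.Random(0)
speaker_a = synth_tone(220.0, 1600)
speaker_b = synth_tone(330.0, 1600)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

mixture, targets = make_synthetic_mixture([speaker_a, speaker_b], noise)
```

In the real system the inputs would be recorded speech and the matching face video rather than tones, and the model would be trained to map each mixture back to its targets.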

The post claims that users of the model simply have to select the face of the person in the video that they want to hear.

The results, shown in demo videos on the blog, are pretty impressive.

A sports debate that is almost unintelligible due to the participants shouting over each other becomes crystal clear after the voices of each speaker are separated. In another video, the tech is able to isolate the sound of someone talking in the background of a video conference call.

As for potential uses, Google has focused on it being used as a pre-process for automatic video captioning. In a video in the blog post, captions are clearly improved after the tech is used to isolate the sounds of the people in the video.

However, it doesn’t take a wild leap of the imagination to think of other ways that this tech could be used. Adding cameras to smart speakers could seriously improve the way these speakers hear and understand instructions. Meanwhile, adding it to the video camera on your phone could improve the sound quality of your videos. Google also mentions that the tech could be put towards improving hearing aids.

Of course, it would also appear to make it incredibly easy for someone with this tech to indiscriminately spy on any individual within a large crowd.

Best not to think about that, though.

via Android Authority

April 16, 2018 at 12:50AM