Google researchers have developed an artificial intelligence application that can isolate a single person's voice from among a mixture of sounds, including other voices and background noise.
In a blog post, two of the researchers, Inbar Mosseri and Oran Lang, said they believe the breakthrough could have a wide range of applications, including improved audio captioning on TV and hearing aids that work better, “especially in situations where there are multiple people speaking.”
The technology emulates the so-called "cocktail party effect," in which a person with good hearing is able to focus attention on a particular speaker in a crowded, noisy environment by mentally filtering out other voices and sounds. Until now, machines have had difficulty doing that.
The researchers collected 2,000 hours of YouTube video clips of people giving talks and combined them to create synthetic versions of the cocktail party environment. They then trained the artificial intelligence program to analyze the speakers’ faces for signs of when they were speaking, such as mouth movement, and to link those cues to their words. As a result, the program could isolate a clean audio signal for an individual speaker.
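The idea of building a synthetic cocktail party from clean recordings can be illustrated with a minimal sketch. It is not the researchers' actual pipeline: the function names, the noise gain and the sine-tone stand-ins for speech are all assumptions made for illustration, and the sketch covers only the audio side, not the face analysis. Because the mixture is built from known clean tracks, each example comes with the answer the program is supposed to recover.

```python
import math

def mix_signals(clean_a, clean_b, noise=None, noise_gain=0.3):
    """Sum two clean mono tracks, plus optional scaled noise, sample by sample.

    Hypothetical helper: inputs are lists of float samples, and the result
    is trimmed to the shortest input.
    """
    n = min(len(clean_a), len(clean_b))
    if noise is not None:
        n = min(n, len(noise))
    mixture = []
    for i in range(n):
        sample = clean_a[i] + clean_b[i]
        if noise is not None:
            sample += noise_gain * noise[i]
        mixture.append(sample)
    return mixture

def make_training_example(clean_a, clean_b, noise=None, noise_gain=0.3):
    """Pair a synthetic mixture with the clean track a model should recover."""
    mixture = mix_signals(clean_a, clean_b, noise, noise_gain)
    return {"mixture": mixture, "target": clean_a[:len(mixture)]}

if __name__ == "__main__":
    # Stand-in "voices": two sine tones at an assumed 8 kHz sample rate.
    rate = 8000
    voice_a = [0.4 * math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
    voice_b = [0.4 * math.sin(2 * math.pi * 330 * t / rate) for t in range(rate)]
    example = make_training_example(voice_a, voice_b)
    print(len(example["mixture"]), "mixed samples with a known clean target")
```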
In addition to improving hearing aids, the technology could lead to more accurate automatic captioning for TV and video, even when individual voices overlap, the researchers said.