Google AI Voice Isolation Will Deliver Much If It Can Shrink to Fit Consumer Devices


Intelligent machines may be on the verge of taking you to a place you only think you’ve been inhabiting for years. You may think holding a video conference in the middle of a crowded café or bustling city street is the perfect mix of public and private, a balance of work and pleasure, but the people on the other end of your call may beg to differ.

You’ve probably got your headphones in, after all, and can’t hear the din that’s being carried through your smartphone or laptop microphone and into someone else’s conference room. Google, however, has developed technology that can turn your oasis of silence into reality.

Google AI voice isolation promises the ability to pick out your voice and eliminate all other sounds around you. If the tech can be applied to mobile devices, it could make it sound like you’re the only person in the room, no matter how hectic that room may be.

Google AI Voice Isolation Solves the Cocktail Party Effect

Google AI is the tech giant’s newly christened department for all things computer learning and thinking. Its team took pride of place at the recent Google I/O conference. They’re the ones teaching Google Assistant how to speak informally, like an actual human, and they use fancy terms like “neural network computing” to convey the complexity and technological advancement of their research.

Their latest invention, however, takes on a problem with a much simpler title: the cocktail party effect. Essentially, what they’ve created is a system capable of separating out the speech of a single person from a room full of noise, in much the same way that humans can focus on an intimate conversation despite the noise of a rambunctious cocktail party. It lets a computer’s audio and video system use visual as well as auditory cues to focus on the source of a sound, and then exclude everything around that source. It’s like lip-reading for the hearing, and Google has demonstrated it on video under laboratory conditions.

It’s an impressive trick, which could one day have a professional function.

Smarter Personal Devices

Google AI’s engineers sifted through more than 200,000 hours of optimally edited videos from YouTube and taped lectures to teach their robotic cocktail party guest to lip-read, and then put it to use in a humble cafeteria.

Presuming Google doesn’t intend to turn this technology over to the CIA for the purpose of espionage, this kind of mundane environment is where it will ultimately find a niche. But when it comes down to applying such complex computing power to profitable, commercial uses, the problem is size. As the iPhone demonstrated when Apple took over the world, convenience is king in the eyes of the consumer.

If we’re going to use this tech in noisy social situations, or over the drama of a family breakfast at home, it has to fit discreetly into our lives and into the devices that already hold a place in our homes.

Achieve that and video conferencing becomes very life-friendly.

Video Calling from Home–and on the Road

Noise-canceling peripherals for video conferencing already exist on the market. Many webcams and microphones are very good at using algorithms to filter out unwanted noises like echoes and continuous background noise from machinery or air conditioners (consult the VC Daily Accessories Guide for more information).
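To give a feel for how that kind of conventional filtering can work, here is a minimal sketch of a spectral noise gate in Python with NumPy. It is our own illustration, not the algorithm used by any particular webcam, microphone, or by Google: the function name `spectral_gate` and its `frame`, `reduction` and `margin` parameters are hypothetical, and it assumes you have a short recording of the background noise on its own. It measures how loud each frequency is in that noise-only sample, then turns down any frequency in the call audio that isn't clearly louder than that floor.

```python
import numpy as np


def spectral_gate(signal, noise_sample, frame=256, reduction=0.1, margin=2.0):
    """Turn down frequency bins that stay close to a measured noise floor.

    Hypothetical helper for illustration only. Both inputs are 1-D arrays
    of audio samples recorded at the same sample rate.
    """

    def frames(x):
        # Split into non-overlapping frames, dropping any leftover tail.
        count = len(x) // frame
        return x[: count * frame].reshape(count, frame)

    # Average magnitude of each frequency bin across the noise-only sample.
    noise_floor = np.abs(np.fft.rfft(frames(noise_sample), axis=1)).mean(axis=0)

    spectrum = np.fft.rfft(frames(signal), axis=1)
    # Keep bins clearly louder than the noise floor; scale the rest down.
    keep = np.abs(spectrum) > margin * noise_floor
    gated = np.where(keep, spectrum, spectrum * reduction)

    # Back to audio samples, one frame after another.
    return np.fft.irfft(gated, n=frame, axis=1).reshape(-1)


# Usage: a steady 125 Hz hum (think air conditioner) under a 1 kHz tone.
rate = 8000
t = np.arange(rate) / rate
hum = 0.5 * np.sin(2 * np.pi * 125 * t)
tone = np.sin(2 * np.pi * 1000 * t)
cleaned = spectral_gate(tone + hum, noise_sample=hum)
```

A gate like this only knows which frequencies are usually noisy. It has no idea who is speaking, so a second voice at the next table would pass straight through.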

Those methods, though, aren’t nearly as targeted as Google’s sight and sound version, because they’re not able to distinguish between specific human voices. There are several ways Google’s AI voice isolation tech could be deployed, but each must strike a balance between performance and availability.

Firstly, we’re envisioning the tech as a feature within a smartphone-based video calling app. You’re likely to run into problems with processing power and battery drain, but it would be an enormously useful feature for impromptu video calls.

If you know you’re going to be out in public during a call, then you could plan to bring your laptop with you. A laptop will have plenty of power to run Google’s voice isolation, although built-in cameras are currently inferior to dedicated webcams.

So, could you put Google’s tech in a webcam? We love this idea, but you’d need your computer nearby (although the latest webcams can do some remarkable things, like facial recognition and 4K streaming).

The best option, however, may be a new hybrid device, something like the Amazon Echo Show. This video-enabled version of the smart hub or home assistant is more compact than a standalone laptop and more powerful than a phone. It’s already equipped with a form of AI via the Alexa assistant and it works as a permanent accessory for the home. You’d have to juice up the battery life to make it work on the road, but it’s solid enough to house a big computer brain.

Being heard clearly while everything around you tumbles into sonic chaos is an obvious asset to video conferencing, and one we’ve been waiting on for a long time. Now we just need to find a way to put the tech to work.
