Google Assistant is already pretty handy. It can fill in your payment info on take-out orders, help get the kids to school on time, and control your stereo’s volume and your home’s smart light schedules. Now, company executives have shown off some of the new features coming soon to the AI.
‘Look and Talk’ by Google Assistant
The first of these is ‘Look and Talk’. Instead of having to repeatedly start your requests to Google Assistant with ‘Hey Google,’ this new feature relies on computer vision and voice matching to constantly pay attention to the user. As Sissie Hsiao, Google’s VP of Assistant, explained on stage, all the user has to do is look at their Nest Hub Max and state their request. Google is also developing a series of quick commands that users will be able to shout out without having to gaze longingly at their tablet screen or say “Hey Google” first — things like ‘turn on the lights’ and ‘set a 10-minute alarm.’
All of the data captured in that interaction — specifically the user’s face and voice prints, used to verify the user — are processed locally on the Hub itself, Hsiao continued, and not shared with Google or anyone else. What’s more, you’ll have to specifically opt into the service before you can use it.
Enhanced Proximity and AI
According to Hsiao, the backend of this process relies on a half-dozen machine learning models and 100 camera and mic inputs, such as proximity, head orientation and gaze direction, to ensure that the machine knows when you’re talking to it versus talking in front of it. The company also claims it worked diligently to make sure the system works for people across the full spectrum of human skin tones.
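To make the idea concrete, here is a minimal Python sketch of how signals like proximity, head orientation and gaze direction might be combined into a single "is this person talking to me?" decision. Google has not published how its models work, so everything here is an assumption for illustration: the `Signals` fields, the weights and the thresholds are hypothetical, and a simple hand-written rule stands in for the machine learning models Hsiao described.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-moment signals; names and units are assumptions."""
    distance_m: float      # estimated distance from the device, in meters
    head_yaw_deg: float    # head rotation; 0 means facing the screen
    gaze_on_screen: float  # 0..1 confidence that the eyes are on the screen
    face_match: float      # 0..1 confidence from face matching
    voice_match: float     # 0..1 confidence from voice matching


def is_addressing_device(s: Signals, threshold: float = 0.7) -> bool:
    """Return True when the combined evidence suggests a request to the device."""
    # Hard gates (illustrative values): the user must be recognized and nearby.
    if s.face_match < 0.8 or s.voice_match < 0.8 or s.distance_m > 1.5:
        return False
    # Soft evidence: the score falls to 0 once the head is turned 45 degrees away.
    head_score = max(0.0, 1.0 - abs(s.head_yaw_deg) / 45.0)
    score = 0.5 * head_score + 0.5 * s.gaze_on_screen
    return score >= threshold


# Looking at the screen while speaking vs. chatting with someone nearby.
looking = Signals(0.8, 5.0, 0.9, 0.95, 0.9)
talking_nearby = Signals(0.8, 60.0, 0.1, 0.95, 0.9)
print(is_addressing_device(looking), is_addressing_device(talking_nearby))
```

The split between hard gates and a weighted score is just one plausible design: the gates keep the device from responding to unrecognized or distant people, while the score separates talking to the device from talking in front of it.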
Looking ahead, Google plans to keep improving the responsiveness and fidelity of Google Assistant’s responses by building new, more powerful speech and language models that can understand the nuances of human speech, Hsiao said. “Google Assistant will be able to better understand the imperfections of human speech without getting tripped up, including the pauses, ‘umms’ and interruptions, making your interactions feel much closer to a natural conversation.”