Google is pretty great at figuring out what a user is saying, but is it any good at knowing who's saying it? Just look at current smart speaker technology, which can be easily fooled.
Google might have a pretty simple solution, however. Its researchers have created a deep learning system that is able to single out voices. It does this by literally looking at people's faces while they're talking.
First, the researchers trained their system to recognize individual people speaking alone. They then created virtual noise, adding other people's voices to form a fake crowd, to teach the artificial intelligence to separate the combined audio into distinct tracks and recognize which voice is which.
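The mixing step above can be sketched in a few lines. This is only a toy illustration of the general idea of building synthetic "cocktail party" training data by summing waveforms, not Google's actual pipeline; the function name and the sine-wave stand-in tracks are hypothetical.

```python
import numpy as np

def make_synthetic_mixture(clean, interferers):
    """Mix a clean single-speaker track with interfering voices.
    The clean track stays the separation target; the noisy
    mixture becomes the model's input."""
    mixture = clean.copy()
    for other in interferers:
        # Pad or trim each interfering track to the target length.
        other = np.resize(other, clean.shape)
        mixture = mixture + other
    # Normalize to avoid clipping in the summed signal.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture = mixture / peak
    return mixture

# Toy example: two 1-second tones at 16 kHz stand in for two speakers.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speaker_a = 0.5 * np.sin(2 * np.pi * 220 * t)
speaker_b = 0.5 * np.sin(2 * np.pi * 330 * t)
mix = make_synthetic_mixture(speaker_a, [speaker_b])
print(mix.shape)  # (16000,)
```

A model trained on pairs like (`mix`, `speaker_a`) learns to pull one target voice back out of the crowd; the visual signal from the speaker's face is what tells it which voice to pull.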
The results are astounding. As seen in the video below, the AI is able to separate the voices of two stand-up comedians even when their speech overlaps, and it does this just by looking at their faces. The trick works even if a comedian's face is only partially visible, such as when it's slightly blocked by a microphone.