MIT and Google researchers have built AI that links sound, sight, and text to understand the world.
Today, MIT and Google published two new papers describing first steps toward AI that can see, hear, and read in a holistic way, an approach that could upend how we teach machines about the world.
“It doesn’t matter if you see a car or hear an engine, you instantly recognize the same concept,” Yusuf Aytar, a postdoctoral AI researcher at MIT, said in a statement. “The information in our brain is aligned naturally.”
Researchers aren’t teaching the algorithms anything new, but instead creating a way for them to link, or align, knowledge from one sense to another.
To train this system, the MIT group first showed the neural network video frames that were associated with audio. After the network found the objects in the video and the sounds in the audio, it tried to predict which objects corresponded to which sounds.
Next, the team fed images with captions showing similar situations into the same algorithm, so it could associate words with the objects and actions pictured. Same idea: first the network separately identified all the objects it could find in the pictures, and the relevant words, and then matched them.
An algorithm that can align its representation of an object across sight, sound, and text can automatically transfer what it has learned from what it hears to what it sees.
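To make the alignment idea concrete, here is a toy sketch under stated assumptions: synthetic features stand in for the networks' image, audio, and text representations, and simple linear maps fit by least squares stand in for the trained encoders (none of this is the papers' actual architecture). Audio and text are each paired only with images, yet sounds can still be matched to captions through the shared space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the setup described above: each underlying "concept"
# (car, dog, ...) emits correlated image, audio, and text feature vectors.
# All names and dimensions here are illustrative, not from the papers.
n, d_img, d_aud, d_txt, d_shared = 20, 12, 10, 8, 4
concepts = rng.normal(size=(n, d_shared))
noise = 0.01
X_img = concepts @ rng.normal(size=(d_shared, d_img)) + noise * rng.normal(size=(n, d_img))
X_aud = concepts @ rng.normal(size=(d_shared, d_aud)) + noise * rng.normal(size=(n, d_aud))
X_txt = concepts @ rng.normal(size=(d_shared, d_txt)) + noise * rng.normal(size=(n, d_txt))

# Choose a shared embedding space (here: a fixed projection of image
# features), then fit linear maps that send each paired audio clip and
# each paired caption to its image's embedding. Note that audio and text
# are never paired with each other during this "training".
P_img = rng.normal(size=(d_img, d_shared))
Z_img = X_img @ P_img
W_aud, *_ = np.linalg.lstsq(X_aud, Z_img, rcond=None)
W_txt, *_ = np.linalg.lstsq(X_txt, Z_img, rcond=None)
Z_aud, Z_txt = X_aud @ W_aud, X_txt @ W_txt

# Cross-modal transfer: match each sound to the nearest caption in the
# shared space, even though no sound/caption pairs were ever seen.
dist = np.linalg.norm(Z_aud[:, None, :] - Z_txt[None, :, :], axis=2)
accuracy = (dist.argmin(axis=1) == np.arange(n)).mean()
print(f"sounds matched to the right caption: {accuracy:.0%}")
```

The least-squares fit is only a placeholder for the papers' neural-network training, but it shows the same mechanism: once every modality lands in one aligned space, correspondences the system was never directly taught fall out for free.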
It’s a promising move from MIT and Google in the field of AI.