November 18, 2017
If you want to blame someone for the hoopla around artificial intelligence, 69-year-old Google researcher Geoff Hinton is a good candidate.
The droll University of Toronto professor jolted the field onto a new trajectory in October 2012. With two grad students, Hinton showed that an unfashionable technology he’d championed for decades called artificial neural networks permitted a huge leap in machines’ ability to understand images. Within six months, all three researchers were on Google’s payroll. Today neural networks transcribe our speech, recognize our pets, and fight our trolls.
But Hinton now belittles the technology he helped bring to the world. “I think the way we’re doing computer vision is just wrong,” he says. “It works better than anything else at present but that doesn’t mean it’s right.”
In its place, Hinton has unveiled another “old” idea that might transform how computers see, and reshape AI. That’s important because computer vision is crucial to ideas such as self-driving cars and software that plays doctor.
Late last week, Hinton released two research papers that he says prove out an idea he’s been mulling for almost 40 years. “It’s made a lot of intuitive sense to me for a very long time, it just hasn’t worked well,” Hinton says. “We’ve finally got something that works well.”
Hinton’s new approach, known as capsule networks, is a twist on neural networks intended to make machines better able to understand the world through images or video. In one of the papers posted last week, Hinton’s capsule networks matched the accuracy of the best previous techniques on a standard test of how well software can learn to recognize handwritten digits.
In the second, capsule networks almost halved the best previous error rate on a test that challenges software to recognize toys such as trucks and cars from different angles. Hinton has been working on his new technique with colleagues Sara Sabour and Nicholas Frosst at Google’s Toronto office.
Capsule networks aim to remedy a weakness of today’s machine-learning systems that limits their effectiveness. Image-recognition software in use today by Google and others needs a large number of example photos to learn to reliably recognize objects in all kinds of situations. That’s because the software isn’t very good at generalizing what it learns to new scenarios, for example understanding that an object is the same when seen from a new viewpoint.
Teaching a computer to recognize a cat from many angles, for example, could require thousands of photos covering a variety of perspectives. Human children don’t need such explicit and extensive training to learn to recognize a household pet.
Hinton’s idea for narrowing the gulf between the best AI systems and ordinary toddlers is to build a little more knowledge of the world into computer-vision software. Capsules—small groups of crude virtual neurons—are designed to track different parts of an object, such as a cat’s nose and ears, and their relative positions in space. A network of many capsules can use that awareness to understand when a new scene is in fact a different view of something it has seen before.
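The intuition about parts and their relative positions can be illustrated with a toy sketch. This is not Hinton’s method or code from the papers: the part names, the offsets, and the tolerance below are made-up assumptions, and the sketch handles only objects shifted within the image, not the changes in viewing angle that capsule networks are designed for. Each detected part “votes” for where the whole face should be, and the parts count as one object only when their votes agree.

```python
from statistics import pstdev

# Hypothetical offsets of each part from the center of a cat's face.
# These numbers are invented for illustration only.
EXPECTED_OFFSETS = {
    "nose": (0.0, -1.0),
    "left_ear": (-2.0, 3.0),
    "right_ear": (2.0, 3.0),
}


def vote_for_center(part, position):
    """Return where this part predicts the face center should be."""
    dx, dy = EXPECTED_OFFSETS[part]
    x, y = position
    return (x - dx, y - dy)


def agreement_spread(observed):
    """Measure how much the parts' votes disagree (0.0 means full agreement)."""
    votes = [vote_for_center(part, pos) for part, pos in observed.items()]
    xs = [v[0] for v in votes]
    ys = [v[1] for v in votes]
    return pstdev(xs) + pstdev(ys)


def looks_like_face(observed, tolerance=0.5):
    """Treat the parts as one face only if their votes roughly coincide."""
    return agreement_spread(observed) <= tolerance


# A face, the same face moved elsewhere in the image, and the same parts
# placed in an arrangement that no face would produce.
face = {"nose": (10.0, 9.0), "left_ear": (8.0, 13.0), "right_ear": (12.0, 13.0)}
shifted = {part: (x + 30.0, y - 5.0) for part, (x, y) in face.items()}
scrambled = {"nose": (10.0, 13.0), "left_ear": (12.0, 9.0), "right_ear": (8.0, 9.0)}

for name, parts in [("face", face), ("shifted", shifted), ("scrambled", scrambled)]:
    print(name, looks_like_face(parts))
```

In this sketch the shifted face is accepted without any extra examples because the relationship between the parts is unchanged, while the scrambled arrangement is rejected even though every individual part is present.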
Hinton formed his intuition that vision systems need such an inbuilt sense of geometry in 1979, when he was trying to figure out how humans use mental imagery. He first laid out a preliminary design for capsule networks in 2011. The fuller picture released last week was long anticipated by researchers in the field. “Everyone has been waiting for it and looking for the next great leap from Geoff,” says Kyunghyun Cho, a professor at NYU who works on image recognition.
It’s too early to say how big a leap Hinton has made—and he knows it. The AI veteran segues from quietly celebrating that his intuition is now supported by evidence, to explaining that capsule networks still need to be proven on large image collections, and that the current implementation is slow compared to existing image-recognition software.
Hinton is optimistic he can address those shortcomings. Others in the field are also hopeful about his long-maturing idea.
Roland Memisevic, cofounder of image-recognition startup Twenty Billion Neurons and a professor at the University of Montreal, says Hinton’s basic design should be capable of extracting more understanding from a given amount of data than existing systems. If proven out at scale, that could be helpful in domains such as healthcare, where image data to train AI systems is much scarcer than the large volume of selfies available around the internet.
In some ways, capsule networks are a departure from a recent trend in AI research. One interpretation of the recent success of neural networks is that humans should encode as little knowledge as possible into AI software, and instead make it figure things out for itself from scratch. Gary Marcus, a professor of psychology at NYU who sold an AI startup to Uber last year, says Hinton’s latest work represents a welcome breath of fresh air. Marcus argues that AI researchers should be doing more to mimic how the brain has built-in, innate machinery for learning crucial skills like vision and language. “It’s too early to tell how far this particular architecture will go, but it’s great to see Hinton breaking out of the rut that the field has seemed fixated on,” Marcus says.
UPDATED, Nov. 2, 12:55 PM: This article has been updated to include the names of Geoff Hinton’s co-authors.