MIT Develops Algorithm to Accelerate Neural Networks by 200x

March 23, 2019

Neural networks have been a hot topic of late, but working out the most efficient way to build one for a given set of data is still an arduous affair. Designing systems that can use algorithms to build themselves optimally is still a nascent field, but MIT researchers have reportedly developed an algorithm that can accelerate the process by up to 200x.

The NAS (Neural Architecture Search, in this context) algorithm they developed “can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms, when run on a massive image dataset, in only 200 GPU hours,” MIT News reports. This is a massive improvement over the 48,000 GPU hours Google reported taking to develop a state-of-the-art NAS algorithm for image classification. The goal of the researchers is to democratize AI by allowing researchers to experiment with various aspects of CNN design without needing enormous GPU arrays to do the front-end work. If finding state-of-the-art approaches requires 48,000 GPU hours, precious few people, even at large institutions, will ever have the opportunity to try.

CNNs produced by the new NAS algorithm ran, on average, 1.8x faster than the other CNNs tested on a mobile device, with similar accuracy. The new algorithm relies on techniques like path-level binarization, which stores just one path at a time to reduce memory consumption by an order of magnitude. MIT doesn’t link out to specific research reports, but from a bit of Google sleuthing, the referenced work appears to be two different research reports from an overlapping group of researchers. The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower-probability paths are successively pruned away, leaving the final, best-case path.
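To make the idea of keeping one path in memory and pruning the rest more concrete, here is a minimal Python sketch. It is not the researchers' code: the candidate names, the toy operations, and the `logits` values are hypothetical stand-ins, and the step that would actually train the path probabilities is left out. It only illustrates sampling a single active path per step and keeping the most probable path at the end.

```python
import math
import random


def softmax(logits):
    """Turn per-path scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gates(logits, rng):
    """Sample a one-hot gate vector: exactly one candidate path is active."""
    probs = softmax(logits)
    chosen = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return [1 if i == chosen else 0 for i in range(len(logits))]


def mixed_op(x, candidate_ops, gates):
    """Evaluate only the active path, so only its output is ever computed."""
    active = gates.index(1)
    return candidate_ops[active](x)


def prune_to_best(logits, names):
    """After the search, discard lower-probability paths and keep the top one."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best]


# Hypothetical candidates; the lambdas stand in for real convolution layers.
names = ["conv3x3", "conv5x5", "conv7x7"]
ops = [lambda x: x + 3, lambda x: x + 5, lambda x: x + 7]
logits = [0.1, 0.4, 1.2]  # assumed values, as if already learned

rng = random.Random(0)
gates = sample_binary_gates(logits, rng)
y = mixed_op(10, ops, gates)
best = prune_to_best(logits, names)  # "conv7x7", the highest-scoring path
```

Because only the sampled path is evaluated at each step, memory use stays close to that of a single ordinary network rather than growing with the number of candidates, which is the saving the article describes.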


Basic diagram of an artificial neural network. Credit: BY-SA 3.0

The new model incorporated other improvements as well. Architectures were checked against hardware platforms for latency when evaluated. In some cases, their model predicted superior performance for design choices that had been dismissed as inefficient. For example, 7×7 filters are typically not used for image classification because they’re quite computationally expensive, but the research team found that these actually worked well on GPUs.
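One plausible way to picture a latency-aware evaluation is a score that trades accuracy against measured latency on each target platform. The Python sketch below is an illustration under stated assumptions, not the method from the research reports: every number in the tables is a made-up placeholder, and `score`, `best_op`, and `latency_weight` are hypothetical names. It shows how the same candidate can win on one platform and lose on another.

```python
# All numbers below are made-up placeholders, not measurements.
LATENCY_MS = {
    "gpu": {"conv3x3": 1.0, "conv5x5": 1.1, "conv7x7": 1.2},
    "mobile": {"conv3x3": 2.0, "conv5x5": 4.5, "conv7x7": 9.0},
}
ACCURACY = {"conv3x3": 0.70, "conv5x5": 0.72, "conv7x7": 0.74}


def score(op, platform, latency_weight=0.01):
    """Higher is better: accuracy minus a penalty for latency on the platform."""
    return ACCURACY[op] - latency_weight * LATENCY_MS[platform][op]


def best_op(platform, latency_weight=0.01):
    """Pick the candidate with the best accuracy/latency trade-off."""
    return max(ACCURACY, key=lambda op: score(op, platform, latency_weight))


print(best_op("gpu"))     # conv7x7
print(best_op("mobile"))  # conv3x3
```

With these placeholder values, the large filter costs little extra on the hypothetical GPU and is selected there, while the same filter is too slow on the hypothetical mobile device. That mirrors the article's point that a per-platform latency check can surface choices a one-size-fits-all rule of thumb would discard.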

“This goes against previous human thinking,” Han Cai, one of the scientists, told MIT News. “The larger the search space, the more unknown things you can find. You don’t know if something will be better than the past human experience. Let the AI figure it out.”

These efforts to improve AI performance and capabilities are still at the stage where huge improvements are possible. As we’ve recently discussed, over time the field will be constrained by the same discoveries driving it forward. Accelerators and AI processors offer tremendous near-term performance advantages, but they aren’t a fundamental replacement for the scaling historically afforded by the advance of Moore’s law.

This article was originally published by:
