As artificial intelligence advances rapidly, concerns about its environmental sustainability have moved to the forefront. While ethical questions dominate public discussion, the substantial ecological cost of AI technologies cannot be overlooked.
In a widely cited June 2019 study, researchers at the University of Massachusetts Amherst quantified AI's environmental cost. Their report estimates that training and searching a certain neural network architecture generates roughly 626,000 pounds of carbon dioxide emissions, nearly five times the lifetime emissions of the average American car, including its manufacture.
The environmental challenge becomes even more pronounced during the model deployment phase, where deep neural networks must be implemented across various hardware platforms, each with unique characteristics and computational capabilities.
In response, MIT researchers have developed an automated AI system that trains and runs neural networks far more efficiently. Their results indicate that improving computational efficiency in some key ways can substantially cut the carbon emissions involved, in some cases to just a few hundred pounds.
The researchers' system, which they call a once-for-all network, trains one large neural network comprising many pretrained subnetworks of different sizes that can be tailored to diverse hardware platforms without retraining. This dramatically cuts the energy usually required to train each specialized neural network for new platforms, which can include billions of internet-of-things (IoT) devices. Using the system to train a computer-vision model, they estimated the process required roughly 1/1,300 the carbon emissions of today's state-of-the-art neural architecture search approaches, while delivering 1.5 to 2.6 times faster inference.
"Our goal is creating smaller, more environmentally friendly neural networks," explains Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. "Until now, searching for efficient neural network architectures has carried an enormous carbon footprint. However, we've successfully reduced this footprint by orders of magnitude with our novel approaches."
This research was conducted using Satori, an energy-efficient computing cluster donated to MIT by IBM, capable of performing 2 quadrillion calculations per second. The findings will be presented next week at the International Conference on Learning Representations. Han collaborated on the paper with four undergraduate and graduate students from EECS, MIT-IBM Watson AI Lab, and Shanghai Jiao Tong University.
Developing the "Once-for-All" Network Architecture
The research team constructed their system upon a recent AI advancement known as AutoML (automatic machine learning), which eliminates the need for manual network design. Neural networks automatically explore vast design spaces to find architectures specifically tailored to particular hardware platforms. However, a significant training efficiency challenge remained: each model had to be individually selected and trained from scratch for its specific platform architecture.
"How can we efficiently train all these networks for such a diverse range of devices—from $10 IoT gadgets to $600 smartphones?" Han questions. "Given the tremendous diversity of IoT devices, the computational cost of neural architecture search would become prohibitively expensive."
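The cost concern Han raises comes down to amortization, which a toy sketch makes concrete. All of the numbers below are invented for illustration; the point is only that conventional NAS pays a full search-and-train cost per target device, while a once-for-all approach pays one training cost up front and a small search cost per added device.

```python
# Hypothetical unit costs (e.g. GPU-hours); values are made up for illustration.
NAS_COST_PER_DEVICE = 40_000   # conventional NAS: full search + training per platform
OFA_TRAINING_COST = 1_200      # once-for-all: one-time mother-network training
OFA_COST_PER_DEVICE = 25       # cheap per-platform subnetwork search afterwards

def total_cost(n_devices):
    """Return (conventional NAS cost, once-for-all cost) for n target devices."""
    nas = NAS_COST_PER_DEVICE * n_devices
    ofa = OFA_TRAINING_COST + OFA_COST_PER_DEVICE * n_devices
    return nas, ofa

# With many device types, the one-time training cost is quickly amortized.
nas, ofa = total_cost(1_000)
print(f"1,000 devices -> NAS: {nas:,}  OFA: {ofa:,}")
```

For a single platform the up-front cost may not pay off, but across the "tremendous diversity of IoT devices" the per-device term dominates and the once-for-all strategy wins by orders of magnitude.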
The researchers developed an AutoML system that trains only a single, large "once-for-all" (OFA) network serving as a "mother" network, which nests an exceptionally large number of subnetworks that are sparsely activated from it. OFA shares all of its learned weights with every subnetwork, meaning each comes essentially pretrained. Consequently, each subnetwork can operate independently at inference time without retraining.
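The weight-sharing idea can be sketched in a few lines. This is a hedged illustration with hypothetical names, not the authors' code: a subnetwork's convolution kernel is treated as a plain centered slice of the mother network's full kernel (the paper additionally applies a learned transform to sliced kernels, which this sketch omits), so the subnetwork stores no weights of its own.

```python
import numpy as np

def sub_kernel(full_kernel, out_ch, in_ch, ksize):
    """Slice a smaller kernel out of the OFA "mother" kernel.

    full_kernel: array of shape (O, I, K, K).
    The subnetwork reuses the first out_ch/in_ch channels and the centered
    ksize x ksize spatial window, so weights are shared, not copied.
    """
    O, I, K, _ = full_kernel.shape
    start = (K - ksize) // 2  # take the center of the large kernel
    return full_kernel[:out_ch, :in_ch,
                       start:start + ksize, start:start + ksize]

# Mother kernel: 64 output channels, 64 input channels, 7x7 spatial size.
mother = np.random.randn(64, 64, 7, 7)

# A small subnetwork needing only 16x16 channels and a 3x3 kernel gets a
# view into the same memory: zero extra parameters, no retraining needed.
small = sub_kernel(mother, 16, 16, 3)
print(small.shape)           # (16, 16, 3, 3)
print(small.base is mother)  # True: the weights are shared, not duplicated
```

Because NumPy basic slicing returns a view, any update to the mother weights during training is immediately reflected in every subnetwork, which is what lets them all come out "essentially pretrained."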
The team trained an OFA convolutional neural network (CNN)—commonly utilized for image-processing tasks—with versatile architectural configurations, including varying numbers of layers and "neurons," diverse filter sizes, and different input image resolutions. When presented with a specific platform, the system uses the OFA as the search space to identify the optimal subnetwork based on accuracy and latency tradeoffs that correspond to the platform's power and speed limitations. For an IoT device, for instance, the system will identify a smaller subnetwork. For smartphones, it will select larger subnetworks, but with different structures depending on individual battery life and computational resources. OFA separates model training from architecture search, distributing the one-time training cost across numerous inference hardware platforms and resource constraints.
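The specialization step described above amounts to a constrained search: given a device's latency budget, pick the most accurate subnetwork that fits. The sketch below uses made-up accuracy and latency numbers; in practice the real system evaluates candidates with predictors fit to measurements on the target hardware.

```python
# Candidate subnetworks drawn from the OFA search space.
# (architecture id, predicted top-1 accuracy, predicted latency in ms)
# All numbers are invented for illustration.
candidates = [
    ("depth2_width3_k3", 0.71, 12.0),
    ("depth3_width4_k5", 0.75, 28.0),
    ("depth4_width6_k7", 0.79, 64.0),
]

def specialize(candidates, latency_budget_ms):
    """Return the most accurate subnetwork within the device's latency budget."""
    feasible = [c for c in candidates if c[2] <= latency_budget_ms]
    if not feasible:
        raise ValueError("no subnetwork fits this device's budget")
    return max(feasible, key=lambda c: c[1])

print(specialize(candidates, 15.0)[0])  # tight IoT budget -> smallest network
print(specialize(candidates, 70.0)[0])  # smartphone budget -> largest network
```

The key property is that this selection requires no training at all: every candidate is already a pretrained slice of the mother network, so only the search cost is paid per platform.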
This approach relies on a "progressive shrinking" algorithm that efficiently trains the OFA network to support all of the subnetworks simultaneously. It starts by training the full network at its maximum size, then progressively shrinks the network's dimensions to include smaller subnetworks. Smaller subnetworks are trained with the help of the larger ones, so they grow together. In the end, subnetworks of all sizes are supported, allowing fast specialization based on a platform's power and speed constraints. Once trained, the OFA supports a new hardware device at zero additional training cost.
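The progressive-shrinking schedule can be sketched as a sequence of training phases, each widening the set of configurations that may be sampled. This is an illustrative simplification under assumed dimension choices (kernel sizes 3/5/7, depths 2-4, width multipliers 3/4/6), not the authors' training code.

```python
import random

# Each phase adds smaller choices to the sampling space, so small
# subnetworks learn alongside, and are guided by, the large ones.
PHASES = [
    {"kernel": [7],       "depth": [4],       "width": [6]},        # full network
    {"kernel": [3, 5, 7], "depth": [4],       "width": [6]},        # shrink kernel
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [6]},        # shrink depth
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [3, 4, 6]},  # shrink width
]

def sample_subnetwork(phase, rng=random):
    """Sample one subnetwork configuration allowed in the given phase."""
    return {dim: rng.choice(choices) for dim, choices in phase.items()}

for i, phase in enumerate(PHASES):
    cfg = sample_subnetwork(phase)
    # In real training, this configuration is activated inside the mother
    # network and updated, with the full-size network guiding the smaller ones.
    print(f"phase {i}: train sampled subnetwork {cfg}")
```

Because phase 0 admits only the maximum configuration, the full network is trained first; later phases interleave ever-smaller subnetworks, which is what lets all sizes end up supported at once.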
In all, the researchers found that one OFA can comprise more than 10 quintillion (a 1 followed by 19 zeros) architectural configurations, likely covering every platform ever needed. Yet training the OFA and searching it ends up far more efficient than spending hours training each neural network individually for every platform. Moreover, OFA compromises neither accuracy nor inference efficiency: it delivers state-of-the-art ImageNet accuracy on mobile devices, and, compared with industry-leading CNN models, the researchers say it provides a 1.5 to 2.6 times speedup with superior accuracy.
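A number on the order of 10 quintillion arises naturally from the combinatorics of the design space. The back-of-envelope count below assumes a structure like the one reported for OFA (5 sequential units, each 2 to 4 layers deep, with 3 kernel sizes and 3 width multipliers per layer); it is illustrative arithmetic, not an exact reproduction of the paper's accounting.

```python
# Choices available to each individual layer: kernel size x width multiplier.
choices_per_layer = 3 * 3

# For one unit, sum over the allowed depths (2, 3, or 4 layers):
per_unit = sum(choices_per_layer ** d for d in (2, 3, 4))  # 81 + 729 + 6561

# Five independent units multiply together.
total = per_unit ** 5
print(f"{total:.2e}")  # on the order of 10**19 configurations
```

Even under slightly different assumptions about the per-layer choices, the count multiplies out to a number this large, which is why exhaustively training each configuration separately is out of the question.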
"This represents breakthrough technology," Han states. "If we want to run powerful AI on consumer devices, we must discover ways to shrink AI down to size."
"The model demonstrates remarkable compactness. I'm thrilled to see OFA continuing to push the boundaries of efficient deep learning on edge devices," notes Chuang Gan, a researcher at the MIT-IBM Watson AI Lab and co-author of the paper.
"If rapid progress in AI is to continue, we must reduce its environmental impact," emphasizes John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab. "The advantage of developing methods to make AI models smaller and more efficient is that these models may also perform better."