The neural network models used in applications like medical image processing and speech recognition perform operations on hugely complex data structures that require an enormous amount of computation to process. This is one reason deep-learning models consume so much energy.
To improve the efficiency of AI models, MIT researchers created an automated system that enables developers of deep-learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine-learning operations.
Existing techniques for optimizing algorithms can be cumbersome and typically only allow developers to capitalize on either sparsity or symmetry, two different types of redundancy that exist in deep-learning data structures.
By enabling a developer to build an algorithm from scratch that takes advantage of both redundancies at once, the MIT researchers’ approach boosted the speed of computations by nearly 30 times in some experiments.
Because the system uses a user-friendly programming language, it could optimize machine-learning algorithms for a wide range of applications. The system could also help scientists who are not experts in deep learning but want to improve the efficiency of AI algorithms they use to process data. In addition, the system could have applications in scientific computing.
“For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it,” says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.
She is joined on the paper by lead author Radha Patel ’23, SM ’24 and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Cutting out computation
In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, which is a rectangular array of values arranged along two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, making tensors more difficult to manipulate.
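As a rough illustration (using NumPy, which is not tied to the researchers’ system), a matrix is simply an array with two axes, while a tensor can have any number of axes:

```python
import numpy as np

# A matrix is a 2-D array: values laid out along two axes, rows and columns.
matrix = np.arange(6).reshape(2, 3)    # shape (2, 3), ndim == 2

# A tensor generalizes this to any number of axes. A batch of 32 RGB images,
# each 64 x 64 pixels, is naturally a 4-D tensor, for example.
images = np.zeros((32, 3, 64, 64))     # shape (batch, channel, height, width), ndim == 4
```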
Deep-learning models perform operations on tensors using repeated matrix multiplication and addition; this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
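To make that pattern concrete, here is a minimal NumPy sketch; the layer sizes and the ReLU nonlinearity are illustrative choices, not details from the paper:

```python
import numpy as np

def dense_layer(x, W, b):
    # One fully connected layer: a matrix multiplication followed by an addition,
    # passed through a simple nonlinearity (ReLU).
    return np.maximum(W @ x + b, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(784)                              # e.g., a flattened 28x28 image
W1, b1 = rng.standard_normal((256, 784)), np.zeros(256)
W2, b2 = rng.standard_normal((10, 256)), np.zeros(10)

# A small network is just this operation repeated layer after layer.
out = dense_layer(dense_layer(x, W1, b1), W2, b2)
```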
But because of the way data in tensors are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations.
For example, if a tensor represents user review data from an e-commerce site, since not every user reviewed every product, most values in that tensor are likely zero. This type of data redundancy is called sparsity. A model can save time and computation by only storing and operating on non-zero values.
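A small sketch of the idea, using SciPy’s generic sparse format rather than the researchers’ own representation:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative ratings matrix: most users have not reviewed most products.
dense = np.array([
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
])

sparse = coo_matrix(dense)   # keep only the non-zero entries and their coordinates
print(sparse.nnz, "stored values instead of", dense.size)   # 3 stored values instead of 16
```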
In addition, sometimes a tensor is symmetric, which means the top half and bottom half of the data structure are equal. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
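For a symmetric matrix, for instance, the entries below the diagonal mirror the entries above it, so only one triangle needs to be kept. A simple illustration (not how SySTeC represents symmetry internally):

```python
import numpy as np

# A symmetric matrix equals its own transpose: A[i, j] == A[j, i].
A = np.array([
    [2.0, 7.0, 1.0],
    [7.0, 5.0, 3.0],
    [1.0, 3.0, 9.0],
])
assert np.allclose(A, A.T)

# Only the upper triangle (including the diagonal) needs to be stored;
# the lower half can always be reconstructed by mirroring.
upper = np.triu(A)
reconstructed = upper + np.triu(A, k=1).T
assert np.allclose(reconstructed, A)
```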
“But when you try to capture both of these optimizations, the situation becomes quite complex,” Ahrens says.
To simplify the process, she and her collaborators built a new compiler, which is a computer program that translates complex code into a simpler language that can be processed by a machine. Their compiler, called SySTeC, can optimize computations by automatically taking advantage of both sparsity and symmetry in tensors.
They began the process of building SySTeC by identifying three key optimizations they can perform using symmetry.
First, if the algorithm’s output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, then the algorithm only needs to read one half of it. Finally, if intermediate results of tensor operations are symmetric, the algorithm can skip redundant computations.
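One way to picture the first of these optimizations: the product of a matrix with its own transpose is symmetric, so a kernel can compute just the upper triangle and mirror it. The hand-written loop below is a stand-in for the idea, not SySTeC’s generated code:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
n = A.shape[0]

# C = A @ A.T is symmetric, so only about half of its entries are distinct.
C = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):        # compute the upper triangle only...
        C[i, j] = A[i] @ A[j]
        C[j, i] = C[i, j]        # ...and mirror it into the lower half

assert np.allclose(C, A @ A.T)
```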
Simultaneous optimizations
To use SySTeC, a developer inputs their program and the system automatically optimizes their code for all three types of symmetry. Then the second phase of SySTeC performs additional transformations to only store non-zero data values, optimizing the program for sparsity.
In the end, SySTeC generates ready-to-use code.
“In this way, we get the benefits of both optimizations. And the interesting thing about symmetry is, as your tensor has more dimensions, you can get even more savings on computation,” Ahrens says.
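The sketch below shows, in plain Python, what exploiting both redundancies at once looks like for a symmetric, mostly zero matrix multiplied by a vector. SySTeC’s contribution is generating this kind of kernel automatically from a high-level description; the hand-written loop here is only illustrative:

```python
import numpy as np

# Store a symmetric sparse matrix as its upper-triangle non-zeros only:
# (row, col, value) triples with row <= col.
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 5.0), (2, 2, 9.0)]
x = np.array([1.0, 2.0, 3.0])

# Matrix-vector product that exploits both redundancies: it skips the zeros
# entirely (sparsity) and uses each stored off-diagonal value for both
# A[i, j] and A[j, i] (symmetry).
y = np.zeros_like(x)
for i, j, v in entries:
    y[i] += v * x[j]
    if i != j:
        y[j] += v * x[i]

dense = np.array([[2.0, 0.0, 1.0],
                  [0.0, 5.0, 0.0],
                  [1.0, 0.0, 9.0]])
assert np.allclose(y, dense @ x)
```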
The researchers demonstrated speedups of nearly a factor of 30 with code generated automatically by SySTeC.
Because the system is automated, it could be especially useful in situations where a scientist wants to process data using an algorithm they are writing from scratch.
In the future, the researchers want to integrate SySTeC into existing sparse tensor compiler systems to create a seamless interface for users. In addition, they would like to use it to optimize code for more complicated programs.
This work is funded, in part, by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.