A technique for designing neural networks optimally suited for certain tasks


Neural networks, a type of machine-learning model, are being used to help humans complete a wide variety of tasks, from predicting whether someone’s credit score is high enough to qualify for a loan to diagnosing whether a patient has a certain disease. But researchers still have only a limited understanding of how these models work. Whether a given model is optimal for a certain task remains an open question.

MIT researchers have found some answers. They conducted an analysis of neural networks and proved that they can be designed so they are “optimal,” meaning they minimize the probability of misclassifying borrowers or patients into the wrong category when the networks are given lots of labeled training data. To achieve optimality, these networks must be built with a specific architecture.

The researchers discovered that, in certain situations, the building blocks that enable a neural network to be optimal are not the ones developers use in practice. These optimal building blocks, derived through the new analysis, are unconventional and haven’t been considered before, the researchers say.

In a paper published this week in the , they describe these optimal building blocks, called activation functions, and show how they can be used to design neural networks that achieve better performance on any dataset. The results hold even as the neural networks grow very large. This work could help developers choose the right activation function, enabling them to build neural networks that classify data more accurately in a wide range of application areas, explains senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS).

“While these are new activation functions that have never been used before, they are simple functions that someone could actually implement for a particular problem. This work really shows the importance of having theoretical proofs. If you go after a principled understanding of these models, that can actually lead you to new activation functions that you would otherwise never have thought of,” says Uhler, who is also co-director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS) and its Institute for Data, Systems and Society (IDSS).

Joining Uhler on the paper are lead author Adityanarayanan Radhakrishnan, an EECS graduate student and an Eric and Wendy Schmidt Center Fellow, and Mikhail Belkin, a professor in the Halicioğlu Data Science Institute at the University of California at San Diego.

Activation investigation

A neural network is a type of machine-learning model that is loosely based on the human brain. Many layers of interconnected nodes, or neurons, process data. Researchers train a network to complete a task by showing it hundreds of thousands of examples from a dataset.

For example, a network that has been trained to classify images into categories, say dogs and cats, is given an image that has been encoded as numbers. The network performs a series of complex multiplication operations, layer by layer, until the result is just one number. If that number is positive, the network classifies the image as a dog, and if it is negative, as a cat.

Activation functions help the network learn complex patterns in the input data. They do this by applying a transformation to the output of one layer before the data are sent to the next layer. When researchers build a neural network, they select one activation function to use. They also choose the width of the network (how many neurons are in each layer) and the depth (how many layers are in the network).
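To make the description above concrete, here is a minimal Python sketch of the forward pass: layer-by-layer multiplication, an activation applied between layers, and a single output number whose sign picks “dog” or “cat.” It assumes a fully connected network without bias terms, random (untrained) weights, and ReLU as the activation; the function names `init_network`, `forward`, and `classify` are hypothetical and do not come from the researchers’ work.

```python
import math
import random


def relu(z: float) -> float:
    # ReLU, a common activation in practice; any float -> float function could replace it.
    return max(0.0, z)


def init_network(input_dim: int, width: int, depth: int, seed: int = 0):
    """Random (untrained) weights: `depth` hidden layers of `width` neurons, then 1 output."""
    rng = random.Random(seed)
    dims = [input_dim] + [width] * depth + [1]
    layers = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        # One weight matrix per layer, stored as a list of rows (fan_out x fan_in).
        layers.append([[rng.gauss(0.0, scale) for _ in range(fan_in)]
                       for _ in range(fan_out)])
    return layers


def forward(layers, x, activation=relu) -> float:
    """Multiply layer by layer, transforming each hidden layer's output with `activation`."""
    h = list(x)
    last = len(layers) - 1
    for i, W in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) for row in W]
        # Skip the activation after the final layer so the score keeps its sign.
        h = z if i == last else [activation(v) for v in z]
    return h[0]


def classify(score: float) -> str:
    # Positive -> "dog", negative -> "cat"; treating exactly 0 as "cat" is our assumption.
    return "dog" if score > 0 else "cat"


net = init_network(input_dim=4, width=8, depth=3)
score = forward(net, [0.2, -1.0, 0.5, 0.3])
print(round(score, 4), classify(score))
```

Width and depth are explicit parameters here, and the activation is a swappable argument, which mirrors the three design choices the paragraph lists.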

“It turns out that, if you take the standard activation functions that people use in practice and keep increasing the depth of the network, it gives you really terrible performance. We show that if you design with different activation functions, as you get more data, your network will get better and better,” says Radhakrishnan.

He and his collaborators studied a situation in which a neural network is infinitely deep and wide, meaning the network is built by continually adding more layers and more nodes, and is trained to perform classification tasks. In classification, the network learns to place data inputs into separate categories.

“A clean picture”

After conducting a detailed analysis, the researchers determined that there are only three ways this kind of network can learn to classify inputs. One method classifies an input based on the majority of inputs in the training data; if there are more dogs than cats, it will decide every new input is a dog. Another method classifies by choosing the label (dog or cat) of the training data point that most resembles the new input.
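The first two behaviors correspond to classic baseline classifiers. The sketch below gives a textbook version of each, a majority-vote rule and a 1-nearest-neighbor rule using Euclidean distance; it is not the researchers’ formulation, and the toy two-dimensional points and “dog”/“cat” labels are made up for illustration.

```python
import math
from collections import Counter


def majority_classifier(train_labels):
    """Ignore the input entirely and predict the most common training label."""
    label, _ = Counter(train_labels).most_common(1)[0]
    return lambda x: label


def nearest_neighbor_classifier(train_points, train_labels):
    """Predict the label of the training point closest (Euclidean) to the input."""
    def predict(x):
        closest = min(range(len(train_points)),
                      key=lambda j: math.dist(x, train_points[j]))
        return train_labels[closest]
    return predict


# Hypothetical toy data: two cats near the origin, one dog far away.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = ["cat", "cat", "dog"]

majority = majority_classifier(train_y)
nearest = nearest_neighbor_classifier(train_x, train_y)
print(majority((4.5, 4.8)))  # cat
print(nearest((4.5, 4.8)))   # dog
```

The majority rule labels the query a cat even though it sits right next to the only dog, which illustrates why it cannot be optimal in general.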

The third method classifies a new input based on a weighted average of all the training data points that are similar to it. Their analysis shows that this is the only method of the three that leads to optimal performance. They identified a set of activation functions that always use this optimal classification method.
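The article does not say how the weights in the third method are chosen. One familiar way to build such a classifier is inverse-distance weighting, a Nadaraya–Watson-style kernel average, sketched below under that assumption: each training label is encoded as +1 (dog) or -1 (cat), weighted by `1 / distance**power`, and the sign of the weighted average gives the prediction. The exponent `power=2.0` and the function name are illustrative choices, not the weights derived in the paper.

```python
import math


def weighted_average_classifier(train_points, train_labels, power=2.0):
    """Classify by a distance-weighted average of +1 ("dog") / -1 ("cat") labels.

    Weights are 1 / distance**power, so nearby training points count far more
    than distant ones. The choice of `power` is an illustrative assumption.
    """
    signs = [1.0 if y == "dog" else -1.0 for y in train_labels]

    def predict(x):
        total, weight_sum = 0.0, 0.0
        for p, s in zip(train_points, signs):
            d = math.dist(x, p)
            if d == 0.0:
                # Input coincides with a training point: return that point's label.
                return "dog" if s > 0 else "cat"
            w = d ** (-power)
            total += w * s
            weight_sum += w
        return "dog" if total / weight_sum > 0 else "cat"

    return predict


# Same hypothetical toy data as a majority-vote or nearest-neighbor example would use.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = ["cat", "cat", "dog"]

clf = weighted_average_classifier(train_x, train_y)
print(clf((4.5, 4.8)))  # dog
print(clf((0.5, 0.5)))  # cat
```

Unlike a majority vote, this rule follows the local neighborhood of the input; unlike nearest neighbor, it blends every training point instead of copying a single label.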

“That was one of the most surprising things: no matter what you choose for an activation function, it is just going to be one of these three classifiers. We have formulas that will tell you explicitly which of these three it is going to be. It is a very clean picture,” he says.

They tested this theory on several classification benchmarking tasks and found that it led to improved performance in many cases. Neural network builders could use their formulas to select an activation function that yields improved classification performance, Radhakrishnan says.

In the future, the researchers want to use what they have learned to analyze situations where they have a limited amount of data, as well as networks that are not infinitely wide or deep. They also want to apply this analysis to situations where data do not have labels.

“In deep learning, we want to build theoretically grounded models so we can reliably deploy them in some mission-critical setting. This is a promising approach toward something like that: building architectures in a theoretically grounded way that translates into better results in practice,” he says.

This work was supported, in part, by the National Science Foundation, the Office of Naval Research, the MIT-IBM Watson AI Lab, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.

