Unpacking the “black box” to build better AI models

When deep learning models are deployed in the real world, perhaps to detect financial fraud from bank card activity or identify cancer in medical images, they are sometimes able to outperform humans.

But what exactly are these deep learning models learning? Does a model trained to identify skin cancer in clinical images, for instance, actually learn the colors and textures of cancerous tissue, or is it flagging some other features or patterns?

These powerful machine-learning models are typically based on artificial neural networks that may have hundreds of thousands of nodes that process data to make predictions. Because of their complexity, researchers often call these models “black boxes”: even the scientists who build them don’t understand everything that is happening under the hood.

Stefanie Jegelka isn’t satisfied with that “black box” explanation. A newly tenured associate professor in the MIT Department of Electrical Engineering and Computer Science, Jegelka is digging deep into deep learning to understand what these models can learn, how they behave, and how to build certain prior information into them.

“At the end of the day, what a deep-learning model will learn depends on so many factors. But building an understanding that is relevant in practice will help us design better models, and also help us understand what is going on inside them so we know when we can deploy a model and when we can’t. That is critically important,” says Jegelka, who is also a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Institute for Data, Systems, and Society (IDSS).

Jegelka is particularly interested in optimizing machine-learning models when input data are in the form of graphs. Graph data pose specific challenges: For instance, information in the data consists of both information about individual nodes and edges, as well as the structure, meaning what is connected to what. In addition, graphs have mathematical symmetries that need to be respected by the machine-learning model so that, for instance, the same graph always leads to the same prediction. Building such symmetries into a machine-learning model is often hard.
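The symmetry requirement can be made concrete with a small sketch. The Python example below is an illustration only, not a method from Jegelka's research: it assumes a toy graph with one integer feature per node and a hypothetical function, `graph_readout`, that does one round of neighbor summation followed by a sum over all nodes. Because every step sums over an unordered collection, renaming or reordering the nodes does not change the output.

```python
def graph_readout(features, edges):
    """Toy permutation-invariant graph function (hypothetical, for illustration).

    features: dict mapping node id -> numeric feature
    edges: iterable of undirected (u, v) pairs
    """
    # Build neighbor lists for the undirected graph.
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # One round of message passing: each node adds its own feature to the
    # sum of its neighbors' features, then squares the result.
    updated = {
        node: (features[node] + sum(features[n] for n in neighbors[node])) ** 2
        for node in features
    }

    # Summing over all nodes makes the result independent of node order.
    return sum(updated.values())


def relabel(features, edges, mapping):
    """Rename the nodes according to mapping; the graph itself is unchanged."""
    new_features = {mapping[n]: x for n, x in features.items()}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges]
    return new_features, new_edges


# A path graph a - b - c with integer node features.
features = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b"), ("b", "c")]

# The same graph with its nodes renamed.
mapping = {"a": "z", "b": "x", "c": "y"}
features_2, edges_2 = relabel(features, edges, mapping)

original_output = graph_readout(features, edges)
relabeled_output = graph_readout(features_2, edges_2)
same_prediction = original_output == relabeled_output
```

A function that instead read node features in a fixed positional order would generally give different outputs for the two labelings, which is the failure the symmetry constraint rules out.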

Take molecules, for example. Molecules can be represented as graphs, with vertices that correspond to atoms and edges that correspond to chemical bonds between them. Drug companies may want to use deep learning to rapidly predict the properties of many molecules, narrowing down the number they need to physically test in the lab.

Jegelka studies methods to build mathematical machine-learning models that can effectively take graph data as an input and output something else, in this case a prediction of a molecule’s chemical properties. This is particularly difficult because a molecule’s properties are determined not only by the atoms within it, but also by the connections between them.
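A short sketch shows why the connections matter. The example below is not from the article: it uses ethanol and dimethyl ether, two well-known molecules with the same atoms (two carbons, six hydrogens, one oxygen) but different bonds, and encodes each as a list of atoms plus a list of edges. The helper name `bond_types` and the index-based encoding are assumptions chosen for illustration.

```python
from collections import Counter

# Both molecules use the same atoms, indexed 0..8.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

# Ethanol: C-C-O backbone, with one hydrogen on the oxygen.
ethanol_bonds = [
    (0, 1), (1, 2),          # C-C and C-O
    (0, 3), (0, 4), (0, 5),  # three H on the first carbon
    (1, 6), (1, 7),          # two H on the second carbon
    (2, 8),                  # one H on the oxygen
]

# Dimethyl ether: C-O-C backbone, with three hydrogens on each carbon.
ether_bonds = [
    (0, 2), (2, 1),          # two C-O bonds
    (0, 3), (0, 4), (0, 5),  # three H on the first carbon
    (1, 6), (1, 7), (1, 8),  # three H on the second carbon
]


def bond_types(atoms, bonds):
    """Count bonds by the (sorted) pair of elements they connect."""
    return Counter("-".join(sorted((atoms[u], atoms[v]))) for u, v in bonds)


atom_counts = Counter(atoms)  # identical for both molecules
ethanol_types = bond_types(atoms, ethanol_bonds)
ether_types = bond_types(atoms, ether_bonds)
structures_differ = ethanol_types != ether_types
```

A model that looked only at `atom_counts` could not tell these two molecules apart; the edge lists are what distinguish them.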

Other examples of machine learning on graphs include traffic routing, chip design, and recommender systems.

Designing these models is made even more difficult by the fact that data used to train them are sometimes different from data the models see in practice. Perhaps the model was trained using small molecular graphs or traffic networks, but the graphs it sees once deployed are larger or more complex.

In this case, what can researchers expect this model to learn, and will it still work in practice if the real-world data are different?

“Your model is not going to be able to learn everything because of some hardness problems in computer science, but what you can learn and what you can’t learn depends on how you set the model up,” Jegelka says.

She approaches this question by combining her passion for algorithms and discrete mathematics with her excitement for machine learning.

From butterflies to bioinformatics

Jegelka grew up in a small town in Germany and became interested in science when she was a high school student; a supportive teacher encouraged her to take part in an international science competition. She and her teammates from the U.S. and Hong Kong won an award for a website they created about butterflies, in three languages.

“For our project, we took images of wings with a scanning electron microscope at a local university of applied sciences. I also got the chance to use a high-speed camera at Mercedes Benz (this camera often filmed combustion engines), which I used to capture a slow-motion video of the movement of a butterfly’s wings. That was the first time I really got in touch with science and exploration,” she recalls.

Intrigued by both biology and mathematics, Jegelka decided to study bioinformatics at the University of Tübingen and the University of Texas at Austin. She had a few opportunities to conduct research as an undergraduate, including an internship in computational neuroscience at Georgetown University, but wasn’t sure what career to follow.

When she returned for her final year of college, Jegelka moved in with two roommates who were working as research assistants at the Max Planck Institute in Tübingen.

“They were working on machine learning, and that sounded really cool to me. I had to write my bachelor’s thesis, so I asked at the institute if they had a project for me. I started working on machine learning at the Max Planck Institute and I loved it. I learned so much there, and it was a great place for research,” she says.

She stayed on at the Max Planck Institute to complete a master’s thesis, and then embarked on a PhD in machine learning at the Max Planck Institute and the Swiss Federal Institute of Technology.

During her PhD, she explored how concepts from discrete mathematics can help improve machine-learning techniques.

Teaching models to learn

The more Jegelka learned about machine learning, the more intrigued she became by the challenges of understanding how models behave, and how to steer this behavior.

“You can do so much with machine learning, but only if you have the right model and data. It is not just a black-box thing where you throw it at the data and it works. You actually have to think about it, its properties, and what you want the model to learn and do,” she says.

After completing a postdoc at the University of California at Berkeley, Jegelka was hooked on research and decided to pursue a career in academia. She joined the faculty at MIT in 2015 as an assistant professor.

“What I really loved about MIT, from the very beginning, was that the people really care deeply about research and creativity. That is what I appreciate the most about MIT. The people here really value originality and depth in research,” she says.

That focus on creativity has enabled Jegelka to explore a broad range of topics.

In collaboration with other faculty at MIT, she studies machine-learning applications in biology, imaging, computer vision, and materials science.

But what really drives Jegelka is probing the fundamentals of machine learning, and most recently, the issue of robustness. Often, a model performs well on training data, but its performance deteriorates when it is deployed on slightly different data. Building prior knowledge into a model can make it more reliable, but understanding what information the model needs to be successful, and how to build it in, is not so simple, she says.

She is also exploring methods to improve the performance of machine-learning models for image classification.

Image classification models are everywhere, from the facial recognition systems on mobile phones to tools that identify fake accounts on social media. These models need massive amounts of data for training, but because it is expensive for humans to hand-label hundreds of thousands of images, researchers often use unlabeled datasets to pretrain models instead.

These models then reuse the representations they have learned when they are fine-tuned later for a specific task.

Ideally, researchers want the model to learn as much as it can during pretraining, so it can apply that knowledge to its downstream task. But in practice, these models often learn only a few simple correlations, like that one image has sunshine and one has shade, and use these “shortcuts” to classify images.

“We showed that this is a problem in ‘contrastive learning,’ which is a standard technique for pre-training, both theoretically and empirically. But we also show that you can influence the kinds of information the model will learn to represent by modifying the types of data you show the model. This is one step toward understanding what models are actually going to do in practice,” she says.
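For readers unfamiliar with contrastive learning, the idea is to train a model so that two views of the same image get similar representations while views of different images get dissimilar ones. The sketch below shows one common formulation, an InfoNCE-style loss, in plain Python. It is an assumption for illustration and not necessarily the objective analyzed in Jegelka's work; the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (illustrative sketch).

    The loss is low when the anchor is more similar to its positive
    (another view of the same image) than to the negatives.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    neg_logits = [cosine_similarity(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits

    # Log-sum-exp, subtracting the max logit for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )

    # Negative log of the softmax probability assigned to the positive.
    return log_denominator - pos_logit


# Toy embeddings (hypothetical values).
anchor = [1.0, 0.0]
aligned_positive = [0.9, 0.1]     # points almost the same way as the anchor
misaligned_positive = [0.0, 1.0]  # orthogonal to the anchor
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_aligned = info_nce_loss(anchor, aligned_positive, negatives)
loss_misaligned = info_nce_loss(anchor, misaligned_positive, negatives)
```

The loss only rewards telling positives apart from negatives, so any feature that does this job, including a simple one such as overall brightness, can lower it. That is one way to see how “shortcuts” can arise, and why changing which examples the model is shown can change what it learns to represent.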

Researchers still don’t understand everything that goes on inside a deep-learning model, or the details of how they can influence what a model learns and how it behaves, but Jegelka looks forward to continuing to explore these topics.

“Often in machine learning, we see something happen in practice and we try to understand it theoretically. This is a huge challenge. You want to build an understanding that matches what you see in practice, so that you can do better. We are still just at the beginning of understanding this,” she says.

Outside the lab, Jegelka is a fan of music, art, traveling, and cycling. But these days, she enjoys spending most of her free time with her preschool-aged daughter.
