Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.
For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.
To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model’s overall performance.
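To make that trade-off concrete, here is a minimal sketch of conventional subgroup balancing by downsampling, not the researchers’ method; the pandas usage, the “sex” column, and the group sizes are illustrative assumptions:

```python
import pandas as pd

# Hypothetical patient dataset in which male patients outnumber female patients 4 to 1.
df = pd.DataFrame({
    "sex": ["M"] * 800 + ["F"] * 200,
    "label": [0, 1] * 500,
})

# Balance by downsampling every subgroup to the size of the smallest one.
min_size = df["sex"].value_counts().min()
balanced = (
    df.groupby("sex", group_keys=False)
      .apply(lambda g: g.sample(n=min_size, random_state=0))
)

print(df["sex"].value_counts().to_dict())        # {'M': 800, 'F': 200}
print(balanced["sex"].value_counts().to_dict())  # {'F': 200, 'M': 200}
```

In this toy example, balancing throws away 600 of the 800 majority-group records, which is exactly the kind of information loss the new technique aims to avoid.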
MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model’s failures on minority subgroups. By removing far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data in many applications.
This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren’t misdiagnosed due to a biased AI model.
“Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and recover performance,” says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.
She wrote the paper with co-lead authors Saachi Jain PhD ’24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng ’18, PhD ’23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.
Removing bad examples
Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.
Scientists also know that some data points impact a model’s performance on certain downstream tasks more than others.
The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
The researchers’ new technique is driven by prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.
For this new technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to that incorrect prediction.
“By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall,” Ilyas explains.
Then they remove those specific samples and retrain the model on the remaining data.
Since having more data usually yields better overall performance, removing just the samples that drive worst-group failures maintains the model’s overall accuracy while boosting its performance on minority subgroups.
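In rough pseudocode, the pipeline looks something like the sketch below. The attribution scores are assumed to come from a data-attribution method such as TRAK; the `prune_training_set` helper, the array shapes, and the number of examples removed are illustrative placeholders rather than the paper’s actual implementation:

```python
import numpy as np

def prune_training_set(scores, num_train, num_to_remove):
    """Drop the training examples that contribute most to worst-group mistakes.

    scores[i, j] is assumed to measure how much training example j contributed
    to the model's incorrect prediction on minority-group test example i
    (the kind of quantity a data-attribution method like TRAK estimates).
    """
    total_blame = scores.sum(axis=0)                    # one score per training example
    worst_offenders = np.argsort(total_blame)[-num_to_remove:]
    keep_mask = np.ones(num_train, dtype=bool)
    keep_mask[worst_offenders] = False
    return np.flatnonzero(keep_mask)                    # indices to retrain on

# Toy demonstration with made-up scores: 40 misclassified minority-group
# test examples attributed against 1,000 training examples.
rng = np.random.default_rng(0)
fake_scores = rng.normal(size=(40, 1000))
keep_indices = prune_training_set(fake_scores, num_train=1000, num_to_remove=50)
print(len(keep_indices))  # 950 training examples remain; the model is retrained on these
```

Because only a small, targeted set of examples is dropped, the retrained model keeps most of its training data, in contrast with balancing approaches that discard large portions of the majority groups.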
A more accessible approach
Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their technique also achieved higher accuracy than methods that require making changes to the inner workings of a model.
Because the MIT method involves changing a dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.
It can also be used when bias is unknown because subgroups in a training dataset are not labeled. By identifying the datapoints that contribute most to a feature the model is learning, researchers can understand the variables it is using to make a prediction.
“This is a tool anyone can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model,” says Hamidieh.
Using the technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.
They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy to use for practitioners who could someday deploy it in real-world environments.
“When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable,” Ilyas says.
This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.