Ambiguity in medical imaging can present major challenges for clinicians who are trying to identify disease. For instance, in a chest X-ray, pleural effusion, an abnormal buildup of fluid around the lungs, can look very much like pulmonary infiltrates, which are accumulations of pus or blood.
An artificial intelligence model could assist the clinician in X-ray evaluation by helping to identify subtle details and boosting the efficiency of the diagnostic process. But because so many possible conditions could be present in a single image, the clinician would likely want to consider a set of possibilities, rather than having only one AI prediction to evaluate.
One promising approach to producing such a set of possibilities, called conformal classification, is convenient because it can be readily implemented on top of an existing machine-learning model. However, it can produce sets that are impractically large.
MIT researchers have now developed a simple and effective improvement that can reduce the size of prediction sets by up to 30 percent while also making the predictions more reliable.
Having a smaller prediction set may help a clinician zero in on the right diagnosis more efficiently, which could improve and streamline treatment for patients. The method could be useful across a range of classification tasks, such as identifying the species of an animal in an image from a wildlife park, because it provides a smaller but more accurate set of options.
“With fewer classes to consider, the sets of predictions are naturally more informative in that you are choosing between fewer options. In a sense, you are not really sacrificing anything in terms of accuracy for something that is more informative,” says Divya Shanmugam PhD ’24, a postdoc at Cornell Tech who conducted this research while she was an MIT graduate student.
Shanmugam is joined on the paper by Helen Lu ’24; Swami Sankaranarayanan, a former MIT postdoc who is now a research scientist at Lilia Biosciences; and senior author John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Computer Vision and Pattern Recognition in June.
Prediction guarantees
AI models deployed for high-stakes tasks, like classifying diseases in medical images, are typically designed to produce a probability score along with each prediction so a user can gauge the model’s confidence. For instance, a model might predict that there is a 20 percent chance an image corresponds to a particular diagnosis, like pleurisy.
But it is difficult to trust a model’s predicted confidence because much prior research has shown that these probabilities can be inaccurate. With conformal classification, the model’s single prediction is replaced by a set of the most probable diagnoses along with a guarantee that the correct diagnosis is somewhere in that set.
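The standard recipe behind that guarantee is split conformal prediction: score how surprising the true label is on held-out labeled data, then include every class whose score clears the resulting threshold. Below is a minimal sketch in Python, assuming a classifier that outputs softmax probabilities; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def conformal_threshold(calib_probs, calib_labels, alpha=0.1):
    """Calibrate a score threshold on held-out labeled data.

    calib_probs: (n, num_classes) softmax outputs for calibration examples
    calib_labels: (n,) integer true labels
    alpha: allowed miscoverage, e.g. 0.1 for a 90 percent coverage guarantee
    """
    n = len(calib_labels)
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = 1.0 - calib_probs[np.arange(n), calib_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_set(test_probs, q_hat):
    """Return all classes whose nonconformity score falls below the threshold."""
    return np.where(1.0 - test_probs <= q_hat)[0]
```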
But the inherent uncertainty in AI predictions often causes the model to output sets that are far too large to be useful.
For instance, if a model is classifying an animal in an image as one of 10,000 potential species, it might output a set of 200 predictions so it can offer a strong guarantee.
“That is quite a few classes for someone to sift through to figure out what the right class is,” Shanmugam says.
The technique can also be unreliable because tiny changes to inputs, like slightly rotating an image, can yield entirely different sets of predictions.
To make conformal classification more useful, the researchers applied a technique developed to improve the accuracy of computer vision models called test-time augmentation (TTA).
TTA creates multiple augmentations of a single image in a dataset, perhaps by cropping the image, flipping it, zooming in, etc. Then it applies a computer vision model to each version of the same image and aggregates its predictions.
“This way, you get multiple predictions from a single example. Aggregating predictions in this way improves predictions in terms of accuracy and robustness,” Shanmugam explains.
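Here is a minimal sketch of TTA for a PyTorch image classifier; the particular augmentations (a horizontal flip and a few random crops) and the plain averaging are illustrative choices, not the exact policy from the paper.

```python
import torch
import torchvision.transforms.functional as TF

def tta_probs(model, image, num_crops=3):
    """Average softmax predictions over several augmented views of one image."""
    _, h, w = image.shape                      # image: (C, H, W) tensor
    views = [image, TF.hflip(image)]           # original plus horizontal flip
    for _ in range(num_crops):
        top = torch.randint(0, h // 8 + 1, (1,)).item()
        left = torch.randint(0, w // 8 + 1, (1,)).item()
        # Random crop covering most of the image, resized back to full size.
        crop = TF.resized_crop(image, top, left, h - h // 8, w - w // 8, [h, w])
        views.append(crop)
    with torch.no_grad():
        probs = torch.stack([model(v.unsqueeze(0)).softmax(dim=-1)[0] for v in views])
    return probs.mean(dim=0)                   # simple average over the views
```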
Maximizing accuracy
To apply TTA, the researchers hold out some of the labeled image data used for the conformal classification process. They learn how to aggregate the augmentations on this held-out data, automatically augmenting the images in a way that maximizes the accuracy of the underlying model’s predictions.
Then they run conformal classification on the model’s new, TTA-transformed predictions. The conformal classifier outputs a smaller set of probable predictions for the same confidence guarantee.
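A rough sketch of that pipeline follows, under the assumption that the learned aggregation is a weighted average over augmentations fit by minimizing cross-entropy on the held-out split (the paper’s exact objective may differ); it reuses `conformal_threshold` from the earlier sketch.

```python
import torch

def learn_aggregation_weights(aug_probs, labels, steps=200, lr=0.1):
    """Fit a weighting over augmentations on held-out labeled data.

    aug_probs: (n, num_augs, num_classes) per-augmentation model probabilities
    labels: (n,) integer true labels
    """
    logits = torch.zeros(aug_probs.shape[1], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        weights = logits.softmax(dim=0)                           # weights sum to one
        mixed = (aug_probs * weights[None, :, None]).sum(dim=1)   # weighted average
        # Cross-entropy on the aggregated probabilities stands in for "maximize accuracy".
        loss = torch.nn.functional.nll_loss(mixed.clamp_min(1e-12).log(), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.softmax(dim=0).detach()

# The remaining held-out data is then aggregated with these weights, and
# conformal_threshold is calibrated on the aggregated probabilities.
```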
“Combining test-time augmentation with conformal prediction is easy to implement, effective in practice, and requires no model retraining,” Shanmugam says.
Compared with prior work in conformal prediction across several standard image classification benchmarks, their TTA-augmented method reduced prediction set sizes across experiments by 10 to 30 percent.
Importantly, the technique achieves this reduction in prediction set size while maintaining the probability guarantee.
The researchers also found that, even though they are sacrificing some labeled data that would normally be used for the conformal classification procedure, TTA boosts accuracy enough to outweigh the cost of losing that data.
“It raises interesting questions about how we use labeled data after model training. The allocation of labeled data between different post-training steps is an important direction for future work,” Shanmugam says.
In the future, the researchers want to validate the effectiveness of this approach in the context of models that classify text instead of images. To further improve the work, they are also considering ways to reduce the amount of computation required for TTA.
This research is funded, in part, by the Wistron Corporation.