3 Questions: Should we label AI systems like we do pharmaceuticals?


Nature Computational Science

MIT News

Q: Why do we need responsible use labels for AI systems in health care settings?

A: In a health setting, we now have an interesting situation where doctors often rely on technology or treatments that are not fully understood. Sometimes this lack of understanding is fundamental (the mechanism behind acetaminophen, for instance), but other times it is simply a limit of specialization. We don't expect clinicians to know how to service an MRI machine, for instance. Instead, we have certification systems through the FDA or other federal agencies that certify the use of a medical device or drug in a particular setting.

Importantly, medical devices also have service contracts; a technician from the manufacturer will fix your MRI machine if it is miscalibrated. For approved drugs, there are postmarket surveillance and reporting systems so that adverse effects or events can be addressed, for instance if many people taking a drug seem to be developing a condition or allergy.

Models and algorithms, whether or not they incorporate AI, skirt a lot of these approval and long-term monitoring processes, and that is something we should be wary of. Many prior studies have shown that predictive models need more careful evaluation and monitoring. With newer generative AI specifically, we cite work that has demonstrated generation is not guaranteed to be appropriate, robust, or unbiased. Because we don't have the same level of surveillance on model predictions or generation, it would be even more difficult to catch a model's problematic responses. The generative models being used by hospitals right now could be biased. Having use labels is one way of ensuring that models don't automate biases that are learned from human practitioners or from miscalibrated clinical decision support scores of the past.

Q: Your article describes several components of a responsible use label for AI, following the FDA approach for creating prescription labels, including approved usage, ingredients, potential side effects, etc. What core information should these labels convey?

A: The things a label should make clear are the time, place, and manner of a model's intended use. For instance, the user should know that a model was trained at a specific time with data from a specific time period. Does it include data collected before or during the Covid-19 pandemic? There were very different health practices during Covid that could affect the data. This is why we advocate for the model "ingredients" and "completed studies" to be disclosed.

For place, we know from prior research that models trained in one location tend to perform worse when moved to another location. Knowing where the data came from and how a model was optimized within that population can help ensure that users are aware of "potential side effects," any "warnings and precautions," and "adverse reactions."

For a model trained to predict one outcome, knowing the time and place of training can help you make intelligent judgments about deployment. But many generative models are incredibly flexible and can be used for many tasks. Here, time and place may not be as informative, and more explicit direction about "conditions of labeling" and "approved usage" versus "unapproved usage" comes into play. If a developer has evaluated a generative model for reading a patient's clinical notes and generating prospective billing codes, they can disclose that it has a bias toward overbilling for specific conditions or underrecognizing others. A user wouldn't want to use this same generative model to decide who gets a referral to a specialist, even though they could. This flexibility is why we advocate for additional details on the manner in which models should be used.

In general, we advocate that you should train the best model you can, using the tools available to you. But even then, there should be a lot of disclosure. No model is going to be perfect. As a society, we now understand that no pill is perfect; there is always some risk. We should have the same understanding of AI models. Any model, with or without AI, is limited. It may be giving you realistic, well-trained forecasts of potential futures, but take that with whatever grain of salt is appropriate.

Q: If AI labels were to be implemented, who would do the labeling, and how would labels be regulated and enforced?

A: If you don't intend for your model to be used in practice, then the disclosures you would make for a high-quality research publication are sufficient. But once you intend for your model to be deployed in a human-facing setting, developers and deployers should do an initial labeling, based on some of the established frameworks. There should be a validation of those claims prior to deployment; in a safety-critical setting like health care, many agencies of the Department of Health and Human Services could be involved.

For model developers, I think that knowing you will need to label the limitations of a system induces more careful consideration of the process itself. If I know that at some point I am going to have to disclose the population a model was trained on, I would not want to disclose that it was trained only on dialogue from male chatbot users, for example.

Thinking about things like who the data are collected on, over what time period, what the sample size was, and how you decided which data to include or exclude can open your mind up to potential problems at deployment.
