A technique to interpret AI may not be so interpretable after all


As autonomous systems and artificial intelligence become increasingly common in daily life, new methods are emerging to help humans check that these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers claim that this method can be used to spell out the decisions an AI will make in a way that is interpretable to humans.

MIT Lincoln Laboratory researchers wanted to examine such claims of interpretability. Their findings point to the opposite: formal specifications do not appear to be interpretable by humans. In the team's study, participants were asked to check whether an AI agent's plan would succeed in a virtual game. Presented with the formal specification of the plan, the participants were correct less than half of the time.

"The results are bad news for researchers who have been claiming that formal methods lent interpretability to systems. It might be true in some restricted and abstract sense, but not for anything close to practical system validation," says Hosea Siu, a researcher in the laboratory's AI Technology Group. The group's paper was accepted to the 2023 International Conference on Intelligent Robots and Systems, held earlier this month.

Interpretability is important because it allows humans to place trust in a machine when it is used in the real world. If a robot or AI can explain its actions, then humans can decide whether it needs adjustments or can be trusted to make fair decisions. An interpretable system also enables the users of technology, not just the developers, to understand and trust its capabilities. However, interpretability has long been a challenge in the field of AI and autonomy. The machine learning process happens in a "black box," so model developers often cannot explain why or how a system came to a certain decision.

"When researchers say 'our machine learning system is accurate,' we ask 'how accurate?' and 'using what data?' and if that information is not provided, we reject the claim. We haven't been doing that much when researchers say 'our machine learning system is interpretable,' and we need to start holding those claims up to more scrutiny," Siu says.

Lost in translation

For their experiment, the researchers sought to determine whether formal specifications made the behavior of a system more interpretable. They focused on people's ability to use such specifications to validate a system, that is, to understand whether the system always met the user's goals.

Applying formal specifications for this purpose is essentially a by-product of their original use. Formal specifications are part of a broader set of formal methods that use logical expressions as a mathematical framework to describe the behavior of a model. Because the model is built on a logical flow, engineers can use "model checkers" to mathematically prove facts about the system, including when it is or isn't possible for the system to complete a task. Now, researchers are trying to use this same framework as a translational tool for humans.
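As a purely illustrative sketch (invented for this explanation, not drawn from the team's study), a formal specification written in a temporal logic might look like:

    G( flag_captured -> F at_base )

read as "it is always the case that once the flag is captured, the robot eventually returns to its base." A model checker can mathematically prove whether a given plan satisfies such a property in every possible run; the question the researchers tested is whether a person reading the same kind of formula can reliably perform that check themselves.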

"Researchers confuse the fact that formal specifications have precise semantics with them being interpretable to humans. These are not the same thing," Siu says. "We realized that next-to-nobody was checking to see if people actually understood the outputs."

In the team's experiment, participants were asked to validate a fairly simple set of behaviors for a robot playing a game of capture the flag, essentially answering the question "If the robot follows these rules exactly, does it always win?"

Participants included both experts and nonexperts in formal methods. They received the formal specifications in three ways: a "raw" logical formula, the formula translated into words closer to natural language, and a decision-tree format. Decision trees in particular are often considered in the AI world to be a human-interpretable way to show AI or robot decision-making.
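As a toy illustration (invented here, not taken from the study's actual materials), a single rule might appear in those three formats roughly as follows:

    Raw formula:      G( see_opponent_with_flag -> X chase_opponent )
    Natural language: "At every step, if the robot sees an opponent carrying
                      the flag, it chases that opponent on the next step."
    Decision tree:    see_opponent_with_flag?
                          yes -> chase_opponent
                          no  -> keep patrolling

From full rule sets presented in one of these forms, participants had to judge whether the robot was guaranteed to win every game.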

The results: "Validation performance on the whole was quite terrible, with around 45 percent accuracy, regardless of the presentation type," Siu says.

Confidently wrong

Those previously trained in formal specifications only did slightly better than novices. However, the experts reported far more confidence in their answers, regardless of whether they were correct or not. Across the board, people tended to over-trust the correctness of specifications put in front of them, meaning that they ignored rule sets allowing for game losses. This confirmation bias is particularly concerning for system validation, the researchers say, because people are more likely to overlook failure modes.

"We don't think that this result means we should abandon formal specifications as a way to explain system behaviors to people. But we do think that a lot more work needs to go into the design of how they are presented to people and into the workflow in which people use them," Siu adds.

When considering why the results were so poor, Siu recognizes that even people who work on formal methods aren't quite trained to check specifications as the experiment asked them to. And thinking through all the possible outcomes of a set of rules is hard. Even so, the rule sets shown to participants were short, equivalent to no more than a paragraph of text, "much shorter than anything you'd encounter in any real system," Siu says.

The team isn't trying to tie their results directly to the performance of humans in real-world robot validation. Instead, they aim to use the results as a starting point to consider what the formal logic community may be missing when claiming interpretability, and how such claims may play out in the real world.

This research was conducted as part of a larger project Siu and teammates are working on to improve the relationship between robots and human operators, especially those in the military. The process of programming robotics can often leave operators out of the loop. With a similar goal of improving interpretability and trust, the project is trying to allow operators to teach tasks to robots directly, in ways that are similar to training humans. Such a process could improve both the operator's confidence in the robot and the robot's adaptability.

Ultimately, they hope the results of this study and their ongoing research can better the application of autonomy as it becomes more embedded in human life and decision-making.

"Our results push for the need to do human evaluations of certain systems and concepts of autonomy and AI before too many claims are made about their utility with humans," Siu adds.
