Researchers teach an AI to write better chart captions
Chart captions that explain complex trends and patterns are important for improving a reader’s ability to understand and retain the data being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to explain cognitive features that provide additional context.

To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.

The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

Human-centered evaluation

The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption.

The group wanted to bring that human-centered evaluation into autocaptioning research. To do this, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how we read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all of the chart data but also include additional image context.

“A scene graph is like the best of both worlds: it contains almost all of the information present in an image while being easier to extract from images than data tables. Because it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.
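
To make that representation concrete, here is a minimal Python sketch of a toy data table and a simplified scene-graph-style structure for a small bar chart. The field names and structure are illustrative assumptions, not the actual VisText schema.

```python
import json

# Toy underlying data table for a small bar chart.
data_table = [
    {"year": 2020, "sales": 14},
    {"year": 2021, "sales": 18},
    {"year": 2022, "sales": 11},
]

# A simplified, hypothetical scene-graph-style structure: it records the
# rendered marks, axes, and title, so it carries the chart data plus the
# visual context that a bare data table lacks.
scene_graph = {
    "chart": {
        "type": "bar",
        "title": "Annual sales",
        "axes": [
            {"orient": "x", "field": "year", "ticks": [2020, 2021, 2022]},
            {"orient": "y", "field": "sales", "domain": [0, 20]},
        ],
        "marks": [
            {"kind": "rect", "x": 2020, "height": 14},
            {"kind": "rect", "x": 2021, "height": 18},
            {"kind": "rect", "x": 2022, "height": 11},
        ],
    }
}

# Because the scene graph is just structured text, it can be linearized and
# fed directly to a text-to-text language model.
print(json.dumps(scene_graph))
```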

They compiled a dataset that contains more than 12,000 charts, each represented as a data table, image, and scene graph, as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.

The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.
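
As a purely illustrative sketch, a single record pairing one chart with its two caption levels might look like the following; the field names and file paths are hypothetical, not the actual VisText format.

```python
# Hypothetical example of one chart record with both caption levels.
record = {
    "image": "charts/0001.png",         # rendered chart image (hypothetical path)
    "data_table": "charts/0001.csv",    # underlying data, when available
    "scene_graph": "charts/0001.json",  # structured description of marks and axes
    "caption_low": (
        "A bar chart titled 'Annual sales' with years 2020 to 2022 on the "
        "x-axis and sales from 0 to 20 on the y-axis."
    ),
    "caption_high": (
        "Sales rose to a peak in 2021, then fell to their lowest level in 2022."
    ),
}

# A captioning model can be trained on either caption level, so downstream
# tools can choose how much detail a reader receives.
print(record["caption_low"])
print(record["caption_high"])
```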

“Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured important low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

Translating charts

Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation (image, data table, and scene graph) and combinations of the representations affected the quality of the caption.

“You can think of a chart-captioning model like a model for language translation. But instead of saying, translate this German text to English, we are saying translate this ‘chart language’ to English,” Boggust says.

Their results showed that models trained with scene graphs performed as well as or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they may be a more useful representation.

They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.
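
As a rough sketch of how prefix control can work in practice, the snippet below uses an off-the-shelf Hugging Face sequence-to-sequence model. The prefixes, the t5-small checkpoint, and the linearized chart text are assumptions for illustration rather than the authors’ exact configuration, and the model would first have to be fine-tuned on caption pairs tagged with matching prefixes before its output is meaningful.

```python
# Sketch of prefix-controlled caption generation with a generic seq2seq model.
# Assumes `pip install transformers sentencepiece torch`; the prefixes and the
# model choice are illustrative, not the exact VisText training setup.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Linearized scene-graph text for one toy chart.
chart_text = "bar chart | title: Annual sales | x: year 2020 2021 2022 | y: sales 14 18 11"

# During training, each example would carry the prefix that matches its
# caption level; at inference time, the prefix selects the level of detail.
for prefix in ("describe chart construction: ", "describe chart trends: "):
    inputs = tokenizer(prefix + chart_text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=60)
    print(prefix, tokenizer.decode(output_ids[0], skip_special_tokens=True))
```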

In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, in which the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative evaluation helped them understand these kinds of subtleties, Boggust says.

These kinds of errors also expose limitations of current models and raise ethical considerations that researchers must consider as they work to develop autocaptioning systems, she adds.

Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give incorrect information that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

“Perhaps this means that we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory.
