In machine learning, synthetic data can offer real performance improvements


Teaching a machine to recognize human actions has many potential applications, such as automatically detecting workers who fall at a construction site or enabling a smart home robot to interpret a user’s gestures.

To do this, researchers train machine-learning models using vast datasets of video clips that show humans performing actions. However, not only is it expensive and laborious to collect and label millions or billions of videos, but the clips often contain sensitive information, like people’s faces or license plate numbers. Using these videos might also violate copyright or data protection laws. And this assumes the video data are publicly available in the first place; many datasets are owned by companies and aren’t free to use.

So, researchers are turning to synthetic datasets. These are made by a computer that uses 3D models of scenes, objects, and humans to quickly produce many varying clips of specific actions, without the potential copyright issues or ethical concerns that come with real data.

But are synthetic data as “good” as real data? How well does a model trained with these data perform when it’s asked to classify real human actions? A team of researchers at MIT, the MIT-IBM Watson AI Lab, and Boston University sought to answer this question. They built a synthetic dataset of 150,000 video clips that captured a wide range of human actions, which they used to train machine-learning models. Then they showed these models six datasets of real-world videos to see how well they could learn to recognize actions in those clips.

The researchers found that the synthetically trained models performed even better than models trained on real data for videos that have fewer background objects.

This work could help researchers use synthetic datasets in such a way that models achieve higher accuracy on real-world tasks. It could also help scientists identify which machine-learning applications could be best suited for training with synthetic data, in an effort to mitigate some of the ethical, privacy, and copyright concerns of using real datasets.

“The ultimate goal of our research is to replace real data pretraining with synthetic data pretraining. There is a cost in creating an action in synthetic data, but once that is done, then you can generate an unlimited number of images or videos by changing the pose, the lighting, etc. That is the beauty of synthetic data,” says Rogerio Feris, principal scientist and manager at the MIT-IBM Watson AI Lab, and co-author of a paper detailing this research.

The paper is authored by lead author Yo-whan “John” Kim ’22; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Watson AI Lab, and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and seven others. The research will be presented at the Conference on Neural Information Processing Systems.

Building a synthetic dataset

The researchers began by compiling a new dataset using three publicly available datasets of synthetic video clips that captured human actions. Their dataset, called Synthetic Action Pre-training and Transfer (SynAPT), contained 150 action categories, with 1,000 video clips per category.

They selected as many action categories as possible, such as people waving or falling on the floor, depending on the availability of clips that contained clean video data.

Once the dataset was prepared, they used it to pretrain three machine-learning models to recognize the actions. Pretraining involves training a model for one task to give it a head start for learning other tasks. Inspired by the way people learn (we reuse old knowledge when we learn something new), the pretrained model can use the parameters it has already learned to help it learn a new task with a new dataset faster and more effectively.
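To make the idea of reusing learned parameters concrete, here is a minimal toy sketch in Python. It is not the researchers' method or code: the one-parameter linear model, the made-up data, and the names mse, train, synthetic, and real are all assumptions chosen for illustration. It only shows that starting from parameters learned on a related task can reach a lower error on a new task in the same number of training steps than starting from scratch.

```python
# Toy illustration of pretraining followed by fine-tuning.
# Everything here (model, data, names) is hypothetical and only
# stands in for the much larger video models described in the article.

def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr=0.05):
    """Run plain gradient descent on the MSE, starting from weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
synthetic = [(x, 2.0 * x) for x in xs]  # stand-in for the pretraining task
real = [(x, 2.2 * x) for x in xs]       # stand-in for a related downstream task

# Pretrain on the stand-in synthetic task until the weight converges.
w_pre = train(0.0, synthetic, steps=200)

# Give both models the same small training budget on the downstream task.
w_scratch = train(0.0, real, steps=5)      # no pretraining
w_finetuned = train(w_pre, real, steps=5)  # starts from pretrained weight

print("loss from scratch:   ", mse(w_scratch, real))
print("loss after pretrain: ", mse(w_finetuned, real))
```

In this toy setup the fine-tuned model ends with the lower loss because the pretrained weight already sits close to the value the downstream task needs. Real video models have millions of parameters and the benefit is not guaranteed in the same way, which is why the researchers measured it empirically.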

They tested the pretrained models using six datasets of real video clips, each capturing classes of actions that were different from those in the training data.

The researchers were surprised to see that all three synthetic models outperformed models trained with real video clips on four of the six datasets. Their accuracy was highest for datasets that contained video clips with “low scene-object bias.”

Low scene-object bias means that the model cannot recognize the action by looking at the background or other objects in the scene; it must focus on the action itself. For example, if the model is tasked with classifying diving poses in video clips of people diving into a swimming pool, it cannot identify a pose by looking at the water or the tiles on the wall. It must focus on the person’s motion and position to classify the action.

“In videos with low scene-object bias, the temporal dynamics of the actions is more important than the appearance of the objects or the background, and that seems to be well-captured with synthetic data,” Feris says.

“High scene-object bias can actually act as an obstacle. The model might misclassify an action by looking at an object, not the action itself. It can confuse the model,” Kim explains.

Boosting performance

Building off these results, the researchers want to include more action classes and additional synthetic video platforms in future work, eventually creating a catalog of models that have been pretrained using synthetic data, says co-author Rameswar Panda, a research staff member at the MIT-IBM Watson AI Lab.

“We want to build models which have very similar performance or even better performance than the existing models in the literature, but without being bound by any of those biases or security concerns,” he adds.

They also want to combine their work with research that seeks to generate more accurate and realistic synthetic videos, which could boost the performance of the models, says SouYoung Jin, a co-author and CSAIL postdoc. She is also interested in exploring how models might learn differently when they are trained with synthetic data.

“We use synthetic datasets to prevent privacy issues or contextual or social bias, but what does the model actually learn? Does it learn something that is unbiased?” she says.

Now that they have demonstrated this use potential for synthetic videos, they hope other researchers will build upon their work.

“Despite there being a lower cost to obtaining well-annotated synthetic data, currently we do not have a dataset with the scale to rival the biggest annotated datasets with real videos. By discussing the different costs and concerns with real videos, and showing the efficacy of synthetic data, we hope to motivate efforts in this direction,” adds co-author Samarth Mishra, a graduate student at Boston University (BU).

Additional co-authors include Hilde Kuehne, professor of computer science at Goethe University in Germany and an affiliated professor at the MIT-IBM Watson AI Lab; Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab; Venkatesh Saligrama, professor in the Department of Electrical and Computer Engineering at BU; and Kate Saenko, associate professor in the Department of Computer Science at BU and a consulting professor at the MIT-IBM Watson AI Lab.

This research was supported by the Defense Advanced Research Projects Agency LwLL, as well as the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside.


