MIT scientists build a system that can generate AI models for biology research

Is it possible to build machine-learning models without machine-learning expertise?

Jim Collins, the Termeer Professor of Medical Engineering and Science in the Department of Biological Engineering at MIT and the life sciences faculty lead at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), decided to tackle this problem along with a number of colleagues when facing the same conundrum. An open-access paper on their proposed solution, called BioAutoMATED, was published on June 21.

Recruiting machine-learning researchers can be a time-consuming and financially costly process for science and engineering labs. Even with a machine-learning expert, selecting the appropriate model, formatting the dataset for the model, and then fine-tuning it can dramatically change how the model performs, and it takes a lot of work.

“In your machine-learning project, how much time will you typically spend on data preparation and transformation?” asks a 2022 Google course on the Foundations of Machine Learning (ML). The two choices offered are either “Less than half the project time” or “More than half the project time.” If you guessed the latter, you would be correct; Google states that it takes over 80 percent of project time to format the data, and that’s not even taking into account the time needed to frame the problem in machine-learning terms.

“It would take many weeks of effort to figure out the appropriate model for our dataset, and this is a really prohibitive step for a lot of folks who want to use machine learning or biology,” says Jacqueline Valeri, a fifth-year PhD student of biological engineering in Collins’s lab who is co-first author of the paper.

BioAutoMATED is an automated machine-learning system that can select and build an appropriate model for a given dataset and even take care of the laborious task of data preprocessing, whittling down a months-long process to just a few hours. Automated machine-learning (AutoML) systems are still at a relatively nascent stage of development, with current usage primarily focused on image and text recognition and largely absent from subfields of biology, points out co-first author and Jameel Clinic postdoc Luis Soenksen PhD ’20.

“The fundamental language of biology is based on sequences,” explains Soenksen, who earned his doctorate in the MIT Department of Mechanical Engineering. “Biological sequences such as DNA, RNA, proteins, and glycans have the amazing informational property of being intrinsically standardized, like an alphabet. A lot of AutoML tools are developed for text, so it made sense to extend it to [biological] sequences.”
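To make the alphabet analogy concrete, here is a minimal sketch of one common way to turn a sequence into numbers a model can read, known as one-hot encoding. It is an illustration only: the article does not describe BioAutoMATED's actual preprocessing, and the function name `one_hot_encode` and the constant `DNA_ALPHABET` are hypothetical.

```python
# Illustrative sketch only; these names are hypothetical and are not
# taken from BioAutoMATED's code.
DNA_ALPHABET = "ACGT"


def one_hot_encode(sequence, alphabet=DNA_ALPHABET):
    """Return one row per position, with a single 1 in that letter's column."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    encoded = []
    for letter in sequence.upper():
        if letter not in index:
            raise ValueError(f"unexpected symbol {letter!r} for alphabet {alphabet!r}")
        row = [0] * len(alphabet)
        row[index[letter]] = 1
        encoded.append(row)
    return encoded


# Print each letter of a short DNA sequence next to its encoded row.
for letter, row in zip("GATC", one_hot_encode("GATC")):
    print(letter, row)
```

Because the alphabet is a parameter, the same function could in principle be reused for RNA or protein sequences by passing a different string of allowed letters.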

Moreover, most AutoML tools can only explore and build a limited range of model types. “But you can’t really know from the start of a project which model will be best for your dataset,” Valeri says. “By incorporating multiple tools under one umbrella tool, we really allow a much larger search space than any individual AutoML tool could achieve on its own.”
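The umbrella idea can be sketched in a few lines: score every candidate the same way and keep the best one. The example below is a toy under stated assumptions, not BioAutoMATED's method. The two candidates (a majority-label baseline and a nearest-neighbor rule based on Hamming distance) merely stand in for full AutoML tools, and the dataset and all function names are invented for illustration.

```python
# Toy sketch of an "umbrella" search; candidates, data, and names are
# hypothetical stand-ins, not BioAutoMATED's actual tools.

def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_model(train):
    """Baseline: always predict the most common training label."""
    labels = [label for _, label in train]
    winner = max(sorted(set(labels)), key=labels.count)
    return lambda seq: winner


def nearest_neighbor_model(train):
    """Predict the label of the closest training sequence."""
    return lambda seq: min(train, key=lambda item: hamming(seq, item[0]))[1]


def cross_validated_accuracy(build, data, folds=3):
    """Hold out every `folds`-th example in turn and average the accuracy."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [item for i, item in enumerate(data) if i % folds != k]
        model = build(train)
        correct += sum(model(seq) == label for seq, label in test)
    return correct / len(data)


def umbrella_search(candidates, data):
    """Score every candidate the same way and return the best name and all scores."""
    scores = {name: cross_validated_accuracy(build, data)
              for name, build in candidates.items()}
    return max(scores, key=scores.get), scores


# Invented toy data: label 1 for A-rich sequences, 0 for C-rich ones.
toy_data = [("AAAA", 1), ("CCCC", 0), ("AAAT", 1),
            ("CCCG", 0), ("AATA", 1), ("CCGC", 0)]
candidates = {"majority": majority_model,
              "nearest_neighbor": nearest_neighbor_model}

best, scores = umbrella_search(candidates, toy_data)
print(best, scores)
```

Scoring every candidate with the same cross-validation routine is what makes the comparison fair; adding another tool to the umbrella only means adding another entry to `candidates`.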

BioAutoMATED’s repertoire of supervised ML models includes three types: binary classification models (dividing data into two classes), multi-class classification models (dividing data into multiple classes), and regression models (fitting continuous numerical values or measuring the strength of key relationships between variables). BioAutoMATED is even able to help determine how much data is required to appropriately train the chosen model.
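The three task types differ mainly in what the label column looks like. The sketch below shows one simple heuristic for telling them apart; it is an assumption made for illustration, not a description of how BioAutoMATED decides, and `infer_task_type` is a hypothetical name.

```python
# Hypothetical heuristic for illustration; not BioAutoMATED's logic.

def infer_task_type(labels):
    """Guess the supervised task from the values in the label column."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct label values")
    # Any non-integer float suggests a continuous target.
    if any(isinstance(v, float) and not v.is_integer() for v in labels):
        return "regression"
    if len(distinct) == 2:
        return "binary_classification"
    return "multiclass_classification"


print(infer_task_type([0, 1, 1, 0]))
print(infer_task_type(["low", "mid", "high", "mid"]))
print(infer_task_type([0.12, 3.4, 2.25]))
```

A heuristic like this has obvious limits: integer-valued measurements with many distinct values would be read as classes, so a real system would more plausibly let the user state the task explicitly.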

“Our tool explores models that are better suited for smaller, sparser biological datasets as well as more complex neural networks,” Valeri says. This is an advantage for research groups with new data that may or may not be suited for a machine-learning problem.

“Conducting novel and successful experiments at the intersection of biology and machine learning can cost a lot of money,” Soenksen explains. “Currently, biology-centric labs need to invest in significant digital infrastructure and AI-ML trained human resources before they can even see if their ideas are poised to pan out. We want to lower these barriers for domain experts in biology.” With BioAutoMATED, researchers have the freedom to run initial experiments to assess whether it is worthwhile to hire a machine-learning expert to build a different model for further experimentation.

The open-source code is publicly available and, the researchers emphasize, easy to run. “What we would love to see is for people to take our code, improve it, and collaborate with larger communities to make it a tool for all,” Soenksen says. “We want to prime the biological research community and generate awareness related to AutoML techniques, as a seriously useful pathway that could merge rigorous biological practice with fast-paced AI-ML practice better than it is achieved today.”

Collins, the senior author on the paper, is also affiliated with the MIT Institute for Medical Engineering and Science, the Harvard-MIT Program in Health Sciences and Technology, the Broad Institute of MIT and Harvard, and the Wyss Institute. Additional MIT contributors to the paper include Katherine M. Collins ’21; Nicolaas M. Angenent-Mari PhD ’21; Felix Wong, a former postdoc in the Department of Biological Engineering, IMES, and the Broad Institute; and Timothy K. Lu, a professor of biological engineering and of electrical engineering and computer science.

This work was supported, in part, by a Defense Threat Reduction Agency grant, the Defense Advanced Research Projects Agency SD2 program, the Paul G. Allen Frontiers Group, the Wyss Institute for Biologically Inspired Engineering of Harvard University, an MIT-Takeda Fellowship, a Siebel Foundation Scholarship, a CONACyT grant, an MIT-TATA Center fellowship, a Johnson & Johnson Undergraduate Research Scholarship, a Barry Goldwater Scholarship, a Marshall Scholarship, Cambridge Trust, and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health. This work is part of the Antibiotics-AI Project, which is supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, Rosamund Zander and Hansjorg Wyss for the Wyss Foundation, and an anonymous donor.
