Latest model offers a strategy to speed up drug discovery

Huge libraries of drug compounds may hold potential treatments for a variety of diseases, such as cancer or heart disease. Ideally, scientists would like to experimentally test each of those compounds against all possible targets, but doing that kind of screen is prohibitively time-consuming.

In recent years, researchers have begun using computational methods to screen those libraries in hopes of speeding up drug discovery. However, many of those methods also take a long time, as most of them calculate each target protein’s three-dimensional structure from its amino-acid sequence, then use those structures to predict which drug molecules it will interact with.

Researchers at MIT and Tufts University have now devised an alternative computational approach based on a type of artificial intelligence algorithm known as a large language model. These models (one well-known example is ChatGPT) can analyze huge amounts of text and figure out which words (or, in this case, amino acids) are most likely to appear together. The new model, known as ConPLex, can match target proteins with potential drug molecules without having to perform the computationally intensive step of calculating the molecules’ structures.

Using this method, the researchers can screen more than 100 million compounds in a single day, far more than any existing model.

“This work addresses the need for efficient and accurate in silico screening of potential drug candidates, and the scalability of the model enables large-scale screens for assessing off-target effects, drug repurposing, and determining the impact of mutations on drug binding,” says Bonnie Berger, the Simons Professor of Mathematics, head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study.

Lenore Cowen, a professor of computer science at Tufts University, is also a senior author of the paper, which appears this week. Rohit Singh, a CSAIL research scientist, and Samuel Sledzieski, an MIT graduate student, are the lead authors of the paper, and Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also an author. In addition to the paper, the researchers have made their model available online for other scientists to use.

Making predictions

In recent years, computational scientists have made great advances in developing models that can predict the structures of proteins based on their amino-acid sequences. However, using these models to predict how a large library of potential drugs might interact with a cancerous protein, for example, has proven challenging, mainly because calculating the three-dimensional structures of the proteins requires a great deal of time and computing power.

An additional obstacle is that these kinds of models don’t have a good track record for eliminating compounds known as decoys, which are very similar to a successful drug but don’t actually interact well with the target.

“One of the longstanding challenges in the field has been that these methods are fragile, in the sense that if I gave the model a drug or a small molecule that looked almost like the true thing, but it was slightly different in some subtle way, the model might still predict that they will interact, even though it should not,” Singh says.

Researchers have designed models that can overcome this kind of fragility, but they are usually tailored to just one class of drug molecules, and they aren’t well-suited to large-scale screens because the computations take too long.

The MIT team decided to take an alternative approach, based on a protein model they first developed in 2019. Working with a database of more than 20,000 proteins, the language model encodes this information into meaningful numerical representations of each amino-acid sequence that capture associations between sequence and structure.

“With these language models, even proteins that have very different sequences but potentially have similar structures or similar functions can be represented in a similar way in this language space, and we’re able to take advantage of that to make our predictions,” Sledzieski says.

In their new study, the researchers applied the protein model to the task of figuring out which protein sequences will interact with specific drug molecules, both of which have numerical representations that are transformed into a common, shared space by a neural network. They trained the network on known protein-drug interactions, which allowed it to learn to associate specific features of the proteins with drug-binding ability, without having to calculate the 3D structure of any of the molecules.
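The shared-space idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the researchers’ implementation: the vector sizes, the random weights standing in for a trained network, the single linear layer with ReLU, and the use of cosine similarity as the interaction score are all hypothetical choices.

```python
import math
import random

def make_projection(in_dim, out_dim, rng):
    """Random weights standing in for a trained projection layer (hypothetical)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

def project(weights, features):
    """Linear map followed by ReLU; one row of weights per output dimension."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def interaction_score(protein_features, drug_features, protein_proj, drug_proj):
    """Project both inputs into the shared space and compare them there."""
    p = project(protein_proj, protein_features)
    d = project(drug_proj, drug_features)
    return cosine_similarity(p, d)

rng = random.Random(0)
PROTEIN_DIM, DRUG_DIM, SHARED_DIM = 16, 8, 4  # illustrative sizes only

protein_proj = make_projection(PROTEIN_DIM, SHARED_DIM, rng)
drug_proj = make_projection(DRUG_DIM, SHARED_DIM, rng)

# Stand-ins for a protein language-model embedding and a drug representation.
protein = [rng.random() for _ in range(PROTEIN_DIM)]
drug = [rng.random() for _ in range(DRUG_DIM)]

score = interaction_score(protein, drug, protein_proj, drug_proj)
print(f"predicted interaction score: {score:.3f}")
```

Because each protein and each drug is projected only once, scoring a new pair reduces to comparing two short vectors, which is one plausible reason an approach of this kind can scale to very large libraries.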

“With this high-quality numerical representation, the model can short-circuit the atomic representation entirely, and from these numbers predict whether or not this drug will bind,” Singh says. “The advantage of this is that you avoid the need to go through an atomic representation, but the numbers still have all of the information that you need.”

Another advantage of this approach is that it takes into account the flexibility of protein structures, which can be “wiggly” and take on slightly different shapes when interacting with a drug molecule.

High affinity

To make their model less likely to be fooled by decoy drug molecules, the researchers also incorporated a training stage based on the concept of contrastive learning. Under this approach, the researchers give the model examples of “real” drugs and imposters and teach it to distinguish between them.
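One common way to express contrastive learning is a triplet loss, sketched below. The article does not say which loss the researchers used, so the triplet form, the cosine distance, the margin of 0.5, and the toy two-dimensional vectors are all assumptions for illustration. The loss is zero when the true drug sits closer to the target than the decoy by at least the margin, and positive otherwise.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; small when the vectors point the same way."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def triplet_loss(target, true_drug, decoy, margin=0.5):
    """Penalize the model unless the decoy is at least `margin` farther
    from the target than the true drug is (hypothetical margin value)."""
    gap = cosine_distance(target, true_drug) - cosine_distance(target, decoy)
    return max(0.0, gap + margin)

# Toy embeddings in a shared space (made-up numbers).
target = [1.0, 0.0]
true_drug = [0.9, 0.1]
far_decoy = [-1.0, 0.0]   # already well separated from the target
near_decoy = [0.8, 0.2]   # looks almost like the true drug

easy = triplet_loss(target, true_drug, far_decoy)   # no penalty
hard = triplet_loss(target, true_drug, near_decoy)  # positive penalty
print(easy, hard)
```

In training, a positive loss such as the one for the near decoy would push the model to move that decoy away from the target in the shared space.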

The researchers then tested their model by screening a library of about 4,700 candidate drug molecules for their ability to bind to a set of 51 enzymes known as protein kinases.

From among the top hits, the researchers chose 19 drug-protein pairs to test experimentally. The experiments revealed that of the 19 hits, 12 had strong binding affinity (in the nanomolar range), whereas nearly all of the many other possible drug-protein pairs would have no affinity. Four of these pairs bound with extremely high, sub-nanomolar affinity (so strong that a tiny drug concentration, on the order of parts per billion, will inhibit the protein).
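The scale of this test can be worked out from the figures reported above. The snippet is only arithmetic on those numbers; since the library size is given as “about 4,700,” the pair count is approximate.

```python
# Figures as reported in the article; the library size is approximate.
candidate_drugs = 4_700
kinases = 51
pairs_screened = candidate_drugs * kinases  # every drug against every kinase

tested = 19         # top-scoring pairs chosen for experiments
strong = 12         # pairs with nanomolar binding affinity
sub_nanomolar = 4   # pairs with sub-nanomolar affinity

strong_rate = strong / tested
sub_nanomolar_rate = sub_nanomolar / tested

print(f"pairs screened: {pairs_screened:,}")
print(f"strong binders among tested hits: {strong_rate:.0%}")
print(f"sub-nanomolar binders among tested hits: {sub_nanomolar_rate:.0%}")
```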

While the researchers focused mainly on screening small-molecule drugs in this study, they are now working on applying this approach to other types of drugs, such as therapeutic antibodies. This kind of modeling could also prove useful for running toxicity screens of potential drug compounds, to make sure they don’t have any unwanted side effects before testing them in animal models.

“Part of the reason why drug discovery is so expensive is because it has high failure rates. If we can reduce those failure rates by saying up front that this drug is not likely to work out, that could go a long way in lowering the cost of drug discovery,” Singh says.

This new approach “represents a significant breakthrough in drug-target interaction prediction and opens up additional opportunities for future research to further enhance its capabilities,” says Eytan Ruppin, chief of the Cancer Data Science Laboratory at the National Cancer Institute, who was not involved in the study. “For example, incorporating structural information into the latent space or exploring molecular generation methods for generating decoys could further improve predictions.”

The research was funded by the National Institutes of Health, the National Science Foundation, and the Phillip and Susan Ragon Foundation.
