Not One Size Fits All
Drug Hunters Continue to Pursue the Ultimate Breakthrough
Are We Aiming for Faster? Cheaper? Better?
Exploring ML Approaches in Preclinical Drug Discovery
Seamless Integration: Piecing Together Computational and Experimental Methods for Pipeline Success
Discussion

ML Models Must Match Their Use Cases in Drug Discovery

Co-authored by LabGenius’ CTO, Leo Wossnig.

Drug discovery has historically been slow, expensive, and riddled with failures; AI/ML is changing this paradigm.

The drug development process remains expensive (estimated at $1–2B all-in cost) with high failure rates (estimated at >90% once reaching clinical trials). Most of us working in TechBio, specifically in drug discovery, are pursuing a common goal: to get maximally efficacious treatments to patients as quickly and cheaply as possible.

Potential time saving with AI enabled drug discovery. (Source: Subbiah.)

Quicker and more efficacious drug development has been enabled by advances in lab automation, genomics, precision trials, wearables, sensor technology, and clinical trial recruitment and management (nice summary in Shah, et al.). While significant strides have been made in these areas, there is still a long way to go to unlock their full potential.

Here, we focus on the power of machine learning (ML, or artificial intelligence/AI) to revolutionise the drug discovery process. In the media, we have seen remarkable advances in ML, ranging from acing standardised tests to generating art. Application of those same algorithms in drug discovery remains in its nascent phase, largely due to challenges with acquiring suitable training datasets and constructing biologically accurate models. Consequently, the vast potential of AI/ML to transform this industry remains largely unrealised.

AI/ML in drug discovery has historically prioritised faster and cheaper, but future breakthroughs will likely focus on finding better molecules.

Accelerating the preclinical stages and reducing their cost can have a profound impact on the overall drug discovery process. Biotech startups have already shown the ability to drive down both of these variables and develop drugs faster and more cheaply.

Biotech startups deliver new molecular entities (NMEs) more cost-efficiently than big pharma. (Source: Bay Bridge Bio analysis.)

Regardless, there remains an opportunity to achieve additional efficiencies: even a marginal reduction in time and cost per program can lead to significant overall savings when multiplied by the sheer volume of early-stage initiatives. As an illustration, if there are a thousand preclinical programs, and each can save one week of time and a thousand dollars in cost, the combined saving is a thousand program-weeks and a million dollars.

The next breakthroughs are likely to focus on higher quality drugs

Financial impact of savings in speed, quality, and cost at each stage of the drug discovery pipeline. (Source: Bender and Cortes-Ciriano.)

Given the breakthroughs already made in speed and cost, it seems that ML is now most likely to advance drug discovery by helping us to find and develop higher quality compounds. ‘Better’ can be broadly defined as: superior drug targets that improve clinical outcomes; increased functional activity in biophysical assays; lower rates of adverse events in preclinical models and human subjects; and, ideally, higher efficacy in human patients. AI/ML has the potential to find new molecules with superior properties along every one of these axes, generating molecules that address the shortcomings of existing therapeutics. With higher quality come reduced failure rates of drugs throughout the discovery pipeline, meaning more effective therapies for more patients.

Here we provide an overview of the preclinical stages at which ML can be used to improve the discovery of therapeutics, including the data available to address the problems and a summary of current efforts.

1. Target Identification

This stage involves the identification of protein target(s) and clinical indication(s) for a new therapeutic. For instance, immuno-oncology often focuses on identifying receptors that are uniquely co-expressed in cancers, but not (or to a lesser extent) on healthy tissues.

Available data: bulk gene expression (GEO, ArrayExpress), single cell gene expression (Human Cell Atlas, Single Cell Portal), proteomics (ProteomicsDB, PRIDE), histology (Human Protein Atlas), summary databases (Therapeutic Target Database), and scientific literature (PubMed).

In terms of publicly available data, target identification is the most data-rich step of the drug discovery process. Available datasets span from molecular measurements to recorded human knowledge, and the approaches that integrate these data and prioritise targets are equally diverse.

Data-driven approaches to target identification have focused on integrating different sources of omics data. Network biology, machine learning, and Bayesian approaches have all emerged to combine these data and propose therapeutic targets.

Examples of data sources and approaches for their integration to identify promising drug targets. (Source: You, et al.)

With the emergence of CAR-T and bispecific antibody therapies, the question has changed from “which single target maximally distinguishes cancer from normal tissues?” to “which combination of targets has this ability?” Multi-specifics (therapies which target multiple antigens) can decrease the on-target, off-tumour effects of cancer therapies by limiting their killing of healthy cells. A similar effect can be achieved by engineering antibodies to have avidity-driven activity (more active when more antigen is present) or selective activity under physiological conditions unique to the disease environment (e.g. low pH and high ATP levels in tumours).

Example integration of data to construct logic-gated CAR-T therapies. (Source: Dannenfelser, et al.)
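
The shift from single targets to target combinations can be illustrated with a toy calculation. The sketch below is only an illustration and not the method of Dannenfelser, et al.: the antigen names (AG1 to AG3), the tissues, and the present/absent expression calls are all hypothetical, and a real analysis would work from quantitative expression data. It lists the healthy tissues in which a therapy requiring every antigen in an AND gate would still be active.

```python
from itertools import combinations

# Hypothetical present/absent calls for three antigens (toy data, not real measurements)
EXPRESSION = {
    "tumour": {"AG1": True, "AG2": True, "AG3": True},
    "liver": {"AG1": True, "AG2": False, "AG3": True},
    "lung": {"AG1": False, "AG2": True, "AG3": False},
    "heart": {"AG1": False, "AG2": False, "AG3": True},
}


def off_tumour_tissues(gate, expression=EXPRESSION, disease="tumour"):
    """Return the healthy tissues where every antigen in an AND gate is present."""
    return [
        tissue
        for tissue, calls in expression.items()
        if tissue != disease and all(calls[antigen] for antigen in gate)
    ]


antigens = sorted(EXPRESSION["tumour"])
for size in (1, 2):
    for gate in combinations(antigens, size):
        hits = off_tumour_tissues(gate)
        print(" AND ".join(gate), "->", hits or "no healthy tissue hit")
```

In this toy data every single antigen is also present on at least one healthy tissue, whereas some two-antigen gates (for example AG1 AND AG2) are found only on the tumour.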

Natural language processing (NLP) and biomedical question answering have been useful for querying data that is otherwise locked in scientific literature. An emerging solution is to query large language models (LLMs), including both generic models, like ChatGPT, and domain-specific models, like BioMedLM. For instance, querying ChatGPT “What proteins could be targeted for the treatment of triple negative breast cancer by antibody therapy?” yields the suggestions of EGFR, VEGF, PD-L1, PARP, and IGF-1R. While none of these are revolutionary proposals, more domain-trained LLMs are likely to help accelerate target identification in the near future.

2. Lead Identification

This step aims to find binders for the predefined target.

Available data: protein structures (PDB, SAbDab), patents (Google Patents, Lens.org).

Generative ML is a class of ML models (including GNNs, LLMs, GANs, VAEs, and diffusion networks) that generate novel data/responses. An emerging capability of generative ML in drug discovery is the de novo design of molecules. In this case, an ML model generates entire molecules (SMILES, structures, sequences, etc.) when given an input drug target, often in the form of a sequence or structure. Most of the best performing models for protein design are diffusion based, with some even co-generating sequence and structure simultaneously.

New papers come out on a weekly basis that continue to push the boundaries of what is possible with these generative algorithms for protein sequence synthesis. For instance: Yeh, et al. generate novel luciferases by ‘hallucinating’ structures and then identifying the sequences that fulfil them; Wu, et al. focus on binding repeated peptide sequences through docking and hashing peptides; and Luo, et al. use diffusion models to generate antigen-specific binders.

Example workflow for generating CDR designs for antigen-antibody interactions. (Source: Luo, et al.)

Thanks to the increasing availability of open-source or commercially available generative models, most companies can now make use of them. However, they are left to grapple with how to evaluate these new technological capabilities alongside their current toolkit, answering questions such as “Do you have a higher likelihood of success with de novo hits, a naive/synthetic phage display screen, or animal immunisation?” and “How many de novo hits must be tested before achieving biological validation of activity?”

For venture capital firms (VCs), these questions are compounded. How do they determine whether portfolio companies are capturing the maximum value of generative technologies? (See questions from a16z.) As these are early-stage methods, major limitations remain:

  • In drug discovery, coming up with complementary structures that bind is not sufficient for a drug. Instead, we need to introduce multiple properties in parallel, such as function (which commonly goes beyond the affinity of the protein for the target, e.g. cytotoxicity, in vivo efficacy), developability (e.g. thermostability, yield, purity), and safety (e.g. specificity, immunogenicity). This is what we call goal-directed de novo design, goal-directed generative design, or multi-objective generative design.
  • Whilst they work relatively well when used off-the-shelf for very common proteins and targets (e.g. kinases), these methods face challenges in niche areas such as VHHs, multi-specifics (e.g. BiTEs), conditional antibodies, or other applications where little data is available (crystal structures for proteins are often the limiting factor).
  • Going from mono-VHHs to multi-valent and multi-specific antibodies goes beyond the training data and capabilities of existing approaches. This may be driven by the complexity and novelty of the molecules, as well as the challenges of predicting the impact of long/flexible linker regions. Much of biotechs’ attention is focused on these more complex formats as they have the potential to overcome drawbacks in existing treatments.

On the positive side, because plenty of data is available for common targets, generative methods may already be able to help us design better binders. While binders are only the first step in the lengthy drug discovery process, this is genuinely useful for many biotech companies as it could shorten timelines before lead optimisation begins.

3. Lead Optimisation

This step requires finding the best molecule over a defined space while simultaneously optimising for multiple properties, known as ‘co-optimisation’.

Available data: most data at this stage are private assets.

Once a lead molecule has been identified for therapeutic development, the next task is to optimise the therapeutic properties of the molecule so it can progress towards the clinic. Unfortunately, this is not as simple as maximising a single property of the molecule (e.g. binding selectivity to cancerous cells). Instead, drug developers must explore a high-dimensional space to ensure that the molecule is efficacious, safe, manufacturable, and stable.

Visualisation of the drug development space being considered when developing new therapeutics. (Source: LabGenius.)

Unsurprisingly, modulating the desired property often inadvertently worsens others (for example in medicinal chemistry). Very quickly, these multi-dimensional design and measurement spaces become massive. As one example, let’s say we have a lead mono-VHH molecule with a sequence of 130 amino acids for optimisation along the axes of potency, specificity, toxicity, immunogenicity, yield, and thermostability. There are up to 20^130 variants of this (relatively short) molecule, each of which we can measure along 6 axes. How can we efficiently explore and optimise this sequence space?
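
To make the scale concrete, the short calculation below counts the variants. It assumes, as in the example above, 20 possible amino acids at each of 130 positions; the single- and double-mutant counts are added here only for contrast and are not part of the original example.

```python
import math

N_AMINO_ACIDS = 20
SEQUENCE_LENGTH = 130

# Every position can hold any of the 20 amino acids
all_variants = N_AMINO_ACIDS ** SEQUENCE_LENGTH

# Variants that differ from the lead at exactly one or exactly two positions
single_mutants = SEQUENCE_LENGTH * (N_AMINO_ACIDS - 1)
double_mutants = math.comb(SEQUENCE_LENGTH, 2) * (N_AMINO_ACIDS - 1) ** 2

print(f"full space: a {len(str(all_variants))}-digit number of variants")
print(f"single mutants: {single_mutants}")  # 2470
print(f"double mutants: {double_mutants}")  # 3026985
```

Even the double-mutant neighbourhood of a single lead runs to millions of candidates, so exhaustively measuring the space is not realistic and some form of guided search is needed.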

Researchers are commonly turning to active learning methods to answer this question. Active learning is a machine learning technique where the model actively and prospectively selects the most informative data points to learn from, rather than using a fixed set of training data. This approach helps improve the model’s performance with less (labelled) data, making the learning process more efficient and accurate. For background on active learning, we recommend this very useful explanation of active learning in Bayesian optimisation and this dive into more technical details.
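
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production method: `assay` is a hypothetical stand-in for a wet-lab measurement, candidates are reduced to a single number, and the surrogate model is a bootstrap ensemble of nearest-neighbour predictors chosen only to keep the example dependency-free. Each round, the model scores untested candidates by predicted value plus an uncertainty bonus (an upper-confidence-bound style rule), and only the top-scoring candidates are measured.

```python
import random
import statistics


def assay(x):
    """Hypothetical stand-in for a lab measurement; a toy function peaking at x = 0.7."""
    return -(x - 0.7) ** 2


def ensemble_predict(labelled, x, rng, n_models=20):
    """Mean and spread of nearest-neighbour predictions over bootstrap resamples."""
    predictions = []
    for _ in range(n_models):
        resample = [rng.choice(labelled) for _ in labelled]
        nearest = min(resample, key=lambda pair: abs(pair[0] - x))
        predictions.append(nearest[1])
    return statistics.mean(predictions), statistics.pstdev(predictions)


def active_learning(pool, n_init=3, n_rounds=5, batch_size=2, kappa=1.0, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a small random set of measured candidates
    labelled = [(x, assay(x)) for x in pool[:n_init]]
    unlabelled = pool[n_init:]
    for _ in range(n_rounds):
        # Score untested candidates: predicted value plus an uncertainty bonus
        scored = []
        for x in unlabelled:
            mean, spread = ensemble_predict(labelled, x, rng)
            scored.append((mean + kappa * spread, x))
        scored.sort(reverse=True)
        batch = [x for _, x in scored[:batch_size]]
        # "Run the experiment" only on the selected batch, then update the data
        labelled += [(x, assay(x)) for x in batch]
        unlabelled = [x for x in unlabelled if x not in batch]
    return labelled


pool = [i / 50 for i in range(51)]
labelled = active_learning(pool)
best_x, best_y = max(labelled, key=lambda pair: pair[1])
print(f"measured {len(labelled)} of {len(pool)} candidates; best x = {best_x:.2f}")
```

The `kappa` parameter (a name chosen for this sketch) trades off exploiting candidates that look good against exploring candidates the model is unsure about; in practice the surrogate would typically be a Gaussian process or a neural network ensemble over sequence features.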

Although the potential applications of active learning are far-reaching, in drug discovery it is normally used in tandem with supervised ML to improve a (multi-objective) optimisation process. This is achieved by selecting molecules such that the resulting models are optimal for the defined criteria along the Pareto frontier. Published examples of active learning include work in antibody design (AntBO and Seo, et al.) and, more abundantly, chemical exploration (Yang, et al., Berenger, et al., Khalak, et al., Gusev, et al., Thompson, et al., and Graff, et al.).
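
For readers unfamiliar with the Pareto frontier, it is the set of candidates that cannot be improved on one objective without getting worse on another. The sketch below shows how that set can be identified; the molecule names and the two scores per molecule are hypothetical, and both objectives are assumed to be "higher is better".

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical (objective 1, objective 2) scores, higher is better for both
molecules = {
    "mol_A": (0.9, 0.2),
    "mol_B": (0.6, 0.6),
    "mol_C": (0.5, 0.5),  # dominated by mol_B
    "mol_D": (0.2, 0.9),
    "mol_E": (0.6, 0.4),  # dominated by mol_B
}

print(pareto_front(molecules))  # ['mol_A', 'mol_B', 'mol_D']
```

None of the three molecules on the front is strictly "best"; choosing between them depends on how the objectives are weighted for a given program.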

At LabGenius, we have deployed active learning through Multi-Objective Bayesian Optimisation to optimise a HER2 T-cell engager (full poster). In the optimisation, we simultaneously improve T-cell activation and tumour selectivity (increased activity in cells with high HER2 density relative to cells with low HER2 density) over a 5 cycle design campaign. The average performance of the top 25 molecules improves with every cycle and surpasses the benchmark molecule, Runimotamab, after 5 cycles with respect to compound score (which represents both normalised activation and selectivity).
