Charge Descriptors (features) for molecular property prediction.

Molecular Atomic Charges

In this post I look at charge descriptors, or charge fingerprints as one might call them. In 2009 David Winkler published an article on the topic, "Towards Novel Universal Descriptors: Charge Fingerprints". The idea looks very interesting and may provide insight into the electrostatic properties of molecules, which play a very important role in molecular interactions and binding affinity. By encoding the partial charges of atoms, charge fingerprints can help compare the similarities and differences between molecules based on their charge distribution, and may assist in predicting the activity of compounds in biological systems or their physicochemical properties.

The calculation of charge fingerprints typically involves two main steps. First, the partial charges of the atoms in the molecule are computed using a charge model, such as Gasteiger or MMFF94 in Open Babel; there are several ways to compute them. Another option is the quantum chemistry package Psi4, which can be used to calculate Mulliken charges for every atom. These models are based on various approximations and empirical rules derived from quantum chemistry calculations and experimental data. Examples include Gasteiger's approach to charge equalization and the newer electronegativity equalization method (EEM), which is based on Sanderson's equation. Other methods, such as semiempirical molecular orbital methods, DFT, or ab initio methods, can also be used to calculate atomic charges, provided the bin boundaries are set appropriately. I did not try to reproduce the paper; instead I used the idea to build a fingerprint of my own.

Once the partial charges are obtained, they can be encoded into a fingerprint, which is generally a binary vector of fixed length. A standard way to generate the fingerprint is to discretize the charge values into bins and assign each atom to a bin. This yields a sparse binary vector in which each element corresponds to a specific atom and charge-bin combination. A '1' at a given position indicates that the corresponding atom has a partial charge within the range of the associated bin. By comparing the charge fingerprints of different molecules, one can assess their similarity in terms of electrostatic properties, which is useful for cheminformatics tasks such as virtual screening, similarity searching, and property prediction.
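The binning step described above can be sketched in plain Python. This is only an illustration, not the code from the paper or from my original experiment: the element list, the number of bins, the charge range of -1 to 1, and the function names are all assumptions made for the example. It also assumes that "atom" in the fingerprint means the element symbol, so that vectors from different molecules line up.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed settings for this sketch; tune them for a real dataset.
ELEMENTS = ("H", "C", "N", "O", "F", "S", "Cl")
N_BINS = 8
CHARGE_MIN, CHARGE_MAX = -1.0, 1.0


def charge_bin(charge: float, n_bins: int = N_BINS,
               lo: float = CHARGE_MIN, hi: float = CHARGE_MAX) -> int:
    """Index of the equal-width bin containing `charge`.

    Charges outside [lo, hi] are clamped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    idx = int((charge - lo) // width)
    return min(max(idx, 0), n_bins - 1)


def charge_fingerprint(atoms: Iterable[Tuple[str, float]],
                       elements: Sequence[str] = ELEMENTS,
                       n_bins: int = N_BINS) -> List[int]:
    """Binary vector with one bit per (element, charge bin) combination.

    `atoms` is an iterable of (element symbol, partial charge) pairs.
    Elements that are not in `elements` are skipped.
    """
    fp = [0] * (len(elements) * n_bins)
    for symbol, charge in atoms:
        if symbol not in elements:
            continue
        fp[elements.index(symbol) * n_bins + charge_bin(charge, n_bins)] = 1
    return fp


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two binary vectors (0.0 here if both are empty)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0
```

Calling `charge_fingerprint([("O", -0.68), ("H", 0.40), ("C", 0.28)])` (illustrative charge values, not computed ones) returns a 56-bit vector with three bits set, and `tanimoto` can then be used to compare two such vectors. A count vector instead of a binary one would be a small change: replace `= 1` with `+= 1`.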

The fingerprints used here are built from partial charges computed with the MMFF94 force field.

Adding hydrogens to a molecular structure before calculating charge descriptors is important because hydrogen atoms play a big role in the distribution of charges within a molecule. Most molecular representations, such as SMILES or SDF, don't explicitly include hydrogen atoms, as they are often omitted for brevity and simplicity. However, hydrogen atoms are involved in various chemical interactions, such as hydrogen bonding and protonation/deprotonation, which can significantly affect a molecule's charge distribution and its physicochemical properties. When calculating charge descriptors, the underlying charge models, like Gasteiger or MMFF94, need accurate information about the molecular structure to give reliable partial charge estimates. By adding hydrogens explicitly to the molecule, you make sure that the charge model sees the correct bonding environment of every atom, which leads to more accurate charge descriptors.

The next part is fairly simple: once you have the fingerprints, check whether they make sense by training a model. I used XGBoost with 5-fold cross-validation. The dataset I was interested in was hERG, taken from the TDC benchmark. I haven't studied other datasets much, but the results on this dataset suggest that the descriptor carries useful signal. The average ROC-AUC came to about 0.7929.

The test set results are below. They suggest that these features could be a valuable addition to models.

Precision: 0.839, Recall: 0.959, Accuracy: 0.832, F1 Score: 0.895
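For readers less familiar with these metrics, here is a small sketch of how they relate to each other. The function names are made up for the example. It also checks that the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = f1_from_precision_recall(precision, recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Precision and recall as reported above.
reported_f1 = f1_from_precision_recall(0.839, 0.959)
print(round(reported_f1, 3))  # 0.895, matching the reported F1
```

Recall being much higher than precision means the model misses few positives but produces a fair number of false positives.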

Charge fingerprints can be valuable in modeling various molecular properties and activities, as they provide insight into the electrostatic behavior of compounds, which is a key factor in many chemical and biological interactions. By incorporating charge information into molecular models, it becomes possible to better capture the nuances of molecular recognition, binding, and reactivity, which may lead to more accurate predictions and a better understanding of the underlying molecular mechanisms.

Please leave a comment if you find this idea useful and would like to explore more on this topic.


