Artificial Intelligence Approach to Find Lead Compounds for Treating Tumors

It has been demonstrated that MMP13 enzyme is related to most cancer cell tumors. The world’s largest traditional Chinese medicine database was applied to screen for structure-based drug design and ligand-based drug design. To predict drug activity, machine learning models (Random Forest (RF), AdaBoost Regressor (ABR), Gradient Boosting Regressor (GBR)), and Deep Learning models were utilized to validate the Docking results, and they obtained an R2 of 0.922 on the training set and 0.804 on the test set in the RF algorithm. For the Deep Learning algorithm, R2 of the training set is 0.90, and R2 of the test set is 0.810.

However, these TCM compounds fly away during the molecular dynamics (MD) simulation. They seek another method: peptide design. All peptide database were screened by the Docking process. Modification peptides were optimized the interaction modes, and the affinities were assessed with ZDOCK protocol and Refine Docked protein protocol. The 300 ns MD simulation evaluated the stability of receptor–peptide complexes. The double-site effect appeared on S2, a designed peptide based on a known inhibitor, when complexed with BCL2. S3, a designed peptide referred from endogenous inhibitor P16, competed against cyclin when binding with CDK6. The MDM2 inhibitors S5 and S6 were derived from the P53 structure and stable binding with MDM2. A flexible region of peptides S5 and S6 may enhance the binding ability by changing its own conformation, which was unforeseen. These peptides (S2, S3, S5, and S6) are potentially interesting to treat cancer; however, these findings need to be affirmed by biological testing, which will be conducted in the near future.

Pearson correlation coefficient matrix heat map of 204 selected features.

For tumors, the second most deadly disease around the world both in developed and developing countries, it was a challenge to catch a metastatic target. Activation of the epidermal growth factor receptor (EGFR), occurred in many types of tumors and promoted tumor progression. The macrophage migration inhibitory factor (MIF) combined with EGFR further blocked the excitation of EGFR. However, activation of EGFR by mutation or its ligand binding enhanced the secretion of MMP13, which degraded extracellular MIF and resulted in elimination of negative regulation of MIF on EGFR. Inhibited MMP13 could slow down the expansion of cancer cells and the deterioration of the disease. Recent research has shown that MMP13 is related to colorectal metastases, breast tumors, and knee osteoarthritis.

Principal component analysis (PCA) visualization: (a) 2D, (b) 3D.

Network pharmacology-based analysis provided a multitargets concept that is related to the idea of cancer treatment. The effects of multiple drugs not only improve the efficacy of the treatment but also kill the diseased cells before the emergence of drug resistance. Several related target proteins, cyclin-dependent kinase inhibitor 2A (CDKN2A), P53, B-cell lymphoma 2 (BCL-2), cyclin-dependent kinase 6 (CDK6), and E3 ubiquitin-protein ligase MDM2 should be studied as well.

CDKN2A causes cell cycle arrest and inhibits tumor cell proliferation in cell cultures when it overexpressed.The protein works by inhibiting the activity of cyclin-dependent kinase 4 (CDK4) or CDK6. Tumor suppressor P16INK4a could bind with CDK6, which further inhibits tumor growth. Mutations in P53 tumor suppressor factors are the most common genetic changes in human cancers. The inactivation or mutation of tumor suppressor genes and the disorder of the balance that inhibits apoptosis can promote tumor development.The selection of P53 mutations in the course of tumor occurrence and development may lead to parallel inactivation of multiple tumor suppressor genes, which may be the main reason for the high frequency of P53 mutations in cancer. The interaction between angiopoietin and the P53 TAD2 domain in cancer cells could inhibit the function of P53 tumor inhibitors and promote cell survival.Abnormal regulation of Bcl-2 family members makes it possible to escape apoptosis and tumor resistance in chemotherapy. The literature provides the rationale for testing combined therapies that use C–X–C chemokine receptor type 4 (CXCR4) and Bcl-2 inhibitors to increase the efficacy of these agents.

Fine tuning the structure of the optimizer in a neural network.

Artificial intelligence is realized by a system that combines representation learning with sophisticated ratiocination. Multiple processing layers in Deep Learning can represent multiple levels of abstraction, which greatly improves the technical level of drug discovery. It was demonstrated that deep neural nets (DNNs) can be used as a practical quantitative structure–activity relationship (QSAR) method.The Random Forest model is widely applied in the field of bioinformatics and provides compelling results. The AdaBoost algorithm has yielded good prediction performance in breast cancer analysis. The Gradient Boosting algorithm can aggregate its tree models to form a stronger predictor.

Four target proteins were docked, screening in the TCM database. The top 50 TCM candidates for different proteins were integrated in a network, and the intersection was focused on especially.

Targeting to a single protein has achieved little effective treatment for many diseases. Hence, the network pharmacology-based multiple targets were screened from the TCM database and applied in curing cancer. Deep Learning and other algorithms were used to find the potential drugs. The compounds in this study revealed poor performance during MD simulation;therefore, they focused on peptides for drug design. The peptides that they designed were stable in binding to the receptor through MD simulation. There was reason to believe that these peptides could affect the conformation of receptors. Cancer targeting peptides can significantly improve the selectivity and efficacy of existing chemotherapy drugs. The peptide has great prospects in the market.

Interaction modes of different complex candidates in 2D and 3D horizons: (A) nazlinin–MMP13 (a); (B) subaphylline–TP53 (d); (C) adrenaline–CDKN2A (i); (D) E41 (control)–MMP13.

To ensure several related targets of MMP13, a relationship was constructed from the Stitch database; the first and second shells were set as no more than 20 interactions. Pathways for cancer were specially highlighted by red points. Several known ligands were displayed with a rounded rectangle. Another three targets were sought out based on the combined score assessed by several pieces of evidence. The relationships between these three targets and cancer are available in the literature. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database could provide protein-protein interaction in a pathway. The multitargets for treating tumor can be found by this method.

Related proteins. Three proteins (CDKN2A, TP53, BCL2) were selected in the stitch interaction database. Sphere points replaced several proteins, and the rounded rectangle displayed the compounds related to MMP13. The false discovery rate of the pathway in cancer was 1.73 × 10–9. The first and the second shells were both set as no more than 20. It was funny that MMP13 resulted in cancer through another related protein, like TP53.

The structures of the four target proteins were obtained from the Protein Data Bank (PDB). The crystal structure of MMP13 was complex with inhibitor (IC50 = 3 nM) with great resolution (1.3 Å). The complete structure of CDKN2A was acquired; none of the constraints showed violation bigger than 0.5 Å nor dihedral angle violation bigger than 5°. For one of the P53 variants (5 site mutation) gained from PDB (501H), the mutation sites were in the hypervariable region when a tumor occurred. A more rational way was screening after sequencing for different patients. Here, a multisite mutation model could provide more mutation information even if different mutation models could be very different in fact. The origin P53 protein (1TSR) could serve as an antitumor so that its conformation could be set as a control. The spatial structure was collected from PDB (2XA0).

Disorder analysis for four proteins. A disorder value lower than 0.5 could be a stable structure. The amino acids around the binding areas are displayed with a cyan color. Protein: (A) MMP13, (B) CDKN2A, (C) TP53, (D) BCL2.

All the proteins docked were screened by the TCM database using the Docking (ligandfit) protocol.The structure of docking proteins displayed the conformation rational from disorder validation, with the blue area meaning the key residues.

Residual plot. Different prediction models predicted compound activities based on known MMP13 inhibitors. (a) AdaBoost Regressor, (b) Gradient Boosting Regressor, (c) Random Forest.

The 41 MMP13 inhibitors were collected in known literature to create machine learning models and the Deep Learning model. To verify the reasonableness of QSAR models, 20% of the data was set to a test set for external validation. The predicted activities were provided as an assessment. The “Remove cell” module of Accelrys Discovery Studio (DS 2.5) was employed to process these crystal structures before the screening process, which included the following steps: Remove all top-level cells while leaving their constituent contents intact; Split structures into separate molecules.

Scatter plots to present the results of 350 experiments in the Deep Learning model.

Hydrophobic effect of the MMP13 binding site with different ligands displayed with 2D and 3D vision.

Cluster of MD results during 290–300 ns. Clustering results corresponding to different times and the ratio of different groups (pie graph) are provided. Unfortunately, all of the candidates flew away at the end of the simulation.

Complex of BCL2–S2 and MMP13–KKKK. (a) Double-site binding of S2 peptide (O and T) in BCL2; the peptide was designed to target the BH1 domain based on an inhibitor and acquired a high affinity for the BH3 domain; (b) RMSD and MSD during 300 ns MD simulation. Both complexes were bound stably in the MD period. T changed the conformation in the very beginning (nearly 25 ns) and led to the change of the complex RMSD. O altered the conformation at 200–250 ns. (c) SASA and gyrate analysis. All of the values were relatively stable. The SASA of O changed the same as its RMSD. (d) Change of N terminal in O. Auxiliary binding when the N terminal matched the conformation of BCL2.

Structure alternation of T. (a) Different types of T. (b) Remarkable amino acid of the “turn” node. In fact, residues 1–15 all had obvious changes. Two of the most significant residues are displayed. (c) Cavity pathway of BCL2. (d) RMSF analysis. The same structure of O and T had nearly different fluctuation. O was bound with the cycle “O”-type region, and residues 1–15 were flexible. T was just the opposite.

MD analysis of P53 and CDK6 proteins. (a) Free energy landscape (FEL). The Gibbs free energy was estimated based on the distribution of conformation. The structure with low Gibbs free energy would be set as a reference. (b) Vital hydrogen bonds in P53–S3 during the MD period. Binding modes changed a lot compared with the origin structure. The final binding type was similar to the low Gibbs energy structure. (c) RMSD and MSD of different complexes. The change of protein and ligand is provided. (d) Gyrate and SASA analysis. They were stable during MD.

Structure of CDK6 binding with S4. (a) Binding sites of P16 and S4. (b) Significant site of CDK6. S4 could influence these region, but Thr177 influenced phosphorylation. It could be an improved scheme. (c) Cavity that the peptide could reach. (d) Residue distance matrix of CDK6 when binding with S4.

MD analysis of MDM2 with a ligand. (a) Flexible region of MDM2 and S6. The most inconsistant amino acids are printed as red. Residues 20–28 of S6 could evolve into potential binding areas. (b) RMSD and MSD values; (c) gyrate and SASA change. S5 and S6 had similar effects on MDM2. (d) The reconstructed flexible area enhanced the binding ability of S5 and S6. The variability of peptide implied another connection potential, although most conditions are regarded as a bad region that cannot be controlled.

Vital hydrogen bonds of candidate complexes. The donor H and hydrogen acceptors were shared.

The “Prepare Protein” module was used to clean protein molecules, dewater, and hydrogenate the four proteins. The “Calculate Molecular Properties” module was employed to get 204 properties of these inhibitors. Using the Pearson correlation coefficient matrix to judge the correlation and orthogonality of the features, principal component analysis (PCA) and Lasso feature selection were applied for data preprocessing. The residual plot shows the difference between the dependent variables on the vertical axis and the horizontal axis. In these models, the residual is the difference between the observed value of the target variable (y) and the predicted value ()1.

  1. Artificial Intelligence Approach to Find Lead Compounds for Treating Tumors Jian-Qiang Chen, Hsin-Yi Chen, Wen-jie Dai, Qiu-Jie Lv, and Calvin Yu-Chian Chen The Journal of Physical Chemistry Letters 2019 10 (15), 4382-4400 DOI: 10.1021/acs.jpclett.9b01426