Large-scale classification of P-glycoprotein inhibitors using SMILES-based descriptors

P-glycoprotein (Pgp) inhibition has been considered an effective strategy for combating multidrug-resistant cancers. Owing to the substrate promiscuity of Pgp, classifying its interacting ligands is not an easy task and remains a subject of debate. Chemical structures can be represented by the simplified molecular input line entry system (SMILES) as a linear string of symbols. In this study, the SMILES notations of 2254 compounds tested for Pgp inhibition (1341 active and 913 inactive) were used to construct a SMILES-based classification model with the CORrelation And Logic (CORAL) software. The model provided acceptable predictive performance: accuracy, sensitivity and specificity were greater than 70% and the Matthews correlation coefficient (MCC) was greater than 0.6 for the training, calibration and validation sets. In addition, the CORAL method highlighted chemical features that may contribute to increased or decreased Pgp inhibitory activity. This study highlights the potential of the CORAL software for rapid screening of prospective compounds from a large chemical space and provides information that could aid in the design and development of potential Pgp inhibitors.
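
As an illustration of the four statistical parameters named above, the following minimal Python sketch computes them from the counts of a binary confusion matrix using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study; the snippet is not part of the CORAL software.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and MCC for a binary classifier.

    tp/tn/fp/fn are the counts of true positives, true negatives,
    false positives and false negatives (here: active = positive).
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    accuracy = (tp + tn) / total
    # Sensitivity: fraction of truly active compounds predicted active.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of truly inactive compounds predicted inactive.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Matthews correlation coefficient; conventionally set to 0 when
    # any marginal total is zero and the ratio is undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not results from the study).
example = classification_metrics(tp=40, tn=30, fp=10, fn=20)
```

Unlike accuracy, the MCC takes all four cells of the confusion matrix into account, which makes it a useful complement when the active and inactive classes differ in size, as they do in this data set.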

Considerable attention has been given to the P-glycoprotein (Pgp) transporter owing to its clinical impact on multidrug resistance and on the pharmacokinetic profiles of substrate drugs. Human Pgp is a protein belonging to the ATP-binding cassette (ABC) family, which is encoded by multidrug resistance genes (i.e. MDR1). Pgp is a 170 kDa membrane-bound polypeptide comprising 1280 amino acids. Pgp functions as an efflux pump that extrudes a wide range of structurally diverse hydrophobic substances out of the cell. Owing to its expression in many physical barriers and pharmacokinetic-related organs, it plays a role in limiting the cellular uptake, distribution, excretion and toxicity of many xenobiotics and toxic substances.

In addition, it influences the pharmacokinetic or ADMET (A = absorption, D = distribution, M = metabolism, E = excretion and T = toxicity) profiles of its substrate drugs. Pgp is considered a contributing factor to multidrug resistance because many anticancer drugs are substrates of Pgp. Furthermore, Pgp overexpression is found in many types and various stages of cancer cells. The overexpression increases efflux activity, thereby preventing anticancer agents from reaching their target sites. The broad specificity of Pgp allows a wide range of structurally unrelated anticancer drugs to be recognized and extruded out of cancer cells. This phenomenon leads to simultaneous resistance to a number of structurally and functionally unrelated anticancer agents, which is called multidrug resistance. Hence, inhibition of this efflux pump is one of the strategies geared towards improving the pharmacokinetics of drugs as well as combating multidrug resistance.

In this regard, the development of novel Pgp inhibitors for therapeutic applications is an active research area gaining much attention. Currently, many classes of Pgp inhibitors have been developed from diverse types of compounds and natural products. However, many aspects of Pgp and its interacting compounds need to be elucidated for the development of Pgp inhibitors that have the desired treatment outcome. Pgp is one of the most studied transporters, largely because of its promiscuity. The presence of multiple binding sites together with its broad recognition specificity allows non-specific and simultaneous binding of hydrophobic compounds. In addition, the available experimental assays use different criteria to classify Pgp-interacting compounds, which leads to conflicting reports of their endpoints. The classification of Pgp ligands is therefore not straightforward, and the issue is still under debate.

Accordingly, many computational methods have been employed in an attempt to understand this transporter, such as quantitative structure-activity relationship (QSAR), classification structure-property relationship (CSPR), molecular docking and homology modelling. The molecular structure of a compound can be represented by simplified molecular input line entry system (SMILES) notations. SMILES is a chemical language designed for a human/machine interface. Computationally, SMILES can be interpreted in a fast and compact manner, thereby significantly saving time and space. To date, SMILES is believed to be the best compromise between the human and machine aspects of chemical notation.

In addition, the ability of SMILES to facilitate information processing beyond the conventional methods has been well documented. The use of SMILES for the development of QSAR/CSPR models can help avoid general problems of using molecular descriptors, such as the selection of an appropriate subset of informative descriptors from a large available set and the interpretation of important descriptors obtained from the constructed models. Moreover, the use of the SMILES notation can save considerable time, as geometrical optimization is not required for descriptor calculations. The CORrelation And Logic (CORAL) software is a computational tool for conformation-independent QSAR/CSPR analysis. Herein, the SMILES notations are used instead of molecular descriptors, which need to be calculated from optimized structures. The CORAL software has been successfully employed for predicting diverse types of compounds and biological activities, including anticancer, antiviral and antimalarial activities and the toxicity of compounds. To date, a classification model constructed by the CORAL software has not been reported for Pgp inhibitors. In this study, the CORAL software, which is based on the Monte Carlo technique, was employed to construct a classification model from 2254 compounds (1341 active, 913 inactive).

In addition, the SMILES attributes influencing the Pgp inhibitory effects of the compounds were highlighted to support simple and rapid screening of potential Pgp inhibitors.
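
To make the idea of reading features directly from a SMILES string more concrete, the following Python sketch splits a SMILES string into symbols and counts single symbols and adjacent symbol pairs, without any geometry optimization. This is an illustrative assumption about what simple string-derived attributes could look like; it is not the attribute scheme or the code of the CORAL software, and the names `SMILES_TOKEN`, `tokenize` and `attribute_counts` are hypothetical. The tokenizer covers common SMILES syntax only and does not validate chemistry.

```python
import re
from collections import Counter

# Simplified SMILES tokenizer (an assumption for illustration, not CORAL's).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms such as [NH3+] or [C@@H]
    r"|Br|Cl"              # two-letter organic-subset elements
    r"|%\d{2}"             # two-digit ring-closure labels
    r"|[A-Za-z]"           # single-letter atoms (aromatic ones are lowercase)
    r"|\d"                 # single-digit ring-closure labels
    r"|[=#$:/\\\-+().@*]"  # bonds, branches, dot, wildcard and similar symbols
)


def tokenize(smiles):
    """Split a SMILES string into a list of symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def attribute_counts(smiles):
    """Count single symbols and adjacent symbol pairs in a SMILES string."""
    tokens = tokenize(smiles)
    singles = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    return singles, pairs


# Example: acetyl chloride, written as CC(=O)Cl.
singles, pairs = attribute_counts("CC(=O)Cl")
```

Counts of this kind can be computed for thousands of compounds in very little time, which is the practical advantage of string-based features over descriptors that depend on optimized three-dimensional structures.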

Prachayasittikul V, Worachartcheewan A, Toropova AP, Toropov AA, Schaduangrat N, Prachayasittikul V, Nantasenamat C. Large-scale classification of P-glycoprotein inhibitors using SMILES-based descriptors. SAR QSAR Environ Res. 2017 Jan;28(1):1-16. doi: 10.1080/1062936X.2016.1264468. Epub 2017 Jan 6. PMID: 28056566.