dockECR: Open consensus docking and ranking protocol for virtual screening of small molecules

The development of open computational pipelines to accelerate the discovery of treatments for emerging diseases makes it possible to find novel solutions in shorter periods of time. Consensus molecular docking is one such approach; its main purpose is to increase the detection of true actives in virtual screening campaigns. Here the authors present dockECR, an open consensus docking and ranking protocol that implements the exponential consensus ranking method to prioritize molecular candidates. The protocol uses four open source molecular docking programs (AutoDock Vina, Smina, LeDock and rDock) to rank the molecules. In addition, the authors introduce a scoring strategy based on the average RMSD obtained by comparing the best poses from each program, which complements the consensus ranking with information about the predicted poses. The protocol was benchmarked on 15 relevant protein targets with known actives and decoys, and applied to the main protease of the SARS-CoV-2 virus. For the application, different crystal structures of the protease and frames obtained from molecular dynamics simulations were used to dock a library of 79 molecules derived from previously co-crystallized fragments. The ranking obtained with dockECR was used to prioritize eight candidates, which were evaluated in terms of their interactions with key residues of the protease.

Open initiatives for drug discovery have become a priority in tackling neglected and emerging diseases that affect vulnerable populations. From a computational perspective, various initiatives are available to analyze public information and predict outcomes that are useful from a biological and chemical viewpoint. Fields such as cheminformatics and chemogenomics allow the assessment of molecular candidates based on their physico-chemical properties and potential mechanism of action towards a target of interest. Many of these methods rely on curated data and open source software to plan and perform the work and to share the results with the community. In critical situations, the large-scale sharing of scientific findings on novel treatments, or on the repositioning of known alternatives, is crucial to advance the fight against the causative agents.

In this scenario, approaches such as molecular docking are useful to screen and rank chemical libraries rapidly and at scale. Molecular docking searches for the most favorable position, orientation and conformation (pose) for the binding of a molecule to, for example, a protein target, and assigns each molecule and pose a score that estimates the likelihood of binding. However, the ability of docking software to accurately predict the docking pose can be affected by system-bias effects introduced by parameter training or over-fitting. To overcome this limitation, the exponential consensus ranking (ECR) methodology was proposed, which can also include the flexibility of the biological target to increase the success rate of virtual screening in systems where little information is known.
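To make the consensus idea concrete, the following is a minimal Python sketch of an exponential consensus ranking. It assumes the commonly cited form of ECR, in which each program contributes exp(-rank/sigma)/sigma for a molecule and the contributions are summed over programs. The function names, the example scores and the value of sigma are hypothetical, and the sketch assumes that a lower score is better for every program; it is not the dockECR implementation itself.

```python
import math


def ranks_from_scores(scores):
    """Return {ligand: rank} with rank 1 for the best ligand.

    Assumption: a lower (more negative) score is better for every program.
    """
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: position for position, name in enumerate(ordered, start=1)}


def exponential_consensus_rank(score_tables, sigma):
    """Sum exp(-rank / sigma) / sigma over all programs for each ligand.

    score_tables maps a program name to a {ligand: score} dictionary.
    Returns (ligand, consensus score) pairs, best candidate first.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    consensus = {}
    for scores in score_tables.values():
        for name, rank in ranks_from_scores(scores).items():
            contribution = math.exp(-rank / sigma) / sigma
            consensus[name] = consensus.get(name, 0.0) + contribution
    return sorted(consensus.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three ligands from the four docking programs.
score_tables = {
    "vina": {"lig_a": -7.9, "lig_b": -6.5, "lig_c": -7.1},
    "smina": {"lig_a": -8.1, "lig_b": -6.9, "lig_c": -6.2},
    "ledock": {"lig_a": -5.0, "lig_b": -5.6, "lig_c": -4.8},
    "rdock": {"lig_a": -21.0, "lig_b": -18.5, "lig_c": -19.7},
}

ranking = exponential_consensus_rank(score_tables, sigma=2.0)
for ligand, ecr_score in ranking:
    print(f"{ligand}\t{ecr_score:.4f}")
```

Because the consensus score is a sum of per-program terms, a molecule that is ranked poorly by one program in this sketch still keeps the contributions it earns from the others. The parameter sigma controls how quickly the contribution decays with rank and would need to be chosen relative to the size of the library.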

Other protocols for consensus docking and scoring have been reported in the literature. One example is DockBox, a package that facilitates the use of multiple docking programs and scoring functions for virtual screening. It proposes score-based consensus docking as an alternative to classic consensus docking, with reported higher success rates in predicting poses, assessed through enrichment factors of known active and decoy molecules. Similarly, other methodologies have implemented multiple docking approaches to filter out a wider range of false positives during virtual screening campaigns, or have combined multiple scoring functions with trajectories obtained from molecular dynamics (MD) simulations for similar purposes. However, these ranking methodologies can discard active molecules that are not detected by all the programs included in the consensus. Additionally, when molecules are ranked using traditional scoring functions, only the score, and not the pose predicted by the docking program, is taken into account.
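The enrichment factor mentioned above can be illustrated with a short Python sketch. It uses the standard definition, namely the fraction of actives found in the top part of a ranked list divided by the fraction of actives in the whole library. The function name, the ligand identifiers and the rounding rule for the size of the top fraction are assumptions made for illustration.

```python
def enrichment_factor(ranked_ids, active_ids, fraction):
    """Enrichment factor of a ranked library at a given top fraction.

    EF = (actives in top / size of top) / (total actives / library size).
    ranked_ids is ordered from best to worst; fraction lies in (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    actives = set(active_ids)
    n_total = len(ranked_ids)
    # Assumption: the top subset holds at least one molecule.
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(1 for ligand in ranked_ids[:n_top] if ligand in actives)
    hits_total = sum(1 for ligand in ranked_ids if ligand in actives)
    if hits_total == 0:
        raise ValueError("the ranked list contains no actives")
    return (hits_top / n_top) / (hits_total / n_total)


# Hypothetical ranked library: three actives (a*) among seven decoys (d*).
ranked = ["a1", "d1", "a2", "d2", "d3", "d4", "a3", "d5", "d6", "d7"]
actives = {"a1", "a2", "a3"}

ef_20 = enrichment_factor(ranked, actives, fraction=0.2)
print(f"EF at 20%: {ef_20:.2f}")
```

A value above 1 indicates that the ranking places actives near the top more often than a random ordering would; a value of 1 corresponds to no enrichment.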

General methodology of the protocol. (A) Analysis of single fragments and proposal of new molecular entities based on the combination of some fragments. (B) Selection of Mpro crystal structures and frames from MD simulations of the apo form. (C) dockECR approach using four open source docking programs, with a subsequent ECR ranking of the ligand library.

Therefore, the authors implement the already validated ECR method to combine the results of widely used docking programs, together with a metric based on the RMSD of the best-ranked poses, in a protocol that is publicly available to the community. Here the authors present dockECR, an open source consensus docking and ranking protocol for virtual screening campaigns. The code allows the parallelisation of the docking runs for multiple ligands, and applies the ECR method to find the most promising candidates. A set of active/decoy benchmarks of the protocol is included, using 15 protein targets from the DUD-E dataset. As an application, the authors applied the protocol to the main protease from SARS-CoV-2. A total of eight molecules were prioritized, in an effort to share the computational findings with other researchers working in the field.
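One way to read the RMSD-based metric is as the mean RMSD over all pairs of best poses predicted by the different programs for the same ligand. The Python sketch below illustrates that reading only; whether dockECR averages over all pairs is an assumption here. It further assumes that every pose lists the same atoms in the same order, that all poses share the receptor coordinate frame (so no superposition is performed), and that no symmetry correction is applied. The function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equal-length lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_pairwise_rmsd(best_poses):
    """Average RMSD over every pair of programs' best poses for one ligand.

    best_poses maps a program name to the coordinates of its best pose.
    """
    pairs = list(combinations(best_poses.values(), 2))
    if not pairs:
        raise ValueError("at least two poses are required")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Hypothetical two-atom ligand: two programs agree, one is shifted by 2 units.
best_poses = {
    "vina": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "smina": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "ledock": [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
}

pose_score = mean_pairwise_rmsd(best_poses)
print(f"mean pairwise RMSD: {pose_score:.3f}")
```

Under this reading, a low value indicates that the programs converge on a similar binding mode, which adds pose information to a ranking that would otherwise depend on scores alone.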

Fragments 161, 426 and 434 within the active site of Mpro. Interactions with the residues inside the cavity are shown as 2D plots. The colored areas correspond to the zone-classification made for the fragments, according to their position within the active site: group 1 (blue area), group 2 (red area) and group 3 (yellow area).
