
Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. Evaluated on selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieved average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, ranked within the top 10 among all single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.

The pipeline of MULTICOM EMA predictors. Multi-model methods (MULTICOM-CLUSTER/CONSTRUCT/AI/HYBRID) use both single-model and multi-model quality assessment features, while single-model methods (MULTICOM-DEEP/DIST) use only single-model quality features.

In protein structure prediction, the estimation of model accuracy (EMA), also called model quality assessment (QA), without knowledge of the native (true) structure is important for selecting good tertiary structure models from many predicted models. EMA also provides valuable information for researchers who apply protein structural models in biomedical research. Previous studies have shown that accurately estimating the quality of a pool of predicted protein models is challenging. The performance of EMA methods largely depends on two factors: the quality of the predicted structures in a model pool and the precision of the methods used for model ranking. EMA methods have proven effective at picking high-quality models when the predicted models are more accurate. In that case they can more readily distinguish good-quality models from incorrectly folded structures using existing model quality features derived from the models, including stereo-chemical correctness, atomic statistical potentials for the main chain and side chains, atomic solvent accessibility, secondary structure agreement, and residue-residue contacts. Conversely, these structural features tend to conflict with one another on poorly predicted models, which is commonly observed in a model pool consisting predominantly of low-quality models. Combining multiple individual model quality features has been shown to be an effective way to obtain a more robust and accurate estimate of model quality.

A bar plot of the average SHAP values of the features.

In recent years, noticeable improvement has been achieved through feature integration by deep learning and the advent of accurate prediction of inter-residue geometric constraints.

In the 13th Critical Assessment of Protein Structure Prediction (CASP13), inter-residue contact information and deep learning were key for DeepRank to achieve the best performance in ranking protein structural models, with the minimum loss of GDT-TS score. Recently, inter-residue distance predictions have been used with more deep learning methods for the estimation of model accuracy. For instance, ResNetQA applied a combination of 2D and 1D deep residual networks to predict local and global protein quality scores simultaneously. It was trained on data from three sources: CASP, CAMEO, and CATH. In the CASP14 experiment, DeepAccNet, a deep residual network that predicts the local quality score, achieved the best performance in terms of Local Distance Difference Test (LDDT) score loss.
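The GDT-TS score loss mentioned above, and the Pearson's correlation reported alongside it in the figures, can be sketched as follows. This is a minimal illustration that assumes the usual definition of ranking loss (the true GDT-TS of the best model in the pool minus the true GDT-TS of the model the predictor ranks first). The function names and the four-model pool are hypothetical and are not taken from the MULTICOM code.

```python
from math import sqrt


def gdt_ts_loss(true_scores, predicted_scores):
    """Ranking loss for one target: best true GDT-TS in the pool minus
    the true GDT-TS of the model with the highest predicted score."""
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("need two non-empty score lists of equal length")
    top = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_scores) - true_scores[top]


def pearson(xs, ys):
    """Pearson's correlation between true and predicted scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pool of four models for one target (normalized GDT-TS in [0, 1]).
true_gdt = [0.62, 0.71, 0.55, 0.68]
pred_gdt = [0.60, 0.66, 0.50, 0.70]

loss = gdt_ts_loss(true_gdt, pred_gdt)   # predictor picks the 4th model
corr = pearson(true_gdt, pred_gdt)
print(f"loss={loss:.3f} pearson={corr:.3f}")
```

A loss of 0 means the predictor selected the best model in the pool. Averaging the per-target loss over all targets gives a summary figure of the kind quoted for each predictor.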

The boxplots of MULTICOM predictors’ performance on CASP14 targets. (A) GDT-TS score loss. (B) Pearson’s correlation score. Different colors/shapes denote different kinds of targets.

To investigate how residue-residue distance/contact features may improve protein model quality assessment with deep learning, we developed several EMA predictors to evaluate different ways of using contact and distance predictions as features in the 2020 CASP14 experiment. Some of these predictors are based on the features used in our CASP13 EMA predictors, while others use new contact/distance-based features or new image similarity-derived features. The latter treat predicted inter-residue distance maps as images and measure the similarity between the distance map predicted from the protein sequence and the distance map computed directly from the 3D coordinates of a model, an approach that has not been used before in the field. All the methods use deep learning to predict a normalized GDT-TS score for a model of a target, which estimates the quality of the model in the range from 0 (worst) to 1 (best).
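The idea of comparing a sequence-predicted distance map with the map computed from a model's coordinates can be illustrated with a small sketch. The text above does not name the image similarity measures used, so Pearson's correlation and root-mean-square error over residue pairs are used here purely as stand-ins. The function names and the four-residue toy coordinates are hypothetical.

```python
from math import dist, sqrt  # math.dist requires Python 3.8+


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates
    (for example, one C-alpha or C-beta atom per residue)."""
    return [[dist(a, b) for b in coords] for a in coords]


def upper_triangle(dmap):
    """Flatten the entries with i < j; the map is symmetric with a zero diagonal."""
    n = len(dmap)
    return [dmap[i][j] for i in range(n) for j in range(i + 1, n)]


def map_similarity(pred_map, model_map):
    """Stand-in similarity features between two distance maps of equal size."""
    p, m = upper_triangle(pred_map), upper_triangle(model_map)
    n = len(p)
    rmse = sqrt(sum((a - b) ** 2 for a, b in zip(p, m)) / n)
    mean_p, mean_m = sum(p) / n, sum(m) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(p, m))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_m = sum((b - mean_m) ** 2 for b in m)
    return {"pearson": cov / sqrt(var_p * var_m), "rmse": rmse}


# Toy example: the "predicted" map is taken from a reference geometry,
# and the model has its last residue displaced.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

predicted_map = distance_map(reference)
features = map_similarity(predicted_map, distance_map(model))
print(features)
```

In a real pipeline the predicted map would come from a distance predictor run on the sequence, and the resulting similarity values would be fed to the deep network as input features alongside the other quality features.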

The performance of MULTICOM EMA predictors on four different categories of targets (FM, FM/TBM, TBM-hard, and TBM-easy). (A) GDT-TS ranking loss. (B) Pearson’s correlation.

According to the nomenclature in the field, these CASP14 MULTICOM EMA predictors can be classified into two categories. Multi-model methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT, MULTICOM-AI, MULTICOM-HYBRID) use as input some features based on the comparison between multiple models of the same protein target. Single-model methods (MULTICOM-DEEP and MULTICOM-DIST) use only features derived from a single model, without referring to any other model of the target. Multi-model methods performed better than single-model methods in most cases in past CASP experiments. However, multi-model methods may perform poorly when there are only a few good models in the model pool of a target, while the prediction of a single-model method for a model is not affected by other models in the pool. Moreover, single-model methods can predict an absolute quality score for a single protein model, while the score predicted by a multi-model method for a model depends on the other models in the pool. In the following sections, we describe the technical details of these two kinds of methods, analyze their performance in the CASP14 experiment, and report our findings.
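The pool dependence of multi-model scores can be illustrated with a simple consensus feature: the average similarity of a model to every other model in the pool. This is a generic sketch of the multi-model idea rather than the MULTICOM feature set. The similarity matrix is hypothetical; in practice the entries would be a structural similarity measure between pairs of models.

```python
def consensus_scores(similarity):
    """Average pairwise similarity of each model to all other models in the pool."""
    n = len(similarity)
    if n < 2:
        raise ValueError("a consensus score needs at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def sub_pool(similarity, keep):
    """Restrict a similarity matrix to the models whose indices are in `keep`."""
    return [[similarity[i][j] for j in keep] for i in keep]


# Hypothetical symmetric similarity matrix for a pool of three models.
# Models 0 and 1 resemble each other; model 2 is an outlier.
sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

full_pool = consensus_scores(sim)
# Remove model 1 from the pool: the score of model 0 drops although
# model 0 itself has not changed.
reduced_pool = consensus_scores(sub_pool(sim, [0, 2]))
print(full_pool, reduced_pool)
```

A single-model feature, by contrast, is a function of one structure only, so its value for model 0 would be identical in both pools.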

The GDT-TS score of the best structural model (blue dots) for a target, the top model selected by MULTICOM-CONSTRUCT (green dots), and the top model selected by MULTICOM-DEEP (red dots). The closer a green or red dot is to a blue dot, the lower the loss of the corresponding top model. If a green or red dot overlaps with a blue dot, the GDT-TS loss is 0.

A good EMA example (T1028). (A) True distance map (darker color means shorter distance). (B) Predicted distance map. (C) Top model selected by a MULTICOM predictor (MULTICOM-AI) (light blue) versus the true structure (light yellow); both protein structures were visualized with Chimera (version 1.15). The GDT-TS ranking loss is 0. Red rectangles in the maps highlight some long-range contacts.

Chen, X., Liu, J., Guo, Z. et al. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 11, 10943 (2021).

