A substantial challenge for genomic enzymology is the reliable annotation of proteins of unknown function. Described here is an interrogation of uncharacterized enzymes from the amidohydrolase superfamily using a structure-guided approach that integrates bioinformatics, computational biology, and molecular enzymology. Previously, Tm0936 from Thermotoga maritima was shown to catalyze the deamination of S-adenosylhomocysteine (SAH) to S-inosylhomocysteine (SIH). Homologues of Tm0936 were identified, and substrate profiles were proposed by docking metabolites to modeled enzyme structures. These enzymes were predicted to deaminate analogues of adenosine including SAH, 5′-methylthioadenosine (MTA), adenosine (Ado), and 5′-deoxyadenosine (5′-dAdo). Fifteen of these proteins were purified to homogeneity, and the three-dimensional structures of three proteins were determined by X-ray diffraction methods. Enzyme assays supported the structure-based predictions and identified subgroups of enzymes with the capacity to deaminate various combinations of the adenosine analogues, including the first enzyme (Dvu1825) capable of deaminating 5′-dAdo. One subgroup of proteins, exemplified by Moth1224 from Moorella thermoacetica, deaminates guanine to xanthine, and another subgroup, exemplified by Avi5431 from Agrobacterium vitis S4, deaminates two oxidatively damaged forms of adenine: 2-oxoadenine and 8-oxoadenine. The sequence and structural basis of the observed substrate specificities were proposed, and the substrate profiles for 834 protein sequences were provisionally annotated. The results highlight the power of a multidisciplinary approach for annotating enzymes of unknown function.
The rate at which new genes are being sequenced greatly exceeds our ability to correctly annotate the functional properties of the corresponding proteins. Annotations based primarily on sequence identity to experimentally characterized proteins are often misleading because closely related sequences can have different functions, while highly divergent sequences can have identical functions. Unfortunately, our understanding of the principles that dictate the catalytic properties of enzymes, based on protein sequence alone, is often insufficient to correctly annotate proteins of unknown function. New methods must therefore be developed to define the sequence boundaries for a given catalytic activity, and new approaches must be formulated to identify those proteins that are functionally distinct from their close sequence homologues. To address these problems, the authors have developed a comprehensive strategy for the functional annotation of newly sequenced genes using a combination of structural biology, bioinformatics, computational biology, and molecular enzymology. The power of this multidisciplinary approach for discovering new reactions catalyzed by uncharacterized enzymes is being tested using the amidohydrolase superfamily (AHS) as a model system.
Sequence similarity network for proteins related to Tm0936 from Thermotoga maritima. A node (dot) represents an enzyme from a bacterial species, and an edge (a connecting line) indicates that the two proteins are related by a BLAST E-value of 10⁻¹⁰⁰ or better. Proteins sharing sequence similarity with Tm0936 cluster into apparent subgroups, and 12 of these have been arbitrarily numbered and color-coded based on the network diagram. Subgroups are predicted to be functionally similar, and representatives from each subgroup, denoted by the letters a–p, were selected for purification and functional characterization.
The AHS is an ensemble of evolutionarily related enzymes capable of hydrolyzing amide, amine, or ester functional groups at carbon and phosphorus centers. More than 24,000 unique protein sequences have been identified in this superfamily, and they have been segregated into 24 clusters of orthologous groups (COG). One of these clusters, cog0402, contains enzymes that catalyze the deamination of nucleic acid bases. Previously, they successfully predicted that Tm0936, an enzyme from Thermotoga maritima, would catalyze the deamination of S-adenosylhomocysteine (SAH) to S-inosylhomocysteine (SIH). Here they significantly expand the scope of these efforts by addressing the functional and specificity boundaries for more than 1000 proteins homologous to Tm0936, resulting in the prediction and discovery of novel substrate profiles for neighboring enzyme subgroups.
To do so, they have integrated a physical library screen with the computational docking of high-energy reaction intermediates to homology models of fifteen previously uncharacterized proteins. To identify enzymes most closely related to Tm0936, they retrieved all protein sequences related to it at a BLAST E-value better than 10⁻³⁶. This procedure identified 1358 proteins that were further sorted into smaller subgroups through the construction of a sequence similarity network at a BLAST E-value cutoff of 10⁻¹⁰⁰. The minimal sequence identity between any two proteins in this network is 23%, and 12 representative subgroups (sg-1a through sg-11) were arbitrarily defined, color-coded, and numbered.
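The subgrouping step can be illustrated with a minimal sketch. It assumes pairwise BLAST hits are already available as (protein, protein, E-value) tuples, and it treats each connected component of the thresholded network as one candidate subgroup. The function name, the protein identifiers, and the E-values below are hypothetical, and the 12 subgroups reported here were delineated by inspecting the network diagram, not necessarily by this procedure.

```python
from collections import defaultdict


def cluster_by_evalue(hits, cutoff=1e-100):
    """Group protein IDs into connected components of a similarity network.

    hits   : iterable of (id_a, id_b, evalue) tuples (hypothetical input format)
    cutoff : an edge is drawn when evalue <= cutoff
    Returns a list of sorted ID lists, largest component first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both nodes
        if root_a != root_b and evalue <= cutoff:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Made-up hits: P3-P4 is too weak to form an edge at the 1e-100 cutoff.
example_hits = [
    ("P1", "P2", 1e-120),
    ("P2", "P3", 1e-105),
    ("P3", "P4", 1e-60),
    ("P4", "P5", 1e-150),
]
subgroups = cluster_by_evalue(example_hits)
```

Relaxing the cutoff to 1e-36 in this toy example merges all five proteins into one component, which mirrors how a permissive threshold retrieves the full set of homologues while a stringent one splits it into subgroups.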
The three-dimensional structure of Tm0936 (from sg-8) was previously determined in the presence of the product SIH. The most salient structural features for substrate recognition include Glu-84, Arg-136, Arg-148, and His-173. These residues form electrostatic interactions with the 2′- and 3′-hydroxyls from the ribose sugar, the α-carboxylate of the homocysteine moiety, and N3 of the purine ring. The catalytic machinery is composed of a zinc ion that is coordinated by three histidine residues (His-55, His-57, His-173) and an aspartate (Asp-279), while proton-transfer reactions are facilitated by Glu-203 and His-228. The six residues required for metal binding and proton transfers are fully conserved in all 1358 proteins. However, only those proteins within sg-8 fully conserve the four residues that are utilized in the recognition of SIH; sg-1 through sg-7 lack one or both of the carboxylate-binding arginine residues, while sg-9, sg-10, and sg-11 lack the two adenine/ribose recognition residues (the histidine and the glutamate). All of these proteins are therefore anticipated to have distinct substrate profiles and to catalyze the deamination of unanticipated substrates.
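The conservation argument above can be sketched as a simple check. The sketch assumes that a sequence alignment has already mapped each homologue onto Tm0936 numbering, so that the residue found at each of the four recognition positions is known. The function name, the input format, and the label strings are hypothetical and serve only to illustrate the reasoning; they are not the procedure used in this work.

```python
# Recognition residues of Tm0936 named in the text (one-letter codes).
RECOGNITION = {84: "E", 136: "R", 148: "R", 173: "H"}


def recognition_profile(aligned_residues):
    """Report which SIH-recognition residues a homologue conserves.

    aligned_residues : dict mapping a Tm0936 position to the one-letter
                       residue the homologue has at that alignment column
                       (hypothetical input; use '-' for a gap).
    Returns (per-position conservation dict, summary label).
    """
    conserved = {pos: aligned_residues.get(pos) == aa
                 for pos, aa in RECOGNITION.items()}
    arginines_ok = conserved[136] and conserved[148]     # carboxylate binding
    ribose_purine_ok = conserved[84] and conserved[173]  # ribose / purine N3

    if arginines_ok and ribose_purine_ok:
        label = "all four recognition residues conserved"
    elif ribose_purine_ok:
        label = "lacks one or both carboxylate-binding arginines"
    elif arginines_ok:
        label = "lacks adenine/ribose recognition residue(s)"
    else:
        label = "lacks residues from both recognition pairs"
    return conserved, label


# Made-up homologue in which Arg-148 is replaced by a serine.
_, example_label = recognition_profile({84: "E", 136: "R", 148: "S", 173: "H"})
```

A label of this kind only flags that the substrate profile is likely to differ from that of Tm0936; predicting the actual substrate still requires the docking and enzyme assays described in the text.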
Active site of Tm0936. The crystal structure of Tm0936 in the presence of SIH (green) highlights the four residues (dark gray) that are important for substrate binding. Arg-148 and Arg-136 bind to the carboxylate moiety of SAH; His-173 and Glu-84 interact with N3 of the purine ring and the 2′- and 3′-hydroxyls of the ribose moiety, respectively. Faded residues denote the six residues that bind the zinc or facilitate proton-transfer reactions during the catalytic transformations.
Comparison of the crystallographic and modeled structures of Cv1032. (A) The crystal structure of Cv1032 in a catalytically productive state (white ribbon), the crystal structure of Cv1032 in a catalytically unproductive state (cyan ribbon), and the homology model of Cv1032 based on the X-ray structure of Tm0936 (yellow ribbon). The inosine bound in the active site of Cv1032 is highlighted (white stick). (B) The crystal structure of inosine (white stick) in the active site of Cv1032 (transparent white stick) and the docking pose of adenosine (yellow stick) in the modeled active site (transparent green stick) composed of the same set of residues as the active site.
Binding pose of guanine in the homology model of Moth1224. The docking pose of guanine in a high-energy intermediate state (yellow stick) in the modeled active site of Moth1224 (transparent white stick) is presented.
Structure-Guided Discovery of New Deaminase Enzymes Daniel S. Hitchcock, Hao Fan, Jungwook Kim, Matthew Vetting, Brandan Hillerich, Ronald D. Seidel, Steven C. Almo, Brian K. Shoichet, Andrej Sali, and Frank M. Raushel Journal of the American Chemical Society 2013 135 (37), 13927-13933 DOI: 10.1021/ja4066078