A computational approach to structural properties of glycoside hydrolase family 4 from bacteria

Structural bioinformatics approaches applied to the alpha- and beta-glycosidases from the GH4 enzyme family reveal that, despite low sequence identity, these enzymes possess quite similar global structural characteristics reflecting a common reaction mechanism. Locally, there are a few distinctive structural characteristics of GH4 alpha- and beta-glycosidases, namely, surface cavities with different geometric characteristics and two regions with highly dissimilar structural organizations and distinct physicochemical properties in the alpha- and beta-glucosidases from Thermotoga maritima. We suggest that these structurally dissimilar regions may be involved in specific protein-protein interactions, and this hypothesis is supported by the distinct predicted functional partners of the investigated proteins. We also predict that alpha- and beta-glycosidases from the GH4 enzyme family interact with difenoconazole, a fungicide, but the features of these interactions differ, especially in the identified structurally distinct regions of the investigated proteins.


INTRODUCTION
Glycoside hydrolases (GH), also called glycosidases, are enzymes that catalyze the hydrolysis of glycosidic bonds, releasing smaller sugars. They play important roles both in nature and in industrial processes that involve biological conversion of biomass to fuels, allowing fuel production at reduced cost (Yang et al., 2011). There are two types of glycosidases: alpha-glycosidase, which acts on 1-4 linked alpha-glucose residues, and beta-glycosidase, which acts upon beta 1-4 bonds linking two glucose or glucose-substituted molecules (McCarter and Withers, 1994). The CAZy (Carbohydrate-Active enZYmes) database (Cantarel et al., 2009) contains a sequence-based classification for at least 130 families of glycoside hydrolases. Glycoside hydrolase family 4 (GH4) enzymes are called glucosidases. They represent a special group of glycosidases, which includes both alpha- and beta-glucosidases, and displays a reaction mechanism involving NAD+ and divalent metal ion cofactors (Lodge et al., 2003). The group comprises alpha-glucosidases, alpha-galactosidases, alpha-glucuronidases, 6-phospho-alpha-glucosidases, and 6-phospho-beta-glucosidases. The majority of GH4 enzymes are of bacterial origin, the 6-phospho-beta-glucosidase from Thermotoga maritima (Yip & Withers, 2006) and the 6-phospho-alpha-glucosidase from Bacillus subtilis (Yip et al., 2007) being the most studied enzymes of this family.
The genomes of thermophilic bacteria, such as Thermotoga maritima and Geobacillus stearothermophilus, encode a number of glycosidic enzymes that are involved in sugar and polymer catabolism by anaerobic fermentation. The by-products are carbon dioxide and hydrogen gas, the latter being used as a fuel (Chhabra et al., 2002; Ugwuanyi, 2008). These bacteria grow in hot waters of 35-90°C and their carbohydrate active enzymes are thermostable, which explains their numerous biotechnological applications (Conners et al., 2006). It also explains the extensive studies of the biochemical and structural features of carbohydrate active enzymes from thermophilic bacteria.
Recent studies have shown that fungal and bacterial beta-glucosidases have favorable properties for use in biotechnological applications (Del Pozo et al., 2012; Pei et al., 2012).
Crystallographic data are available for some bacterial GH4 enzymes. The Protein Data Bank (Berman et al., 2000) contains 9 entries for structures of glycosidic enzymes and their complexes from: Thermotoga maritima (5, but 3 of them refer to the same protein in different complexes), Thermotoga neapolitana (1), Bacillus subtilis (2) and Geobacillus stearothermophilus (1). Accordingly, we consider only 7 structural files in our study, as explained in the "Material and methods" section.
It is well known that enzymatic activity is influenced by structural features of both the enzyme and the substrate. This is also true for the enzymes hydrolyzing glycosidic bonds, but due to the complexity of these enzymes and of carbohydrate polymers, the mechanisms involved in glycosidic bond hydrolysis are still not fully understood. For example, the comparison of the 6-phospho-beta-glucosidase of Thermotoga maritima with the 6-phospho-alpha-glucosidase from Bacillus subtilis reveals a high degree of structural similarity between the two enzymes, reflecting a possible common reaction mechanism for the two glucosidases with the specificity assured by small structural differences (Varrot et al., 2005).
Difenoconazole (DFC) is a broad-spectrum fungicide used on a variety of fruit and vegetable crops (Thom et al., 1997). Once applied, it remains in soil for a considerable period of time. The European Food Safety Authority (EFSA) assessed the toxicity of this substance to soil macro-organisms as low, but no long-term data are available, and no studies of its toxicity to soil microbiota have been made to evaluate the potential effect on soil microbial communities (EFSA report, 2011). Difenoconazole degradation by the soil microbial community is important in pollution prevention because the fungicide may be washed from soil, reach the groundwater and pollute the aquatic environment. It is known that DFC is very toxic to aquatic organisms and may cause long-term adverse effects in aquatic environments (EFSA report, 2011). For these reasons it is important to obtain new and valuable information about the interactions of soil microbial communities with DFC in order to prevent pollution of the aquatic environment and to avoid soil degradation.
The goal of this study is a comparative analysis of the structural and molecular properties of the bacterial alpha- and beta-glucosidases belonging to the GH4 family, to obtain more detailed knowledge and to improve our understanding of their specific interactions. The possible interactions of the GH4 family alpha- and beta-glucosidases with difenoconazole are also investigated.

MATERIAL AND METHODS
In this paper several bioinformatics tools are used to reveal sequence and structure similarities or dissimilarities of bacterial alpha- and beta-glucosidases belonging to the GH4 family. Our analysis is based on the sequences and three-dimensional structures available for the considered proteins. The Protein Data Bank (PDB) entry codes of the structural files for the GH4 enzyme family from bacteria, as well as the entry codes for the sequences of these enzymes in the UniProt database (Leinonen et al., 2006) and their Enzyme Commission numbers (EC number), are presented in Table 1.
Sequence similarity between the GH4 enzymes is analyzed by multiple sequence alignment using the ClustalW software (Larkin et al., 2007); global and local physicochemical properties of the protein chains are retrieved using the ProtParam tool (Gasteiger et al., 2005). We consider in our calculation the following properties computed using ProtParam: theoretical isoelectric point (pI), net charge, the aliphatic index and the grand average of hydropathicity (GRAVY).
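As an illustration of two of the ProtParam quantities named above, the following minimal sketch computes GRAVY as the mean Kyte-Doolittle hydropathy per residue and the aliphatic index as a weighted sum of the mole percentages of Ala, Val, Ile and Leu (Ikai, 1980). The function names and the toy sequences are hypothetical; this is not the ProtParam implementation itself, and it assumes sequences contain only the 20 standard one-letter residue codes.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue type."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequences (hypothetical, for illustration only).
toy_gravy = gravy("AAAA")
toy_aliphatic = aliphatic_index("AVIL")
```

A negative GRAVY value indicates an overall hydrophilic chain, and a high aliphatic index is commonly associated with thermostability.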
The degree of dissimilarity of two three-dimensional protein structures is measured using the root-mean-square distance (RMSD) between equivalent atom pairs (Carugo & Eisenhaber, 1997). A zero value for the RMSD means identical structures, and the value increases for dissimilar structures. Structural similarity of the considered enzymes is compared using the structure matching tool in the Chimera software (Pettersen et al., 2004), and the surface and volume of each protein are also computed.
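The RMSD over N equivalent atom pairs is the square root of the mean squared distance between paired atoms. The sketch below is a minimal illustration, assuming the two coordinate sets are already superimposed and listed in equivalent order (the fitting step that Chimera performs is not reproduced here); the function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates,
    assumed to be already superimposed and paired atom by atom."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("no atom pairs given")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: the second structure is the first one shifted by 1 Å along x.
structure_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
structure_2 = [(x + 1.0, y, z) for x, y, z in structure_1]
shifted_rmsd = rmsd(structure_1, structure_2)
identical_rmsd = rmsd(structure_1, structure_1)
```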
In the case of 6-phospho-beta-glucosidase from Thermotoga maritima (BglTm), there are three structural files: 1UP4 (Varrot et al., 2005) for the protein octamer in the monoclinic form, 1UP6 (Yip et al., 2004) for the protein octamer in the tetragonal form in complex with manganese, NAD and glucose-6-phosphate, and 1UP7 (Varrot et al., 2005) for the protein octamer in the tetragonal form in complex with NAD and glucose-6-phosphate. The superimposition of the three determined crystallographic structures of BglTm shows RMSD values of 0.246 Å for 1UP6 compared to 1UP4, 0.252 Å for 1UP6 compared to 1UP7 and 0.275 Å for 1UP7 compared to 1UP4. As all the RMSD values are small, we consider these three structures to be highly similar, and in our further analysis we use chain A of the 1UP6 structural file because it corresponds to the complex of the enzyme with both the cofactors (NAD and manganese ions) and the product (glucose-6-phosphate).
As the crystallographic structure of alpha-glucosidase from Thermotoga maritima (AglTm) represents a dimer and the overall root-mean-square deviation between the two monomers is 2.18 Å (Lodge et al., 2003), we have considered both monomers in our analysis. The biggest difference between the structures of monomers A and B is observed for region 316-355, which includes helix L (333-346); this helix is rotated in monomer B by 51.70° in comparison to the same helix in monomer A (Lodge et al., 2003).
The structural file of the putative alpha-galactosidase from Bacillus subtilis (pAglBs), PDB entry code 3FEF, represents a homotetramer. Structure superimposition of the monomers shows RMSDs for 434 aligned Cα atom pairs between 0.197 Å and 0.312 Å. As these values are small, we consider only chain A in our further analysis. Similarly, the structural file of the putative alpha-galactosidase from Thermotoga neapolitana (pAglTn) represents a homohexamer (3U95, Leisch et al., 2012), but the RMSD values for 462 aligned Cα atom pairs are small (between 0.191 Å and 0.274 Å) and we have considered only chain A in our studies.
Analysis and comparison of protein surface shapes and physicochemical properties, especially electrostatics and hydrophobicity, have provided a valuable contribution to the elucidation of protein function and molecular interactions. Within this study the surface properties of the considered enzymes are expressed in terms of surface cavities and the global surface roughness, which is quantitatively characterized by the global surface fractal dimension. This quantity is defined using fractal geometry concepts and its calculation is based on the method proposed by Lewis and Rees (1985), which considers the scaling law between the surface area (SA) and the radius of a probe molecule (R) rolling on the surface. The surface fractal dimension is determined from the slope of the double logarithmic plot of SA versus R. The surface area of the protein is computed using the free online software GETAREA (Fraczkiewicz & Braun, 1998; http://curie.utmb.edu/getarea.html) and probe radii of 1, 1.2, 1.4, 1.6, 1.8 and 2 Å. Surface properties of the investigated proteins are also analyzed using the CASTp (Dundas et al., 2006) and 3Dsurfer (Li et al., 2008) online tools. These tools allow detection, visualization and characterization of cavities and/or protrusions present at the protein surface and thus characterization of its local geometric properties. Electrostatic properties are investigated using the PyMol software (DeLano, 2002).
Predicted functional partners for proteins may be obtained using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), a freely available online tool (Snel et al., 2011). We only considered those interacting partners that had high confidence interaction scores, i.e. higher than 0.700.
The possible interactions of the GH4 enzymes with difenoconazole were analyzed by molecular docking using the SwissDock server with default parameters and the accurate docking option (Grosdidier et al., 2011). The targets were prepared uniformly as input for the docking experiments by removing the ligand from the structural file (except for cations, where present) and using the Dock Prep tool of the CHIMERA software (Pettersen et al., 2004). The three-dimensional structure of difenoconazole was generated using the FROG (Free Online druG conformation generation) software (Leite et al., 2007), starting from its chemical formula in SMILES (Simplified Molecular-Input Line-Entry System) format (Weininger, 1988).
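The slope-based estimate of the surface fractal dimension can be sketched as follows. The sketch assumes the commonly cited form of the Lewis and Rees scaling law, SA ∝ R^(2−D), so that D = 2 − slope of log(SA) versus log(R); whether exactly this form was used here is an assumption. The function name is hypothetical and the surface areas are synthetic values generated for illustration, not GETAREA output.

```python
import math


def surface_fractal_dimension(probe_radii, surface_areas):
    """Estimate D from a least-squares fit of log(SA) against log(R),
    assuming the scaling law SA ~ R**(2 - D), i.e. D = 2 - slope."""
    if len(probe_radii) != len(surface_areas) or len(probe_radii) < 2:
        raise ValueError("need at least two (radius, area) pairs")
    xs = [math.log(r) for r in probe_radii]
    ys = [math.log(a) for a in surface_areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return 2.0 - slope


# Probe radii used in the text (in Å).
radii = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
# Synthetic areas built with a known D = 2.2 (hypothetical data).
synthetic_areas = [20000.0 * r ** (2.0 - 2.2) for r in radii]
estimated_d = surface_fractal_dimension(radii, synthetic_areas)
```

With real data the points scatter around the fitted line, so the quality of the linear fit should be checked before interpreting D.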
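The confidence cut-off described above amounts to a simple filter on the combined interaction score. In the minimal sketch below, the partner names and scores are hypothetical placeholders rather than STRING output, and the function name is an assumption made for illustration.

```python
def high_confidence_partners(partners, threshold=0.700):
    """Keep only (name, score) pairs whose score is strictly higher
    than the threshold, as stated in the text."""
    return [(name, score) for name, score in partners if score > threshold]


# Hypothetical predicted partners with combined scores.
predicted = [
    ("partner_a", 0.912),
    ("partner_b", 0.700),
    ("partner_c", 0.455),
]
selected = high_confidence_partners(predicted)
```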

RESULTS AND DISCUSSION
Sequence alignment, presented in Fig. 1, reveals similarity scores between 10 and 50 for the sequences of the considered glucosidases, as presented in Table 2. Figure 1 also presents the elements of secondary structure corresponding to AglTm, 1OBB chain A (Lodge et al., 2003).
The similarity score, calculated for every pair of aligned sequences, is the number of identities between the two sequences divided by the length of the alignment and represented as a percentage (Larkin et al., 2007). For the alpha-glucosidases the similarity scores vary between 10.14 and 50.53, and for the beta-glucosidases the similarity score is 36.39.
The low degree of sequence identity is not reflected in the structural properties of the GH4 enzymes. Structure matching using the CHIMERA software reveals a high degree of similarity between the analyzed structures, which is expressed in terms of RMSD values for pairs of structures and presented in Table 2.
Based on the structural and sequence similarities presented in Table 2, two groups can be distinguished: enzymes 1OBB, 1VJT and 3U95 form the first group and 1U8X, 3FEF, 1UP6 and 1S6Y form the second group. The first group comprises alpha-glucosidases from Thermotoga maritima and Thermotoga neapolitana and is characterized by high sequence identity and structure similarity. Within this group, the superposition of chain A of AglTm with the other enzymes usually reveals smaller RMSD values than those obtained for chain B, so in our further analysis we consider only the A chain.
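The score definition given in the text (identities divided by alignment length, as a percentage) can be written as a short function. This is a sketch of that stated definition only; the exact denominator and gap handling in ClustalW may differ, and the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identities between two aligned sequences divided by the
    alignment length, expressed as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identities = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identities / len(aligned_a)


# Toy pairwise alignment (hypothetical): 3 identities over 5 columns.
toy_score = percent_identity("ACDE-", "ACDFG")
```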
Alpha-glucosidases from Bacillus subtilis (AglBs) show small sequence and structure identity to those found in Thermotoga maritima and Thermotoga neapolitana; they bear more resemblance to the investigated beta-glucosidases.
The global properties of the investigated proteins are quite similar, as presented in Table 3.
The analyzed proteins have high aliphatic index values, revealing a high relative volume occupied by amino acids with aliphatic side chains. This is in good agreement with their known increased thermostability (Conners et al., 2006). It is also known that proteins found in thermophilic bacteria are characterized by high values of the aliphatic index (Ikai, 1980). The grand average of hydropathicity (GRAVY) indices have negative values, illustrating the hydrophilic character of the investigated proteins. This is in agreement with their net charges and low theoretical isoelectric points, which indicate their acidic character.
The computed surface fractal dimensions are comparable, and no significant differences have been observed between alpha- and beta-glucosidases. Varrot and coworkers (2005) performed a structural comparison between the A chains of four enzymes belonging to the GH4 family: AglTm (PDB code 1OBB), AgrTm (PDB code 1VJT), AglBs (PDB code 1U8X) and BglTm (PDB code 1UP6). They found that these structures were similar, with some structural differences situated in the central region, comprising residues 220-310 of BglTm (PDB code 1UP6).
We extend the structural comparison to the A chains of the other 3 enzymes considered in this study: pAglTn (PDB code 3U95), pAglBs (PDB code 3FEF) and BglGs (PDB code 1S6Y). Comparative structural analysis has been performed using the PyMol software (DeLano, 2002) and reveals the following:
There are no regions with distinct structural organization between alpha-glucosidase (1OBB_A) and alpha-glucuronidase (1VJT) from Thermotoga maritima.
The LYS315-LYS351 region of AglTm (1OBB_A) has no correspondence in the structure of BglGs (1UP6_A), where 25 residues are not located in the structure and the sequence is 30 residues shorter; for this reason we cannot analyze their structural similarity.
The LYS327-LYS338 region of pAglTn monomer A contains a short helix LYS327-HIS332 and the rest is unstructured.
All presented results reveal that at least one of the regions, PRO257-SER289 and LYS315-LYS351 of AglTm, is structurally distinct from the other investigated glucosidases, and we focus our attention on these regions and compare their properties with those of the corresponding regions in the other investigated proteins. We must also mention that the structural differences in the two regions of AglTm are not unexpected, since these regions correspond to the insertion/deletion part of the glucosidase sequences. Moreover, these structurally distinct regions could be associated with substrate specificity.
The global physicochemical properties of these regions are presented in Table 4.
The different physicochemical properties of the regions with distinct structural organization in the alpha-glucosidases and beta-glucosidases correlate well with the sequence alignment, reflecting sequence dissimilarity. Regions 1 of AglTm and AglBs are more hydrophilic and unstable than the corresponding region of BglTm. Also, region 1 of AglTm has a basic character, like region 1 of BglTm, whereas region 1 of AglBs has an acidic character. This observation is also supported by the higher electrostatic potential of region 1 of BglTm. Within region 1, residues ASP260 and ARG263 of AglTm are implicated in interactions with the substrate, i.e. maltose (Lodge et al., 2003), residue TYR265 of AglBs is involved in the interaction with alpha-D-glucose-6-phosphate (Rajan et al., 2004), and the corresponding residues of BglTm are not involved in interactions with either substrate or cofactors (Yip et al., 2004).
The identified structurally distinct regions 2 also differ in terms of thermostability, hydrophilicity and electrostatic properties. Within region 2 of 1OBB, the 313-334 fragment shows notable disorder, reflected by high values of the temperature factors (Lodge et al., 2003). Furthermore, the 316-355 region contains the 332-347 helix, which has a distinct orientation in the A and B monomers of the dimer. This region not only has different structural properties in the two monomers of 1OBB, it is also distinct from the corresponding regions of the other investigated glucosidases. It seems that this region is implicated in the dimerization process (Lodge et al., 2003). Except for the GLY290 residue of BglTm, which is involved in the interaction with alpha-D-glucose-6-phosphate (Yip et al., 2004), the residues belonging to regions 2 of the investigated proteins are not involved in interactions with substrates or cofactors (Lodge et al., 2003; Leisch et al., 2012; Rajan et al., 2004; Yip et al., 2004).
Comparison of the surfaces of the investigated proteins reveals distinctive local surface features, suggesting potentially distinct interacting partners. The number of surface cavities differs among the investigated proteins, and the geometric properties of the first three cavities (considered the biggest) are also distinct, as presented in Table 5.
STRING results for the predicted functional partners of the investigated proteins are given in Table 6. STRING predicts the functional association between proteins based on the genomic association of their genes. For the query gene, the program retrieves all the genes that occur in its proximity; such genes tend to encode functionally interacting proteins that are part of the same protein complex or are members of the same metabolic pathway (Snel et al., 2000).
For AglTm, AgrTm and BglTm there is only one predicted common interaction partner, beta-glucosidase. Unfortunately, there are no experimental data to confirm these predicted interactions.
Using molecular docking on the SwissDock server (Grosdidier et al., 2011) we have tested the possible interaction between the investigated enzymes and difenoconazole, a fungicide. All considered enzymes are predicted to interact with difenoconazole, but there are some differences, especially concerning the identified structurally distinct regions of the proteins. The energies of the best scored pose for every enzyme-difenoconazole interaction are presented in Table 7.
The interaction energies for the most favorable interactions with the DFC molecule are comparable for all investigated proteins. In the case of AglTm, AgrTm and BglTm, the structural files also contain ligands (other than cations). We notice that DFC is able to bind to the protein in the same binding region as the ligands. This means that DFC can modulate the catalytic activity of these enzymes, although, based only on the information we have collected, we cannot predict whether this modulation will be an inhibition or an activation of the catalytic reaction with the natural substrate. We do not exclude the possibility that a catalytic reaction is initiated during the interaction of the enzyme with DFC. Figure 6 illustrates the identified poses for the predicted interactions of DFC with AglTm (A) and BglTm (B), and Fig. 7 illustrates the poses for the predicted interactions of DFC with the PRO257-SER289 and LYS315-LYS351 regions of AglTm.
For region 1 (PRO257-TRP295) of AglTm, there are three identified positions for DFC binding to the protein and the corresponding interaction energies are -2804.92 kcal/mol, -2800.82 kcal/mol and -2799.60 kcal/mol, respectively. For region 2 (LYS315-LYS351) of AglTm, there is only one position for DFC binding and the interaction energy is -2802.36 kcal/mol.
In the case of BglTm (1UP6), for region 1 (PRO240-LEU260) there are seven identified poses for DFC binding, all of them involving the TYR242 residue, with interaction energies between -2508.49 kcal/mol and -2504.06 kcal/mol. For region 2 (SER271-HIS300), there are 12 poses for DFC binding, with interaction energies between -2511.93 kcal/mol and -2496.30 kcal/mol. Four residues, GLU271, ARG277, ARG289 and TYR294, are involved in these interactions. For pAglBs, there are four identified poses for DFC binding to the ARG315-GLU330 region, all of them involving the GLU300 residue, with interaction energies between -2188.35 kcal/mol and -2176.51 kcal/mol.
Note to Table 6: for the 6-phospho-beta-glucosidase from Geobacillus stearothermophilus (P84135), STRING was unable to find the protein, as it has no assigned gene name.
There are no predicted binding sites for DFC in the identified structurally distinct regions of AglBs (1U8X) and pAglTn (3U95_A). Analysis of the predicted poses for DFC interactions with AglBs reveals the involvement of residues ARG160, ARD162, GLU344, VAL343 and GLU370. Also, the DFC interactions with pAglTn usually involve residues ARG12, TYR85, TYR87, GLU304, ARG307 and GLU311. We notice that both charged and hydrophobic residues are important for DFC binding to GH4 enzymes.

CONCLUSIONS
A major challenge in biotechnological applications is to understand the mechanism of action of carbohydrate active enzymes from thermophilic bacteria in terms of their interactions and activity. As far as we know, this is the first study that compares the structural properties of the GH4 enzyme family using structural bioinformatics approaches. Our results reveal similar global structural characteristics of these enzymes despite low sequence identity, and are in good agreement with other published data (Rajan et al., 2004). This global structural similarity reflects a common reaction mechanism of these enzymes involving NAD+ and divalent metal ions as cofactors. Our findings also agree with the observation that the difference in substrate specificity between alpha- and beta-glucosidases from the GH4 family is due to simple steric factors and subtle modification of the protein conformation (Varrot et al., 2005) and not to their distinct structural properties.
Except for AglTm (ASP260 and ARG263) and AglBs (TYR265), residues belonging to region 1 are situated in the interior of the proteins and are not involved in cofactor binding, catalytic activities or specific interactions with the substrates (Lodge et al., 2003; Leisch et al., 2012; Rajan et al., 2004; Varrot et al., 2005; Yip et al., 2004). Also, except for GLY290 of BglTm, residues belonging to region 2 are not involved in catalytic activities, but they are exposed to the solvent (Lodge et al., 2003; Leisch et al., 2012; Rajan et al., 2004; Varrot et al., 2005; Yip et al., 2004). This suggests that these structurally distinct regions may be involved in oligomerization processes or in other specific protein-protein interactions. This hypothesis is supported by the presence of cavities with different geometric properties at the protein surfaces and by the distinct predicted functional partners. Also, molecular docking studies reveal that all the investigated proteins are able to bind the fungicide difenoconazole, but there are some differences in difenoconazole binding to the structurally distinct regions of the proteins. For the moment, we are not able to conclude whether difenoconazole binding to GH4 enzymes increases or decreases the enzyme activity, or modulates it in an allosteric manner. Further experimental data concerning GH4 enzyme family structures and interactions are needed to obtain detailed knowledge of their reaction and interaction mechanisms, with direct implications for their biotechnological applications.

Figure 1. Alignment of the sequences of bacterial glucosidases. Regions with the highest structural dissimilarity are highlighted in grey (see Fig. 2-5). The elements of secondary structure corresponding to AglTm (O33830/1OBB chain A) are also presented with notations h for helix and e for sheet (Lodge et al., 2003).

Figure 6. Illustration of the identified poses for the predicted interactions of DFC (grey spheres) with AglTm (A) and BglTm (B), respectively. The proteins are shown as dark grey ribbons, and the regions with distinct structural organization are shown in black: PRO257-TRP295 and LYS315-LYS351 for AglTm (A), PRO240-LEU260 and GLU271-HIS300 for BglTm (B).

Figure 7. Illustration of the identified poses of DFC (grey spheres) located in the regions PRO257-TRP295 (A, B and C) and LYS315-LYS351 (D) of AglTm (black spheres).