Bacterial putative metacaspase structure from Geobacter sulfurreducens as a template for homology modeling of type II Triticum aestivum metacaspase (TaeMCAII)

Metacaspases, cysteine proteases belonging to the peptidase C14 family, are suspected of being involved in the programmed cell death of plants, although their sequences and substrate specificity differ from those of animal caspases. At present, knowledge of the metacaspase reaction mechanism is based only on biochemical data and on homology models built on caspase templates. Here we propose a novel template for metacaspase modeling and demonstrate its important advantages over the conventionally used caspase templates. We also point out the connection between plant and bacterial metacaspases, underlining the prokaryotic roots of programmed cell death (PCD).


INTRODUCTION
Programmed cell death (PCD) is one of numerous eukaryotic innovations, which also include cell cycle regulation, the nuclear envelope, the cytoskeleton and the chromatin remodeling machinery. Apoptosis and autophagy were first observed and described in animals (Liu et al., 2005). Caspases, cysteine-dependent aspartate-directed proteases, play an essential role in animal PCD (Earnshaw et al., 1999).
The similarity of animal apoptosis to plant PCD at the ultrastructural level prompted investigators to search for caspase orthologs in plants (Collazo et al., 2006). However, analyses of the Arabidopsis and Oryza genomes demonstrated the absence of orthologous caspase sequences in plants (Bonneau et al., 2008). In contrast, investigations of plant cells undergoing PCD have shown an increase in protease activity cleaving synthetic caspase substrates (Bonneau et al., 2008). The proteolytic activities observed in plant cells undergoing PCD were designated 'caspase-like' because of the lack of caspase orthologs in the plant kingdom. In silico investigations at the end of the 20th century led to the discovery of distant caspase relatives named metacaspases (Uren et al., 2000). Metacaspases were identified in iterative BLAST searches of the NCBI database using caspase sequences as queries and appeared to be the closest homologs of caspases in the Arabidopsis thaliana genome (Uren et al., 2000). Metacaspases were divided into type I and type II based on their sequence and structure similarity. Type I metacaspases have an N-terminal prodomain similar to that found in the initiator/inflammatory caspases. Type II metacaspases lack such a prodomain, but they contain a linker region between the large and small subunits. Moreover, metacaspases, like caspases, have a conserved catalytic His/Cys dyad and harbor two distinguishable domains in their tertiary structure (Uren et al., 2000; Vercammen et al., 2004). Because of their similarity to caspases, they were expected to exhibit caspase-like activities. Unexpectedly, studies with recombinant Arabidopsis thaliana and Picea abies metacaspases showed that they were unable to cleave caspase substrates (i.e., peptides containing the Asp cleavage sites recognized by classical caspases). In contrast to caspases, their preferred cleavage site was after Arg or Lys in the peptide substrate (Vercammen et al., 2004; Vercammen et al., 2006; Bozhkov et al., 2005). Unfortunately, the lack of 3D structural data for plant metacaspases precludes a thorough understanding of their catalytic mechanism and of how it differs from or resembles that of caspases. Some attempts to model plant metacaspases on the available animal caspase 7 structure have been undertaken (Belenghi et al., 2007; Piszczek et al., in press). However, the low overall sequence similarity and the distinct difference in substrate specificity undermine the value of these preliminary models.
By analyzing the difference in substrate specificity between caspases and metacaspases, one may initially suppose that it is a consequence of the evolutionary history of the whole protein family. Looking closer at the evolution of PCD or apoptosis, one might then conclude that this complicated system appeared for the first time in eukaryotic cells, without direct prokaryotic ancestors. However, a deeper analysis of bacterial life cycles reveals some similarities to programmed cell death in animals, such as spore generation among the bacilli, streptomyces and myxobacteria, or bacterial cell death programmed by plasmid-encoded restriction-modification systems called plasmid addiction systems (Yarmolinsky, 1995). Aravind and Koonin (2002) show phyletic patterns of apoptotic machinery components in Eukaryota and propose a simplified scheme of gradual PCD evolution. The possible scenario, based on the distribution of caspase-like family members and other proteins involved in apoptosis, assumes multiple infusions of bacterial genes. Aravind and Koonin base their theory on two pieces of evidence. First, phylogenetic analysis shows a similarity of eukaryotic metacaspases to their homologs from alpha-proteobacteria. Given that the endosymbiont ancestor of the mitochondrion was an alpha-proteobacterium, this may indicate a bacterial origin of metacaspases. Second, bacterial homologs of caspase-related proteases show greater diversity in phyletic distribution, domain architecture and sequence than their eukaryotic counterparts. We suggest that the direction of horizontal gene transfer was from bacteria to eukaryotes.
Hence, the question arises whether it would be better to look for homology modeling templates for metacaspases among their bacterial homologs. In this study, we focused on the search for an appropriate template and on modeling the structure of Triticum aestivum type II metacaspase (TaeMCAII), the first cereal metacaspase sequenced and submitted to GenBank by our lab (ACCN: GU130248.1, GI:267850616). Knowledge of the structure and function of cereal metacaspases is of great importance for agriculture, as it may make it possible to influence the regulation of PCD processes related to the hypersensitive response and to reactions to abiotic stresses.

MATERIALS AND METHODS
Structure prediction and sequence analysis. A multiple sequence alignment of eight plant metacaspases, sixteen animal caspases and four bacterial sequences possessing the C14 caspase catalytic subunit was constructed using the MUSCLE algorithm and manually adjusted. Structure prediction for plant metacaspases was performed with the Fold Prediction Metaserver (http://genesilico.pl/), which provides access to 13 different fold recognition servers (Kurowski & Bujnicki, 2003). Secondary structure predictions were obtained as a Metaserver consensus built from 16 secondary structure prediction methods (pssfinder, netsurfp, sspred, sspro4, spine, cdm, psipred, fdm, ssp, jnet, sspal, soprano, sable, prof, nnssp and gor).
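The consensus step described above can be illustrated with a minimal sketch. It assumes that each method returns a per-residue string over the three states H (helix), E (strand) and C (coil) and that the consensus is a simple majority vote; the Metaserver's actual weighting scheme is not described here, and the function name, tie-breaking rule and toy predictions are hypothetical.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Per-residue majority vote over equal-length H/E/C prediction strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Assumed tie-break: prefer coil, then strand, then helix.
        winners = [state for state in "CEH" if counts.get(state, 0) == best]
        consensus.append(winners[0] if winners else counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy predictions from three hypothetical methods (not real output).
toy_predictions = ["HHHHCCEEEC", "HHHCCCEEEC", "HHHHHCCEEC"]
print(consensus_secondary_structure(toy_predictions))
```

With 16 methods, as used here, the same vote is taken over 16 strings instead of three.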
Structural models of TaeMCAII. Sequence alignments between TaeMCAII, human caspase 7 (PDB: 1F1J) and the Geobacter sulfurreducens unknown protein GSU0716 were produced by the FFAS03 structure prediction method (Rychlewski et al., 2000; Jaroszewski et al., 2005) and manually adjusted to accommodate the predicted secondary structures. Three-dimensional structure models were constructed with the program MODELLER (Sali et al., 1993; Eswar et al., 2006) using the combined Modeller9v8 Python scripts (automodel mode), modelmultichain and model-ligand. Of the models produced by MODELLER, the one with the most favorable molpdf score was selected for further analysis. The MetaMQAP server (Pawlowski et al., 2008) was used to estimate the correctness of the 3D models using a number of model quality assessment methods in a meta-analysis.
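The model selection step can be sketched as follows. The sketch assumes per-model records shaped like the entries MODELLER exposes after an automodel run (a name, a failure flag and a molpdf value) and assumes that "most favorable" means the lowest molpdf; the file names and scores below are placeholders, not results from this study.

```python
def select_best_model(outputs):
    """Return the successfully built model with the lowest molpdf score."""
    successful = [m for m in outputs if m.get("failure") is None]
    if not successful:
        raise ValueError("no successfully built models to choose from")
    return min(successful, key=lambda m: m["molpdf"])


# Placeholder records (hypothetical names and scores, for illustration only).
example_outputs = [
    {"name": "model_1.pdb", "failure": None, "molpdf": 1250.0},
    {"name": "model_2.pdb", "failure": None, "molpdf": 1180.5},
    {"name": "model_3.pdb", "failure": "optimizer error", "molpdf": 900.0},
]

best = select_best_model(example_outputs)
print(best["name"], best["molpdf"])
```

Failed models are excluded before ranking, so a low score attached to an incomplete model cannot be selected.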
To compare binding energies for the two structures assumed to have a caspase-like domain, human caspase 7 and the bacterial protein GSU0716, Glide 4.5 (Schrödinger) was used to automate ligand-receptor docking in standard and extra-precision modes. Glide (Grid-based Ligand Docking with Energetics) searches for favorable interactions between the receptor (a protein) and the putative ligands (here, selected oligopeptides: the original 1F1J ligand DEVD, the caspase ligand VEID, and two metacaspase ligands rich in basic residues, IRSK and VRPR). After generating a number of possible ligand orientations, the program evaluated the interactions of each of them with the receptor. The best poses entered the final step of the algorithm, an energy minimization of the ligand-receptor complex on the OPLS-AA force field energy grid. Final scoring with the GlideScore function was carried out on the energy-minimized poses, and the ligand positions were then ranked by their GlideScore values. The receptor structure was held rigid, and the ligand structures were fully flexible. For each ligand, about 20 possible conformations were considered. These were generated using the LLMOD mode of the Schrödinger MacroModel package. All poses were scored, and the best-scoring ones for every oligopeptide were saved.
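The bookkeeping at the end of this procedure (keep the best-scoring pose per oligopeptide, then rank the ligands) can be sketched in a few lines. This is not Glide code: it only assumes a list of (ligand, score) pairs in which a more negative score is better. The two best scores are those reported for DEVD and VEID docked to 1F1J; the remaining pose scores are placeholders added to make the example run.

```python
def best_pose_per_ligand(poses):
    """Map each ligand to its most negative (most favorable) docking score."""
    best = {}
    for ligand, score in poses:
        if ligand not in best or score < best[ligand]:
            best[ligand] = score
    return best


def rank_ligands(poses):
    """Return (ligand, best score) pairs ordered from best to worst."""
    return sorted(best_pose_per_ligand(poses).items(), key=lambda item: item[1])


poses_1f1j = [
    ("DEVD", -13.84),  # reported best score for DEVD on 1F1J
    ("DEVD", -11.00),  # placeholder pose score (hypothetical)
    ("VEID", -12.32),  # reported best score for VEID on 1F1J
    ("VEID", -10.50),  # placeholder pose score (hypothetical)
]

for ligand, score in rank_ligands(poses_1f1j):
    print(f"{ligand}\t{score:.2f}")
```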

RESULTS AND DISCUSSION
After running fold predictions for the newly sequenced Triticum aestivum type II metacaspase, one observes a very strong and significant hit that is always at the top of the list. This hit is returned by almost all available fold prediction algorithms, such as COMA, COMPASS, Phyre, FFAS03 and HHsearch (corresponding values are given in Table 1). The hit is named 3BIJ and is the Protein Data Bank record containing the structure of an unknown Geobacter sulfurreducens protein described as GSU0716, the Northeast Structural Genomics target GsR13. There is no information about its putative function or protein family, but one can extract quite interesting clues from structural information and database comparisons alone. GSU0716 finds close homologs within the group of bacterial members of the peptidase_C14 (pfam00656) family and shows homology to plant metacaspases with e-values in the range from 7e-10 to 2e-09. The structural similarity between GSU0716 and caspases is very high; for example, the comparison with human caspase 7 gives a p-value of 6.32e-06 and an RMSD of 2.47 Å (according to the jFATCAT-rigid algorithm run on the PDB database), with only 17% amino acid sequence identity. The secondary structure is 32% helical (11 helices), and 20% is composed of beta-strands (12 strands). A preliminary BLAST search followed by hmmpfam and HHsearch analysis reveals the presence of the Pfam Peptidase C14 domain PF00656, called the caspase domain. All these results seem to confirm the hypothesis that the unknown protein is a metacaspase, but why does its structure score so well? Is it better than the "classical" caspase template for plant type II metacaspase modeling? Multiple alignments of the TaeMCAII, GSU0716 and csp-7 sequences (Fig. 1b) show some striking features. First, the overall percent identity between the bacterial protein and the T. aestivum metacaspase is higher than that of the csp-7/TaeMCAII pair. Second, two regions that are highly conserved in bacterial and plant sequences can be observed in the neighborhood of the catalytic His-Cys dyad. These 5- and 7-amino-acid-long motifs (YSGHG and SDSCHSG) do not have a counterpart in the caspase-7 sequence; however, they seem to be important for plants and other Eukaryota. The most important catalytic residues, His 144, Gly 145 and Cys 186 (in the caspase 7 numbering), are conserved in all three sequences, but Gln 184, located two residues before the catalytic cysteine in caspase-7, is substituted by Asp in the plant metacaspase (as well as in the bacterial template), which suggests that the specificity of the bacterial protein may be more similar to that of the plant enzyme. Additionally, a comparison of the secondary structure prediction consensus obtained from 16 different methods with the secondary structure assignments derived from the two PDB files considered confirms that the secondary structure of TaeMCAII is more similar to that of the bacterial protein than to that of the human caspase. To compare the conservation of these adjacent motifs among the whole group of plant and bacterial metacaspases, an MSA was constructed for seven plant metacaspases, sixteen animal caspases and four bacterial sequences possessing a C14 caspase catalytic domain (Fig. 1a). Multiple alignments for caspases and metacaspases confirm the division of the protein family into two groups, based on differences in the local environment of the catalytic dyad. In bacterial and plant metacaspases, three residues before HG (YSG) and six residues surrounding the catalytic cysteine (SDSCHSG) are strictly conserved (Fig. 1a).
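A simple way to check a sequence for the two conserved metacaspase motifs is an exact-match scan, sketched below. The motif strings YSGHG and SDSCHSG are taken from the text; the function name and the two toy sequences are hypothetical and are not real TaeMCAII, GSU0716 or caspase-7 sequences.

```python
import re

# Motifs flanking the catalytic His and Cys, as given in the text.
METACASPASE_MOTIFS = ("YSGHG", "SDSCHSG")


def find_motifs(sequence, motifs=METACASPASE_MOTIFS):
    """Return {motif: [1-based start positions]} for exact motif matches."""
    sequence = sequence.upper()
    return {
        motif: [m.start() + 1 for m in re.finditer(re.escape(motif), sequence)]
        for motif in motifs
    }


# Toy sequences made up for illustration only.
toy_metacaspase_like = "MAAYSGHGTQLLSDSCHSGGLI"
toy_caspase_like = "MKLSHGAAIQACRGT"

print(find_motifs(toy_metacaspase_like))
print(find_motifs(toy_caspase_like))
```

An empty position list for both motifs, as in the second toy sequence, corresponds to the caspase-like situation described above, in which the motifs have no counterpart.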
In animal caspases, the catalytic dyad neighborhood is also conserved, but its sequence is different. An [LM]S motif can be identified before the catalytic histidine, and an [IV]QACR motif in the vicinity of the catalytic cysteine. The caspase Gln residue from the [IV]QACR motif corresponds to the metacaspase Asp from the conserved SDSCHSG sequence, which presumably contributes to the difference in ligand specificity between caspases and metacaspases (Fig. 1a). Cysteine proteases share a common catalytic mechanism based on a nucleophilic cysteine thiol in the catalytic His-Cys dyad. The first step of the reaction is deprotonation of the thiol group in the enzyme's active site by an adjacent amino acid with a basic side chain, usually a histidine residue. The next step is a nucleophilic attack by the anionic sulfur of the deprotonated cysteine on the substrate carbonyl carbon. In this step, a fragment of the substrate is released with an amine terminus, the histidine residue in the protease is restored to its deprotonated form, and a thioester intermediate linking the new carboxy-terminus of the substrate to the cysteine thiol is formed. The thioester bond is subsequently hydrolyzed to generate a carboxylic acid moiety on the remaining substrate fragment, while regenerating the free enzyme (Domsalla & Melzig, 2008). The surroundings of the catalytic His-Cys dyad determine the substrate specificity, i.e., which peptide will have a greater affinity for the active site and will be bound. Because there is a strict correspondence in this region between TaeMCAII and GSU0716, one can expect that the latter could be a better template for metacaspase-like active site modeling than the less similar animal caspases.
In the next step, the caspase-7 (1F1J) and GSU0716 (3BIJ) PDB structures were compared to identify the main structural similarities and differences and to recognize potential binding sites of the unannotated protein from G. sulfurreducens. The comparison of these experimentally determined structures shows a striking similarity between the secondary structures and overall folds of the two proteins, although there are also some differences; for example, GSU0716 has more helical fragments, all located in the outer part of the protein globule (Fig. 3). A second important observation based on the raw structure data is the presence of conserved positions with respect to the secondary structure elements of caspase-7. The caspase-7 active site residues Cys 186, His 144, Gly 145 and Gln 184 have corresponding residues in the GSU0716 sequence, where one can easily identify counterparts of the catalytic cysteine and histidine, which constitute the catalytic dyad of all cysteine proteases, as well as an equivalent of the catalytic Gly 145, and Asp 133 corresponding to Gln 184 of caspase 7. Looking closer at the molecular surface of the caspase-7 substrate binding site in the structure co-crystallized with the oligopeptide ligand DEVD, one notices that all four of the aforementioned catalytic residues are exposed and contact the potential substrate, which is not observed in its bacterial structural homologue (Fig. 2). In this homologue, only His 84, the counterpart of His 144 in caspase-7, is partially exposed. The catalytic cleft is blocked by a short oligopeptide, IRYRA, that closely covers the surfaces of the Cys 135, Gly 85 and Asp 133 residues. After this fragment is removed, the GSU0716 binding groove looks very similar to that of the caspase. The way in which the IRYRA peptide is positioned in the potential binding cleft of the GSU0716 protein resembles the substrate pose in the 1F1J complex. The exception is the electrostatic potential of the surrounding residues: they are positively charged in caspase-7, whereas the neighborhood of the active site in the 3BIJ structure has a slightly negative character, which could determine the presumably different substrate specificity (Fig. 2).
Animal caspases are cysteine endopeptidases that cleave their substrates on the carboxyl side of aspartate residues (Kaufmann & Hengartner, 2001). Studies with recombinant plant metacaspases, including TaeMCAII, showed that metacaspases are unable to cleave caspase substrates and that their preferred cleavage site is after Arg or Lys residues (Vercammen et al., 2004; Bozhkov et al., 2005; Vercammen et al., 2006; Piszczek et al., 2012). Arginine and lysine have positively charged side chains; thus, negatively charged residues surrounding the active site can facilitate ligand binding.
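The charge argument can be made concrete with a crude estimate of the net side-chain charge of the peptides discussed in this work. The sketch assumes formal charges at neutral pH (Asp and Glu as -1, Arg and Lys as +1) and ignores histidine, the termini and any pKa shifts, so it is only a rough illustration and not an electrostatics calculation.

```python
# Assumed formal side-chain charges at neutral pH; all other residues count as 0.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def net_side_chain_charge(peptide):
    """Sum the formal side-chain charges of a peptide in one-letter code."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide.upper())


# Caspase ligands, metacaspase ligands and the 3BIJ fragment named in the text.
for peptide in ("DEVD", "VEID", "IRSK", "VRPR", "IRYRA"):
    print(f"{peptide}\t{net_side_chain_charge(peptide):+d}")
```

Under these assumptions the caspase ligands carry a net negative charge and the metacaspase ligands a net positive one, which matches the expectation that they favor positively and negatively charged binding sites, respectively.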
The dislodged fragment of the 3BIJ structure (the IRYRA peptide) contains two arginine residues and could therefore be a ligand for metacaspase-like proteases. To verify the putative affinity of metacaspase-like ligands for the GSU0716 protein, we performed ligand docking for two receptors: the original structure of caspase-7 and the potential G. sulfurreducens template. Four different ligands were chosen for docking: two "caspase" ligands (Asp-Glu-Val-Asp and Val-Glu-Ile-Asp) and two potential "metacaspase" ligands (Ile-Arg-Ser-Lys and Val-Arg-Pro-Arg). The docking results confirmed the important role of the electrostatic character of the catalytic dyad surroundings (Supplementary Materials). The best-scoring ligands for the 1F1J structure were the DEVD and VEID peptides, with GlideScores of -13.84 and -12.32, respectively. The results obtained for typical metacaspase-like substrates were visibly worse (GlideScore above -9). For the GSU0716 receptor, the results were the opposite: the metacaspase-like ligands gave better scores than the caspase peptides (twice as high) (Supplementary Materials). All these observations encouraged us to model the TaeMCAII structure on the G. sulfurreducens template, which has been described in detail in another paper (Piszczek et al., 2012, in press).
Recently, the crystal structure of a Trypanosoma brucei metacaspase was published (McLuskey et al., 2012).
The overall sequence similarity of the newly crystallized eukaryotic metacaspase T. brucei MCA2 to TaeMCAII is somewhat higher than the similarity between TaeMCAII and the bacterial protein GSU0716 (42% identity versus 30%, respectively), but the FFAS03 score for GSU0716 as a potential modelling template for the type II metacaspase from wheat is still better (-59.9 versus -56.4). Looking closer at all three sequences, one notices that the two motifs surrounding the catalytic dyad are generally conserved in all the compared proteins, but in the neighborhood of the catalytic cysteine T. brucei MCA2 differs more from the corresponding region of the plant TaeMCAII than the bacterial GSU0716 does. All seven amino acids (SDSCHSG) from T. aestivum MCAII are conserved in GSU0716, but in the T. brucei metacaspase only the CHSG residues are present, preceded by an FDC triplet. It is difficult to interpret the meaning of this difference, but the substitution of serine by cysteine is rarely insignificant.
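The difference between the two motifs can be listed position by position with a short sketch. The plant and bacterial motif SDSCHSG is taken from the text; the T. brucei string FDCCHSG is assembled here from the description "CHSG preceded by an FDC triplet" and should be treated as an assumption, as should the function name.

```python
def compare_motifs(reference, other):
    """Return (mismatches, percent identity) for two equal-length motifs.

    Each mismatch is a tuple (1-based position, reference residue, other residue).
    """
    if len(reference) != len(other):
        raise ValueError("motifs must have the same length")
    if not reference:
        raise ValueError("motifs must not be empty")
    mismatches = [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(reference, other))
        if a != b
    ]
    identity = 100.0 * (len(reference) - len(mismatches)) / len(reference)
    return mismatches, identity


plant_motif = "SDSCHSG"      # TaeMCAII, also conserved in GSU0716
t_brucei_motif = "FDCCHSG"   # assumed: FDC triplet followed by CHSG

mismatches, identity = compare_motifs(plant_motif, t_brucei_motif)
print(mismatches)            # positions where the two motifs differ
print(f"{identity:.1f}% identical")
```

Under this assumption, the serine-to-cysteine substitution mentioned above falls at the third position of the motif, directly before the catalytic cysteine.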

CONCLUSIONS
Until now, the lack of experimentally resolved tertiary structures for plant metacaspases has seriously limited researchers. The only available homology model of Arabidopsis thaliana metacaspase 9 is preliminary and is based on the tertiary structure of the human caspase-7 proenzyme, which is a rather distant homologue of metacaspases. The publication of the first eukaryotic metacaspase structure (MCA2 from Trypanosoma brucei) (McLuskey et al., 2012) should significantly change the situation.
In this work, we have shown that the crystal structure of the unknown bacterial protein GSU0716 can be used as a template for homology modeling of the wheat metacaspase TaeMCAII. We recommend the use of this unconventional template because of the relatively high sequence conservation in the regions surrounding the putative catalytic residues (Cys 135 and His 84) and the high fold prediction scores. Another argument in favor of our choice is that the surroundings of the GSU0716 putative catalytic site are negatively charged, as opposed to the vicinity of the catalytic centre in human caspase-7, which is enriched in positively charged side chains. The character of the residues forming a protease binding site undoubtedly influences the substrate specificity of the enzyme. It is now well known that plant metacaspases differ in their substrate specificity from animal caspases. Although the activity profile of the unknown bacterial protein GSU0716 has not been investigated thus far, our in silico studies suggest that it has a metacaspase-like preference for substrates. The obtained results support numerous phylogenetic analyses indicating a bacterial origin of metacaspases.

Figure 2. Electrostatic potential surface map for two potential templates for TaeMCAII modelling. (a) Human caspase 7. (b) Bacterial protein GSU0716. The DEVD ligand of caspase-7 and the IRYRA peptide of GSU0716 are shown in stick representation in the CPK colour scheme.

Figure 3. Comparison of structural models. (a) GSU0716 protein. (b) Human caspase-7. Top: overall structure of the catalytic (asymmetric) domains. Bottom: zoom on the active sites with the hypothetical catalytic dyad and the Gln (Asp) residues differentiating the surface of the binding cleft. The DEVD ligand of the caspase-7 structure and the IRYRA fragment of 3BIJ are represented as white ribbons. Models are depicted in ribbon representation with secondary structure succession coloring. Hypothetical catalytic residues are shown as stick models in CPK colors.

Table 1. Results of the Fold Prediction Metaserver run: scores obtained for the TaeMCAII and GSU0716 raw FASTA sequences and for selected plant type I and type II metacaspase sequences
tein Database record containing the structure of an unknown Geobacter sulfurreducens protein described as GSU0716, the Northeast Structural Genomics target GsR13.There is no information about its putative function or protein family, but one can extract quite interesting clues based on only structural information and database comparisons.GSU0716 finds close homologs with-in the group of bacterial members of peptidase_C14 pfam00656 family and shows homology to plant metacaspases with e-values in the range from 7e-10 to 2e-09.The structural similarity between GSU0716 and caspases is very high; for example, human caspase 7 p-value of comparison is 6.32e-06, with the RMSD of 2.47 Å (according to the jFATCAT-rigid algorithm ran on the