MHC-like protein

The cytomegalovirus (CMV) genome encodes four clusters of genes expressed immediately after infection: UL36-38, UL122-123, TRS1-IRS1, and US3. The general function of these genes is associated with inhibition of the cellular mechanisms of antiviral response. Although several biological processes have been mapped onto specific gene products, knowledge of the molecular mechanisms of their activity remains fragmentary. Here, we report the application of protein structure prediction methods to assign a function to a glycosylated domain encoded by UL37 of CMV (gpUL37, UL37x3). The detected similarity clearly indicates that this domain represents a novel type of major histocompatibility complex (MHC)-like protein and consequently may play a central role in an additional mechanism of escape from the antiviral response.


INTRODUCTION
Human cytomegalovirus (HCMV) is an important agent of gastrointestinal infections and other diseases such as pneumonitis, hepatitis, and retinitis (Whitley, 1996), especially in immunocompromised patients (Steininger, 2007). HCMV is also one of the major viral agents associated with congenital disorders (Kenneson & Cannon, 2007). The pathogen, similarly to other species of Herpesviridae, is able to establish latent infection (Sinclair & Sissons, 2006) and exhibits various mechanisms preventing the apoptosis of infected cells (Andoniou & Degli-Esposti, 2006). The natural host cell range of human CMV is narrow and is mainly restricted to terminally differentiated epithelial, fibroblast and endothelial cells, as well as macrophages.
The human cytomegalovirus genome is a double-stranded DNA molecule of nearly 230 kilobase pairs (kbp), encoding more than 150 proteins. During productive infection, HCMV genes are expressed in a three-step cascade, with phases designated as immediate-early (IE), early and late. The genes expressed immediately after infection (immediate-early gene expression) are clustered in the genome in four distinct loci: UL36-38, UL122-123, TRS1-IRS1, and US3 (Colberg-Poley, 1996). The general function of these genes is associated with inhibition of cellular mechanisms of antiviral response and temporal regulation of gene expression. The IE genes are involved in the cell cycle block in a p53-dependent manner (UL122-123), regulation of the antiviral response executed by PKR kinase (TRS1, IRS1), inhibition of apoptosis triggering (UL36, N-terminal segment of UL37) and deactivation of viral antigen presentation via modulation of the expression of MHC class I proteins (US3) (Colberg-Poley, 1996). The immediate-early proteins encoded by UL123 and UL122 (IE1 and IE2, respectively) play a central role in the initiation and maintenance of HCMV gene expression in both lytic and latent infection (Stenberg & Stinski, 1985). The presence of analogous genomic loci has been confirmed for all sequenced related viruses of Betaherpesvirinae, although notable variance in the gene content was reported (Knipe et al., 2001).
L. S. Wyrwicz and L. Rychlewski
The expression of genes from the UL36-UL38 region, followed by alternative splicing, results in the synthesis of several gene transcripts with a repertoire of corresponding protein products, whose individual functions have not yet been fully described (Colberg-Poley et al., 2000). Of these, UL36 encodes a nonglycosylated cytoplasmic protein with anti-apoptotic activity, due to its role as an inhibitor of the caspase cascade (Andoniou & Degli-Esposti, 2006). The analysis of conservation of UL36 across a spectrum of CMV genotypes suggests that this protein is mutated in many important laboratory strains, including the most commonly used strain, AD169 (Patterson & Shenk, 1999). A recent study by Terhune et al. (2007) provided strong evidence that yet another product of this gene cluster, pUL38, is a cell-death inhibitor enabling efficient virus replication.
The dominant IE gene product involved in anti-apoptotic regulation is encoded by the UL37 gene (Goldmacher, 2002), spanning 2.7 kbp of the viral genome (Dolan et al., 2004). Lee et al. (2000), in their study on mutation of a murine homologue (M37), demonstrated that this protein is a virulence factor, required in vivo but not essential for replication in vitro. Other previous studies clearly pointed out that the protein acts as a negative regulator of mitochondrial activation of apoptosis (viral mitochondria-localized inhibitor of apoptosis, vMIA; McCormick et al., 2003). In turn, functional assays suggested that the anti-apoptotic activity of UL37 is achieved at a point downstream of caspase 8, but prior to cytochrome c release from mitochondria, in a Bcl-2-like manner. This was noted despite a lack of traceable sequence similarity of UL37 to known cellular proteins with such activity (Andoniou & Degli-Esposti, 2006). In the mitochondrion, the protein product seems to sequester Bax in the form of a vMIA-Bax complex (Arnoult et al., 2004; Poncet et al., 2004). In addition, by altering mitochondrial bioenergetics, vMIA causes a widely expressed cytopathic effect (Poncet et al., 2006). A recent molecular modelling study on vMIA was augmented with mutational assays to reveal the potential mechanism of vMIA's molecular activity. The application of fold recognition techniques resulted in a potential model of the three-dimensional structure of vMIA as a viral homologue of Bcl-X(L) (Pauleau et al., 2007).
The minimal fragment of UL37 exhibiting vMIA activity was mapped to the cleaved N-terminal fragment, encoded by the first exon (pUL37x1; Andoniou & Degli-Esposti, 2006). However, the major part of the UL37 open reading frame encodes a highly glycosylated protein (gpUL37; UL37x3) transported through the secretory pathway, via the endoplasmic reticulum and the Golgi apparatus. The molecular function of this predominant part of the UL37 open reading frame (gpUL37), and its impact on viral pathogenesis, are as yet undetermined (Andoniou & Degli-Esposti, 2006).
The Herpesviridae constitute a relatively old group of viruses, for which phylogenetic studies postulate an evolution parallel to that of the vertebrate host species (Montague & Hutchison, 2000). Thus, a significant fraction of the herpesvirus gene pool encompasses divergent homologues of host cellular proteins, adapted to host-specific and cell-type-specific conditions (Holzerlandt et al., 2002). The exact number of such genes is probably underestimated, because analyses have been conducted at the level of primary protein structure (i.e. pairwise amino-acid sequence similarities). In our previous studies, we applied state-of-the-art methods for distant homology detection and combined them with protein structure prediction in order to study several divergent Herpesviridae proteins and outline the molecular background of their function. The approach was successfully applied in the case of the critical invasion factor glycoprotein L (gL) (Wyrwicz & Rychlewski, 2007b), the Gammaherpesvirinae transcription factor BcRF1 (Wyrwicz & Rychlewski, 2007c), the Alphaherpesvirinae transcriptional regulator ICP4 (Wyrwicz & Rychlewski, 2007a) and the neurovirulence protein UL45 (Wyrwicz et al., 2008). Here, we present similar results of fold recognition and three-dimensional modelling, as applied to the gpUL37 domain of Betaherpesvirinae, in order to postulate the molecular function of this immediate-early glycoprotein.

MATERIALS AND METHODS
Assembly of the UL37 family. The sequence of CMV UL37 (gi|9625722; strain AD169) was subjected to PSI-BLAST (Altschul et al., 1997) searches against the NCBI NR (non-redundant) protein database (National Center for Biotechnology Information; NCBI; 01/11/2007). The search was conducted until profile convergence, with the E-value cut-off for inclusion in the profile set to 0.001. Subsequently, the obtained set of UL37 homologues was clustered at 90% sequence identity using the CD-HIT tool (Li & Godzik, 2006). Finally, the resulting sequences were aligned with ClustalW (Thompson et al., 1994), with minor manual corrections of the end result.
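The search and clustering settings described above can be sketched as command lines. This is a hedged illustration only: it assumes the present-day BLAST+ `psiblast` and `cd-hit` executables, which are not necessarily the program versions used in this work, and all file names are hypothetical. The step that extracts hit sequences into a FASTA file and the ClustalW alignment step are omitted.

```python
def build_family_commands(query_fasta, db="nr", inclusion_evalue=0.001, identity=0.90):
    """Return argument lists for the PSI-BLAST search and the CD-HIT clustering.

    File names are hypothetical placeholders; nothing is executed here.
    """
    psiblast = [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", "0",              # 0 = iterate until the profile converges
        "-inclusion_ethresh", str(inclusion_evalue),
        "-out", "ul37_hits.txt",             # hypothetical output file
    ]
    cd_hit = [
        "cd-hit",
        "-i", "ul37_homologues.fasta",       # hypothetical FASTA file of collected hits
        "-o", "ul37_nr90.fasta",             # hypothetical clustered output
        "-c", str(identity),                 # sequence identity threshold (0.9 = 90%)
    ]
    return psiblast, cd_hit


psiblast_cmd, cd_hit_cmd = build_family_commands("ul37_ad169.fasta")
print(" ".join(psiblast_cmd))
print(" ".join(cd_hit_cmd))
```

The argument lists can be passed to `subprocess.run` on a system where both tools and a local copy of the database are installed.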
Molecular modelling of the UL37 protein. Sequences of the conserved globular domain of UL37 were submitted to the Structure Prediction Meta Server (http://bioinfo.pl/meta; Bujnicki et al., 2001). At this stage, secondary structure predictions were also obtained using PsiPred (Jones, 1999) and ProfSec (Rost & Sander, 1993), accessed via the Meta Server.
The resulting structural alignment was manually refined according to the quality assessment tool Verify3D (Eisenberg et al., 1997). Finally, homology models were obtained with Modeller version 6.2 (Sanchez & Sali, 2000) and screened with 3D-Jury, a consensus fold recognition method (Ginalski et al., 2003).
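The idea behind consensus screening of models can be illustrated with a toy calculation. The sketch below is a simplified stand-in written in the spirit of a consensus method, not the published 3D-Jury algorithm: each model is scored by its average similarity to the other models, so that a model supported by several independent predictions ranks highest. The similarity matrix, the threshold of 40 and the function name are assumptions made for illustration.

```python
def consensus_scores(similarity, min_similarity=40):
    """Score each model by its average similarity to all other models.

    similarity[i][j] is a symmetric matrix of pairwise model similarities
    (for instance, counts of superimposable residues). Values below
    min_similarity are treated as zero, so unrelated models add no support.
    """
    n = len(similarity)
    scores = []
    for i in range(n):
        total = 0
        for j in range(n):
            if i != j and similarity[i][j] >= min_similarity:
                total += similarity[i][j]
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores


# Hypothetical similarities among four models; the first three agree with each other.
toy = [
    [0, 80, 70, 10],
    [80, 0, 90, 20],
    [70, 90, 0, 15],
    [10, 20, 15, 0],
]
scores = consensus_scores(toy)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 2) for s in scores])
```

In this toy input the outlying fourth model receives no support, while the three mutually similar models reinforce one another.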
The predicted MHC fold consists of a β-sheet of eight strands and two long helices located at one side of the sheet in a ββββαββββα topology (Garboczi et al., 1996). Here, the MHC domain is composed of two distinct fragments, referred to as the α-1 and α-2 domains. The resulting domain has a hydrophobic ligand-binding cavity located between the two helices.
Most of the recognized MHC-fold proteins are involved in the binding of peptides in order to present either internal (class I MHC) or external (class II MHC) antigens in the process of acquired immune response. However, other ligands have also been identified (Maenaka & Jones, 1999; Natarajan et al., 1999). Although the binding module and the ligands differ between the observed functional groups of MHC proteins, a constant feature of this fold is the relatively high instability of the structure in the absence of the ligand (Garboczi et al., 1996).
The structural alignment constructed on the basis of distant homology mappings, refined according to the predicted and observed secondary structures of gpUL37 and crystallographically solved proteins of MHC fold, respectively, is shown in Fig. 1. Although some discrepancy in the predictions of secondary structure is observed (compare Fig. 1), the overall topology of the proteins is conserved between the two protein families. A homology model of CMV gpUL37 built according to the alignment is shown in Fig. 2. Of note are the two additional pairs of conserved cysteine residues (marked with numbers above the alignment in Fig. 1), located in the model in spatial proximity. We postulate that these residues may stabilize the fold by both tightening the β-sheet and bridging one α-helix to the core part of the fold (Fig. 2).
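Spatial proximity of cysteine pairs in a model can be checked with a simple distance calculation. The following is a minimal sketch, assuming that Cα coordinates of the cysteine residues have already been extracted from a model file; the residue numbers, coordinates and the 7 Å Cα-Cα threshold are illustrative assumptions, not values taken from the gpUL37 model.

```python
import math
from itertools import combinations


def close_cysteine_pairs(ca_coords, max_distance=7.0):
    """Return (residue_a, residue_b, distance) for cysteine pairs whose
    C-alpha atoms lie within max_distance (in angstroms)."""
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(ca_coords.items()), 2):
        d = math.dist(xyz_a, xyz_b)
        if d <= max_distance:
            pairs.append((res_a, res_b, round(d, 2)))
    return pairs


# Hypothetical residue numbers and coordinates, not taken from the gpUL37 model.
toy_cysteines = {
    12: (0.0, 0.0, 0.0),
    58: (3.0, 4.0, 0.0),     # 5.0 angstroms from residue 12
    101: (20.0, 0.0, 0.0),
    140: (20.0, 6.0, 0.0),   # 6.0 angstroms from residue 101
}
print(close_cysteine_pairs(toy_cysteines))
```

A Cα-based cutoff only flags candidate pairs; whether a disulphide bond can actually form also depends on side-chain geometry, which a low-identity homology model may not resolve reliably.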
The majority of proteins belonging to the MHC fold contain an additional all-β immunoglobulin-like domain, spacing the MHC fold from the cell membrane. In UL37 we also observe an additional segment located between the discussed domain and the transmembrane segment (Fig. 3). However, for this region (sometimes referred to as the basic domain (Hayajneh et al., 2001b)), we were unable to detect homology to any other proteins of known fold.
The molecular mimicry of viral proteins to host proteins of defined functions is not unusual, and several such events have been reported (Michel-[...], 2004). Also, the presence of an MHC class I homologue in the CMV genome, encoded by the UL18 gene, was previously reported (Beck & Barrell, 1988), further corroborating our present findings.

Table 1. Summary of fold recognition analysis for CMV UL37 (gi|52139223).
The top-scoring hits (for each method) are shown in bold font. Hits below the 3D-Jury cutoff of 50.0 (corresponding to less than 5% of prediction error) are shown in italics. Methods are coded according to their common abbreviations: 3DPS, 3D-PSSM; MBAS, MetaBasic; FFA3, FFAS3.

In the light of our analysis, a challenging question remains: can gpUL37 bind any ligand? Modelling performed at such a low sequence similarity (below 10%) cannot be very helpful in determining the ligand-binding specificity, as this is a case of quite distant fold mapping.
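For reference, a sequence identity figure of this kind can be computed from a pairwise alignment as follows. This is a minimal sketch under one common convention (identical residues divided by the number of columns in which neither sequence has a gap); other denominators are also in use and give different values. The sequences shown are hypothetical and unrelated to gpUL37.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments: 5 identical residues in 7 compared columns.
print(round(percent_identity("MKT-LLVAC", "MRTALL-SC"), 2))
```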
Another open question is the reason for the wide preservation of MHC-like proteins in the viral genomes of Betaherpesvirinae. The development of the acquired immune system in early vertebrates resulted in the creation of a complex MHC-fold-mediated antigen presentation and recognition system, based on a set of specific protein-protein interactions. All analyzed genomes of Betaherpesvirinae contain a homologue of UL37, and this subfamily diverged from other Herpesviridae early (when compared to the evolution of the MHC protein family). Thus, we might speculate that this particular IE gene was acquired by an ancestor of Betaherpesvirinae at the time of the divergence of MHC-fold proteins in the host genome.
The gene product of UL37 has been suggested to provide an important anti-apoptotic mechanism, one active early in the infection. This observation was consistently confirmed by several research groups in various experimental models, and the reliability of this functional assignment is very high (Andoniou & Degli-Esposti, 2006). The UL37 homologues from the majority of Betaherpesvirinae genomes have a constant fragment of the open reading frame (ORF) encoded by CMV exon 3 (i.e., gpUL37), with the anti-apoptotic molecule vMIA (encoded by exon 1) restricted to primate CMV (HCMV, CCMV; compare Fig. 3). On this basis, we posit that vMIA was gained in the recent evolution of this Cytomegalovirus subfamily by inclusion of an additional exon in the ancestral IE gene. This is also partially supported by the fact that both human and chimpanzee CMV retain an ancestral signal peptide, located in the internal part of the ORF, adjacent to the proposed MHC-like domain. This signal peptide is processed after the cleavage of vMIA (Mavinakere & Colberg-Poley, 2004). In such a situation, the functions of secreted gpUL37 and of vMIA directed to the mitochondria can be synergistic only in terms of a temporal association during IE gene expression.
The observed low sequence conservation of the glycosylated domain of UL37 (gpUL37) may be a result of viral adaptation to host response factors. Notably, this adaptation is not only driven by the host, but may also result from the presence of specific factors of infected cells (tissues) or from viral pathogenesis. The glycosylated domain of UL37 from the human pathogens analyzed shares only 7-9% sequence identity (CMV compared with the closely related HHV6 and HHV7 sequences). This is consistent with the average sequence identity observed across the whole group of gpUL37 domain homologues. Additionally, none of the several potential N-glycosylation sites throughout the domain is conserved in all analyzed species (compare Fig. 1). This observation is consistent with a study showing relatively high sequence variance within primary isolates from clinical samples. A number of the reported polymorphisms were located at potential N-glycosylation sites (Hayajneh et al., 2001a), but the majority of these variants occurred at suboptimal N-glycosylation sites (according to the results of the NetNGlyc method; not shown).
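Potential N-glycosylation sites are conventionally located by the sequon N-X-S/T, where X is any residue except proline. The sketch below only performs this motif scan; it is a cruder technique than the NetNGlyc predictor mentioned above and does not reproduce its scoring of site quality. The function name and the example sequence are hypothetical.

```python
import re

# N-X-[S/T] with X != P; the lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in potential N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: N2 (NGS), N10 (NNT) and N11 (NTS) match; N6 (NPT) does not.
print(find_sequons("MNGSANPTQNNTS"))
```

Applying such a scan to each sequence of a multiple alignment and mapping the hits to alignment columns would show whether any site is shared by all species.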
Since the binding of a ligand in the cavity of the MHC-like fold of gpUL37 remains uncertain, and we posit the existence of disulphide bonds bridging this unstable protein fold, we suggest that this protein acts as a viral analog of the cellular proteins of the MHC complex (Kluczyk et al., 2004). Its postulated function is to interfere with either class I or class II MHC proteins, rather than to be an active member of the antigen presentation mechanisms. Therefore, gpUL37 may define a novel robust viral mechanism of inactivation of the cellular antiviral response. In this, it may cooperate with the other CMV proteins observed in Betaherpesvirinae, e.g. US3, a small protein ligand blocking antigen presentation by binding to MHC class II proteins (Johnson & Hegde, 2002). Alternatively, gpUL37 may function similarly to a different MHC-like protein, encoded by the CMV gene UL18 (Beck & Barrell, 1988; Lopez-Botet et al., 2001). The latter gene product (pUL18) alters the activity of NK lymphocytes and dendritic cells (Wagner et al., 2008).
In light of the above, further experimental assays (focused on the mechanisms hypothesized here) are needed to elucidate the exact role of the glycosylated domain of UL37 homologues during the pathogenesis of Betaherpesvirinae infections.

Figure 2. A model of the CMV UL37 MHC-like domain. Secondary structure elements are coded with colors: cyan, α-helix; magenta, β-strands. The location of cysteine residues creating disulphide bonds is shown in yellow.

Figure 3. Overview of domain organization in UL37 orthologs (for virus abbreviations refer to Fig. 1).

Table 2. Summary of results for fold recognition on the conserved domain of the UL37 gene product, according to the 3D-Jury prediction assessment method.
Template codes are Protein Data Bank (PDB) entry identifiers.