QUARTERLY Review Structure and biosynthesis of human salivary mucins

Human salivary glands secrete two types of mucins: oligomeric mucin (MG1) with molecular mass above 1 MDa and monomeric mucin (MG2) with molecular mass of 200-250 kDa. Monomers of MG1 and MG2 contain heavily O-glycosylated tandem repeats located in the central domain of the molecules. MG1 monomers are linked by disulfide bonds located at the sparsely glycosylated N- and C-terminal ends. MG1 is synthesized by mucous cells and MG2 by serous cells of human salivary glands.

Human salivary glands secrete 1000-1500 ml of saliva per day, composed of water, proteins and low molecular mass substances (mainly electrolytes). Up to 26% of the salivary proteins are mucins [1]. The mucins of human saliva are extremely effective lubricants, which provide a barrier against desiccation and environmental insult. They control the permeability of the mucosal surface, limit penetration of potential irritants and toxins to mucous cells, protect mucosal cell membranes against proteases generated by bacteria in the bacterial plaques around the teeth, and regulate colonization of the oral cavity by bacteria and viruses [2]. The mucins are high molecular mass glycoproteins [3,4], in which proline and serine/threonine constitute 20-55% of total amino acids and are concentrated in one or several regions of the polypeptide. These serine/threonine residues are heavily glycosylated, and 40-80% of the mass of such mucins consists of O-linked oligosaccharides [5]. The cysteines at the N- and C-ends may link mucin monomers by disulfide bridges, forming linear mucin oligomers [6]. Two types of mucins are present in human saliva: oligomeric mucin glycoprotein (MG1) with molecular mass above 1 MDa, and monomeric mucin glycoprotein (MG2) with molecular mass of 200-250 kDa [1]. The submandibular glands, containing mucous cells (producing MG1) and serous cells (producing MG2), secrete 30% of the salivary mucins, while sublingual, labial and palatal glands (which contain mainly mucous cells) secrete 70% [7-11]. The concentration of mucin secreted by sublingual glands is higher than that secreted by submandibular glands, while the secretion of parotid glands is devoid of mucins [7,12].

POLYPEPTIDES OF SALIVARY MUCINS
Three genes encoding mucins in human salivary glands have been reported [5]: MUC1 localized to chromosome 1q21-24, which encodes a transmembrane mucin [13] (Fig. 1B). Disulfide bridges and hydrophobic sites possessing negatively charged amino acids of the central domain of MG1 occupy seven cysteine-rich subdomains (Cys1-Cys7) rich in cysteine (70 residues), aspartic acid (37 residues) and glutamic acid (58 residues). Cysteine-rich subdomains also contain a large number of aromatic amino acids, i.e., tyrosine (31 residues), phenylalanine (31 residues) and tryptophan (17 residues) [13]. Two laboratories reported that the C-terminal domain of MG1 comprises 804 [10] or 808 [17] amino acids, respectively. It is relatively rich in cysteine and proline, but relatively poor in threonine and serine [10,17], and has a structure similar to that of von Willebrand factor (vWF) [19]. Based on the similarity to this factor, six subdomains were distinguished in the C-terminal domain of MG1 (Fig. 1C): the Muc11p15-type subdomain (analogous to a subdomain found in the products of the genes MUC2 and MUC5AC clustered on chromosome 11p15.5), vWF-A3uD4-like, vWFD4-like, vWFB-like, vWFC-like and CK [17]. The Muc11p15-type subdomain, consisting of 40 amino acids (similar to the polypeptide encoded by the corresponding sequences of the genes MUC2 and MUC5AC, but absent in vWF), follows the central domain of MG1. The second subdomain of 69 amino acids, called vWF-A3uD4 and similar to the subdomain found in vWF, was also found in the products of the genes MUC2 and MUC5AC. The third subdomain of 378 amino acids, called vWFD4-like, is similar not only to subdomains found in vWF and in the products of the genes MUC2 and MUC5AC, but also to subdomains in other proteins, such as zonadhesin [20] and vitellogenin. The vWFD-like subdomain contains vicinal cysteine motifs (CGXC) similar to the sequences found at the active sites of the enzyme that catalyses thiol protein disulfide interchange, and may be involved in dimer formation between the C-terminal domains of MG1 [17,20]. The C-terminal domain of MG1 contains one vWFB-like subdomain 40 amino acids long, instead of the three B subdomains found in vWF. It also contains one vWFC-like subdomain, related to the C2 subdomain in vWF, instead of the two corresponding subdomains in vWF. A hydrophobic segment (YXX6CCX7C) located between aa 658-678 in the vWFC-like subdomain (similar to that present in serum albumin) may be responsible for binding hydrophobic compounds to MG1 [17]. The last subdomain of the C-terminal domain of MG1, similar to that of vWF and other mucins, is named CK or "cystine knot". This "knot" is probably constructed of three internal disulfide bridges linking cysteines 741-788, 764-813 and 768-820. Cysteine 787 may be involved in the formation of intermolecular bridges linking MG1 dimers [10,17] (Fig. 5). The remaining 20% of oligosaccharides are dominated by 5-7 saccharide units of core 2 type [25,26]. Of 41 proposed O-linked oligosaccharide structures, 23 were neutral, 15 sialylated and 3 sulfated [25]. Some peripheral regions of the O-linked oligosaccharides of MG2 contain antigenic determinants T, Leˣ, Leʸ, sialyl T and sialyl Leʸ [25]. It has been noted that core 2 antigens Leˣ and sialyl Leˣ are potential ligands for selectins (adhesive molecules controlling leucocyte movement) [30], and some strains of pathogenic bacteria are associated with MG2 through antigen T, sialyl T or lactosamine sequences [29].
There is some controversy concerning the oligosaccharide chains of MG1 and MG2. Slomiany et al. [27], in a study of five blood group B individuals, showed that the oligosaccharide chains of MG1 and MG2 have similar proportions of neutral and acidic oligosaccharides. The predominant neutral oligosaccharides in both glycoproteins are composed of 15 and 16 sugar units (Fig. 3). The majority of these oligosaccharides bore the blood group B determinants. Based on these data, Slomiany et al. in 1993 [27] suggested that MG1 and MG2 arise through the action of endogenous proteases on one high molecular mass mucus glycoprotein [28]. However, according to other authors [5,8,13,25,26], MG1 and MG2 have different polypeptide chains encoded by different genes [5,8,13] and carry very different carbohydrate structures [25]. This opinion is consistent with the proposed polypeptide structures of MG1 [10,13,17] and MG2 [22], the known specificity of GalNAc polypeptide transferases [14] and the data suggesting that MG1 and MG2 are synthesized by different populations of cells in human salivary glands [9].

BIOSYNTHESIS OF SALIVARY MUCINS OLIGOSACCHARIDES
The biosynthesis and processing of the N-linked glycoproteins is now almost completely understood [14]. It has been shown that although low molecular mass rat submandibular mucin is N-glycosylated, this N-glycosylation is not required for its secretion [31]. Recently, however, it has been reported that some mucins, e.g. N-glycosylated human colonic mucin (MUC2), become associated with the lectin-like chaperones (calreticulin and calnexin) by disulfide bridges before folding to their correct structure in the ER [32]. There are no data available on such chaperone action on MG1 or MG2. New information concerning O-glycosylation has recently appeared [14]. Most studies indicate that O-glycosylation is initiated in the cis Golgi compartment. Glycosyltransferases can be divided into two types: those transferring the sugar residue with retention of its anomeric configuration, and those transferring with inversion. Both types of glycosyltransferases have similar nucleotide recognition domains [33]. The biosynthesis of O-linked glycans begins with addition of GalNAc to serine or threonine residues in a polypeptide chain [34], catalysed by a family (6 members reported to the end of 1999 [35]) of GalNAc polypeptide transferases (UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, GalNAcT (EC 2.4.1.41)), which are conserved in evolution from nematodes to humans [36]. These GalNAc polypeptide transferases consist of short N-terminal cytoplasmic tails, transmembrane domains, extended stem regions and a large carboxyl-terminal region where the catalytic domains are located and which faces the interior of the Golgi compartment [37-39]. Genes for GalNAcT1, -T2 and -T3 are localized to chromosomes 18q12-q21, 1q41-q42, and 2q24-q31, respectively [40]. The members of this family of GalNAc polypeptide transferases have different expression patterns in different organs. The specificity for folded polypeptide chains of proteins, and thus the potential sites of O-glycosylation, are different for each transferase [41]. Glycosylation of certain acceptor sites by one of the GalNAc polypeptide transferases is required before other sites can be glycosylated by another GalNAc polypeptide transferase [31,35]. In addition, individual GalNAc polypeptide transferases show different localization within the Golgi compartments [39]; moreover, the sizes of their stem regions are different [36]. The repertoire of GalNAc polypeptide transferases varies with cell localization and differentiation [42]. GalNAcT3 is expressed in all layers of normal human mouth epithelia and in most squamous carcinoma cells, whereas GalNAcT2 was found in undifferentiated cell layers of the normal epithelium and in most carcinoma cells except in well differentiated foci. GalNAcT1 shows a low level of expression, markedly enhanced in tumors [42].
Positions C3 and C6 of the core GalNAc may be substituted by Gal, GalNAc, GlcNAc or NeuAc (Fig. 3) [...] genesis [48]. The formation of these backbones and peripheral domains of glycans has been described previously [49]. It is noteworthy that determinants Leˣ, Leʸ, A, B, H type 2 and sialyl Leʸ are synthesized from structure 2, determinants Leᵃ, Leᵇ, A, B and H type 1 are synthesized from structure 1 [49,50] (Fig. 5), and that sialylation terminates the synthesis of O-linked oligosaccharides [14].

CONCLUSIONS AND FUTURE PROSPECTS
There is a consensus concerning the structure of the central and C-terminal regions of MG1. A report on the N-terminal sequence of MG1 may be expected soon. There is also agreement concerning the deduced sequence of MG2. Thirty-seven O-linked oligosaccharide chains in MG1 and 41 in MG2 have been described; however, the data on this subject are not unanimous. There is a lack of consensus on the quantity and quality of oligosaccharide chains in human salivary mucins. Initiation of O-glycosylation takes place in the cis Golgi compartment; however, the subregions of the Golgi where particular glycosylation steps take place are not yet known. The requirements of the tertiary structure of the substrates for particular glycosyltransferase action and the possible role of chaperones in this process are also not known. Rules have to be established for the competition of glycosyltransferases for a particular substrate. The number of mucin genes taking part in biosynthesis of the MG1 polypeptide chains remains to be established. conditions, have yet to be established. There are also no data available on the relationship between secondary structure and biological function of salivary mucins. An evaluation of which cysteine residues take part in the formation of disulfide bridges, and of the conditions necessary for this linkage, has not yet been published.
, and two genes encoding secretory mucins, MUC5B localized to chromosome 11p15.5 encoding oligomeric mucin (MG1) [7] and MUC7 localized to chromosome 4q13-21 encoding monomeric MG2 [13]. The limited expression of MUC5B in sublingual and submandibular glands, which secrete huge amounts of oligomeric mucins, suggests that MG1 is encoded by two or more mucin genes [10]. The peptide sequence of MG1, deduced from the nucleotide sequence of the MUC5B gene, consists of about 5000 amino acids grouped in three domains: N-terminal, central and C-terminal [10] (Fig. 1A). It is suggested that the N-terminal domain of the MUC5B gene product, comprising about 450 amino acids, is cysteine-rich and contains potential N-glycosylation sites [10]. A single large exon encoding the central domain of MG1, comprising 3570 amino acids (rich in threonine (27%), serine (12.9%) and proline (10.6%)), has been described. The domain consists of 19 subdomains: 7 cysteine-rich subdomains, 3 subdomains with no repeats, 5 irregular tandem repeats and 4 unique R-end sequences conserved over a long evolutionary time scale, with no typical repeats.
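The composition figures quoted above for the central domain of MG1 can be turned into approximate residue counts, and the subdomain tally can be checked, with a few lines of Python. This is only an arithmetic sketch: the numbers are those given in the text, while the variable and function names are illustrative assumptions.

```python
# Figures quoted in the text for the central domain of MG1 (MUC5B gene product).
CENTRAL_DOMAIN_LENGTH = 3570  # amino acids
COMPOSITION_PERCENT = {"Thr": 27.0, "Ser": 12.9, "Pro": 10.6}

# Subdomain classes listed in the text; the dictionary keys are illustrative labels.
SUBDOMAIN_COUNTS = {
    "cysteine-rich": 7,
    "no repeats": 3,
    "irregular tandem repeats": 5,
    "R-end": 4,
}


def approximate_residue_counts(length, percentages):
    """Convert a percentage composition into rounded residue counts."""
    return {aa: round(length * pct / 100) for aa, pct in percentages.items()}


residue_counts = approximate_residue_counts(CENTRAL_DOMAIN_LENGTH, COMPOSITION_PERCENT)
total_subdomains = sum(SUBDOMAIN_COUNTS.values())

print(residue_counts)
print(total_subdomains)
```

The four subdomain classes add up to the 19 subdomains stated in the text, and threonine, serine and proline together account for roughly half of the central domain.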

Figure 1. Structure of human sublingual oligomeric mucin (MG1), the MUC5B gene product. A. General view (CHO, potential N-glycosylation site). B. Structure of the central domain of MG1: Cys (1-7), cysteine-rich subdomains; R01, R02, R03, subdomains with no repeats; RI-V, imperfect repeat subdomains; Re, R-end subdomains; UpA, UpB, UpC and UpD, super-repeats of 528 amino acids (lollipop symbol, potential N-glycosylation site). C. Comparison of the structure of the C-terminal domain of MG1 and the C-terminal domains of other mucins (MUC5AC and MUC2 gene products) with the structure of von Willebrand factor. Central (TR), end of the central domain of MG1; Muc11p15 type, the subdomain of the MUC5B gene product similar to the products encoded by the corresponding segments of the MUC2 and MUC5AC genes clustered on chromosome 11p15.5; vWF-A3uD4, D-like, B-like, C-like and CK, subdomains of the C-terminal domain of the MUC5B gene product similar to the corresponding subdomains found in von Willebrand factor. (Based on the data presented by Desseyn et al. [13,17] by permission of the American Society for Biochemistry and Molecular Biology, and Troxler et al. [10] by permission of Oxford University Press).
) has been suggested as a major site for O-linked GalNAc attachment to polypeptide, and is found in the majority of the 72 Thr/Ser-rich tandem repeats. In each of the RI-RV subdomains, one potential N-glycosylation site is found. The central domain of MG1 has four R-end subdomains. Each R-end subdomain is composed of 111 amino acids and has a high content of threonine (30.4%) and serine (18.2%), with numerous potential O-glycosylation sites. Each R-end subdomain has five TXXP sequences, which are major sites for O-linkage of GalNAc to polypeptide. The central domain ends with a peptide called R03, which is enriched in threonine (20%) and serine (20%), proline, phenylalanine and valine. In the central domain of MG1 one may distinguish four super-repeats called UpA-UpD. Each super-repeat (528 amino acids long) consists of the R, R-end and cysteine-rich subdomains [13]. In the central domain of MG1 there are 7 potential N-glycosylation sites [13]. MG1 is an oligomeric mucin composed of subunits linked by disulfide bridges [15], similarly to gastric mucin [6]. MG1 is heavily O-glycosylated and contains numerous hydrophobic sites possessing hydrophobic as well as negatively charged amino acids in subdomains with no oligosaccharide chains [16]. The deduced amino-acid sequence of the central domain of MG1, based on the structure of the MUC5B gene, is consistent with the biochemical data. Potential sites for O-glycosylation of the central domain of MG1 are located in five Ser/Thr-rich RI-RV tandem repeats of 20-29 amino acids (Ser/Thr 52.5%), four Ser/Thr-rich R-end subdomains without repeats and three subdomains R01-R03 [13].
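Short sequence motifs such as the TXXP sequences described above, or the vicinal cysteine motif CGXC of the vWFD-like subdomain, can be located in a polypeptide sequence with a simple pattern scan. The following is a minimal sketch, assuming that X stands for any residue; the function name and the toy peptide are hypothetical and are not taken from the MUC5B sequence.

```python
import re


def find_motif(sequence, pattern):
    """Return 1-based start positions of all matches, including overlapping ones.

    `pattern` is a regular expression over one-letter amino-acid codes,
    where "." plays the role of X (any residue).
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [match.start() + 1 for match in lookahead.finditer(sequence)]


TXXP = "T..P"  # threonine, any two residues, proline
CGXC = "CG.C"  # vicinal cysteine motif

# Hypothetical toy peptide, used only to exercise the scan.
toy_peptide = "ATSAPTTTPSCGHCAT"

txxp_positions = find_motif(toy_peptide, TXXP)
cgxc_positions = find_motif(toy_peptide, CGXC)

print(txxp_positions)
print(cgxc_positions)
```

A lookahead is used so that overlapping occurrences are not missed, which matters in threonine-rich repeats where candidate sites may lie only one or two residues apart. A matched motif marks only a potential site; whether it is actually glycosylated cannot be decided from the sequence alone.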
The C-terminal domain of MG1 shows alternating hydrophobic and hydrophilic sequences. The most hydrophilic segment occurs in the CK domain [10]. Six subdomains of the C-terminal region of the MUC5B gene product have conserved sequences similar to those present in vWF and in the MUC2 and MUC5AC gene products. Nearly all cysteine residues are conserved along with several other amino acids, suggesting a very similar tertiary structure for the C-terminal domain of MG1 compared to the above proteins. Based on the deduced amino-acid sequence of the C-terminal domain of MG1, computer simulation predicts 62% of β-turn and 13% of α-helix, the latter located between aa 219-250 and 407-421 in the vWFD-like subdomain and 776-801 in the CK subdomain [17]. The remaining 25% of the C-terminal domain consists of extended and coil structures. The rigid, rod-shaped conformation of the C-terminal domain is essential for oligomerization of MG1. It is noteworthy that 88% of cysteines in the C-terminal domain are localised in or in the vicinity of a β-turn, as are 11 out of 15 potential N-glycosylation sites [17]. The structure of MG1 deduced from the nucleotide sequence of the MUC5B gene is consistent with the immuno-electron-microscopy picture, in which filamentous structures of 0.5-10 μm were observed. The subunits of MG1 are linked by disulfide bridges and possess an exposed area (not covered by carbohydrate chains) 100-150 nm long at the ends and exposed intervals 100-180 nm long in the central part of the molecule. In the native molecule of MG1 some of these intervals are not exposed on the surface of the molecule [21].
The amino-acid sequence (deduced from the MUC7 gene) of monomeric apomucin from human saliva (MG2) is composed of three domains: the N-terminal domain consisting of 144 amino acids, the central domain comprising amino acids 145-283, including six tandem repeats of 23 amino acids each, and the C-terminal domain consisting of 74 amino acids [22] (Fig. 2). The first 20 amino acids of the N-terminal domain of MG2 make up a leader peptide [23]. The remaining portion of the N-terminal domain has two cysteine residues, three potential N-glycosylation sites [23] and 9 potential O-glycosylation sites (4 serine and 5 threonine residues). It was suggested that the N-terminal domain of secreted MG2 is involved in the formation of disulfide bridges causing self-association [24]. It is worth noting that the amino-acid sequence of the N-terminal domain of MG2 encoded by the MUC7 gene (Fig. 2B) is consistent with the data obtained by Edman degradation of the purified protein [25]. The central domain of the MG2 polypeptide is rich in potential O-glycosylation sites, including a tandem repeat region of 138 residues. Evidence suggests that the tandem repeat sequences serve as constrained structural elements that stabilize the poly-L-proline II conformation of the central domain of MG2. In the extended poly-L-proline conformation, the side chains are segregated farther apart from each other [22] than in other conformations. This segregation can promote O-glycosylation. If this central domain were extensively O-glycosylated and extended, for example as a β-structure, it would have an estimated length of approximately 65 nm. Results of electron microscopy and light scattering suggest that the molecules are random-coil structures with a radius of gyration below 15 nm [24]. Trypsin digestion of native MG2 releases a 90 kDa fragment of central tandem repeats, which carries primarily O-linked oligosaccharides [25]. The C-terminal domain of MG2 is devoid of cysteine residues but is rich in proline (5 residues), and has one potential N-glycosylation site and 26 potential O-glycosylation sites (7 serine and 19 threonine residues) [22].
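The domain figures given above for MG2 can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative assumptions.

```python
# Figures quoted in the text for MG2 (MUC7 gene product).
CENTRAL_DOMAIN_RANGE = (145, 283)  # first and last residue, inclusive
TANDEM_REPEAT_COUNT = 6
TANDEM_REPEAT_LENGTH = 23

# Potential O-glycosylation sites as (serine, threonine) counts.
N_TERMINAL_O_SITES = (4, 5)
C_TERMINAL_O_SITES = (7, 19)

# Inclusive residue range, hence the +1.
central_domain_length = CENTRAL_DOMAIN_RANGE[1] - CENTRAL_DOMAIN_RANGE[0] + 1
tandem_repeat_region = TANDEM_REPEAT_COUNT * TANDEM_REPEAT_LENGTH
n_terminal_o_sites = sum(N_TERMINAL_O_SITES)
c_terminal_o_sites = sum(C_TERMINAL_O_SITES)

print(central_domain_length, tandem_repeat_region)
print(n_terminal_o_sites, c_terminal_o_sites)
```

Six repeats of 23 residues give the 138-residue tandem repeat region mentioned in the text, which fills all but one residue of the central domain as delimited by positions 145-283; the serine and threonine counts reproduce the stated totals of 9 and 26 potential O-glycosylation sites.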
Data indicating which potential O- and N-glycosylation sites are glycosylated, and the nature of the regulatory factors involved, are urgently awaited. The source of the microheterogeneity of the oligosaccharide chains of human salivary mucins is unknown. Regulation of the biosynthesis of the polypeptide and oligosaccharide chains and the changes taking place in pathological