Cloning and characterization of Arabidopsis thaliana AtNAP57

Rat NAP57 and its yeast homologue Cbf5p are pseudouridine synthases involved in rRNA biogenesis and localized in the nucleolus. These proteins, together with the H/ACA class of snoRNAs, compose snoRNP particles in which the snoRNA guides the synthase to direct site-specific pseudouridylation of rRNA. In this paper we present an Arabidopsis thaliana protein that is highly homologous to Cbf5p (72% identity and 85% similarity) and NAP57 (67% identity and 81% similarity). Moreover, the plant protein has conserved structural motifs that are characteristic of pseudouridine synthases of the TruB class. We have named the cloned and characterized protein AtNAP57 (Arabidopsis thaliana homologue of NAP57). AtNAP57 is a 565-amino-acid protein with a calculated molecular mass of 63 kDa. The protein is encoded by a single-copy gene located on chromosome 3 of the A. thaliana genome. Interestingly, the AtNAP57 gene does not contain any introns. Mutations in the human DKC1 gene encoding dyskerin (the human homologue of yeast Cbf5p and rat NAP57) cause dyskeratosis congenita, a rare inherited bone marrow failure syndrome characterized by abnormal skin pigmentation, nail dystrophy and mucosal leukoplakia.

In eukaryotes, 18S, 5.8S and 28S rRNA molecules are transcribed as a single large precursor, which undergoes a complicated maturation process [1]. This process occurs in a highly specialized part of the nucleus known as the nucleolus [2]. Maturation of the precursor molecule consists of many endo- and exonucleolytic cleavages leading to the production of mature rRNA species, and also of modifications of standard nucleosides at specific positions. One of the most frequent post-transcriptional modifications of rRNA is pseudouridylation.
Pseudouridine (Ψ, 5-ribosyluracil) was the first modified nucleoside to be discovered, and it is the most abundant modification in all RNA species [3]. Many Ψ residues have been found not only in rRNA, but in tRNA, snRNA and snoRNA molecules as well [4][5][6]. Despite the frequent occurrence of Ψ in various classes of RNA, its role in RNA structure and function has not yet been fully elucidated [3]. The conversion of uridine to pseudouridine involves the breakage of the N1 glycosidic bond and the rotation of the base by 180° around the N3-C6 axis, followed by reformation of a covalent bond at position C5 [7]. This reaction is catalyzed by pseudouridine synthases. Many representatives of this large and ancient enzyme family have been identified in both prokaryotes and yeast, with TruA, TruB, RluA and RsuA representing four distinct subfamilies of Ψ synthases [8,9].
Selection of uridine residues for pseudouridylation by prokaryotic Ψ synthases occurs with a high degree of site-specificity, both for tRNA and rRNA substrates. In these cases, each synthase is generally responsible for pseudouridylation of one or more [10] U positions in a particular RNA species. However, dual-specificity synthases catalyzing the formation of Ψ residues in both tRNAs and rRNAs [11], tRNAs and snRNAs [12], and in both cytoplasmic and mitochondrial tRNAs [13] have been described. In eukaryotic RNAs (especially rRNA), because of the large number of Ψ sites [4,6], site selection for pseudouridylation has to proceed in another way. It seems that this process involves a specific class of small nucleolar RNAs (snoRNAs), known as box H/ACA snoRNAs, that act as guides to direct site-specific pseudouridylation of rRNA [14][15][16]. These snoRNAs possess a hairpin-hinge-hairpin-tail structure, with the hinge region containing the conserved sequence block ANANNA (H box) and the ACA motif found in the tail, three nucleotides away from the 3′ end [17]. Target rRNAs have short signal sequences located around the U to be converted to Ψ. Site selection for Ψ synthesis occurs by base-pair interactions between antisense elements in the H/ACA snoRNAs and sequences in the rRNA on both sides of the target uridine. This creates a pocket structure in which the target U is unpaired and accessible to the Ψ synthase. The snoRNAs interact with many proteins, creating functional ribonucleoprotein particles (snoRNPs). Each box H/ACA snoRNA associates with at least four specific proteins, one of which is a pseudouridine synthase, known as Cbf5p in yeast [18] or NAP57 in mammals [19].
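The two sequence hallmarks of box H/ACA snoRNAs described above (the ANANNA H box and an ACA triplet three nucleotides from the 3′ end) can be illustrated with a minimal Python sketch. The function names and the toy RNA sequence are hypothetical and serve only as an illustration; the sketch checks the sequence motifs alone, not the hairpin-hinge-hairpin-tail secondary structure, so it is not a snoRNA predictor.

```python
import re

# H box consensus ANANNA, where N is any nucleotide (RNA alphabet).
# The lookahead lets the search report overlapping matches as well.
H_BOX_PATTERN = re.compile(r"(?=A[ACGU]A[ACGU]{2}A)")


def h_box_positions(rna: str) -> list[int]:
    """Return 0-based start positions of every ANANNA match."""
    return [m.start() for m in H_BOX_PATTERN.finditer(rna.upper())]


def has_aca_tail(rna: str) -> bool:
    """True if ACA lies three nucleotides upstream of the 3' end (ACANNN-3')."""
    rna = rna.upper()
    return len(rna) >= 6 and rna[-6:-3] == "ACA"


# Hypothetical toy sequence, not a real snoRNA.
toy = "GGCUAGAGCAUCCGGACAUUU"
print(h_box_positions(toy), has_aca_tail(toy))
```

A real search would additionally require the H box to fall in the hinge between the two hairpins, which needs a structure prediction step that is outside this sketch.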
Cbf5p was originally isolated as a low-affinity in vitro centromeric DNA binding protein [20]. CBF5 is an essential gene encoding a 55 kDa highly charged protein with a domain containing ten tandem KKE/D repeats near the C-terminus. Cbf5p is a highly conserved protein with homologues found in many organisms (including rat NAP57 and human dyskerin). Cbf5p and its orthologues have high sequence homology to Escherichia coli TruB pseudouridine synthase, which catalyzes the conversion of uridine to Ψ at position 55 in tRNA [21]. The region of TruB believed to be the conserved U-binding domain is the area showing the greatest homology to Cbf5p. Furthermore, Cbf5p shares the KP and XLD motifs found in three out of the four distinct families of known and putative Ψ synthases [22]. Cbf5p coimmunoprecipitates with all members of the H/ACA class of snoRNAs [18]. This suggests that Cbf5p interacts directly with the conserved H/ACA box to provide metabolic stability to the snoRNA and to participate in assembly of the snoRNP particle. It has been shown that genetic depletion of Cbf5p inhibits Ψ formation in rRNA, resulting in accumulation of unmodified rRNA molecules and in delay or arrest of the yeast cell cycle [22]. Cbf5p shows high homology (71% identity, 85% similarity) to the rat nucleolar protein NAP57, which was identified as a protein associated with the nucleolar shuttling protein Nopp140 [19]. Nucleolar localization and homology with known Ψ synthases suggest that Cbf5p and NAP57 might be the rRNA Ψ synthases in yeast and mammals, respectively, guided to their target sites by box H/ACA snoRNAs [23]. In these snoRNP particles Cbf5p/NAP57 are the active Ψ-synthesizing components. Interesting new features of the human Cbf5p homologue, dyskerin, have recently been revealed (see Results and Discussion).
In this work we present the identification of the A. thaliana Cbf5p/NAP57 homologue. AtNAP57, as we named the protein, shares a high degree of sequence identity with mammalian (rat NAP57 and human dyskerin) and yeast (Cbf5p) proteins. It is the first plant homologue that belongs to the family of Ψ synthases.

GenBank accession number. The nucleotide and protein sequences of AtNAP57 have been submitted to the GenBank database under accession No. AF234984.
Plant material and growth conditions. Seeds of A. thaliana (ecotype Columbia) were supplied by Lehle Seeds (Round Rock, U.S.A.). Plants were grown in a greenhouse (22°C with a 16 h light photoperiod) on soil irrigated with mineral nutrients as suggested by the seed producer.
Cloning of AtNAP57 cDNA. To clone the cDNA of the A. thaliana gene encoding a homologue of the yeast protein Cbf5p we performed 5′ and 3′ RACE (rapid amplification of cDNA ends) experiments [24,25]. To obtain gene-specific primers for RACE we searched the GenBank database (www.ncbi.nlm.nih.gov) with the protein sequence of rat NAP57 [26]. The search returned one A. thaliana EST partial cDNA sequence (GenBank F20038). On the basis of this sequence, we designed three 5′ RACE and three 3′ RACE primers. To characterize the 3′ end of the AtNAP57 transcript, total RNA was isolated from whole two-week-old A. thaliana seedlings using the RNeasy Plant Mini Kit (Qiagen).
DNA and protein sequence analysis. The obtained DNA sequences were assembled and analyzed with the Lasergene computer program (DNA STAR Inc., U.S.A.). Protein alignments were performed with the ClustalW [27] and Boxshade [www.ch.embnet.org] programs. Computer-assisted searches for nucleotide and amino-acid sequences were carried out using the BLAST tools [26]. Conserved domain searches were performed with the help of the Reverse Position Specific BLAST (GenBank Conserved Domain Database) [28] and ProfileScan (Prosite Database) [www.isrec.isb-sib.ch] programs.
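Identity and similarity percentages such as those quoted for AtNAP57 and its homologues are derived from pairwise alignments. The following Python sketch shows one common way to compute them from two already aligned sequences. It makes two assumptions that the text does not state: gapped columns are excluded from the denominator, and "similar" residues are defined by the illustrative groups in `SIMILAR_GROUPS`, which are not necessarily the groups used by Boxshade. The toy sequences are hypothetical.

```python
# Illustrative similarity groups (an assumption, not Boxshade's defaults).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _ungapped_pairs(aligned_a: str, aligned_b: str) -> list[tuple[str, str]]:
    """Return aligned residue pairs, skipping columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of ungapped columns with identical residues."""
    pairs = _ungapped_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of ungapped columns that are identical or in one group."""
    pairs = _ungapped_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0

    def similar(a: str, b: str) -> bool:
        return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment; "-" marks a gap.
print(percent_identity("MKR-LSE", "MKKALTE"))
print(percent_similarity("MKR-LSE", "MKKALTE"))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so percentages computed by different tools are not always directly comparable.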
Primers used in reactions. The primers used to amplify and sequence the AtNAP57 cDNA are listed below. Primers NAP1-NAP6 and Qt, Qo, Qi were used in RACE reactions; primers NAP7-NAP14 and the Universal and Reverse primers (specific for the pGEM T-Easy vector) were used in sequencing reactions. All primers are listed in the 5′ to 3′ direction; they were supplied by Sigma-ARK Scientific.

RESULTS AND DISCUSSION
To clone the cDNA encoding the A. thaliana homologue of Cbf5p/NAP57, we applied a RACE strategy. Gene-specific primers were designed on the basis of an Arabidopsis EST sequence with high homology to yeast Cbf5p and rat NAP57 (F20038). These primers and Arabidopsis total RNA were used in RACE reactions to amplify the 5′ and 3′ parts of the cDNA of the putative Arabidopsis Cbf5p/NAP57 homologue. The 5′ RACE reaction yielded a product of about 200 bp (Fig. 1, lane 2) and the 3′ reaction gave a product of about 1300 bp (Fig. 1, lane 4). Both RACE products were directly cloned into the pGEM T-Easy vector and sequenced. A short overlapping sequence in these products allowed us to compose a virtual full-length cDNA sequence. However, this sequence did not seem to be complete (it did not contain the stop codon), so we decided to perform another 3′ RACE reaction with two new primers (NAP15, NAP16) to extend the 3′ part of the sequence. This reaction yielded a product of about 300 bp (not shown) that was also cloned and sequenced. Based on the obtained sequence data we designed two additional edge primers to directly amplify the full-length cDNA molecule by PCR. The product was also cloned and sequenced to confirm that all previously analyzed parts of the cDNA were derived from the same mRNA. This gave us the complete cDNA sequence of the A. thaliana protein that is homologous to yeast Cbf5p and rat NAP57 (Fig. 2). We named the gene and its product AtNAP57 (Arabidopsis thaliana homologue of NAP57).
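The step in which overlapping 5′ and 3′ RACE products are combined into a virtual full-length sequence can be sketched in Python as a simple suffix-prefix merge. This is only an illustration of the idea, assuming error-free reads on the same strand; the assembly in this work was done with the Lasergene program, and the function name, the `min_overlap` parameter and the toy sequences below are hypothetical.

```python
def merge_by_overlap(upstream: str, downstream: str, min_overlap: int = 10) -> str:
    """Join two sequences on the longest exact suffix/prefix overlap.

    Raises ValueError if no overlap of at least min_overlap bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(upstream), len(downstream))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest_possible, min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    raise ValueError("no overlap of the required length was found")


# Hypothetical toy fragments sharing the 9-base overlap GTCGACATC.
five_prime_product = "ATGGCTGAGGTCGACATC"
three_prime_product = "GTCGACATCTCCCACTAA"
print(merge_by_overlap(five_prime_product, three_prime_product, min_overlap=6))
```

Because short overlaps can occur by chance, amplifying and sequencing the full-length molecule with edge primers, as described above, is what confirms that the merged fragments derive from the same mRNA.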
The cDNA sequence of AtNAP57 consists of 1935 nucleotides (GenBank AF234984). It contains a 1698 bp ORF, starting at nucleotide 59, that encodes a protein of 565 amino acids (Fig. 2). The predicted size of this protein is 63 kDa. AtNAP57 is a highly charged protein with many basic amino acids at its C-terminus (isoelectric point 9.158). The genomic localization and structure of the AtNAP57 gene were resolved by computer analysis, which revealed that the gene is located within the sequence of A. thaliana chromosome 3 (AL138655). Interestingly, the AtNAP57 gene does not contain introns (the genomic sequence is identical to the cDNA). AtNAP57 contains conserved structural domains (Fig. 2) that may point to the function of the protein. It has the TruB family pseudouridylate synthase (N-terminal domain) motif [21] spanning amino-acid residues 101-238 and the PUA motif (positions 287-362), a novel RNA-binding domain detected in archaeal and eukaryotic pseudouridine synthases (designated PUA after pseudouridine synthase and archaeosine transglycosylase) [29]. A bipartite nuclear localization signal (NLS BP) [30] was also detected.
Considering all the conserved structural motifs (TruB, PUA and NLSs) and the high homology with yeast Cbf5p, which is known to be a Ψ synthase, we think that AtNAP57 is also a functional pseudouridine synthase. This could be tested by complementation of the yeast cbf5 null mutation with AtNAP57. Such experiments were carried out for Nop60B, the Drosophila melanogaster homologue of CBF5. Nop60B encodes an essential nucleolar protein that complements a cbf5 null mutation [31]. On the other hand, rat NAP57 does not complement the cbf5 null phenotype in yeast [31]. It remains to be checked whether AtNAP57 functions in yeast. The complementation experiments are in progress in our laboratory.
The predicted amino-acid sequence of AtNAP57 was compared with homologous proteins from other species: yeast (CBF5, L12351), human dyskerin (DKC1, O60832) and rat (NAP57, Z34922). Standard single-letter code was used for the amino acids. Identical amino-acid residues are indicated by black boxes, and similar amino acids are in grey boxes. Gaps are introduced to obtain the maximum level of alignment. The alignment was done using the ClustalW and Boxshade programs (see text).
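The ORF coordinates reported for the AtNAP57 cDNA (1935 nucleotides, a 1698 bp ORF starting at nucleotide 59, 565 amino acids) can be cross-checked with a few lines of Python. The sketch assumes that the stated ORF length includes the stop codon, which is consistent with 1698 / 3 = 566 codons for 565 residues. The derived ORF end position and untranslated-region lengths are computed values, not figures quoted in the text, and the function name is hypothetical.

```python
def orf_summary(cdna_length: int, orf_start: int, orf_length: int) -> dict[str, int]:
    """Derive ORF end, flanking region lengths and residue count.

    Positions are 1-based; orf_length is assumed to include the stop codon.
    """
    if orf_length % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    orf_end = orf_start + orf_length - 1
    if orf_end > cdna_length:
        raise ValueError("ORF extends past the end of the cDNA")
    return {
        "orf_end": orf_end,
        "five_prime_flank": orf_start - 1,
        "three_prime_flank": cdna_length - orf_end,
        # The last codon is the stop codon and encodes no residue.
        "residues": orf_length // 3 - 1,
    }


print(orf_summary(cdna_length=1935, orf_start=59, orf_length=1698))
```

Under these assumptions the ORF ends at nucleotide 1756, leaving 58 nucleotides upstream and 179 nucleotides downstream of the coding region, and the residue count agrees with the 565 amino acids reported.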
A possible role of yeast Cbf5p in centromere function, suggested by its binding of centromeric DNA, has not been confirmed in other organisms. In D. melanogaster there is no apparent association between the Nop60B protein and the chromosomes [31]. It is possible that the role of Cbf5p in centromere function is unique to yeast. Alternatively, it may be that a conserved centromeric function exists but has not yet been detected in other organisms, or that the association of Cbf5p with the yeast centromere is non-functional.
One of the best characterized CBF5 homologues is probably the human gene DKC1. This is because the DKC1 gene and its product dyskerin are believed to play a role in the development of the human disease dyskeratosis congenita [32,33]. Dyskeratosis congenita is a rare inherited bone marrow failure syndrome characterized by abnormal skin pigmentation, nail dystrophy, and mucosal leukoplakia. More than 80% of patients develop bone marrow failure towards the end of the first decade of life, and this is the major cause of premature death. The X-linked form of the disease (which develops only in males) has been shown to be caused by mutations in the DKC1 gene. This single-copy gene is localized on the X chromosome. It comprises 15 exons spanning at least 16 kb and is transcribed into a widely expressed 2.6 kb message [34]. Numerous missense mutations and one 3′ deletion [35] have been detected in the DKC1 gene sequence in patients with dyskeratosis congenita. These mutations result in the production of non-functional dyskerin, which is thought to cause the disease. Dyskerin is a 514-amino-acid protein with a predicted molecular mass of 57.6 kDa [32]. Dyskerin localizes to the nucleolus; it contains multiple putative NLSs at the N- and C-termini [36,37]. It is a highly charged polypeptide with some conserved sequence motifs such as the TruB Ψ synthase motif, multiple phosphorylation sites, and a carboxy-terminal lysine-rich repeat domain [32]. The whole amino-acid sequence of dyskerin shows high homology with other proteins from the TruB family of pseudouridine synthases (Fig. 3). Interestingly, all missense mutations characterized so far are located in the most conserved regions of dyskerin. By analogy to the function of known dyskerin orthologues, involvement in the cell cycle, a nucleolar function in ribosome biogenesis, and Ψ synthesis have been predicted for the human protein.
The molecular mechanism leading to dyskeratosis congenita has not yet been elucidated. On the basis of early experimental data it was thought to involve a failure of rRNA biogenesis due to the predicted RNA-binding function of dyskerin in the nucleolus. Missense mutations in the DKC1 gene could modify the function of dyskerin, affecting snoRNP assembly and stability. Recent data suggest, however, that dyskerin has another function, the disturbance of which can lead to dyskeratosis congenita. It was proposed that the disease may not be caused by a deficiency in rRNA, but rather by a defect in the maintenance of telomeres [38,39]. It was previously shown that the 3′ end of the RNA component of human telomerase (hTR) is structurally and functionally similar to the H/ACA family of snoRNAs. Dyskerin and its homologues interact with the H/ACA motif of snoRNAs, creating functional particles involved in rRNA biogenesis. It has been shown recently that dyskerin associates with the H/ACA portion of hTR and is a part of the human telomerase RNP complex. This interaction seems to be important for biogenesis, processing or turnover of the telomerase RNP. Decreased accumulation of hTR, reduced telomerase activity and abnormally short tracts of telomeric DNA were detected in cells expressing mutated forms of dyskerin. On the other hand, the mutant dyskerins were still able to carry out snoRNP functions, as the mutations had no discernible impact on rRNA processing. Dyskeratosis congenita affects rapidly dividing tissues such as bone marrow and epithelium, and the patients' risk of developing some form of cancer increases with age. This is consistent with telomere shortening, which leads to chromosomal instability, telomeric rearrangements and cancer progression.

Figure 2. Nucleotide and amino-acid sequence of AtNAP57. The predicted amino-acid sequence is shown below the DNA sequence. Nucleotides are numbered on the left, and amino acids on the right. Start and stop codons are grey-shaded. The conserved TruB pseudouridine synthase domain is double underlined. The PUA (RNA-binding domain) motif is single underlined. The NLS BP sequence is indicated by a box; critical amino acids are grey-shaded. The sequence has been submitted to GenBank (accession number AF234984).

Figure 3. Alignment of amino-acid sequences of AtNAP57 protein and its homologues.