Ribosomal proteins L34 and S29 of amphioxus Branchiostoma belcheri tsingtauense: cDNAs cloning and gene copy number

The complete cDNA and deduced amino-acid sequences of ribosomal proteins L34 (AmphiL34) and S29 (AmphiS29) from the amphioxus Branchiostoma belcheri tsingtauense were identified in this study. The AmphiL34 cDNA is 435 nucleotides in length and encodes a 118 amino-acid protein with a calculated molecular mass of 13.6 kDa. It shares 53.6-67.5% amino-acid sequence identity with its eukaryotic counterparts including human, mouse, rat, pig, frog, catfish, fruit fly, mosquito, armyworm, nematode and yeast. The AmphiS29 cDNA comprises 453 nucleotides and codes for a 56 amino-acid protein with a calculated molecular mass of 6.6 kDa. It shows 66.1-78.6% amino-acid sequence identity to eukaryotic S29 proteins from human, mouse, rat, pig, zebrafish, seahorse, fruit fly, nematode, sea hare and yeast. AmphiL34 contains a putative nucleolar localization signal, while AmphiS29 has a zinc finger-like domain. Phylogenetic trees deduced from the conserved sequences of AmphiL34 and AmphiS29 and other known counterparts indicate that the positions of AmphiL34/AmphiS29 are intermediate between the vertebrate and invertebrate L34/S29. Southern blot analysis demonstrates the presence of one copy of the L34 gene and 2-3 copies of the S29 gene in the genome of the amphioxus B. belcheri tsingtauense. This is in sharp contrast to the existence of 7-9 copies of the L34 gene and 14-17 copies of the S29 gene in the rat genome. These data suggest that housekeeping genes like AmphiL34 and AmphiS29 have undergone large-scale duplication in the chordate lineage.

Ribosomes are complex RNA-protein organelles that mediate the sequential addition of amino acids to the carboxyl end of the growing polypeptide chain, according to the blueprints encoded by mRNA (Ramakrishnan, 2002). In eukaryotes, ribosomes contain about 80 structural proteins in addition to rRNA. Ribosomal proteins are highly conserved and are encoded by housekeeping genes, since their activity is required for the growth and maintenance of all cell types (Wool, 1979). The study of ribosomal proteins in a variety of organisms may improve our understanding of ribosome structure and evolution, and elucidate the role of ribosomal proteins in the basic mechanisms of protein synthesis.
Gene and genome duplication has been a topic of interest to biologists for decades (Ohno, 1970; Meyer & Malaga-Trillo, 1999; Sankoff, 2001). It has been proposed that two rounds of large-scale gene duplication took place during early chordate evolution: one close to the origin of vertebrates, and the other close to the origin of jawed vertebrates (Holland et al., 1994; Sharman & Holland, 1996; Sidow, 1996; Meyer & Schartl, 1999). This proposal is supported by comparisons of the copy numbers of luxury protein genes such as Hox (Holland & Garcia-Fernandez, 1996), Otx (William & Holland, 1998), Msx (Sharman et al., 1999) and hedgehog (Shimeld, 1999). Whether housekeeping genes like L34 and S29 also follow the two-round duplication rule remains unknown, and data comparing housekeeping gene copy numbers in different species are largely lacking. The aims of the present study were thus to characterize L34 and S29 cDNAs from the amphioxus B. belcheri tsingtauense and to determine their gene copy numbers in its genome.

MATERIAL AND METHODS
The amphioxus gut cDNA library was constructed as described by Liu et al. (2002). cDNA clones were selected at random for sequencing. Both strands of all selected clones were sequenced with an ABI PRISM 377XL DNA Sequencer, and all sequences were then analyzed for coding probability with the DNATools program (Rehm, 2001).
Initial comparison against the GenBank protein database was performed using the BLAST network server at the National Center for Biotechnology Information (Altschul et al., 1997). Multiple protein sequences were aligned using the MegAlign program by the CLUSTAL method in the DNASTAR software package (Burland, 2000). Phylogenetic trees were constructed by the neighbor-joining method within the PHYLIP 3.5c software package (Felsenstein, 1993) using 1000 bootstrap replicates. The accession numbers of the ribosomal protein sequences in the GenBank database used for comparison are listed in Table 1 and Table 2.
Genomic DNA for Southern blotting analysis was isolated from adult amphioxus. A total of 30 amphioxus were ground in liquid nitrogen, and the ground powder was suspended in 15 ml of lysis buffer containing 10 mM Tris/HCl (pH 8.0), 100 mM EDTA and 0.5% SDS. After treatment with proteinase K (100 mg/ml, final concentration) at 55 °C for 3 h, the lysate was cooled to room temperature and mixed with an equal volume of Tris/HCl-saturated phenol (pH 8.0). The mixture was centrifuged at 5 000 × g at 4 °C for 20 min, and the supernatant was pooled and then mixed with an equal volume of phenol/chloroform (1:1, v/v). The mixture was centrifuged as above and the supernatant was collected. DNA was precipitated with ethanol and digested with various restriction enzymes (2 units per µg DNA) at 37 °C for 20 h. The digested DNA was separated on a 1% agarose gel in 1× TBE (89 mM Tris/borate and 2 mM EDTA) and transferred onto a nylon membrane (Osmonics Inc.). The membranes were hybridized with digoxigenin (DIG)-labeled DNA probes produced with a DIG DNA labeling kit (Roche). Hybridized bands were visualized according to the instructions of the detection kit.

RESULTS AND DISCUSSION
The first cDNA clone, encoding amphioxus ribosomal protein L34 (AmphiL34), was identified from the gut cDNA library by BLAST search. Figure 1 shows the nucleotide and deduced amino-acid sequences of AmphiL34 cDNA (GenBank accession number: AY168761). It consisted of 435 bp, including a 33 bp 5' untranslated region (UTR), an open reading frame (ORF) of 357 bp and a 45 bp 3' UTR. The ORF encoded a 118 amino-acid protein with a calculated molecular mass of about 13.6 kDa and an isoelectric point of 11.44. The 5' UTR had an oligopyrimidine tract, CTTTCGCCATTTT, upstream of the start codon ATG; such tracts consist of a C residue at the cap site followed by a sequence of pyrimidines (Amaldi & Pierandrei-Amaldi, 1990; Perry & Meyuhas, 1990; Sugawara et al., 1992). It should be noted that two purines interrupt the polypyrimidine tract; this was verified by sequencing both strands of the 5' UTR. The oligopyrimidine tract possibly plays a critical role in translation control (Levy et al., 1991). The 3' UTR included a polyadenylation signal, AATAAA, 16 bases upstream of the poly(A) site, which is required for post-transcriptional cleavage-polyadenylation of the 3' end of the pre-mRNA (Proudfoot & Brownlee, 1976).
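The length figures quoted for the AmphiL34 cDNA can be cross-checked with a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the function names are hypothetical, and it assumes that the 357 bp ORF length includes the stop codon.

```python
# Values quoted in the text for the AmphiL34 cDNA (lengths in bp).
UTR5, ORF, UTR3, TOTAL = 33, 357, 45, 435
RESIDUES = 118
TRACT = "CTTTCGCCATTTT"  # oligopyrimidine tract reported in the 5' UTR


def parts_sum_to_total(utr5, orf, utr3, total):
    """True if 5' UTR + ORF + 3' UTR account for the whole cDNA."""
    return utr5 + orf + utr3 == total


def orf_matches_protein(orf_bp, residues):
    """True if the ORF holds one codon per residue plus one stop codon.

    Assumption: the reported ORF length counts the stop codon.
    """
    return orf_bp == 3 * (residues + 1)


def purines(tract):
    """Return (1-based position, base) for every purine (A or G) in the tract."""
    return [(i, base) for i, base in enumerate(tract, start=1) if base in "AG"]


print(parts_sum_to_total(UTR5, ORF, UTR3, TOTAL))
print(orf_matches_protein(ORF, RESIDUES))
print(purines(TRACT))
```

Under these assumptions the three segment lengths add up to the full cDNA, the ORF length agrees with a 118-residue protein, and the tract contains exactly two purines, in line with the text.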
The deduced protein sequence of AmphiL34 was compared with those of other known L34 proteins from diverse organisms in the GenBank database (Fig. 2). At the amino-acid level, AmphiL34 shared more than 57.3% identity with its homologues in vertebrates such as human, mouse, rat, pig, frog and catfish, and more than 53.6% identity with those in other eukaryotes, including invertebrates like fruit fly, mosquito, armyworm and nematode and fungi like yeast. As in other L34 proteins (Niu & Fallon, 1999), amino-acid sequence conservation in AmphiL34 is particularly high among the 38 N-terminal residues, and among residues 92-110 near the C-terminus.
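For readers unfamiliar with how a percent-identity figure is obtained from a pairwise alignment, the following Python sketch shows one common convention. It is an assumption-laden illustration: the text does not state which denominator MegAlign used, so the choice here (columns in which neither sequence has a gap) is hypothetical, as are the function name and the toy sequences, which are not real L34 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Assumed convention: identical residues divided by the number of
    columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides, invented for illustration only.
print(percent_identity("MKR-LTYRR", "MKRALSYRK"))
```

Other conventions (for example, dividing by the length of the shorter sequence or by the full alignment length) give slightly different values, which is worth keeping in mind when comparing identities across studies.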
AmphiL34 is a rather hydrophobic protein, with 46 hydrophobic amino acids out of 118 residues. It has a high percentage of basic amino acids (18 lysines and 16 arginines), mostly located in the N-terminal half of the deduced protein sequence, and a low percentage of acidic amino acids (1 aspartic acid and 3 glutamic acids), mostly situated in the C-terminal half. The strongly basic character of L34 proteins, including AmphiL34, may be instrumental for their binding to rRNA in the 60S subunit of eukaryotic ribosomes (Ulbrich et al., 1979; Dudov & Perry, 1984; Wiedemann & Perry, 1984). Moreover, the tetrapeptide RXXR, which is critical for nucleolar localization (Quaye et al., 1996), is found at residues 88-91 in L34 proteins including AmphiL34 (Fig. 2).
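The residue counts and the RXXR motif search described above are simple to reproduce on any protein sequence. The Python sketch below is illustrative only: the function names are hypothetical, the basic class is assumed to be lysine plus arginine (as in the counts quoted in the text), and the toy peptide is invented rather than taken from AmphiL34.

```python
import re

BASIC = set("KR")   # lysine and arginine (assumed definition of "basic")
ACIDIC = set("DE")  # aspartic acid and glutamic acid


def charge_counts(protein):
    """Return (number of basic residues, number of acidic residues)."""
    basic = sum(1 for residue in protein if residue in BASIC)
    acidic = sum(1 for residue in protein if residue in ACIDIC)
    return basic, acidic


def find_rxxr(protein):
    """Return 1-based (start, end) positions of every R-X-X-R tetrapeptide.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    return [(m.start() + 1, m.start() + 4)
            for m in re.finditer(r"(?=R..R)", protein)]


toy = "MKKRAARDEL"  # invented peptide, not a real L34 sequence
print(charge_counts(toy))
print(find_rxxr(toy))
```

Applied to the deduced AmphiL34 sequence, the same search would be expected to report the motif at residues 88-91, as stated in the text.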
The second identified cDNA clone encoded amphioxus ribosomal protein S29 (AmphiS29). Figure 3 shows the nucleotide and deduced amino-acid sequences of AmphiS29 cDNA (GenBank accession number: AY264807). The cDNA comprised 453 bp, and contained a 5' UTR of 21 bp, an ORF of 171 bp and a 3' UTR of 261 bp. The ORF encoded a 56 amino-acid protein with a calculated molecular mass of about 6.6 kDa and an isoelectric point of 9.79. The 5' UTR of AmphiS29 had an oligopyrimidine tract, CTCTTTGCCGATC, of the kind critical for translation control. As in AmphiL34, the polypyrimidine tract was interrupted by purines (three in this case), and this was verified by sequencing both strands of the 5' UTR. The 3' UTR possessed a polyadenylation signal, AATAAA, required for post-transcriptional cleavage-polyadenylation of the 3' end of the pre-mRNA. Comparison of the deduced AmphiS29 amino-acid sequence with those of its counterparts in the GenBank database (Fig. 4) showed high identity to S29 from mammals including human, mouse, rat and pig (76.8%), zebrafish (78.6%), seahorse (76.8%), fruit fly (69.8%), nematode (75%), sea hare (71.4%) and yeast (66.1%). Like other known S29 proteins, AmphiS29 has a zinc finger-like domain, C-X2-C-X14-C-X2-C, at positions 21-42 (Fig. 4). Zinc fingers are small peptide motifs that can coordinate a zinc ion and bind to nucleic acids, mostly DNA and in a few cases RNA (Klug & Rhodes, 1987). AmphiS29 also has an excess of basic residues over acidic ones (10:2), which is close to the ratio of basic to acidic residues in the amino-acid sequences of S29 from human, mouse, rat, pig, zebrafish and seahorse (11:2), fruit fly (10:3), nematode (12:4), sea hare (10:3) and yeast (11:4).
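The C-X2-C-X14-C-X2-C pattern translates directly into a regular expression. The Python sketch below is a hypothetical illustration of such a search; the function name is invented, and the 56-residue test sequence is synthetic filler built only to place the motif at positions 21-42, not the real AmphiS29 sequence.

```python
import re

# C, any 2 residues, C, any 14 residues, C, any 2 residues, C (22 residues in all)
ZINC_FINGER_LIKE = re.compile(r"C.{2}C.{14}C.{2}C")


def find_zinc_finger_like(protein):
    """Return the 1-based (start, end) of the first match, or None if absent."""
    match = ZINC_FINGER_LIKE.search(protein)
    if match is None:
        return None
    return match.start() + 1, match.end()


# Synthetic 56-residue sequence with the motif placed at positions 21-42.
toy = "A" * 20 + "C" + "GG" + "C" + "S" * 14 + "C" + "GG" + "C" + "A" * 14
print(len(toy))
print(find_zinc_finger_like(toy))
```

The motif spans 1 + 2 + 1 + 14 + 1 + 2 + 1 = 22 residues, which agrees with the reported span from position 21 to position 42.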
Two phylogenetic trees were constructed based on the conserved sequences of AmphiL34 and AmphiS29 and their counterparts from more than 10 representative species including invertebrates and vertebrates. In both trees, the relationships among species reflected the established phylogeny of the chosen organisms, and the positions of AmphiL34/AmphiS29 were intermediate between the vertebrate and invertebrate L34/S29 (Figs. 5, 6). These results are in line with the notion that amphioxus is an organism transitional from invertebrates to vertebrates in phylogeny (Stokes & Holland, 1998; Zhang et al., 2001).
The clear homology of AmphiL34 and AmphiS29 to their known counterparts extends the range of species in which these proteins are highly conserved, and the high conservation of L34 and S29 amino-acid sequences from human to yeast suggests that they have been subjected to strong selective pressure during evolution.
To analyze the copy number of the AmphiL34 and AmphiS29 genes, DIG-labeled cDNA probes of AmphiL34 and AmphiS29 were hybridized with amphioxus genomic DNA digested with HindIII, BstXI, EcoRV and BglII (for AmphiL34) or with EcoRI, PstI, EcoRV and HindIII (for AmphiS29). None of the enzymes in each set cuts within the corresponding cDNA sequence. For AmphiL34 there is a single hybridization band for each of the enzymes HindIII, BstXI, EcoRV and BglII (Fig. 7A), while for AmphiS29 there are two hybridization bands for each of the enzymes EcoRI, PstI and EcoRV, and three hybridization bands for HindIII (Fig. 7B). Considering the possibility that AmphiS29 contains introns that can be digested by these restriction enzymes, one can infer the presence of one copy of the L34 gene and not more than 2-3 copies of the S29 gene in the genome of B. belcheri tsingtauense. It is of great interest to note that 7-9 copies of the L34 gene and 14-17 copies of the S29 gene are present in the rat genome (Aoyama et al., 1989; Chan et al., 1993). Comparison of the copy numbers of the AmphiL34 and AmphiS29 genes with those of the rat L34 and S29 genes indicates that duplication of the L34 and S29 genes occurred in vertebrates like rat. It therefore appears that housekeeping protein genes, such as L34 and S29, also underwent large-scale duplication during early chordate evolution, reinforcing the large-scale gene duplication proposal (Holland et al., 1994; Sharman & Holland, 1996; Sidow, 1996).
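The reasoning from band counts to copy number can be summarized in a small Python sketch. This is a deliberately crude heuristic, not the authors' procedure: it assumes that, because the enzymes do not cut within the cDNA, each band count is an upper bound on copy number for that digest (intronic sites can split one gene into several bands), and it simply reports the lowest and highest counts across digests. The function name is hypothetical; the band counts are those stated in the text.

```python
# Hybridization bands per enzyme, as reported for Fig. 7A and Fig. 7B.
L34_BANDS = {"HindIII": 1, "BstXI": 1, "EcoRV": 1, "BglII": 1}
S29_BANDS = {"EcoRI": 2, "PstI": 2, "EcoRV": 2, "HindIII": 3}


def band_count_range(bands_per_enzyme):
    """Return (lowest, highest) band count across all digests.

    Assumption: since intronic restriction sites can only add bands,
    the counts are read as upper bounds on the gene copy number.
    """
    counts = list(bands_per_enzyme.values())
    return min(counts), max(counts)


print(band_count_range(L34_BANDS))
print(band_count_range(S29_BANDS))
```

Read this way, the L34 digests point to a single copy and the S29 digests to no more than two or three copies, matching the inference drawn in the text.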

Figure 1. Nucleotide and deduced amino-acid sequences of amphioxus L34 gene (accession number in GenBank: AY168761). The presumed translational start and termination sites are underlined, and the asterisk represents the stop codon. The potential polyadenylation signal upstream with respect to the poly(A) tail is boxed and the oligopyrimidine tract within the 5' UTR is double underlined. The putative nucleolar localization signal (RXXR) is indicated with dots under the letters.

Figure 2. Amino-acid sequence alignment of representative L34s using the MegAlign program (DNASTAR) by the CLUSTAL method. Shaded residues are the amino acids that match the consensus. Gaps introduced into sequences to optimize the alignments are represented by (-). The putative nucleolar localization signals (RXXR) are underlined. See Table 1 for sequence reference.

Figure 3. Nucleotide and deduced amino-acid sequences of amphioxus S29 gene (accession number in GenBank: AY264807). The presumed translational start and termination sites are underlined, and the asterisk represents the stop codon. The potential polyadenylation signal upstream with respect to the poly(A) tail is boxed, the oligopyrimidine tract within the 5' UTR is double underlined, and the zinc finger-like domain is marked by a heavy bar.

Figure 4. Amino-acid sequence alignment of representative S29s using the MegAlign program (DNASTAR) by the CLUSTAL method. Shaded residues are the amino acids that match the consensus. The zinc finger-like domains are underlined. See Table 2 for sequence reference.

Figure 5. Phylogenetic tree constructed from different amino-acid sequences of L34 by the neighbor-joining method within the PHYLIP 3.5c software package using 1000 bootstrap replicates. The numbers at the branches are bootstrap percentages supporting the given branching pattern. L34 of yeast Schizosaccharomyces pombe was used as the outgroup. See Table 1 for sequence reference.

Figure 6. Phylogenetic tree constructed from different amino-acid sequences of S29 by the neighbor-joining method within the PHYLIP 3.5c software package using 1000 bootstrap replicates. The numbers at the branches are bootstrap percentages supporting the given branching pattern. S29 of yeast Saccharomyces cerevisiae was used as the outgroup. See Table 2 for sequence reference.