Analysis of Schistosoma mansoni genes using the expressed sequence tag approach.

Expressed sequence tags (ESTs) are partial cDNA sequences read from both ends of randomly selected expressed gene fragments and are used for discovering new genes. cDNA libraries from four different developmental stages of Schistosoma mansoni were used in this study to generate 141 ESTs, representing about 2.5% of the S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank for homology searching in non-redundant databases using the Basic Local Alignment Search Tool for nucleotide (BLASTN) and protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Among the submitted ESTs, 29 were derived from the λgt11 sporocyst library, 70 from the λZap adult worm library, 31 from the λZap cercarial library, and 11 from the λZap female (B) worm library. The homology search revealed that eight (5.6%) ESTs shared homology with previously identified S. mansoni genes in dbEST, 15 (10.6%) were homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homology to rRNA or mitochondrial DNA sequences. Thus, among the 141 ESTs studied, 116 sequences are derived from novel, uncharacterized S. mansoni genes. Those 116 ESTs are important for identifying coding regions in the sequences, for helping to map the schistosome genome, and for identifying genes of immunological and pharmacological significance.

focused on creating resources and techniques for the development of stage-specific libraries, on identifying genes of interest and on drawing low-resolution physical maps of the whole parasite genomes. Priority was given to gene discovery using the expressed sequence tag (EST) approach, leading to the deposition of 16 815 ESTs in dbEST (18 October 2002). Despite this significant accomplishment, less than 15% of schistosomal genes have been identified, and EST sequencing will therefore continue to be the primary objective of network activities.
Expressed sequence tags can serve the same purpose as random genomic DNA sequence-tagged sites (STSs) and provide the additional feature of pointing directly to an expressed gene. ESTs longer than 150 bp were found to be the most useful for similarity searches and mapping (Adams et al., 1991). ESTs have applications in the discovery of new genes, the mapping of genomes, the identification of coding regions in genomic sequences, and the study of the mechanisms of tissue differentiation and ontogeny by developing profiles of sequences that are differentially expressed in particular cell types or at specific developmental stages (Boguski et al., 1993).
Ever since the analysis of complex genomes began in earnest in the 1980s with the assembly of framework physical maps and the advent of methods for large-scale DNA sequencing, genome analysis has been justified in large part by the promise that it would eventually result in discoveries of great biological importance and pave the way for novel approaches in the study of biology. Time has passed and a number of genome projects are either complete or well underway (Hart, 1996; Feng et al., 2002). ESTs are submitted to the database of expressed sequence tags (dbEST), which is a division of GenBank at the U.S. National Center for Biotechnology Information, and are made available to the public through GenBank and its international collaborators, the European Bioinformatics Institute and the DNA Data Bank of Japan. The non-proprietary nature of EST data and of cDNA clone acquisition and use will maximize the enormous value offered to the biomedical community by this initiative (Cohen & Emanuel, 1994). Pharmaceutical companies, biotech companies and academics are now using dbEST to select genes for characterization and functional association (Gerhold & Caskey, 1996). Many commercial enterprises have, however, withheld or delayed submission to GenBank in order to gain a competitive edge in product development from the available information (Baxevanis et al., 1996).
Homology searching is the process of comparing a new sequence against all other known sequences and then attempting to infer the function of the new sequence by assessing the matches and their biological annotation. There are a number of important issues in searching DNA and protein sequence databases, but the most important is the type of database. The schistosome genome project should increase our understanding of the schistosome genome and its expression, as well as provide a new approach to the identification of parasite gene products of immunological or pharmacological interest (Simpson, 1990).
Parasites. Schistosoma mansoni Egyptian strain (Frandsen, 1979) and cercariae were obtained through the Schistosome Biological Supply Program (SBSP) at the Theodor Bilharz Research Institute. The molecular size marker was from Sigma Chemical Co. Bulk dNTPs were from Promega Corp.
Techniques for DNA analysis. The Schistosoma mansoni sporocyst λgt11 cDNA library was immunoscreened using affinity-purified antibodies diluted 1:1000 in 1% BSA/1× PBST20 (Sambrook et al., 1989). The in vivo excised λZap cDNA library was prepared according to Sambrook et al. (1989). λ bacteriophage clones, which may or may not contain an insert, were amplified by PCR (Saiki et al., 1988) using a pair of oligonucleotides complementary to the sequences flanking the unique EcoRI site. Purified DNA samples were sequenced by the enzymatic method (Sanger et al., 1977). The sequencing data were analyzed according to the method of Ausubel et al. (1995).
Database 'homology' searches were conducted on the sequence data using the BLAST algorithm (Altschul et al., 1990). BLASTN searches for homology between the nucleotide query sequence and nucleotide sequence databases, whereas BLASTX searches for homology between the six-frame translation of the nucleotide query sequence and amino-acid sequences in protein databases.
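The six-frame translation that BLASTX applies to a nucleotide query can be illustrated with a short sketch. This is only a minimal illustration in Python, assuming the standard genetic code; the function names are hypothetical, and it is neither the BLAST implementation nor the software used in this study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not an EST from this study.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six protein strings is what a BLASTX-style search would compare against a protein database; stop codons are shown as `*`.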

EST analysis
The results of a homology search in non-redundant databases for the 141 ESTs produced are summarized in Table 1. A total of 141 ESTs were generated from four different libraries constructed from four distinct stages of the parasite life cycle. The libraries were the λgt11 sporocyst, λZap cercarial, λZap adult worm, and λZap female libraries. The sporocyst stage is important for identifying antigens on the surface of 3-h schistosomula. Presumably the RNA messages directing the synthesis of those antigens are present in the sporocyst stage. The generation of expressed sequence tags from this library depends on the arbitrary selection of individual cDNA clones. The efficiency of this process reflects the clonal structure of the library used and can be significantly increased by using a size-selected cDNA library. This strategy, however, is not readily applicable when mRNA is limiting, as is the case in this study of the complete parasite genome. Immunoscreening of this library using affinity-purified antibodies derived from rabbits immunized with irradiated cercariae led to the identification of 29 immunoreactive clones.
Of the 141 ESTs, we found that 5.6% are homologous to previously identified S. mansoni genes, 10.5% represent S. mansoni genes that are homologous to known genes in other organisms, about 2.1% represent genes with low to moderate homology to ribosomal RNA or mitochondrial DNA, and the rest (about 81.6%) represent genes with no known homology in the non-redundant databases.

EST data
Tables 2-5 summarize the positive homology of ESTs submitted for each library.

λgt11 sporocyst library
The diluted antibody used to immunoscreen the λgt11 library was first adsorbed with recombinant λgt11 clones encoding antigens previously identified in our laboratory. This step was necessary in order to decrease the possibility of repeated isolation of the same clones and to facilitate the identification of new antigen-encoding clones. Twenty-nine independent positive plaques were isolated from about 150 000 screened plaques after three rounds of immunoscreening. DNA of those 29 phages was prepared by PCR with λgt11 forward and reverse primers. DNA fragments were sized and separated on a 1% agarose gel. The inserts generated from this library were digested with EcoRI, subcloned into the pGEM-3Zf cloning vector and sequenced by double-stranded sequencing.
The average insert size was 0.62 kbp, while the average EST length was 249 bp. Figure 1 shows the sizes of randomly selected inserts as determined by PCR. Of the 29 ESTs generated from this library, one clone is homologous to previously identified S. mansoni genes, three S. mansoni genes are homologous to known genes in other organisms, and 26 ESTs have no homology in the databases. Table 2 describes the positive homology matches of the library-generated sequences with S. mansoni and other organisms' ESTs from GenBank.

ESTs from λZap cercarial library
Thirty-one ESTs were generated from this library and all were sequenced. The average insert size was 0.94 kbp, while the average EST length was 195 bp. Five of them match identified S. mansoni genes, two S. mansoni genes showed homology to known genes in other organisms, and 24 ESTs have no homology in the databases. Table 3 describes the positive homology matches of the library-generated sequences with S. mansoni and other organisms' ESTs from GenBank.

ESTs from λZap adult worm library
Seventy ESTs were generated from this library. The recombinant pBluescript SK plasmids were sequenced. The average insert size was 1.1 kbp, while the average EST length was 167 bp. Of these ESTs, two were homologous to previously identified S. mansoni genes, three are S. mansoni genes that are homologous to known genes in other organisms, three showed low homology to rRNA or mitochondrial DNA (mtDNA), and 62 ESTs have no significant homology in the databases. Table 4 describes the positive homology matches of the library-generated sequences with S. mansoni and other organisms' ESTs from GenBank.

ESTs from λZap female (B) library
Eleven ESTs were generated from this library. The average insert size was 2.8 kbp, while the average EST length was 183 bp. Seven are sequences homologous to identified S. mansoni genes, and four ESTs have no homology in the databases. Table 5 describes the positive homology matches of the library-generated sequences with S. mansoni and other organisms' ESTs from GenBank.
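The per-library figures reported above can be pooled with a short calculation. The sketch below is illustrative only: the data structure and function names are hypothetical, the values are the EST counts, average insert sizes and average EST lengths quoted in the text, and the pooled means are weighted by the number of ESTs per library (an assumption about how one might combine them, not a result reported by the study).

```python
# Per-library values quoted in the text; the dictionary layout is illustrative.
LIBRARIES = {
    "lambda-gt11 sporocyst": {"ests": 29, "insert_kbp": 0.62, "est_bp": 249},
    "lambda-Zap cercarial": {"ests": 31, "insert_kbp": 0.94, "est_bp": 195},
    "lambda-Zap adult worm": {"ests": 70, "insert_kbp": 1.1, "est_bp": 167},
    "lambda-Zap female (B)": {"ests": 11, "insert_kbp": 2.8, "est_bp": 183},
}


def total_ests(libraries):
    """Total number of ESTs across all libraries."""
    return sum(lib["ests"] for lib in libraries.values())


def weighted_mean(libraries, key):
    """Mean of `key` across libraries, weighted by EST count."""
    weighted_sum = sum(lib["ests"] * lib[key] for lib in libraries.values())
    return weighted_sum / total_ests(libraries)


if __name__ == "__main__":
    print("total ESTs:", total_ests(LIBRARIES))
    print("pooled mean EST length (bp):", round(weighted_mean(LIBRARIES, "est_bp"), 1))
    print("pooled mean insert size (kbp):", round(weighted_mean(LIBRARIES, "insert_kbp"), 2))
```

The total recovers the 141 ESTs stated in the text, which is a quick consistency check on the per-library counts.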

DISCUSSION
Schistosoma are dioecious digenetic trematodes carrying a large (270 Mb) genome. Gaining knowledge about the genome of these parasites is important for understanding their biology, the mechanisms of drug resistance and the antigenic variation that determines escape from the host's immune system (Franco et al., 2000). The aim of the present study is a small-scale comparison of the S. mansoni gene profile in different developmental stages based on the expressed sequence tag (EST) approach. Some of those genes may be of importance for the development of a vaccine against schistosomiasis.
Initially we used λgt11 sporocyst libraries; this was a nondirectional library, so we ended up with a 50% chance of sequencing in the 3′ untranslated region. Most of the results, however, were obtained by sequencing the 5′ end of inserts in clones derived from directional λZap adult worm, cercarial and female-specific libraries.
Most of the libraries were shown to contain less than 5% uninformative clones and more than 80% new genes. The redundancy of each library was also analyzed, showing that one adult worm cDNA library was composed of a small number of highly frequent genes. When comparing ESTs from distinct libraries, we could detect that most genes were present in only a single library, but others were expressed in more than one developmental stage and may represent housekeeping genes in the parasite. When genes present in more than one library were counted only once, a total of 138 informative genes were obtained, corresponding to 116 new S. mansoni genes. Of the total of unique genes, 10.6% were identified based on homology with genes from other organisms, 5.7% matched characterized S. mansoni genes and 82.3% represent unknown genes (Table 1).
We used immunoscreening to identify clones encoding S. mansoni vaccine candidates. This approach was successfully used previously in this laboratory to identify and characterize a number of important vaccine candidates (Osman et al., 1995; Keung et al., 1995; Ghazali et al., 1996; Mohamed et al., 1998; Mohamed et al., 2000). It is based on studies which have shown that the early schistosomula stage is an important target for immune elimination (Taylor, 1991).
Random sequencing of cDNA clones from the excised λZap cDNA libraries was done for the purpose of generating ESTs; 112 sequences were generated in this study. Homology comparisons with DNA sequences in non-redundant databases revealed that, of the 112 EST sequences, seven (6.25%) encoded proteins homologous to identified genes in S. mansoni and 12 (10.71%) encoded sequences homologous to those of known proteins in organisms other than S. mansoni. The availability of those clones will thus facilitate characterization of the schistosome counterparts of proteins identified in other organisms. The remaining 93 (83.04%) EST sequences did not match sequences in the database. These clones are available for studies to identify the function and expression profiles of as yet unidentified proteins.
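The category percentages quoted for the 112 randomly sequenced λZap clones follow directly from the counts. As a minimal sketch (the function name and category labels are hypothetical; the counts 7, 12 and 93 are those given in the text), the calculation is:

```python
def category_percentages(counts):
    """Convert category counts to percentages of the total, rounded to 2 places."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences to classify")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts quoted in the text for the randomly sequenced lambda-Zap clones.
RANDOM_CLONE_COUNTS = {
    "homologous to identified S. mansoni genes": 7,
    "homologous to proteins of other organisms": 12,
    "no database match": 93,
}

if __name__ == "__main__":
    for name, pct in category_percentages(RANDOM_CLONE_COUNTS).items():
        print(f"{name}: {pct}%")
```

The same function can be applied to the counts of any single library to compare the proportion of novel sequences between developmental stages.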
The precise characterization of the diversity of ESTs and of physiological activity in the different stage libraries can be ascertained by sorting the identified genes into functional categories. The detection of genes involved in tegument and membranes, in transcriptional and translational activities, or with regulatory and signaling functions was most significant in cercariae, followed by sporocysts and then adult worms. Cytoskeletal and structural genes, which are essential for muscle contraction, movement through water or the blood stream, and egg laying, were also detected among the ESTs. Genes that might be related to enzymes involved in energy metabolism formed the largest category in the cercarial library, which is not surprising considering that cercariae need much energy for rapid swimming and for penetrating the skin of the host. This finding is in accordance with previous expression studies of energy metabolism genes (Skelly & Shoemaker, 1995). The cercariae need to be fully prepared for the rapid morphological and physiological changes that will be necessary for their transformation into schistosomula and then adult worms. Some interesting matches to genes of S. mansoni or other organisms (<100% homology at the amino-acid level) were found, suggesting that these may be new members of gene families that include several genes with similar sequences and probably similar functions. These new genes may provide important insights into the physiology of the sporocyst, cercariae and adult worms.
Amongst the informative genes we identified the major female specific polypeptide, Sm20.8 tegumental protein, Sm disulfide isomerase, Sm glyceraldehyde-3-phosphate dehydrogenase, the α-subunit of ATP synthase, D. melanogaster actin 88 protein, and Sm13 tegumental protein. This is interesting because previous analyses had identified some of those genes from sporocysts to adult worms. This suggested that the protein is synthesized at the sporocyst stage and retained throughout the life of the worm (Grossman et al., 1990), leading to the proposal that many genes for S. mansoni proteins were subject to stage-specific regulation.
EST MS1MM024.AS6, with an insert size of 0.6 kbp, was shown to share 51% identity at the amino-acid level with the database S. mansoni major female specific polypeptide transcript of 0.36 kbp, which encodes a protein of 104 amino acids. Homology occurs at the region from bp 5-103 of the EST, corresponding to amino acids 66-98 of the S. mansoni major female specific polypeptide. The sexual maturation of adult schistosomes, the pairing of male and female worms and the subsequent massive and continuous production of eggs are the major developmental events of the parasite that occur within the mammalian host. The end product of maturation is the production of viable eggs, and thus studies of the maturation process have concentrated on the development of the female vitelline gland, within which the major egg proteins are synthesized. Indeed, in females which pair with males, the appearance of differentiating vitelline cells is the earliest sign of reproductive maturation. Apparently, a single type of mRNA, which is absent from male and immature female worms, dominates the message population of egg-producing female schistosomes. For simplicity we will refer to the product of this gene as the major female specific polypeptide (FSP) and to the gene as the FSP gene. It has been demonstrated that the presence of the male worm is required to induce FSP gene expression, although it is not yet known whether an agent is transferred from one sex to the other and acts directly on the nuclear DNA of the other sex, or whether an internal signal is induced. Although the gene is very highly expressed, it is present in a very low copy number and is found in the DNA of both sexes. It is not rearranged or amplified during expression. The gene is first expressed five weeks after infection of the mammalian host (Simpson et al., 1987).
EST MC3NE030.ACS, with an insert size of 0.8 kbp, was shown to share 58% identity at the amino-acid level with the database S. mansoni tegumental antigen Sm20.8 transcript of 0.7 kbp, which encodes a protein of 181 amino acids. Homology occurs at the region from bp 34-207 of the EST, corresponding to amino acids 1-58 of Sm20.8. Sm20.8 shares homology with Sm21.7, Sm22.6 and Sj22.6, schistosome vaccine candidates which are members of a family of tegument-associated antigens. It is not clear what role these proteins play in the tegument of schistosomes; they may help to maintain this organ. Members of this family contain a sequence motif characteristic of Ca-binding proteins known as the EF hand motif; Sm20.8 possesses a similar motif at amino acids 38-56. Sm20.8 has an isoelectric point of 7.27 and its expression is developmentally regulated, with the highest concentration found in cercariae. Sm20.8 shows immunoreactivity with sera from infected humans and from rabbits vaccinated with irradiated cercariae. Confocal microscopy demonstrates that Sm20.8 localizes to the tegument of adult worms and 3 h schistosomula. The IgG fraction specific to Sm20.8 mediated complement killing of schistosomula in vitro by 35%. Mice vaccinated with naked DNA containing the Sm20.8 gene and subsequently challenged with cercariae showed a 30% reduction in worm burden compared to controls (Mohamed et al., 1998).
EST MA2AS022.AAS, with an insert size of 0.7 kbp, was shown to share 88% identity at the amino-acid level with the database S. mansoni protein disulfide-isomerase (PDI) homologue precursor transcript of 5.3 kbp, which encodes a protein of 482 amino acids. Homology occurs at the region from bp 6-110 of the EST, corresponding to amino acids 445-479 of the protein disulfide-isomerase homologue precursor. Another EST, MA2AS022.AAS, with an insert size of 0.4 kb, was shown to share 95% identity with the same protein, but the homology occurs at the region from bp 1-60 of the EST, corresponding to amino acids 463-482 of the protein disulfide-isomerase homologue precursor. PDI is a ubiquitous enzyme that is localized in the lumen of the endoplasmic reticulum. The deduced protein sequence contains two thioredoxin-like domains with a particularly high sequence identity of 68% between S. mansoni and man. Thioredoxin domains characterize all PDI genes hitherto sequenced, but they are also found in some other genes with very different functions. The schistosome PDI gene and the human PDI gene, however, also share significant similarity (34%) in other parts of the coding region besides the thioredoxin domains. In vertebrates PDI is mainly accumulated in secretory cells, where its abundance correlates with the level of secretory protein synthesis. Its major function is to catalyze the isomerization of intramolecular disulfide/sulfhydryl bonds, a reaction that is required for the folding of nascent polypeptides in the endoplasmic reticulum. The PDI genes of S. mansoni and man are also very similar in their genomic structure; the sequence positions of the introns in the schistosome gene are exactly the same as in the human PDI gene (Finken et al., 1994).
EST MC2AS032.ACS, with an insert size of 1.6 kbp, was shown to share 90% identity at the amino-acid level with the database S. mansoni glyceraldehyde-3-phosphate dehydrogenase transcript of 1.5 kbp, which encodes a protein of 338 amino acids with a molecular mass of 37 kDa. Homology occurs at the regions from bp 3-110, 112-138 and 159-200 of the EST, corresponding to amino acids 14-49, 50-58 and 65-78 of the protein. Schistosomes ingest large quantities of glucose, as much as 26% of their dry body mass per hour, since they depend on glycolysis for their energy production, so glyceraldehyde-3-phosphate dehydrogenase is certainly critical to parasite survival. The enzyme was identified as a surface-associated vaccine candidate. The function of the surface-located enzyme is, however, not clear. So far there is no evidence indicating that it plays a metabolic role at the surface, since it would have to be associated with other glycolytic enzymes to perform its catalytic activity (Goudot-Crozel et al., 1989).
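The EST and protein coordinates quoted for the matches above can be cross-checked, because an aligned nucleotide span of n bp in a single reading frame should cover n/3 amino-acid residues. The sketch below is a hypothetical consistency check written for illustration, not part of the original analysis; the labels and function names are assumptions, and the coordinates are those quoted in the text.

```python
# (label, EST bp start, EST bp end, protein aa start, protein aa end),
# coordinates as quoted in the text; labels are illustrative.
ALIGNMENTS = [
    ("major female specific polypeptide", 5, 103, 66, 98),
    ("Sm20.8 tegumental antigen", 34, 207, 1, 58),
    ("PDI homologue, first EST", 6, 110, 445, 479),
    ("PDI homologue, second EST", 1, 60, 463, 482),
    ("GAPDH, segment 1", 3, 110, 14, 49),
    ("GAPDH, segment 2", 112, 138, 50, 58),
    ("GAPDH, segment 3", 159, 200, 65, 78),
]


def codons_spanned(bp_start, bp_end):
    """Number of complete codons in an inclusive, 1-based bp interval."""
    length = bp_end - bp_start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} bp is not a whole number of codons")
    return length // 3


def residues_spanned(aa_start, aa_end):
    """Number of residues in an inclusive, 1-based amino-acid interval."""
    return aa_end - aa_start + 1


def reading_frame(bp_start):
    """Forward reading frame (1, 2 or 3) implied by the first aligned base."""
    return (bp_start - 1) % 3 + 1


def check_alignments(alignments):
    """Map each label to True when the bp span and residue span agree."""
    return {
        label: codons_spanned(b0, b1) == residues_spanned(a0, a1)
        for label, b0, b1, a0, a1 in alignments
    }


if __name__ == "__main__":
    for label, ok in check_alignments(ALIGNMENTS).items():
        print(f"{label}: {'consistent' if ok else 'inconsistent'}")
```

For example, bp 5-103 is 99 bp, or 33 codons, which matches the 33 residues in amino acids 66-98.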
EST MS1MM021.AS7, with an insert size of 0.6 kbp, was shown to share 94% identity at the amino-acid level with the database human mitochondrial α subunit of ATP synthase transcript of 1.8 kbp, which encodes a protein of 553 amino acids. Homology occurs at the region from bp 2-265 of the EST, corresponding to amino acids 364-451 of the mitochondrial α subunit of ATP synthase. The ATP synthase complexes of mitochondria, chloroplasts, and bacterial membranes are structurally and functionally similar. Functionally, ATP synthase complexes synthesize ATP from ADP and Pi utilizing energy generated from the electron transport chain (Breen, 1988).
EST MC2NE023.ACS, with an insert size of 0.6 kbp, was shown to share 98% identity with S. mansoni actin mRNA of 1.5 kbp. Homology occurs at the region from bp 1-167 of the EST, corresponding to bases 687-853 of the actin mRNA. Actins constitute a highly conserved family of proteins found in all eukaryotes. These proteins are found predominantly in the cytoplasm of cells as monomers that polymerize to form microfilaments. Microfilaments of actin participate in various cell functions such as muscle contraction, the cell cytoskeleton and motility. Actin microfilaments also function in bundles, where they form microvilli, stereocilia, and other cellular structures. The tegument of the human blood fluke S. mansoni is limited by two lipid membrane bilayers and contains numerous crystalline spines. Optical diffraction patterns suggested that these spines are composed predominantly of actin bundles. Actin has since been shown by fluorescence microscopy to be present on the surface of schistosomula, and by immunofluorescence to be present in the muscle, tegumental tubercles and spines of male and female adult worms. Two-dimensional gel electrophoresis and immunoblotting have detected several isoforms of actin. Actin, either associated with tegumental spines or free in the cytoplasm, has been implicated in the repair and maintenance of the integrity of the adult worm's tegument. S. mansoni actin genes are more homologous to invertebrate actin genes than to vertebrate cytoplasmic actin genes, as expected from their evolutionary relationship. Although actin is highly conserved, the differences observed in the primary sequence of the parasite actin may account for the immunogenicity of the protein (Oliveira & Kemp, 1995).
EST MA1AS012.AAS (GenBank accession # AA269243), with an insert size of about 0.65 kbp, was shown to share 100% identity at the amino-acid level with the database S. mansoni protein Sm13 transcript of 0.96 kbp, which encodes a protein of 320 amino acids. Sequence analysis revealed a 104 amino acid open reading frame (ORF) identical to that described by Abath et al. (2000). Our sequence also shows 100% identity with the GenBank genomic Sm13 sequence AF07886 of 960 bp and with the Sm13 mRNA sequence U67153 of 396 bp. In addition, EST database searches revealed three almost identical sequences (GenBank accession numbers AAS17940, N20684 and AA269243). The independent isolation of homologous cDNA confirmed the sequence of Sm13.
This study will provide additional data to the Schistosoma Gene Discovery Program, which is part of the Schistosoma Genome Project created in 1992. One of the main objectives of this program is the discovery and characterisation of new genes of S. mansoni and S. japonicum in an attempt to find new targets for drug and vaccine development.
The success of the Schistosoma Gene Discovery Program is demonstrated by the number of catalogued genes, which now reaches 15 to 20% of the full gene complement of the genome (Franco et al., 2000).

Table 4. Positive homology matches, λZap adult worm library
*SM, as in Table 2.