Analysis of the complete mitochondrial genome sequence of the resurrection plant Haberlea rhodopensis

Haberlea rhodopensis is a paleolithic tertiary relict species that belongs to the unique group of resurrection plants sharing remarkable tolerance to desiccation. When exposed to severe drought stress, this species shows an ability to maintain structural integrity of its deactivated photosynthetic apparatus, which easily reactivates upon rehydration. In addition to its homoiochlorophyllous nature, the resurrection capability of H. rhodopensis is of particular importance to the global climate change mitigation. In this study, we sequenced, assembled, and analyzed the mitochondrial (mt) genome of H. rhodopensis for the first time. The master circle has a typical circular structure of 484 138 bp in length with a 44.1% GC content in total. The mt genome of H. rhodopensis contains 59 genes in total, including 35 protein-coding, 21 tRNAs, and 3 rRNAs genes. 7 tandem repeats and 85 simple sequence repeats (SSRs) are distributed throughout the mt genome. The alignment of 20 plant mt genomes confirms the phylogenetic position of H. rhodopensis in the Lamiales order. Our comprehensive analysis of the complete mt genome of H. rhodopensis is a significant addition to the limited database of organelle genomes of resurrection species. Comparative and phylogenetic analysis provides valuable information for a better understanding of mitochondrial molecular evolution in plants.


INTRODUCTION
Over the last several years, many plant genomes have been sequenced and assembled by the next-generation sequencing (NGS) technologies. Due to their small size, conserved gene order, and content, chloroplast (cp) genomes were sequenced more frequently than mitochondrial genomes. In comparison to the cp genomes, the mt genome molecules proved to be difficult to assemble because of their variable structure .
Mitochondria are essential organelles in plants, often called the energy factory of the cell. They possess their own genome which is often used for comparative and evolutionary studies. A recent study highlights mt genes as significant markers for resolving relationships among genera, families, and higher rank taxa across angiosperms. It was observed that the low substitution rates of mt genes in comparison to cp genes make them very useful in the reconstruction of ancient phylogenetic relationships (Qiu et al., 2010). Phylogenetic trees represent the true relationship between conserved core genome sequences and are used to resolve taxonomic grouping among different species. Most mt genomes are extremely large and complex when compared to those of animals and fungi, which generally show a stable and conservative mode of evolution (Li et al., 2009). Land plants' mitochondrial genomes exhibit some features, such as a significant size expansion, frequent gene loss and gene transfer to the nucleus, RNA editing, genomic rearrangement, and replacement of some tRNAs by their cp counterparts (Knoop, 2004), that highlight their evolutionary dynamics. Mitochondrial genomes vary significantly in structure, size, and gene order . Much of these variations occur even between members of the same family or genera (Alverson et al., 2010). These main features contrast with animal mtDNAs which are structurally conserved, relatively small in size, and have very fast nucleotide substitution rates (Ballard et al., 2004).
Previous studies revealed that the nucleotide substitution rates between animals and plants differ due to the fact that the plant and animal kingdoms diverged about 1000 million years ago and their patterns of evolution have become different. Also, researchers observed significantly different rates in nucleotide substitution rates among plants. Comparison between plant mitochondrial (mt), chloroplast (cp) and nuclear (n) DNA sequences revealed slower substitution rates in the mitochondrial DNA, which is maybe due to a lower mutation rate. In contrast to mammalian mtDNA, where mitochondrial sequences evolve 5 times faster than nDNA, mtDNA sequences in angiosperms evolve at least 5 times more slowly than nDNA (Wolfe et al., 1987).
Plant mt genomes contain various repeat sequences, such as tandem repeat sequences, simple repeat sequences, and large repeats (Alverson et al., 2010(Alverson et al., , 2011. In many studies it was observed that mt repeats contain valuable genetic information and are regarded as components in intramolecular recombination (Chang et al., 2013). Methylated sites are linked to tandem repeats in the maize mt genome (Clifton, 2004 repeats play an important role in the evolution of plant mt genomes and are responsible for structural variations and size variability of plant mt genomes (Chang et al., 2013). Some genes with large repeats may have multiple copies. Recombination across large direct repeats may divide mt genomes into pairs of subgenomic molecules, whereas inverted repeats may generate isomeric circles (Handa, 2003;Clifton, 2004;Chang et al., 2013). The gene content of the plant mt genomes is highly variable. The number of protein-coding genes in the mt genomes varies from 3 to 67, whereas the number of tRNA genes varies from 0 to 27 (Adams & Palmer, 2003). In the course of evolution, many mt genes originally found in plant mt genomes have been lost during transfer to the nucleus (Lei et al., 2013). For example, the sdh2, rps9, rps11, and rps16 genes were lost in the evolution of plant mt genomes. The protein-coding genes: rps12, sdh3, and sdh4 were lost in monocots, while rps2 was lost in dicots .
RNA editing, a post-transcriptional process of changing nucleotide sequence of any RNA molecule, challenged the basic concept of molecular biology that the primary RNA sequence reflects the sequence of DNA from which it is transcribed. The changes encompass insertions and deletions of uridine residues, and a conversion of a cytidine to uridine within the RNA molecule. RNA editing affects the transcripts of protein-coding genes, non-coding transcribed regions, structural RNAs and intron sequences, and serves as a buffer for less preferred mutations in the coding sequences (Covello & Gray, 1989).
In plant organelles, RNA editing sites were found in the coding regions of mRNA, introns, and non-translated regions. The majority of post-transcriptional modifications include U-to-C, A-to-I, and the RNA-editing of C-to-U was identified in most of the angiosperms (Gray et al., 1999). The process of site editing can generate an initiation or termination codon, but in most cases it generates an internal codon with functional relevance (Handa, 2003). Mt and cp RNA editing in plants is essential for the normal functioning of their translation and respiration activity, and is beneficial for understanding gene expression.
Resurrection plants are a group of flora that thanks to unique survival mechanisms evolved over time can survive extreme water shortages for years. Because chloroplasts and mitochondria play an irreplaceable role in stress sensing and responses, studying their genomes is an important prerequisite for understanding their desiccation tolerance. Recently, the cp genomes of the two representatives of resurrection plants -Boea hygrometrica  and H. rhodopensis (Ivanova et al., 2017), both belonging to Gesneriaceae, were sequenced and annotated; this was later followed by the mt genome of B. hygrometrica. Here, we report the sequencing data, assembly, and annotation of the mt genome of H. rhodopensis, and provide a comparative and phylogenetic analysis that contributes to a better understanding of mitochondrial molecular evolution in plants.

Plant material and sequencing
Plant material from H. rhodopensis was collected from the northeast Rhodopi Mountain, Bulgaria (location42.1′N24.52′E). Total DNA was extracted from the leaf tissue with a DNeasy Plant Mini Kit (QIA-GEN), according to the manufacturer's instructions. The quality and quantity of DNA were checked with an Epoch microplate spectrophotometer and an agarose gel. Library preparation and sequencing by HiSeqX Illumina technology were performed at Macrogen (Seoul, South Korea) by Illumina standard protocol. The isolated DNA was used to generate reads with a 150 bp paired-end data library and insert size of 350 bp. DNA sequencing is generated in a total of 2 × 366909885 reads.

Genome assembly and annotation
De novo assembly of the H. rhodopensis mt genome (mtDNA) was performed by applying NOVOPlasty (https://github.com/ndierckx/NOVOPlasty), а sole de novo assembler. A seed-and-extend algorithm was used which assembles organelle genomes from whole-genome sequencing (WGS) data, starting from a related or distant single seed sequence with kmer 39 (Dierckxsens et al., 2016). According to the manual of this assembler, we used the total DNA sequencing reads (including nuclear, mt, and cp reads) instead of mapping them to the reference genome and filtering mitochondrial reads. The mtDNA nucleotide sequence of the cox1 gene from the closely related species B. hygrometrica (JN107812) was used as a seed sequence in the process of genome assembly. This was subsequently elongated, resulting in one contig and a successfully circularized genome.
Annotation of the H. rhodopensis mt genome was achieved using MITOFY (https://vcru.wisc.edu/cgi-bin/ mitofy/mitofy.cgi) with manual start and stop codon correction and validation via comparing to the mt genes of previously annotated genomes. The mt gene nomenclature was based on that of published land plant mt genomes available in the NCBI database. Transfer RNA genes (tRNA) were identified by MITOFY and validated by the tRNAscan-SE program (http://lowelab.ucsc.edu/ tRNAscan-SE/) with default settings (Schattner et al., 2005). The mt genome circular representation was generated by OrganelllarGenomeDraw (OGDRAW) (Lohse et al., 2013), and the complete mtDNA sequence was deposited in the NCBI GeneBank database under accession number MH757117.

Repeat structures
Mt genome tandem repeats were identified using the Tandem Repeats Finder software with default settings (Benson, 1999). Additionally, we analysed the distribution of simple sequence repeats (SSRs) with the MISA web-based server application (http://pgrc.ipk-gatersleben.de/misa/), with the following settings: 10 repeats for mono-, 5 for di-, and 4 for tri-, tetra-, penta-and hexanucleotide repeat patterns .

Phylogenetic analysis
We performed a phylogenetic analysis with CVTree3 and a whole-genome based phylogenetic analysis without sequence alignment using a Composition Vector (CV) approach (Qi et al., 2004;Zuo & Hao, 2015). This approach is successfully used in other studies of viruses, fungi, and plastids (Zuo & Hao, 2015), and has demonstrated its applicability in phylogenetic studies using vertebrate mitochondrial genomes.

Analysis of RNA Editing
We used the web-based software platform PREP (Predictive RNA Editor for Plants) (http://prep.unl.edu/) to identify possible RNA-editing sites in the protein-coding genes of the H. rhodopensis mt genome. PREP sites is based on the evolutionary principle that the process of site editing increases protein conservation among species (Mower, 2005). For this analysis, we used the PREP software with default settings and a cut-off value of C=0.2.
To determine RNA-editing sites in the H.rhodopnesis mt genome, we used a set of land plant protein-coding genes included in the PREP software. Additionally, we performed RNA editing analysis of three closely related genomes from the Lamiales order (O. europaea, S. miltiorrhiza, and B. hygrometrica), and used the results to compare the number of detected RNA sites between these genomes and mt genome of H. rhodopensis.

Genome features
The complete mt genome of H. rhodopensis is a 484 138 bp long circular DNA molecule (Fig. 1). The summary of the H. rhodopensis mt genome features is presented in Table 1. Base composition (27.7% A, 22.1% C, 22.0% G, 28.2% T) is typical for previously published mt genomes from the Lamiales order.

Tandem repeats
Tandem repeats are short lengths of DNA repeated multiple times within the genome. They play an important role in genome rearrangement and recombination (Cavalier-Smith, 2002), and are widely used in phylogenetic and comparative analyses (Nie et al., 2012). Using the Tandem Repeat Finder software, 7 tandem repeats were detected in the mt genome of H. rhodopensis (Table 5). The length of TR ranges from 18 to 24 bp. Two of the repeats were observed in a protein-coding region, while others are distributed in non-coding regions.

SRRs
SSRs, or microsatellites, are short DNA motifs that are usually repeated 5-50 times and commonly observed throughout the mt genomes (Chen et al., 2006).
They are regarded as molecular markers and widely used in population genetics (Doorduin et al., 2011). Using MISA, 85 SSRs were identified in the H. rhodopensis mt genome ( Table 6). 42 of them have mononucleotides, 25 -di-nucleotides, and 18 -tri-nucleotides (Table 6). Tetra-, penta-and hexa-nucleotides were not found in the specified setting. Many mononucleotide repeats are comprised of A/T.10 di-nucleotides  are composed primarily of AT/AT and the rest of SRRs have a high content of A/T. These observations confirm other studies that polyA and polyT repeats are found in the mt genomes (Kuntal et al., 2012).

Comparison with other mt genomes
Plant mt genomes are larger than animal mt genomes and they vary significantly in size, gene order, and content (Alverson et al., 2011). We selected 15 mt plant genomes to compare genome features and detect variability between them and the mt genome of H. rhodopensis (Table 7).
CG content, one of significant compositional genome features, differs slightly among the selected genomes. The range is from 43.3% (B. hygrometrica)  to 50.4% (Ginkgo biloba).
The size of selected mt genomes ranges from 219 Kb (Brassica juncea) to 773 KB (Vitis vinifera). The smallest number of genes (51) is observed in Vigna angularis, and the largest (163) in Nicotiana tabacum. A significant tRNA gene number variability is detected as well. This hygrometrica is 510 KB, which is slightly larger than the mt genome of H. rhodopensis, while the two mtDNAs have a similar base composition. Upon comparison between these two genomes, it is evident that although 7 proteincoding genes (mttB, rpl10, rpl2, rpl5, rps10, rps14, sdh4) and a trnI-CAU tRNA gene are observed in the H. rhodopensis mt genome, they are absent in the B. hygrometrica genome. 2 protein-coding genes (rps7 and sdh3), 4 tRNA genes (trnR-ACG, trnT-UGU, trnH-GUG, trnW-CCA and trnV-GAC ) and 4 hypothetical proteins (orf1, orf2, orf3 and orf4) are present in B. hygrometrica mt genome, but are not present in H. rhodopensis mt genome. Interestingly, a previous study found that succinate dehydrogenase genes were usually lost in angiosperms, and losses of sdh4 were predominant in the monocots while no losses were detected in basal angiosperms (Adams et al., 2001). However, our comparison revealed that the sdh3 gene exists in the mt genomes of B. hygrometrica, N. tabacum, G. biloba, O. europea and V. vinifera, while sdh4 exists in the genomes of H.rhodopensis, S. miltiorrhiza (pseudo gene), G. biloba, S. purpurea, C. sativa, O. europea and V. vinifera. The whole set of RNA genes (trnA, trnC, trnD, trnE, trnF, trnG, trnH, trnI, trnK, trnL, trnM, trnN, trnP, trnQ, trnR, trnS, trnT, trnV, trnW, trnY and trnfM) is required for the protein synthesis of mt plant genomes. However, a large number of RNA genes is either lost or deactivated during the evolution of plant mt genomes (Dietrich et al., 1996). We observed that three RNA genes (trnV, trnT, trnR) were lost in a large number of plant mt genomes: the trnV gene was lost in B. napus, V. angularis, N. tabacum, B. jun-

Phylogenetic analysis
We analyzed plant mitochondrial phylogeny of 20 conserved homologous mt genes (atp1, atp4, atp6, atp8, atp9, cob, cox1, cox2, cox3, rps3, rps4, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9) (Fig. 2). We constructed the phylogenetic tree including 20 species from 10 orders, of which 4 species belong to Lamiales -A. reptans, B. hygrometrica, O. europea and S. miltiorrhiza. The closest mt genome relative of H. rhodopensis is that of B. hygrometrica. Moreover, these two species share one of the most interesting and unique features -they can survive extreme drought. The conducted phylogenetic analysis and generated tree strongly support the closest relationship between H. rhodopensis and B. hygrometrica, as well as confirm that these two species belong to the Gesneriaceae family.

Analysis of RNA Editing
Using the PREP program, we analysed 35 proteincoding genes from the H. rhodopensis mt genome and we predicted 419 RNA editing sites inside them (Supplementary Table 1 at https://ojs.ptbioch.edu.pl/index. php/abp). We discovered that the NAD(H) complex contains 153 editing sites (36.51% of all predicted sites). 5 genes (ccmB, ccmC, ccmFn, mttB and nad4) encoded most of the RNA editing sites (156), while 4 genes (atp1, atp8, atp9 and rpl10) encoded the fewest sites. Among the 419 editing sites, 102 (24.34%) were converted from Serine to Leucine, 88 (21%) were converted from Proline to Leucine and 63 (15%) were converted from Serine to Phenylalanine. The other 166 amino acids are converted between different types of amino acids. Also, we calculated the codon position where these changes happened. 117 (27.92%) of the RNA editing sites occurred in the first position of the codon, whereas 284 (67.78%) were in the second position.
To compare the RNA editing sites between the closely related mt genomes of the Lamiales order, we additionally analysed protein-coding genes from B. hygrometrica, O. europaea, and S. miltiorrhiza mt genomes. The number of predicted RNA editing sites is 431 for S. miltiorrhiza (Supplementary Table 2 Table 4 at https://ojs.ptbioch.edu.pl/index.php/abp). We observed that the number of RNA editing sites under amino acid changes in the compared mt genomes is similar to that of the H. rhodopensis mt genome (Fig.  3). This analysis revealed that the NAD(H) complex in these three genomes contains a fair amount of RNA editing sites, i.e. 175, 164, and 170 for O. europaea, S.  -10; 2-5; 3-4; 4-4; 5-4; 6-4 where 1, 2, 3, 4, 5, 6 indicate the mono-di-, tri-, tetra-, penta-and hexa-nucleotide repeats  (Maier et al., 1996) with representatives from the Lamiales order. The most edited transcripts were ccmC with 36 editing sites, in O. sativa (Notsu et al., 2002), and ccmB with 39 editing sites in Brassica napus (Handa, 2003). We compared these results to those of the Lamiales representatives analysed in our study, and found that, similar to O. sativa and B. napus, one of the most edited transcripts belongs to the cytochrome-c familly.

CONCLUSIONS
The main characteristics of the H. rhodopensis Friv. mitochondrial genome, including its variable size and structure, as well as fluctuating gene order and content, are consistent with the mitochondrial genomes of most higher plants. Its sequencing and annotation that were performed here constitute an important addition to the limited body of sequenced and analyzed mt genomes from the Gesneriaceae family, especially when it comes to resurrected plants. The comparative and phylogenetic analysis presented here provides a greater understanding of mt molecular evolution in higher plants, facilitating further study of gene organization and evolution in the Lamiales genus.  It is well known that higher resurrection plants maintain their energetic status during periods of dehydration and hibernation Gechev et al., 2013). It is well established that Haberlea respiration remains energetically stable during leaf drying (Kimenov & Minkov, 1975;Minkov et al., 1977). Undoubtedly, mitochondria play an important role in the process of rehydration and "resurrection" of these plants. These processes most likely differ in H. rhodopensis Friv. and other resurrecting plants (such as B. hygrometrica), owing to significant differences in their respective mt genes. These "resurrection" mechanisms, therefore, may have evolved independently of the evolution of their respective mt gene arrangements. Furthermore, the emergence of this process appears to be the result of regulation, rather than specific gene composition. High levels of NADPH and other phosphorylated sugars found in Haberlea's mitochondria during drought and recovery support this view. These sugars most likely maintain the plant's high energy status during the periods of stress (Bardarov et al., 2018).
We can further speculate that convergent evolution in higher plants has produced "resurrection" mechanisms multiple times. Despite their advantage for individual organisms, however, many instances appear to have subsequently become vestigial due to their extremely complex, onerous maintenance, and dubious benefits for the overall population. Immortality of individual organisms, therefore, appears to involve more than their own selfinterest. System complexity, energy balance, and evolutionary cost for the entire population all play an important role in shaping evolution. Clearly, immortality of the individual organisms has evolved many times before. Today, only its residual traces remain within the genomes, physiology and biochemistry of some plants. Regardless, these vestigial "resurrection" mechanisms can still be reactivated when the evolutionary pressure of colonizing new spaces outweighs other concerns.