New functionality of RNAComposer: an application to shape the axis of miR160 precursor structure

RNAComposer is a fully automated, web-interfaced sys-tem for RNA 3D structure prediction, freely available at http://rnacomposer.cs.put.poznan.pl/ and http://rnacom-poser.ibch.poznan.pl/. Its main components are: manually curated database of RNA 3D structure elements, highly efficient computational engine and user-friendly web application. In this paper, we demonstrate how the lat-est additions to the system allow the user to significantly affect the process of 3D model composition on several computational levels. Although in general our method is based on the knowledge of secondary structure topology, currently the RNAComposer offers a choice of six incorporated programs for secondary structure prediction. It also allows to apply a conditional search in the database of 3D structure elements and introduce user-provided elements into the final 3D model. This new functionality contributes to a significant improvement of the predicted 3D model reliability and it facilitates a better model adjustment to the experimental data. This is exemplified based on the RNAComposer application for modelling of the 3D structures of precursors of the miR160 family members.


INTRODUCTION
Over the past decades, it has become evident that addressing many complex problems and questions that were raised in the area of molecular biology and biochemistry requires a strong support of computational methods.One such challenge, aligning with our scientific interest, is to enable functional insight into the world of RNA molecules, allowing to explain the roles that the RNAs play in complex biological processes.This can be achieved by exploring the three-dimensional space of RNA structures, studying structure-function relationship and simulating RNA-involving biological processes (Nishida et al., 2016).
RNA 3D structure determination at atomic resolution is usually based on X-ray crystallography, NMR spectroscopy or cryo-microscopy (Blazewicz et al., 2005;Felden, 2007;Blazewicz et al., 2011).When lacking experimental data, an application of computational methods dedicated to RNA 3D modelling remains the only way (Dufour & Marti-Renom, 2015;Miao et al., 2015).In the recent years, the latter approach has started to gain an increas-ing interest and over a dozen methods devoted to RNA 3D structure prediction have been developed (Dufour & Marti-Renom, 2015).Among them, RNAComposer, a fully automated, knowledge-based system for RNA 3D structure prediction has been introduced by our team (Popenda et al., 2012).
A thorough analysis of the RNAComposer-predicted 3D models reveals that in a number of cases the quality and fidelity of prediction might be improved (Miao et al., 2015).Such improvement often requires an individual treatment of selected regions of the 3D model, and additional structure modelling at the local, fragment-oriented level.Full automation of the 3D structure prediction by RNAComposer excludes the user influence on maneuvering structural elements that build the final 3D model (Popenda et al., 2012).Thus, although it is countercurrent to our general assumption, we have decided to introduce a new option that brakes the basic premise of fully automated prediction by allowing the users to provide their own structural elements which are then used to build the resultant 3D model.We have also enabled a conditional search in the database of available RNA 3D structure elements.Finally, in order to provide support for sequence-based 3D structure modelling, we have incorporated three new in silico methods that predict the RNA secondary structure.
In this paper, we have described these new added functionalities of RNAComposer with regard to both, a computational engine and web application.To exemplify a scope of the introduced features, prediction of the 3D structure of miR160 precursors will be presented.
MicroRNAs (miRNAs) are small non-coding RNAs of 21-nt in length that control gene expression in animals and plants (Axtell et al., 2011;Szostak et al., 2014).They originate from primary transcripts (pri-miRNAs) with characteristic stem-loop structures that are subsequently processed in a series of steps to produce the mature miRNA (Bartel et al., 2004;Mickiewicz et al., 2016).In animals, pri-miRNA is first processed in the nucleus by the Drosha enzyme, to release pre-miRNA (Han et al., 2006).In the second step, an excision of the miRNA/ miRNA* duplex from the pre-miRNA is performed in the cytoplasm by another enzyme called Dicer (Winter et al., 2009).In plants, in contrast, the entire process of microRNA biogenesis is undertaken within the nucleus, and both cleavages of miRNAs are carried out by a single Dicer-like1 (DCL1) enzyme that is a homolog of Dicer (Hutvágner et al., 2001;Kurihara & Watanabe, 2004;Voinnet, 2009;Liu et al., 2012).Mature miRNAs are incorporated into an RNA-inducing silencing complex (RISC) and they function by guiding the Argonaute proteins in RISC to complementary sites in messenger RNA (mRNA) for subsequent cleavage or translation inhibition (Brodersen et al. 2008).
All enzymes that play central roles in the miRNA biogenesis, human Drosha, Dicer and plant DCL1, belong to the Ribonuclease III (RNase III) family (Zhang et al., 2004;MacRae & Doudna, 2007;Nicholson, 2014).They contain two RNase III domains forming a tightly associated heterodimer that can cleave double-stranded RNA (dsRNA) to generate characteristic products with 2-nt 3' overhangs.Drosha (class 2 ribonuclease III enzyme), Dicer and DCL1 (class 3 ribonuclease III enzymes) also contain additional domains and act as "molecular rulers" that measure out the length of dsRNA for slicing.Structural information for the RNase III domains in enzymes of these classes is available from the crystal structure of primitive Dicer from Giardia intestinalis (MacRae et al., 2006;MacRae & Doudna, 2007), Drosha (Kwon et al., 2016), and the RIIIB domain from human and mouse Dicer (Takeshita et al., 2007;Du et al., 2008).None of these structures contains a bound dsRNA substrate.However, since the RNase III enzymes are very well conserved across the species, an information about interactions of RNase III domains with RNA in Drosha, Dicer and DCL1 can be derived from the available crystal structures of homologous domains from class 1 ribonuclease III enzymes in a complex with RNA, i.e. the bacterial RNase III domain from Aquifex aeolicus (Aa-RNase III) (Gan et al., 2006(Gan et al., , 2008)).
A typical miRNA precursor contains a long dsRNA stem in the context of a hairpin loop, disrupted by multiple non-canonical structural features such as internal loops and bulges.Several studies revealed that structural features of pre-miRNAs are important for accurate and efficient processing of miRNA (Mateos et al., 2010;Song et al., 2010;Werner et al., 2010;Starega-Roslan et al., 2011;Feng et al., 2012;Lee et al., 2015;Galka-Marciniak et al., 2016).MicroRNA length is also affected by pre-miRNA structure.For example, asymmetric bulges have been shown to be responsible for the production of longer miRNAs in animals and plants, as the unpaired bases in bulges are not measured by Dicer or the DCL1 enzyme (Mateos et al., 2010;Cuperus et al., 2010;Chen et al., 2010;Starega-Roslan et al., 2011;Manavella et al., 2012).Asymmetric bulges and mismatches were also shown to be crucial for 20-nt miRNA formation in plants (Lee et al., 2015).
Although miRNAs constitute an important class of regulatory RNAs, the tertiary structure information about their precursors is very limited.Crystal structure of a stem domain from pre-miR30a in complex with Exp-5:RanGTP (PDB ID: 3A6P) is available, where pre-mi-R30a adopted a typical A-form RNA helical structure.In the light of available structures for Aa-RNase III in complex with dsRNA (PDB IDs: 2EZ6, 2NUF, 2NUG), the dsRNA is accommodated in the valley at the interface of RNase III dimer and it takes an A-like form in the vicinity of the cleavage sites (Gan et al., 2006(Gan et al., , 2008)).Here, we focus on the structure of plant miRNA precursors in the region of miRNA/miRNA* duplex and its proximity.

MATERIALS AND METHODS
RNAComposer has been designed as a bioinformatics tool integrating three components: a computational engine, a database of RNA 3D structure elements (provided by PostgreSQL), and a web-based interface.The web-application was developed in Java on Spring IO platform using Hibernate.It is provided by Apache Tomcat 7.0 web server operated by Ubuntu 14.04 environment.The computational core of the RNAComposer system which is based on the machine translation paradigm was also developed in Java.Its heterogeneous nature is revealed through integration of variety of functions provided by comprehensive external software used for secondary structure prediction, in-silico prediction of 3D structure elements, or minimization of energy in torsion angle space and the space of atom coordinates.The computational server is operated by openSUSE 11.0 environment.These three components are integrated using an effective and secure communication layer through message brokers (i.e., Apache Activemq 5.3).The RNA-Composer web server is publicly available at two websites: http://rnacomposer.cs.put.poznan.pland http:// rnacomposer.ibch.poznan.pl.

RNAComposer with new functionality
RNAComposer (Popenda et al., 2012) allows for prediction of RNA 3D structure models from user-input sequence only, or sequence and secondary structure topology.Full automation of the modelling process, driven by efficient computational procedures, allows quick and easy access to the predicted 3D structure.When the RNAComposer website is entered, the system is ready for running in an interactive mode.In this default mode, users provide the RNA sequence and then secondary structure topology (in dot-bracket notation) or the name of a method to be used for the secondary structure prediction (selected from among those incorporated into the system), and then they press the Compose button.Consequently, a result page with the report on computation progress and the predicted 3D model is immediately displayed.In the case of necessity to run predictions for multiple input data at a time, define additional restraints, or analyze the results of intermediate steps of the modelling process, one should login to the system and work in a batch mode.This mode supports processing of up to 10 sequences at a time.Moreover, 1 to 10 different secondary structure topologies can be defined for each input sequence.Additionally, atom distance or torsion New functionality of RNAComposer angle restrains can be provided.As an output, the user obtains up to 10 RNA 3D models for each secondary structure topology, and the summary of all conducted computational steps.The file with structural elements used for 3D model composition is also provided.In both modes, the resulting 3D models are saved in the PDB file and can be sent by email upon user request.
New functions of the RNAComposer system have been added in the interactive and batch modes.Their implementation required additions to both, the computational engine and the interface.
RNA secondary structure prediction Initially, three tools for prediction of the RNA secondary structure have been incorporated into the RNA-Composer system and could be applied in the interactive mode if only the RNA sequence was provided by the user.This initial set consisted of RNAstructure (Reuter & Mathews, 2010), RNAfold (Lorenz et al., 2011) and CONTRAfold (Do et al., 2006).Now, the suite of supported tools to predict the secondary structure of RNA has been extended and it is available in both modes.Three methods have been added to the initial set: Context Fold (Zakov et al., 2011), CentroidFold (Sato et al., 2009) and IPknot (Sato et al., 2011).They have been selected based on two main criteria.Firstly, according to CompaRNA (Puton et al., 2013), all of these methods achieved high ranking when processing different benchmark sets.Secondly, they allow a straightforward application for a single input sequence and do not require any additional information -like the RNA family or sequence alignment -which could influence the prediction results.Additionally, IPknot has been selected due to its usefulness and effectiveness in pseudoknot prediction, which aligns well with a growing interest in modelling and analysis of pseudoknotted RNAs (Purzycka et al., 2013;Antczak et al., 2015).In order to apply one of the above mentioned tools within computational routine of RNAComposer, one should either click Select secondary structure prediction method and select the preferred method name from an alphabetically ordered list, or directly enter the name in the 3 rd line of input data.If the secondary structure is undefined, the 3 rd line remains empty and the method is not selected from the list, CentroidFold is applied by default.

3D structure element insertion
Algorithms governing the RNAComposer automated search and selection of 3D structure elements to be used for the RNA 3D structure composition are based on strictly defined criteria (Popenda et al., 2012).The automated selection process is executed in the following order of priority: secondary structure topology similarity, sequence similarity, pyrimidines/purines compatibility, source structure resolution and energy.New functionality of RNAComposer provides the user with the possibility to interfere in selection of the 3D structure elements.In particular, this option, available in the batch mode only, allows the user to replace 3D structure elements selected by the automated routine of RNAComposer.When RNAComposer finishes computation for user input data, it provides the final 3D model and a log file with the results of intermediate computational steps.Inter alia, the log file enumerates: (i) secondary structure elements resulting from input data fragmentation (identifier, sequence and dot-bracket-encoded secondary structure topology are given for every element), and (ii) 3D structure elements associated with their secondary structure counterparts.Every 3D element, described by sequence, secondary structure topology, experimental method resolution and sequence identity, is defined in the PDB format.If necessary, the users can decide to modify selected parts of the obtained 3D model by providing their own 3D structure elements and indicating which elements of originally predicted 3D structure should be substituted.
To achieve this, one should run RNAComposer in the batch mode for the same input sequence and secondary structure topology.Next, after clicking Insert own 3D structure elements button, the user introduces own 3D structure element(s) in PDB format preceded by the identifier(s) of associated secondary structure element(s).More than one structure element can be substituted at a time.When several secondary structures are provided, the user should also determine which input sequence and secondary structure topology the substitution refers to.Once all the substitutions are defined, the Compose button should be clicked to run the 3D structure prediction.
Filtering of the 3D structure elements repository Four novel routines have been implemented in the RNAComposer system allowing for conditional search of 3D structural elements that build the final 3D model(s).All of them are supported in the batch mode only, after clicking Filter 3D structure elements repository.The Exclude PDB structures button allows to provide a list of PDB ids of structures that should not be considered when the 3D structure elements are searched.This filter is especially valuable for users who want to test the RNAComposer capability to predict structures that are already present in the PDB database (Rose et al., 2011).When Use X-ray determined structures only button is clicked, RNAComposer searches for 3D structure elements derived from the PDB structures that were resolved via X-ray crystallography.Additionally, in the Set resolution threshold edit box, one can define the maximum resolution of structure elements to be considered.Finally, selecting one of the buttons Generate A-RNA-based single strands or Generate A-RNA-based double helices results in running the procedure that generates all single/double stranded fragments occurring in the predicted structure, without support from the 3D structure elements repository.The respective elements are generated by NAB (Case et al., 2015) based on the input sequence and the template structure of A-RNA (Popenda et al., 2012).

Example application: predicting the 3D structure of mir160 precursors
Here, we present how the above described functions of RNAComposer can be applied if the user wants to influence the modelling process via modifications introduced at the level of the secondary and tertiary structure processing.As an example miRNA precursor of Arabidopsis thaliana has been selected.We show how application of particular functions can change the predicted 3D structure of this RNA.

Sequence data selection
The Arabidopsis thaliana sequences for miRNA precursors used in this analysis were retrieved from miR-Base v.21 (Kozomara & Griffiths-Jones, 2014).At first, a set of 50 Arabidopsis thaliana miRNA precursors was selected based on the work by Bologna et.al (Bologna et al., 2013).Their secondary structures were predicted by RNAstructure and the 3D structures were modelled using the RNAComposer with default parameters.Some of the resulting 3D models displayed strong bends in the miRNA/miRNA* region or in its vicinity (Zok et al., 2014).The strongest bend was identified for precursors of miR160 family members.The precursor sequences from this family were chosen to exemplify how one can introduce modifications into preliminary predicted models and change the axis shape.

RNA secondary structure prediction and analysis
According to miRBase, the miR160 family is composed of three miRNAs in Arabidopsis thaliana (miR160ac).For each of their precursor sequences we have predicted secondary structures with all six programs incorporated in the RNAComposer system.The obtained secondary structures were compared using RNAforester (Höchsmann, 2005) with the multi-alignment option (Table 1).
All programs predicted hairpin structures for the studied precursors of miR160 family.Within a given precursor sequence, the predicted secondary structures differed by 1-2 base pairs.A comparison of the sequences and the secondary structures between all precursors shows that while the sequences of miRNA in the 5' arm are the same for all analyzed miR160 family members, the miRNA* sequences in the 3' arm differ in length by one nucleotide.Variation is also seen in the miRNA/ miRNA* region of the predicted secondary structures for analyzed precursors.The largest differences, both in the sequence and the secondary structure, were found in the apical loop region.
A comparison of predicted secondary structures shows that CentroidFold prediction was more often confirmed by other programs.In the case of miR160a precursor, Context Fold, Contrafold and IPknot predicted the same structure as the CentroidFold.For the miR160b and miR160c precursors the secondary structures predicted by CentroidFold were the same as those obtained from IPknot and RNAstructure.

Automated 3D structure prediction of miR160 precursors
Initially, for each predicted secondary structure of miR160 precursor (Table 1), we obtained the 3D structure using the RNAComposer with default parameters.Three 3D structures were generated for miR160a, fourfor miR160b and miR160c precursors.For each miR160 precursor, the obtained 3D models were superimposed using PyMOL (Schrödinger LLC, 2016) (Fig. 1).The largest differences between the models were observed for miR160b precursors, some differences for miR160a, and the smallest differences for miR160c (average RMSD score equals 8.7Å, 4.4Å and 2.6Å, respectively).For all miR160b and miR160c precursor 3D models, a considerable axis bending was observed.
Additionally, for each precursor, the 3D models generated based on CentroidFold secondary structures were superimposed with respect to miRNA (Fig. S1A at www. actabp.pl).In the region of the miRNA/miRNA* duplex and its vicinity, the 3D models display large structural variability.

3D structure prediction of miR160 precursors with new functionality
In the Aa-RNase-dsRNA complex that serves us as a model structure for accommodation of miRNA precursor into the catalytic valley of RNase III domains in DCL1, the dsRNA occurs in an A-like form.This made us assume that the bend observed in the automatically predicted 3D models of miR160 precursors Table 1.Secondary structures predicted from the sequences of miR160a, miR160b, and miR160c precursors, and the results of their sequence and secondary structure topology identity analysis (in the two bottom rows).For every precursor, its secondary structures were predicted by all methods included in the RNAComposer system.miRNAs are shown in red, miRNAs* in blue.New functionality of RNAComposer should not occur in the vicinity of the cleavage sites in the catalytic complex of RNase III with RNA.In order to minimize the bends and to obtain a better structural homogeneity in the miRNA/miRNA* region of the generated 3D models, we applied the new RNAComposer functionality concerning 3D structure elements insertion and restriction of element search.The structural elements to be replaced involved internal loops and bulges.To search for appropriate 3D structure elements that comprise the internal loop or bulge equipped with closing canonical base pairs, we run an RNA FRABASE search procedure (Popenda et al., 2008).The main criterion applied for selection was the best resolution of the X-ray RNA structure from which the respective element was extracted.For bulges and mismatches, additional criterion applied was sequence identity of unpaired nucleotides.In the case of tandem mismatches, the criterion was a purine/pyrimidine identity (Table S1 at www.actabp.pl).Apical loops were automatically selected by the system with the option Use X-ray determined structures only and the resolution threshold set to 3Å.
Since the local axis bends were observed also in the helical regions of 3D miRNA precursor models, the option Generate A-RNA-based double helices was turned on.
The 3D models of miR160a, miR160b and miR160c precursors were generated based on secondary structures obtained with CentroidFold.Figure 2 presents the input secondary structures and the 3D models predicted in the interactive mode and in the batch mode with new functionality.For each 3D model, the stem axis is visualized.Stem axes were calculated using Curves (Lavery et al., 2009) based on the helical regions extracted by X3D-NA program (Lu & Olson, 2003).The stem axis coordinates were used to compute the local axis bend angles between two neighboring base pairs using own script.The calculated bend angles were visualized on the stem axes in the rainbow scale.Figure 2 clearly shows that the application of new RNAComposer functionality led to obtaining the 3D miRNA precursor models displaying much smaller stem bends.The bend angles generated this way appeared smaller than 10º, whereas for the 3D models obtained using the automated routine with default parameters they were reaching up to 40º.The largest bends were found for miR160b and miR160c models at the position of L1 and L3 internal loops.To prove the proposed method fidelity and to validate the predicted 3D models, we have compared them with the reference structure.The reference RNA was derived from the 3D structure of the complex of Aa-RNase III -RNA containing RNase III homodimer and two 28-nt stem-loop RNAs (PDB ID: 2NUF).One of the apical loops was removed and the strands were connected via XPLOR-driven minimization to create continuous hairpin stem with a proper stereochemistry (Fig. 3).The PDB files for both, the reference structure and the pre-dicted 3D models, are available at http://www.cs.put.poznan.pl/mantczak/abp-antczak-suppl.zip.
A comparison of the predicted miR160 precursor models to the reference structure has been based on local bend angles of stem axis and 3D alignments with overlapping cleavage sites.The local bend angles of the reference stem axis are very small, with a maximum value not exceeding 4.5º.In our models, obtained with the new RNAComposer functionality, the maximum values of local bend angles in the region of expected interaction with the RNase III domains are 5.5º, 10.6º and 7.4º for miR160a, miR160b and miR160c, respectively.Next, we performed a 3D alignment of miRNA precursor models with the reference structure using the ARTS program (Dror et al., 2006).For each structure, we have considered the best 3D alignments ranked according to the ARTS score.All of these alignments were scrutinized according to the cleavage sites' superposition.In the case of miR160a, we obtained RMSD of 1.61Å for a 48 residue core with the score of 90.0.For miR160b, the RMSD was 1.48 Å for a 30 residue core with the score of 52.0.In the case of miR160c, the RMSD was 1.59Å for a 35 residue core with the score of 65.0 (Fig. 4).Finally, we aligned the new models for miR160a, miR160b and miR160c precursors.This superposition shows structure similarity in the region of the miRNA/miR-NA* duplex and its vicinity for all models (Fig. S1B at www.actabp.pl).

CONCLUDING REMARKS
We have presented the new functionality of RNA-Composer which allows the user to significantly influence the composition of the RNA 3D model for its better adjustment to the experimental data.An impact on the 3D model composition concerns two levels of the process, the RNA secondary and tertiary level.While considering the first one, the user can run secondary structure prediction by six methods incorporated into the RNAComposer system and then easily compare the secondary structure models.This function allows studying of variability in the secondary structure for a given RNA.Further, differences in the predicted secondary structures result in different fragmentation patterns into secondary structure elements and, thus, change the 3D element searching criteria.This, in consequence, increases the conformational diversity of the generated 3D models.At the tertiary level of model composition, in order to shape the final model one can provide its own 3D structural elements which build the 3D structure.These structural elements can be obtained by searching the associated structural data repository with new criteria for automated selection of a 3D structure element.Specific 3D structure elements can be also derived from a molecular dynamics simulation.These new functions of RNAComposer facilitate an exploration of RNA conformational space depending on the 3D structure elements included in the final 3D model, and allow the user to obtain a better fitting of the predicted model to experi-  mental data, e.g.obtained from SAXS (Jones et al., 2014;Cornilescu et al., 2015).By applying new functions of RNAComposer, we have shown how the 3D structure of miR160 precursors could be adjusted to the shape of the catalytic valley of the RNase III dimer during the prediction process.This approach can be followed in the case of any other RNA structure which needs post-prediction fitting to user defined parameters.
Future work will be focused on further analysis of the predicted 3D structure dependence on the input data.Moreover, based on current observations, we are going to develop new criteria for structure element selection in a fully automated mode of RNAComposer.This should result in formulating a novel and advanced function for ranking the 3D structure elements.
Figure 1.Superimposed 3D models of miR160a (A), miR160b (B) and miR160c (C) precursors predicted by RNAComposer in the interactive, fully automated mode.For every precursor, its secondary structures were predicted by all methods included in the RNAComposer system.miRNAs are shown in red, miRNAs* in blue.

Figure 2 .
Figure 2. RNAComposer-predicted 3D models of miR160a (A), miR160b (B) and miR160c (C) precursors obtained in the interactive mode (left) and batch mode with new functionality (right).Stem axis is displayed for each model and its local bend angles are visualized in the rainbow scale.For every miRNA precursor, its secondary structure predicted by CentroidFold is presented with displayed loop identifiers.The miRNA and miRNA* are marked in red and blue, respectively.

Figure 3 .
Figure 3. Reference structure for miRNA precursors (green) obtained based on the crystal structure of RNA complexed with Aa-RNase III (PDB ID: 2NUF).The cleavage sites are indicated by red spheres.Stem axis is displayed and its local bend angles are visualized in the rainbow scale.The reference RNA structure is located in the cleavage groove formed by dimerization of two Aa-RNase III domains (shown as cyan and blue molecular surfaces), double-stranded RNA binding domains (dsRBD) are shown in gray in the cartoon representation.

Figure 4 .
Figure 4. Pairwise superposition of miR160a (A), miR160b (B) and miR160c (C) precursor (white and cyan-core), and the reference RNA structure (black and magenta-core).The cleavage sites are indicated by cyan spheres for miRNA precursor, magentafor the reference RNA structure.