In silico analysis of different signal peptides to discover a panel of appropriate signal peptides for secretory production of Interferon-beta 1b in Escherichia coli

Signal peptides (SPs) are one of the most important factors for efficient secretion of recombinant heterologous proteins in Escherichia coli (E. coli). The objective of this study was to identify, among 90 biologically active SPs, a panel of signal peptides suitable for secretory production of recombinant interferon-beta 1b (IFN-beta 1b) into the periplasmic space of the E. coli host. In the initial step, the locations of the signal peptide cleavage sites and the discrimination scores (D-scores) were predicted using the SignalP 4.1 server, and 31 SPs were eliminated from further analysis because their D-scores were below 0.5 or their cleavage sites were inappropriately located. The remaining 59 SPs could therefore, in theory, be used to secrete IFN-beta 1b into the periplasmic space of E. coli. For these 59 SPs, the physico-chemical and solubility properties, which are necessary parameters for selecting appropriate SPs, were predicted using the ProtParam and SOLpro servers, and the final subcellular localization of IFN-beta 1b fused with each SP was predicted using the ProtComp B server. According to the ranking of the 59 SPs, flagellar P-ring protein (flgI), glucan 1,3-beta-glucosidase I/II (EXG1) and outer membrane protein C (OmpC) were, theoretically, the most potent and desirable SPs for secretion of recombinant IFN-beta 1b into the periplasmic space of E. coli. These predictions remain to be validated experimentally in future studies.


INTRODUCTION
Cytokines are a group of proteins that mediate communication between cells and help the protective mechanisms of the immune system destroy pathogens (Cohen & Parkin, 2001). Among cytokines, interferons (IFNs) are secreted proteins that induce an antiviral state in their target cells. IFNs are a first line of defense against viral infections and are universally accepted as therapeutic agents (Sen & Lengyel, 1992). There are two types of recombinant IFN-beta. The first, IFN-beta 1a, is a glycosylated recombinant pharmaceutical protein produced in mammalian cells such as the CHO cell line (Sørensen, 2010; Morowvat et al., 2014). The second, IFN-beta 1b, is not glycosylated and is produced in recombinant Escherichia coli cells (Runkel et al., 1998; Morowvat et al., 2014). IFN-beta 1b is a synthetic analogue (165 amino acids, 18,500 Da) of human IFN-beta in which cysteine 17 is substituted with serine. IFN-beta 1b was the first disease-modifying drug approved by the FDA, in 1993, for the treatment of relapsing-remitting multiple sclerosis (MS) (Marziniak & Meuth, 2014; Gasteiger et al., 2005). MS is a chronic inflammatory disorder of the central nervous system that mainly affects young adults (Kolb-Maurer et al., 2015).
The vast majority of recombinant proteins have been produced in Gram-negative bacteria. E. coli, a Gram-negative bacterium, offers numerous advantages for heterologous recombinant protein production: rapid growth to high density on inexpensive substrates, thorough genetic and physiological characterization, a large number of compatible biotechnology tools (especially cloning vectors and host strains), and simple process scale-up (Baneyx, 1999; Babaeipour et al., 2013). However, heterologous proteins that cannot rapidly fold into their native structures are degraded or accumulate as inactive, insoluble aggregates known as inclusion bodies (Baneyx & Mujacic, 2004; Choi & Lee, 2004; Ventura & Villaverde, 2006). One approach to solving these problems is to transfer the heterologous protein into the periplasmic space of the bacterial host by fusing a suitable signal peptide to its N-terminus (Choi & Lee, 2004). The periplasm offers several advantages for protein folding, storage, purification and N-terminal processing. It also improves the stability of the expressed heterologous protein by supporting proper folding and reducing proteolytic degradation, and it provides an oxidizing environment that promotes correct folding of the produced proteins (Mergulhão et al., 2005; Morowvat et al., 2014). Some reports have described toxic effects of human IFN-beta expressed in E. coli (Gross et al., 1985; Morowvat et al., 2014). To overcome this problem, periplasmic production of the recombinant protein has been considered the most important strategy, because the protein can accumulate in this compartment without exerting lethal effects on the host cells (Morowvat et al., 2014).
The general secretory pathway, found in both eukaryotic and prokaryotic cells, is one of the mechanisms of protein secretion. Entry into this pathway is controlled by the signal peptide, an N-terminal peptide that is normally 15 to 40 amino acids long and is cleaved from the mature protein during translocation across the membrane (Emanuelsson et al., 2007; Zhang et al., 2013). The most characteristic feature of signal peptides is a segment of 7 to 15 hydrophobic amino acids called the h-region (Haeuptle et al., 1989). The region between the initial methionine and the h-region, called the n-region, typically contains one to five amino acids and normally carries a positive charge. The c-region lies between the h-region and the cleavage site and consists of three to seven polar, but mostly uncharged, amino acids (Nielsen & Krogh, 1998). To select an optimal signal sequence that is compatible with a given recombinant secretory protein, it is therefore necessary to analyze the features of candidate signal peptides (Gupta & Shukla, 2016; Forouharmehr et al., 2018).
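The n/h/c organization described above can be illustrated with a crude heuristic. The sketch below is not how SignalP delimits regions; it assumes that the input is the signal peptide only (ending at the cleavage site), treats the longest run of hydrophobic residues after the initial methionine as the h-region, and uses a hypothetical hydrophobic residue set that tolerates glycine. The helper names `split_regions` and `net_charge` are illustrative only.

```python
# Hypothetical hydrophobic residue set; glycine is tolerated inside the h-region
# (an assumption made for this sketch, not a published definition).
HYDROPHOBIC = frozenset("ACFGILMVW")


def split_regions(signal_peptide: str) -> dict[str, str]:
    """Heuristically split a signal peptide (ending at the cleavage site) into n/h/c regions.

    The h-region is taken as the longest run of HYDROPHOBIC residues after the
    first residue (assumed to be the initial methionine); residues before it form
    the n-region and residues after it, up to the cleavage site, form the c-region.
    """
    sp = signal_peptide.upper()
    best_start, best_len = 1, 0
    run_start = None
    # Scan one position past the end so that a run reaching the end is closed.
    for i in range(1, len(sp) + 1):
        in_run = i < len(sp) and sp[i] in HYDROPHOBIC
        if in_run and run_start is None:
            run_start = i
        elif not in_run and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return {
        "n": sp[1:best_start],  # excludes the initial methionine
        "h": sp[best_start:best_start + best_len],
        "c": sp[best_start + best_len:],
    }


def net_charge(segment: str) -> int:
    """Basic (K, R) minus acidic (D, E) residues; histidine is ignored."""
    return sum(segment.count(aa) for aa in "KR") - sum(segment.count(aa) for aa in "DE")


regions = split_regions("MKKTAIAIAVALAGFATVAQA")
print(regions)                   # {'n': 'KKT', 'h': 'AIAIAVALAGFA', 'c': 'TVAQA'}
print(net_charge(regions["n"]))  # 2
```

The positively charged n-region, long hydrophobic core and short c-region in the example match the pattern described in the text, but region boundaries used in an actual analysis should come from the prediction tool rather than from this heuristic.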
In Gram-negative bacteria, proteins can be transported into the periplasmic space by either the Sec-dependent pathway or the twin-arginine translocation (Tat) pathway. In the Sec-dependent pathway, preproteins carrying a signal peptide are kept in an unfolded state and directed through a protein-conducting channel, whereas in the Tat pathway, folded preproteins are translocated across the inner membrane (Bagos et al., 2010; Yoon et al., 2010).
A number of signal sequences have been used and analyzed for the efficient secretory production of heterologous recombinant proteins in E. coli, transporting them into the periplasmic or extracellular space. For example, OmpA and PelB (Ramanan et al., 2010) and PelB (Morowvat et al., 2014) were used for secretory production of IFN-alpha 2b and IFN-beta 1b in E. coli, respectively.
With progress in genetic engineering and computer technology, reliable and fast computational tools for predicting signal peptides are needed to improve production levels and minimize production costs. In this study, important features of the native IFN-beta 1b signal peptide and 89 other signal peptides were studied using bioinformatics tools, in order to theoretically identify efficient signal peptides for secretion of IFN-beta 1b in the E. coli host.

MATERIALS AND METHODS
Signal sequence collection and study design. The amino acid sequences of 90 signal peptides that have commonly been used for secretory protein production in previous studies were retrieved from the Universal Protein Resource (UniProt) database at www.uniprot.org. The signal sequences are listed in Table 1. In the next phase, in silico methods were used to analyze and characterize the collected signal peptide sequences. Finally, after trimming, predicting the subcellular localization and excluding inappropriate signal peptides, the selected signal peptides were compared to identify those most likely to achieve a high level of secretory expression of IFN-beta 1b in the E. coli host.
Prediction of signal peptide cleavage sites. Among the data-mining models and bioinformatics tools proposed for identifying signal peptide sequences and their precise cleavage sites, SignalP has been reported as one of the most accurate and reliable, providing high-throughput processing of protein sequences with an accuracy of 87% (Dyrløv Bendtsen et al., 2004). SignalP 4.1, available at http://www.cbs.dtu.dk/services/SignalP, is based on artificial neural networks (ANNs) (Petersen et al., 2011).
Physico-chemical parameters of signal peptides. The ProtParam online server at http://web.expasy.org/protparam/ (Gasteiger et al., 2005) was used to evaluate the physico-chemical properties of the chosen signal peptides, including molecular weight, instability index, amino acid composition, theoretical pI, aliphatic index and grand average of hydropathicity (GRAVY).
In silico analysis for protein solubility prediction. The solubility of a recombinant protein expressed in E. coli often determines its production yield. The SOLpro server at http://scratch.proteomics.ics.uci.edu/, which has a reported accuracy above 74%, was used to predict the solubility of the proteins in E. coli. This tool uses a two-stage support vector machine (SVM) architecture based on multiple representations of the primary sequence (Magnan et al., 2009).
Prediction of protein localization site. Predicting the final destination of secreted proteins from the primary sequence is a major component of automated protein annotation and is critical to a wide range of studies. The ProtComp B server (http://www.softberry.com) was used to predict the final destination of IFN-beta 1b fused with the different signal peptides. Softberry reports 86% correct prediction of extracellular proteins, as tested with approximately 200 extracellular proteins (Klee & Ellis, 2005).

RESULTS
In silico prediction of signal peptide cleavage sites
In this study, SignalP 4.1 was used to identify which signal peptides, when fused to the N-terminus of IFN-beta 1b, are most likely to direct the protein into the periplasmic space of E. coli. Suitable signal peptides were selected on the basis of their D-score (discrimination score). SignalP 4.1 reports five scores. The C-score and S-score identify the cleavage site and the signal peptide region, respectively. The Y-score is derived from the C- and S-scores and predicts the cleavage site more precisely than the raw C-score. The S-mean is the average S-score over the predicted signal peptide. The D-score, a weighted average of the S-mean and Y-max, discriminates between secretory and non-secretory proteins; sequences with a D-score > 0.5 have a high probability of being signal peptides. SignalP 4.1 allows the user to adjust the cut-off values to increase the sensitivity of the program; in this research, the default D-score cut-off of 0.5 was used. The SignalP results, including the three regions of each signal peptide (n, h and c), cleavage probabilities, cleavage sites, C-, Y- and S-scores, S-mean and D-score, are presented in Table 2. The D-scores of 23 signal peptides were below 0.5, and the cleavage sites of 8 signal peptides, including DsbA and AZU1, were inappropriately located; hence, these 31 signal peptides were eliminated from further analysis. The remaining 59 signal peptides were carried forward for further analysis.
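The screening rule above can be sketched as simple arithmetic. SignalP 4.1 computes its scores internally with its own neural networks and tuned D-score weights, which are not reproduced here; the helpers `s_mean`, `d_score` and `keep_signal_peptide` are hypothetical and only mirror the description in the text (S-mean over the predicted signal peptide, D-score combining S-mean and Y-max, and a 0.5 cut-off plus a cleavage-site check). The `weight=0.5` default gives a plain average and is an assumption.

```python
from statistics import mean


def s_mean(s_scores: list[float], sp_length: int) -> float:
    """Average S-score over the predicted signal peptide (positions 1..sp_length)."""
    return mean(s_scores[:sp_length])


def d_score(s_mean_value: float, y_max: float, weight: float = 0.5) -> float:
    """Weighted average of S-mean and Y-max.

    weight=0.5 is a plain average (assumption); SignalP 4.1 uses its own tuned weights.
    """
    return weight * s_mean_value + (1.0 - weight) * y_max


def keep_signal_peptide(d: float, cleavage_site_ok: bool, cutoff: float = 0.5) -> bool:
    """Screening rule: D-score above the cut-off AND an appropriately located cleavage site."""
    return d > cutoff and cleavage_site_ok


# Hypothetical screening records: (name, S-mean, Y-max, cleavage site acceptable?)
records = [
    ("SP_A", 0.82, 0.71, True),
    ("SP_B", 0.40, 0.35, True),   # fails the D-score cut-off
    ("SP_C", 0.90, 0.80, False),  # passes the D-score cut-off but the cleavage site is misplaced
]
kept = [name for name, sm, ym, ok in records if keep_signal_peptide(d_score(sm, ym), ok)]
print(kept)  # ['SP_A']
```

The two exclusion reasons in the text map onto the two conditions of `keep_signal_peptide`: a low D-score (23 signal peptides) and a problematic cleavage site despite a passing D-score (8 signal peptides).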

Physico-chemical parameters of signal peptides
The 59 remaining signal peptides were analyzed for their physico-chemical parameters using the ProtParam server (Table 3). Their lengths ranged from 18 to 36 amino acids. The parameters computed by ProtParam included the molecular weight (daltons), theoretical pI, aliphatic index, instability index and grand average of hydropathicity (GRAVY). In ProtParam, the molecular weight of a protein is calculated by adding the average isotopic masses of the amino acid residues in the sequence and the average isotopic mass of one water molecule.
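The molecular-weight rule described above (sum of average residue masses plus one water molecule) can be sketched as follows. The residue masses are standard average residue masses in daltons (free amino acid minus one water); ProtParam's own table may differ slightly in the last decimal, and `molecular_weight` is an illustrative helper, not ProtParam's code.

```python
# Standard average residue masses in daltons (free amino acid minus one water).
# Assumed to agree with ProtParam's table to the precision shown.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVERAGE_MASS = 18.01524


def molecular_weight(sequence: str) -> float:
    """Sum of average residue masses plus one water (restoring the free termini)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_AVERAGE_MASS


print(round(molecular_weight("GA"), 2))  # 146.15
```

Adding a single water accounts for the fact that a chain of n residues is formed by n - 1 condensation reactions, so only one water's worth of mass remains beyond the residue masses.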
The grand average of hydropathy (GRAVY) of a peptide or protein is calculated as the sum of the hydropathy values of all amino acids divided by the number of residues in the sequence. The GRAVY index was used to compare the overall hydropathy of the signal peptides. The in silico analysis showed that the highest GRAVY scores belonged to flgI, EXG1, IFNA1, OmpC and MICA (1.816, 1.726, 1.604, 1.552 and 1.53, respectively).
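The GRAVY definition above can be written in a few lines. The sketch assumes the Kyte-Doolittle hydropathy scale, which is the scale ProtParam uses for GRAVY; `gravy` is an illustrative helper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(round(gravy("AIL"), 3))  # 3.367
```

Positive values indicate an overall hydrophobic sequence, which is why the highest-ranking signal peptides in the text have GRAVY scores well above zero.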
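The aliphatic index is one of the major indicators of the hydrophobicity of peptides and proteins. It is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine and leucine) in an amino acid sequence. The in silico analysis demonstrated that the highest aliphatic index values belonged to flgI, EXG1, OmpC, xylF and DEFB103A (180, 179.47, 171.9, 161.3 and 155, respectively).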
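The aliphatic index can be computed from the mole percents of the four aliphatic residues. The sketch assumes the formula documented for ProtParam, aliphatic index = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X(...) are mole percents and a = 2.9 and b = 3.9 are the side-chain volumes of valine and of isoleucine/leucine relative to alanine. The helper names are illustrative.

```python
def mole_percent(seq: str, residue: str) -> float:
    """Mole percent (100 x mole fraction) of one residue type in seq."""
    return 100.0 * seq.count(residue) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), with X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (mole_percent(seq, "A")
            + 2.9 * mole_percent(seq, "V")
            + 3.9 * (mole_percent(seq, "I") + mole_percent(seq, "L")))


print(round(aliphatic_index("AVIL"), 2))  # 292.5
```

Because leucine and isoleucine carry the largest weight, signal peptides rich in these residues reach aliphatic index values such as those reported above.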
The instability index estimates the stability of a protein in vitro. A protein with an instability index below 40 is predicted to be stable, whereas a value above 40 indicates that the protein may be unstable. The instability of the signal peptides, alone and fused with IFN-beta 1b, was evaluated using this index. Spy, PhoE, MalE, DsbC and gltF were the most stable of the 59 remaining signal peptides (-4.79, 1.44, 2.85, 5.25 and 8.52, respectively).
All of the IFN-beta 1b fusions with these signal peptides were predicted to be unstable (Table 3).

Protein solubility prediction
The solubility of IFN-beta 1b fused with each of the signal peptides was evaluated. IFN-beta 1b was predicted to be insoluble with all of the signal peptides (Table 3).

Prediction of the protein localization
The predicted localization sites of IFN-beta 1b fused with the different signal peptides are shown in Table 4. For most of the signal peptides, the final localization of the IFN-beta fusion was predicted to be the outer membrane. With 9 signal peptides, including flgI and glnH, the final subcellular localization of IFN-beta 1b was predicted to be the periplasmic space. Only with the Spa signal peptide was the final localization predicted to be extracellular.

DISCUSSION
Computational methods are used in a wide variety of biological fields to decrease the costs and increase the accuracy of experimental research (Zamani et al., 2015). Aggregation and misfolding can occur as a result of high-level intracellular expression of heterologous proteins (Baradaran et al., 2013). Fusing a signal peptide to the N-terminus of these proteins, by inserting its coding sequence at the 5' end of the gene, directs their secretion into the E. coli periplasmic space and can overcome this problem (Zamani et al., 2015). For successful secretion, several factors must be carefully balanced along the secretory pathway, and one of the most important factors for production of recombinant heterologous proteins in prokaryotic systems is the signal peptide. The physicochemical and structural features of a signal peptide are therefore important determinants of its secretion function. For this purpose, various computational tools have been applied to predict and characterize signal peptides, computing features such as the number of amino acids, molecular weight, isoelectric point, GRAVY, aliphatic index and instability index (Baradaran et al., 2013).
In this study, the native IFN-beta 1b signal peptide and 89 other signal peptides were evaluated with different bioinformatics tools. The rationale for testing eukaryotic signal peptides for prokaryotic expression is that the Sec-dependent translocation machinery is homologous in prokaryotes and eukaryotes, and this homology ensures cross-kingdom compatibility of Sec-type signal peptides. In addition, some eukaryotic sequences may be suitable for the Tat pathway purely by chance, through the convergent presence of a twin-arginine motif. Notably, EXG1, which belongs to Saccharomyces cerevisiae, ranked second among the SPs for fusion with IFN-beta 1b and is, theoretically, one of the most suitable signal peptides for translocating IFN-beta 1b into the periplasmic space of E. coli. The signal peptides of IFN-beta, human growth hormone and interleukin-2 were included because they have been shown to contribute to secretion of recombinant proteins into the periplasmic space in other studies (Dalton & Barton, 2014; Zamani et al., 2015).
The SignalP 4.1 analysis discriminated between secreted and non-secreted sequences on the basis of each signal peptide sequence fused to IFN-beta 1b, and identified the main regions of each signal peptide. In silico evaluation indicated that, for 23 signal peptides (D-score < 0.5), signal peptidase was not predicted to recognize a cleavage site when the peptide was fused to IFN-beta 1b in the E. coli host; these signal peptides were therefore not considered appropriate for IFN-beta 1b secretion. The other 67 signal peptides could, in principle, be used for expression of the protein in E. coli. However, although the D-scores of 8 of these signal peptides, including DsbA and AZU1, were higher than 0.5, they had multiple predicted cleavage sites for signal peptidases or inappropriately located cleavage sites. Fusing these signal peptides to IFN-beta 1b could therefore lead to incorrect processing, leaving extra residues at, or removing residues from, the N-terminus of the mature protein, which might alter its three-dimensional structure and decrease its functionality, potency and efficiency. Consequently, these 8 signal peptides were also eliminated from further analysis.
It has been suggested that increasing the hydrophobicity and length of the h-region can improve the rate of protein secretion (Chen et al., 1996). The hydrophobicity of the signal peptides is reflected by the aliphatic index and GRAVY (Table 3). Efficient cleavage of the signal peptide in the c-region also considerably influences protein secretion levels. A well-known rule for the c-region is the (-3, -1) or AXA rule: the amino acids at positions -1 and -3 relative to the cleavage site must be small and neutral, such as alanine, glycine and serine (Choi & Lee, 2004). In contrast, large, bulky residues that would not fit at either the -3 or -1 position are found at position -2 (Pratap & Dikshit, 1998). Most of the signal peptides in this study have an AXA motif at their cleavage sites (Table 2). In this study, hydrophobicity (aliphatic index, GRAVY and h-region length) was considered the most important parameter for selecting the most appropriate sequences. The data were therefore sorted by aliphatic index, then GRAVY, h-region length and D-score, in that order of priority. The results are presented in Table 5.
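The (-3, -1) rule can be checked mechanically once a cleavage site has been predicted. The sketch below is a hypothetical helper, `has_axa_motif`, that assumes the cleavage site is given as the 1-based position X of the last signal-peptide residue (as when a tool reports cleavage between positions X and X+1). The accepted residue set is limited to the alanine, glycine and serine named in the text; other analyses may accept a wider set.

```python
# Small, neutral residues named in the text for positions -3 and -1 (assumed set).
SMALL_NEUTRAL = frozenset("AGS")


def has_axa_motif(precursor: str, cleavage_after: int) -> bool:
    """True if residues -3 and -1 relative to the cleavage site are small and neutral.

    cleavage_after is the 1-based position of the last signal-peptide residue,
    so position -1 is precursor[cleavage_after - 1] and -3 is precursor[cleavage_after - 3].
    """
    if cleavage_after < 3 or cleavage_after > len(precursor):
        return False
    seq = precursor.upper()
    return seq[cleavage_after - 1] in SMALL_NEUTRAL and seq[cleavage_after - 3] in SMALL_NEUTRAL


# Hypothetical precursor: signal peptide ...VAQA followed by a mature N-terminus APDN.
precursor = "MKKTAIAIAVALAGFATVAQAAPDN"
print(has_axa_motif(precursor, 21))  # True  (A at -3, Q at -2, A at -1)
print(has_axa_motif(precursor, 20))  # False (Q at -1)
```

The residue at position -2 is deliberately not checked, reflecting the observation that this position tolerates larger residues.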
According to this sorting of the main parameters, flgI, EXG1, OmpC, xylF and DEFB103A were ranked as the most effective signal peptides, in that order. In contrast, yadV, bcsB, Endo-1,4-beta-xylanase, CEACAM5 and DsbC showed weak levels of the features needed for the secretion process. The suitability of most of the other signal peptides was confirmed.
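The hierarchical sorting used to rank the signal peptides (aliphatic index first, then GRAVY, h-region length and D-score) can be sketched with a tuple sort key, which compares later criteria only when earlier ones tie. The sketch assumes that higher is better for all four criteria, and the records below are hypothetical values, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPeptideFeatures:
    name: str
    aliphatic_index: float
    gravy: float
    h_region_length: int
    d_score: float


def rank_signal_peptides(features: list[SignalPeptideFeatures]) -> list[SignalPeptideFeatures]:
    """Sort by aliphatic index, then GRAVY, h-region length and D-score (all descending)."""
    return sorted(
        features,
        key=lambda f: (f.aliphatic_index, f.gravy, f.h_region_length, f.d_score),
        reverse=True,
    )


# Hypothetical candidates; SP_A and SP_C tie on aliphatic index, so GRAVY decides.
candidates = [
    SignalPeptideFeatures("SP_A", 150.0, 1.2, 12, 0.70),
    SignalPeptideFeatures("SP_B", 170.0, 0.9, 10, 0.65),
    SignalPeptideFeatures("SP_C", 150.0, 1.5, 11, 0.60),
]
print([f.name for f in rank_signal_peptides(candidates)])  # ['SP_B', 'SP_C', 'SP_A']
```

A lexicographic key like this makes the priority order explicit; a weighted composite score would be an alternative design, but it would no longer match the strictly prioritized sorting described in the text.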
Signal peptides such as PhoA, OmpA and PelB have been used by researchers for secretory production of recombinant proteins in E. coli (Choi & Lee, 2004). Previous studies reported high yields of IFN-beta 1b secretion with the PelB (Morowvat et al., 2014) and native IFN-beta 1b (Krishna Rao et al., 2009) signal peptides. Moreover, fusing the PelB signal peptide to IFN-beta 1b resulted in expression of the protein entirely in the periplasmic space (Mobasher et al., 2016).
ProtComp, from Softberry, Inc., is an online bioinformatics tool that predicts protein localization, including for extracellular proteins, using a combination of neural network methods and sequence homology. ProtComp B combines different methods of protein localization prediction. For proteins from Gram-negative bacteria, four locations are predicted: cytoplasmic, membrane (outer and inner), periplasmic and extracellular (secreted).
Translocating the protein of interest into the periplasmic space or into the culture medium has several advantages, including a lower content of contaminating bacterial proteins (Forouharmehr et al., 2018). In an experimental study, Steiner et al. (2006) evaluated the effect of substituting 10 different SPs at the N-terminus of the protein of interest (POI) on the translocation of polypeptides through the cytoplasmic membrane into the periplasm of Gram-negative bacteria. Substituting the SPs improved the enrichment of phage display selection from 10-fold to more than 1000-fold per round of panning (Steiner et al., 2006). Klatt and Konthur (2012) trimmed SPs to optimize secretory expression of a recombinant protein in Leishmania tarentolae. They applied an in silico approach, using the SignalP server to identify the most potent SPs, and linked the SPs N-terminally to the POI to evaluate signal peptide cleavage and changes in expression rate. Their results demonstrated the importance of SP optimization for efficient secretory expression, showing that minor modifications in SP structure, based on in silico investigations, increased the yield of secreted recombinant protein (Klatt & Konthur, 2012).
Translocating the expressed recombinant protein into the periplasm not only makes downstream processing much easier than with cytosolic production but can also decrease processing cost and time. Because the periplasm contains fewer contaminating cellular components and protects the product from intracellular proteases, isolation and purification of over-expressed products translocated into this compartment can be greatly simplified. Overall, the results showed that IFN-beta 1b fused with most of the signal peptides was predicted to localize to the periplasmic space of E. coli; only in combination with the Spa signal peptide was IFN-beta 1b predicted to be found in the culture medium (extracellularly).

CONCLUSIONS
In silico approaches such as artificial neural networks, computational biology, bioinformatics and mathematical data analysis in theoretical biology have accelerated the analysis of SPs for the production of pharmaceutical recombinant proteins. They can also reduce the cost and time required for the expression and purification of recombinant proteins. Predicting the best SPs in silico can therefore help biologists and protein engineers accelerate and facilitate their projects. In this research, to select an optimal signal peptide compatible with the recombinant secretory protein interferon-beta 1b, we applied an in silico approach to build a computational virtual screening workflow. Based on the results of the in silico analysis, the flgI, EXG1 and OmpC signal peptides appear to be good candidates for secreting IFN-beta 1b into the periplasmic space, and they can be used in future experimental studies of IFN-beta 1b secretion. Furthermore, the workflow developed in this research could be applied by other researchers to evaluate various SPs in combination with other pharmaceutical recombinant proteins, in order to identify the best SPs for successful scale-up of production of the recombinant protein of choice.

Table 2. Analysis of the signal peptide sequences using the SignalP 4.1 server
Cleavage probability is the maximum cleavage-site probability at the beginning of the protein. The in silico analysis results of the SignalP server indicated that the highest D-score belonged to …

Table 5. Sorting the signal peptides according to the aliphatic index