Electronegativity and intrinsic disorder of preeclampsia-related proteins

1Department of Mathematics, Faculty of Sciences, Universidad Nacional Autonoma de México. México City, México; 2Departments of Critical Care Medicine and Biomedical Research, Hospital Juárez de México. México City, México; 3Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33647, USA; 4Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia; 5Centro de Investigaciones Químicas, Universidad Autónoma del Estado de Morelos, Chamilpa, Cuernavaca, Morelos, México; 6Department of Infectious Diseases, Instituto Nacional de Ciencias Médicas y Nutrición “Salvador Zubirán”, México City, México


INTRODUCTION
Preeclampsia is a disorder associated with pregnancy. It is a systemic syndrome originating in the placenta that leads to widespread maternal vascular endothelial dysfunction (Powe et al., 2011) and is characterized by the de novo development of concurrent hypertension and proteinuria. Recent observations (Goel & Rana, 2013; Maynard & Karumanchi, 2011) suggested that anti-angiogenic factors are responsible for the clinical manifestations of the disease. As one of the hypertensive disorders of pregnancy, preeclampsia is a human-specific pregnancy malady of unknown etiology, with no known diagnostic biomarkers.
Preeclampsia can progress to involve multiple organ systems and can lead to high maternal mortality, perinatal death, preterm birth, fetal growth restriction, low birth weight, and respiratory distress syndrome (Altman et al., 2002; Davey & MacGillivray, 1988). It is estimated that 10% of all women experience preeclampsia during pregnancy (Duley, 2009). The World Health Organization reported that worldwide 76 000 women die annually because of preeclampsia, and the death counts are even higher for babies, with 500 000 fetal and newborn lives being lost annually because of the perinatal complications of preeclampsia (Khan et al., 2006). Furthermore, women with preeclampsia systematically show an increased risk of hypertension, ischemic heart disease, and stroke in later life (Bellamy et al., 2007; Brown et al., 2013).
Although preeclampsia and preeclampsia-related complications constitute a significant threat to pregnant women and babies, the underlying pathological mechanisms are still not well understood. Recently, proteomics profiling of serum samples obtained from pregnant women with severe preeclampsia and healthy participants revealed that several proteins (such as α1-antitrypsin, α1-microglobulin, clusterin, and haptoglobin) are significantly upregulated in women with severe preeclampsia (Hsu et al., 2015). Other studies suggested involvement of angiogenic factors (Alasztics et al., 2014; Rana et al., 2012), catechol-O-methyltransferase (COMT) and calcium transport genes (Yang et al., 2015), SERPINA1 and albumin (Buhimschi et al., 2008), sialic acid-binding immunoglobulin-like lectin-6 and pappalysin-2 (Winn et al., 2009), apolipoprotein E3 (Atkinson et al., 2009), and some inflammatory cytokines (Lockwood et al., 2008). Furthermore, a search of the UniProt database (UniProt, 2015) for preeclampsia-related human proteins returned 24 hits that included nine canonical proteins (see Table 1) and several alternatively spliced isoforms. In the current work, these preeclampsia-related human proteins were selected for computational and bioinformatics analyses aimed at characterizing the origin of these proteins and at understanding the extent and potential functionality of their intrinsic disorder.
Electronegativity is the ability of an atom to attract electrons. Accordingly, the most common atoms found in proteins (and other biological molecules) can be arranged in order of decreasing electronegativity as follows: oxygen, nitrogen, carbon, and hydrogen (Pauling, 1960). Together with other physicochemical properties, electronegativity can be used to find regularities between protein groups. In this work we used a supervised bioinformatics tool, the Polarity Index Method (PIM), which evaluates the electronegativity of a peptide or a protein based solely on its amino acid sequence. PIM can be used to identify the functional and structural characteristics of different protein groups, showing a high discrimination in the process (Polanco & Samaniego, 2009; Polanco et al., 2012; 2013a; 2013b; 2014a; 2014c; 2014d; 2014e; 2015). In this study, PIM was used to carry out a comparative analysis of the polarity profiles of proteins associated with preeclampsia (retrieved from the UniProt database) and proteins from the following groups: antimicrobial peptides (APD2 database), lipoproteins (Magrane, 2011), angiogenesis-related proteins (UniProt database), and intrinsically disordered proteins (Oldfield et al., 2005; supplementary material). Our bioinformatics results showed that there is a high degree of similarity between the preeclampsia-related proteins and the lipoproteins, as well as between the angiogenesis and the preeclampsia proteins.
According to the more than a century old structure-function paradigm, unique protein function depends on a unique and stable three-dimensional structure, which is determined by a unique amino acid sequence. However, experimental evidence accumulated during the last two decades revealed that numerous biologically active proteins or long regions of these proteins lack stable structures in solution (Peng et al., 2015; Xue et al., 2012). These intrinsically disordered proteins (IDPs) or hybrid proteins possessing ordered domains and intrinsically disordered protein regions (IDPRs) are commonly involved in cell signaling, regulation, and recognition (Berlow et al., 2015; Dunker et al., 2008; Dunker et al., 2005; Habchi et al., 2014; Jakob et al., 2014; Oldfield & Dunker, 2014; Uversky et al., 2005; Uversky & Dunker, 2010; Tompa et al., 2015; van der Lee et al., 2014; Wright & Dyson, 2015). Since the pathogenesis of various human diseases (such as cancer, neurodegenerative diseases, amyloidoses, diabetes, and cardiovascular disease) (Uversky et al., 2008; 2014) is frequently associated with dysfunction of IDPs, we decided to evaluate the intrinsic disorder predisposition of the preeclampsia-related proteins using a broad arsenal of computational tools. This bioinformatics analysis revealed that many proteins associated with preeclampsia possess long functional IDPRs.

MATERIAL AND METHODS
Evaluation of electronegativity profile. The mathematical-computational Polarity Index Method (PIM) is completely automated and was described in several publications of this group (Polanco & Samaniego, 2009; Polanco et al., 2012; 2013a; 2013b; 2014a; 2014c; 2014d; 2014e). However, to facilitate the understanding of this text, the main parts of its metric are described below.
Metrics. The electronegativity or polarity profile of a protein is the only physicochemical property registered by the PIM. The algorithm starts by replacing the amino acids forming the linear representation of the protein polypeptide chain with their corresponding polar values, which can be [P+] basic hydrophilic, [P-] acidic hydrophilic, [N] neutral, and [NP] non-polar. "Each pair of amino acids in the sequence is counted and registered in an incident matrix; i.e., (row, column) = (amino acid_A, amino acid_B). These pairs of amino acids are formed when reading the amino acid sequence converted to its polar equivalent from N-terminus to C-terminus (from left to right equivalently) for each protein moving one amino acid at a time. The incident matrix (from the training data) is compared with its corresponding matrix of each target sequence. Those sequences that are greater than the percentage default (Tables 8-10), are considered to be candidate proteins" (see Metrics, Polanco et al., 2016). For example, if the linear representation of a protein polypeptide chain, once converted into its polar equivalent, is {P-, NP, N, N, P+, P+}, then the first pair is {P-, NP}, which means that one unit is added in row {P-}, column {NP} of the incident matrix. The reading position then moves one step to the right in the polar representation, so the next pair is {NP, N}, and one unit is added in row {NP}, column {N} of the incident matrix. The procedure continues until the last pair {P+, P+}, when a unit is added in row {P+}, column {P+}. The same is done for each protein studied, as well as for the proteins used to define the searched pattern (training proteins). Afterward, the incident matrices of all training proteins are added to obtain a single matrix. Finally, the matrix of the protein under study, A[i,j], and the matrix of the training proteins, B[i,j], are normalized and compared by noting the number of hits in each position of the two matrices A[i,j] and (A[i,j] + B[i,j]). What do we mean by the number of hits in each position? Suppose element (3,4) of the A[i,j] matrix has the largest normalized relative frequency of its 16 elements, and element (3,4) of the (A[i,j] + B[i,j]) matrix also has the largest relative frequency; if no other position coincides in both matrices, then the number of coincidences is one. Finally, the incident matrices are linearized into a vector of 16 elements (the so-called polar vector): the first four elements of the polar vector are the first row of the matrix, the next four elements are the second row, and so on until all rows of the incident matrix are transferred to the polar vector. The PIM determines that a protein under study resembles the training proteins if the percentage of similarity between the two polar vectors is 80 or higher. The polar vector of the group of reviewed preeclampsia-related proteins is shown in Table 1.
Table legend: Polar interactions present (✔) and absent (✕) in the vector of the polarity index method.
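The counting and linearization steps of the metric can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the residue-to-group assignment in `POLAR_GROUP` is an assumption (the text names the four polar groups but does not list their members), all function names are hypothetical, and the comparison of normalized matrices is not reproduced.

```python
from collections import Counter

# Row/column order of the 4 x 4 incident matrix, following the order in the text.
GROUPS = ["P+", "P-", "N", "NP"]

# ASSUMPTION: a common textbook grouping of the 20 amino acids; the PIM
# publications may assign some residues differently.
_MEMBERS = {"P+": "HKR", "P-": "DE", "N": "CGNQSTY", "NP": "AFILMPVW"}
POLAR_GROUP = {aa: group for group, residues in _MEMBERS.items() for aa in residues}


def to_polar(sequence):
    """Replace each residue (one-letter code) by its polar group label."""
    return [POLAR_GROUP[aa] for aa in sequence.upper()]


def incident_matrix(polar_sequence):
    """Count consecutive pairs, reading left to right one position at a time."""
    pairs = Counter(zip(polar_sequence, polar_sequence[1:]))
    return [[pairs[(row, col)] for col in GROUPS] for row in GROUPS]


def polar_vector(matrix):
    """Linearize the matrix row by row and normalize to relative frequencies."""
    flat = [count for row in matrix for count in row]
    total = sum(flat)
    return [count / total for count in flat] if total else [0.0] * len(flat)


# Worked example from the text: {P-, NP, N, N, P+, P+}
example = ["P-", "NP", "N", "N", "P+", "P+"]
matrix = incident_matrix(example)
vector = polar_vector(matrix)
```

For the worked example, the five consecutive pairs {P-, NP}, {NP, N}, {N, N}, {N, P+}, and {P+, P+} each add one unit, so five of the 16 elements of `vector` equal 0.2 and the rest are zero.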
Assembling protein sets for analysis. In this work, the files were formed by five categories of peptides and proteins. To avoid over-representation, each protein was searched for in all the groups, and a protein found in two or more groups was excluded from subsequent consideration. These five categories were:
Six groups of lipoproteins: HDL, IDL, LDL, VLDL, chylomicrons, and atherosclerosis (Table 2). When the IDL group was checked for over-representation, it was excluded (Magrane, 2011).
Four groups of antimicrobial peptides: fungi, virus, bacteria, and a subgroup of the bacteria group, so-called selective cationic amphipathic antibacterial peptides (SCAAP).The over-representation of this group was 60% of all the protein sequences registered in APD2 Database (Table 3) (Wang & Wang, 2009).
Two groups of reviewed human proteins related to angiogenesis and angiogenesis inhibitor proteins (Table 6) were found in UniProt Database (Magrane, 2011).
Test plan. The PIM performs two procedures to characterize the proteins associated with preeclampsia.
Graphing the relative frequency of each polar interaction for the four groups of proteins: lipoproteins, antimicrobials, intrinsically disordered proteins, and preeclampsia proteins (Figs 1-4).
Calculating the number of hits to identify the preeclampsia protein group (Tables 2-6) and comparing it with the other groups; i.e., if the PIM is calibrated with the preeclampsia proteins (Table 5), it is tested with lipoproteins (Table 2), antimicrobial proteins and peptides (Table 3), disordered proteins (Table 4), and angiogenesis-related proteins (Table 6) so that the analytical test shows false positives and false negatives.
The preeclampsia-associated proteins were identified among the proteins registered as angiogenesis and angiogenesis inhibitor proteins (Table 5) in the UniProt database (Magrane, 2011); the hits are reported in Appendix 1 (at www.actabp.pl).
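The cross-group test (calibrate with one group, then count hits in every group) can be organized as in the sketch below. It assumes a caller-supplied predicate `is_hit(calibration_group, protein)` standing in for the PIM similarity criterion; the predicate used in the demonstration is a deliberately trivial placeholder, not the PIM, and all names and sequences are hypothetical.

```python
from collections import Counter


def hit_percentage(is_hit, calibration_group, test_group):
    """Percentage of proteins in test_group flagged as hits for calibration_group."""
    if not test_group:
        return 0.0
    hits = sum(1 for protein in test_group if is_hit(calibration_group, protein))
    return 100.0 * hits / len(test_group)


def hit_table(is_hit, groups):
    """Rows: group used for calibration; columns: group being tested."""
    return {
        calibrated: {
            tested: hit_percentage(is_hit, groups[calibrated], groups[tested])
            for tested in groups
        }
        for calibrated in groups
    }


def toy_is_hit(calibration_group, protein):
    """PLACEHOLDER criterion: the most frequent residue is shared with the group."""
    def top(seq):
        return Counter(seq).most_common(1)[0][0]
    return top(protein) in {top(member) for member in calibration_group}


# Hypothetical toy sequences, for illustration only.
groups = {"A": ["KKRA", "KRKK"], "B": ["DDEA", "KKKD"]}
table = hit_table(toy_is_hit, groups)
```

In such a table, the off-diagonal entries are the cross-group hit percentages from which false positives and false negatives can be judged.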
Evaluation of intrinsic disorder propensity. Prediction of the per-residue disorder propensities. Per-residue disorder distribution in preeclampsia-related proteins was evaluated using the members of the PONDR family, such as PONDR® VLXT (Romero et al., 2001), PONDR® VSL2 (Peng et al., 2005), PONDR® VL3 (Peng et al., 2006), and PONDR® FIT (Xue et al., 2010). In these analyses, disorder scores above 0.5 are considered to correspond to disordered residues/regions. PONDR® VSL2 is one of the more accurate stand-alone disorder predictors (Fan & Kurgan, 2014; Peng et al., 2005; Peng & Kurgan, 2012). PONDR® VLXT has high sensitivity to local sequence peculiarities associated with disorder-based interaction sites (Romero et al., 2001). PONDR® VL3 is one of the more accurate evaluators of long disordered regions (Peng et al., 2006), whereas PONDR® FIT is one of the more accurate metapredictors (Xue et al., 2010). We used multiple disorder predictors here to increase the confidence of our results.
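As an illustration of how per-residue scores can be summarized with the 0.5 cutoff, the following sketch computes the percentage of disordered residues, averages it over several predictors, and lists the disordered regions. It assumes the per-residue scores have already been obtained from the predictors; it does not call any PONDR tool, and the function names and toy scores are hypothetical.

```python
def disorder_content(scores, threshold=0.5):
    """Percentage of residues whose disorder score exceeds the threshold."""
    return 100.0 * sum(score > threshold for score in scores) / len(scores)


def mean_disorder_content(scores_by_predictor, threshold=0.5):
    """Average the disorder content over the outputs of several predictors."""
    contents = [disorder_content(s, threshold) for s in scores_by_predictor.values()]
    return sum(contents) / len(contents)


def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based (start, end) spans of consecutive disordered residues."""
    regions, start = [], None
    for position, score in enumerate(scores, start=1):
        if score > threshold and start is None:
            start = position  # a new disordered stretch begins here
        elif score <= threshold and start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a stretch that runs to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Hypothetical per-residue scores for a five-residue toy sequence.
toy_scores = {
    "predictor_1": [0.1, 0.6, 0.7, 0.4, 0.9],
    "predictor_2": [0.2, 0.8, 0.3, 0.2, 0.6],
}
```

Raising `min_length` restricts the output to longer regions, which is one simple way to focus on long disordered segments.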
Charge-hydropathy plot. In addition to the per-residue disorder predictors, there are computational tools for binary classification of whole proteins as either mostly disordered or mostly ordered, where mostly ordered indicates proteins that contain more ordered residues than disordered residues and mostly disordered indicates proteins that contain more disordered residues than ordered residues. One such tool is the charge-hydropathy plot (CH-plot), a linear disorder classifier that discriminates proteins with substantial amounts of extended disorder from proteins with globular conformations (Oldfield et al., 2005; Uversky et al., 2000). The CH-plot shows results from a binary disorder predictor and represents an input protein as a point in a 2D graph, in which the mean Kyte-Doolittle hydrophobicity and the mean absolute net charge are projected onto the X- and Y-coordinates, respectively. In the corresponding CH-plot, fully structured proteins and fully disordered proteins can be separated by a boundary line. All proteins located above this boundary line are highly likely to be extended, while proteins located below this line are likely to be compact (Oldfield et al., 2005; Uversky et al., 2000).
CDF analysis. Cumulative distribution function (CDF) analysis is another binary disorder predictor that uses amino acid sequences to predict which proteins are likely to be mostly disordered (Oldfield et al., 2005). CDF analysis is a cumulative histogram of the PONDR scores for a given protein. In other words, this tool summarizes the PONDR-derived per-residue predictions by plotting PONDR scores against their cumulative frequency, which allows ordered and disordered proteins to be distinguished based on the distribution of prediction scores. At any given point on the CDF curve, the ordinate gives the proportion of residues with a PONDR score less than or equal to the abscissa. Disordered proteins with high PONDR scores will have CDF curves that have low cumulative values over most of the CDF curve, whereas ordered proteins with low PONDR scores will have CDF curves that have high cumulative values over most of the CDF curve. Based on the comparison of CDF curves of known ordered and disordered proteins, a boundary may be determined that separates the CDF curves of disordered proteins located below the boundary from the CDF curves of ordered proteins located above the boundary. For proteins with CDF curves that do not fall completely on one side of the boundary or the other, a minimal boundary majority of four points is used for classification (Oldfield et al., 2005).
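Both binary classifiers can be sketched in a few lines of Python. The CH-plot part uses the boundary equation given in the legend of Fig. 7, <H>b = (<R> + 1.151)/2.785; the linear rescaling of the Kyte-Doolittle values to the 0..1 range and the charge model (+1 for Lys/Arg, -1 for Asp/Glu, all other residues neutral) are assumptions of this sketch. The CDF part only computes the curve, because the coordinates of the CDF boundary are not given in the text. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_scaled_hydropathy(sequence):
    """Mean hydropathy <H>; ASSUMES a linear rescaling of the scale to 0..1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence) / len(sequence)


def mean_net_charge(sequence):
    """Absolute mean net charge <R>; ASSUMES +1 for K/R and -1 for D/E only."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return abs(positive - negative) / len(sequence)


def ch_plot_class(sequence):
    """Classify a protein by its position relative to the CH-plot boundary."""
    h = mean_scaled_hydropathy(sequence)
    r = mean_net_charge(sequence)
    boundary_h = (r + 1.151) / 2.785  # boundary hydropathy for this charge
    # Above the line (high charge for the given hydropathy) means h < boundary_h.
    return "extended" if h < boundary_h else "compact"


def cdf_curve(scores, n_points=10):
    """Fraction of residues with a score <= each threshold 0.1, 0.2, ..., 1.0."""
    thresholds = [i / n_points for i in range(1, n_points + 1)]
    return [(t, sum(s <= t for s in scores) / len(scores)) for t in thresholds]
```

A highly charged, weakly hydrophobic toy sequence such as `"KKKKEEKK"` falls on the extended side of the boundary, whereas a hydrophobic, uncharged one such as `"ILVILVAL"` falls on the compact side.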
ANCHOR algorithm. The ANCHOR algorithm (http://anchor.enzim.hu/) was used to identify the AIBSs (ANCHOR-identified binding sites) within the disordered regions of the preeclampsia-related proteins (Dosztanyi et al., 2009; Meszaros et al., 2009). This tool is based on the hypothesis that long IDPRs might contain localized potential binding sites that cannot form enough favorable intrachain interactions to fold on their own, but are likely to gain stabilizing energy by interacting with a globular protein partner (Dosztanyi et al., 2009; Meszaros et al., 2009).
The protein-protein interaction networks of the preeclampsia-related proteins. The interactivity networks of the preeclampsia-related proteins were estimated with the STRING database (Search Tool for the Retrieval of Interacting Genes, http://string-db.org/) (Szklarczyk et al., 2011). This tool provides the predicted network of interactions and associations of the preeclampsia-related proteins with a certain group of proteins. The network nodes are partner proteins, and the edges are predicted or known functional associations. An edge is drawn with up to seven lines of different colors corresponding to the seven types of evidence used to establish a functional association. A green line corresponds to neighborhood evidence; a red line indicates fusion evidence; a purple line indicates experimental evidence; a blue line represents co-occurrence evidence; a yellow line indicates text-mining evidence; a black line represents co-expression evidence; and a light blue line indicates database evidence (Szklarczyk et al., 2011).

Results of PIM analysis
The PIM (calibrated with the LDL group) showed a high correlation with the group of reviewed preeclampsia-related proteins (54%) (Table 7). The PIM (calibrated with the Folded group) showed a high correlation with the preeclampsia group (54%) (Table 9). The PIM (calibrated with the Partially-Folded group) showed a moderate correlation with the preeclampsia group (42%) (Table 9). The PIM (calibrated with the Bacteria group) showed a moderate correlation with the preeclampsia group (46%) (Table 8). The polarity profile calculated by the PIM for the preeclampsia proteins showed high correlation (Appendix 1 at www.actabp.pl; 23 out of 24 proteins) with the groups of lipoproteins (Appendix 1 at www.actabp.pl; PIM lipoproteins), antimicrobial peptides (Appendix 1 at www.actabp.pl; PIM antimicrobials), and intrinsically disordered proteins (Appendix 1 at www.actabp.pl; PIM structural). One-third of the proteins associated with preeclampsia were also associated with angiogenesis (Table 10); the similarity of their polar profiles was found in 16 interactions (Fig. 5).
Descriptive analysis (PIM). The graphs show that the maximum, minimum, and inflection points of the reviewed preeclampsia proteins correlate with those of all lipoprotein groups (Fig. 1), except in interactions [P-, N], [P-, NP], and [N, P+], which showed turbulence. Additionally, the graphs of the polar profile of proteins associated with preeclampsia (reviewed and unreviewed) did not correlate in the range [P+, P+] - [N, P-]. Finally, there was no correlation, either graphical or analytical, with the antimicrobial groups (Fig. 3 and Table 8), particularly with the SCAAP group (Fig. 4). From these results we concluded that the reviewed preeclampsia proteins were highly associated with LDL, and moderately associated with intrinsically ordered proteins, as well as angiogenesis proteins and the bacteria group.
Intrinsic disorder propensity analysis. To gain information on the overall abundance and potential functionality of intrinsic disorder in preeclampsia-related proteins, we applied a wide variety of computational tools (see Materials and Methods). Results of this multiparametric analysis are summarized in Table 11 and are further detailed in Appendix 2 (at www.actabp.pl). The use of multiple computational tools and consensuses of several disorder predictors for evaluation of intrinsic disorder was motivated by empirical observations that this leads to an increase in the predictive performance compared to the use of a single predictor (Fan & Kurgan, 2014; Peng & Kurgan, 2012; Walsh et al., 2015). Our analysis revealed that none of the preeclampsia-related proteins studied in this work is predicted to be completely disordered; all of them can be classified as hybrid proteins, and many of them contain substantial amounts of intrinsic disorder (see Table 11 and Appendix 2 at www.actabp.pl). Importantly, the vast majority of these proteins (84%) can be classified as either highly or moderately disordered proteins; i.e., proteins containing more than 30% and between 10 and 30% of predicted disordered residues, respectively. By various formal criteria (for example, by their intrinsic disorder content evaluated by various predictors, as well as by the mean content of predicted disordered residues averaged over the outputs of 11 predictors, see Table 11), among the most disordered preeclampsia-related proteins were human tachykinin-3 (UniProt ID: Q9UHF0), human Storkhead-box protein 1 (UniProt ID: Q6ZVD7), human nostrin (UniProt ID: Q8IVI9), human NK cell receptor (UniProt ID: Q96L47), and human atrial natriuretic peptide-converting enzyme (UniProt ID: Q9Y5Q5). In fact, these proteins were characterized by a mean content of disordered residues of 60.4%, 46.84%, 36.83%, 33.71%, and 17%, respectively. To further illustrate this point, Fig. 6 represents the results of disorder evaluation in canonical forms of these five proteins using the members of the PONDR family of disorder predictors and clearly shows high levels of predicted intrinsic disorder in these preeclampsia-related proteins. Figures 7A and 7B represent the results of the analysis of all 25 preeclampsia-related proteins with two binary disorder predictors, CH-plot and CDF, both of which evaluate the predisposition of a query protein to be ordered or intrinsically disordered as a whole. Although none of the preeclampsia-related proteins was predicted to belong to the class of proteins with extended disorder (no points corresponding to these proteins were located above the boundary in the CH-plot, see Fig. 7A), several of these proteins (Nostrin (Q8IVI9) and its alternatively spliced isoforms, Q8IVI9-2 and Q8IVI9-3) were located right at or in very close proximity to the boundary of the CH-plot. Furthermore, several preeclampsia-related proteins (e.g., Tachykinin-3 (Q9UHF0) and its alternatively spliced isoforms, Q9UHF0-2 and Q9UHF0-3) were predicted to be wholly disordered by CDF analysis, since the majority of their CDF curves was located below the boundary (see Fig. 7B). Therefore, combined CH-CDF analysis suggested that Tachykinin-3 and its alternatively spliced isoforms are either native molten globules (i.e., compact disordered protein forms) or hybrid proteins containing comparable amounts of ordered and disordered regions. Finally, in agreement with the known fact that alternative splicing preferentially affects mRNA regions encoding IDPRs (Romero et al., 2006), Table 11 and the data shown in Appendix 2 (at www.actabp.pl) illustrate that alternative splicing significantly affected the disorder predispositions of the related proteoforms.

Table 7. The number of hits (%) of PIM in lipoprotein groups. The PIM was calibrated with each group (rows) and compared with the groups (columns). E.g., PIM calibrated with the HDL group (row) detected that 35% of the VLDL proteins group (column) have the HDL polar profile. PIM calibrated with the Preeclampsia group (row) detected that 14% of the LDL proteins group (column) have the Preeclampsia polar profile.

Table 9. The number of hits (%) of PIM in intrinsically disordered protein groups. The PIM was calibrated with each group (rows) and compared with the groups (columns). E.g., PIM calibrated with the Folded group (row) detected that 9% of the Unfolded proteins group (column) have the Folded polar profile. PIM calibrated with the Preeclampsia group (column) did not detect any match in the Unfolded group or the Partially-Folded group.

Table 10. Hits in the angiogenesis proteins. The number of hits found by the PIM in angiogenesis/angiogenesis inhibitor proteins and the preeclampsia protein group.
Group                      Angiogenesis    Angiogenesis inhibitor
Preeclampsia (reviewed)    8               1
Another way of looking at the predisposition of a query protein for functional intrinsic disorder is provided by the D2P2 platform (Oates et al., 2013), which, in addition to providing the results of nine disorder predictors, shows the locations of various curated posttranslational modifications and predicted disorder-based protein binding sites. Figure 8 provides the D2P2 plot for one of the highly disordered preeclampsia-related proteins, Nostrin (Q8IVI9), whereas analogous plots for other preeclampsia-related proteins are collected in Appendix 2 (at www.actabp.pl). These plots clearly showed that many of these proteins are predicted to have disordered regions of various lengths, often possess numerous potential disorder-based binding motifs (see also Table 11), and contain multiple sites of various posttranslational modifications (PTMs). The finding that the IDPRs of the preeclampsia-related proteins have a multitude of PTMs is in agreement with the well-known fact that phosphorylation (Iakoucheva et al., 2004) and many other enzymatically catalyzed PTMs are preferentially located within the IDPRs (Pejaver et al., 2014). Furthermore, Fig. 9, which represents a protein-protein interaction network of human Tachykinin-3 (Q9UHF0) created using the STRING platform, and the analogous STRING plots in Appendix 2 (at www.actabp.pl) for other preeclampsia-related proteins clearly illustrate that all these proteins are characterized by high interactivity, being involved in multiple protein-protein interactions. All these observations suggest that intrinsic disorder plays an important role in the function of preeclampsia-related proteins.

DISCUSSION
Electronegativity is a physicochemical property strongly associated with molecular binding. In PIM, the measurement of electronegativity is expressed as a matrix comprising 16 possible polar interactions. It is important to note that more than 70% of the algorithms used to characterize or identify peptide groups measure this property; however, it is always represented as a single figure. We consider that the high efficiency shown by PIM is derived from this polarity matrix, since it allows the measurement and analysis of all possible polar interactions based on just four polar groups {P+, P-, N, NP}. An algebraic structure such as this polarity matrix can be regarded as an element of a 16-dimensional real space (R^16), in which eigenvectors are determined to generate an invariant subspace; this reinterpretation of the matrix is equally efficient (see Polarity Vector Method (PVM), Polanco, 2016a).
It is important to emphasize the comprehensive way in which polarity is examined by considering the metric of all possible polar interactions rather than interpreting them as a single figure. The PIM was evaluated with different groups of peptides or proteins, and in all cases analyzed so far a high level of efficiency in protein/peptide identification was evident (Polanco & Samaniego, 2009; Polanco et al., 2012; 2013a; 2013b; 2014a; 2014c; 2014d; 2014e). In just six decades, bioinformatics has gone from an auxiliary tool for verifying proteomics results obtained in the laboratory to an essential tool for the construction of new pharmaceutical drugs. Today, this discipline already produces synthetic peptides from supervised methods, and it is very likely that in this century biological environments will be re-created to test thousands of synthetic peptides simultaneously. In our opinion, this will involve a thorough review of the physicochemical properties now used in bioinformatics algorithms, as well as a review of the peptides and proteins listed in public databases and the inclusion of quantitative referents, such as toxicity in the case of the antimicrobial peptides, that help make the bioinformatics algorithms more robust.
According to our results, 30% of the proteins associated with preeclampsia are angiogenesis proteins. Considering that the studied group is rather small (just 24 preeclampsia proteins), this percentage indicates a strong association between the two groups (according to our bioinformatics results only, because we did not perform any experimental verification). Our future work will be dedicated to further exploration of this issue.
It is important to mention that the metric used to associate the reviewed preeclampsia proteins with other protein groups, such as bacteria, folded proteins, and SCAAP, was determined indirectly. This means that instead of calibrating the PIM with the preeclampsia group and then comparing it with the other groups to find a coincidence, the PIM was calibrated with each of the different groups, e.g., LDL (see Table 7), and then the polarity pattern of the reviewed preeclampsia group was sought among those groups, finding a coincidence of 58%. This result is significant; therefore, the PIM is recommended for the bioinformatics analysis of proteins associated with preeclampsia.

CONCLUSIONS
The approach presented in this study combines a bioinformatics algorithm that exhaustively evaluates polarity from the amino acid sequence of a protein with several bioinformatics methods oriented to measuring the intrinsic disorder level in query proteins and to evaluating and illustrating their disorder-based functionality. Our analysis showed with a high to moderate certainty that proteins associated with preeclampsia can be classified as lipoproteins and angiogenesis-related proteins, and further revealed that many proteins associated with preeclampsia possess long ordered domains and functional intrinsically disordered protein regions.

Figure 1. Relative frequency distribution of preeclampsia proteins and lipoproteins. X-axis represents the 16 polar interactions.

Figure 2. Relative frequency distribution of preeclampsia proteins and intrinsically disordered proteins. X-axis represents the 16 polar interactions.

Figure 3. Relative frequency distribution of preeclampsia proteins and antimicrobial peptides. X-axis represents the 16 polar interactions.

Figure 4. Relative frequency distribution of preeclampsia proteins and SCAAP. X-axis represents the 16 polar interactions.

Figure 5. Relative frequency distribution of angiogenesis and preeclampsia proteins. X-axis represents the 16 polar interactions.

Figure 7. (A) Charge-hydropathy plot (mean scaled hydropathy, <H>, against mean net charge, <R>, at neutral pH) of the human preeclampsia-related proteins (large, variously shaped and colored symbols), a set of known disordered proteins (light pink diamonds), and a set of known ordered proteins (light blue circles). The boundary separating intrinsically disordered and compact proteins, shown by a solid black line, is empirically defined by the equation <H>b = (<R> + 1.151)/2.785 (Oldfield et al., 2005; Uversky et al., 2000). (B) Cumulative distribution function plot for the human preeclampsia-related proteins (differently colored lines), which represents a statistical analysis of PONDR® VLXT disorder scores for comprehensive disorder assessment of proteins (Oldfield et al., 2005). The boundary separating wholly ordered and wholly disordered proteins is shown as a thick black line with black circles. A query protein is expected to be wholly disordered if the majority of its CDF curve is located below this boundary.

Figure 8. Evaluation of the functional intrinsic disorder propensity of human Nostrin (UniProt ID: Q8IVI9) using the D2P2 database (http://d2p2.pro/) (Oates et al., 2013). In this plot, the top nine colored bars represent the location of disordered regions predicted by different disorder predictors (Espritz-D, Espritz-N, Espritz-X, IUPred-L, IUPred-S, PV2, PrDOS, PONDR® VSL2b, and PONDR® VLXT, see legend for the corresponding color codes). The green-and-white bar in the middle of the plot shows the predicted disorder agreement between these nine predictors, with green parts corresponding to disordered regions by consensus. The yellow bar shows the location of the predicted disorder-based binding site (MoRF region), whereas red and blue circles at the bottom of the plot show the location of phosphorylation and methylation sites, respectively.

Figure 9. Analysis of the interactivity of the human Tachykinin-3 (UniProt ID: Q9UHF0) with STRING (Szklarczyk et al., 2011). STRING produces the network of predicted associations for a particular group of proteins. The network nodes are proteins, whereas the edges represent the predicted or known functional associations. An edge may be drawn with up to 7 differently colored lines that represent the existence of the seven types of evidence used in predicting the associations. A red line indicates the presence of fusion evidence; a green line, neighborhood evidence; a blue line, co-occurrence evidence; a purple line, experimental evidence; a yellow line, text mining evidence; a light blue line, database evidence; a black line, co-expression evidence (Szklarczyk et al., 2011).

Table 8. Hits in the antimicrobial peptide groups.
The number of hits (%) of PIM in antimicrobial peptide groups. The PIM was calibrated with each group (rows) and compared with the groups (columns). E.g., PIM calibrated with the SCAAP group (row) detected that 5% of the Virus proteins group (column) have the SCAAP polar profile. PIM calibrated with the Preeclampsia group detected that 1% of the Bacteria proteins group have the Preeclampsia polar profile.

Table 11. Human preeclampsia-related proteins analyzed during this study and their major intrinsic disorder characteristics.
Data for the alternatively spliced isoforms of preeclampsia-related proteins are indicated in italic font, whereas the mean disorder content (percent of predicted disordered residues) calculated by averaging the outputs of 11 predictors is shown in bold font.