Identification of antimicrobial peptides by using eigenvectors

Antibacterial peptides are subject to broad research due to their potential application and the benefit they can provide for a wide range of diseases. In this work, a mathematical-computational method, called the Polarity Vector Method, is introduced that has a high discriminative level (>70%) to identify peptides associated with Gram (–) bacteria, Gram (+) bacteria, cancer cells, fungi, insects, mammalian cells, parasites, and viruses, taken from the Antimicrobial Peptides Database. This supervised method uses only eigenvectors from the incident polar matrix of the group studied. It was verified with a comparative study with another extensively verified method developed previously by our team, the Polarity Index Method. The number of positive hits of both methods was up to 98% in all the tests conducted.


INTRODUCTION
The manufacture of pharmaceutical drugs (Blundell et al., 2006; Ekins, 2004; Ekins et al., 2007; Kantardjieff & Rupp, 2004; Readhead & Dudley, 2013) from proteins made or identified by "bioinformatics methods" is strategically important for two main reasons: the experimental discovery of peptides in living organisms is becoming less frequent, and the costs of their synthesis and of trial-and-error assays (Adams & Brantner, 2006; Bahar & Ren, 2013; Breda et al., 2006; Dudley et al., 2011) are constantly increasing. These two factors have driven the design of a new generation of mathematical-computational algorithms that measure the characteristic physico-chemical profiles (Gill et al., 2007; Liu et al., 2012; Vilar et al., 2008) of different groups of proteins and thus "computationally" build peptides by design. Among the different bioinformatics methods, the Polarity Index Method (PIM) (Polanco & Samaniego, 2009; Polanco et al., 2012; 2013; 2013a; 2014; 2014a; 2014b; 2014c; 2014d) stands out for its high efficiency in identifying the major action of proteins with antimicrobial action (>75% in a double-blind test). The metric of this method is based on a polar matrix, which is built by counting the polar incidents of the amino acids forming the linear sequence of a protein. Although this polar matrix is the core of the PIM metric, an equally effective matrix has been identified from the eigenspace of the polar matrix of the studied group (Poole, 2011; Sahai & Bist, 2002). In this work, a method called the "Polarity Vector Method" (PVM) is presented, which uses an eigenvector matrix as its metric. By substituting the polar matrix with an eigenvector matrix, this variant shows that the mathematical substitute is equally efficient. Another important aspect is that both methods, PIM and PVM, use a single physico-chemical property in their metric, i.e. polarity (Pauling, 1955).
The method was verified by comparing the results obtained with PIM and PVM on the main antimicrobial peptide groups from the APD2 Database, as accessed in December 2012 (Wang & Wang, 2009). This public database undergoes constant maintenance and includes notes on the publications that support the information; these were important factors in selecting the group of peptides used to train the two methods.

MATERIALS AND METHODS
The eight main groups of antimicrobial peptides from the APD2 Database (Dec 2012) (Wang & Wang, 2009) were tested in the automated versions of the mathematical-computational Polarity Index Method (PIM) and Polarity Vector Method (PVM). The first has already been extensively tested with different groups of peptides and proteins (Polanco & Samaniego, 2009; Polanco et al., 2012; 2013; 2013a; 2014; 2014a; 2014b; 2014c; 2014d); the second is introduced in this paper.
Afterward, the polar incidents obtained by reading the sequences from left to right, number by number, were registered in a polar incident matrix called the polar matrix, where (row, column) = (number equivalent A, number equivalent B). Once the incidents were recorded, the matrix was normalized (Table 1). Note that this matrix shows which of the 16 possible polar interactions are more or less frequent. The PIM then compared the polar interactions of the polar matrix of the training set with those of the sequence being studied, to determine whether that sequence was similar to the particular group.

Example
In order to evaluate if the peptide has the profile of the training set, let us take the following protein: (taken from the Example section, Polanco et al., 2014d) and follow these steps:
1. The numerical equivalent is obtained according to the rule mentioned above: 434432423344344433441424431422444313324244424 413144434431344344434344321343111443344333324 334214414311411243413412434333434443343443314 344332444344343323442331131134433334441123144 443334144234433323442442443341344344134331433 343413244234343311434343141.
2. This sequence (step 1) is read from left to right; the first element is (4,3) and the second element is (3,4) (note that this element appears when the reading position is moved one place to the right). The element (4,3) is recorded in the polar matrix in row 4, column 3; the second element (3,4) is recorded in the same matrix in row 3, column 4, and so on until all incidents are recorded; this matrix will be called A[i,j].
3. The same procedure is conducted for the training set representing the characteristic sought, gathering all incidences in a matrix; this matrix will be called P[i,j].
4. Both matrices A[i,j] and P[i,j] are weighted.
5. Matrix C[i,j] is created; C[i,j] = A[i,j] + P[i,j].
6. Now the C[i,j] matrix has as elements the normalized relative frequencies of the sequence studied, and the P[i,j] matrix has as elements the normalized relative frequencies from the training set. In order to compare them, two vectors are built for each one of them, sorting their elements in ascending order, and instead of using their relative frequencies, the position they have in each vector is used.
7. Step 6 is also applied to the P[i,j] matrix, and both vectors are compared. The greater the number of hits, the greater the similarity between the two sequences. In this step a percentage of similarity is determined; if the peptide or protein has an equal or greater percentage, the sequence is accepted.
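The steps above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the authors' implementation: the function names are hypothetical, the input is assumed to be already converted to the digits 1-4, "weighting" is assumed to mean division by the total number of incidents, and ties between equal frequencies are broken by cell position, which the text does not specify.

```python
def pair_counts(digits):
    """Count each (row, column) polar incident while reading left to right."""
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(digits, digits[1:]):
        counts[int(a) - 1][int(b) - 1] += 1
    return counts


def normalize(counts):
    """Turn raw counts into relative frequencies (assumed meaning of 'weighted')."""
    total = sum(sum(row) for row in counts) or 1
    return [[c / total for c in row] for row in counts]


def rank_positions(matrix):
    """Map each of the 16 cells to its position in the ascending ordering."""
    cells = [(r, c) for r in range(4) for c in range(4)]
    # Assumption: ties are broken by cell index to keep the ordering deterministic.
    ordered = sorted(cells, key=lambda rc: (matrix[rc[0]][rc[1]], rc))
    return {cell: pos for pos, cell in enumerate(ordered)}


def similarity_percent(sequence, training_set):
    """Percentage of cells whose rank agrees between C = A + P and P."""
    a = normalize(pair_counts(sequence))
    pooled = [[0] * 4 for _ in range(4)]
    for member in training_set:
        for r, row in enumerate(pair_counts(member)):
            for c, value in enumerate(row):
                pooled[r][c] += value
    p = normalize(pooled)
    c_mat = [[a[r][c] + p[r][c] for c in range(4)] for r in range(4)]
    ranks_c, ranks_p = rank_positions(c_mat), rank_positions(p)
    hits = sum(ranks_c[cell] == ranks_p[cell] for cell in ranks_c)
    return 100.0 * hits / 16
```

A sequence would then be accepted when `similarity_percent` reaches the chosen threshold; the threshold itself is a calibration choice that this sketch leaves open.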

Polarity Vector Method metrics
The metric of the supervised method called the Polarity Vector Method (PVM) has the comparison structure of PIM, but the polar matrix is replaced by the eigenvector matrix (Table 3). This matrix was built as follows. The four eigenvectors (Hait, 2002; Poole, 2011; Sahai & Bist, 2002) of the polar matrix of each group were calculated (Table 1). These four eigenvectors ($v_1$, $v_2$, $v_3$, $v_4$) were then integrated into a 4×4 matrix, where $v_1$ was the first column, $v_2$ the second, $v_3$ the third and $v_4$ the last column (Table 2). The 16 elements of this matrix were placed in ascending order by the following rule: given $z_1 = a + bi$ and $z_2 = c + di$, where $a$, $b$, $c$ and $d$ are real numbers and $i$ is the imaginary unit, $z_1 = z_2 \Leftrightarrow a = c$ and $b = d$; $z_1 \geq z_2 \Leftrightarrow a \geq c$ and $b \geq d$; $z_1 < z_2 \Leftrightarrow a < c$ and $b \leq d$.
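As a rough illustration of the construction just described, the sketch below assembles four given eigenvectors as the columns of a 4×4 matrix and places its 16 elements in ascending order. The function names are hypothetical. Because the stated rule does not cover every pair of complex numbers (for instance a < c with b > d), the sketch assumes a lexicographic ordering by real part and then imaginary part; this is one possible completion of the rule, not necessarily the one used by the authors.

```python
def eigenvector_matrix(v1, v2, v3, v4):
    """Build a 4x4 matrix (list of rows) whose columns are v1..v4."""
    columns = (v1, v2, v3, v4)
    return [[col[r] for col in columns] for r in range(4)]


def ascending_elements(matrix):
    """Flatten the matrix and sort its 16 complex elements.

    Assumption: order by real part first, then by imaginary part.
    """
    flat = [z for row in matrix for z in row]
    return sorted(flat, key=lambda z: (z.real, z.imag))
```

The eigenvectors themselves would come from a numerical routine (the paper reports using the Bluebit Software); computing them is outside the scope of this sketch.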

Phase Portrait
The eigenvalues of each of the eight antimicrobial peptide groups from the APD2 database (Dec 2012) (Wang & Wang, 2009) were calculated with the Bluebit Software (http://www.bluebit.gr/, accessed Nov 26, 2014) (Hait, 2002) and then plotted with GNU Octave (http://www.gnu.org/software/octave/doc/interpreter) (Eaton et al., 2009). The analysis of the phase portrait considered the spatial distribution of the eigenvalues for each group. An eigenvector matrix was calculated for each peptide, as well as its four complex eigenvalues. The real part of each complex eigenvalue was plotted on the X-axis and its imaginary part on the Y-axis (Figs. 1-8).
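The mapping from eigenvalues to points of the phase portrait can be sketched as follows. The helper names are hypothetical and the eigenvalues are assumed to be available as Python complex numbers; the sketch only shows the coordinate convention (real part on X, imaginary part on Y) and a simple quadrant tally, not the plotting itself.

```python
from collections import Counter


def phase_points(eigenvalues):
    """Return (x, y) pairs: real part on the X-axis, imaginary part on the Y-axis."""
    return [(z.real, z.imag) for z in eigenvalues]


def quadrant(z):
    """Name the quadrant of a complex number, or 'axis' if it lies on an axis."""
    x, y = z.real, z.imag
    if x == 0 or y == 0:
        return "axis"
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"


def quadrant_counts(eigenvalues):
    """Tally how many eigenvalues fall in each quadrant."""
    return Counter(quadrant(z) for z in eigenvalues)
```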

APD2 Database
The selected antimicrobial peptide groups from the APD2 Database (Dec 2012) (Wang & Wang, 2009) were checked to ensure that none of their representative sequences was included in another group. Avoiding duplication of peptides provided a more accurate fingerprint of the group studied, minimizing false positives and false negatives and raising the level of efficiency of the method. It should be noted that this filter reduced the number of peptides studied to almost 60%. The number of peptides analyzed was 1146, with the following distribution: 131 for Gram (-) bacteria, 260 for Gram (+) bacteria, 54 for cancer cells, 527 for fungi, 7 for insects, 93 for mammalian cells, 20 for parasites, and 54 for viruses. There are other groups of antimicrobial peptides in this database; however, their number and relevance for comparing the efficiency of the methods were not meaningful, so they were not included. There are also at least 15 other public databases, some general and others specialized; however, to assess the efficiency of the PVM it was decided to analyze the antimicrobial peptides because of their importance in the production of new pharmaceutical drugs.
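The group sizes quoted above can be checked with a short tally. The dictionary name is hypothetical; the counts are those given in the text.

```python
# Peptides per group after removing sequences shared between groups (from the text).
GROUP_SIZES = {
    "Gram (-) bacteria": 131,
    "Gram (+) bacteria": 260,
    "cancer cells": 54,
    "fungi": 527,
    "insects": 7,
    "mammalian cells": 93,
    "parasites": 20,
    "viruses": 54,
}

total = sum(GROUP_SIZES.values())
largest = max(GROUP_SIZES, key=GROUP_SIZES.get)
smallest = min(GROUP_SIZES, key=GROUP_SIZES.get)
print(total, largest, smallest)
```

The groups are strongly unbalanced (fungi against insects, for example), which is worth keeping in mind when reading per-group percentages.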

Test Trial
The Polarity Index Method (PIM) was calibrated with the polar matrix (Table 1) and the Polarity Vector Method (PVM) with the eigenvector matrix (Table 3) for each antimicrobial group in the APD2 Database (Dec 2012) (Wang & Wang, 2009). Afterward, the percentage of efficiency for both methods was determined by comparing the target group with the group studied and with the other groups extracted (Table 4). Additionally, the coincidence of each protein accepted or rejected by both methods was also evaluated (see Appendices A-H at www.actabp.pl); i.e. both methods were calibrated with one of the groups (Table 4) (training set) and were tested with the others (test sets), and this procedure was conducted for each group studied.
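The two quantities evaluated in this trial, the percentage of efficiency of a method and the coincidence between the two methods, can be written down as a minimal sketch. The function names are hypothetical, and each method is assumed to return one accept/reject decision (a boolean) per sequence.

```python
def efficiency_percent(accepted):
    """Percentage of sequences in a set that a method accepted."""
    return 100.0 * sum(accepted) / len(accepted)


def coincidence_percent(pim_decisions, pvm_decisions):
    """Percentage of sequences on which both methods made the same decision."""
    if len(pim_decisions) != len(pvm_decisions):
        raise ValueError("both methods must be evaluated on the same sequences")
    agree = sum(a == b for a, b in zip(pim_decisions, pvm_decisions))
    return 100.0 * agree / len(pim_decisions)
```

Applied to the training group, `efficiency_percent` gives the affinity for the target group; applied to each of the other groups, it indicates how often sequences from outside the target group are accepted.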

RESULTS
The PIM and PVM methods (Table 4) show a high affinity for the group studied (>70%) and are discriminative against the other groups. The analysis of each sequence, individually evaluated by both methods, shows a coincidence higher than 98% (Appendices A-H at www.actabp.pl); therefore, it can be stated that both methods are equivalent. The spatial distribution of the eigenvalues of each group studied is not discriminative (Figs 1-8), as in all groups the cumulus is located in quadrants I and II, except in the insects group, where it is located in quadrant I (Fig. 7).

DISCUSSION
Although in practical terms the Polarity Vector Method is as discriminative as the Polarity Index Method, it is important to note that it includes in its metrics the eigenvectors of the polarity matrix studied. This made it possible to study the span of the eigenspace, showing that although the eigenvectors (v) and eigenvalues (λ) of the system are related through the expression Av=λv, according to the results the eigenvalues are not effective discriminants. This can be attributed to the fact that each eigenvalue (λ: complex number) is associated with an eigenvector (v: column vector formed by four complex numbers); in this sense, the exhaustive character of the metric formed by the eigenvectors shows a regularity that is lost when the eigenvalues are compared. Furthermore, the metric of the Polarity Vector Method does not depend on previous calculations; therefore, its execution on high-performance computational architectures makes it possible to predict the affinity of a peptide or protein in a processing time t_p/n, where "n" is the number of computer processors.
PVM depends on a single physico-chemical property, polarity, which quantifies the electromagnetic balance of the protein. It originates from the electronegativity of the valence electrons in the constituent amino acids. Linus Pauling (Pauling, 1955) defined "electronegativity" as the affinity between the electrons in a covalent bond. This property has been verified as "necessary and sufficient" to efficiently identify the major association of a protein. Although other physico-chemical properties have been used together as a metric, e.g. hydrophobicity (Borgese & Fasana, 2011), isoelectric point (Kidman et al., 2004), and net charge (Shaw et al., 2001), among others, it was important to find a physico-chemical property capable of describing the activity of a protein by itself, which would give an impulse to basic science and allow the cleaning of the bioinformatics codes used for this purpose.
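The remark on processing time rests on each sequence being scored independently of the others, so the work can be split across n workers. The sketch below illustrates that structure only. The `score` function is a hypothetical stand-in for the real metric, and a thread pool is used purely to keep the example self-contained; a CPU-bound run in Python would more likely use separate processes or a cluster scheduler to approach t_p/n.

```python
from concurrent.futures import ThreadPoolExecutor


def score(sequence):
    """Hypothetical stand-in for the per-sequence metric: counts polarity changes."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


def score_all(sequences, workers=4):
    """Score every sequence independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, sequences))
```

Because no sequence depends on the result of another, the same pattern carries over unchanged to distributed schemes where each worker receives a slice of the database.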
This bioinformatics product can contribute to computational and structural proteomics research. The performance of the method is high, and using a single physico-chemical property could enable scholars to gain a deeper insight into polarity. This fundamental property of matter strengthens the field of bioinformatics, since its metric adapts smoothly to parallel and distributed processing schemes, making it possible to assess all peptides and proteins in the public databases.
Finally, it is worth mentioning the importance of encouraging the creation and use of public databases, as a significant part of basic research is founded on the availability of free and updated information that is carefully revised.This has been mainly the reason for using the APD2 database for many years.

CONCLUSIONS
The Polarity Vector Method is a robust and highly discriminative method that can be used as a "first filter" in the identification of antimicrobial peptides. Its programming scheme also allows its execution on high-performance computing platforms for the comprehensive analysis of peptide regions.

Figure 1. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix I at www.actabp.pl) of the Gram (+) bacteria group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009). The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 2. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix J at www.actabp.pl) of the Gram (-) bacteria group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 3. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix K at www.actabp.pl) of the virus group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 4. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix L at www.actabp.pl) of the parasite group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 5. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix M at www.actabp.pl) of the insect group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 6. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix N at www.actabp.pl) of the mammalian cells group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009). The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 7. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix O at www.actabp.pl) of the fungi group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.

Figure 8. Spatial distribution of the eigenvalues (see Phase Portrait section, Appendix P at www.actabp.pl) of the cancer cells group from the APD2 Database accessed in December 2012 (Wang & Wang, 2009).The X-axis corresponds to the real part of the eigenvalue, and the Y-axis to its imaginary part.