On-line at: www.actabp.pl
Detection of selective antibacterial peptides by the Polarity Profile method

Antimicrobial peptides occupy a prominent place in the production of pharmaceuticals because they contribute effectively to the protection of the immune system against almost all types of pathogens. These peptides are studied thoroughly by computational methods designed to shed light on their main functions. In this paper we propose a computational approach, named the Polarity Profile method, that improves on the former Polarity Index method. The Polarity Profile method is very effective in detecting the subgroup of antibacterial peptides called selective cationic amphipathic antibacterial peptides (SCAAP), which show high toxicity towards bacterial membranes and almost no toxicity towards mammalian cells. Our study was restricted to the peptides listed in the antimicrobial peptides database (APD2) as of December 19, 2012. The performance of the Polarity Profile method is demonstrated through a comparison with the former Polarity Index method on the same sets of peptides. The efficiency of the Polarity Profile method exceeds 85%, taking into account the false positive and/or false negative peptides.


INTRODUCTION
The re-emergence of multi-drug-resistant organisms (Barie PS, 2012) has focused part of the research in bioinformatics on the search for fast and reliable computational procedures (Blueggel et al., 2004) that allow for the detection or prediction of the primary pathogenic action of a peptide (most of the peptides found experimentally show non-specific pathogenic action). These procedures can check peptides of short or medium length in fractions of a second, thus avoiding the high cost of an experimental analysis. However, the mathematical abstraction of physicochemical properties is not an easy task, especially when it comes to discriminating between the primary properties of the peptides and those derived from them.
The mathematical algorithms used in bioinformatics for the analysis of biological structures are classified as "supervised learning" and "non-supervised learning" (Zhao et al., 2011). Supervised algorithms require "training data" that are characteristic of the population to be identified. Representatives of these algorithms are Quantitative Structure Activity Relationships (QSAR) (Gonzalez-Diaz H, 2012), Hidden Markov models (Polanco & Samaniego, 2009), Monte Carlo methods (Perez-Riverol et al., 2012), Support Vector Machines (Han et al., 2012), and Fourier Transforms (Silverman BD, 2005). Non-supervised algorithms, on the other hand, do not use "training data". One of these algorithms is clustering (Li et al., 2008). Both groups present varying degrees of difficulty in their computational implementation; the supervised algorithms, however, are the less difficult to implement.
In this manuscript we present a QSAR algorithm called the Polarity Profile method, which is an improvement or modification of another approach that we introduced previously (Polanco et al., 2012). Three relevant aspects have been improved.
(1) The Polarity Profile method is highly efficient in identifying the subgroup of antibacterial peptides named Selective Cationic Amphipathic Antibacterial Peptides (SCAAP), which are characterized by highly selective toxicity towards bacteria, by not adopting an alpha-helical structure in neutral aqueous solution, and by a therapeutic index higher than 75 (del Rio et al., 2001). The therapeutic index of a peptide is the ratio between the minimum inhibitory concentrations observed against mammalian and bacterial cells, i.e. the higher the value, the more specific the peptide is for bacterial-like membranes. Hence, SCAAP display strong lytic activity against bacteria but no toxicity against normal eukaryotic cells such as erythrocytes.
(2) The algebraic structure of the Polarity Profile method (a matrix with 4x4 interactions formed by the polar groups P+, P-, N and NP) offers extensive polarity information about the peptide characteristics. Most supervised algorithms measure the polarity property as a single number, while the Polarity Profile method considers 16 numbers that correspond to the 16 possible polar interactions. Hence, more information about the dynamics of the phenomenon is obtained.
(3) The similarity found when comparing the polarity matrices of the bacteria group and of a subset of it, the so-called SCAAP, shows that the physicochemical property called polarity constitutes an important measurement.
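The therapeutic index described in item (1) can be written compactly. The symbols below are notation introduced here for illustration only; they do not appear in the original text, and the threshold is only one of the SCAAP criteria listed in item (1):

```latex
\mathrm{TI} = \frac{\mathrm{MIC}_{\text{mammalian}}}{\mathrm{MIC}_{\text{bacterial}}},
\qquad \mathrm{TI} > 75 \ \text{for SCAAP (del Rio et al., 2001)}
```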
The Polarity Profile method uses the classification of the 20 proteinogenic amino acids, which are differentiated by their side chain R and divided into four categories according to their polarity (Australian, 2012) (Table 1).
The method was verified with all peptides listed in the antimicrobial peptide database (APD2) from December 19, 2012 (Wang et al., 2009) and showed a high discriminating capacity.

MATERIAL AND METHODS
The Polarity Profile method requires building different matrices in which the rows and columns represent the four polarity groups in the order [P+] basic, [P-] acidic, [N] neutral, and [NP] non-polar residues (Table 1). The elements (i,j) represent the 16 possible polar interactions. For example, the polarity matrix A[i,j] is defined as shown in Table 2, where the value 5 is located in row 2 and column 3, so that the element (2,3) of the A matrix is 5, i.e. A[2,3] = 5.
We built the polarity matrix by moving along the peptide sequence from left to right, one residue at a time up to the end, and counting the incidences of consecutive amino acid pairs. Each amino acid of a pair was replaced by its polarity group, the pair was associated with the i-th row and the j-th column, and 1 was added to the corresponding matrix element, i.e. A[i,j] = A[i,j] + 1, thus obtaining the incidence matrix A[i,j]. In other words, the amino acids of the studied peptides are converted to numbers, {P+, P-, N, NP} = {1, 2, 3, 4} (Table 1, columns #1 and #2); the sequence is then read from left to right, and each pair of consecutive numbers indicates the coordinates (i,j) in the matrix.
To illustrate this rule, consider the peptide sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR (Table 9, entry #35), where the amino acid Q is assigned the polarity index 3, the amino acid I the index 4, etc., until the complete sequence is translated to 34433443343334434343434413433331414133141. Once the sequence has been converted into its equivalent series, the polarity matrix O[i,j] is built. Each element (i,j) accumulates the number of occurrences obtained by reading the series from left to right, moving one digit at a time and taking each pair of adjacent digits as the coordinates (i,j). The polarity matrix O[i,j] corresponding to this series is given in Table 3.
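The encoding and pair-counting rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GROUPS`, `encode` and `incidence_matrix` are hypothetical, and the assignment of D and E to [P-] and of S and Y to [N] is assumed from the standard side-chain classification, because those residues do not occur in the example sequence and cannot be checked against the series given in the text.

```python
# Polarity groups indexed as in the text: P+ = 1, P- = 2, N = 3, NP = 4.
# Assumption: D/E as P- and S/Y as N follow the standard side-chain
# classification; the example sequence does not contain them.
GROUPS = {1: "HKR", 2: "DE", 3: "CGNQSTY", 4: "AFILMPVW"}
INDEX = {aa: idx for idx, members in GROUPS.items() for aa in members}


def encode(sequence):
    """Translate a peptide sequence into its series of polarity indices."""
    return [INDEX[aa] for aa in sequence]


def incidence_matrix(sequence):
    """Count consecutive residue pairs: A[i][j] += 1 for each pair (i, j)."""
    series = encode(sequence)
    matrix = [[0] * 4 for _ in range(4)]
    for i, j in zip(series, series[1:]):
        matrix[i - 1][j - 1] += 1  # the text is 1-based, Python lists 0-based
    return matrix


peptide = "QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR"
series = "".join(str(d) for d in encode(peptide))
O = incidence_matrix(peptide)
print(series)
for row in O:
    print(row)
```

A sequence of n residues contributes n - 1 pairs, so the entries of the matrix for the 41-residue example sum to 40. The row and column of the acidic group stay empty because the example contains neither D nor E.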

Polarity Profile Method Description
The method compares two matrices and determines a specific profile.
The steps are:
1. Building the P[i,j] matrix from the entire set of SCAAP sequences with a unique pathogenic action. Once the polarity matrix P[i,j] is complete, it is normalized to one. Under this rule, we considered 51 SCAAP extracted from the APD2. Applying the rule described in Section 2.1 gives the P[i,j] matrix shown in Table 4.
2. Building the O[i,j] matrix from the target sequence of the study. The O[i,j] matrix is not normalized. For instance, we took the sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR and calculated its O[i,j] matrix (Table 3).
3. Each element of the matrix P[i,j] is multiplied by the corresponding element of the matrix O[i,j], and the result is multiplied by the factor 0.30. The resulting matrix is denoted with the operator (⊗): P[i,j] ⊗ O[i,j] = 0.30 × P[i,j] × O[i,j].
4. The matrices P[i,j] and P[i,j] ⊗ O[i,j] are then restated as matrices that identify, line by line, the positions ordered by frequency. That is, these matrices now contain positions rather than frequencies. The new matrix P[i,j] ⊗ O[i,j] for the sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR is given in Table 5, and the matrix P[i,j] for the 51 SCAAP (Table 4) is given in Table 6.
5. The Polarity Profile method qualifies as SCAAP candidates those peptides for which the number of matches between the two matrices P[i,j] and P[i,j] ⊗ O[i,j] is greater than 60% (60% of 16 is 9.6, i.e. at least 10 elements). For example, the peptide with the sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR is accepted (Table 7).

Table 1. Non-polar [NP] group: A, F, I, L, M, P, V, W. Classification of the proteinogenic amino acids, differentiated by their side chain, into four categories according to their polarity (Australian, 2012). Index: numeric identity assigned to each polarity group.
Table 3. Polarity matrix O[i,j]: interactions between the polarity groups, differentiated by their side chain, for the sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR (Section 2.1).
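The five steps of the method description can be sketched as follows. This is a hedged reading rather than a definitive implementation, and all function names are hypothetical. Three points are assumptions because the text does not specify them: "normalized to one" is taken to mean that the entries of P sum to one, the positions 1 to 16 are numbered row by row, and within each row they are ordered by decreasing frequency with ties kept in column order. The reference matrix below holds toy values, not the data of Table 4.

```python
def normalize(matrix):
    """Scale a matrix so its entries sum to one (assumed meaning of
    'normalized to one')."""
    total = sum(map(sum, matrix))
    return [[value / total for value in row] for row in matrix]


def weighted_product(p, o, factor=0.30):
    """Step 3: element-wise product of P and O, scaled by the factor 0.30."""
    return [[factor * a * b for a, b in zip(p_row, o_row)]
            for p_row, o_row in zip(p, o)]


def position_matrix(matrix):
    """Step 4 (assumed reading): within each row, list the positions 1..16
    in order of decreasing frequency; ties keep column order."""
    result = []
    for r, row in enumerate(matrix):
        order = sorted(range(4), key=lambda c: -row[c])  # sorted() is stable
        result.append([4 * r + c + 1 for c in order])
    return result


def count_matches(p, o):
    """Step 5: number of cells in which both position matrices agree."""
    left = position_matrix(p)
    right = position_matrix(weighted_product(p, o))
    return sum(a == b for l_row, r_row in zip(left, right)
               for a, b in zip(l_row, r_row))


def is_scaap_candidate(p, o, threshold=10):
    """Accept when at least 10 of the 16 positions match (more than 60%)."""
    return count_matches(p, o) >= threshold


# Toy reference counts, NOT the values of Table 4.
P = normalize([[4, 1, 3, 2], [1, 2, 4, 3], [2, 1, 8, 9], [5, 1, 9, 6]])
# Incidence matrix of a hypothetical target peptide (left unnormalized).
O = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
matches = count_matches(P, O)
print(matches, is_scaap_candidate(P, O))
```

Because the factor 0.30 and the normalization are positive scalings, they do not change the ordering within a row in this sketch. When a row of O is entirely zero, as the acidic row is for the example peptide, the corresponding row of the product consists of ties, so the tie-breaking rule assumed here would influence the match count.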

APD2 Database Trial Data Preparation
We studied and classified all 2169 peptides in the APD2 database by their unique or multiple action against bacteria, viruses, fungi, parasites, insects, cancer cells, mammalian cells, and sperm, and as SCAAP.
The set of 51 SCAAP was taken directly from previously reported records (Polanco et al., 2012). The majority of these peptides fall in the bacteria group of the APD2. The bacteria group that is compared to the SCAAP group (Section 2.4) was also extracted from the APD2.
All groups extracted from the APD2 were analyzed (Table 9) and classified into two sets: unique action vs. multiple action. The peptide sets with a unique pathogenic action contain peptides with experimentally confirmed action against a single pathogen group, whereas multiple action peptides show pathogenic action against two or more pathogen groups.
As an example of this classification, consider the set of anti-fungi peptides with unique action. This set is composed of peptide sequences that do not appear in any other peptide subgroup, while the anti-fungi multiple action set contains peptides with an action against fungi and, additionally, a possible action against other pathogen groups.
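The unique vs. multiple action split described above amounts to counting the pathogen groups annotated for each peptide. A minimal sketch, in which the function name `split_by_action` and the peptide identifiers and annotations are hypothetical and are not APD2 records:

```python
def split_by_action(activities):
    """Split peptides into unique-action and multiple-action sets.

    `activities` maps a peptide identifier to the set of pathogen groups
    against which experimental action has been confirmed.
    """
    unique, multiple = {}, {}
    for peptide, groups in activities.items():
        if len(groups) == 1:
            unique[peptide] = groups
        elif len(groups) > 1:
            multiple[peptide] = groups
    return unique, multiple


# Hypothetical annotations, for illustration only.
example = {
    "pep1": {"fungi"},
    "pep2": {"fungi", "bacteria"},
    "pep3": {"bacteria"},
}
unique, multiple = split_by_action(example)
antifungal_unique = sorted(p for p, g in unique.items() if g == {"fungi"})
antifungal_multiple = sorted(p for p, g in multiple.items() if "fungi" in g)
print(antifungal_unique, antifungal_multiple)
```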
We validated the unique action peptide sets by matching peptides from the APD2 database with those identified by the Polarity Profile method. All information obtained was verified by comparing the identified SCAAP with all peptides from the APD2 database.

Number of hits: number of matches of the Polarity Profile method within two groups of peptides from the APD2 (Wang et al., 2009), with unique and multiple pathogenic action against bacteria, viruses, fungi, parasites, insects, cancer cells, mammalian cells, sperm, and SCAAP (Polanco & Samaniego, 2009; Polanco et al., 2012).
Unique action: peptides exerting pathogenic action against only one group (Section 2.3).
Multiple action: peptides exerting pathogenic action against two or more groups (Section 2.3).
(%): percentage given by number of hits / total number of peptides.

Catastrophic bifurcation points
Catastrophic bifurcation points are points where abrupt changes in the behavior of a function occur. Here they are associated with the positions in the matrix where the maximum and minimum observed frequencies are found; for convenience and clarity these positions are marked in red and blue, respectively.
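One possible reading of this definition is sketched below. It assumes, which the text does not state explicitly, that one maximum and one minimum are taken per row of the polarity matrix and that positions are numbered 1 to 16 row by row. The function name `bifurcation_points` and the toy frequencies are hypothetical; the values are not taken from Tables 10 or 11.

```python
def bifurcation_points(matrix):
    """Return (maxima, minima): for each row of a 4x4 polarity matrix, the
    position (1..16, numbered row by row) of its largest and smallest entry.
    Ties resolve to the first occurrence in the row."""
    maxima, minima = [], []
    for r, row in enumerate(matrix):
        maxima.append(4 * r + row.index(max(row)) + 1)
        minima.append(4 * r + row.index(min(row)) + 1)
    return maxima, minima


# Toy normalized frequencies chosen for illustration only.
toy = [
    [0.05, 0.01, 0.06, 0.09],
    [0.03, 0.00, 0.02, 0.01],
    [0.07, 0.02, 0.11, 0.15],
    [0.08, 0.01, 0.14, 0.15],
]
maxima, minima = bifurcation_points(toy)
print(maxima, minima)
```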

RESULTS
The Polarity Profile method showed an efficiency of more than 85% in detecting SCAAP (44/51) across the APD2 database sub-classifications (Table 8). It is equally efficient in not identifying other groups, such as fungi (18/77 = 23%) or bacteria (237/743 = 31%).
The method excluded 7 out of 51 peptides from SCAAP (Table 9, column #1, marked with the letter N), while the formerly reported algorithm (Polanco et al., 2012) excluded 6 out of 51 peptides from SCAAP (Table 9, column #2, marked with the letter N). The two excluded sets have no peptides in common.
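The percentages quoted above can be rechecked with a few lines of arithmetic. The sketch assumes that the SCAAP hit count follows from the 7 peptides excluded out of 51; the helper name `percent` is hypothetical.

```python
def percent(hits, total):
    """Percentage of hits over the total number of peptides in a group."""
    return 100.0 * hits / total


scaap_total = 51
scaap_hits = scaap_total - 7  # 7 of the 51 SCAAP were excluded
rates = {
    "SCAAP": percent(scaap_hits, scaap_total),
    "fungi": percent(18, 77),
    "bacteria": percent(237, 743),
}
for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
```

With these inputs the SCAAP rate is about 86.3%, the fungi rate about 23.4% and the bacteria rate about 31.9%.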
The maximum value points (catastrophic bifurcation points) of the SCAAP group are located in positions 4, 5, 12, and 16 of the polarity matrix (Table 10, first column, in red), and the corresponding points of the bacteria group in positions 4, 8, 12, and 16 (Table 11, first column, in red). Three of the four positions match.
The minimum value points (catastrophic bifurcation points) of the SCAAP group are located in positions 2, 6, 10, and 14 of the polarity matrix (Table 10, first column, in blue), and the corresponding points of the bacteria group in positions 2, 6, 10, and 14 (Table 11, first column, in blue). The two groups coincide completely.
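The overlap between the two groups can be tallied directly from the positions reported above. The conversion from a position to a (row, column) pair assumes that positions are numbered row by row, which the text implies but does not state; the helper name `row_col` is hypothetical.

```python
scaap_max, bacteria_max = {4, 5, 12, 16}, {4, 8, 12, 16}
scaap_min, bacteria_min = {2, 6, 10, 14}, {2, 6, 10, 14}

shared_max = scaap_max & bacteria_max
shared_min = scaap_min & bacteria_min
total_shared = len(shared_max) + len(shared_min)
total_points = len(scaap_max) + len(scaap_min)
print(sorted(shared_max), sorted(shared_min), f"{total_shared} of {total_points}")


def row_col(position):
    """Convert a position 1..16 (numbered row by row) to 1-based (row, column)."""
    return (position - 1) // 4 + 1, (position - 1) % 4 + 1


print([row_col(p) for p in sorted(scaap_min)])
```

Under this numbering the four minima all fall in column 2, the acidic [P-] group, and the three shared maxima fall in column 4, the non-polar [NP] group.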

DISCUSSION
For decades peptides were classified according to their toxic action. However, these features were related to the space where the peptides interact with the structural membrane of the target. As experiments have shown, most peptides exert, to some degree, action against multiple pathogens. One could speculate that nature tends to avoid large differences in the linear sequence of a peptide with regard to its toxicity against a particular group of pathogens. In other words, if a peptide requires only small changes in its sequence to act against another pathogen group, the amount of energy required to do so will be much smaller than in the case of large changes. These considerations led us to believe that the detection and prediction of selective antibacterial peptides is related to the general features of the peptides. Hence, the most effective algorithms are those that evaluate fundamental characteristics of all peptides and search only for small differences.
Bioinformatic algorithms for peptide detection are basically of two types. The first depends on a system of nonlinear differential equations (Janes & Lauffenburger, 2006) that characterizes the peptide properties with exponentially growing complexity. The other type allows for the inclusion of multiple peptide characteristics without affecting its complexity. Here, the efficiency depends greatly on a good selection of the peptide training set. The Polarity Profile method falls into the latter type and effectively excludes multiple action peptides with a margin of error lower than 30%. Its efficiency in identifying SCAAP is higher than 85%. It measures only the polarity of the peptides, and this information allows for efficient classification of the pathogenic action.
The catastrophic bifurcation points of the SCAAP and bacteria groups coincide almost completely, in 7 of 8 points (Tables 10 and 11). In both sets these points are clearly associated with the last elements of the rows of the polarity matrix. The almost total coincidence of the catastrophic bifurcation point locations between the SCAAP and the bacteria group may indicate that there is an algebraic structure associated with the polarity matrix. This would reinforce our assumption that the catastrophic bifurcation points stand for regions in which the protein functionality is defined. Depending on its validation for other peptide sets exerting different pathogenic actions, dynamic bifurcation analysis could become an important mathematical contribution to the field of proteomics.
The Polarity Index and the Polarity Profile methods are almost equally efficient. However, the Polarity Profile method is more robust because it presents a more complete polarity profile, represented by 16 elements. In order to improve this method, we are working on the subclassification of Gram+/- ONLY, Gram+ ONLY, and Gram- ONLY bacteria in the APD2 database, as well as on the virus group exerting action against HIV, and on fungi with action against protists and mammalian cells (e.g. hemolytic or cytotoxic effects on those with chemotaxis property). In addition, we are determining the toxicity of all antimicrobial peptides of this database to obtain a new classification by toxicity. We believe this subclassification by toxicity will be a useful contribution for other researchers designing prediction algorithms.
Finally, we consider the Polarity Profile method a simple mathematical and computational algorithm that does not demand heavy computational resources such as memory or processing speed. Therefore, it can be used to explore peptide regions. These peptide regions can be worked out by massively evaluating all possible peptide combinations of the same length (Polanco & Samaniego, 2009; Polanco et al., 2012).

CONCLUSIONS
The computational mathematical method called the Polarity Profile method is a robust and fast method that can be used as a first filter in the detection of selective antibacterial peptides (SCAAP). It can also be used on high-performance computing platforms to search for patterns in peptide regions.

Table 1. Classification of amino acids.
Basic [P+]: Polar amino acids with positive charges have more amino groups than carboxyl groups, making them basic. The amino acids that have positive charges on the R group are placed in this category.
Acidic [P-]: Polar amino acids with negative charges have more carboxyl groups than amino groups, making them acidic. The amino acids that have negative charges on their R group are placed in this category. They are called dicarboxylic mono-amino acids.
Neutral [N]: These amino acids have equal numbers of amino and carboxyl groups and are neutral.
Non-polar [NP]: These amino acids are hydrophobic and have no charges on the R group.

Table 2. Polarity matrix A[i,j].
The polarity matrix A[i,j] uses the classification of the 20 amino acids, differentiated by their side chains, into four polarity groups: [P+] basic, [P-] acidic, [N] neutral, and [NP] non-polar residues (Table 1). Each row and column pair (i,j) represents one of the 16 possible interactions between the groups.

Table 5. Positions-matrix P[i,j] ⊗ O[i,j]
Array of positions in matrix P[i,j] ⊗ O[i,j] corresponding to the sequence QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR.

Table 6. Positions-matrix P[i,j]
Array of positions in matrix P[i,j] corresponding to 51 identified SCAAP.

Table 7. Test of Polarity Profile Method.
Test of the peptide QIINNPITCMTNGAICWGPCPTAFRQIGNCGHFKVRCCKIR by the Polarity Profile method (Table 9, entry #35). (✓): The polar interaction is present in the position. (x): The polar interaction is not present in the position.

Table 9. Polarity profile matches by linear sequence.

Table 10. Catastrophic bifurcation points of the SCAAP group
Set of the SCAAP group. The positions in red represent maximum values and the positions in blue represent minimum values (Section 2.4).

Table 11. Catastrophic bifurcation points of the bacteria group
Set of the bacteria group. The positions in red represent maximum values and the positions in blue represent minimum values (Section 2.4).