
This paper presents a mathematical-computational toy model based on the assumed dynamic principles of prebiotic peptide evolution. Starting from a pool of amino acid monomers, the model describes in a generalized manner the generation of peptides and their sequential information. The model integrates the intrinsic and dynamic key elements of the initiation of biopolymerization, such as the relative amino acid abundances and polarities, as well as oligomer reversibility, i.e. fragmentation and recombination, and peptide self-replication. Our modeling results suggest that the relative amino acid abundances, as indicated by Miller-Urey type electric discharge experiments, played a principal role in the early sequential information of peptide profiles. Moreover, the computed profiles display an astonishing similarity to peptide profiles observed in so-called biological common ancestors found in the following three microorganisms: E. coli, M. jannaschii, and S. cerevisiae. The prebiotic peptide fingerprint was obtained by the so-called polarity index method, which was reported earlier as a tool for the identification of cationic amphipathic antibacterial short peptides.


INTRODUCTION
This paper addresses the question of how primitive information transfer may have developed during the process of amino acid polymerization in the prebiotic world. Taking a minimalist view of such a scenario, one may sketch a starting point consisting of a mixture of relevant amino acid monomers, as well as an environment enabling chemical polymerization reactions between them, possibly supported by catalytic matter. Various evolution scenarios based on these elements have been proposed (Miller, 1953; Fox et al., 1970), among them the suggestion of a sufficiently efficient co-evolution between nucleotides and amino acids to form the "first" peptides (Lambert, 2008).
If we consider the "first peptides" not only as an initiation of biopolymerization on the journey to the emergence of life, but also as potential information carriers, the question arises whether the sequential information found in biotic ancestors of our present life could be traced back, at least partially, to prebiotic evolution scenarios as described above.
Hence, we will focus on the dynamics of sequential information generation starting at zero in a world composed of amino acid monomers that can be assumed prototypically to have evolved under conditions suggested by the pioneering Miller experiment (Miller, 1953). We present a mathematical model that describes the evolution of peptide sequences from such a pool of amino acids, taking into account a number of intrinsic and dynamic evolution rules.
The intrinsic imperatives concern, on the one hand, the composition of various amino acids in the monomer pool, which we assume to be related to their relative abundances according to Miller-type experiments and, on the other hand, the nature of the amino acid-amino acid interaction during polymerization, which we consider to be governed by their polarities, i.e. by differences in the reaction probability between interactions of, for instance, equally charged vs. oppositely charged amino acids, polar/polar vs. polar/non-polar encounters, and so forth.
The dynamic rules of the model consist of the successive growth of the peptide chain with a certain clock frequency. We allow already formed peptides to break apart into smaller fragments, as well as to recombine or combine with other fragments or with monomers of the pool, which could be associated with the interplay between hydrolysis and condensation reactions in a variable prebiotic environment (Lahav & White, 1980).
This kind of reversibility results not only in a growing diversity of the peptide population, but also in a higher dynamic flexibility, because the fragments carry in their memory the relative abundances of their building blocks and have to obey the polarity rules when reacting with other fragments or monomers. The continuous breaking and merging of the peptide strands could lead to a higher selectivity with respect to the more probable amino acid sequences in the polymers, introducing the evolutionary terms of so-called dynamic combinatorial libraries (Cousins et al., 2000) into the model.
Additionally, we included in the model the basic principles of template-directed peptide self-replication (Issac et al., 2001; Paul & Joyce, 2004). This element of copying peptide fragments from a mother peptide, which are subsequently subjected as daughter peptides to the evolutionary modeling process, leads to an acceleration in peptide formation and can be regarded, from a more general but also debated viewpoint, as a possible support for a peptide-based molecular evolution on the early Earth. From the dynamic perspective, peptide self-replication stands for a positive feedback that is assumed to lead, particularly in combination with the notion of reversibility, to non-intuitive behavior (Dadon et al., 2008; Oprea et al., 2007), such as perhaps dynamic error correction (Hopfield, 1974) in the course of the sequential information transfer or, in general, to complex dynamic phenomena like bifurcations or catastrophic events during the peptide evolution.
Models that aim to shed light on prebiotic scenarios usually share the difficulty of being corroborated by experimental data. Our aim was to evaluate whether the prebiotic world could have generated sequential peptide characteristics that were transferred to the first living systems. In order to compare the modeling results (prebiotic world) with available sequential records of so-called common ancestors (biological world), we made reference to our former studies on selective antibacterial short peptides, where we developed the polarity index method (Polanco et al., 2012). This method delivers a specific fingerprint of the peptides in the form of a polarity profile, which proved to be an efficient classification tool. Defining this common point of classification, we will show that the polarity profile of our simulated peptide sequences and that of the indicated biological records display an astonishing similarity.
As outlined in Fig. 1, the formation of peptides starts with a weighted random generation of the amino acid monomers in accordance with amino acid abundances, orientations, lateral chains, and polarities, to subsequently form the peptide chain.
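The weighted random generation of monomers can be illustrated with a minimal sketch. It assumes that "weighted" means a draw with probability proportional to abundance, and it uses only the three abundances quoted in the Results (alanine, glycine, and the amino acid labeled 9); the function name `draw_monomer` and the fixed seed are illustrative choices, not part of the original program.

```python
import random

# Abundances (in μmol) quoted in the Results for the three most abundant
# amino acids; the full list belongs to Table 3 and is not reproduced here.
ABUNDANCE = {"A": 790, "G": 440, "9": 270}


def draw_monomer(rng, abundance=ABUNDANCE):
    """Draw one amino acid symbol with probability proportional to abundance."""
    symbols = list(abundance)
    weights = [abundance[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]


rng = random.Random(1)  # fixed seed only to make the sketch repeatable
sample = [draw_monomer(rng) for _ in range(1000)]
```

With these weights, alanine is expected to appear roughly three times as often as the amino acid labeled 9 in a large sample.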
Then, the model comprises the growth of the peptide chain, its breaking, recombination, and self-replication. Breaking refers to the splitting of the peptide bond and the binding of the remaining segments to other amino acid oligomers that are simultaneously built. Self-replication multiplies the evolutionary process through the inheritance of segments from the same peptide into processes that generate offspring. In this way, the model simulates peptide building by taking into account specific features of amino acid polymerization as well as the interaction with a hypothetical aqueous-lipid medium.

Table 1 (caption, continued). Glycine = 0.26%; alanine = 0.71%; total yield of amino acids in the table = 1.90%. Polarity: type of polarity: polar amino acids with positive charge (P+), polar amino acids with negative charge (P-), neutral (N), and non-polar (NP). Amino acids in blue are considered in this work as alpha-amino acids.

Amino acid polymerization
The first amino acid monomer that initiates the peptide chain is chosen randomly from the 21 amino acids presented in Table 1. New amino acids are successively added on one side of the oligomer. Since all amino acids exhibit an electrical charge configuration, the model considers the amino acid polarities as selection criteria for an amino acid monomer to bind or not to bind to the peptide chain. In particular, for the acceptance or rejection of an amino acid, the model takes into account its polarity group according to a classification given in Table 2, which is not unique (Australian Naturopathic Network, 2012). By means of this classification, we take advantage of previous works related to the prebiotic molecular outline of peptides through Markov processes (Polanco & Samaniego, 2009). In these, the polarity classification is set in four groups: [P-] polar, [N] neutral, [NP] non-polar, and [P+] basic hydrophilic. Thus, the stochastic polarity matrix P[i,j] is determined by the indexes [i,j] corresponding to a specific row/column {P+, P-, N, NP}. The corresponding interaction values are stated in Table 2. The polarity matrix was outlined in a previous work (Mosqueira et al., 2012) and anticipated for the possible polymerization of amino acids under anhydrous conditions in a prebiotic Earth environment.
The model simulates the polarity interaction among amino acids by counting the number of times this interaction occurs over the whole simulation process. The binding of an amino acid is accepted when the interaction between the candidate amino acid from the monomer pool and the terminal amino acid of the peptide chain of a certain polarity group reaches the figure given in Table 2. For instance, if the interaction between two amino acids is [D]-[S], the corresponding polarity value is located in row 2, column 3 of Table 2, [i,j] = (2,3). Therefore, the polarity interaction value [P-][N] is 0.15, which is equivalent to 85. The model recreates this interaction by adding one unit to the interaction counter. Once the indicated value is reached, the amino acid binding [D]-[S] will be accepted.
The model takes into account the absolute amino acid abundances as a bias for the binding interaction between the candidate and terminal amino acid. For this purpose, the model again uses a counter that adds up the number of potential interactions (Table 3). As in the previous case, the binding is rejected as long as this number has not been reached. For example, if the terminal amino acid of the peptide chain is P and the candidate amino acid to be bound is T, the interaction is only accepted when the abundance counter reaches the value 987, which is the number corresponding to the amino acid T.
The general structure of an amino acid is given by the presence of a central carbon attached to the carboxylic group, amino group, hydrogen, and the side chain. Those amino acids that are differentiated by their amino groups (-NH2) located one, two, or three atoms from the carboxylic group (-COOH) are called alpha-, beta-, and gamma-amino acids, respectively (Herrera, 1993). The model establishes the absolute abundance of the beta- and gamma-amino acids 0, 4, 5, 6, and 9 at 15% (see Table 1), thereby encouraging these elements to be part of the simulated peptides.
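The counter-based acceptance described above can be sketched as follows. This is one possible reading, not the original Fortran code: the class name `CounterGate` is hypothetical, and resetting the counter after an accepted binding is an assumption that the text does not state. The threshold 987 is the value quoted for the amino acid T.

```python
from collections import defaultdict


class CounterGate:
    """Accept an event only once its counter reaches a required value."""

    def __init__(self):
        self.counts = defaultdict(int)

    def attempt(self, key, required):
        # Every potential interaction adds one unit to the counter for this key.
        self.counts[key] += 1
        if self.counts[key] >= required:
            # Assumption: the counter restarts after an accepted binding.
            self.counts[key] = 0
            return True
        return False


abundance_gate = CounterGate()
# 987 is the value quoted in the text for the candidate amino acid T.
outcomes = [abundance_gate.attempt("T", 987) for _ in range(987)]
first_accept = outcomes.index(True) + 1  # 1-based attempt number
```

Under this reading, the first 986 encounters with T are rejected and the 987th is accepted. The same gate could serve the polarity counter, with the required value taken from Table 2.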

Peptide splitting and merging
The processes of splitting and merging of the peptides, possibly due to the presence of a variable aqueous-lipid environment, constitute the dynamic aspects of the model. Splitting cuts the peptide in construction at a random position and adds the segment to a so-called cutting record. The splitting probability of each peptide is defined by the function C(L), which decreases inversely with the length of the peptide, where e = 2.7183 and L is the peptide length at any particular moment.
After the random splitting, the model keeps one part of the peptide and transfers the other part to a cutting record, in which all these segments accumulate. All peptides can split again, in which case the corresponding part is added to the same cutting record (Fig. 2a). For instance, if the peptide GVVLAAASE20T was constructed and then splits at position 9, the segment GVVLAAASE will be transferred to the cutting record and the peptide in construction becomes 20T, which is now 3 amino acids long.
The second aspect of reversibility is peptide recombination. It increases the size of the peptide in construction by adding segments from the cutting record. A peptide that was split can access the cutting record at any time and can add one of these segments. However, this proceeds only as long as the amino acid at the right side of the segment in the cutting record and the amino acid at the left side of the peptide in construction match the amino acid polymerization criteria described above. If these criteria are met, the segment is withdrawn from the cutting record and added to the left-side ending of the peptide in construction (Fig. 2b).
For example, if the model has constructed the peptide LTKSAGVA7, it searches for a segment in the cutting record among, for instance, GTK, 0AGAASV, and 4CS56G. Each of these segments will be read and analyzed forward and backwards consecutively. Then, the model verifies the amino acid interaction of each segment in the cutting record with the peptide in construction: GTK-LTKSAGVA7, 0AGAASV-LTKSAGVA7, and 4CS56G-LTKSAGVA7. If the first segment is approved according to the selection criteria, it will be added to the left side of the peptide in construction. If the first segment does not satisfy the criteria mentioned above, the next segment will be chosen and analyzed accordingly. If all segments are rejected or the cutting record has no elements, the polymerization procedure restarts by adding one amino acid monomer at a time.
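The splitting example can be reproduced with a short sketch. Each character is treated as one residue, with letters for protein amino acids and digits for non-protein amino acids, as in the text. The function name `split_peptide` is illustrative, and the cut position is passed in explicitly rather than drawn at random.

```python
def split_peptide(peptide, position):
    """Cut `peptide` after `position` residues (1-based count).

    Returns (segment_for_cutting_record, remaining_peptide).
    """
    if not 0 < position < len(peptide):
        raise ValueError("cut position must fall inside the peptide")
    return peptide[:position], peptide[position:]


cutting_record = []
segment, peptide = split_peptide("GVVLAAASE20T", 9)
cutting_record.append(segment)  # the record accumulates all cut segments
```

Here `segment` is `GVVLAAASE` and the peptide in construction becomes `20T`, matching the example in the text.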
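The recombination step can be sketched as a search through the cutting record. The callable `accepts` is a hypothetical stand-in for the polarity and abundance criteria. Trying each segment in both orientations is an assumption about what "forward and backwards" means. The toy criterion used here (accept only a K-L junction) is for illustration and is not taken from Table 2.

```python
def recombine(peptide, cutting_record, accepts):
    """Try to prepend one segment from the cutting record to `peptide`.

    `accepts(left, right)` receives the residue on each side of the junction.
    An accepted segment is removed from the record in place.
    """
    for index, segment in enumerate(cutting_record):
        # Assumption: both orientations of a segment are tried in turn.
        for candidate in (segment, segment[::-1]):
            if accepts(candidate[-1], peptide[0]):
                del cutting_record[index]
                return candidate + peptide
    # Nothing fits: the caller resumes monomer-by-monomer growth.
    return peptide


record = ["GTK", "0AGAASV", "4CS56G"]
# Toy criterion for illustration only: accept a K-L junction.
grown = recombine("LTKSAGVA7", record, lambda left, right: (left, right) == ("K", "L"))
```

With this toy criterion the first segment fits, so `grown` is `GTKLTKSAGVA7` and `GTK` is withdrawn from the record.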

Peptide self-replication
Self-replication represents the evolutionary aspect of the modeling and places a very high demand on computational resources. As illustrated in Fig. 3, replicates of oligomer segments are created and are then subjected to the amino acid polymerization mechanism. The replicates are generated by randomly selecting a segment from the "mother" peptide in construction. Both the "mother" segment and its replicate share the cutting record. For instance, if the peptide A432PNDCEG34GSLD6LKEEPS77YV is constructed, a segment copy is generated, e.g. G34GSLD6LKE, and transferred as a seed to a parallel program. In this parallel program, the peptide building mechanism then starts with the seed G34GSLD6LKE, and not with the amino acid monomers.
The introduced segment replication mimics the self-replication of peptides. As a consequence, the sequential information of a "mother" peptide that has evolved under certain building criteria, such as amino acid abundances and polarity interactions, is recycled and provides a memory of these criteria for the newly formed oligomers. Such a mechanism could lead to an evolutionary advantage of certain amino acid sequences through amplification.
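The seed selection can be sketched as copying a random contiguous segment of the mother peptide. The text does not specify how the segment boundaries are drawn, so drawing the start and end positions uniformly is an assumption, and `replicate_segment` is an illustrative name.

```python
import random


def replicate_segment(mother, rng):
    """Copy a random contiguous segment of `mother` to use as a seed."""
    # Assumption: start and end positions are drawn uniformly.
    start = rng.randrange(len(mother))
    end = rng.randrange(start + 1, len(mother) + 1)
    return mother[start:end]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
mother = "A432PNDCEG34GSLD6LKEEPS77YV"
seed = replicate_segment(mother, rng)
```

The mother peptide is left unchanged. The seed would then enter the same polymerization, splitting, and recombination steps as any other peptide in construction.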

Prebiotic code composition
Any comparison between the sequential information generated by the simulation and that of experimental data, such as the sequential records of the so-called common ancestors, faces a non-equivalence among the amino acids. Today's peptides no longer exhibit the prebiotic code composition. It is so far unknown how the 21 constitutive amino acids from the late prebiotic world evolved into the 20 amino acids related to the genetic code of the present times. Taking into account this non-equivalence, a conversion (Table 4) was used to match the corresponding polarity groups of the prebiotic and biotic amino acids.
To illustrate the use of this conversion, consider the peptide sequence G34GSLD6LKE, where the amino acid G is assigned to the polarity equivalence 3, the amino acid 3 to the equivalence 4, etc., until the complete sequence is translated to 34433424412. Once the peptide sequence has been converted into its equivalent series, the polarity matrix P[i,j] is built. Each element [i,j] accumulates the number of occurrences obtained by reading the series 34433424412 from left to right and taking as element [i,j] the pair of numbers found by moving one digit at a time through the series. As an example, the polarity matrix P[i,j] corresponding to the series 34433424412 is expressed in Table 5.
When the polarity matrix P[i,j] is completed, it is normalized to one and stated in its linear form to express the relative frequencies of the 16 elements in 16 columns, giving rise to a normalized relative frequency histogram.
Since there are no available data on prebiotic amino acid sequences, we made reference to the best-preserved genes from the following three microorganisms: E. coli, M. jannaschii, and S. cerevisiae, considered as the so-called common ancestors of approximately 2.5 billion years ago (Delaye et al., 2005). This set was used to compare the peptide tendency generated by our simulations.
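The construction of the polarity matrix and its linear, normalized form can be sketched for the series from the worked example. The partial mapping `EXAMPLE_GROUPS` is read off that example only (the full conversion is Table 4). Row-by-row flattening is an assumption, chosen because it places [P-]-[NP] at position 8 and [N]-[P+] at position 9, as used in the Results.

```python
def polarity_matrix(series):
    """Count adjacent pairs (i, j) in a polarity series such as '34433424412'."""
    matrix = [[0] * 4 for _ in range(4)]
    for left, right in zip(series, series[1:]):
        matrix[int(left) - 1][int(right) - 1] += 1
    return matrix


def relative_frequencies(matrix):
    """Flatten row by row into 16 values that sum to one."""
    flat = [count for row in matrix for count in row]
    total = sum(flat)
    return [count / total for count in flat]


# Partial mapping read off the worked example only; Table 4 is the full source.
EXAMPLE_GROUPS = {"G": 3, "3": 4, "4": 4, "S": 3, "L": 4,
                  "D": 2, "6": 4, "K": 1, "E": 2}

series = "".join(str(EXAMPLE_GROUPS[r]) for r in "G34GSLD6LKE")
matrix = polarity_matrix(series)
profile = relative_frequencies(matrix)
```

The eleven-digit series yields ten adjacent pairs, so each count is divided by ten. For instance, the pair 4-4 occurs twice and gives a relative frequency of 0.2 at position 16.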

RESULTS AND DISCUSSION
Simulations of the first six peptide generations showed that the obtained polarity index corresponded to {4, 9, 12, 16} - {1, 6, 10, 14}. However, as shown in Fig. 4, the maximum for the polarity interaction [N]-[P+] in position 9 of the polarity matrix decreases successively, while the maximum of the interaction [P-]-[NP] in position 8 grows correspondingly. This tendency was confirmed by additional simulations reaching 15 peptide generations. Hence, it can be assumed that the maximum in the polarity profiles of higher-order generations switches from interaction 9 to interaction 8 when the number of generations n tends to infinity. Accordingly, for a higher-order generation that could reflect a prebiotic scenario, the polarity index method should result in {4, 8, 12, 16} - {1, 6, 10, 14}.
As a demonstration of the main features of our simulations, Fig. 5 shows the relative frequency histogram of prebiotic peptides obtained by simulation in comparison to that of the best-preserved genes from E. coli, M. jannaschii, and S. cerevisiae reported by Delaye et al. (2005). The data set corresponds to 20 completely sequenced cellular genomes from eubacteria, archaebacteria, and eukaryotic nucleocytoplasm, which are now referred to by the new taxonomic categories Bacteria, Archaea, and Eucarya. The data were obtained by twice one-way BLAST searches in order to define the set of the most conserved protein-encoding sequences to characterize the gene complement of the last common ancestor of extant life. Our simulation results show an astonishingly good agreement with these reported data.
The polarity index identified for common ancestors is {4, 8, 12, 16} - {1, 6, 10, 14}. The difference between these data and the simulation results for the occurrence frequency did not exceed 3% in 14 out of the 16 polarity interactions. The only exceptions were the interactions 8 and 9, which can be understood by the already described successive maximum shift between the polarity matrix positions 8 and 9. We also observed that the positions 4, 8, and 12 of the polarity matrix correspond to sudden changes in the tendencies of the polarity profile. This means that if the tendency was ascending before any of these positions, it became descending after them, and vice versa. Such an effect did not occur in any other position of the polarity matrix, where the before-after tendencies remain the same. Those points located in the polarity matrix can be characterized as catastrophic bifurcation points.
To evaluate the robustness of the simulations, a number of variations in the initial conditions and parameter values were conducted. In particular, we tested variations in the amino acid abundances. These refer to the initial conditions of the simulations, taking into account a possible abundance divergence in the prebiotic world. Changing the preference for the amino acid side chain by setting the absolute abundances of these amino acids to different values between 3% and 25% altered the polarity profile by less than 3% in the 16 interactions. Moving glycine (G), which is placed second in the absolute amino acid abundances with 440 μmol, from the neutral (N) to the non-polar (NP) polarity group caused a similarly small change. In contrast, significant changes in the polarity profile of up to 30% were observed when the assumed abundances for alanine (A, 790 μmol), glycine (G, 440 μmol), and α-amino-n-butyric acid (9, 270 μmol) were varied by up to 20%. Fig. 6 shows the performance of alanine, which is similar to that of glycine and α-amino-n-butyric acid. Equivalent procedures for the amino acids with low abundances resulted in insignificant changes of the polarity profile.
The importance of the amino acid abundances in the peptide composition may imply that the prebiotic composition constituted the main element in the generation of sequential information. It also reinforces the idea that biochemical processes produce changes in the polarity interactions that induce the specialization of peptides. Consequently, the vast majority of peptides act as templates, which could explain why their toxicity is generalized and not specialized when analyzing peptides from antimicrobial databases (Wang et al., 2009).
The polarity profile predicted by simulation and that of the best-preserved genes are almost identical. Based on this correspondence, one may speculate that the "first peptides", i.e. those that were generated during the course of the chemical evolution about 1.5 billion years before the best-preserved genes evolved, already had a defined structure. In consequence, this thought could support the so-called panspermia scenario for the origins of life on Earth, in which already more complex peptides arrived from outside.
The dynamical features of the peptide generation are reflected in the polarity profiles by the presence of catastrophic bifurcation points (Arnold, 1974; Thom, 1975). We approximated the polarity profiles by seventh-degree continuous function graphs and identified the turning points where the graphs change their ascending or descending tendency as well as their concavity. These changes are distinctive dynamic features and allow for the recognition of the catastrophic bifurcation points. Such dynamic bifurcation analysis, applied to the polarity matrix of the simulated peptides and of the best-preserved genes, identified the catastrophic bifurcation points {4, 8, 12} in both sets (Fig. 7), which are clearly associated with the last elements of the first three rows of the polarity matrix. This coincidence in the location of the bifurcation points may indicate the existence of an algebraic structure associated with the polarity matrix and could support the assumption that the catastrophic bifurcation points stand for regions in which the definition of the peptide functionality takes place. Depending on its validation for other peptide sets, for instance those with pathogenic actions, the dynamic bifurcation analysis could become an unprecedented mathematical contribution to the field of proteomics.
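The two checks described in this paragraph can be sketched on 16-element profiles: counting the positions that agree within a tolerance, and locating the positions where the tendency flips. Two assumptions apply. The 3% is read as an absolute difference in relative frequency. The flip test uses simple differences between neighboring positions, which is a simplification and not the fitted-curve analysis of the paper. The numbers below are synthetic and are not results from the simulations.

```python
def positions_within(profile_a, profile_b, tolerance=0.03):
    """1-based positions where two profiles differ by at most `tolerance`."""
    return [k for k, (a, b) in enumerate(zip(profile_a, profile_b), start=1)
            if abs(a - b) <= tolerance]


def turning_points(profile):
    """1-based interior positions where the trend flips between rising and falling."""
    points = []
    for k in range(1, len(profile) - 1):
        before = profile[k] - profile[k - 1]
        after = profile[k + 1] - profile[k]
        if before * after < 0:
            points.append(k + 1)
    return points


# Synthetic values for illustration only; they are not results from the paper.
toy = [1, 2, 3, 4, 3, 2, 1, 3]
toy_turns = turning_points(toy)
toy_close = positions_within([0.10, 0.20, 0.30], [0.12, 0.26, 0.30])
```

For the synthetic series, the trend flips at positions 4 and 7, and the first and third positions of the two short profiles agree within the tolerance.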

CONCLUSIONS
In this paper, we have presented a toy model that takes into account the assumed dynamic elements of prebiotic peptide evolution and is able to generate sequential information that is close to that of best-preserved genes. The main impact on this sequential information appears to originate from the prebiotic amino acid abundances. A more detailed modeling of a prebiotic scenario to recreate the peptide generation certainly requires a thorough study of a possible peptide/nucleotide co-evolution and the biochemical factors regulating the aqueous-lipid medium. However, our results suggest that even a minimalist model that includes the elements of self-replication and peptide splitting/recombination can possibly predict the main factors responsible for the generation of peptide sequential information.
In addition to the mathematical-computational modeling, it was necessary to design a method that equips the peptide profiles with a quantitative parameter by considering their components without using the specific prebiotic elements or amino acid identification. During our former research, we designed the so-called polarity index method, which uses only the constitutive amino acids of the peptides and their electrical charges. This method requires only the translation of the linear sequence into its electrical charge or polarity equivalence. In this way, the peptide sequences were evaluated based on their polarity, which allowed for the comparison of peptides with different components.
We are aware that each modeling approach remains restricted by the consideration of a limited number of variables. Hence, our model does not account for more microscopic details of sequential information generation. However, we believe that our method and the obtained results are useful as a contribution to the fundamental question about the origins of life, as well as for the developing field of proteomics. The programming structure of the model allows for the addition of other important prebiotic aspects, such as the origin of homochirality of the biopolymers, which is planned for future studies.

COMPUTATIONAL RESOURCES
The computer program was written in Fortran 77 and executed on a Fedora 14 Unix-type platform (GNU). The program is freely available on request at polanco@unam.mx. Its implementation was optimal on computers with four or more processors in shared memory. The pro-

Figure 1. Sketch of the simulated peptide evolution comprising a weighted random generation of amino acid monomers (1), successive growth of the peptide chain (2), peptide fragmentation and recombination (3), and template-directed self-replication (4).

Figure 2. Illustration of peptide fragmentation and recombination. (a) After random splitting, peptide fragments are added to a shared pool (cutting record), while the other part remains subject to further chain growth; (b) Peptides under construction combine with fragments of the shared pool and increase their chain lengths.

Figure 3. Peptide self-replication as considered in the model. Starting with a mother peptide (a), oligomer segments (b) are randomly created that are then subjected to the amino acid polymerization mechanism (c-d).

Figure 4. Comparison of polarity profiles between the first and sixth protein generation obtained by simulation.

Figure 5. Comparison of polarity profiles between simulated peptides and those of best-preserved genes from three microorganisms (Delaye et al., 2005).

Figure 6. Effect of a 20% alteration in the absolute abundance of alanine on the simulated polarity profile.

Figure 7. Catastrophic bifurcation points 4, 8, 12, and 16 that coincide with the change in the polarity matrix row P[i,j].

Table 1. Amino acids used for the simulations with their corresponding polarity groups.
Protein amino acids are symbolized with a letter and non-protein amino acids with a number.
Yields from sparking CH4 (336 mmol), N2, and H2O with traces of NH3 (yield based on the carbon added as CH4).

Table 3. Absolute amino acid abundances.
Abundance: in μmol. Yields from sparking CH4 (336 mmol), N2, and H2O with traces of NH3 (yield based on the carbon added as CH4). Glycine = 0.26%; alanine = 0.71%; total yield of amino acids in the table = 1.90%. Number of times: modeling equivalence with respect to the amino acid amount in μmol (see text).

Table 4. Amino acid polarity group classification.
Source: number and corresponding polarity group. Protein amino acids can be identified as they are symbolized with a letter, while non-protein amino acids are symbolized with a number. Polarity: type of polarity: polar amino acids with positive charge (P+), polar amino acids with negative charge (P-), neutral (N), and non-polar (NP). Numeric equivalence: representative numeric value for each amino acid according to its polarity.