Computer simulations of protein folding with a small number of distance restraints

A high coordination lattice model was used to represent the protein chain. Lattice points correspond to amino-acid side groups. A complicated force field was designed in order to reproduce a protein-like behavior of the chain. Long-distance tertiary restraints were also introduced into the model. The Replica Exchange Monte Carlo method was applied to find the lowest energy states of the folded chain and to solve the problem of multiple minima. In this method, a set of replicas of the model chain was simulated independently in different temperatures with the exchanges of replicas allowed. The model chains, which consisted of up to 100 residues, were folded to structures whose root-mean-square deviation (RMSD) from their native state was between 2.5 and 5 Å. Introduction of restrain based on the positions of the backbone hydrogen atoms led to an improvement in the number of successful simulation runs. A small improvement (about 0.5 Å) was also achieved in the RMSD of the folds. The proposed method can be used for the refinement of structures determined experimentally from NMR data.

Within the frame of the above lattice model one can obtain the native structure of proteins at low or moderate resolution only.Further refinement of the obtained structure can be easily carried out.But the moderate resolution structure is sufficient to study the influence of distance restraints imposed on the chain when looking for the native structure.Thus, the model restraints can serve as a tool in NMR-based model building (Kolinski & Skolnick, 1998).The model of Smith-Brown et al. (1993) was based on restraints as biharmonic potential and secondary structure preferences.In this model the number of restraints was larger than the number of residues.The RMSD of the obtained fold from the native structure was close to 3 Å.Ashodi et al. (1995) built an algorithm using experimental and predicted restraints.The algorithm was very fast but used more than N/4 restraints, where N is the number of residues to get the structures with RMSD close to 5 Å.The later work of Kolinski & Skolnick (1998) showed that for the (3,1,0) lattice model, the number of restraints required to obtain 3-6 Å RMSD folds has to be between N/7 (protein G) and N/4 (plastocyanin).We employed a new Monte Carlo simulation technique called Replica Exchange (RE).It was recently introduced for simulations of spin glasses (Swendsen & Wang, 1986;Hukushima & Nemoto, 1996).Recently, it was shown that this simulation technique can sample very efficiently the conformational space of very simple models of polypeptides and polymers (Sugita & Okamoto, 1999;Hansmann & Okamoto, 1999;Gront et al., 2000;Kolinski, 2002;Sikorski, 2002).It was also found that the RE method was much better for looking for lowest energy structures.Moreover, it could be combined with the Histogram Method and give thermodynamical description of the system (Hansmann & Oka-moto, 1999;Sikorski, 2002).Into the model we have also introduced restraints based on hydrogen atoms.One can expect that the introduction of restraints based on hydrogen atoms should discriminate some structures, which are far away from the native state, e.g.mirror images.This model will hopefully serve as a tool for calculations with real restraints, which come from NMR experiments.

Lattice model
The protein chain was based on a lattice representation of amino-acid side groups (Kolinski et al., 1998;Ilkowski et al., 2000).The model chain consisting of N residues was built of N+1 vectors, each of them of the type (± k, ± l, ± m), where k, l, m = 0, 1, 2, 3, 4, 5 and the length of the vector was limited to be between (9) 1/2 and (30) 1/2 in order to cut off unrealistic distances between side groups.The number of vectors satisfying the above conditions was equal to 646.The lattice unit was chosen as 1.45 Å.

Force field
The potentials used in the force field were much simplified compared to previous models based on the C a backbone chain representation and (3, 1, 0) type lattice.The interaction scheme used here was previously described in detail and thus, we are giving here a brief description only (Kolinski et al., 1998;Ilkowski et al., 2000): (i) Short-range potential.A local stiffness was introduced in order to reproduce a protein-like behavior of the model chain.The potential was sequence independent and concerned five consecutive residues.The potential was realized by a bias towards the formation of helical and extended structures.
(ii) Short-range conformational preferences, which depended on the amino-acid sequence.
These potentials concerned pairs of residues that were in distance of 1, 2, 3 and 4 residues along the chain.These potentials were designed basing on a statistical analysis of conformations of chain fragments in PDB.
(iii) Long-range pair interactions.Going towards a longer distance between a given pair of amino acids, the potential consisted of hard repulsion on distances closer than 4.35 Å, finite repulsion and finally square-well interactions.The value of the last contribution, as well as the cut-off radius, depended on the pair of amino acids.The potential depended also on the mutual orientation of the interacting fragments of the chain.
(iv) Hydrogen bonds.This potential was designed on the basis of geometrical considerations, which are presented in Fig. 1.Two residues were hydrogen bonded when their alpha carbons were in a distance lower than 6.52 Å and the chain fragments were "parallel".In such cases a vector of 4.35 Å length orthogonal to the chain's plane is closer than 2.05 Å to the second alpha carbon.
(v) Centrosymmetric potential which depended on the radius of gyration of the chains.This potential speeds up the folding process by penalizing conformations which are too much extended.
The total energy is the sum of the contributions listed above with scaling factors introduced in order to maintain the proper balance between the short-range and long-range energy contributions.This weighting of potentials gives us a proper secondary structure content at high temperatures and enables us to avoid an artificial bias of the model towards a molten globule state (Ilkowski et al., 2000;Sikorski et al., 2000).

Simulation algorithm
The conformational space of the model chain was sampled by means of the Replica Exchange (RE) Monte Carlo method.Instead of using the Metropolis sampling algorithm (MS) and simulating the gradual annealing of the model system we simulated simultaneously a set of M copies (replicas) of the system, each at a different temperature.The temperature used in this model was simply a factor that scales all potentials of the force field.The temperatures were chosen to cover the range from the random coil state of the chain to a dense folded globule.Replicas can be exchanged and the probability of this transformation depends on the energy difference and temperature difference.The changes of the chain's conformations employed in the RE simulation algorithm were exactly the same as used previously in the MS simulations, i.e. local micromodifications of two and three subsequent residues and shifts of chain fragments for longer distances (Ilkowski et al., 2000;Gront et al., 2000;Kolinski, 2002;Sikorski, 2002).

Distance restraints
In our previous work short-range and longrange restraints were used.Here, we limited our considerations to long-range distances only (Skolnick et al., 1997).Originally, these restraints were based on distances between al- The excluded volume of the chain is also marked as smaller circles: gray (in the plane) and black (out of the plane).pha carbon positions.If a native pair of residues was in contact, the pair potential was lowered by a constant factor.Besides, a harmonic potential was imposed on this pair.
In this work, we tried to introduce more subtle restraints based on the distances between hydrogen atoms connected with the backbone, i.e. hydrogen atoms on nitrogen and alpha carbon.In our model chain the positions of side groups were given explicitly only, but positions of individual atoms can be easily determined.At first, the positions of alpha carbons should be calculated.It was previously shown (Ilkowski et al., 2000) that these positions can be estimated as located on the bisector of 3-residue fragments of the chain (Fig. 2).The distance between the side group center and alpha carbon depends on the type of the residue; for instance, for alanine it is close to 0.72 Å while for valine it is close to 1.5 Å (Ilkowski et al., 2000).The position of the hydrogen connected to alpha carbon can be approximated by a vector of length 0.7 Å and forming the tetrahedral angle with other bonds connected to this carbon atom.The geometrical considerations give us also information about the location of the hydrogens connected to nitrogen atoms (Fig. 3).The restraints based on these hydrogens had similar form as in our previous works (Skolnick et al., 1997;Kolinski & Skolnick, 1998).For dis-tances smaller than a proper cut-off distance (see Table 1 in Kolinski & Skolnick, 1998) the potential is tuned off while for longer distances it takes the form of a simple harmonic potential with a harmonic constant equal to 1.

RESULTS AND DISCUSSION
A series of simulations concerning models of two quite different proteins were carried out.These were protein G and plastocyanin.Domain B1 of Streptococcus protein G (2gb1) consists of 56 residues.Its native conformation is built of a single a-helix and a b-sheet formed by 2 hairpins (Fig. 4).The tertiary restraints used for this model are listed in Appendix A.
In every simulation run 20 replicas of protein G model were simulated covering the temperature range from random coil to collapsed chain (from T = 2.5 to 1.2).After the simulation was completed a further RE run was carried out starting from the conformations obtained in previous simulation but using a smaller number of replicas ( 5) and within the range of lower temperatures, T = 1.5 to 1.0 (around the folding transition which is close to T = 1.3).This simulation procedure was repeated 15 times starting from quite different initial conformations.
In Table 1, we presented the most important results of RE simulations for protein G. Results concerning the model with C a -C a re-  The distance restraints were put on these atoms.
straints were added for comparison in Table 2.The results for both kinds of restraints were qualitatively the same.The number of misfolds and mirror image structures was comparable for both methods.The identification of misfolded as well as mirror image structures could be easily done without any analysis of the chain conformation.The energies of native states were considerably lower    1).
than of other folded structures although quite close to the mirror images.RMSD was on the same level in successful simulation runs with both kinds of restraints.But RMSD of the mirror images was much higher, which was expected.Figure 5 presents the correlations between the total energy of the chain and RMSD in a simulation run.On the left we can easily distinguish points that represent structures close to the native state, i.e., having low RMSD and low energy.The almost folded structures had similar values of energies, which was not surprising as they had almost the same contacts and secondary structures as the native state.But they were located at the right of the figure because of considerably higher RMSD.The lowering of the number of restraints from 7 to 5 led to slight worsening of the obtaining folds (RMSD about 0.5 Å greater).In Fig. 4 we present a folded chain of the 2gb1 protein with RMSD 2.3 Å and the native conformation.One can observe that the topology of the folded chain is proper and most of the secondary structures are also properly folded.
A similar series of simulations were also carried out for a much longer protein chain, i.e. for plastocyanin.Plastocyanin (1pcy) is a protein consisting of 99 residues, which form a b-barrel structure with Greek key topology (Fig. 4).The temperature range of the simulation was the same as for protein G.For plastocyanin we carried out the RE simulation for the restraints listed in Appendix A.
In Tables 3 and 4 we collected the results for the models of the 1pcy chain with atomicbased and C a -C a restraints, respectively.One can observe that for the longer b protein with more complicated topology the results are similar.In general, both types of restraints give similar results, but, compared to the 2gb1 chain, RMSD is higher.Figure 6 presents the energy of plastocyanin chain versus the RMSD.The dispersion of points was similar to that for protein G. On the right side of the picture one can observe a mirror image  structure with higher energy and RMSD.The lowering of the number of restraints from 37 to 15 led to a similar behavior of folds as for protein G. Figure 7 presents an example of folded 1pcy (RMSD 4.2 Å) compared to the native structure of plastocyanin.
It has to be remembered that the number of distance restraints required to obtain a fold with RMSD on the level of 2.5 Å (protein G) and 4 Å (plastocyanin) was lower for the current model than for the model used previously (Kolinski & Skolnick, 1998).For protein G we used 5 and 7 restraints compared with 8 used previously.For plastocyanin we applied 15 and 37 restraints instead of 22 and 46, respectively.
It seems that the introduction of distance restraints based on distances between hydrogens made the simulations easier.This means that the number of MC cycles required to ob-tain proper folds was smaller for atomic restraints.Moreover, the obtained structures were slightly and systematically closer to the native states.This improvement can be explained by the fact that restraints based on hydrogens can eliminate some conformations or even some pathways of folding that do not lead towards proper structures.A study of a broader group of protein models should give us more precise evaluation of the model used.We think that the model developed in this study will become a useful tool for building the structures of proteins from the sparse experimental data obtained by the NMR method.

APPENDIX A
Below we present a list of the tertiary restraints used.The restraints were chosen on the basis of an analysis of the location of atoms in the native structure taken from PDB.The restrains were chosen in such a way that they were put on elements belonging to different fragments of secondary structures.Names of residues and their assignment to secondary structures are given in brackets.3).

Figure 1 .
Figure 1.Geometry of the model chain forming helical (a) secondary structures.

Figure 2 .
Figure 2. Hydrogen bonds.The positions of alpha carbons are determined on the bisector of the chain fragment.Distance between alpha carbon and the side group is based on a statistical analysis of PDB.

Figure 3 .
Figure 3.The positions of hydrogen atoms connected with alpha carbon and nitrogen atoms.

Figure 4 .
Figure 4. Final structure of protein G obtained in the RE simulation compared with the native conformation.

Figure 5 .
Figure 5.Total energy E versus RMSD for protein G for RE simulation run no.7 (5 restraints -see Table1).

Figure 6 .
Figure 6.Final structure of plastocyanin obtained in the RE simulation compared with the native conformation.

Table 4 . Simulation results obtained for plastocyanin (1pcy) using restraints based on alpha carbons
r tot , Number of restraints used in the given simulation run; E min , the lowest energy obtained in the given simulation run; E last , the energy of the last structure in the given simulation run; <cRMSD>, the average coordinate RMSD in the given simulation run; r fin /r tot , number of restraints satisfied in the final structure/number of restraints used in the given simulation run.