QUARTERLY Review Structural studies of cysteine proteases and their inhibitors

Cysteine proteases (CPs) are responsible for many biochemical processes occurring in living organisms, and they have been implicated in the development and progression of several diseases that involve abnormal protein turnover. The activity of CPs is regulated, among other factors, by their specific inhibitors, the cystatins. The main aim of this review is to discuss the structure-activity relationships of cysteine proteases and cystatins, as well as of some synthetic inhibitors of cysteine proteases structurally based on the binding fragments of cystatins.

enzyme should contain a reducing component. Glutathione serves as an activating agent in cells, whereas addition of mercaptoethanol or dithiothreitol is required for in vitro experiments.
The best characterized family of cysteine proteases is that of papain. The papain family contains peptidases that are structurally related to papain, for example the lysosomal cathepsins. Papain is characterized by a two-domain structure (Fig. 1). The active site (catalytic pocket), where the substrate is bound, is located between the domains. The catalytic residues of papain are Cys25 and His159, and they are evolutionarily conserved in all CPs. Following a proposal by Schechter & Berger [6], the substrate pocket of papain binds at least seven amino-acid residues in appropriate subsites (Fig. 2). Recently, Turk et al. [7] have proposed, on the basis of kinetic and structural studies, that only five subsites are important for substrate binding. The S2, S1, and S1′ pockets are important for both backbone and side-chain binding, whereas S3 and S2′ are crucial only for amino-acid side-chain binding.
Enzymatic activity of cysteine proteases is related to the presence of a catalytic dyad formed by the cysteine and histidine residues, which in the pH interval 3.5-8.0 exists as an ion pair, -S⁻···H⁺Im- [8,9]. Formation of an intermediate, the S-acyl-enzyme moiety, is a fundamental step in hydrolysis. This intermediate is formed via nucleophilic attack of the thiolate group of the cysteine residue on the carbonyl group of the hydrolyzed peptide bond, with release of the C-terminal fragment of the cleaved product. In the next step, a water molecule reacts with the intermediate, the N-terminal fragment is released, and the regenerated free papain molecule can begin a new catalytic cycle [10].
CPs are responsible for many biochemical processes occurring in living organisms. The main physiological role of CPs is metabolic degradation of peptides and proteins. Mammalian cysteine proteases have been implicated in the development and progression of many diseases that involve abnormal protein turnover [11-15]. The activity of cysteine proteases is regulated by proper gene transcription and the rate of protease synthesis and degradation, as well as by their specific inhibitors.

Figure 2. Substrate subsites of papain [6].

CYSTATINS
Many natural protein inhibitors of cysteine proteases, called cystatins, have been isolated and characterized. They act both intra- and extracellularly, forming complexes with their target enzymes. Maintenance of an appropriate equilibrium between free cysteine proteases and their complexes with inhibitors is critical for proper functioning of all living systems. In this role, cystatins are general regulators of harmful cysteine protease activities. The roles of cystatins in health and disease have been reviewed by Henskens et al. [13] and Grubb [14].
The human superfamily of cystatins is divided into three families. Family I, called stefins, comprises intracellular cystatins A and B. Family II includes extracellular and/or transcellular cystatins (cystatins C, D, E, F, S, SA, and SN). Kininogens, the intravascular cystatins, form family III of cystatins.

Stefin family
The stefin family comprises the following inhibitors: human stefin A (hCA), human stefin B (hCB) [16-18], as well as their analogues from rat [19], bovine [20,21] and porcine [22] tissues, and from some plants [23]. Stefins are proteins of about 100 amino-acid residues (molecular mass about 11 kDa) which do not contain any sugar moiety or disulfide bridge. There is one cysteine residue in the stefin B sequence, and it can be converted into an intermolecular disulfide bridge (-Cys3-Cys3′-), resulting in the formation of inactive dimers that are easily transformed back into active monomers under reducing conditions [24,25]. Immunohistochemical studies have shown the presence of stefin A in skin and epithelium, suggesting that the major function of stefin A is related to protection of these organs against overactivity of cysteine proteases [26]. On the other hand, stefin B is distributed in many tissues, which suggests that this inhibitor interacts with cathepsins liberated from lysosomes [27]. Both inhibitors have been found in all human fluids, but at low concentrations [28]. The gene coding for human cystatin A has been assigned to chromosome 3 [29], and that for human stefin B to chromosome 21 [30].

Cystatin family
The cystatin family comprises the following human cystatins: C (hCC), D (hCD), E (hCE), F (hCF), S (hCS), SA (hCSA) and SN (hCSN). Their homologues have also been found in other mammalian organisms and birds. Chicken cystatin has been used in defining the superfamily of cystatins [31,32]. Human cystatins are coded on chromosome 20 [33]. They consist of 120-122 amino-acid residues and are synthesized as proproteins containing a signal peptide (20 residues), which suggests that cystatins display an extracellular activity [34]. The cystatins contain two disulfide bridges and most of them are not glycosylated. Cystatins S, SA, and SN (S-type cystatins), consisting of 121 amino-acid residues with molecular mass of 14.2-14.4 kDa, display high sequence homology (90%). Post-translational phosphorylation of cystatins S and SA leads to formation of several isoforms. Expression of these cystatins is very restricted: cystatin SN has been found only in saliva and tears, whereas variants S and SA are also present in seminal fluid [35]. Cystatin D has also been found mostly in saliva and tears. Fully active cystatin D, formed after removal of the 20-residue signal peptide, consists of 122 amino-acid residues with molecular mass of about 13.8 kDa. The protein exists in two polymorphic forms, [Cys26]hCD and [Arg26]hCD, which have identical activity, stability, and distribution. The quite low homology with other cystatins (51-55%) suggests that, on a phylogenetic tree, cystatin D is located between cystatins S and C [36,37]. Human cystatin E (hCE, [38]), also described in the literature as cystatin M [39], and human cystatin F (hCF, [40]), also called leukocystatin [41], are released from appropriate proproteins containing signal peptides (28-mer for hCE and 19-mer for hCF). Cystatins E and F are glycoproteins built of 122 and 126 amino-acid residues, respectively. Their structures display low homology to the second family of cystatins: 26-34% in the case of hCE and 30-34% in the case of hCF. Unlike other members of the second family, cystatin F contains an additional, third, disulfide bridge stabilizing the N-terminal fragment of the protein. Tissue-distribution profile studies have shown that the highest concentration of hCE is in the uterus and liver [38] and that of hCF in spleen and leukocytes [40].
Human cystatin C (hCC), formed after removal of a 26-residue signal peptide, is a protein of 120 amino-acid residues with molecular mass of 13.4 kDa [42,43]. In contrast to other members of the family, hCC is a basic protein (pI = 9.3) [44]. It is widely distributed in all physiological fluids. The highest amounts were found in seminal plasma and cerebrospinal fluid, and much lower concentrations were observed in tears, amniotic fluid, saliva, milk, and blood plasma [45]. The wide distribution and high inhibitory potency of cystatin C suggest that this protein is a major cysteine protease inhibitor.

Kininogens
The third family consists of three members: human high molecular mass kininogen (hHK), about 120 kDa; human low molecular mass kininogen (hLK), about 68 kDa; and kininogen T, discovered so far only in rats [46]. Kininogens hHK and hLK are glycoproteins released as proproteins containing a signal peptide (18 amino-acid residues) [3]. The highest concentration of kininogens is found in blood plasma and synovial fluid [45].

Other cystatins and cystatin-like proteins
A number of proteins have been described which, in spite of high sequence homology, show distinct differences in structure and biological activity in comparison with cystatins. Histidine-rich glycoproteins (HRG) and fetuins are examples of such cystatin-like proteins. Neither HRG nor fetuins inhibit cysteine proteases. The subject of cystatins and cystatin-like proteins has been reviewed by Brown & Dziegielewska [47]. Conversely, there are also proteins, like the intensely sweet plant protein monellin, which, in spite of very low sequence homology and lack of inhibitory function, have a cystatin-like three-dimensional structure [48,49].

Cysteine protease-cystatin interaction
Numerous spectroscopic, kinetic, and crystallographic studies have been carried out to explain the mechanism of cysteine protease inhibition by cystatins. The results have shown that the inhibitor binds in a simple, reversible, one-step, second-order process. In addition, those studies have revealed that enzymes with a blocked active centre could still bind cystatins, albeit with lower affinity [50-52]. This indicates that cysteine protease-cystatin interactions are not based on a simple reaction with the catalytic cysteine residue of the enzyme, as is typical of substrates, but that they consist of hydrophobic contacts between the binding regions of cystatins and the corresponding residues forming the binding pockets of the enzyme. Despite their structural homology and similar mode of inhibition, cystatins display quite different enzyme affinities (Table 1).
From functional studies of cystatin C it was concluded that the N-terminal fragment containing 11 amino-acid residues is important for the inhibitory activity of hCC [44]. Our early studies with synthetic peptides corresponding to the N-terminal sequence of hCC showed that they were very good substrates of papain, and that the cleavage took place at the Gly11-Gly12 peptide bond. We have concluded that Arg8, Leu9 and Val10 from the N-terminal segment of cystatin C interact with papain substrate-pocket subsites S4, S3 and S2, respectively [56]. This was further confirmed when the three-dimensional structure of cystatins was solved.
So far, only three 3D structures of cystatins have been published: the crystallographic structures of N-truncated chicken cystatin C [57] and of a complex of human cystatin B (stefin B) with papain [58], as well as an NMR structure of human cystatin A (stefin A) [59,60]. From the structure of the complex between papain and stefin B, it is evident that the interactions between the enzyme and cystatins are formed by the amino-acid residues from the N-terminal segment (occupying the Sn subsites of the enzyme) as well as by two additional fragments in β-hairpin loops: one in the middle and one in the C-terminal segment of the protein. These three cystatin regions, containing evolutionarily conserved amino-acid residues (Table 2), form a wedge-like structure, which interacts with the catalytic cleft of cysteine proteases [58]. It has also been proposed that the hydrophobic amino-acid residues from the first loop, as well as the tryptophan residue from the second loop, occupy the Sn′ subsites of the enzyme. Kininogens, which have three cystatin-like domains, also display high affinity for papain and cathepsins (Table 1). However, the first domain, which lacks the evolutionarily conserved N-terminal and β-hairpin-loop residues (Table 2, sequences in parentheses), has no inhibitory activity against cysteine proteases.

[Table 2 note: Sequences in parentheses correspond to the appropriate binding sequences of cystatins.]

HUMAN CYSTATIN C (hCC)
Human cystatin C (hCC; also called γ-trace, post-γ-globulin, gamma-CSF and post-gamma protein) was the first cystatin to be sequenced [42]. It is recognized as the most physiologically important extracellular human cystatin. Its primary structure consists of a single non-glycosylated polypeptide chain of 120 amino-acid residues (Fig. 3). The cysteine residues at positions 73 and 83 and those at positions 97 and 117 form two internal disulfide bridges. The nucleotide sequence of the hCC gene has been determined and the gene localized on chromosome 20 [61,62]. Human cystatin C is present in all extracellular fluids. The highest concentration was found in seminal plasma (50 mg/L) [63], whereas normal blood plasma contains 0.8-2.5 mg/L of hCC [64]. The level of serum cystatin C is now used as an endogenous marker of renal function [64,65]. hCC is an effective reversible inhibitor of cathepsins B, H, K, L and S [52]. The affinity of hCC for papain is too high to be measured by equilibrium methods. Therefore, the dissociation constant, K_D = 1.1 × 10⁻¹⁴ M (Table 1), for the hCC-papain complex was calculated from the association and dissociation rate constants [53].
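
For a simple, reversible, one-step binding process of the kind described for cystatins, the relation between the dissociation constant and the two rate constants can be written as follows. The symbols k_ass and k_diss are notation assumed here for the association and dissociation rate constants; they are not taken from the cited work.

```latex
\[
E + I \;\underset{k_{\mathrm{diss}}}{\overset{k_{\mathrm{ass}}}{\rightleftharpoons}}\; EI,
\qquad
K_D \;=\; \frac{[E][I]}{[EI]} \;=\; \frac{k_{\mathrm{diss}}}{k_{\mathrm{ass}}}
\]
```

Measuring the two rate constants separately thus gives access to K_D even when the equilibrium concentration of free enzyme is too low to be determined directly.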

Structure-activity relationship studies (SAR)
Cystatins contain three segments which are recognized as responsible for the interaction with cysteine proteases. These are the N-terminal fragment and the so-called first and second loops, which are arranged at one edge of the molecule and are believed to directly interact with the catalytic cleft of CPs. It has been shown in early SAR studies that truncation at the Gly11-Gly12 peptide bond decreases the affinity of hCC for papain by three orders of magnitude [44,66]. The importance of the N-terminal segment of hCC for its interaction with CPs was further confirmed by studies of the rate of hydrolysis of appropriate synthetic peptides. Fragments comprising residues Gly4-Glu21, Arg8-Asp15, and Arg8-Gly12 were all cleaved completely by papain at the Gly11-Gly12 bond in less than 60 s, whereas the corresponding bond of the peptide comprising residues Gly11-Asp15 remained uncleaved even after 15-h incubation [56]. From these data, we postulated that the N-terminal fragment of hCC is involved in the inhibitor-enzyme interaction, and that the major contribution to the total affinity is through the binding of inhibitor residues Arg8, Leu9, and Val10 in the substrate subsites S4, S3 and S2 of the enzyme [56]. This was further corroborated by SAR studies with hCC variants [66-68]. The side chain of Val10 makes the most important contribution to the affinity of the N-terminal fragment of hCC for cathepsins. It was also shown that Leu9 is the most discriminating residue for selective binding of hCC to cathepsins B, H, L, and S [68]. Exchange of the absolutely conserved Gly11 residue for other amino acids generally leads to a sharp decrease of the inhibitory potency [68], indicating that this residue may function as a hinge between the conformationally flexible N-terminal segment and the rest of the molecule [69,70].
The structure-activity relationship for the remaining two binding segments of hCC (Gln55-Gly59 and Pro105-Trp106) has been studied less extensively. Substitution of Trp106 by Gly decreases the affinity for cathepsins B and H by approximately three orders of magnitude [68,69]. The Trp106→Gly106 substitution, when combined with a change in the N-terminal sequence of hCC, leads to a further sharp decrease of the inhibitory potency [66].
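
To give a feeling for what a loss of affinity by "three orders of magnitude" means energetically, the fold change in the dissociation constant can be converted into an approximate change in binding free energy using ΔΔG = RT ln(fold change). The following is a minimal sketch only: the function name is hypothetical, and the temperature of 25°C is an assumption made for illustration, not a condition reported in the cited studies.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def binding_energy_change(fold_change, temperature_k=298.15):
    """Approximate change in binding free energy (kJ/mol) for a given
    fold change in the dissociation constant: ddG = R*T*ln(fold_change).

    temperature_k defaults to 298.15 K (25 degrees C), an assumed value.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R * temperature_k * math.log(fold_change) / 1000.0


# A 1000-fold weaker affinity (three orders of magnitude)
ddg = binding_energy_change(1e3)
print(f"{ddg:.1f} kJ/mol")
```

Under these assumptions a thousandfold drop in affinity corresponds to roughly 17 kJ/mol of lost binding energy.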

Leu68Gln mutation
A point mutation in which glutamine replaces leucine at position 68 of hCC (Leu68Gln) is now recognized as the cause of a disorder that leads to amyloid deposits in cerebral blood vessels [71-74]. This disorder, known as hereditary cystatin C amyloid angiopathy (HCCAA), results in paralysis, development of dementia due to multiple strokes, and death [74]. Indeed, it was shown that under various conditions the Leu68Gln mutant displays a much higher tendency to dimerize and aggregate than wild-type hCC [75-77].

Dimerization, oligomerization
Early studies on thermal stability have shown that human cystatin C readily undergoes dimerization with complete loss of its inhibitory activity [77]. At temperatures above 80°C hCC aggregates, and consequently precipitates. Self-association of hCC was further evident when the protein was treated with various denaturing agents [76,77]. NMR studies of human cystatin C have shown that it can form dimers through structural changes in its native fold [69,77]. We have studied the structural changes of hCC occurring during both thermal and chemical denaturation processes. Chemical denaturation (with guanidine hydrochloride, Gdn·HCl) was examined by two spectroscopic methods: circular dichroism (CD) and tryptophan fluorescence [78]. To observe protein unfolding induced by heating, Fourier-transform infrared spectroscopy (FT-IR) was applied.
The results obtained indicate that unfolding of cystatin C caused by a denaturing agent is a complex process, characterized by two transition states (Fig. 4). The first one appeared in the concentration range of 0.5-1 M Gdn·HCl, the same as that interpreted by Ekiel & Abrahamson [77] as indicating the existence of dimeric cystatin C in their NMR studies. Thus, it can be concluded that the intermediate detected in our measurements is also a dimer.
In the first transition state we did not detect any changes in the tertiary structure of cystatin C. Also, very few changes were observed in the α-helix content. The only secondary structure motif exhibiting conformational changes after dimer formation was the β-sheet. The most probable explanation of this fact is that β-strands participate directly in the formation of the dimeric molecule. After reverse conversion of the dimers into monomeric molecules, a dramatic loss of β-sheet content connected with changes in the secondary and tertiary structure of cystatin C occurred. At a concentration of 2.5-3 M Gdn·HCl the second transition state, stabilized by partially recovered tertiary interactions, was detected (Fig. 4). However, it was only a temporary state preceding complete unfolding of cystatin C.
To study thermal denaturation of cystatin C, FT-IR spectroscopy was applied. The measurements were performed for dry protein prepared by evaporation of a water solution. Oberg & Fink have reported [79] that solvent evaporation should not change the protein structure in solution. To confirm this statement, we carried out experiments at 35°C for dry cystatin C and cystatin C dissolved in water. The results reveal that at 35°C the protein structure in both cases is almost the same (Table 3). However, at higher temperatures no conformational changes in solid cystatin C could be detected. The dry protein retained its native state during the whole heating process.

Structural studies of hCC
Our early molecular modeling studies on human cystatin C [80] have shown that the energy-optimized structure of hCC is very close to the crystallographic structure of chicken cystatin [57]. The results of fluorescence studies indicated that the Trp106 residue is fully exposed to solvent. We found that, apart from Trp106, the main contribution to fluorescence comes from Tyr62 and Tyr42. The remaining tyrosine residues (Tyr34 and Tyr102) are efficiently quenched as a result of energy transfer to the Cys97-Cys117 disulfide bridge (Tyr34) and tryptophan (Tyr102) [80].
Development of a second generation of more effective, specific cysteine protease peptide inhibitors would be greatly facilitated by knowledge of the three-dimensional structure of hCC. Similarly, such a model is necessary for the elucidation of the pathophysiological background of the cerebral hemorrhage produced by hCC, particularly its L68Q variant.
Crystallographic and NMR studies of chicken cystatin [57,81,82], cystatin B in complex with papain [58], cystatin A [60], and human cystatin D [83] have shown a similar overall structure, with three regions implicated in interactions with the target enzymes. Those regions include the N-terminal segment and two hairpin loops, L1 and L2. The general fold of protein inhibitors belonging to the cystatin family has been defined by the crystal structure of chicken cystatin [57]. Its canonical features include a long α1 helix running across a large, five-stranded antiparallel β sheet. The connectivity within the β sheet is as follows: (N)-β1-(α1)-β2-L1-β3-(AS)-β4-L2-β5-(C), where AS is a broad "appending structure", rather unrelated to the compact core of the remaining part of the molecule and positioned on the opposite end of the β sheet relative to the N-terminus and the two short loops L1 and L2. The latter three elements are aligned in a wedge-like fashion in the inhibitory motif of cystatins. Chicken cystatin shows 41% sequence identity and 62.5% homology to hCC, but the crystal structure corresponds to an N-truncated variant [57]. On the other hand, the eleven N-terminal amino-acid residues of hCC are important for its very high-affinity binding to papain [52] (K_i = 11 fM) and to other cysteine proteases [28]. It has been shown that specific cleavage, by leukocyte elastase, of the single N-terminal Val10-Gly11 bond of hCC results in seriously compromised affinities for such target enzymes as cathepsins B, H, and L [84].
It is interesting to compare the topology of cystatin with that of the intensely sweet plant protein, monellin. The structural similarity has been noted before in spite of the low sequence identity [48,49]. However, natural monellin consists of two protein chains: chain B, corresponding to helix α1 and strands β1 and β2 (in the order β1-α1-β2), and chain A, corresponding to the remaining, prominent part of the β sheet. The N-terminus of chain A and C-terminus of chain B are close in space and seem to be the product of proteolytic cleavage of a single-chain protein [48] in a region that corresponds to cystatin loop L1 of the inhibitory "wedge". An artificial tethered B-A protein retains the taste and conformation of natural monellin [49].
There are two disulfide bonds in human cystatin C (Cys73-Cys83, Cys97-Cys117) and in all other proteins of family 2 and 3 cystatins [85]. Both are located within the β region of the chicken protein structure, in the C-terminal half of the molecule that would correspond to chain A in monellin. The conservation of these two S-S bridges in family 2 and 3 cystatins [45,86] may be interpreted as implicating their requirement for a stable protein fold. However, there are no disulfide bridges in family 1 cystatins or in monellin. In the structure of chicken cystatin there are two β-bulges, in strands β2 (Arg46) and β5 (Leu111), of the β sheet. They are preserved in the other structural models of cystatins, and also in monellin. The "appending helix" of chicken cystatin is disputable. It is only loosely connected with the molecular core and in the segment Cys71-Lys91 is very poorly defined.
In particular, the Lys73-Leu78 fragment was weakly defined and tentatively placed, while in electron density the Asp85-Lys91 peptide was not defined at all as it is completely disordered. In spite of that, the Asp77-Asp85 fragment was modeled as helix α2. This α helix is not seen in the preliminary structure of human cystatin D [83] or in the structurally homologous monellin. Also, in the NMR studies of cystatins [69], no helical conformation has been found for this fragment either in chicken or human cystatin C. It appears that this fragment must be rather disordered in solution.
Crystallization of human cystatin C has been a challenge for a long time. Recently, formation of single crystals in several forms has been reported [87]. For the crystallization experiments, hCC was produced in its full-length form by recombinant techniques in Escherichia coli [88]. This full-length wild-type protein crystallized in two forms, tetragonal (P4₁2₁2 or P4₃2₁2) and cubic (I432). Low-temperature synchrotron data are available for both forms at the originally reported resolution of 3.0 and 3.1 Å, respectively [87]. The notoriously poor quality and limited resolution of X-ray diffraction by full-length hCC crystals, in spite of their perfect and beautiful appearance, may be indicative of structural disorder (N-terminus, appending structure) and/or of lack of homogeneity resulting from uncontrolled protein aggregation (oligomerization) in the crystallization solutions and possibly also in the crystals. It should be stressed, however, that the hCC used for growing the crystals represented pure monomeric protein obtained by gel filtration as the final isolation step. The Matthews volume [89] calculated for the two forms of full-length hCC is indicative of the presence of multiple copies of the protein in the asymmetric unit. The propensity of hCC to crystallize with multiple copies of the molecule in the asymmetric unit, in combination with the additional possibilities offered by the point symmetry elements of the unit cells, may also be indicative of the tendency of the protein to oligomerize. Such oligomerization might reflect the amyloid-forming property of Leu68Gln cystatin C, as earlier observations demonstrate that both wild-type and Leu68Gln-substituted cystatin C are capable of forming dimers [69,74,77]. In the tetragonal form, as many as seven independent molecules could be present. The asymmetric unit of the cubic cell is likely to contain two copies of the protein (V_m = 2.16 Å³/Da), but one molecule and a high solvent content (72%) is also possible. To facilitate the solution of the crystal structure of hCC, the full-length protein was also produced in selenomethionyl form [87]. Electrospray mass spectrometry of the selenomethionyl protein confirmed that the three Met residues in the hCC sequence were fully substituted by Se-Met. A successful Met→Se-Met substitution was additionally confirmed by analysis of the amino-acid composition of the Se-Met protein after acidic hydrolysis. The selenomethionyl protein crystallized in the cubic form and X-ray absorption spectra confirmed a significant content of selenium in the crystals. Unfortunately, due to weak diffraction, only multiwavelength anomalous diffraction (MAD) data at 4.5 Å resolution could be measured for those crystals at the selenium absorption edge.
Very recently, a new low-temperature data set was obtained for the cubic form of native full-length hCC using synchrotron radiation (R. Janowski, unpublished). This data set extends to 2.5 Å resolution and is currently being used for the determination of the structure of hCC. In addition to the experiments involving full-length human cystatin C, preliminary crystallographic studies have also been reported for its N-terminally truncated variant [87]. hCC devoid of ten N-terminal residues was obtained by incubation of recombinant wild-type human cystatin C with leukocyte elastase and isolated as described by Abrahamson et al. [84]. The protein could be crystallized in tetragonal form, yielding crystals that are very stable in the X-ray beam. Measurement of diffraction data extending to 2.7 Å has been reported at room temperature, using conventional Cu Kα radiation [87]. Also in the case of N-truncated hCC, the asymmetric unit can be expected to contain numerous (up to eleven) independent copies of the protein. Very recently, a new diffraction data set extending to 2.1 Å resolution has been measured at low temperature using synchrotron radiation (R. Janowski, unpublished).
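
The Matthews-volume argument used above to estimate the number of protein copies in the asymmetric unit can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the constant 1.23 Å³/Da is the conventional approximation linking V_m to the solvent fraction for proteins, not a value taken from the crystallization report. Only the V_m of 2.16 Å³/Da for two copies in the cubic form comes from the text.

```python
# Conventional approximation: solvent fraction = 1 - 1.23 / V_m (assumption)
PROTEIN_VOLUME_FACTOR = 1.23  # Å^3/Da


def matthews_volume(cell_volume_A3, n_asym_units, copies_per_asu, mass_da):
    """V_m in Å^3/Da: unit-cell volume divided by the total protein mass
    in the cell (asymmetric units x copies per unit x molecular mass)."""
    return cell_volume_A3 / (n_asym_units * copies_per_asu * mass_da)


def solvent_fraction(vm):
    """Estimated fraction of the crystal volume occupied by solvent."""
    return 1.0 - PROTEIN_VOLUME_FACTOR / vm


# Cubic form: V_m for two copies is quoted in the text; halving the number
# of copies doubles V_m.
vm_two_copies = 2.16
vm_one_copy = 2 * vm_two_copies

print(f"two copies: {solvent_fraction(vm_two_copies):.0%} solvent")
print(f"one copy:   {solvent_fraction(vm_one_copy):.0%} solvent")
```

With one molecule per asymmetric unit the estimate reproduces the 72% solvent content quoted above, whereas two copies give a value of about 43%, which is more typical of protein crystals.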

Peptidyl-diazomethyl ketones
Soon after the discovery that the N-terminal fragment Arg8-Leu9-Val10-Gly11 of human cystatin C interacts with the Sn subsites of cysteine proteases [53,56], a series of peptidyl-diazomethyl ketones based on the structure of this segment was synthesized. Preliminary results showed that both Boc-Val-Gly-CHN2 (Boc-VG-DAM) and Z-Leu-Val-Gly-CHN2 (Z-LVG-DAM) inhibit papain, cathepsin B and streptococcal proteinase [56]. The latter compound was tested for in vitro and in vivo antibacterial activity against a large number of bacterial strains of different species [90]. Mice injected with lethal doses of group A streptococci were cured by a single injection of 0.2 mg of Z-LVG-DAM. Detailed structure-activity studies showed that the shortest among the diazomethyl ketones, Z-Gly-CHN2, does not inhibit cysteine proteases (Table 4). On the other hand, extension of the -Leu-Val-Gly- sequence by an Arg residue in Z-RLVG-DAM gave the most potent inhibitor of papain and cathepsin B, with apparent second-order rate constants (k′+2) of the same order of magnitude as those determined for E-64, which is used as a standard in the inhibitory bioassays of cysteine proteases [91]. Addition of the next residue, Pro7, in Z-PRLVG-DAM decreased the activity. Peptidyl-diazomethyl ketones with a free N-terminal amino group displayed a lower inhibitory potency. None of the peptidyl-diazomethyl ketones designed after the N-terminal sequence of various cystatins had an inhibitory activity higher than that of hCC itself [91,92]. These peptidyl-diazomethyl ketone inhibitors have been found to be very fast and irreversible inhibitors of cysteine proteases. It should be noted that the reactivity of the diazomethyl ketone group with thiols is generally very low [93]. Modified neglect of diatomic overlap (MNDO) studies of the mechanism of inhibition of cysteine proteases by diazomethyl ketones showed that the reaction is irreversible and leads to an α-thioketone derivative of the Cys25 residue of papain [94]. Recently, we have shown that Z-RLVG-DAM inhibits bone resorption in vitro by a mechanism that seems primarily due to inhibition of bone matrix degradation via cysteine proteases [95].

Oxirane-type inhibitors
E-64 [(2S,3S)-trans-epoxysuccinyl-L-leucyl-agmatine], isolated from cultures of Aspergillus japonicus, is a very strong and irreversible inhibitor of cysteine proteases [96,97]. The first oxirane-containing inhibitor designed by us on the basis of the structure of the N-terminal segment of hCC, Z-Leu-Val-NHCH2-CH(O)CH-CH2COOH, displayed only weak reversible inhibition [56]. Therefore, taking into account the structure of E-64 and its analogs, as well as our modeling studies, we have designed several new compounds with more hydrophobic C-termini. Most of these compounds displayed quite good inhibitory activities towards papain and cathepsin B. However, the most striking result came from two oxirane-type compounds: Z-Arg-Leu-ValΨ[CH2NH]CO-CH(O)CH-C6H5 (Table 4, compound 16) and Z-Arg-Leu-ValΨ[CH2NH]CO-CH(O)CH-COC6H5 (Table 4, compound 17). Compound 17, with a stronger electron-withdrawing benzoyl group at the C-terminus, was found to be a good irreversible inhibitor of papain and cathepsin B, whereas compound 16, with the phenyl ring attached directly to the oxirane moiety, had no inhibitory potency towards cysteine proteases. This discrepancy prompted us to undertake more detailed structural studies using molecular modeling and crystallographic methods.

Other inhibitors
Apart from diazomethyl ketone- and oxirane-type inhibitors, we have designed several other compounds containing the cystatin binding motif, as well as reactive groups for the thiol function of cysteine proteases [56,98]. Good inhibitory potency was found for compounds containing an activated olefinic double bond and compounds with a C-terminal aldehyde group or chloro- and bromomethyl ketone groups. It was interesting to find that most of them displayed antibacterial activity against seventeen clinically important bacterial species tested [98]. It should be mentioned that many cyclic peptides based on the N-terminal sequence of cystatin C also displayed antibacterial properties. Recently, we have designed and synthesized several azapeptides based on the binding sequence of cystatins, and some of them were found to be very selective inhibitors of different cathepsins (E. Wieczerzak, unpublished).

Crystallographic studies of papain-inhibitor complexes
Single crystals of the covalent complex papain-Z-Arg-Leu-ValΨ[CH2NH]-CO-CH(O)CH-COC6H5 (Table 4, compound 17) were grown by the vapor diffusion method at room temperature in hanging drops, using a modification of the procedure described for the complex papain-E-64c [99]. Detailed crystallization conditions and the procedure for data collection at room temperature using freshly grown crystals (data set I, resolution 1.9 Å) were described previously [100]. A second, low-temperature data set (data set II, resolution 1.65 Å) was collected about 10 months later using synchrotron radiation. The crystals used in those studies correspond to the historically first crystal form of papain, form A, crystallized by Drenth & Jansonius [101], for which no crystal structure has yet been reported.
Even preliminary difference electron density maps calculated using the room-temperature data (data set I) clearly showed the inhibitor, which is covalently linked to the active-site Cys25 of the enzyme. However, the maps calculated using the low-temperature data (data set II) clearly revealed only a short stem of electron density near the active-site Cys25. An analysis of the shape of this electron density and of potential hydrogen bonds strongly suggests that in these aged crystals of the complex, the inhibitor that was originally attached to the sulfhydryl group of Cys25 has been replaced by a covalent hydroxyethyl substituent.
The overall structure of the enzyme in both the room- and low-temperature models is similar to other papain structures deposited in the PDB, and the r.m.s. deviation between the Cα atoms of these two models is 0.24 Å.
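The r.m.s. deviation quoted above is the square root of the mean squared distance between equivalent atoms of two models. The following is a minimal sketch of that calculation, assuming the two models are already superposed (no fitting is performed) and using made-up Cα coordinates; the function name `rmsd` and all coordinate values are illustrative assumptions, not data from the structures discussed here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the units of the input, e.g. Å)
    between two equally long lists of (x, y, z) points.

    Assumption: the two models are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical Cα coordinates (Å), for illustration only
model_rt = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_lt = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.3), (7.3, 0.0, 0.0)]

print(round(rmsd(model_rt, model_lt), 2))
```

In practice the two coordinate sets would first be brought into a common frame by a least-squares superposition, which this sketch deliberately leaves out.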
The inhibitor moiety in the room-temperature structure extends along the Sn subsites of the enzyme (Fig. 5) and is stabilized in the active-site groove by a series of hydrogen bonds and hydrophobic interactions. The inhibitor forms hydrogen bonds with Gly66, Asp158, and Gln19 as well as with two solvent molecules. Similar contacts were also observed in the 2.1 Å resolution structure of a complex between papain and E-64c [99]. The hydrophobic interactions with the S2 subsite that are characteristic of chloromethyl ketone inhibitors were not observed. The distances between the side chains of Val133 and Val157 (defining the enzyme's S2 subsite) and the atoms of the Val residue of the inhibitor are longer than 6.0 Å.
As a step towards understanding the specificity of peptidic, covalent, irreversible inhibitors of papain, two peptidyl-diazomethyl ketone-type inhibitors, Z-Arg-Leu-Val-Gly-DAM and Z-Leu-Phe-Gly-DAM (Table 4), with valine and phenylalanine residues in the P2 site, respectively, were synthesized and reacted with the active site of papain. The complex between papain and Z-Arg-Leu-Val-Gly-DAM has been characterized crystallographically (space group P2₁, 1.78 Å resolution, R = 0.168). The side chain of Val from the Z-Arg-Leu-Val-Gly-DAM inhibitor molecule is rather far from the hydrophobic S2 pocket, the closest distances in this region being above 4.6 Å. Electron density is clearly visible for the entire inhibitor moiety with the exception of the benzyloxycarbonyl (Z) group. The structure, therefore, again demonstrates no specific association between the S2 pocket and the inhibitor's P2 site, analogously to the situation observed in the crystal structure of the Z-Arg-Leu-ValΨ[CH2NH]CO-CH(O)CH-CO-C6H5 complex [100] and in molecular dynamics simulations [102]. This persistent lack of P2-S2 interactions in Z-Arg-Leu-Val-type inhibitors is in contrast to the early findings by Drenth et al. [103] that P2-S2 complementarity is essential for productive inhibition and for enzyme specificity. This evidence seems to indicate that, while it might be important for efficient and precise docking of the inhibitor in the active site, the S2 pocket does not play any significant role in the association between the inhibitor and the enzyme once a covalent bond has been formed.
Two polymorphs of a complex between papain and the Z-Leu-Phe-Gly-DAM inhibitor have been crystallized. Diffraction data for crystal form I (space group P2₁) were collected to 2.0 Å resolution, and for crystal form II (space group P2₁2₁2₁) to 1.63 Å. Both structures were solved by molecular replacement using the 1ppn model of papain [104] as a probe. The final R factors are 0.106 and 0.172, respectively. In both crystal forms the inhibitor is bound to the Cys25 residue of papain through a covalent bond formed between the methylene group (DAM) of the inhibitor and the thiol group of the enzyme. The phenylalanyl side chain is locked by hydrophobic interactions (3.5-3.8 Å) with residues Val133 and Val157 of the S2 pocket. The orientation of the phenyl ring is similar to that observed in the chloromethyl ketone complexes studied by Drenth et al. (Protein Data Bank codes: 1pad, 5pad, 6pad) [103]. The inhibitor is stabilized by additional hydrogen bonds between its main chain and the residues forming the catalytic cleft of the enzyme (highly conserved hydrogen bonds with Gln19 and Gly66). Electron density is very clear for the covalent bond connecting the inhibitor and the enzyme as well as for the side chain of the phenylalanyl residue. The N-terminal part of the inhibitor, the benzyloxycarbonyl group (Z), has no visible electron density in either of the crystal forms.
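The comparison between the Phe and Val side chains rests on the closest interatomic distances to the residues lining the S2 pocket. The following is a minimal sketch of such a check. The coordinates are invented, and the 4.0 Å cutoff is an assumed threshold chosen only to separate the 3.5-3.8 Å and "above 4.6 Å" ranges quoted in the text; it is not a value taken from the original analysis.

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest distance (Å) between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

CONTACT_CUTOFF = 4.0  # Å; assumed threshold, not a value from the text

# Hypothetical coordinates (Å), for illustration only
pocket_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]    # stand-in for S2 side-chain atoms
phe_ring = [(0.0, 3.6, 0.0), (1.5, 3.7, 0.0)]        # stand-in for a P2 Phe ring
val_side_chain = [(0.0, 4.8, 0.0), (1.5, 5.0, 0.0)]  # stand-in for a P2 Val side chain

for name, atoms in (("Phe", phe_ring), ("Val", val_side_chain)):
    d = min_distance(pocket_atoms, atoms)
    label = "contact" if d <= CONTACT_CUTOFF else "no contact"
    print(name, round(d, 2), label)
```

A single distance cutoff is a crude criterion; it is used here only to make the geometric argument concrete.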
Based on the above structures and the structures of other papain-inhibitor complexes, one can conclude that, in covalent papain-inhibitor complexes, hydrophobicity of the P2 residue is not sufficient for productive binding of the inhibitor in the S2 pocket and that its bulkiness is equally important. This does not preclude, however, that even a smaller residue, like valine, may be effective during recognition and docking prior to the formation of the covalent link.

The enzyme is shown as a space-filling model viewed into the catalytic cleft from the outside. The inhibitor (green) is seen in the cleft in its fully extended conformation, with the Z group at the top and the "oxirane" moiety, now opened and covalently linked to the enzyme's Cys25 Sγ atom (yellow), at the bottom.
The initial structures of the papain-ligand complexes were subjected to constrained simulated annealing [102], which enables simulation at a very high, physically unrealistic temperature. The additional kinetic energy enhanced the ability of the system to explore the energy surface and to avoid becoming trapped in energetically unfavourable local minima. Afterwards, the systems were subjected to 230 ps of unconstrained molecular dynamics at 300 K.
Time-averaged residue-based deviations, plotted as a function of residue number for all molecular dynamics runs (Fig. 6), showed changes of up to 5 Å for some residues (C-terminus), but the overall Cα mobilities oscillated about 1 Å for InhA and 2 Å for InhB. The average structures displayed similarly significant changes in some flexible loops of the enzyme. It should be stressed that the protein structures reproduced very well the mobility pattern typical of the whole molecule as represented by the atomic displacement parameters (temperature factors) in the crystal structure of the papain-E-64c complex [100]. This result validated the use of the AMBER 5.0 force field as a suitable tool for scanning the conformational space of both ligands in the catalytic cleft of papain.
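A time-averaged residue-based deviation of this kind can be sketched as the mean distance of each residue from its reference position, averaged over the trajectory frames. The following is a minimal illustration under stated assumptions: one representative point per residue (for example the Cα atom), frames already superposed on the reference, and invented coordinates. The exact definition used for Fig. 6 is not given in this section, so the function below should be read as one plausible form, not as the authors' procedure.

```python
import math

def per_residue_deviation(reference, frames):
    """Time-averaged distance (Å) of each residue from its reference position.

    reference: list of (x, y, z), one point per residue (e.g. a Cα atom)
    frames:    non-empty list of snapshots with the same layout as `reference`
    Assumption: all frames are already superposed on the reference.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_res = len(reference)
    sums = [0.0] * n_res
    for frame in frames:
        if len(frame) != n_res:
            raise ValueError("frame has a different number of residues")
        for i, (ref, pos) in enumerate(zip(reference, frame)):
            sums[i] += math.dist(ref, pos)
    return [s / len(frames) for s in sums]

# Hypothetical two-residue, two-frame trajectory (Å), for illustration only
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
frames = [
    [(1.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (3.8, 2.0, 0.0)],
]

print(per_residue_deviation(reference, frames))
```

Plotting the returned list against residue number gives a profile of the kind described for Fig. 6, in which flexible loops and chain termini stand out as peaks.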
Detailed atomic-level analysis of the mobilities of the inhibitor backbones reveals that the scatter of the relaxed positions of the residues increases steeply towards the N-terminus of the inhibitors (Fig. 7). Thus, the catalytic pocket S3, as defined by the pioneering studies of Schechter & Berger [6], appears rather elusive in view of the inhibitor flexibility evident from the molecular dynamics simulations and from the experimentally determined structures of papain-inhibitor complexes [58,99,103,109,110]. The location and definition of the substrate binding site S4 is even more questionable.

Figure 4. Conformational changes of cystatin C monitored by CD and tryptophan fluorescence analysis. The fluorescence intensity was measured at 360 nm using an excitation wavelength of 295 nm. The ordinate shows the fraction of the native state calculated according to the equation f_N = (x - x_D)/(x_N - x_D), where x is the value of the spectroscopic parameter (ellipticity or fluorescence intensity) and x_N and x_D are the values for the native and denatured states, respectively.
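The fraction of the native state defined in the caption of Figure 4 is a linear interpolation between the denatured and native baselines. A minimal sketch of the calculation follows; the function name and the numerical values are hypothetical and serve only to illustrate the formula.

```python
def fraction_native(x, x_native, x_denatured):
    """Fraction of the native state, f_N = (x - x_D) / (x_N - x_D).

    x:           measured spectroscopic parameter (ellipticity or fluorescence intensity)
    x_native:    value of the parameter for the native state (x_N)
    x_denatured: value of the parameter for the denatured state (x_D)
    """
    if x_native == x_denatured:
        raise ValueError("native and denatured baselines must differ")
    return (x - x_denatured) / (x_native - x_denatured)

# Hypothetical ellipticity values (arbitrary units), for illustration only
x_n, x_d = -12.0, -2.0
print(fraction_native(-7.0, x_n, x_d))
```

By construction the function returns 1 for a signal equal to the native baseline and 0 for a signal equal to the denatured baseline.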

Z. Grzonka and others, 2001
Figure 6. Time-averaged residue-based deviations (mobilities) along the papain sequence during molecular dynamics (MD) runs. Symbols A1 and B1 correspond to InhA and InhB MD runs, respectively. The distribution of both quantities along the sequence is evident. It can be seen that some loops of the papain L lobe undergo high fluctuations during molecular dynamics runs.