QUARTERLY
Motifs of the caldesmon family

Seven highly conserved regions were found in caldesmon molecules from various sources by multiple sequence alignment. Their locations coincide with regions where binding sites for other proteins have been postulated. Less conserved and highly divergent regions of the sequences are described as well. These results could help in planning caldesmon gene manipulations and accelerate the precise localization of binding sites in the caldesmon molecule, and consequently help to elucidate its function in smooth muscle contraction.

Caldesmon (CaD) was first isolated from chicken gizzard almost twenty years ago (Sobue et al., 1981) as a protein that binds calmodulin in a Ca²⁺-dependent manner and interacts with F-actin in a Ca²⁺-independent manner. The colocalization of CaD with actin filaments in the cell (Lehman et al., 1989), its induction of G-actin polymerization in solution (Gałązkiewicz et al., 1985), and its inhibition of actomyosin ATPase activity, which is abolished by the Ca²⁺-calmodulin complex (see reviews: Chalovich & Pfitzer, 1997; Chalovich et al., 1998; Huber, 1997; Marston et al., 1998; Marston & Huber, 1996; Sobue & Sellers, 1991), identify CaD as the thin-filament regulatory protein of smooth muscle, the search for which lasted decades.
The CaD family consists of two subfamilies: smooth muscle CaDs of higher molecular mass (H-CaDs) and CaDs of lower molecular mass (L-CaDs), found in both muscle and non-muscle cells (Sobue & Sellers, 1991). In chicken (Haruna et al., 1993) and human (Hayashi et al., 1992), alternative splicing of a single caldesmon gene produces several protein isoforms: three in chicken and five in human cells. For other organisms, several complete or partial sequences, predominantly of non-muscle CaDs, have been determined.
Most research has been carried out on H-CaD isoforms, which are unique proteins 771 or 793 residues long (chicken and human isoforms, respectively) containing a "spacer" insertion of about 250 residues that begins around position 200 and is absent in L-CaDs. All CaDs are rich in ionizable residues (202 acidic and 188 basic in chicken H-CaD; 191 acidic and 206 basic in human H-CaD). Thus, on average, every other residue carries a charge, which makes CaD interact so readily with other biomolecules, including many proteins, that it is hard to distinguish specific from nonspecific interactions (Czuryło et al., 1991; Czuryło et al., 1997b; Marston et al., 1998; Marston & Huber, 1996). The large number of charged residues prevents the CaD molecule from folding into a globule and forces it into an extended conformation (Czuryło et al., 1997a; Czuryło et al., 1993; Mabuchi & Wang, 1991). Chicken H-CaD contains 594 polar, exposed residues, more than twice the number (274) expected for a globular protein of the same molecular mass. This favors a structure with the largest possible surface: a rod 64 nm long and 2 nm in diameter, although small globules at the ends and a kink in the middle cannot be excluded (Czuryło et al., 1997a). The most effective secondary structure element for this kind of molecule is the helix, whose content reaches 51% in H-CaD, while β-strands comprise only 9% (Czuryło et al., 1993). The 187 residues of the chicken H-CaD spacer form 52 full helical turns, probably one of the longest single helices known in proteins. The extra energy needed to stabilize it may be supplied by salt bridges between positively and negatively charged side chains at positions i and i+4 (Wang & Wang, 1996).
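The arithmetic behind these statements is easy to check. The sketch below is illustrative only and rests on stated assumptions: an ideal α-helix with 3.6 residues per turn, and, for the hypothetical i, i+4 scan, D/E counted as acidic and K/R as basic (the text does not say which residues were counted). The scan is applied to the chicken repeat consensus rather than to a real sequence, and it only flags candidate positions; whether a salt bridge actually forms depends on the structure.

```python
"""Back-of-the-envelope checks for the charge and helix figures quoted above."""

RESIDUES_PER_TURN = 3.6  # ideal alpha-helix geometry

# Assumption: D/E counted as acidic, K/R as basic (His ignored).
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def charged_fraction(acidic: int, basic: int, length: int) -> float:
    """Fraction of residues that carry a charge."""
    return (acidic + basic) / length


def helical_turns(n_residues: int) -> float:
    """Number of turns an ideal alpha-helix of n_residues would make."""
    return n_residues / RESIDUES_PER_TURN


def salt_bridge_candidates(seq: str, spacing: int = 4) -> list[tuple[int, int]]:
    """0-based (i, i + spacing) pairs whose side chains carry opposite charges."""
    pairs = []
    for i in range(len(seq) - spacing):
        a, b = seq[i], seq[i + spacing]
        if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
            pairs.append((i, i + spacing))
    return pairs


if __name__ == "__main__":
    print(round(charged_fraction(202, 188, 771), 3))  # chicken H-CaD -> 0.506
    print(round(charged_fraction(191, 206, 793), 3))  # human H-CaD -> 0.501
    print(round(helical_turns(187)))                  # chicken spacer -> 52
    print(salt_bridge_candidates("EEEKKAAEERERAKA"))  # chicken repeat consensus
```

Both charge fractions come out at about 0.5, matching "every other residue", and 187 / 3.6 ≈ 51.9 rounds to the 52 turns quoted.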
The unique shape of the H-CaD molecule distorts the results of secondary structure prediction methods, which rely mostly on information from globular proteins, where secondary structure elements interact with one another; such interactions are impossible in CaD. Of the eight most recent prediction methods publicly available on the NPS@ server (Combet et al., 2000; http://pbil.ibcp.fr), only the neural-network-based PHD method (Rost et al., 1994) gives results close enough to the experimental data (Czuryło et al., 1993). For this reason, the ALB algorithm was used for the secondary structure prediction of CaDs (Czuryło et al., 1993). It was chosen because it was the only algorithm that let the user decide whether to take interactions between secondary structure elements into account or to ignore them. Up to now, no exhaustive analysis of multiple sequence alignments of various caldesmons has been carried out.

METHODS
The sequences of caldesmon and its fragments were retrieved from publicly available protein and gene databases. Duplicate entries were removed manually. Multiple sequence alignment was performed with CLUSTAL X (Higgins et al., 1996; Thompson et al., 1994), followed by manual editing (especially of fragment sequences) in BIOEDIT (Hall, 1999), which was also used to produce figures colored according to the CLUSTAL X protein scheme. Fragment sequences shorter than two motifs were removed from the final figures. The databases and selected records are listed in Fig. 1.
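The clean-up steps described here (dropping replicate records and discarding fragments too short to cover two motifs) can be sketched in plain Python. This is a hypothetical illustration, not the procedure actually used: the FASTA parser, the function names, the exact-substring motif test (real motifs are only over 85% identical, not identical), and the toy records and motif strings are all assumptions. The alignment itself was done in CLUSTAL X and BIOEDIT.

```python
"""Hypothetical pre-alignment clean-up: remove replicates, drop short fragments."""


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def remove_replicates(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first record of each identical sequence."""
    seen: set[str] = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


def covers_motifs(seq: str, motifs: list[str], minimum: int = 2) -> bool:
    """True if at least `minimum` motifs occur verbatim (simplifying assumption)."""
    return sum(m in seq.upper() for m in motifs) >= minimum


def clean(text: str, motifs: list[str]) -> list[tuple[str, str]]:
    """Deduplicate, then keep only records covering at least two motifs."""
    return [(n, s) for n, s in remove_replicates(parse_fasta(text))
            if covers_motifs(s, motifs)]


# Hypothetical toy data, not real CaD sequences or motifs.
MOTIFS = ["KKEEAA", "PPLLGG"]
DEMO = ">full\nMKKEEAASTPPLLGGW\n>full_dup\nMKKEEAASTPPLLGGW\n>frag\nSTPPLLGGW\n"

if __name__ == "__main__":
    print(clean(DEMO, MOTIFS))
```

On the toy input, the duplicate record and the fragment covering a single motif are both discarded.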

RESULTS AND DISCUSSION
Analysis of the human H-CaD sequence revealed that its spacer contains a previously undetected elevenfold repeat of 16 residues with the consensus motif EERqRiKxExEEKrAA, where lower-case letters denote deletions or non-conservative replacements and x stands for any residue (Fig. 1a). Re-examination of the chicken H-CaD sequence showed that its spacer also contains eleven 15-residue repeats, not ten as reported previously (Hayashi et al., 1989; Wang et al., 1991a), with the general motif EEEKKAAEER(ER)AKA; in five of the repeats the residues in parentheses are deleted (Fig. 1a). Note that the human repeats are less conserved than the chicken ones.
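The consensus notation lends itself to a simple pattern search. The sketch below assumes one possible mapping onto regular expressions (an interpretation for illustration, not a published tool): upper-case letters are literal residues, x is any single residue, other lower-case letters are "any residue or a deletion", and a parenthesized group is optional. Because ".?" is permissive, such a pattern over-matches; it only shows how the repeats could be located. The test sequences are hypothetical, not real CaD data.

```python
"""Scan a sequence for repeats written in the consensus notation above."""
import re


def motif_to_regex(motif: str) -> str:
    """Translate consensus notation into a regular expression (assumed mapping)."""
    parts = []
    for ch in motif:
        if ch == "(":
            parts.append("(?:")   # start of residues deleted in some repeats
        elif ch == ")":
            parts.append(")?")    # the whole group may be absent
        elif ch == "x":
            parts.append(".")     # any single residue
        elif ch.islower():
            parts.append(".?")    # non-conservative replacement or deletion
        else:
            parts.append(ch)      # conserved residue
    return "".join(parts)


def find_repeats(motif: str, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "EEEKKAAEERERAKA" + "EEEKKAAEERAKA"  # hypothetical two-repeat stretch
    print(find_repeats("EEEKKAAEER(ER)AKA", toy))
```

The toy stretch yields one full-length repeat and one repeat lacking the optional ER pair, mirroring the two repeat variants described for chicken H-CaD.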
The properties common to all CaDs are probably due to conserved regions that form crucial structural and functional sites. Two strongly conserved (over 85% identity), sometimes fully identical motifs, long enough to be functional, can be distinguished in the N-terminal part of all known CaD sequences (Fig. 1b). We refer to them as the N1 and N2 motifs. The N1 motif, starting at position 27 of both human and chicken H-CaD, is identical to 27 residues of the IK29C peptide (Lee et al., 2000; Li et al., 2000), which has been proposed as the myosin-binding site (Wang et al., 1997). The IK29C peptide has two extra residues at its N-terminus and starts at position 25 of human H-CaD; there is no evidence that these two residues are important for myosin binding. A strongly helical structure covering almost half of the residues was predicted for the middle part of the N1 motif (Czuryło et al., 1993); two-thirds of its residues are charged. The N1 motif is completely identical in all CaD sequences. In the N2 motif, clusters of charged residues (20) alternate with clusters of hydrophobic residues (11). Helices were predicted at both ends of the N2 motif, whereas no structure was predicted for its middle part (Czuryło et al., 1993) (Fig. 1b). No function can yet be assigned to this motif, although it might form the N-terminal calmodulin-binding site (Lee et al., 2000; Wang, 1988) or the N-terminal tropomyosin-binding site (Marston et al., 1998; Marston & Huber, 1996).
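The "over 85% identity" criterion can be made concrete. Percent identity has several definitions, and the text does not state which was used; the sketch below assumes one common choice (identical residues divided by aligned columns without a gap in either row) and, as a further assumption, calls a motif highly conserved only if every pair of rows exceeds the threshold. The rows are toy strings, not CaD data.

```python
"""Pairwise percent identity over aligned columns, and an 85% conservation test."""
from itertools import combinations

GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Identity (%) of two aligned rows, ignoring columns with a gap in either row."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def is_highly_conserved(rows: list[str], threshold: float = 85.0) -> bool:
    """True if every pair of aligned rows exceeds the identity threshold."""
    return all(percent_identity(a, b) > threshold
               for a, b in combinations(rows, 2))


if __name__ == "__main__":
    block = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKL"]  # hypothetical motif block
    print(percent_identity(block[0], block[1]))
    print(is_highly_conserved(block))
```

Requiring every pair to pass is stricter than averaging identities over the block; which convention fits a given alignment is a judgment call.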
The C-terminal part harbors five motifs (C1 to C5) with highly conserved or even identical sequences (Fig. 1c), arguing for the existence of conserved, fundamental …
Besides the seven highly conserved motifs (over 85% identity), the CaD molecule contains some less conserved regions, e.g., the sequence between the N1 and N2 motifs or between P458(475) and V473(F490) in the chicken (human) protein. These parts of the protein could be regarded as motifs of lower homology (about 60% identity over sufficiently long segments).
There are also regions of major differences, for example, the region between G496(S511) and E511(552), where avian CaDs have 26 deletions compared with mammalian CaDs. One instance of such a difference is the characteristic 18-residue sequence (534)KRLEELRRRRGETESEEF(551) just before the C1 motif, present in all CaD isoforms except the avian ones, in which it is replaced by a single residue, S510. Similarly, the 10-residue sequence (81)TTTTNTQVEG(90) in human CaDs is replaced by (81)RST(83) in chicken CaDs. In general, the C-terminal end of the N-terminal part and the N-terminal end of the C-terminal part are the less conserved regions of the molecule.
As shown schematically in Fig. 2, the distribution of all previously suggested binding sites largely coincides with the proposed CaD motifs. This correspondence may be useful in further detailed studies of CaD interactions with other proteins and, hence, of CaD function in smooth muscle contraction.
I am indebted to Dr. Patrick Groves for critical reading of the manuscript and helpful discussion, and to Dr. Natalia Kulikova for valuable help in preparing the figures.

Figure 1. Multiple alignment of the conserved motifs of caldesmons found in the publicly available sequence databases. (a) Alignment of chicken and human H-CaD repeats. (b) Conserved motifs found in the N-terminal part of caldesmons. (c) Conserved motifs found in the C-terminal part of caldesmons. The CaD isoform and the abbreviated names of the organism and organ from which the protein was prepared are listed on the left of the alignment; the database and the record in it are given on the right. The abbreviations are: H_CHI_GIZZ, chicken gizzard H-CaD; H_CHI_OVID, chicken oviduct H-CaD; L_CHI_GIZZ, chicken gizzard L-CaD; L_CHI_BRAI, chicken brain L-CaD; H_HUM_AORT, human aorta H-CaD; L_HU_WI38a, human aorta L-CaD I; L_HU_WI38b, human aorta L-CaD II; L_HU_HeLa1, human HeLa cell L-CaD I; L_HU_HeLa2, human HeLa cell L-CaD II; L_RABB_FIB, rabbit fibroblast L-CaD; L_RAT_LIVE, rat liver L-CaD; HU_STMC_FR, fragment of human stomach CaD (unknown isoform); L_BOV_EPIT, fragment of adult bovine oviduct epithelium L-CaD; L_MOUS_LIV, fragment of mouse liver mlia L-CaD; L_MOUS_EMB, fragment of mouse embryo mewa L-CaD; L_RAT_OVAR, fragment of rat ovary L-CaD; H_RABB_FRA, fragment of rabbit smooth muscle H-CaD; H_TURK_GIZ, fragment of turkey gizzard H-CaD; MOU_EMB_FR, fragment of mouse total embryo CaD (unknown isoform); and HU_SPLN_FR, fragment of human fetal liver and spleen CaD (unknown isoform). Sequences of protein fragments were included in the alignment procedure if their length exceeded two motifs. Secondary structure prediction (Czuryło et al., 1993) shows helices as H/h (high and average probability, respectively), β-strands as B, loops and β-turns as t, and other structures as *. Symbols above the sequences mark each position: (#), identical residues; (+), replacement by a residue of the same class with small surface and structural alterations; (:), replacement by a residue of the same class with pronounced surface and structural effects; (×), replacement by a residue of another class. The residues are colored according to the CLUSTAL X protein scheme.

Figure 2. Comparison of the distribution of motifs in chicken H-caldesmon (a) with its binding sites (b) for myosin (MY), calmodulin (CaM), tropomyosin (TM) and actin (Ac). In (b), position numbers are given for the beginnings and ends of the binding sites; for the beginnings and ends of the motifs, see the first sequence of the alignments in Fig. 1b, c.