The genetic code — 40 years on

The genetic code, discovered 40 years ago, consists of 64 triplets (codons) of nucleotides. The code is almost universal: the same codons are assigned to the same amino acids and to the same START and STOP signals in the vast majority of genes in animals, plants and microorganisms. Apart from the three STOP codons, each codon encodes one of the 20 amino acids used in protein synthesis. As a result, the code is redundant, and most amino acids are encoded by more than one codon. Two cases have been found in which an amino acid outside the standard 20, selenocysteine or pyrrolysine, is inserted by a specific tRNA into the growing polypeptide.


INTRODUCTION
There are few achievements in the short history of molecular biology that have had such a profound impact on the advancement of science and are, at the same time, so strongly imprinted in public perception. Undoubtedly, one such event was the deciphering of the genetic code.
In the 1940s it was demonstrated for the first time that, contrary to previous beliefs, deoxyribonucleic acid (DNA), and not protein, is responsible for the transmission of genetic information through the generations. The nature of that process became apparent after the discovery of the DNA structure in 1953 (Watson & Crick, 1953), which in turn brought new challenges. The key question was how the sequence of four nucleotides in DNA is translated into sequences of twenty amino acids in proteins and subsequently into functional protein folds (Woese, 2001). Much of the theoretical work on the nature of the genetic code was due to George Gamow, Francis Crick, Leslie Orgel, James Watson, Alexander Rich and other members of the exclusive Tie Club (Rich et al., 2004).
The breakthrough, however, came in 1961, when the first experiments in an Escherichia coli cell-free system demonstrated poly(U)-programmed synthesis of polyphenylalanine. It had been assumed earlier that the genetic code was composed of nucleotide triplets; thus, the first word of the code, UUU, was deciphered as encoding phenylalanine (Nirenberg et al., 1962). The code had been cracked. Subsequent work over a period of about five years (1961-1966) led to the assignment of all triplets to amino acids or termination signals (Nirenberg et al., 1966; Nirenberg, 2004). The genetic code is nearly universal across all life forms and, with a few exceptions, unambiguous. Its importance has been compared to that of the periodic table of the elements for chemistry, and scientists who contributed to its deciphering were awarded the Nobel Prize in Physiology or Medicine in 1968.
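The reasoning behind the poly(U) experiment can be illustrated with a minimal Python sketch. It assumes the standard codon assignments, written here as the 64-letter string used for NCBI translation table 1 (one-letter amino acid codes, '*' for STOP), and reads a message in consecutive, non-overlapping triplets from the first base. The function name `translate` is hypothetical; the sketch shows only why a poly(U) message can yield nothing but phenylalanine, not how the cell-free system itself works.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1): one-letter amino acid codes for the 64
# codons in UCAG order at the first, second and third positions; '*' = STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# product() varies the last base fastest, matching the order of AMINO_ACIDS.
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read mRNA in non-overlapping triplets from the first base until a STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = STANDARD_CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


poly_u = "U" * 30          # synthetic poly(U) message: ten UUU triplets
print(translate(poly_u))   # FFFFFFFFFF (polyphenylalanine)
```

Because every triplet in poly(U) is UUU, the only product is polyphenylalanine; a message such as AUGUUUUAA, by contrast, gives MF before the UAA stop.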

EXCEPTIONS
Initially, it was believed that the genetic code used by all present-day organisms was universal and invariant, because any change in the meaning of a codon would result in erroneous protein sequences. This concept of a 'frozen accident' was later revised following the discovery of alternative genetic codes that differ slightly from the established standard (Knight et al., 2001). These deviations are limited to the nuclear codes of certain taxonomic groups and to mitochondria (Fig. 1). Interestingly, the same codon reassignments often recur in different groups of organisms. Mitochondria, which have much of their own protein translation machinery, often decode AUA as methionine rather than isoleucine (for example in vertebrates and yeast). In Mycoplasma, the stop codon UGA is decoded as tryptophan, in addition to the usual UGG. The reassignment of the termination codons UAA and UAG to glutamine is found in divergent groups, including diplomonads, ciliates and some green algae (Gesteland & Atkins, 1996).
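Because alternative codes are small departures from the standard table, each can be modelled as an overlay of a few reassignments. The Python sketch below is illustrative only: the variants correspond to NCBI translation tables 2 (vertebrate mitochondrial) and 4 (Mycoplasma/Spiroplasma), as cited in Fig. 1, while the names `REASSIGNMENTS`, `variant_code` and `translate` are hypothetical. AGG, which is also a terminator in table 2, is taken from the NCBI table rather than from Fig. 1.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1): one-letter amino acid codes for the 64
# codons in UCAG order at the first, second and third positions; '*' = STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical table of reassignments relative to the standard code.
REASSIGNMENTS = {
    "vertebrate_mitochondrial": {"AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"},  # table 2
    "mycoplasma_spiroplasma": {"UGA": "W"},                                          # table 4
}


def variant_code(name):
    """Return a copy of the standard code with one variant's reassignments overlaid."""
    code = dict(STANDARD_CODE)  # copy, so the standard table stays unchanged
    code.update(REASSIGNMENTS[name])
    return code


def translate(mrna, code):
    """Translate non-overlapping triplets from the first base until a STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGAUAUGAUGGUAA"  # AUG AUA UGA UGG UAA
print(translate(message, STANDARD_CODE))                             # MI
print(translate(message, variant_code("vertebrate_mitochondrial")))  # MMWW
print(translate(message, variant_code("mycoplasma_spiroplasma")))    # MIWW
```

The same message gives three different products, which illustrates why a change in codon meaning was once thought incompatible with viable protein synthesis; the overlay form also makes recurrent reassignments, such as UGA to Trp, easy to compare across codes.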
In organisms that use the standard genetic code there are also cases of alternative codon assignment (recoding), which may be due to several mechanisms: nonsense or missense suppression, ribosomal frameshifting, bypassing and natural suppression. In the last case, the genetic code can be redefined by the insertion of non-canonical amino acids at stop codons, as described below. While exceptions to the basic rules of the genetic code are of considerable biological interest, the universal features of the code indicate that the core of the protein biosynthesis apparatus evolved before the divergence of the three kingdoms of life (Knight et al., 2001; Miranda et al., 2006).

EXPANSION
The complete set of amino acids found in natural proteins is currently estimated at 140. However, the canonical set comprises only 20 amino acids, which have corresponding triplets in the genetic code and are incorporated into proteins during translation. This set is invariant in all organisms, irrespective of their evolutionary complexity and the environment in which they live. The limits of the coding capacity were probably set at the very early stages of genetic code evolution. It is assumed that the present-day triplet code evolved from a "two-letter triplet" code, in which only the first two nucleotides were actually used for coding (Szathmáry, 1999; Travers, 2006). This property is still reflected in the contemporary genetic code, in which most amino acids are encoded by groups of codons differing only at the third position (Fig. 1). Such degeneracy (redundancy) of triplet combinations seems to have been preserved because it minimizes the deleterious effects of point mutations.
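The claim that many amino acids are specified by the first two nucleotides, and that this degeneracy buffers point mutations, can be checked against the standard table. The Python sketch below (helper names hypothetical) lists the "boxes" of four codons sharing their first two bases that all encode a single amino acid, and counts, for each codon position, how many single-base substitutions in sense codons leave the encoded amino acid unchanged.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1): one-letter amino acid codes for the 64
# codons in UCAG order at the first, second and third positions; '*' = STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_boxes(code):
    """Two-letter prefixes whose four codons all encode the same amino acid."""
    boxes = []
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        meanings = {code[prefix + third] for third in BASES}
        if len(meanings) == 1 and "*" not in meanings:
            boxes.append(prefix)
    return boxes


def synonymous_changes(code, position):
    """Count substitutions at `position` (0, 1 or 2) of sense codons that keep
    the encoded amino acid; returns (synonymous, total)."""
    synonymous = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if code[mutant] == aa:
                synonymous += 1
    return synonymous, total


codon_counts = Counter(aa for aa in STANDARD_CODE.values() if aa != "*")
print(sum(1 for n in codon_counts.values() if n > 1), "of", len(codon_counts),
      "amino acids have more than one codon")
print("fourfold-degenerate boxes:", fourfold_boxes(STANDARD_CODE))
for position in range(3):
    syn, total = synonymous_changes(STANDARD_CODE, position)
    print(f"position {position + 1}: {syn}/{total} substitutions are synonymous")
```

With the standard table this reports 18 of 20 amino acids with more than one codon (only Met and Trp have a single codon), eight fourfold-degenerate boxes, and 126 of 183 third-position substitutions as synonymous, compared with 8 at the first position and none at the second. The count treats all non-synonymous changes alike and ignores how similar the substituted amino acids are.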
Traces of the evolutionary process of extending the coding capacity of the genetic code can be observed in some present-day organisms. Certain species lack asparaginyl- or glutaminyl-tRNA synthetases but possess specific tRNAs that incorporate these amino acids into proteins. The synthesis of Asn-tRNA^Asn and Gln-tRNA^Gln is accomplished in a two-step process. In the first step, tRNA^Asn and tRNA^Gln are aminoacylated with aspartate and glutamate, respectively; in the second step, the tRNA-bound acidic amino acids are converted into the corresponding amides by amidotransferases (Ibba & Söll, 2001).
There are, however, two examples of the expansion of the genetic code beyond the standard set of 20 amino acids. The two non-canonical amino acids, selenocysteine (Sec) and pyrrolysine (Pyl), can be incorporated cotranslationally into proteins at positions specified by the codons UGA and UAG, respectively, which are normally termination codons (Fig. 1).
The selenocysteine tRNA (tRNA^[Ser]Sec) is first aminoacylated with serine, which is then used as a substrate for selenocysteine synthesis. The aminoacylation step is performed by seryl-tRNA synthetase (Ambrogelly et al., 2007). In the case of pyrrolysine, the process is much simpler and involves direct aminoacylation of the suppressor tRNA^Pyl with pyrrolysine (Srinivasan et al., 2002). Thus, the expansion of coding capacity is achieved by different strategies in the two cases.
The incorporation of selenocysteine and pyrrolysine into protein sequences can be treated as an expansion of the 'standard' genetic code because both amino acids have specific cognate tRNAs that recognize specific codons. However, unlike other codons, whether in the standard or alternative genetic codes, the presence of these codons is not sufficient for their decoding as Sec or Pyl. In this respect, the two cases resemble suppression or readthrough at termination codons, which is common in both eukaryotes and prokaryotes (Cornish et al., 1995; Beier & Grimm, 2001). Selenocysteine and pyrrolysine insertion requires additional structural features in the mRNA, known as SECIS (selenocysteine insertion sequence) and PYLIS (pyrrolysine insertion sequence), which provide an appropriate context for the termination codon to be suppressed (Mix et al., 2007; Longstaff et al., 2007). Additionally, the insertion of selenocysteine strictly depends on the activity of specific translation elongation factors (Fagegaltier et al., 2000).
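The point that the codon alone does not decide between termination and Sec or Pyl insertion can be caricatured in a few lines of Python. In this deliberately simplified sketch, assumed for illustration only, a single Boolean flag per mRNA stands in for the presence of a SECIS or PYLIS element; real insertion also requires the dedicated tRNA, the specialized elongation factor for Sec, and an element in the correct position. The function and parameter names are hypothetical; U and O are the standard one-letter codes for selenocysteine and pyrrolysine.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1): one-letter amino acid codes for the 64
# codons in UCAG order at the first, second and third positions; '*' = STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SEC, PYL = "U", "O"  # one-letter codes for selenocysteine and pyrrolysine


def translate(mrna, secis=False, pylis=False):
    """Translate mRNA; `secis`/`pylis` are hypothetical flags standing in for
    the mRNA insertion elements that allow UGA/UAG to be read through."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and secis:
            protein.append(SEC)   # UGA recoded as selenocysteine
        elif codon == "UAG" and pylis:
            protein.append(PYL)   # UAG recoded as pyrrolysine
        elif STANDARD_CODE[codon] == "*":
            break                 # ordinary termination
        else:
            protein.append(STANDARD_CODE[codon])
    return "".join(protein)


print(translate("AUGUGAGCUUAA"))              # M   (UGA read as STOP)
print(translate("AUGUGAGCUUAA", secis=True))  # MUA (UGA read as Sec)
print(translate("AUGUAGGCUUAA", pylis=True))  # MOA (UAG read as Pyl)
```

Even with a flag set, the final UAA still terminates translation: in this toy model only the codon matched to its insertion element is recoded, while other stop codons keep their usual meaning.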

PERSPECTIVES
The genetic code is the basis for the template-instructed synthesis of proteins, but it cannot be considered without the components of the decoding machinery. As early as 1955, in an unpublished note written for the Tie Club, "On degenerate templates and the adaptor hypothesis", Francis Crick predicted the existence of adaptor molecules (tRNAs) that link particular codons with specific amino acids. Thus, the decoding process, which depends on complementary interactions between codons and tRNA molecules, requires a highly specific mechanism that links amino acids to their cognate tRNAs. This task is accomplished by aminoacyl-tRNA synthetases (AARS), which catalyse the specific charging of their cognate tRNAs with the corresponding amino acids. This reaction establishes the direct link between the anticodon and the activated amino acid attached to the 3' end of the tRNA. Therefore, aminoacyl-tRNA synthetases are considered to be the real interpreters (translators) of the genetic code. The determinants of the specificity of recognition between tRNAs and aminoacyl-tRNA synthetases have often been referred to as the second genetic code (Beuning & Musier-Forsyth, 1999).
On top of the genetic code there is also an epigenetic code, which constitutes part of the complex system of gene regulation (Turner, 2007). It seems likely that deciphering epigenetic features and their influence on patterns of gene expression will have an impact on our understanding of molecular systems comparable to that of the genetic code forty years ago.
Figure 1. Table of the genetic code. Exceptions to the standard code. The numbers in brackets refer to the translation tables used in the GenBank/EMBL databases:
UUA: terminator in Thraustochytrium mitochondrial code (23); initiation codon in protozoan mitochondrial code and Mycoplasma/Spiroplasma code (4).
UUG: initiation codon in standard (1), bacterial (11) and some mitochondrial codes (4, 5, 13).
UCA: terminator in Scenedesmus obliquus mitochondrial code (22).
UAA: Gln in ciliate nuclear code (6); Tyr in alternative flatworm mitochondrial code (14).
UAG: Gln in ciliate nuclear code (6, 15); Leu in Chlorophyceae and Scenedesmus mitochondrial codes (16, 22); Pyl (pyrrolysine) in Archaea (Methanosarcinaceae), decoded by Pyl-tRNA.
UGA: Trp in mitochondrial codes (2, 3, 4, 5, 9, 13, 14, 21); Cys in euplotid nuclear code (10); Sec (selenocysteine), depending on the presence of a SECIS (SElenoCysteine Insertion Sequence) element in the mRNA.
CUU: Thr in yeast mitochondrial code (3).
CUC: Thr in yeast mitochondrial code (3).
CUA: Thr in yeast mitochondrial code (3).
CUG: Thr in yeast mitochondrial code (3); in alternative yeast mitochondrial code (12); initiation codon in standard (1), bacterial (11) and some mitochondrial codes (4, 12).
AUU: initiation codon in bacterial (11) and some mitochondrial codes (2, 4, 5, 23).
AUC: initiation codon in bacterial (11) and some mitochondrial codes (2, 4, 5).
AUA: Met in mitochondrial codes of vertebrates (2), yeast (3) and some invertebrates (5, 13, 21); initiation codon in bacterial (11) and some mitochondrial codes (2, 3, 4, 5, 13).
AAA: Asn in flatworm (9, 14, 21) and echinoderm (9) mitochondrial codes.
AGA: terminator in vertebrate mitochondrial code (2); Gly in ascidian mitochondrial …
M. Szymański and J. Barciszewski