Croatica Chemica Acta, Vol. 73 No. 4, 2000.
Original scientific paper
Universal Metric Properties of the Genetic Code
Nikola Štambuk
Rudjer Bošković Institute, P. O. Box 180, HR-10002 Zagreb, Croatia
Abstract
Universal metric properties of the genetic code (i.e. RNA, DNA and protein coding) are defined by means of the nucleotide base representation on the square with vertices U or T = 00, C = 01, G = 10 and A = 11. It is shown that this notation defines the Cantor set and Smale horseshoe map representation of the genetic code, the classic table arrangement and the Siemion one-step mutation ring of the code. Gray code solutions to the problem of defining codon positions on the [0, 1] interval, and an extension to the octal coding system, based on the linear block triple check code, are given. This result enables short block (word) decoding of the genetic code patterns. The block code is related to the minimization of errors during transcription and translation processes, which implies that the genetic code is error-correcting and not degenerate. Two algorithms for the representation of codons on the [0, 1] interval and the related binary trees are discussed. It is concluded that the ternary Cantor set algorithm is the method of choice for this type of analysis and coding. This procedure enables the analysis of the six-dimensional hypercube codon positions by means of a simple time series and/or 'logistic' difference equation. Finally, a unified concept of the genetic code linked to the Cantor set and horseshoe map is introduced in the form of a classic combinatorial four-colour necklace model with three horizontal frames consisting of 64 coloured pearls (bases) and vertically hanging decorations of triplets (codons). The three horizontal necklace frames define Crick’s code without comma, and the vertical necklace decorations define the evolutional code. Thus, the type of the code depends on the level or direction of observation. The exact location of the mRNA and complementary DNA coding groups of triplets within a frame is determined. The latter enables decoding of long code block (language) patterns within the genetic code.
This method of genetic code analysis is named the Symbolic Cantor Algorithm (SCA). The validity of the method was confirmed by 94% accurate classification of 50 proteins of known secondary structure (25 α-helices and 25 β-sheets) with the C5.0 machine learning system. Nucleotide strings of proteins transcribed by SCA were used for the analysis. Spectral Fourier analysis of Pro-opiomelanocortin and Bone Morphogenetic Protein 6 confirmed that the method might also be applied to the analysis of bioactive hormone and cytokine sequences.
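As a reading aid, the base notation and the placement of codons on the [0, 1] interval can be sketched in a few lines of Python. The base-to-bit table (U or T = 00, C = 01, G = 10, A = 11) is taken from the abstract. Everything else is an assumption made for illustration: the function names are hypothetical, and the mapping shown (each binary digit 0 or 1 becomes the ternary digit 0 or 2) is only the textbook construction of a Cantor set point, which may differ in detail from the algorithm defined in the paper.

```python
from fractions import Fraction

# Base notation stated in the abstract: vertices of the unit square.
BASE_BITS = {"U": "00", "T": "00", "C": "01", "G": "10", "A": "11"}


def codon_to_bits(codon: str) -> str:
    """Return the 6-bit string of a codon (a vertex of the 6-dimensional hypercube)."""
    return "".join(BASE_BITS[base] for base in codon.upper())


def bits_to_cantor(bits: str) -> Fraction:
    """Assumed mapping: binary digit b at position i contributes 2*b / 3**(i+1).

    The result is a point of the ternary Cantor set in [0, 1], because its
    base-3 expansion uses only the digits 0 and 2.
    """
    return sum(
        (Fraction(2 * int(b), 3 ** (i + 1)) for i, b in enumerate(bits)),
        Fraction(0),
    )


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for codon in ("UUU", "AUG", "AAA"):
        bits = codon_to_bits(codon)
        print(codon, bits, bits_to_cantor(bits))
```

Under these assumptions UUU maps to 0, AUG (bits 110010) maps to 218/243, and AAA maps to 728/729. Exact fractions are used instead of floats so that the ternary digits remain inspectable. The `hamming` helper illustrates the Gray code property of the square: walking the vertices in the order U, C, A, G changes exactly one bit at each step.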
Keywords
Cantor set; symbolic dynamics; SCA; Gray code; genetic code; necklace; protein; secondary structure; C5.0; machine learning; spectral analysis
Hrčak ID: 131993
Publication date: 4.12.2000.