
Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana

Abstract
  • In this article we show how dichotomic classes, binary variables naturally derived from a new mathematical model of the genetic code, can be used to characterize different parts of the genome. In particular, we analyze and compare different parts of the whole chromosome 1 of Arabidopsis thaliana: genes, exons, introns, coding sequences (CDS), intergenes, untranslated regions (UTR) and regulatory sequences. To this end, we encode each sequence in the three possible reading frames according to the definitions of the dichotomic classes (parity, Rumer and hidden) and then perform a statistical analysis of the resulting binary sequences. The results show that coding and non-coding sequences have different patterns and proportions of dichotomic classes. This suggests that the reading frame matters only for coding sequences and that dichotomic classes can help to recognize them. Moreover, these patterns appear more pronounced in CDS than in exons. We also derive an independence test to assess whether the observed percentages could be regarded as the outcome of independent random processes. The results confirm that only genes, exons and CDS seem to possess a dependence structure that distinguishes them from i.i.d. sequences. This informational content is independent of the global proportion of nucleotides in a sequence. The present work confirms that the recent mathematical model of the genetic code is a new paradigm for understanding the management and organization of genetic information, and an innovative tool for investigating informational aspects of the error detection/correction mechanisms acting at the level of DNA replication.
    Mathematics Subject Classification: 92B05, 92D20, 62P10.
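
The encoding step described in the abstract can be illustrated with a minimal Python sketch. It covers only the Rumer class, under an assumption that is not spelled out on this page: that a codon's Rumer class is fixed by its first two bases, set to 1 when that dinucleotide is one of the eight four-fold degenerate "roots" of the standard genetic code and 0 otherwise. Which class is labelled 1 is an arbitrary choice. The function names `rumer_bits` and `class_proportions` are hypothetical, and the parity and hidden classes and the authors' independence test are not reproduced.

```python
# Assumption: the eight dinucleotides that, in the standard genetic code,
# determine the amino acid regardless of the third base (degeneracy 4).
STRONG_ROOTS = {"AC", "CC", "CG", "CT", "GC", "GG", "GT", "TC"}


def rumer_bits(seq, frame):
    """Encode a DNA string as Rumer-class bits in reading frame 0, 1 or 2.

    Codons containing symbols other than A, C, G, T are skipped.
    The 1/0 labelling is a convention chosen for this sketch.
    """
    seq = seq.upper()
    bits = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            bits.append(1 if codon[:2] in STRONG_ROOTS else 0)
    return bits


def class_proportions(seq):
    """Return the proportion of 1s in each of the three reading frames."""
    props = {}
    for frame in range(3):
        bits = rumer_bits(seq, frame)
        props[frame] = sum(bits) / len(bits) if bits else float("nan")
    return props


if __name__ == "__main__":
    example = "ATGGCCTTA"  # toy sequence, not taken from the chromosome data
    for frame in range(3):
        print(frame, rumer_bits(example, frame))
    print(class_proportions(example))
```

Comparing the per-frame proportions for a coding and a non-coding sequence is one simple way to examine the frame dependence mentioned above; the statistical analysis in the article itself may differ.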


