# American Institute of Mathematical Sciences

2013, 10(1): 199-219. doi: 10.3934/mbe.2013.10.199

## Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana

1. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy
2. CNR-IMM, UOS di Bologna, Via Gobetti 101, 40129 Bologna, Italy
3. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

Received May 2012; Revised September 2012; Published December 2012

In this article we show how dichotomic classes, binary variables naturally derived from a new mathematical model of the genetic code, can be used to characterize different parts of the genome. In particular, we analyze and compare different parts of the whole chromosome 1 of Arabidopsis thaliana: genes, exons, introns, coding sequences (CDS), intergenes, untranslated regions (UTR) and regulatory sequences. To do so, we encode each sequence in the three possible reading frames according to the definitions of the dichotomic classes (parity, Rumer and hidden), and then perform a statistical analysis on the resulting binary sequences. Interestingly, the results show that coding and non-coding sequences have different patterns and proportions of dichotomic classes. This suggests that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Moreover, such patterns seem to be more pronounced in CDS than in exons. We also derive an independence test to assess whether the observed percentages could be regarded as the outcome of independent random processes. The results confirm that only genes, exons and CDS seem to possess a dependence structure that distinguishes them from i.i.d. sequences. Such informational content is independent of the global proportion of nucleotides of a sequence. The present work confirms that the recent mathematical model of the genetic code is a new paradigm for understanding the management and the organization of genetic information and is an innovative tool for investigating informational aspects of error detection/correction mechanisms acting at the level of DNA replication.
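The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the class definitions, so the sketch covers only the Rumer class and assumes it equals 1 when the first two bases of a codon form one of the eight fourfold-degenerate roots of the standard genetic code, and 0 otherwise. The names `rumer_class`, `encode_frames` and `class_proportions` are hypothetical, and the parity and hidden classes are omitted.

```python
# Assumption (not taken from the abstract): Rumer class = 1 when the first two
# bases of the codon form a fourfold-degenerate root of the standard genetic
# code (DNA alphabet, T instead of U), and 0 otherwise.
STRONG_ROOTS = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def rumer_class(codon):
    """Return the assumed Rumer class (0 or 1) of a three-letter codon."""
    return 1 if codon[:2] in STRONG_ROOTS else 0


def encode_frames(seq, classify=rumer_class):
    """Encode a DNA string into one binary sequence per reading frame (0, 1, 2)."""
    seq = seq.upper()
    frames = {}
    for offset in range(3):
        # Take only complete codons starting at this offset.
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames[offset] = [classify(c) for c in codons]
    return frames


def class_proportions(frames):
    """Proportion of 1s in each frame (NaN if a frame has no complete codon)."""
    return {
        frame: (sum(bits) / len(bits) if bits else float("nan"))
        for frame, bits in frames.items()
    }


if __name__ == "__main__":
    encoded = encode_frames("ATGGCCCTGAAA")  # toy sequence, not real data
    print(encoded)
    print(class_proportions(encoded))
```

Comparing the per-frame proportions across sequence types (for example CDS versus introns) is the kind of summary the abstract refers to. Passing a different `classify` function would allow other class definitions to be plugged in.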
Citation: Enrico Properzi, Simone Giannerini, Diego Luis Gonzalez, Rodolfo Rosa. Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana. Mathematical Biosciences & Engineering, 2013, 10 (1) : 199-219. doi: 10.3934/mbe.2013.10.199
