2013, 10(1): 199-219. doi: 10.3934/mbe.2013.10.199

Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana

1. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

2. CNR-IMM, UOS di Bologna, Via Gobetti 101, 40129 Bologna, Italy

3. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

Received May 2012; Revised September 2012; Published December 2012

In this article we show how dichotomic classes, binary variables naturally derived from a new mathematical model of the genetic code, can be used to characterize different parts of the genome. In particular, we analyze and compare different parts of the whole chromosome 1 of Arabidopsis thaliana: genes, exons, introns, coding sequences (CDS), intergenes, untranslated regions (UTR) and regulatory sequences. To accomplish this task, we encode each sequence in the 3 possible reading frames according to the definitions of the dichotomic classes (parity, Rumer and hidden). We then perform a statistical analysis on the resulting binary sequences. Interestingly, the results show that coding and non-coding sequences have different patterns and proportions of dichotomic classes. This suggests that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Moreover, such patterns seem to be more pronounced in CDS than in exons. We also derive an independence test to assess whether the observed percentages could be considered an expression of independent random processes. The results confirm that only genes, exons and CDS seem to possess a dependence structure that distinguishes them from i.i.d. sequences. Such informational content is independent of the global proportion of nucleotides of a sequence. The present work confirms that the recent mathematical model of the genetic code is a new paradigm for understanding the management and the organization of genetic information and is an innovative tool for investigating informational aspects of error detection/correction mechanisms acting at the level of DNA replication.
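The encoding step described in the abstract can be illustrated with a minimal Python sketch. It only shows the mechanics of reading a sequence in the 3 frames and turning each codon into a bit. The abstract does not give the definitions of the parity, Rumer and hidden classes, so the classifier `placeholder_class` is a hypothetical stand-in (third base purine or not) and is not one of the paper's dichotomic classes. The function names `encode_frame` and `frame_proportions` are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List

PURINES = {"A", "G"}


def placeholder_class(codon: str) -> int:
    """Hypothetical stand-in classifier, NOT a dichotomic class from the paper.

    Returns 1 if the third base of the codon is a purine (A or G), else 0.
    Replace it with a real codon-to-bit rule to use the paper's classes.
    """
    return 1 if codon[2] in PURINES else 0


def encode_frame(seq: str, frame: int,
                 classify: Callable[[str], int]) -> List[int]:
    """Map a nucleotide sequence to a binary sequence in one reading frame.

    The sequence is cut into non-overlapping codons starting at offset
    `frame` (0, 1 or 2); an incomplete trailing codon is discarded.
    """
    seq = seq.upper()
    return [classify(seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]


def frame_proportions(seq: str,
                      classify: Callable[[str], int]) -> Dict[int, float]:
    """Proportion of codons mapped to 1 in each of the 3 reading frames."""
    result = {}
    for frame in range(3):
        bits = encode_frame(seq, frame, classify)
        result[frame] = sum(bits) / len(bits) if bits else float("nan")
    return result


if __name__ == "__main__":
    example = "ATGGCATTC"  # toy sequence, not taken from A. thaliana
    for frame in range(3):
        print(frame, encode_frame(example, frame, placeholder_class))
    print(frame_proportions(example, placeholder_class))
```

Comparing the per-frame proportions of a sequence is the kind of summary on which the frame dependence mentioned in the abstract could be examined; the independence test derived in the paper is not reproduced here.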
Citation: Enrico Properzi, Simone Giannerini, Diego Luis Gonzalez, Rodolfo Rosa. Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana. Mathematical Biosciences & Engineering, 2013, 10 (1) : 199-219. doi: 10.3934/mbe.2013.10.199
