Driven by the applications in DNA storage, codes capable of correcting errors in DNA sequences have attracted considerable interest. This paper concentrates on a scenario in which sequences are affected by errors known as reverse-complement duplications. More precisely, we construct error-correcting codes for channel where a single reverse-complement duplication of arbitrary length may happen. In general, handling reverse-complement duplications of variable (non-fixed) lengths involves combinatorial challenges, and our construction is the first code capable of correcting such errors of arbitrary lengths. By investigating some special combinatorial structures of the errors, we propose an explicit code construction using duplication-free sequences. Furthermore, we show that as the alphabet size $ q $ varies as an increasing function of $ n $, the asymptotic rate is 1.
| Citation: |
| [1] |
E. Ben-Tolila and M. Schwartz, On the reverse-complement sequence-duplication system, IEEE Trans. Inform. Theory, 68 (2022), 7184-7197.
doi: 10.1109/TIT.2022.3182873.
|
| [2] |
G. M. Church, Y. Gao and S. Kosuri, Next-generation digital information storage in DNA, Science, 337 (2012), 1628.
doi: 10.1126/science.1226355.
|
| [3] |
O. Elishco, On the long-term behavior of k-tuples frequencies in mutation systems, IEEE Trans. Inform. Theory, 70 (2024), 8524-8545.
|
| [4] |
O. Elishco, F. Farnoud, M. Schwartz and J. Bruck, The entropy rate of some Pólya sequence models, IEEE Trans. Inform. Theory, 65 (2019), 8180-8193.
doi: 10.1109/TIT.2019.2936556.
|
| [5] |
F. Farnoud, M. Schwartz and J. Bruck, The capacity of sequence-duplication systems, IEEE Trans. Inform. Theory, 62 (2016), 811-824.
doi: 10.1109/TIT.2015.2505735.
|
| [6] |
F. Farnoud, M. Schwartz and J. Bruck, Estimation of duplication history under a stochastic model for tandem repeats, BMC Bioinformatics, 20 (2019), article number 64, 11pp.
doi: 10.1186/s12859-019-2603-1.
|
| [7] |
R. Gabrys and F. Sala, Codes correcting two deletions, IEEE Trans. Inform. Theory, 65 (2019), 965-974.
doi: 10.1109/TIT.2018.2876281.
|
| [8] |
N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E.M. LeProust, B. Sipos and E. Birney, Towards practical, high-capacity, low-maintenance information storage in synthesized DNA, Nature, 494 (2013), 77-80.
|
| [9] |
D. Goshkoder, N. Polyanskii and I. Vorobyev, Codes correcting long duplication errors, IEEE Trans. Molecular, Biological, Multi-Scale Commun., 10 (2024), 272-288.
doi: 10.1109/TMBMC.2024.3403755.
|
| [10] |
A. S. Helberg and H. C. Ferreira, On multiple insertion/deletion correcting codes, IEEE Trans. Inform. Theory, 48 (2002), 305-308.
doi: 10.1109/18.971760.
|
| [11] |
S. Jain, F. Farnoud and J. Bruck, Capacity and expressiveness of genomic tandem duplication, IEEE Trans. Inform. Theory, 63 (2017), 6129-6138.
doi: 10.1109/TIT.2017.2728079.
|
| [12] |
S. Jain, F. Farnoud, M. Schwartz and J. Bruck, Duplication-correcting codes for data storage in the DNA of living organisms, IEEE Trans. Inform. Theory, 63 (2017), 4996-5010.
doi: 10.1109/TIT.2017.2688361.
|
| [13] |
M. Kovačević, Zero-error capacity of duplication channels, IEEE Trans. Commun., 67 (2019), 6735-6742.
doi: 10.1109/TCOMM.2019.2931342.
|
| [14] |
A. Lenz, A. Wachter-Zeh and E. Yaakobi, Duplication-correcting codes, Des. Codes Cryptogr., 87 (2019), 277-298.
doi: 10.1007/s10623-018-0523-0.
|
| [15] |
V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady, 10 (1966), 707-710.
|
| [16] |
S. Liu, I. Tjuawinata and C. Xing, Explicit construction of $q$-ary 2-deletion correcting codes with low redundancy, IEEE Trans. Inform. Theory, 70 (2024), 4093-4101.
doi: 10.1109/TIT.2024.3360964.
|
| [17] |
S. Liu and C. Xing, Bounds and constructions for insertion and deletion codes, IEEE Trans. Inform. Theory, 69 (2023), 928-940.
doi: 10.1109/TIT.2022.3199503.
|
| [18] |
Z. Lu and Y. Zhang, $t$-deletion-$s$-insertion-burst correcting codes, IEEE Trans. Inform. Theory, 69 (2023), 6401-6413.
doi: 10.1109/TIT.2023.3289055.
|
| [19] |
N. I. Mundy and A. J. Helbig, Origin and evolution of tandem repeats in the mitochondrial DNA control region of shrikes, J. Molecular Evolution, 59 (2004), 250-257.
doi: 10.1007/s00239-004-2619-6.
|
| [20] |
T. Saeki and T. Nozaki, An improvement of non-binary code correcting single $b$-burst of insertions or deletions, Proc. Int. Symp. Inform. Theory Appl. (ISITA), (2018), 6-10.
|
| [21] |
C. Schoeny, A. Wachter-Zeh, R. Gabrys and E. Yaakobi, Codes correcting a burst of deletions or insertions, IEEE Trans. Inform. Theory, 63 (2017), 1971-1985.
doi: 10.1109/TIT.2017.2661747.
|
| [22] |
S. L. Shipman, J. Nivala, J. D. Macklis and G. M. Church, CRISPR-Cas encoding of digital movie into the genomes of a population of living bacteria, Nature, 547 (2017), 345-349.
doi: 10.1038/nature23017.
|
| [23] |
J. Sima, N. Raviv, M. Schwartz and J. Bruck, Error Correction for DNA Storage, IEEE BITS Inform. Theory Mag., 3 (2023), 78-94.
doi: 10.1109/MBITS.2023.3318516.
|
| [24] |
W. Song and K. Cai, Non-binary two-deletion correcting codes and burst-deletion correcting codes, IEEE Trans. Inform. Theory, 69 (2023), 6470-6484.
doi: 10.1109/TIT.2023.3285012.
|
| [25] |
Y. Sun, Z. Lu, Y. Zhang and G. Ge, Asymptotically Optimal Codes for $(t, s)$-Burst Error, IEEE Trans. Inform. Theory, 71 (2025), 1570-1584.
doi: 10.1109/TIT.2025.3531915.
|
| [26] |
Y. Tang and F. Farnoud, Error-correcting codes for short tandem duplication and edit errors, IEEE Trans. Inform. Theory, 68 (2022), 871-880.
doi: 10.1109/TIT.2021.3125724.
|
| [27] |
Y. Tang, S. Wang, H. Lou, R. Gabrys and F. Farnoud, Low-redundancy codes for correcting multiple short-duplication and edit errors, IEEE Trans. Inform. Theory, 69 (2023), 2940-2954.
doi: 10.1109/TIT.2022.3233733.
|
| [28] |
Y. Tang, Y. Yehezkeally, M. Schwartz and F. Farnoud, Single-error detection and correction for duplication and substitution channels, IEEE Trans. Inform. Theory, 66 (2020), 6908-6919.
doi: 10.1109/TIT.2020.3006228.
|
| [29] |
G. Tenengolts, Nonbinary codes, correcting single deletion or insertion, IEEE Trans. Inform. Theory, 30 (1984), 766-769.
doi: 10.1109/TIT.1984.1056962.
|
| [30] |
L. Yohananov and M. Schwartz, On the coding capacity of reverse-complement and palindromic duplication-correcting codes, Des. Codes Cryptogr., 93 (2025), 3283-3302.
doi: 10.1007/s10623-025-01627-7.
|
| [31] |
W. Yu and M. Schwartz, On duplication-free codes for disjoint or equal-length errors, Des. Codes Cryptogr., 92 (2024), 2845-2861.
doi: 10.1007/s10623-024-01417-7.
|
| [32] |
M. Zeraatpisheh, M. Esmaeili and T. A. Gulliver, Construction of tandem duplication correcting codes, IET Commun., 13 (2019), 2217-2225.
doi: 10.1049/iet-com.2018.6053.
|
| [33] |
M. Zeraatpisheh, M. Esmaeili and T. A. Gulliver, Construction of duplication correcting codes, IEEE Access, 8 (2020), 96150-96161.
doi: 10.1109/ACCESS.2020.2995812.
|