
Kernelized approaches to streaming compression of scientific data

  • *Corresponding author: Benjamin P. Russo

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the work for publication, acknowledges that the US government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the submitted manuscript version of this work, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Abstract
  • In this paper, three algorithms are developed for the streaming compression of scientific data. The algorithms rely on the theory of vector-valued reproducing kernel Hilbert spaces and operator-valued kernels. The scientific data are modeled as snapshots of a time-dependent vector field $ F(x, t) $ over a manifold $ M $, and the recovery of the data is framed as a learning problem. These processes are then appropriately modified and analyzed for the streaming scenario, in which data are generated without the ability to revisit past entries.

    Mathematics Subject Classification: 41A05, 46E22, 94A08.

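As a point of reference for the kernel-based learning framing in the abstract, the following is a minimal scalar-valued sketch of Gaussian kernel interpolation; it is not the vector-valued/operator-valued RKHS construction developed in the paper, and the shape parameter `eps`, regularization `reg`, and test signal are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of scalar kernel interpolation with a Gaussian kernel
# k(s, t) = exp(-(eps * (s - t))**2). The paper works with vector-valued
# RKHSs and operator-valued kernels; this scalar version only illustrates
# the basic fit/evaluate pattern. `eps` and `reg` are illustrative choices.

def gram(s, t, eps=5.0):
    # Gram matrix K[i, j] = k(s[i], t[j]).
    return np.exp(-(eps * (s[:, None] - t[None, :])) ** 2)

def fit(centers, values, eps=5.0, reg=1e-10):
    # Solve (K + reg*I) c = values for the expansion coefficients c.
    K = gram(centers, centers, eps)
    return np.linalg.solve(K + reg * np.eye(len(centers)), values)

def evaluate(t, centers, coeffs, eps=5.0):
    # f_hat(t) = sum_j coeffs[j] * k(t, centers[j])
    return gram(t, centers, eps) @ coeffs

# Usage: recover f(t) = sin(2*pi*t) from 20 equally spaced samples on [0, 1].
centers = np.linspace(0.0, 1.0, 20)
coeffs = fit(centers, np.sin(2.0 * np.pi * centers))
t_fine = np.linspace(0.0, 1.0, 200)
max_err = np.max(np.abs(evaluate(t_fine, centers, coeffs) - np.sin(2.0 * np.pi * t_fine)))
print(f"max interpolation error: {max_err:.2e}")
```

The small regularization term keeps the Gram-matrix solve numerically stable when centers are closely spaced.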
  • Figure 1.  $ E_{[0, T]}(t) $ for the surrogate model generated by Algorithm 1 on both data sets

    Figure 2.  The $ L^2 $ percent error over time, $ E_{\text{basis}}(t) $, due to the choice of basis

    Figure 3.  A sample reconstruction of $ \alpha_j(t) $ for a randomly chosen wavelet function. The dashed line represents the kernel interpolation and the dots represent the Halton sequence of centers

Figure 4.  $ E_{M}(x) $ for Algorithm 1 applied to data set A (left) and data set B (right). Both panels are displayed on a log scale

    Figure 5.  Sample reconstruction by Algorithm 1 of a single image from data set B

Figure 6.  5000 Halton sequence points over the domain $ M $ overlaid on a snapshot of the data (a brief generation sketch follows the figure captions)

    Figure 7.  Sample reconstruction by Algorithm 2 using Gaussian kernels of a single image from data set A

    Figure 8.  Sample reconstruction by Algorithm 2 using thin-plate spline kernels of a single image from data set B

Figure 9.  The generated coefficient functions for a few selected basis functions from the set $ \{1\}\cup\{\sqrt{2}\cos( \pi j t)\}_{j = 1}^{199} $ for data set A

Figure 10.  The generated coefficient functions for a few selected basis functions from the set $ \{1\}\cup\{\sqrt{2}\cos( \pi j t)\}_{j = 1}^{199} $ for data set B

Figure 11.  $ E_{[0, T]}(t) $ for Algorithm 2 applied to data set A (top) and data set B (bottom)

Figure 12.  $ E_{M}(x) $ for Algorithm 2 applied to data set A (left) and data set B (right). Both panels are displayed on a log scale

    Figure 13.  $ F(x_i, t) $ for $ t\in [0, T] $ at two different interpolation points in the domain $ M $ taken from data set A with additive noise

Figure 14.  An example frame of a reconstruction produced by Algorithm 3 applied to data set A and to data set A with additive noise

    Figure 15.  $ E_M(x) $ for Algorithm 3 applied to data set A and data set A with additive noise. Displayed on a log scale

    Figure 16.  $ E_{[0, T]}(t) $ for Algorithm 3 applied to data set A (blue) and data set A with additive noise (orange)

Figure 17.  Coefficient functions produced by Algorithm 3 applied to a noisy signal, corresponding to a few kernels $ k(t, c) $ over the time domain, where $ c $ is a center belonging to the Halton sequence. Compare with Figure 18, which shows the same functions derived from a noise-free signal

    Figure 18.  Coefficient functions produced by Algorithm 3 applied to a noise-free signal corresponding to a few kernels over the time domain $ k(t, c) $ where $ c $ is a center belonging to the Halton sequence. Compare to Figure 17
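Figures 3 and 6 use Halton-sequence points as interpolation centers. Below is a brief sketch of generating such quasi-random centers with SciPy; the $ 160 \times 100 $ rectangular bounds are hypothetical placeholders and need not match the paper's actual domain $ M $.

```python
from scipy.stats import qmc

# Sketch: 5000 quasi-random Halton points, as in Figure 6. The bounds below
# (a 160 x 100 rectangle) are a hypothetical stand-in for the flow domain M.
sampler = qmc.Halton(d=2, scramble=False)
unit_points = sampler.random(5000)  # points in [0, 1)^2
centers = qmc.scale(unit_points, l_bounds=[0.0, 0.0], u_bounds=[160.0, 100.0])
print(centers.shape)  # (5000, 2)
```

Halton points cover the rectangle more evenly than pseudo-random samples, which is why low-discrepancy sequences are a common choice of kernel centers.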

    Table 1.  Description of fluid flow data sets

| data set | Description |
|---|---|
| total simulation | $ 100{,}000 $ time-steps using the Lattice-Boltzmann method with $ \Delta x = \Delta t = 1 $ |
| data set A | $ 1000 $ equally spaced time-steps between snapshots $ 32{,}000 $ and $ 42{,}000 $ |
| data set B | $ 1000 $ equally spaced time-steps between snapshots $ 22{,}000 $ and $ 32{,}000 $ |

Table 2.  Summary of the performance of Algorithms 1–3 with the chosen parameters; details are given in Sections 5.1–5.3. Here, A$ ^* $ denotes data set A with additive noise

| data | alg. | basis | kernel | start size | compressed | $ E_{\text{overall}} $ |
|---|---|---|---|---|---|---|
| A | 1 | Haar | Gaussian | $ 16{,}000 \times 1000 $ | $ 4000 \times 202 $ | $ 24.36\% $ |
| A | 2 | Fourier | Gaussian | $ 16{,}000 \times 1000 $ | $ 5000 \times 200 $ | $ 17.90\% $ |
| A | 3 | N/A | Gaussian/spline | $ 16{,}000 \times 1000 $ | $ 3200 \times 202 $ | $ 20.40\% $ |
| A$ ^* $ | 3 | N/A | Gaussian/spline | $ 16{,}000 \times 1000 $ | $ 3200 \times 202 $ | $ 23.41\% $ |
| B | 1 | Haar | Gaussian | $ 16{,}000 \times 1000 $ | $ 4000 \times 202 $ | $ 23.44\% $ |
| B | 2 | Fourier | thin-plate spline | $ 16{,}000 \times 1000 $ | $ 5000 \times 200 $ | $ 14.74\% $ |
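As an illustrative reading of Table 2, the "start size" and "compressed" columns give the array dimensions before and after compression. Assuming all entries are stored at the same precision (which the table does not state), the first row reduces storage to

\begin{equation}
\frac{4000 \times 202}{16{,}000 \times 1000} = \frac{808{,}000}{16{,}000{,}000} \approx 5.05\%
\end{equation}

of the original number of entries, while $ E_{\text{overall}} $ reports the corresponding overall reconstruction error.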
