\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents
Early Access

Early Access articles are published articles within a journal that have not yet been assigned to a formal issue. This means they do not yet have a volume number, issue number, or page numbers assigned to them, however, they can still be found and cited using their DOI (Digital Object Identifier). Early Access publication benefits the research community by making new scientific discoveries known as quickly as possible.

Readers can access Early Access articles via the “Early Access” tab for the selected journal.

HaCOO: Hashed coordinate sparse tensor Storage for rapid data insertion, access, and updates

  • *Corresponding author: Michael W. Berry

    *Corresponding author: Michael W. Berry
Abstract / Introduction Full Text(HTML) Figure(7) / Table(1) Related Papers Cited by
  • Tensors, or $ n $-way arrays, store data that can be indexed along an arbitrary number of dimensions. Tensor analysis using decomposition techniques can reveal latent factors and useful information about the underlying data. Tensors can grow extremely large, often containing millions of entries, and may be sparse, with the majority of the values equal to zero, making explicit storage impractical. Interest in tensor analysis using decomposition has expanded across diverse fields, including data mining, signal processing, computer vision, and machine learning. However, few sparse tensor formats provide an efficient method for locating arbitrary non-zero values.

    A motivating example comes from Lowe's textual influence model (2018), which applied non-negative sparse tensor decomposition to measure authors' contributions. Building the tensor required counting n-grams in a sliding window, necessitating frequent updates and insertions. Lowe discovered the need for a sparse tensor format that allowed rapid updates and insertions without regenerating the entire structure. We recognized the potential for a new sparse tensor storage format for applications that require frequent and rapid tensor lookup and insertion. This observation motivated the proposal of Hashed Coordinate Storage (HaCOO) in 2021, a mode-agnostic sparse tensor format that stores indexes and values in a separate chaining hash table, allowing constant-time insertion and access to arbitrary entries. HaCOO also balances storage efficiency by adopting ALTO's adaptive encoding scheme, which encodes index metadata using the minimum number of bits while supporting rapid structural updates.

    Mathematics Subject Classification: Primary: 68P20; Secondary: 68P05.

    Citation:

    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  A third-order tensor

    Figure 2.  CP Decomposition of a third-order tensor [7]

    Figure 3.  A sparse tensor in COO format [10]

    Figure 4.  The process of converting a COO tensor to an ALTO tensor

    Figure 5.  Converting a COO tensor to HaCOO format

    Figure 6.  Execution times for HaCOO and ALTO across select tensors

    Figure 7.  Execution time of parallel MTTKRP across individual modes for select tensors

    Table 1.  Characteristics of FROSTT sparse tensors

    Tensor Dimensions #NNZs Density
    LBNL $ 1.6K \times 4.2K \times 1.6K \times 4.2K \times 868.1K $ $ 1.7M $ $ 4.2 \times 10^{-14} $
    NIPS $ 2.5K \times 2.9K \times 14K \times 17 $ $ 3.1M $ $ 1.8 \times 10^{-6} $
    UBER $ 183 \times 24 \times 1.1K \times 1.7K $ $ 3.3M $ $ 3.8 \times 10^{-4} $
    CHICAGO $ 6.2K \times 24 \times 77 \times 32 $ $ 5.3M $ $ 1.5 \times 10^{-2} $
     | Show Table
    DownLoad: CSV
  • [1] B. W. BaderM. W. Berry and M. Browne, Discussion tracking in enron email using parafac, Survey of Text Mining II, Springer, London, (2008), 147-163.  doi: 10.1007/978-1-84800-046-9_8.
    [2] B. W. Bader, T. G. Kolda, et al., Matlab Tensor Toolbox Version 2.6, Available misc, 2015.
    [3] C. F. Beckmann and S. M. Smith, Tensorial extensions of independent component analysis for multisubject fmri analysis, NeuroImage, 25 (2005), 294-311.  doi: 10.1016/j.neuroimage.2004.10.043.
    [4] A. E. Helal, J. Laukemann, F. Checconi, J. J. Tithi, T. Ranadive, F. Petrini and J. Choi, Alto: Adaptive linearized storage of sparse tensors, ICS '21: Proceedings of the ACM International Conference on Supercomputing (New York, NY, USA), Association for Computing Machinery, (2021), 404-416.
    [5] R. Henrion, N-way principal component analysis theory, algorithms and applications, Chemometrics and Intelligent Laboratory Systems, 25 (1994), 1-23.  doi: 10.1016/0169-7439(93)E0086-J.
    [6] B. Jenkins, Hash functions, 1997.
    [7] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51 (2009), 455-500.  doi: 10.1137/07070111X.
    [8] T. G. Kolda, B. W. Bader and J. P. Kenny, Higher-order web link analysis using multilinear algebra, Fifth IEEE International Conference on Data Mining (ICDM'05) (Piscataway, NJ), IEEE Press, (2005), 8 pp.
    [9] L. Lathauwer and B. Moor, From matrix to tensor: Multilinear algebra and signal processing, 2015.
    [10] J. Li, J. Sun and R. W. Vuduc, Hicoo: Hierarchical storage of sparse tensors, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE Press, Piscataway, NJ, (2018), 238-252.
    [11] R. Lowe, M. Charles and A. Singh, Hashed Coordinate Storage Of Sparse Tensors, SC21: International Conference for High Performance Computing, Networking, Storage, and Analysis (New York, NY, USA), Association for Computing Machinery, 2021, Poster.
    [12] S. Rabanser, O. Shchur and S. Günnemann, Introduction to tensor decompositions and their applications in machine learning, 2017.
    [13] S. Smith, J. W. Choi, J. Li, R. Vuduc, J. Park, X. Liu and G. Karypis, FROSTT: The formidable repository of open sparse tensors and tools, 2017.
    [14] S. Smith and G. Karypis, Tensor-matrix products with a compressed sparse tensor, Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (New York, NY, USA), Association for Computing Machinery, (2015), 1-7.
    [15] TensorFlow, Working with sparse tensors, 2023.
    [16] M. A. O. Vasilescu and Demetri Terzopoulos, Multilinear analysis of image ensembles: Tensorfaces, Computer Vision — ECCV 2002 (Berlin), (2002), 447–460. doi: 10.1007/3-540-47969-4_30.
  • 加载中

Figures(7)

Tables(1)

SHARE

Article Metrics

HTML views(343) PDF downloads(74) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return