Unsupervised physics-informed disentanglement of multimodal data

  • *Corresponding author: Nathaniel Trask

  • We introduce physics-informed multimodal autoencoders (PIMA), a variational inference framework for discovering shared information in multimodal datasets. Individual modalities are embedded into a shared latent space and fused through a product-of-experts formulation, enabling a Gaussian mixture prior to identify shared features. Sampling from clusters allows cross-modal generative modeling, with a mixture-of-experts decoder that imposes inductive biases from prior scientific knowledge and thereby imparts structured disentanglement of the latent space. This approach enables cross-modal inference and the discovery of features in high-dimensional heterogeneous datasets. Consequently, it provides a means to discover fingerprints in multimodal scientific datasets and to avoid traditional bottlenecks related to high-fidelity measurement and characterization of scientific datasets.
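Table 1 defers the parameters of the fused posterior $ q(Z \vert X_1, \dots, X_M) $ to Equation 5, which is not reproduced in this section. A minimal sketch of product-of-experts fusion, assuming the standard product of diagonal Gaussian experts (precisions add, and the fused mean is the precision-weighted average of the unimodal means) with no additional prior expert, could look like this; the numbers are illustrative only.

```python
import numpy as np

def product_of_gaussian_experts(mus, variances):
    """Fuse diagonal Gaussian experts N(mu_m, diag(var_m)), m = 1..M, into one Gaussian.

    The (renormalized) product of Gaussian densities is Gaussian: its precision is the
    sum of the expert precisions, and its mean is the precision-weighted mean.
    Assumption: no extra prior expert is included in the product.
    """
    mus = np.asarray(mus, dtype=float)              # shape (M, d)
    variances = np.asarray(variances, dtype=float)  # shape (M, d)
    precisions = 1.0 / variances
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var

# Two modalities embedded in a 2-D latent space (illustrative numbers).
mu, var = product_of_gaussian_experts(
    mus=[[0.0, 1.0], [2.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 0.25]],
)
print(mu, var)
```

A more confident expert (smaller variance) pulls the fused mean toward itself and shrinks the fused variance, which is how complementary information from each modality is combined.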

    Mathematics Subject Classification: Primary: 62P35; Secondary: 68T07, 68T99.

  • Figure 1.  Individual modalities are encoded into Gaussian distributions in a shared latent space. During training, the posterior is sampled from a product-of-experts distribution that fuses complementary information into a shared multimodal Gaussian distribution. A Gaussian mixture prior parameterizes clusters encoding cross-modal shared information. Sampling from mixture components provides generative models using either black-box decoders or expert physics models incorporating prior physics knowledge. To facilitate cross-modal generative inference, a random selection of modalities is used in each epoch of training to encourage unimodal embeddings to reproduce the multimodal embedding, allowing inference of $ p(c|X_i) $. The MNIST dataset is shown here, where images are augmented with a synthetic, linear modality whose slope matches the digit label. A linear model for each cluster is used as the expert decoder for the synthetic modality, and successful unsupervised disentanglement implies multimodal fingerprint detection.
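The generative path described in this caption can be sketched with ancestral sampling: draw a cluster $ c \sim \text{Cat}(\pi) $, draw $ Z $ from that cluster's Gaussian, and decode the synthetic modality with a per-cluster linear expert. The cluster means and standard deviations, and the linear-expert parameters `slopes` and `intercepts`, are hypothetical placeholders; the sketch also assumes the linear expert depends only on the cluster index, not on $ Z $, and omits the image decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLUSTERS, LATENT_DIM = 10, 2
pi = np.full(N_CLUSTERS, 1.0 / N_CLUSTERS)            # mixture weights p(C)
mu_tilde = rng.normal(size=(N_CLUSTERS, LATENT_DIM))   # placeholder cluster means
sigma_tilde = np.full(N_CLUSTERS, 0.1)                 # placeholder isotropic std devs
# Hypothetical parameters of the per-cluster linear expert for X_2.
slopes = np.arange(N_CLUSTERS, dtype=float)
intercepts = np.zeros(N_CLUSTERS)

def sample_cluster_and_latent(rng):
    """Ancestral sampling: c ~ Cat(pi), then z ~ N(mu_tilde[c], sigma_tilde[c]^2 I)."""
    c = rng.choice(N_CLUSTERS, p=pi)
    z = mu_tilde[c] + sigma_tilde[c] * rng.standard_normal(LATENT_DIM)
    return c, z

def linear_expert_decoder(c, t):
    """Expert model for the synthetic modality: a line with a cluster-specific slope."""
    return slopes[c] * t + intercepts[c]

t = np.linspace(0.0, 1.0, 50)
c, z = sample_cluster_and_latent(rng)
x2 = linear_expert_decoder(c, t)   # z would feed a black-box image decoder for X_1
```

Because the expert can only produce straight lines, each cluster is pushed to explain a single slope, which is the structural bias that drives disentanglement here.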

    Figure 2.  Experimental setup for the test examples. For unsupervised multimodal MNIST, we use the training images (A1) and replace the label of each digit $ c\in \left\{ {0, \dots, 9} \right\} $ with a sample of the function $ X_2 = ct + \epsilon $ (A2), for $ t \in [0,1] $ and Gaussian noise $ \epsilon $. For neural ODE MNIST, we use the training images (A1) and replace the digit labels with solutions to an ODE system (A3). For the vibrational density of states (VDoS) example, we use the 1D VDoS data (B1) and the corresponding 0D average stress (B2).
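A minimal generator for the synthetic modality $ X_2 = ct + \epsilon $, assuming a uniform grid on $ [0,1] $. The grid size and noise level (`n_points`, `noise_std`) are illustrative choices, not values taken from the experiments.

```python
import numpy as np

def synthetic_linear_modality(labels, n_points=20, noise_std=0.1, seed=0):
    """Return one noisy line X_2 = c * t + eps per digit label c, sampled on t in [0, 1]."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)
    eps = noise_std * rng.standard_normal((labels.size, n_points))
    # Broadcasting: (n_samples, 1) * (1, n_points) -> (n_samples, n_points).
    return t, labels[:, None] * t[None, :] + eps

# Noise switched off so the slope equals the label exactly.
t, x2 = synthetic_linear_modality([0, 3, 9], noise_std=0.0)
```

The model never sees the integer label itself, only a curve whose slope encodes it, so recovering the digit classes remains an unsupervised task.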

    Figure 3.  MNIST clusters in the latent space and resulting confusion matrices for (a) the multimodal dataset with dropout, (b) the multimodal dataset without dropout, (c) the unimodal $ X_1 $ image dataset, and (d) the unimodal $ X_2 $ 1D dataset. The white ellipses in the latent space mark two standard deviations of each cluster in the GMM. The approximate banding of the matrices in (a) and (b) shows that the sequential embedding of clusters confines misclassifications to digits with similar values in $ X_2 $.
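One hypothetical way to quantify the banding is to count what fraction of misclassifications land within one digit of the true label. The sketch assumes cluster predictions have already been mapped to digit labels; the metric and its `width` parameter are illustrative, not taken from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows index the true digit, columns the (already label-matched) predicted digit."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def fraction_of_errors_near_diagonal(cm, width=1):
    """Share of misclassifications whose predicted digit is within `width` of the true one."""
    i, j = np.indices(cm.shape)
    off_diag = cm * (i != j)                      # keep only the errors
    band = off_diag * (np.abs(i - j) <= width)    # errors inside the band
    total_errors = off_diag.sum()
    return band.sum() / total_errors if total_errors else 0.0

y_true = [0, 1, 2, 3, 4, 5]
y_pred = [0, 2, 2, 3, 7, 4]
cm = confusion_matrix(y_true, y_pred)
frac = fraction_of_errors_near_diagonal(cm)
print(frac)
```

A value close to 1 corresponds to the banded structure described in the caption; errors spread uniformly over the off-diagonal entries would give a much smaller value.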

    Figure 4.  MNIST clusters, confusion matrix, and accuracy for the best PIMA run on the multimodal ODE dataset with an encoding dimension of 3. With a neural ODE (NODE) expert model, the maximum test accuracy achieved is 100.0%. With data-driven models, the average test accuracies are 82.3% and 62.6% for large and small 1D convolutional architectures, respectively. Statistics are computed from 10 independent runs.

    Figure 5.  Ground-truth trajectories (black solid lines) and reconstructed trajectories (dashed lines) obtained by solving initial value problems (IVPs) with the learned ODE parameters.
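The ODE system behind panel (A3) is not given in this section, so the sketch below uses a hypothetical stand-in, the oscillator $ \ddot{x} = -\beta x $, purely to illustrate the decoding step: an expert decoder solves an IVP with a cluster's parameter $ \beta $. It uses SciPy's `solve_ivp` and the schedule $ \beta(c) = 0.5 + c/3 $, which agrees with the ground-truth values listed in Table 3 up to truncation.

```python
import numpy as np
from scipy.integrate import solve_ivp

def beta_of_label(c):
    """Parameter schedule consistent with the ground-truth column of Table 3."""
    return 0.5 + c / 3.0

def oscillator_rhs(t, state, beta):
    """HYPOTHETICAL stand-in ODE: x'' = -beta * x written as a first-order system."""
    x, v = state
    return [v, -beta * x]

def decode_trajectory(beta, t_eval, x0=(1.0, 0.0)):
    """Expert decoder sketch: solve the IVP with a cluster's (identified) parameter."""
    sol = solve_ivp(oscillator_rhs, (t_eval[0], t_eval[-1]), list(x0),
                    t_eval=t_eval, args=(beta,), rtol=1e-8, atol=1e-10)
    return sol.y[0]

t_eval = np.linspace(0.0, 5.0, 101)
x_true = decode_trajectory(beta_of_label(8), t_eval)
x_learned = decode_trajectory(3.1673, t_eval)   # identified beta for label 8 (Table 3)
max_gap = np.max(np.abs(x_true - x_learned))
```

Because the decoder has only one free parameter per cluster, a small identification error in $ \beta $ translates into a small phase drift between the solid and dashed curves rather than an arbitrary shape mismatch.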

    Figure 6.  PIMA results on the VDoS dataset. (Left) VDoS latent space with four clusters; embedded data points are colored by the normalized true stress. (Right) True VDoS samples nearest to the mean of each cluster in the latent space, colored by normalized stress. The latent space is organized by both stress and VDoS profile; cross-referencing the data-generation information in Figure 7 further shows that clusters 0 and 2 are differentiated by compression type, indicating the discovery of hidden features.
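Selecting the true VDoS sample nearest to each cluster mean is a nearest-neighbour query in the latent space. A sketch, assuming Euclidean distance and one latent point per sample (for example, its posterior mean), neither of which is specified in this section:

```python
import numpy as np

def nearest_samples_to_cluster_means(latent_points, cluster_means):
    """For each cluster mean, return the index of the closest embedded sample (Euclidean)."""
    latent_points = np.asarray(latent_points, dtype=float)   # (n_samples, d)
    cluster_means = np.asarray(cluster_means, dtype=float)   # (n_clusters, d)
    # Pairwise squared distances, shape (n_clusters, n_samples).
    d2 = ((cluster_means[:, None, :] - latent_points[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

z = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-3.0, 2.0]])
means = np.array([[0.9, 1.2], [-2.5, 2.5], [4.0, 4.0], [0.1, -0.2]])
idx = nearest_samples_to_cluster_means(z, means)
```

The returned indices select which raw VDoS spectra to plot as cluster representatives; using real samples rather than decoded means keeps the right panel free of decoder artifacts.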

    Figure 7.  (Left) Number of training specimens per cluster that underwent uniaxial compression, hydrostatic compression, or no compression. (Right) Predicted vs. true normalized average stress over the entire VDoS dataset.

    Table 1.  Distributions, priors, variables, and trainable parameters. Here $ F_{m} $ and $ G_{m,c} $ are deep neural networks with weights and biases $ \theta_m $ and $ \hat{\theta}_{m,c} $, respectively. Where suitable, each decoder network $ G_{m,c} $ can be replaced with an expert model $ \mathcal{E}_{m,c} $.

    | Distribution | Form | Computation | Trainable parameters |
    |---|---|---|---|
    | $ p(X_m \vert Z, C=c) $ | $ \mathcal{N}(\hat{\mu}_{m,c}, \hat{\sigma}_{m,c}^2 \mathbf{I}) $ | $ [\hat{\mu}_{m,c}, \hat{\sigma}^2_{m,c}] = G_{m,c}(Z; \hat{\theta}_{m,c}) $ | $ \hat{\theta}_{m,c} $, for $ m=1,\dots,M $, $ c=1,\dots,N $ |
    | $ p(Z \vert C=c) $ | $ \mathcal{N}(\tilde{\mu}_c, \tilde{\sigma}^2_c\mathbf{I}) $ | $ \tilde{\mu}_{c} $ and $ \tilde{\sigma}_{c}^2 $ given by Equation 15 | |
    | $ p(C) $ | $ \text{Cat}(\pi) $ | $ \pi = \texttt{softmax}(\vec{v}) $ | $ \vec{v}=(v_1, \dots, v_{N}) $ |
    | $ q(Z_m \vert X_m) $ | $ \mathcal{N}(\mu_{m}, \sigma^2_{m} \mathbf{I}) $ | $ [\mu_{m}, \sigma^2_{m}] = F_{m}(X_m ; \theta_m) $ | $ \theta_m $, for $ m=1, \dots, M $ |
    | $ q(Z \vert X_1, \dots, X_M) $ | $ \mathcal{N}(\mu, \sigma^2 \mathbf{I}) $ | $ \mu $ and $ \sigma^{2} $ given by Equation 5 | |
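Table 1 leaves $ \tilde{\mu}_c $ and $ \tilde{\sigma}_c^2 $ to Equation 15, which is not reproduced here. Taking them as given, the sketch below shows how $ p(C) = \text{Cat}(\texttt{softmax}(\vec{v})) $ and $ p(Z \vert C=c) = \mathcal{N}(\tilde{\mu}_c, \tilde{\sigma}^2_c\mathbf{I}) $ combine into cluster responsibilities $ p(c \vert z) $ by Bayes' rule, computed in log space for numerical stability. Approximating $ p(c \vert X_i) $ by plugging a unimodal encoder mean in for $ z $ is an assumption of this sketch.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def log_isotropic_gaussian(z, mean, var):
    """log N(z; mean, var * I) for a d-dimensional point z and scalar variance var."""
    d = z.shape[-1]
    return -0.5 * (d * np.log(2.0 * np.pi * var) + np.sum((z - mean) ** 2, axis=-1) / var)

def cluster_posterior(z, v, mu_tilde, var_tilde):
    """p(C = c | Z = z) for p(C) = Cat(softmax(v)) and p(Z | C = c) = N(mu_tilde_c, var_tilde_c I)."""
    log_pi = np.log(softmax(v))
    log_lik = np.array([log_isotropic_gaussian(z, m, s2) for m, s2 in zip(mu_tilde, var_tilde)])
    log_joint = log_pi + log_lik
    return np.exp(log_joint - logsumexp(log_joint))   # normalize over clusters

v = np.zeros(3)                                        # logits giving equal mixture weights
mu_tilde = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
var_tilde = np.array([1.0, 1.0, 1.0])
post = cluster_posterior(np.array([0.2, -0.1]), v, mu_tilde, var_tilde)
```

Parameterizing the weights through unconstrained logits $ \vec{v} $ keeps $ \pi $ on the probability simplex without explicit constraints during gradient-based training.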

    Table 2.  Unsupervised classification accuracy for MNIST. Results gathered from [2], [21] and [11] are denoted by *, $ \dagger $ and $ \dagger\dagger $, respectively. Where statistics were not provided, we assume the maximum accuracy was reported. The unimodal unsupervised benchmarks do not incorporate the data augmentation offered by $ X_2 $; a comparison to the supervised setting, however, is valid. In all our experiments we do not overparameterize: the number of clusters equals the number of digits. PIMA results are reported on the standard 10,000 test samples, with means and standard deviations computed over 9 runs with different random seeds.

    | Method | CNN SotA$ ^* $ | VAE+GMM$ ^\dagger $ | DEC$ ^\dagger $ | VaDE$ ^\dagger $ | GMVAE$ ^{\dagger\dagger} $ | GMVAE$ ^{\dagger\dagger} $ |
    |---|---|---|---|---|---|---|
    | Notes | supervised | | | | 10 clusters | 16 clusters |
    | Acc. (max) | 99.91% | 72.94% | 84.30% | 94.46% | 88.54% | 96.92% |
    | Acc. (mean ± stdev) | n/a | n/a | n/a | n/a | 82.31% (3.75%) | 87.82% (5.33%) |

    | Method | PIMA | PIMA | PIMA | PIMA | PIMA |
    |---|---|---|---|---|---|
    | Notes | multimodal, dropout | multimodal, no dropout | $ X_1 $ only | $ X_2 $ only | multimodal, no expert, dropout |
    | Acc. (max) | 99.79% | 99.59% | 14.84% | 53.37% | 58.36% |
    | Acc. (mean ± stdev) | 90.31% (14.81%) | 87.95% (11.70%) | - | - | - |
    | $ X_1 $ Acc. (max) | 39.15% | 11.26% | - | - | 50.34% |
    | $ X_2 $ Acc. (max) | 99.92% | 32.39% | - | - | 56.22% |
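The caption does not state how clusters are matched to digit labels before computing accuracy. A common protocol, sketched here as an assumption, is to find the one-to-one cluster-to-label assignment that maximizes agreement using the Hungarian method [26] and then report plain accuracy; SciPy's `linear_sum_assignment` solves that assignment problem. The sketch assumes as many clusters as labels, as in the PIMA runs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def unsupervised_cluster_accuracy(y_true, cluster_ids, n_clusters):
    """Best one-to-one cluster -> label mapping (Hungarian method), then plain accuracy."""
    y_true = np.asarray(y_true)
    cluster_ids = np.asarray(cluster_ids)
    # counts[k, l] = number of samples in cluster k whose true label is l.
    counts = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(counts, (cluster_ids, y_true), 1)
    # linear_sum_assignment minimizes total cost, so negate counts to maximize matches.
    rows, cols = linear_sum_assignment(-counts)
    mapping = {int(r): int(c) for r, c in zip(rows, cols)}
    return counts[rows, cols].sum() / y_true.size, mapping

y_true      = [0, 0, 1, 1, 2, 2]
cluster_ids = [2, 2, 0, 0, 1, 0]
acc, mapping = unsupervised_cluster_accuracy(y_true, cluster_ids, n_clusters=3)
```

Cluster indices from a GMM are arbitrary, so some matching step of this kind is needed before unsupervised results can be compared with supervised accuracy.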

    Table 3.  Ground-truth ODE parameters $ \beta(c) $ and identified ODE parameters

    | Label ($ c $) | $ \beta(c) $ | Identified $ \beta $ |
    |---|---|---|
    | 0 | 0.5 | 0.4997 |
    | 1 | 0.833 | 0.8339 |
    | 2 | 1.166 | 1.1661 |
    | 3 | 1.5 | 1.4999 |
    | 4 | 1.833 | 1.8339 |
    | 5 | 2.166 | 2.1669 |
    | 6 | 2.5 | 2.5010 |
    | 7 | 2.833 | 2.8330 |
    | 8 | 3.166 | 3.1673 |
    | 9 | 3.5 | 3.4989 |
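The agreement in Table 3 can be quantified directly from its entries. The ground-truth column appears truncated to three decimals (0.833 rather than 5/6, for example), so part of each small discrepancy reflects rounding in the table rather than identification error.

```python
import numpy as np

# Values transcribed from Table 3 (labels 0-9).
beta_true = np.array([0.5, 0.833, 1.166, 1.5, 1.833, 2.166, 2.5, 2.833, 3.166, 3.5])
beta_identified = np.array([0.4997, 0.8339, 1.1661, 1.4999, 1.8339,
                            2.1669, 2.5010, 2.8330, 3.1673, 3.4989])

abs_err = np.abs(beta_identified - beta_true)
rel_err = abs_err / beta_true
worst_label = int(abs_err.argmax())
print(f"max |error| = {abs_err.max():.4f} (label {worst_label}), "
      f"max relative error = {rel_err.max():.2%}")
```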

    Table 4.  Hyperparameters for each experiment

    | Experiment | Learning rate | Encoding dim. | Number of clusters | Number of epochs |
    |---|---|---|---|---|
    | 4.1 | $ 3.125\times 10^{-7} $ | 2 | 10 | 40,000 |
    | 4.2 | $ 1\times 10^{-3} $ | 3 | 10 | 10,000 |
    | 4.3 | $ 9\times 10^{-5} $ | 2 | 4 | 40,000 |

    Table 5.  Model decisions for each experiment

    | Experiment | DOFs rescaling | Single decoder per modality | Dropout | Expert model |
    |---|---|---|---|---|
    | 4.1 | | | | |
    | 4.2 | | | | |
    | 4.3 | | | | |
  • [1] S. Amal, L. Safarnejad, J. A. Omiye, I. Ghanzouri, J. H. Cabot and E. G. Ross, Use of multi-modal data and machine learning to improve cardiovascular disease care, Frontiers in Cardiovascular Medicine, 2 (2022).
    [2] S. An, M. Lee, S. Park, H. Yang and J. So, An ensemble of simple convolutional neural network models for MNIST digit recognition, arXiv preprint, arXiv: 2008.10400, 2020.
    [3] T. Baltrušaitis, C. Ahuja and L.-P. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2018), 423-443.
    [4] L. Biewald, Experiment Tracking with Weights and Biases, Software available from wandb.com, 2020.
    [5] B. L. Boyce and M. D. Uchic, Progress toward autonomous experimental systems for alloy development, MRS Bulletin, 44 (2019), 273-280. 
    [6] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins and A. Lerchner, Understanding disentangling in $\beta$-VAE, arXiv preprint, arXiv: 1804.03599, 2018.
    [7] A. Chakraborty, P. Nandi and B. Chakraborty, Fingerprints of the quantum space-time in time-dependent quantum mechanics: An emergent geometric phase, Nuclear Phys. B, 975 (2022), Paper No. 115691, 27 pp. doi: 10.1016/j.nuclphysb.2022.115691.
    [8] R. T. Chen, X. Li, R. Grosse and D. Duvenaud, Isolating sources of disentanglement in VAEs, in Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, 2615-2625.
    [9] R. T. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems, 31 (2018).
    [10] J. Cioffi and T. Kailath, Fast, recursive-least-squares transversal filters for adaptive filtering, IEEE Transactions on Acoustics, Speech, and Signal Processing, 32 (1984), 304-337. 
    [11] N. Dilokthanakul, P. A. Mediano, M. Garnelo, M. C. Lee, H. Salimbeni, K. Arulkumaran and M. Shanahan, Deep unsupervised clustering with Gaussian mixture variational autoencoders, arXiv preprint, arXiv: 1611.02648, 2016.
    [12] F. Dos Santos Rodrigues, G. Delgado, T. Santana de Costa and L. Tasic, Applications of fluorescence spectroscopy in protein conformational changes and intermolecular contacts, BBA Advances, 3 (2023).
    [13] M. El Hariri El Nokab and K. Sebakhy, Solid state NMR spectroscopy a valuable technique for structural insights of advanced thin film materials: A review, Nanomaterials (Basel), 11 (2021).
    [14] D. Gao, J. Huang, X. Lin, D. Yang, Y. Wang and H. Zheng, Phase transitions and chemical reactions of octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine under high pressure and high temperature, RSC Advances, 9 (2019).
    [15] K. Hasselmann, Multi-pattern fingerprint method for detection and attribution of climate change, Climate Dynamics, 13 (1997), 601-611. 
    [16] G. Hegerl, F. Zwiers, P. Braconnot, N. P. Gillett, Y. M. Luo, J. M. Orsini, N. Nicholls, J. E. Penner and P. A. Stott, Understanding and Attributing Climate Change, 2007.
    [17] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed and A. Lerchner, beta-VAE: Learning basic visual concepts with a constrained variational framework, in 5th International Conference on Learning Representations, ICLR 2017, 2017.
    [18] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering, 9 (2007), 90-95.
    [19] O. Isayev, D. Fourches, E. N. Muratov, C. Oses, K. Rasch, A. Tropsha and S. Curtarolo, Materials cartography: Representing and mining materials space using structural and electronic fingerprints, Chemistry of Materials, 27 (2015), 735-743.
    [20] E. Jang, S. Gu and B. Poole, Categorical reparameterization with Gumbel-softmax, arXiv preprint, arXiv: 1611.01144, 2016.
    [21] Z. Jiang, Y. Zheng, H. Tan, B. Tang and H. Zhou, Variational deep embedding: An unsupervised and generative approach to clustering, in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, 1965-1972.
    [22] M. I. Jordan and R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Proceedings of 1993 International Conference on Neural Networks, 6 (1993), 181-214. doi: 10.1109/IJCNN.1993.716791.
    [23] H. Kim and A. Mnih, Disentangling by factorising, in International Conference on Machine Learning, PMLR, 2018, 2649-2658.
    [24] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint, arXiv: 1412.6980, 2014.
    [25] D. P. Kingma and M. Welling, Auto-Encoding Variational Bayes, in 2nd International Conference on Learning Representations, ICLR 2014, 2014.
    [26] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, 2 (1955), 83-97. doi: 10.1002/nav.3800020109.
    [27] I. E. Lagaris, A. Likas and D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks, 9 (1998), 987-1000.
    [28] Y. LeCun, C. Cortes and C. Burges, MNIST handwritten digit database, AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2 (2010).
    [29] D. B. Lee, D. Min, S. Lee and S. J. Hwang, Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning, in International Conference on Learning Representations, 2020.
    [30] K. Lee, N. Trask and P. Stinis, Structure-preserving sparse identification of nonlinear dynamics for data-driven modeling, in Mathematical and Scientific Machine Learning, PMLR, 2022, 65-80.
    [31] K. Lee, N. A. Trask, R. G. Patel, M. A. Gulian and E. C. Cyr, Partition of unity networks: Deep hp-approximation, arXiv preprint, arXiv: 2101.11256, 2021.
    [32] A. Liu, W. Zhu, D. Tsai and N. I. Zheludev, Micromachined tunable metamaterials: A review, Journal of Optics, 14 (2012), p. 114009.
    [33] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf and O. Bachem, Challenging common assumptions in the unsupervised learning of disentangled representations, in International Conference on Machine Learning, PMLR, 2019, 4114-4124.
    [34] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf and O. Bachem, Challenging common assumptions in the unsupervised learning of disentangled representations, in International Conference on Machine Learning, PMLR, 2019.
    [35] L. Lu, P. Jin and G. E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint, arXiv: 1910.03193, 2019.
    [36] Z. Mao, L. Lu, O. Marxen, T. A. Zaki and G. E. Karniadakis, DeepM&Mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators, Journal of Computational Physics, 447 (2021), p. 110698. doi: 10.1016/j.jcp.2021.110698.
    [37] S. M. Mennen et al., The evolution of high-throughput experimentation in pharmaceutical development and perspectives on the future, Organic Process Research & Development, 23 (2019), 1213-1242.
    [38] E. J. Mittemeijer and P. Scardi, Diffraction Analysis of the Microstructure of Materials, Springer-Verlag, Berlin, 2004.
    [39] P. Nikolaev, D. Hooper, F. Webber, R. Rao, K. Decker, M. Krein, J. Poleski, R. Barto and B. Maruyama, Autonomy in materials research: A case study in carbon nanotube growth, npj Computational Materials, 2 (2016).
    [40] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems, 32 (2019).
    [41] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research, 12 (2011), 2825-2830.
    [42] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens and L. Carin, Variational autoencoder for deep learning of images, labels and captions, Advances in Neural Information Processing Systems, 29 (2016), 2352-2360.
    [43] A. Quaglino, M. Gallieri, J. Masci and J. Koutník, SNODE: Spectral Discretization of Neural ODEs for System Identification, in International Conference on Learning Representations, 2020.
    [44] M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), 686-707. doi: 10.1016/j.jcp.2018.10.045.
    [45] D. Rao, F. Visin, A. Rusu, R. Pascanu, Y. W. Teh and R. Hadsell, Continual unsupervised representation learning, Advances in Neural Information Processing Systems, 32 (2019), 7647-7657.
    [46] D. J. Rezende, S. Mohamed and D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, in International Conference on Machine Learning, PMLR, 2014, 1278-1286.
    [47] Y. Shi, B. Paige, P. Torr, et al., Variational mixture-of-experts autoencoders for multi-modal deep generative models, Advances in Neural Information Processing Systems, 32 (2019).
    [48] R. D. Sochol, E. Sweet, C. C. Glick, S.-Y. Wu, C. Yang, M. Restaino and L. Lin, 3D printed microfluidics and microelectronics, Microelectronic Engineering, 189 (2018), 52-68.
    [49] K. Sohn, H. Lee and X. Yan, Learning structured output representation using deep conditional generative models, Advances in Neural Information Processing Systems, 28 (2015), 3483-3491.
    [50] T. M. Sutter, I. Daunhawer and J. E. Vogt, Generalized multimodal ELBO, in 9th International Conference on Learning Representations, ICLR, 2021.
    [51] M. Suzuki, K. Nakayama and Y. Matsuo, Joint multimodal learning with deep generative models, in 5th International Conference on Learning Representations, ICLR 2017, 2017.
    [52] N. Trask, A. Huang and X. Hu, Enforcing exact physics in scientific machine learning: A data-driven exterior calculus on graphs, J. Comput. Phys., 456 (2022), Paper No. 110969, 19 pp. doi: 10.1016/j.jcp.2022.110969.
    [53] R. Vedantam, I. Fischer, J. Huang and K. Murphy, Generative Models of Visually Grounded Imagination, in 6th International Conference on Learning Representations, ICLR, 2018.
    [54] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al., SciPy 1.0: Fundamental algorithms for scientific computing in Python, Nature Methods, 17 (2020), 261-272.
    [55] D. Vizoso, G. Subhash, K. Rajan and R. Dingreville, Connecting vibrational spectroscopy to atomic structure via supervised manifold learning: Beyond peak analysis, Chem. Mater., 35 (2023), 1186-1200.
    [56] S. Wang, H. Wang and P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed DeepONets, J. Comput. Phys., 475 (2023), Paper No. 111855, 18 pp. doi: 10.1016/j.jcp.2022.111855.
    [57] M. L. Waskom, Seaborn: Statistical data visualization, Journal of Open Source Software, 6 (2021), p. 3021.
    [58] C. Weidenthaler, Pitfalls in the characterization of nanoporous and nanosized materials, Nanoscale, 3 (2011), 792-810. 
    [59] M. Wu and N. Goodman, Multimodal generative models for scalable weakly-supervised learning, Advances in Neural Information Processing Systems, 31 (2018).
    [60] J. Xie, R. Girshick and A. Farhadi, Unsupervised deep embedding for clustering analysis, in International Conference on Machine Learning, PMLR, 2016, 478-487.