We introduce physics-informed multimodal autoencoders (PIMA), a variational inference framework for discovering shared information in multimodal datasets. Individual modalities are embedded into a shared latent space and fused through a product-of-experts formulation, enabling a Gaussian mixture prior to identify shared features. Sampling from clusters allows cross-modal generative modeling, with a mixture-of-experts decoder that imposes inductive biases from prior scientific knowledge and thereby imparts a structured disentanglement of the latent space. This approach enables cross-modal inference and the discovery of features in high-dimensional heterogeneous datasets, providing a means to discover fingerprints in multimodal scientific data and to avoid traditional bottlenecks in high-fidelity measurement and characterization.
Figure 1. Individual modalities are encoded into Gaussian distributions in a shared latent space. During training, the posterior is sampled from a product-of-experts distribution fusing complementary information into a shared multimodal Gaussian distribution. A Gaussian mixture prior parameterizes clusters encoding cross-modal shared information. Sampling from mixture components provides generative models using either black-box decoders or expert physics models incorporating prior physics knowledge. To facilitate cross-modal generative inference, a random selection of modalities is used on each epoch of training to encourage unimodal embeddings to reproduce the multimodal embedding, allowing inference of $ p(c|X_i) $. The MNIST dataset is shown here, where images are augmented with a synthetic, linear modality whose slope matches the digit label. A linear model for each cluster is used as the expert decoder for the synthetic modality, and successful unsupervised disentanglement implies multimodal fingerprint detection.
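To make the fusion and clustering steps above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a product-of-experts combination of diagonal Gaussian experts and of the cluster responsibilities $ p(c|z) $ under a Gaussian mixture prior. Function names and the toy numbers are illustrative only; modality dropout corresponds simply to fusing a random subset of the per-modality rows.

```python
# Hypothetical sketch (not the paper's code): product-of-experts fusion of
# per-modality diagonal Gaussians, then cluster responsibilities p(c | z)
# under a Gaussian mixture prior.
import numpy as np

def poe_fuse(mus, logvars):
    """Fuse unimodal Gaussians N(mu_i, diag(var_i)) into a single Gaussian.

    A normalized product of Gaussians is Gaussian, with precision equal to
    the sum of the expert precisions and a precision-weighted mean.
    mus, logvars: arrays of shape (n_modalities, latent_dim).
    """
    precisions = np.exp(-np.asarray(logvars))            # 1 / var_i
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mu = fused_var * (precisions * np.asarray(mus)).sum(axis=0)
    return fused_mu, fused_var

def responsibilities(z, weights, cluster_mu, cluster_var):
    """p(c | z) for a diagonal Gaussian mixture prior with the given weights."""
    log_pdf = -0.5 * (np.log(2.0 * np.pi * cluster_var)
                      + (z - cluster_mu) ** 2 / cluster_var).sum(axis=1)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max()                         # numerical stability
    p = np.exp(log_joint)
    return p / p.sum()

# Toy usage: two modalities embedded in a 2-D latent space, three clusters.
mus = [[0.9, -0.2], [1.1, 0.1]]
logvars = [[-1.0, -1.0], [-2.0, -0.5]]
mu, var = poe_fuse(mus, logvars)
z = mu + np.sqrt(var) * np.random.randn(2)               # reparameterized sample
weights = np.full(3, 1.0 / 3.0)
cluster_mu = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
cluster_var = np.ones((3, 2))
print(responsibilities(z, weights, cluster_mu, cluster_var))
```

The precision-weighted form follows directly from the fact that a normalized product of Gaussians is again Gaussian.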
Figure 2. Experimental setup for test examples. For unsupervised multimodal MNIST, we use training images (A1) and replace the labels on digits $ c \in \{0, \dots, 9\} $ with a sample of the function (A2) $ X_2 = ct + \epsilon $, for $ t \in [0,1] $ and Gaussian noise $ \epsilon $. For neural ODE MNIST, we use the training images (A1) and replace the labels on digits with solutions to an ODE system (A3). For VDoS, we use the 1D VDoS data (B1) and the corresponding 0D average stress (B2).
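For reference, a sample of the synthetic second modality described above can be generated as in the sketch below; the grid size and noise level are illustrative assumptions rather than values taken from the experiments.

```python
# Hypothetical sketch of the synthetic modality X2 = c*t + eps on t in [0, 1];
# the grid size and noise scale are illustrative assumptions.
import numpy as np

def synthetic_modality(label, n_points=64, noise_std=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 1.0, n_points)
    return label * t + noise_std * rng.standard_normal(n_points)

x2 = synthetic_modality(label=7)   # 1-D signal whose slope encodes the digit 7
```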
Figure 3. MNIST clusters in the latent space and resulting confusion matrices for a) the multimodal dataset with dropout, b) the multimodal dataset without dropout, c) the unimodal $ X_1 $ image dataset, and d) the unimodal $ X_2 $ 1D dataset. The white ellipses in the latent space represent two standard deviations of each cluster in the GMM. The approximate banding of the matrices in a) and b) illustrates that the sequential embedding of clusters limits misclassified digits to numbers with similar values in $ X_2 $.
Figure 4. MNIST clusters, confusion matrix, and accuracy for the best PIMA run on the multimodal ODE dataset with an encoding dimension of size 3. With a neural ODE (NODE) expert model, the maximum test accuracy achieved is 100.0%. With data-driven models, the average test accuracies are 82.3% and 62.6% for large and small 1D convolutional architectures, respectively. Statistics are computed from 10 independent runs.
Figure 6. PIMA results on the VDoS dataset. (Left) VDoS latent space with four clusters, where embedded data points are colored by the normalized true stress values. (Right) Nearest true VDoS data to the mean of each cluster in the latent space, colored by the normalized stress. The results show a latent space organized by stress and VDoS profiles; referencing the data-generation information in Figure 7, we also see that clusters 0 and 2 are differentiated by compression type, indicating discovery of hidden features.
Table 1. Distributions, priors, variables, and trainable parameters.

| Distribution | Priors | Computation | Trainable Parameters |
| --- | --- | --- | --- |
Table 2. Unsupervised classification accuracy for MNIST. Results gathered from [2], [21], and [11] are denoted by *.
| Method | CNN SotA | VAE+GMM | DEC | VaDE | GMVAE | GMVAE |
| --- | --- | --- | --- | --- | --- | --- |
| Notes | Supervised | | | | 10 clusters | 16 clusters |
| Acc. (max) | 99.91% | 72.94% | 84.30% | 94.46% | 88.54% | 96.92% |
| Acc. (mean±stdev) | n/a | n/a | n/a | n/a | 82.31% (3.75%) | 87.82% (5.33%) |

| Method | PIMA | PIMA | PIMA | PIMA | PIMA |
| --- | --- | --- | --- | --- | --- |
| Notes | multimodal dropout | multimodal no dropout | - | - | multi., no expert dropout |
| Acc. (max) | 99.79% | 99.59% | 14.84% | 53.37% | 58.36% |
| Acc. (mean±stdev) | 90.31% (14.81%) | 87.95% (11.70%) | - | - | - |
| | 39.15% | 11.26% | - | - | 50.34% |
| | 99.92% | 32.39% | - | - | 56.22% |
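The unsupervised accuracies in Table 2 require matching learned clusters to digit labels; a standard way to score this, in the spirit of the Hungarian method [26], is sketched below with `scipy.optimize.linear_sum_assignment`. This is an illustration of the metric, not necessarily the exact evaluation code behind the table.

```python
# Hedged sketch: unsupervised clustering accuracy via an optimal one-to-one
# cluster-to-label assignment (Hungarian method [26]); an illustration of the
# metric, not necessarily the exact evaluation code behind Table 2.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(y_true, y_pred, n_classes=10):
    """Best accuracy over all one-to-one assignments of clusters to labels."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                     # cluster / label co-occurrences
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)

# Tiny example: clusters already align with labels, so the accuracy is 1.0.
print(cluster_accuracy(np.array([0, 1, 2]), np.array([0, 1, 2]), n_classes=3))
```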
Table 3. Ground-truth and identified ODE parameters.

| label ($c$) | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| ground truth | 0.5 | 0.833 | 1.166 | 1.5 | 1.833 |
| identified | 0.4997 | 0.8339 | 1.1661 | 1.4999 | 1.8339 |

| label ($c$) | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- |
| ground truth | 2.166 | 2.5 | 2.833 | 3.166 | 3.5 |
| identified | 2.1669 | 2.5010 | 2.8330 | 3.1673 | 3.4989 |
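Given the identified parameters in Table 3, trajectories can be reconstructed by solving initial-value problems, as in the hedged sketch below; the right-hand side is a placeholder, since the ODE system (A3) itself is not reproduced in this section, and `solve_ivp` is one convenient integrator choice rather than necessarily the one used by the authors.

```python
# Hedged sketch: reconstruct a trajectory by solving an IVP with an identified
# parameter from Table 3. The right-hand side below is a placeholder; the
# actual ODE system (A3) from the experiments is not reproduced in this section.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, theta):
    # Placeholder dynamics dy/dt = f(y; theta); substitute the true system (A3).
    return theta * np.array([y[1], -y[0]])

theta_identified = 0.4997                      # Table 3 entry for label c = 0
sol = solve_ivp(rhs, t_span=(0.0, 1.0), y0=[1.0, 0.0],
                args=(theta_identified,), t_eval=np.linspace(0.0, 1.0, 50))
trajectory = sol.y                             # compare against the ground truth
```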
Table 4. Hyperparameters for each experiment.

| | learning rate | encoding dim | number of clusters | number of epochs |
| --- | --- | --- | --- | --- |
| Experiment 4.1 | | 2 | 10 | 40,000 |
| Experiment 4.2 | | 3 | 10 | 10,000 |
| Experiment 4.3 | | 2 | 4 | 40,000 |
Table 5. Model decisions for each experiment.

| | DOFs Rescaling | Single Decoder per Modality | Dropout | Expert Model |
| --- | --- | --- | --- | --- |
| Experiment 4.1 | ✓ | ✓ | ✓ | ✓ |
| Experiment 4.2 | ✓ | ✓ | | |
| Experiment 4.3 | ✓ | ✓ | | |
[1] | S. Amal, L. Safarnejad, J. A. Omiye, I. Ghanzouri, J. H. Cabot and E. G. Ross, Use of multi-modal data and machine learning to improve cardiovascular disease care, Frontiers in Cardiovascular Medicine, 2 (2022). |
[2] | S. An, M. Lee, S. Park, H. Yang and J. So, An ensemble of simple convolutional neural network models for MNIST digit recognition, arXiv preprint, arXiv: 2008.10400, 2020. |
[3] | T. Baltrušaitis, C. Ahuja and L.-P. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2018), 423-443. |
[4] | L. Biewald, Experiment Tracking with Weights and Biases, Software available from wandb.com, 2020. |
[5] | B. L. Boyce and M. D. Uchic, Progress toward autonomous experimental systems for alloy development, MRS Bulletin, 44 (2019), 273-280. |
[6] | C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins and A. Lerchner, Understanding disentangling in $\beta$-VAE, arXiv preprint, arXiv: 1804.03599, 2018. |
[7] | A. Chakraborty, P. Nandi and B. Chakraborty, Fingerprints of the quantum space-time in time-dependent quantum mechanics: An emergent geometric phase, Nuclear Phys. B, 975 (2022), Paper No. 115691, 27 pp. doi: 10.1016/j.nuclphysb.2022.115691. |
[8] | R. T. Chen, X. Li, R. Grosse and D. Duvenaud, Isolating sources of disentanglement in VAEs, in Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, 2615-2625. |
[9] | R. T. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems, 31 (2018). |
[10] | J. Cioffi and T. Kailath, Fast, recursive-least-squares transversal filters for adaptive filtering, IEEE Transactions on Acoustics, Speech, and Signal Processing, 32 (1984), 304-337. |
[11] | N. Dilokthanakul, P. A. Mediano, M. Garnelo, M. C. Lee, H. Salimbeni, K. Arulkumaran and M. Shanahan, Deep unsupervised clustering with Gaussian mixture variational autoencoders, arXiv preprint, arXiv: 1611.02648, 2016. |
[12] | F. Dos Santos Rodrigues, G. Delgado, T. Santana de Costa and L. Tasic, Applications of fluorescence spectroscopy in protein conformational changes and intermolecular contacts, BBA Advances, 3 (2023). |
[13] | M. El Hariri El Nokab and K. Sebakhy, Solid state NMR spectroscopy a valuable technique for structural insights of advanced thin film materials: A review, Nanomaterials (Basel), 11 (2021). |
[14] | D. Gao, J. Huang, X. Lin, D. Yang, Y. Wang and H. Zheng, Phase transitions and chemical reactions of octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine under high pressure and high temperature, RSC Advances, 9 (2019). |
[15] | K. Hasselmann, Multi-pattern fingerprint method for detection and attribution of climate change, Climate Dynamics, 13 (1997), 601-611. |
[16] | G. Hegerl, F. Zwiers, P. Braconnot, N. P. Gillett, Y. M. Luo, J. M. Orsini, N. Nicholls, J. E. Penner and P. A. Stott, Understanding and Attributing Climate Change, 2007. |
[17] | I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed and A. Lerchner, $\beta$-VAE: Learning basic visual concepts with a constrained variational framework, in 5th International Conference on Learning Representations, ICLR 2017, 2017. |
[18] | J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering, 9 (2007), 90-95. |
[19] | O. Isayev, D. Fourches, E. N. Muratov, C. Oses, K. Rasch, A. Tropsha and S. Curtarolo, Materials cartography: Representing and mining materials space using structural and electronic fingerprints, Chemistry of Materials, 27 (2015), 735-743. |
[20] | E. Jang, S. Gu and B. Poole, Categorical reparameterization with gumbel-softmax, arXiv preprint, arXiv: 1611.01144, 2016. |
[21] | Z. Jiang, Y. Zheng, H. Tan, B. Tang and H. Zhou, Variational deep embedding: An unsupervised and generative approach to clustering, in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, 1965-1972. |
[22] | M. I. Jordan and R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Proceedings of 1993 International Conference on Neural Networks, 6 (1993), 181-214. doi: 10.1109/IJCNN.1993.716791. |
[23] | H. Kim and A. Mnih, Disentangling by factorising, in International Conference on Machine Learning, PMLR, 2018, 2649-2658. |
[24] | D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint, arXiv: 1412.6980, 2014. |
[25] | D. P. Kingma and M. Welling, Auto-Encoding Variational Bayes, in 2nd International Conference on Learning Representations, ICLR 2014, 2014. |
[26] | H. W. Kuhn, The hungarian method for the assignment problem, Naval Research Logistics Quarterly, 2 (1955), 83-97. doi: 10.1002/nav.3800020109. |
[27] | I. E. Lagaris, A. Likas and D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks, 9 (1998), 987-1000. |
[28] | Y. LeCun, C. Cortes and C. Burges, MNIST handwritten digit database, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2 (2010). |
[29] | D. B. Lee, D. Min, S. Lee and S. J. Hwang, Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning, in International Conference on Learning Representations, 2020. |
[30] | K. Lee, N. Trask and P. Stinis, Structure-preserving sparse identification of nonlinear dynamics for data-driven modeling, in Mathematical and Scientific Machine Learning, PMLR, 2022, 65-80. |
[31] | K. Lee, N. A. Trask, R. G. Patel, M. A. Gulian and E. C. Cyr, Partition of unity networks: Deep hp-approximation, arXiv preprint, arXiv: 2101.11256, 2021. |
[32] | A. Liu, W. Zhu, D. Tsai and N. I. Zheludev, Micromachined tunable metamaterials: A review, Journal of Optics, 14 (2012), p. 114009. |
[33] | F. Locatello, S. Bauer, M. Lucic, G. Raetsch, S. Gelly, B. Schölkopf and O. Bachem, Challenging common assumptions in the unsupervised learning of disentangled representations, in International Conference on Machine Learning, PMLR, 2019, 4114-4124. |
[34] | F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf and O. Bachem, Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, in International Conference on Machine Learning, PMLR, 2019. |
[35] | L. Lu, P. Jin and G. E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint, arXiv: 1910.03193, 2019. |
[36] | Z. Mao, L. Lu, O. Marxen, T. A. Zaki and G. E. Karniadakis, DeepM&Mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators, Journal of Computational Physics, 447 (2021), p. 110698. doi: 10.1016/j.jcp.2021.110698. |
[37] | S. M. Mennen et al., The evolution of high-throughput experimentation in pharmaceutical development and perspectives on the future, Organic Process Research & Development, 23 (2019), 1213-1242. |
[38] | E. J. Mittemeijer and P. Scardi, Diffraction Analysis of the Microstructure of Materials, Springer-Verlag, Berlin, 2004. |
[39] | P. Nikolaev, D. Hooper, F. Webber, R. Rao, K. Decker, M. Krein, J. Poleski, R. Barto and B. Maruyama, Autonomy in materials research: A case study in carbon nanotube growth, npj Computational Materials, 2 (2016). |
[40] | A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems, 32 (2019). |
[41] | F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research, 12 (2011), 2825-2830. |
[42] | Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens and L. Carin, Variational autoencoder for deep learning of images, labels and captions, Advances in Neural Information Processing Systems, 29 (2016), 2352-2360. |
[43] | A. Quaglino, M. Gallieri, J. Masci and J. Koutník, SNODE: Spectral Discretization of Neural ODEs for System Identification, in International Conference on Learning Representations, 2020. |
[44] | M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), 686-707. doi: 10.1016/j.jcp.2018.10.045. |
[45] | D. Rao, F. Visin, A. Rusu, R. Pascanu, Y. W. Teh and R. Hadsell, Continual unsupervised representation learning, Advances in Neural Information Processing Systems, 32 (2019), 7647-7657. |
[46] | D. J. Rezende, S. Mohamed and D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, in International Conference on Machine Learning, PMLR, 2014, 1278-1286. |
[47] | Y. Shi, B. Paige, P. Torr, et al., Variational mixture-of-experts autoencoders for multi-modal deep generative models, Advances in Neural Information Processing Systems, 32 (2019). |
[48] | R. D. Sochol, E. Sweet, C. C. Glick, S.-Y. Wu, C. Yang, M. Restaino and L. Lin, 3D printed microfluidics and microelectronics, Microelectronic Engineering, 189 (2018), 52-68. |
[49] | K. Sohn, H. Lee and X. Yan, Learning structured output representation using deep conditional generative models, Advances in Neural Information Processing Systems, 28 (2015), 3483-3491. |
[50] | T. M. Sutter, I. Daunhawer and J. E. Vogt, Generalized multimodal ELBO, in 9th International Conference on Learning Representations, ICLR, 2021. |
[51] | M. Suzuki, K. Nakayama and Y. Matsuo, Joint multimodal learning with deep generative models, in 5th International Conference on Learning Representations, ICLR 2017, 2017. |
[52] | N. Trask, A. Huang and X. Hu, Enforcing exact physics in scientific machine learning: A data-driven exterior calculus on graphs, J. Comput. Phys., 456 (2022), Paper No. 110969, 19 pp. doi: 10.1016/j.jcp.2022.110969. |
[53] | R. Vedantam, I. Fischer, J. Huang and K. Murphy, Generative Models of Visually Grounded Imagination, in 6th International Conference on Learning Representations, ICLR, 2018. |
[54] | P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al., SciPy 1.0: Fundamental algorithms for scientific computing in Python, Nature Methods, 17 (2020), 261-272. |
[55] | D. Vizoso, G. Subhash, K. Rajan and R. Dingreville, Connecting vibrational spectroscopy to atomic structure via supervised manifold learning: Beyond peak analysis, Chem. Mater., 35 (2023), 1186-1200. |
[56] | S. Wang, H. Wang and P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed DeepONets, J. Comput. Phys., 475 (2023), Paper No. 111855, 18 pp. doi: 10.1016/j.jcp.2022.111855. |
[57] | M. L. Waskom, Seaborn: Statistical data visualization, Journal of Open Source Software, 6 (2021), p3021. |
[58] | C. Weidenthaler, Pitfalls in the characterization of nanoporous and nanosized materials, Nanoscale, 3 (2011), 792-810. |
[59] | M. Wu and N. Goodman, Multimodal generative models for scalable weakly-supervised learning, Advances in Neural Information Processing Systems, 31 (2018). |
[60] | J. Xie, R. Girshick and A. Farhadi, Unsupervised deep embedding for clustering analysis, in International Conference on Machine Learning, PMLR, 2016, 478-487. |
Figure 5. Ground-truth trajectories (black solid lines) and reconstructed trajectories (dashed lines) obtained by solving IVPs with the learned ODE parameters.
Figure 7. (Left) Number of specimens per cluster from the training dataset that underwent uniaxial compression, hydrostatic compression, or no compression. (Right) Predicted vs. true normalized average stress values over the entire VDoS dataset.