We propose the augmented Gaussian random field (AGRF), a universal framework that incorporates data on an observable and its derivatives of any order. A rigorous theory is established. We prove that, under certain conditions, the observable and its derivatives of any order are governed by a single Gaussian random field, namely the AGRF. As a corollary, the statement "the derivative of a Gaussian process remains a Gaussian process" is validated, since the derivative is represented by a part of the AGRF. Moreover, we construct a computational method corresponding to the universal AGRF framework. Both noiseless and noisy scenarios are considered. Formulas for the posterior distributions are derived in closed form. A significant advantage of our computational method is that the universal AGRF framework provides a natural way to incorporate derivatives of arbitrary order and to deal with missing data. We demonstrate the effectiveness of the computational method with four numerical examples: a composite function, a damped harmonic oscillator, the Korteweg-De Vries equation, and Burgers' equation.
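To make the idea concrete, the following is a minimal sketch, in Python/NumPy, of how function values and first-derivative values can be fused in a single Gaussian field with a closed-form posterior. It is an illustration only, not the paper's AGRF construction or its Eqn. (87): the squared-exponential kernel, the function names (`k00`, `k01`, `k11`, `joint_posterior`), and the hyperparameters `s`, `l` are assumptions made for this sketch.

```python
import numpy as np

# Cross-covariances of a squared-exponential kernel and its first derivative:
# with r = x - x' and k(x, x') = s^2 exp(-r^2 / (2 l^2)),
#   cov(f(x),  f(x'))  = k(x, x')
#   cov(f(x),  f'(x')) = k(x, x') * r / l^2
#   cov(f'(x), f'(x')) = k(x, x') * (1/l^2 - r^2 / l^4)
def k00(x, xp, s, l):
    r = x[:, None] - xp[None, :]
    return s**2 * np.exp(-r**2 / (2 * l**2))

def k01(x, xp, s, l):
    r = x[:, None] - xp[None, :]
    return s**2 * np.exp(-r**2 / (2 * l**2)) * r / l**2

def k11(x, xp, s, l):
    r = x[:, None] - xp[None, :]
    return s**2 * np.exp(-r**2 / (2 * l**2)) * (1.0 / l**2 - r**2 / l**4)

def joint_posterior(x0, y0, x1, dy1, xs, s=1.0, l=0.5, jitter=1e-10):
    """Noiseless posterior of f at xs, given function values y0 at x0 and
    first-derivative values dy1 at x1, modeled in a single Gaussian field."""
    K = np.block([[k00(x0, x0, s, l), k01(x0, x1, s, l)],
                  [k01(x0, x1, s, l).T, k11(x1, x1, s, l)]])
    Ks = np.hstack([k00(xs, x0, s, l), k01(xs, x1, s, l)])
    y = np.concatenate([y0, dy1])
    L = np.linalg.cholesky(K + jitter * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = np.diag(k00(xs, xs, s, l)) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: observable data and derivative data collected at different locations.
x0 = np.linspace(0.0, 1.0, 4)        # observable locations
x1 = np.array([0.15, 0.55, 0.9])     # first-derivative locations
y0 = np.sin(3 * x0)                  # f
dy1 = 3 * np.cos(3 * x1)             # f'
xs = np.linspace(0.0, 1.0, 101)
mean, var = joint_posterior(x0, y0, x1, dy1, xs)
```

Setting aside hyperparameter estimation, the same block structure extends to higher-order derivative data by appending the corresponding kernel derivatives as additional blocks.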
Figure 1. Graphical illustration of augmented Gaussian random field prediction with measurement noise. There are three layers: input layer, hidden layer, and output layer. The hidden layer is governed by the augmented Gaussian random field. The observable and its derivatives of different orders are integrated into the same field to make predictions
Figure 2. [Composite function (noiseless)] Prediction of the observable, first order derivative, and second order derivative by AGRF. Case 1: the data include the observable only. Case 2: the data include the observable and first order derivative. Case 3: the data include the observable and second order derivative. Case 4: the data include the observable, first order derivative, and second order derivative. AGRF is able to integrate the observable and derivatives of any order, regardless of the locations where they are collected. The AGRF prediction improves when more information is available
Figure 3. [Composite function (noiseless)] Comparison of the prediction accuracy of AGRF in different cases. See Figure 2 for more explanations
Figure 4. [Damped harmonic oscillator (noiseless)] Prediction of the displacement, velocity, and phase-space diagram by different methods. GP: the data include the observable and first order derivative; the observable data are used to predict the displacement, and the first order derivative data are used to predict the velocity. GEK: the data include the observable and first order derivative; all the data are used jointly in the same random field to predict the displacement and velocity at the same time. AGRF: the data include the observable, first order derivative, and second order derivative; all the data are used together in the same random field to predict the displacement and velocity at the same time. GEK produces better predictions than GP, while AGRF predicts more accurately than GEK. By using all the available information together in the same random field, we can construct the most accurate surrogate model
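Using the hypothetical `joint_posterior` from the sketch after the abstract, the data-usage difference between the GP baseline and the joint-field methods (GEK/AGRF) amounts to whether the derivative data enter the conditioning set. This is only an illustration of that pattern, not the implementations compared in Figure 4.

```python
# GP baseline: the displacement is predicted from function values alone;
# the derivative data live in a separate field and are not used here.
gp_disp, _ = joint_posterior(x0, y0, x1=np.empty(0), dy1=np.empty(0), xs=xs)

# GEK/AGRF-style joint field: the same displacement prediction also
# conditions on the first-derivative data through the cross-covariance blocks.
joint_disp, _ = joint_posterior(x0, y0, x1, dy1, xs)
```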
Figure 5. [Damped harmonic oscillator (noiseless)] Comparison of the prediction accuracy by different methods. See Figure 4 for more explanations
Figure 6. [Korteweg-De Vries equation (noisy)] Top: the solution at $ t = 0.5 $ is studied. Bottom: prediction of the observable, first order derivative, and second order derivative by AGRF under different levels of noise. AGRF has good performance even when the noise is as high as 40%. As one might expect, the AGRF prediction is better when the noise is lower
Figure 7. [Korteweg-De Vries equation (noisy)] Comparison of the prediction accuracy under different levels of noise. See Figure 6 for more explanations
Figure 8. [Burgers' equation (noisy)] Top: the solution at $ t = 0.5 $ is studied. Bottom: prediction of the observable, first order derivative, and second order derivative by different AGRF calibrations. No $ \delta $: noiseless formulation is used despite the presence of noise in the data, i.e., $ \delta_0 = \delta_1 = \delta_2 = 0 $ in Eqn. (87). One $ \delta $: the same noise intensity is used for different order derivatives, i.e., $ \delta_0 = \delta_1 = \delta_2 $ in Eqn. (87). Multiple $ \delta $: different noise intensities are used for different order derivatives, i.e., the same as Eqn. (87). When the noiseless formulation is used despite the presence of noise in the data, overfitting is an issue. When the same noise intensity is used for different order derivatives, the uncertainty in the prediction is incompatible with the data since different order derivatives have different scales. When the formulation is exactly the same as Eqn. (87), AGRF has the best performance
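One way to realize the "multiple $ \delta $" calibration described above is to attach a separate noise variance to each derivative order on the diagonal of the joint covariance. The sketch below builds on the hypothetical `joint_posterior` block after the abstract, is restricted to orders 0 and 1 for brevity, and uses the illustrative names `d0` and `d1`; it is not claimed to reproduce Eqn. (87).

```python
def noisy_joint_posterior(x0, y0, x1, dy1, xs, s=1.0, l=0.5, d0=0.1, d1=0.3):
    """Joint posterior with a distinct noise variance per data type:
    d0^2 on observable data and d1^2 on first-derivative data."""
    K = np.block([[k00(x0, x0, s, l), k01(x0, x1, s, l)],
                  [k01(x0, x1, s, l).T, k11(x1, x1, s, l)]])
    # Per-order noise added to the matching diagonal blocks ("multiple delta").
    K += np.diag(np.concatenate([np.full(len(x0), d0**2),
                                 np.full(len(x1), d1**2)]))
    Ks = np.hstack([k00(xs, x0, s, l), k01(xs, x1, s, l)])
    y = np.concatenate([y0, dy1])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, np.diag(k00(xs, xs, s, l)) - np.sum(v**2, axis=0)
```

In this toy setting, taking `d0 = d1` mimics the "one $ \delta $" calibration, and `d0 = d1 = 0` the "no $ \delta $" one (the latter needs a small jitter for the Cholesky factorization).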
Figure 9. [Burgers' equation (noisy)] Comparison of the prediction accuracy by different AGRF calibrations. See Figure 8 for more explanations. The relative $ L_2 $ errors in the case "no $ \delta $" exceed $ 1.6 $ and lie outside the plotted range