This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (ⅰ) cluster the pairs of input-output data points, resulting in a label for each point; (ⅱ) classify the data, where the corresponding label is the output; and finally (ⅲ) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs that have that label according to the classifier. It has not yet been proposed to combine these three fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
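The three stages can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: k-means, a 1-nearest-neighbour classifier, and a per-class straight-line fit are stand-ins chosen here for brevity, and the names `kmeans`, `nn_classify`, `fit_line`, `ccr_fit`, and the toy function `f` are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns labels."""
    n = len(points)
    # Assumption: deterministic start, k centers taken at evenly spaced indices.
    centers = [points[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def nn_classify(x_train, labels, x):
    """1-nearest-neighbour classifier on the scalar input x."""
    nearest = min(range(len(x_train)), key=lambda j: abs(x_train[j] - x))
    return labels[nearest]


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def ccr_fit(x_train, y_train, k):
    """Cluster, classify, regress; returns a prediction function."""
    # (i) Cluster the joint input-output pairs to obtain one label per point.
    cluster_labels = kmeans(list(zip(x_train, y_train)), k)

    # (ii) Classify: predict the label from the input alone.
    def classify(x):
        return nn_classify(x_train, cluster_labels, x)

    # (iii) Regress separately on the points the classifier assigns to each class.
    class_labels = [classify(x) for x in x_train]
    models = {}
    for c in set(class_labels):
        xs = [x for x, lab in zip(x_train, class_labels) if lab == c]
        ys = [y for y, lab in zip(y_train, class_labels) if lab == c]
        models[c] = fit_line(xs, ys)

    def predict(x):
        slope, intercept = models[classify(x)]
        return slope * x + intercept

    return predict


# Toy discontinuous target (hypothetical): a unit-slope line with a jump at x = 0.5.
def f(x):
    return x if x < 0.5 else x + 5.0


x_train = [i / 100 for i in range(101)]
y_train = [f(x) for x in x_train]
predict = ccr_fit(x_train, y_train, k=2)
print(predict(0.25), predict(0.75))
```

Clustering on the joint pairs rather than on the inputs alone is what lets the jump in the output separate the two branches; the classifier then carries that separation over to inputs whose output is unknown. Each of the three components could be swapped for a more expressive model without changing the overall structure.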
Figure 1. Numerical examples 1-4 (row a), 2 (row b), and 3 (row c). Column (a) plots each function along with the final CCR machine output $ f_r(x, f_c(x)) $ and the intermediate $ f_c(x) $. Column (b) shows a scatter plot of the true $ f(x) $ against the CCR machine output $ f_r(x, f_c(x)) $, illustrating the correlation. Column (c) shows a histogram of $ f_r(x, f_c(x)) - f(x) $, illustrating the discrepancy between the CCR reconstruction and the truth.
Figure 3. Numerical example 5. $ f_5(x) $ is plotted in panel (a), and $ f_r(x, f_c(x)) $ and $ f_c(x) $ are plotted in panels (b) and (c), respectively. Panel (e) shows a scatter plot of the true $ y(x) $ against the CCR machine output $ f_r(x, f_c(x)) $, illustrating the correlation. Panel (d) shows a histogram of $ f_r(x, f_c(x)) - y(x) $, illustrating the discrepancy between the CCR reconstruction and the truth.
Figure 4. Numerical example 6. $ f_6(x) $ is plotted in panel (a), and $ f_r(x, f_c(x)) $ and $ f_c(x) $ are plotted in panels (b) and (c), respectively. Panel (e) shows a scatter plot of the true $ y(x) $ against the CCR machine output $ f_r(x, f_c(x)) $, illustrating the correlation. Panel (d) shows a histogram of $ f_r(x, f_c(x)) - y(x) $, illustrating the discrepancy between the CCR reconstruction and the truth.
Figure 5. Numerical example 7. Subfigure (a) shows some two-variable slices over test data of the true function $ \chi $ (a-c), the CCR machine output $ f_r(x, f_c(x)) $ (d-f), the absolute difference $ |\chi(x) - f_r(x, f_c(x))| $ (g-i), and the intermediate $ f_c(x) $ (j-l), with the remaining inputs set to the mean $ \mathbb E(x_{\backslash ij}) $, where $ x_{\backslash ij} = (m_1, \dots, m_{i-1}, m_{i+1}, \dots, m_{j-1}, m_{j+1}, \dots, m_{10}) $ (assuming $ i<j $). Subfigure (b) shows the marginals of the input data distribution.
Figure 6. Numerical example 7. Subfigure (a) shows all the remaining two-variable slices of the true function $ \chi $ (constructed as described in Fig. 5), and subfigure (b) shows the corresponding CCR machine output $ f_r(x, f_c(x)) $.
Figure 7. Numerical example 7. Panel (a) plots the first 500 (random) training data output values $ \chi $ together with their cluster labels. Panel (b) shows prediction results on test data: the final CCR machine output $ f_r(x, f_c(x)) $, the true $ \chi(x) $, and the intermediate $ f_c(x) $. Panel (c) shows a scatter plot of the true $ y(x) $ against the CCR machine output $ f_r(x, f_c(x)) $. Panel (d) shows a histogram of $ f_r(x, f_c(x)) - \chi(x) $.
Table 1. L2 and R2 comparison for the 7 numerical examples
Accuracy | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
L2 | 0.9934 | 0.9961 | 0.9964 | 0.9978 | 0.9825 | 0.9934 | 0.9835 |
R2 | 0.9978 | 0.9967 | 0.9987 | 0.9983 | 0.9845 | 0.9945 | 0.9832 |
Table 2. Error attainment with set of sample points for active learning with Example 2 and strategy 1a
 | Active | Passive |
L2 Error | 0.0039 | 0.0039 |
Number of sample points | 150 | 1000 |
Figure 2. The results of CCR (a), DNN (b), and MLP (c) as applied to numerical example 2,