# American Institute of Mathematical Sciences

December 2019, 1(4): 491-506. doi: 10.3934/fods.2019020

## Cluster, classify, regress: A general method for learning discontinuous functions

1. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
2. Fusion Energy Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
3. Department of Mathematics, University of Manchester, Manchester, M13 4PL, UK

* Corresponding author: Clement Etienam

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan http://energy.gov/downloads/doe-public-access-plan

Published December 2019

This paper presents a method for solving supervised learning problems in which the output is highly nonlinear and discontinuous. It is proposed to solve such problems in three stages: (i) cluster the input-output data pairs, yielding a label for each point; (ii) classify the data, with the corresponding label as the output; and (iii) perform a separate regression for each class, where the training data for a class is the subset of the original input-output pairs assigned that label by the classifier. To the authors' knowledge, these three fundamental building blocks of machine learning have not previously been combined in this simple and powerful fashion. The approach can be viewed as a form of deep learning, in which any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example arising from simulation of plasma fusion in a tokamak.
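The three-stage pipeline described above can be sketched with off-the-shelf components. This is a minimal illustration, not the authors' implementation: scikit-learn's KMeans, a random-forest classifier, and per-class random-forest regressors stand in for the (unspecified) cluster, classify, and regress machines, and the discontinuous target function is a made-up toy example.

```python
# Minimal sketch of the three-stage CCR idea, assuming off-the-shelf
# scikit-learn components (not the authors' implementation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(2000, 1))
# A hypothetical discontinuous target with two branches.
y = np.where(x[:, 0] < 0.0, np.sin(4.0 * x[:, 0]), 2.0 + np.cos(4.0 * x[:, 0]))

# (i) Cluster the joint (x, y) pairs; every point receives a label.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.column_stack([x, y]))

# (ii) Classify: learn the label as a function of x alone (this plays f_c).
f_c = RandomForestClassifier(random_state=0).fit(x, labels)

# (iii) Regress: one regressor per class, trained on that class's subset (f_r).
f_r = {k: RandomForestRegressor(random_state=0).fit(x[labels == k], y[labels == k])
       for k in np.unique(labels)}

def ccr_predict(x_new):
    """Evaluate the composed machine f_r(x, f_c(x))."""
    k_hat = f_c.predict(x_new)
    out = np.empty(len(x_new))
    for k, reg in f_r.items():
        mask = k_hat == k
        if mask.any():
            out[mask] = reg.predict(x_new[mask])
    return out
```

Because each regressor only ever sees data from one side of the discontinuity, no single smooth model is forced to interpolate across the jump, which is the essential point of the method.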

Citation: David E. Bernholdt, Mark R. Cianciosa, David L. Green, Jin M. Park, Kody J. H. Law, Clement Etienam. Cluster, classify, regress: A general method for learning discontinuous functions. Foundations of Data Science, 2019, 1 (4) : 491-506. doi: 10.3934/fods.2019020
##### Figures:
Figure 1. Numerical examples 1 (row a), 2 (row b), and 3 (row c). The functions are plotted in column (a), along with the final CCR machine output $f_r(x, f_c(x))$ and the intermediate $f_c(x)$. Column (b) shows a scatter plot of the true $f(x)$ against the CCR machine output $f_r(x, f_c(x))$, illustrating the correlation. Column (c) shows a histogram of $f_r(x, f_c(x)) - f(x)$, illustrating the discrepancy between the CCR reconstruction and the truth.

Figure 2. The results of CCR (a), DNN (b), and MLP (c) as applied to numerical example 2, $f_2$.

Figure 3. Numerical example 5. $f_5(x)$ is plotted in panel (a), and $f_r(x, f_c(x))$ and $f_c(x)$ are plotted in panels (b) and (c), respectively. Panel (e) shows a scatter plot of the true $y(x)$ against the CCR machine output $f_r(x, f_c(x))$, illustrating the correlation. Panel (d) shows a histogram of $f_r(x, f_c(x)) - y(x)$, illustrating the discrepancy between the CCR reconstruction and the truth.

Figure 4. Numerical example 6. $f_6(x)$ is plotted in panel (a), and $f_r(x, f_c(x))$ and $f_c(x)$ are plotted in panels (b) and (c), respectively. Panel (e) shows a scatter plot of the true $y(x)$ against the CCR machine output $f_r(x, f_c(x))$, illustrating the correlation. Panel (d) shows a histogram of $f_r(x, f_c(x)) - y(x)$, illustrating the discrepancy between the CCR reconstruction and the truth.

Figure 5. Numerical example 7. Subfigure (a) shows some two-variable slices over test data of the true function $\chi$ (a-c), the CCR machine output $f_r(x, f_c(x))$ (d-f), the absolute difference $|\chi(x) - f_r(x, f_c(x))|$ (g-i), and the intermediate $f_c(x)$ (j-l), with the remaining inputs set to the mean $\mathbb E(x_{\backslash ij})$, where $x_{\backslash ij} = (m_1, \dots, m_{i-1}, m_{i+1}, \dots, m_{j-1}, m_{j+1}, \dots, m_{10})$ (assuming $i<j$). Subfigure (b) shows the marginals of the input data distribution.

Figure 6. Numerical example 7. Subfigure (a) shows all the remaining two-variable slices of the true function $\chi$ (constructed as described in Fig. 5), and subfigure (b) shows the corresponding CCR machine output $f_r(x, f_c(x))$.

Figure 7. Numerical example 7. The first 500 (random) training data output values are plotted in panel (a), along with the clustering values of the training data, showing $\chi$ and the cluster labels. Panel (b) shows prediction results on test data: the final CCR machine output $f_r(x, f_c(x))$, the true $\chi(x)$, and the intermediate $f_c(x)$. Panel (c) shows a scatter plot of the true $y(x)$ against the CCR machine output $f_r(x, f_c(x))$. Panel (d) shows a histogram of $f_r(x, f_c(x)) - \chi(x)$.
L2 and R2 comparison for the 7 numerical examples

| Accuracy | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| L2 | 0.9934 | 0.9961 | 0.9964 | 0.9978 | 0.9825 | 0.9934 | 0.9835 |
| R2 | 0.9978 | 0.9967 | 0.9987 | 0.9983 | 0.9845 | 0.9945 | 0.9832 |
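The L2 and R2 accuracies in the table are not defined on this page. Under one common convention (an assumption here, not a statement of the authors' definitions), L2 accuracy is one minus the relative $\ell_2$ error of the prediction, and R2 is the usual coefficient of determination:

```python
# Plausible definitions of the two accuracy scores (assumed conventions).
import numpy as np

def l2_accuracy(y_true, y_pred):
    # 1 - ||y_pred - y_true||_2 / ||y_true||_2  (relative L2 accuracy)
    return 1.0 - np.linalg.norm(y_pred - y_true) / np.linalg.norm(y_true)

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Both scores equal 1 for a perfect fit, which is consistent with the near-unity values reported for all seven examples.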
Error attained for active learning on Example 2 with strategy 1a: all $N_{\rm res} = 1000$ sample points are used for passive learning, while only $n = 150$ points are used for active learning. Active learning recovers the same accuracy as when all the points are used.

| | Active | Passive |
|---|---|---|
| L2 Error | 0.0039 | 0.0039 |
| $N$ | 150 | 1000 |
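The details of "strategy 1a" are not given on this page, so the following is only a generic pool-based active-learning loop under assumed choices: uncertainty sampling via the spread of a random forest's trees stands in for the acquisition rule, and the target function is a made-up toy example.

```python
# Illustrative pool-based active-learning loop (not the paper's strategy 1a):
# from N_res = 1000 candidates, greedily pick points of highest model
# uncertainty until the active budget n = 150 is reached.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
f = lambda x: np.where(x < 0.5, np.sin(6.0 * x), 2.0)    # toy discontinuous target
pool = rng.uniform(0.0, 1.0, 1000)                       # N_res candidate points
chosen = list(rng.choice(1000, size=20, replace=False))  # small random seed set

while len(chosen) < 150:
    xs = pool[chosen]
    model = RandomForestRegressor(random_state=0).fit(xs.reshape(-1, 1), f(xs))
    # Acquisition: candidate where the forest's trees disagree the most.
    per_tree = np.stack([t.predict(pool.reshape(-1, 1))
                         for t in model.estimators_])
    score = per_tree.std(axis=0)
    score[chosen] = -1.0                                 # never re-pick a point
    chosen.append(int(score.argmax()))
```

For a discontinuous target, the tree disagreement concentrates near the jump, so the budget is spent where a passive (uniform) design wastes samples, which is one way such a large reduction in $N$ at equal error could arise.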
