August  2020, 19(8): 4085-4095. doi: 10.3934/cpaa.2020181

Function approximation by deep networks

1. Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711

2. Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139

*Corresponding author

Received August 2019; Revised November 2019; Published May 2020

Fund Project: The research of the first author is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2018-18032000002. The research of the second author is supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216

We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because a deep network can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows us to "lift" theorems about shallow networks to theorems about deep networks, with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.
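As an illustration of the compositional structure described above, the following minimal sketch evaluates a function defined by a small directed acyclic graph. The graph, the node names, and the three constituent functions are hypothetical choices made only for this example; they are not taken from the paper. The point it shows is that the overall function depends on four variables, while every node depends on only two.

```python
import math

# Hypothetical DAG: each non-input node maps to
# (constituent function, names of the nodes feeding into it).
# Nodes are listed in topological order, so parents come before children.
DAG = {
    "h1": (lambda a, b: math.sin(a + b), ("x1", "x2")),
    "h2": (lambda a, b: a * b, ("x3", "x4")),
    "out": (lambda a, b: math.exp(-((a - b) ** 2)), ("h1", "h2")),
}


def evaluate(dag, inputs):
    """Evaluate the compositional function by visiting nodes in order."""
    values = dict(inputs)
    for name, (fn, parents) in dag.items():
        values[name] = fn(*(values[p] for p in parents))
    return values["out"]


def max_in_degree(dag):
    """Largest number of variables any single constituent function sees."""
    return max(len(parents) for _, parents in dag.values())


if __name__ == "__main__":
    point = {"x1": 0.3, "x2": -0.2, "x3": 1.5, "x4": 0.4}
    print("value:", evaluate(DAG, point))
    print("input dimension:", len(point))
    print("max in-degree:", max_in_degree(DAG))
```

A deep network that mirrors this graph would dedicate one channel to each of `h1`, `h2`, and `out`, so each channel approximates a function of two variables only. A shallow network sees just the four-variable function as a whole.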

Citation: H. N. Mhaskar, T. Poggio. Function approximation by deep networks. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4085-4095. doi: 10.3934/cpaa.2020181
References:
[1] F. Bach, Breaking the curse of dimensionality with convex neural networks, J. Mach. Learn. Res., 18 (2017), 629-681.

[2] Y. Cho and L. K. Saul, Kernel methods for deep learning, in Advances in Neural Information Processing Systems, (2009), 342-350.

[3] C. K. Chui, X. Li and H. N. Mhaskar, Limitations of the approximation capabilities of neural networks with one hidden layer, Adv. Comput. Math., 5 (1996), 233-243. doi: 10.1007/BF02124745.

[4] C. K. Chui, S. B. Lin and D. X. Zhou, Construction of neural networks for realization of localized deep learning, Front. Appl. Math. Statist., 4 (2018). doi: 10.1109/tnnls.2017.2665555.

[5] R. Eldan and O. Shamir, The power of depth for feedforward neural networks, in Conference on Learning Theory, (2016), 907-940.

[6] B. Hanin, Universal function approximation by deep neural nets with bounded width and relu activations, Mathematics, 7 (2019), Art. 992. doi: 10.3390/math7100992.

[7] Q. T. Le Gia and H. N. Mhaskar, Localized linear polynomial operators and quadrature formulas on the sphere, SIAM J. Numer. Anal., 47 (2009), 440-466. doi: 10.1137/060678555.

[8] P. Lizorkin and K. P. Rustamov, Nikol'skii-Besov spaces on the sphere in connection with approximation theory, Proc. Steklov Inst. Math. AMS Trans., 204 (1994), 149-172.

[9] H. N. Mhaskar, Approximation properties of a multilayered feedforward artificial neural network, Adv. Comput. Math., 1 (1993), 61-80. doi: 10.1007/BF02070821.

[10] H. N. Mhaskar, Eignets for function approximation on manifolds, Appl. Comput. Harmon. Anal., 29 (2010), 63-87. doi: 10.1016/j.acha.2009.08.006.

[11] H. N. Mhaskar, Dimension independent bounds for general shallow networks, Neural Netw., 123 (2020), 142-152. doi: 10.1016/j.neunet.2019.11.006.

[12] H. N. Mhaskar, Function approximation with zonal function networks with activation functions analogous to the rectified linear unit functions, J. Complexity, 51 (2019), 1-19. doi: 10.1016/j.jco.2018.09.002.

[13] H. N. Mhaskar and T. Poggio, Deep vs. shallow networks: An approximation theory perspective, Anal. Appl., 14 (2016), 829-848. doi: 10.1142/S0219530516400042.

[14] G. F. Montufar, R. Pascanu, K. Cho and Y. Bengio, On the number of linear regions of deep neural networks, Adv. Neural Inform. Process. Syst., 27 (2014), 2924-2932.

[15] S. Pawelke, Über die Approximationsordnung bei Kugelfunktionen und algebraischen Polynomen, Tohoku Math. J. Sec. Ser., 24 (1972), 473-486. doi: 10.2748/tmj/1178241489.

[16] I. Safran and O. Shamir, Depth separation in relu networks for approximating smooth non-linear functions, preprint, arXiv: 1610.09887.

[17] I. Safran and O. Shamir, Depth-width tradeoffs in approximating natural functions with neural networks, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, (2017), 2979-2987.

[18] T. Serra, C. Tjandraatmadja and S. Ramalingam, Bounding and counting linear regions of deep neural networks, preprint, arXiv: 1711.02114.

[19] O. Sharir and A. Shashua, On the expressive power of overlapping architectures of deep learning, preprint, arXiv: 1703.02065.

[20] M. Telgarsky, Benefits of depth in neural networks, preprint, arXiv: 1602.04485.

[21] D. Yarotsky, Error bounds for approximations with deep relu networks, Neural Netw., 94 (2017), 103-114.

[22] D. Yarotsky, Optimal approximation of continuous functions by very deep relu networks, preprint, arXiv: 1802.03620.


Figure 1.  This figure from [13] shows an example of a $ \mathcal{G} $-function ($ f^* $ given in (3.1)). The vertices $ V\cup \mathbf{S} $ of the DAG $ \mathcal{G} $ are denoted by red dots. The black dots represent the inputs; the inputs to the various nodes are indicated by the in-edges of the red nodes. The blue dot indicates the output value of the $ \mathcal{G} $-function, $ f^* $ in this example.
Figure 2.  On the left, with $ {\mathbf{x}}_0 = (1, 1, 1)/\sqrt{3} $, the graph of $ f({\mathbf{x}}) = [({\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8 + [(-{\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8 $. On the right, the graph of $ \mathcal{D}_{\phi_\gamma}(f) $. Courtesy: D. Batenkov
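
The function $ f $ plotted on the left of Figure 2 can be evaluated directly from the formula in the caption. The sketch below is only an illustration of that formula; the helper names `relu` and `f` and the test points are choices made here, and the operator $ \mathcal{D}_{\phi_\gamma} $ shown on the right of the figure is not implemented because the caption does not define it.

```python
import math

# x0 = (1, 1, 1) / sqrt(3), as given in the caption of Figure 2.
X0 = (1.0 / math.sqrt(3.0),) * 3


def relu(t):
    """The truncation (t)_+ = max(t, 0)."""
    return max(t, 0.0)


def f(x, x0=X0):
    """f(x) = [(x.x0 - 0.1)_+]^8 + [(-x.x0 - 0.1)_+]^8 for x on the unit sphere."""
    t = sum(a * b for a, b in zip(x, x0))
    return relu(t - 0.1) ** 8 + relu(-t - 0.1) ** 8


if __name__ == "__main__":
    north = X0                                      # x.x0 = 1
    south = tuple(-c for c in X0)                   # x.x0 = -1
    equator = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)  # x.x0 = 0
    for name, p in [("x0", north), ("-x0", south), ("orthogonal", equator)]:
        print(name, f(p))
```

The function is even in $ {\mathbf{x}} $, vanishes on the band $ |{\mathbf{x}}\cdot{\mathbf{x}}_0| \le 0.1 $, and attains its largest value $ 0.9^8 $ at $ \pm{\mathbf{x}}_0 $.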