
Function approximation by deep networks
1. Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711
2. Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph: a deep network can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows us to "lift" theorems about shallow networks to theorems about deep networks, with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, in which each channel of the deep network computes a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.
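The idea of error propagation through a compositional structure can be illustrated with a small numerical sketch. It is not the construction or the theorem from the paper: the constituent functions `g1`, `g2`, `h`, the perturbed stand-ins `G1`, `G2`, `H` (synthetic substitutes for trained channels), and the error levels are all hypothetical choices made for illustration. The sketch takes a binary-tree function f(x1, x2, x3, x4) = h(g1(x1, x2), g2(x3, x4)) and checks the elementary bound |f - F| <= L (e1 + e2) + eH, where L is a Lipschitz constant of `h` and e1, e2, eH are the uniform errors of the individual channels.

```python
import itertools
import math

# Hypothetical constituent functions of a binary-tree DAG (illustrative only).
def g1(x1, x2):
    return math.sin(x1 + x2)

def g2(x3, x4):
    return math.cos(x3 * x4)

def h(u, v):
    return 0.5 * u + 0.25 * v * v

# Assumed uniform error levels of the three channels.
E1, E2, EH = 0.01, 0.02, 0.005

# Synthetic "approximations": each differs from its target by at most
# the stated error level, standing in for a trained shallow network.
def G1(x1, x2):
    return g1(x1, x2) + E1 * math.sin(7.0 * x1)

def G2(x3, x4):
    return g2(x3, x4) - E2 * math.cos(5.0 * x4)

def H(u, v):
    return h(u, v) + EH * math.sin(3.0 * u + v)

# On [-1.1, 1.1]^2 the partial derivatives of h are bounded by
# 0.5 and 0.5 * 1.1 = 0.55, so h is Lipschitz with constant 0.55
# with respect to the l1 distance between its inputs.
LIPSCHITZ_H = 0.55

def check_propagation(points_per_axis=9):
    """Return (worst observed error, propagated bound) on a grid in [-1, 1]^4."""
    axis = [-1.0 + 2.0 * i / (points_per_axis - 1) for i in range(points_per_axis)]
    worst = 0.0
    for x1, x2, x3, x4 in itertools.product(axis, repeat=4):
        exact = h(g1(x1, x2), g2(x3, x4))
        approx = H(G1(x1, x2), G2(x3, x4))
        worst = max(worst, abs(exact - approx))
    bound = LIPSCHITZ_H * (E1 + E2) + EH
    return worst, bound

if __name__ == "__main__":
    worst, bound = check_propagation()
    print(f"worst observed error: {worst:.5f}")
    print(f"propagated bound:     {bound:.5f}")
```

Each channel in this sketch depends on only two variables, so the accuracy of the whole composition is controlled by two-dimensional approximation problems rather than a single four-dimensional one. This is the sense in which a network that mirrors the compositional structure can avoid the full dimension of the input.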