Function approximation by deep networks

  • *Corresponding author

The research of the first author is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2018-18032000002. The research of the second author is supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
  • We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows us to "lift" theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.

    Mathematics Subject Classification: Primary: 41A25; Secondary: 68Q32.
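
    The compositional structure described in the abstract can be illustrated with a small sketch. The DAG below, its node names (`h1`, `h2`, `out`), and the constituent functions are hypothetical choices made only for illustration; they are not the paper's example. Each non-input node computes a function of its in-edges alone, so every constituent function here depends on two variables even though the overall function depends on four.

```python
import math

# Hypothetical DAG (not from the paper): each internal node lists its
# in-edges (parents) and a constituent function of only those variables.
DAG = {
    "h1": (("x1", "x2"), lambda a, b: math.tanh(a + 2.0 * b)),
    "h2": (("x3", "x4"), lambda a, b: a * b),
    "out": (("h1", "h2"), lambda a, b: math.sin(a) + b * b),
}


def evaluate(dag, inputs, sink="out"):
    """Evaluate the function defined by `dag` at the given input values."""
    values = dict(inputs)  # source nodes start with their input values

    def value_of(node):
        # Compute a node once, after recursively computing its parents.
        if node not in values:
            parents, fn = dag[node]
            values[node] = fn(*(value_of(p) for p in parents))
        return values[node]

    return value_of(sink)


inputs = {"x1": 0.5, "x2": -0.25, "x3": 2.0, "x4": 3.0}
result = evaluate(DAG, inputs)

# The same value written as one explicit function of all four inputs.
direct = math.sin(math.tanh(0.5 + 2.0 * -0.25)) + (2.0 * 3.0) ** 2
print(result, direct)
```

    A deep network designed with the same graph would assign one sub-network to each internal node, whereas a shallow network would have to treat the four-variable function as a whole.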


  • Figure 1.  This figure from [13] shows an example of a $ \mathcal{G} $–function ($ f^* $ given in (3.1)). The vertices $ V\cup \mathbf{S} $ of the DAG $ \mathcal{G} $ are denoted by red dots. The black dots represent the inputs; the inputs to the various nodes are indicated by the in–edges of the red nodes. The blue dot indicates the output value of the $ \mathcal{G} $–function, $ f^* $ in this example.

    Figure 2.  On the left, with $ {\mathbf{x}}_0 = (1, 1, 1)/\sqrt{3} $, the graph of $ f({\mathbf{x}}) = [({\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8 + [(-{\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8 $. On the right, the graph of $ \mathcal{D}_{\phi_\gamma}(f) $. Courtesy: D. Batenkov
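
    The function plotted in the left panel of Figure 2 can be evaluated directly from the formula in the caption. The following is a minimal sketch in plain Python; the helper names `relu`, `f`, and `sphere_point` are introduced here for illustration, and the operator $ \mathcal{D}_{\phi_\gamma} $ from the right panel is not reproduced because it is not defined in this excerpt.

```python
import math

# x_0 = (1, 1, 1) / sqrt(3), as given in the caption of Figure 2.
X0 = tuple(1.0 / math.sqrt(3.0) for _ in range(3))


def relu(t):
    """Positive part t_+ = max(t, 0)."""
    return max(t, 0.0)


def f(x):
    """f(x) = [(x.x0 - 0.1)_+]^8 + [(-x.x0 - 0.1)_+]^8 for x on the sphere."""
    s = sum(a * b for a, b in zip(x, X0))
    return relu(s - 0.1) ** 8 + relu(-s - 0.1) ** 8


def sphere_point(theta, phi):
    """Point on the unit sphere from polar angle theta and azimuth phi."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


# f peaks at x0 and at -x0 and vanishes on the band |x.x0| <= 0.1.
peak = f(X0)
antipode = f(tuple(-c for c in X0))
equator = f((1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0), 0.0))
print(peak, antipode, equator)
```

    Sampling `f(sphere_point(theta, phi))` over a grid of angles gives the values needed to redraw the left panel with any plotting tool.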

  • [1] F. Bach, Breaking the curse of dimensionality with convex neural networks, J. Mach. Learn. Res., 18 (2017), 629-681. 
    [2] Y. Cho and L. K. Saul, Kernel methods for deep learning, in Advances in Neural Information Processing Systems, (2009), 342–350.
    [3] C. K. Chui, X. Li and H. N. Mhaskar, Limitations of the approximation capabilities of neural networks with one hidden layer, Adv. Comput. Math., 5 (1996), 233-243.  doi: 10.1007/BF02124745.
    [4] C. K. Chui, S. B. Lin and D. X. Zhou, Construction of neural networks for realization of localized deep learning, Front. Appl. Math. Statist., 4 (2018). doi: 10.1109/tnnls.2017.2665555.
    [5] R. Eldan and O. Shamir, The power of depth for feedforward neural networks, in Conference on Learning Theory, (2016), 907–940.
    [6] B. Hanin, Universal function approximation by deep neural nets with bounded width and relu activations, Mathematics, 7 (2019), Art. 992. doi: 10.3390/math7100992.
    [7] Q. T. Le Gia and H. N. Mhaskar, Localized linear polynomial operators and quadrature formulas on the sphere, SIAM J. Numer. Anal., 47 (2009), 440-466.  doi: 10.1137/060678555.
    [8] P. Lizorkin and K. P. Rustamov, Nikol'skii-Besov spaces on the sphere in connection with approximation theory, Proc. Steklov Inst. Math. AMS Trans., 204 (1994), 149-172. 
    [9] H. N. Mhaskar, Approximation properties of a multilayered feedforward artificial neural network, Adv. Comput. Math., 1 (1993), 61-80.  doi: 10.1007/BF02070821.
    [10] H. N. Mhaskar, Eignets for function approximation on manifolds, Appl. Comput. Harmon. Anal., 29 (2010), 63-87.  doi: 10.1016/j.acha.2009.08.006.
    [11] H. N. Mhaskar, Dimension independent bounds for general shallow networks, Neural Netw., 123 (2020), 142-152.  doi: 10.1016/j.neunet.2019.11.006.
    [12] H. N. Mhaskar, Function approximation with zonal function networks with activation functions analogous to the rectified linear unit functions, J. Complexity, 51 (2019), 1-19.  doi: 10.1016/j.jco.2018.09.002.
    [13] H. N. Mhaskar and T. Poggio, Deep vs. shallow networks: An approximation theory perspective, Anal. Appl., 14 (2016), 829-848.  doi: 10.1142/S0219530516400042.
    [14] G. F. Montufar, R. Pascanu, K. Cho and Y. Bengio, On the number of linear regions of deep neural networks, Adv. Neural Inform. Process. Syst., 27 (2014), 2924-2932. 
    [15] S. Pawelke, Über die Approximationsordnung bei Kugelfunktionen und algebraischen Polynomen, Tohoku Math. J. Sec. Ser., 24 (1972), 473-486.  doi: 10.2748/tmj/1178241489.
    [16] I. Safran and O. Shamir, Depth separation in relu networks for approximating smooth non-linear functions, preprint, arXiv: 1610.09887.
    [17] I. Safran and O. Shamir, Depth-width tradeoffs in approximating natural functions with neural networks, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, (2017), 2979–2987.
    [18] T. Serra, C. Tjandraatmadja and S. Ramalingam, Bounding and counting linear regions of deep neural networks, preprint, arXiv: 1711.02114.
    [19] O. Sharir and A. Shashua, On the expressive power of overlapping architectures of deep learning, preprint, arXiv: 1703.02065.
    [20] M. Telgarsky, Benefits of depth in neural networks, preprint, arXiv: 1602.04485.
    [21] D. Yarotsky, Error bounds for approximations with deep relu networks, Neural Netw., 94 (2017), 103-114. 
    [22] D. Yarotsky, Optimal approximation of continuous functions by very deep relu networks, preprint, arXiv: 1802.03620.