
# Function approximation by deep networks

*Corresponding author.
The research of the first author is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2018-18032000002. The research of the second author is supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph. The deep networks can be designed to have the same compositional structure, whereas a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows us to "lift" theorems about shallow networks to theorems about deep networks, given an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, in which each channel of the deep network computes a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.
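To make the notion of a function described by a directed acyclic graph concrete, here is a minimal sketch in Python. The graph, the node names (`h1`, `h2`, `out`), and the constituent functions are all hypothetical and chosen only for illustration; they are not the function $f^*$ or the graph used in the paper. The point is only that each node depends on a small number of inputs (two here), even though the overall function depends on four variables.

```python
import math

# Hypothetical DAG (an assumption for illustration, not from the paper):
# each non-input node maps to (names of its in-edges, constituent function).
DAG = {
    "h1": (("x1", "x2"), lambda a, b: math.tanh(a + 2.0 * b)),
    "h2": (("x3", "x4"), lambda a, b: a * b),
    "out": (("h1", "h2"), lambda a, b: math.sin(a) + b ** 2),
}


def evaluate(dag, inputs, node):
    """Evaluate `node` by recursing along in-edges, caching node values."""
    values = dict(inputs)  # copy, so the caller's dict is not modified

    def visit(name):
        if name not in values:
            parents, fn = dag[name]
            values[name] = fn(*(visit(p) for p in parents))
        return values[name]

    return visit(node)


x = {"x1": 0.5, "x2": -1.0, "x3": 2.0, "x4": 0.25}
y = evaluate(DAG, x, "out")
```

A deep network mirroring this structure would, under these assumptions, dedicate one shallow sub-network to each of `h1`, `h2`, and `out`, so each sub-network only has to approximate a function of two variables.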

Mathematics Subject Classification: Primary: 41A25; Secondary: 68Q32.


Figure 1.  This figure from [13] shows an example of a $\mathcal{G}$-function ($f^*$ given in (3.1)). The vertices $V\cup \mathbf{S}$ of the DAG $\mathcal{G}$ are denoted by red dots. The black dots represent the inputs; the inputs to the various nodes are indicated by the in-edges of the red nodes. The blue dot indicates the output value of the $\mathcal{G}$-function, $f^*$ in this example.

Figure 2.  On the left, with ${\mathbf{x}}_0 = (1, 1, 1)/\sqrt{3}$, the graph of $f({\mathbf{x}}) = [({\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8 + [(-{\mathbf{x}}\cdot{\mathbf{x}}_0-0.1)_+]^8$. On the right, the graph of $\mathcal{D}_{\phi_\gamma}(f)$. Courtesy: D. Batenkov
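The test function $f$ in the caption of Figure 2 can be evaluated directly from its formula. The sketch below is a plain-Python transcription; the helper names (`relu`, `f`) and the parameter names (`shift`, `power`) are assumptions introduced here, and the operator $\mathcal{D}_{\phi_\gamma}$ shown in the right panel is not implemented.

```python
import math

# x0 = (1, 1, 1) / sqrt(3), as in the caption of Figure 2.
X0 = (1.0 / math.sqrt(3.0),) * 3


def relu(t):
    """Positive part t_+ = max(t, 0)."""
    return max(t, 0.0)


def f(x, x0=X0, shift=0.1, power=8):
    """f(x) = [(x.x0 - 0.1)_+]^8 + [(-x.x0 - 0.1)_+]^8."""
    t = sum(a * b for a, b in zip(x, x0))  # inner product x . x0
    return relu(t - shift) ** power + relu(-t - shift) ** power


at_pole = f(X0)                               # x . x0 = 1
on_equator = f((1 / math.sqrt(2), -1 / math.sqrt(2), 0.0))  # x . x0 = 0
```

By construction, $f$ is even in ${\mathbf{x}}$ and vanishes on the band $|{\mathbf{x}}\cdot{\mathbf{x}}_0| \le 0.1$, so it is smooth only up to finite order across the two circles ${\mathbf{x}}\cdot{\mathbf{x}}_0 = \pm 0.1$.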
