
# Mean-field and kinetic descriptions of neural differential equations

*Corresponding author: Giuseppe Visconti*

Nowadays, neural networks are widely used in many applications as artificial intelligence models for learning tasks. Since neural networks typically process a very large amount of data, it is convenient to formulate them within mean-field and kinetic theory. In this work we focus on a particular class of neural networks, namely residual neural networks, assuming that each layer is characterized by the same number of neurons $N$, which is fixed by the dimension of the data. This assumption allows us to interpret the residual neural network as a time-discretized ordinary differential equation, in analogy with neural differential equations. The mean-field description is then obtained in the limit of infinitely many input data. This leads to a Vlasov-type partial differential equation which describes the evolution of the distribution of the input data. We analyze steady states and the sensitivity with respect to the parameters of the network, namely the weights and the bias. In the simple setting of a linear activation function and one-dimensional input data, the study of the moments provides insights into the choice of the parameters of the network. Furthermore, a modification of the microscopic dynamics, inspired by stochastic residual neural networks, leads to a Fokker-Planck formulation of the network, in which the concept of network training is replaced by the task of fitting distributions. The performed analysis is validated by numerical simulations on artificial data. In particular, results on classification and regression problems are presented.
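The ResNet-to-ODE interpretation described above can be made concrete: a residual layer $x_{k+1} = x_k + h\,\sigma(W_k x_k + b_k)$ is one explicit Euler step of the neural ODE $\dot{x} = \sigma(W(t)x + b(t))$. A minimal sketch (the function name `resnet_forward`, the step size $h = 0.1$, and the choice of $\tanh$ as activation are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def resnet_forward(x0, weights, biases, h=0.1, sigma=np.tanh):
    """Propagate an input through L residual layers.

    Each layer computes x_{k+1} = x_k + h * sigma(W_k x_k + b_k),
    i.e. one explicit Euler step of the neural ODE
    dx/dt = sigma(W(t) x + b(t)).
    """
    x = np.asarray(x0, dtype=float)
    for W, b in zip(weights, biases):
        x = x + h * sigma(W @ x + b)
    return x

# Example: 10 layers sharing the same N x N weight matrix and bias,
# so every layer has the same number of neurons N (as assumed in the text).
N, L = 3, 10
W = -np.eye(N)
b = np.zeros(N)
x_out = resnet_forward(np.ones(N), [W] * L, [b] * L)
```

With $W = -I$ and $b = 0$ the update contracts the state toward the origin, mirroring the decaying dynamics analyzed later for $w = -1$, $b = 0$.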

**Mathematics Subject Classification:** Primary: 35Q83, 35Q84; Secondary: 90C31, 92B20.

Figure 1.  Left: Moments of our PDE model with $\sigma(x) = x, w = -1, b = 0$. Right: Moments of our PDE model with $\sigma(x) = x, w = -1, b = -\frac{m_1(t)}{m_0(0)}$

Figure 2.  Left: The energy and variance plotted against the desired values with $\sigma(x) = x, w = -1, b = 0$. Right: The energy and variance plotted against the desired values with $\sigma(x) = x, w = -1, b = -\frac{m_1(t)}{m_0(0)}$
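For the linear activation $\sigma(x) = x$, multiplying the Vlasov-type equation $\partial_t f + \partial_x\big((wx + b)f\big) = 0$ by $1$ and $x$ and integrating yields the closed moment system $m_0' = 0$, $m_1' = w\,m_1 + b\,m_0$. A minimal numerical sketch of the decay seen in Figure 1 for $w = -1$, $b = 0$ (the explicit Euler discretization and step size are illustrative choices):

```python
import math

def integrate_moments(m0, m1, w, b, T=5.0, dt=1e-3):
    """Explicit Euler integration of the moment system
       m0'(t) = 0  (mass conservation),
       m1'(t) = w m1(t) + b m0(t)  (first moment),
    derived from f_t + d/dx((w x + b) f) = 0 with sigma(x) = x."""
    for _ in range(int(T / dt)):
        m1 += dt * (w * m1 + b * m0)
    return m0, m1

# With w = -1, b = 0 the mean decays exponentially: m1(t) = m1(0) e^{-t}.
m0, m1 = integrate_moments(m0=1.0, m1=2.0, w=-1.0, b=0.0)
```

The time-dependent bias $b = -\frac{m_1(t)}{m_0(0)}$ of the right panels modifies this balance and changes the steady state the first moment relaxes to.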

Figure 3.  We consider $50$ vehicles with measured lengths between $2$ and $8$, obtained as uniformly distributed random realizations. Left: Histogram of the measured lengths of the vehicles. Right: Trajectories of the neuron activation energies of the $50$ measurements

Figure 4.  Solution of the mean field neural network model at different time steps. The initial value is a uniform distribution on $[2, 8]$ and the weight and bias are chosen as $w = 1, \ b = -5$
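The separation of cars and trucks in Figures 3–4 can be reproduced at the particle level. A minimal sketch, assuming the microscopic dynamics $\dot{x}_i = \sigma(w x_i + b)$ with $w = 1$, $b = -5$ and $\tanh$ as activation (the captions do not fix $\sigma$; $\tanh$ is an illustrative choice that makes $x = 5$ an unstable equilibrium):

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = rng.uniform(2.0, 8.0, size=50)   # 50 measured vehicle lengths
labels = lengths > 5.0                     # True = truck, False = car

# Microscopic residual-network dynamics dx/dt = tanh(w x + b) with
# w = 1, b = -5: states below 5 drift down, states above 5 drift up,
# so the two classes are driven apart and never cross x = 5.
x = lengths.copy()
dt, T = 0.01, 5.0
for _ in range(int(T / dt)):
    x += dt * np.tanh(1.0 * x - 5.0)

separated = (x > 5.0) == labels
```

This mirrors the behavior of the mean-field solution in Figure 4, where the uniform density on $[2, 8]$ splits into two well-separated bumps.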

Figure 5.  Left: Regression problem with $5\cdot10^3$ measurements at fixed positions around $y = x$. Measurement errors are distributed according to a standard Gaussian. Center: Numerical slopes computed out of the previous measurements. Right: Numerical intercepts computed out of the previous measurements

Figure 6.  Evolution at time $t = 0$ (left plot), $t = 1$ (center plot), $t = 2$ (right plot) of the mean field neural network model (30) for the regression problem with weights $w_{xx} = 1$, $w_{xy} = w_{yx} = 0$, $w_{yy} = -1$, and biases $b_x = -1$, $b_y = 0$

Figure 7.  Evolution at time $t = 0$ (left plot), $t = 1$ (center plot), $t = 5$ (right plot) of the one dimensional mean field neural network model for the regression problem with weight $w = 1$ and bias $b = -1$

Figure 8.  Results of the mean field neural network model with updated weights and biases in the case of a novel target

Figure 9.  Solution of the Fokker-Planck neural network model at different times. Here, we have chosen the identity as activation function with weight $w = -1$, bias $b = 0$ and diffusion function $K(x) = 1$
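The Fokker-Planck setting of Figure 9 can be probed with a particle approximation. A minimal sketch, assuming the underlying stochastic dynamics $\mathrm{d}X_t = \sigma(wX_t + b)\,\mathrm{d}t + \sqrt{2K(X_t)}\,\mathrm{d}W_t$ with identity activation, $w = -1$, $b = 0$ and $K(x) = 1$; under these assumptions the process is Ornstein-Uhlenbeck with a standard Gaussian as stationary distribution (the particle count and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler-Maruyama particle approximation of the Fokker-Planck model:
#   dX_t = sigma(w X_t + b) dt + sqrt(2 K(X_t)) dW_t
# with sigma(x) = x, w = -1, b = 0, K(x) = 1, i.e. dX = -X dt + sqrt(2) dW.
n, dt, T = 20000, 0.01, 10.0
X = rng.uniform(2.0, 8.0, size=n)       # arbitrary initial distribution
for _ in range(int(T / dt)):
    X += dt * (-X) + np.sqrt(2.0 * dt) * rng.standard_normal(n)

mean, var = X.mean(), X.var()           # relaxes to mean 0, variance 1
```

Here "training" in the usual sense is absent: as the text notes, the Fokker-Planck formulation replaces it by fitting the long-time distribution of the data.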

Table 1.  Example of a data set for a classification problem

| Measurement | 3 | 3.5 | 5.5 | 7 | 4.5 | 8 | $\dots$ |
|---|---|---|---|---|---|---|---|
| Classifier | car | car | truck | truck | car | truck | $\dots$ |
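The labeling in Table 1 is consistent with a simple length threshold. A minimal sketch, where the cutoff value $5$ is a hypothetical choice that separates the listed measurements (the table itself does not state it):

```python
# Measurement/classifier pairs from Table 1.
data = [(3, "car"), (3.5, "car"), (5.5, "truck"),
        (7, "truck"), (4.5, "car"), (8, "truck")]

def classify(length, threshold=5.0):
    """Label a vehicle by its measured length; the threshold is an assumption."""
    return "truck" if length > threshold else "car"

consistent = all(classify(x) == label for x, label in data)
```

This is the same cutoff that the dynamics with $w = 1$, $b = -5$ realize implicitly, since $x = 5$ is where the drift $\sigma(wx + b)$ changes sign.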