Reduced basis approximations of parameterized dynamical partial differential equations via neural networks

    *Corresponding author: Peter Sentz 
    Projection-based reduced order models are effective at approximating parameter-dependent differential equations that are parametrically separable. When parametric separability is not satisfied, which occurs in both linear and nonlinear problems, projection-based methods fail to adequately reduce the computational complexity. Devising alternative reduced order models is therefore crucial for obtaining efficient and accurate approximations to expensive high-fidelity models. In this work, we develop a timestepping procedure for dynamical parameter-dependent problems in which a neural network is trained to propagate the coefficients of a reduced basis expansion. This results in an online stage whose computational cost is independent of the size of the underlying full-order problem. We demonstrate our method on several parabolic partial differential equations, including a problem that is not parametrically separable.
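
The online stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `online_rollout`, `reconstruct`, and the stand-in propagator `toy_propagate` (a fixed per-mode decay in place of a trained network) are hypothetical names, and the reduced basis `Psi` is a random orthonormal matrix. The point of the sketch is that each step acts only on an $N_{\text{rb}}$-vector of coefficients, so the loop never touches the $N_h$ full-order degrees of freedom; the reconstruction $u^{\text{rb}} = \sum_i c_i \psi_i$ is performed only when a field is actually needed.

```python
import numpy as np

def online_rollout(propagate, c0, mu, t0, dt, n_steps):
    """Advance reduced-basis coefficients c_k -> c_{k+1} for n_steps steps.

    propagate(c, t, mu, dt) is a hypothetical trained map acting on N_rb-vectors;
    nothing in this loop depends on the full-order dimension N_h.
    """
    coeffs = [np.asarray(c0, dtype=float)]
    t = t0
    for _ in range(n_steps):
        coeffs.append(propagate(coeffs[-1], t, mu, dt))
        t += dt
    return np.stack(coeffs)  # shape (n_steps + 1, N_rb)

def reconstruct(Psi, coeffs):
    """Full-order reconstruction u_rb(t_k) = Psi @ c_k for every stored step."""
    return coeffs @ Psi.T  # shape (n_steps + 1, N_h)

# Toy setup: a random orthonormal basis and a diagonal decay standing in for the network.
N_h, N_rb = 1000, 5
rng = np.random.default_rng(0)
Psi, _ = np.linalg.qr(rng.standard_normal((N_h, N_rb)))  # orthonormal columns
decay = np.exp(-0.1 * np.arange(1, N_rb + 1))

def toy_propagate(c, t, mu, dt):
    """Hypothetical stand-in for the trained network (ignores t, mu, and dt)."""
    return decay * c

C = online_rollout(toy_propagate, np.ones(N_rb), mu=None, t0=0.0, dt=0.01, n_steps=50)
U = reconstruct(Psi, C)
```

Swapping `toy_propagate` for a trained network changes only the per-step cost, which still scales with the network size and $N_{\text{rb}}$ rather than with $N_h$.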

    Figure 1.  One residual layer of a ResNet, showing the structure of the residual basic block used. The weight matrices and bias vectors define the parameters of this layer.
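
A residual layer of the kind shown in Figure 1 can be sketched in NumPy as below. The sketch assumes the block computes $x + R(x)$, where $R$ alternates dense layers $A^{(j)}(x) = W^{(j)} x + b^{(j)}$ with a tanh activation (the activation listed in Table 1) and ends in a linear layer. The exact placement of activations inside the paper's block is not stated in this section, and `dense`, `residual_block`, and `init_block` are hypothetical helper names; reading "width of each block" as the hidden-layer width is also an assumption.

```python
import numpy as np

def dense(W, b, x):
    """Dense (affine) layer A(x) = W x + b."""
    return W @ x + b

def residual_block(params, x, sigma=np.tanh):
    """One residual layer x -> x + R(x).

    params is a list of (W, b) pairs. The activation follows every dense layer
    except the last, so R maps R^n back to R^n and the skip connection is valid.
    """
    z = x
    for W, b in params[:-1]:
        z = sigma(dense(W, b, z))
    W_out, b_out = params[-1]
    return x + dense(W_out, b_out, z)  # identity skip connection plus residual

def init_block(n, width, n_hidden, rng, scale=0.1):
    """Random parameters for a block with n_hidden hidden layers of the given width."""
    sizes = [n] + [width] * n_hidden + [n]
    return [(scale * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
params = init_block(n=6, width=12, n_hidden=2, rng=rng)
x = rng.standard_normal(6)
y = residual_block(params, x)

# With zero weights and biases the residual vanishes and the layer is the identity I.
zero_params = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
assert np.allclose(residual_block(zero_params, x), x)
```

The zero-parameter check shows the design intent of the skip connection: the block starts from the identity and learns only a correction on top of it.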

    Figure 2.  A two-layer ResNet for timestepping (see also Equation (31)). The residual functions $ R_j $ contain the weights and biases (cf. Figure 1). For compactness, we use the notation $ \tilde{t}_k = t_k + \Delta t/2 $ for the initial input.
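
A two-layer timestepping ResNet in the spirit of Figure 2 might look like the sketch below. The input layout is an assumption made for illustration: the first state is taken to be the concatenation $[\boldsymbol{c}_k, \tilde{t}_k, \boldsymbol{\mu}]$ with $\tilde{t}_k = t_k + \Delta t/2$, and the leading $N_{\text{rb}}$ entries of the final state are read off as $\boldsymbol{c}_{k+1}$. Equation (31) is not reproduced in this section, so `residual`, `resnet_step`, and `make_block` are hypothetical names rather than the paper's code.

```python
import numpy as np

def residual(params, z, sigma=np.tanh):
    """Residual function R_j: tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        z = sigma(W @ z + b)
    W, b = params[-1]
    return W @ z + b

def resnet_step(blocks, c_k, t_k, dt, mu):
    """Advance reduced coefficients one step, c_k -> c_{k+1}.

    Hypothetical input layout: z_0 = [c_k, t_k + dt/2, mu]. After the residual
    layers, the first len(c_k) entries are read off as c_{k+1}.
    """
    t_mid = t_k + 0.5 * dt  # \tilde{t}_k = t_k + dt/2, as in Figure 2
    z = np.concatenate([c_k, [t_mid], np.atleast_1d(mu)])
    for params in blocks:   # z <- z + R_j(z) for each residual layer
        z = z + residual(params, z)
    return z[: len(c_k)]

def make_block(n, width, rng, scale=0.1):
    """Two hidden layers of the given width, mapping R^n -> R^n."""
    sizes = [n, width, width, n]
    return [(scale * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(1)
n_rb, n_mu = 4, 2
n = n_rb + 1 + n_mu
blocks = [make_block(n, 12, rng) for _ in range(2)]  # two residual layers
c_next = resnet_step(blocks, np.ones(n_rb), t_k=0.0, dt=0.01, mu=np.array([0.5, 1.0]))
```

Because every layer is an identity plus a residual, the predicted $\boldsymbol{c}_{k+1}$ equals $\boldsymbol{c}_k$ plus the accumulated corrections, which mirrors a one-step time integrator.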

    Figure 3.  2D advection-diffusion: The mean relative error over the test parameters as a function of time. The measure in Equation (39) is evaluated for the solutions reconstructed using the $ L^2 $-projection (black), the neural network approximation (red), and the coefficients computed using Galerkin POD (blue). Timesteps are marked every 5 steps.
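
The error curves in Figures 3 and 4 can be computed along the lines of the sketch below. Since Equations (38) and (39) are not reproduced in this section, the sketch assumes the relative error at each timestep is $\| u_h - u^{\text{rb}} \|_{L^2} / \| u_h \|_{L^2}$, measured with a mass matrix $M$, and that Equation (39) averages it over the test parameters. It also assumes an $M$-orthonormal basis ($\Psi^T M \Psi = I$), so the $L^2$-projection coefficients are $\Psi^T M u$. All function names are hypothetical, and the identity mass matrix is for illustration only.

```python
import numpy as np

def l2_norm(v, M):
    """Discrete L2 norm sqrt(v^T M v) for a symmetric positive definite mass matrix M."""
    return np.sqrt(v @ (M @ v))

def l2_projection_coeffs(Psi, M, u):
    """L2-projection coefficients onto span(Psi), assuming Psi^T M Psi = I."""
    return Psi.T @ (M @ u)

def relative_error(u, u_rb, M):
    """Relative L2 error of an approximation u_rb to the full-order vector u."""
    return l2_norm(u - u_rb, M) / l2_norm(u, M)

def error_statistics(U_full, U_rb, M):
    """Mean and standard deviation of the relative error over test parameters.

    U_full, U_rb: arrays of shape (n_test, n_steps, N_h). Returns two arrays of
    shape (n_steps,), one value per timestep.
    """
    errs = np.array([[relative_error(u, v, M) for u, v in zip(traj, traj_rb)]
                     for traj, traj_rb in zip(U_full, U_rb)])
    return errs.mean(axis=0), errs.std(axis=0)

# Illustration with an identity mass matrix and a random orthonormal basis.
rng = np.random.default_rng(0)
N_h, N_rb = 200, 3
M = np.eye(N_h)
Psi, _ = np.linalg.qr(rng.standard_normal((N_h, N_rb)))
U_full = rng.standard_normal((5, 10, N_h))  # 5 test parameters, 10 timesteps
U_proj = np.array([[Psi @ l2_projection_coeffs(Psi, M, u) for u in traj] for traj in U_full])
mean_err, std_err = error_statistics(U_full, U_proj, M)
```

The standard deviation returned alongside the mean is what a shaded $\pm$ one-standard-deviation band, as in Figure 4, would be drawn from.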

    Figure 4.  2D advection-diffusion: (Left) Mean relative $ L^2 $-error (Equation (39)) with respect to the full-order solution for a timestep size of $ 10\Delta t $; timesteps are marked every 4 steps. (Right) The same quantity for a timestep size of $ 20\Delta t $; timesteps are marked every 2 steps. In both panels, the shaded region denotes $ \pm $ one standard deviation over the test set.
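
Figure 4 reports models that advance in steps of $10\Delta t$ and $20\Delta t$. The section does not describe how the training data for these larger steps are assembled; one simple possibility, sketched below under that assumption, is to subsample each stored coefficient trajectory with a stride $s$ and pair consecutive entries, so that every pair spans $s\,\Delta t$. `strided_pairs` is a hypothetical helper.

```python
import numpy as np

def strided_pairs(trajectories, stride):
    """Training pairs (c_k, c_{k+stride}) from coefficient trajectories sampled at dt.

    Each trajectory has shape (n_steps + 1, N_rb); each returned pair spans a
    step of stride * dt. Inputs and targets are stacked over all trajectories.
    """
    inputs, targets = [], []
    for C in trajectories:
        C_coarse = C[::stride]          # keep every stride-th snapshot
        inputs.append(C_coarse[:-1])
        targets.append(C_coarse[1:])
    return np.concatenate(inputs), np.concatenate(targets)

# Two toy trajectories with 101 snapshots each (100 steps of size dt).
trajs = [np.arange(101 * 2, dtype=float).reshape(101, 2) + 1000.0 * p for p in range(2)]
X, Y = strided_pairs(trajs, stride=10)   # pairs for a step of 10 * dt
```

Pairs are never formed across trajectories, since each trajectory corresponds to a different parameter value.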

    Figure 5.  1D advection-diffusion: The mean relative error over the test parameters as a function of time. The measure in Equation (39) is evaluated for the solutions reconstructed using the $ L^2 $-projection (black), the neural network approximation (red), and the coefficients computed using Galerkin POD (blue). Timesteps are marked every 10 steps.

    Figure 6.  1D advection-diffusion: Two solution profiles at different parameter values. The magnitude of the pointwise relative error with respect to the full-order solution is shown in the background for the neural network approximation (solid) and the POD approximation (dashed).

    Figure 7.  1D advection-diffusion: Relative $ L^2 $-error (Equation (38)) as a function of time using the $ L^2 $-projection (black), the neural network approximation (red), and the coefficients computed using Galerkin POD (blue). Timesteps are marked every 10 steps. (Left) Error for $ Pe = 0.105 $, the smallest Péclet number in the test set. (Right) Error for $ Pe = 18.94 $, the largest Péclet number in the test set.

    Figure 8.  Non-affine diffusion-reaction: Solution snapshots at various points in time.

    Figure 9.  Non-affine diffusion-reaction: The mean relative error over the test parameters as a function of time. The measure in Equation (39) is evaluated for the solutions reconstructed using the $ L^2 $-projection (black) and the neural network approximation (red). The model is trained on the time interval to the left of the dashed vertical line; the region to the right of that line corresponds to extrapolation in time.

    Table 1.  Model parameters and execution times for the examples in Section 5. All numerical experiments were executed on a MacBook Pro with a 2 GHz quad-core Intel Core i5 processor.

    | Parameter | 2D Adv. Diff. (33) | 1D Adv. Diff. (40) | 2D non-affine (41) |
    |---|---|---|---|
    | # of residual blocks | 3 | 2 | 2 |
    | hidden layers per block | 2 | 2 | 4 |
    | width of each block | 12 | 13 | 18 |
    | total # of weights/biases | 1,101 | 900 | 2,640 |
    | rollout length $ m $ | 3 | 4 | 5 |
    | activation function | tanh | tanh | tanh |
    | optimizer | L-BFGS | L-BFGS | L-BFGS |
    | training set size | 100 $ {\mathit{\boldsymbol{\mu}}} $ values, 50 timesteps | 100 $ {\mathit{\boldsymbol{\mu}}} $ values, 100 timesteps | 225 $ {\mathit{\boldsymbol{\mu}}} $ values, 40 timesteps |
    | NN training time | $ \sim $1.5 min | $ \sim $3 h | $ \sim $1.5 h |
    | online NN cost per parameter | $ \sim 1 $ ms | $ \sim 0.4 $ ms | $ \sim 0.4 $ ms |
    | full-order cost per parameter | $ \sim 60 $ ms | $ \sim 10 $ ms | $ \sim 70 $ ms |
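
Table 1 lists a rollout length $m$ for each example, which enters the multi-step loss $L_T^m$. The sketch below shows one plausible form of such a least-squares rollout loss for a single coefficient trajectory: starting from each reference $\boldsymbol{c}_k$, the model is applied $m$ times and each prediction is compared with $\boldsymbol{c}_{k+1}, \dots, \boldsymbol{c}_{k+m}$. The weighting and normalization used in the paper may differ; `rollout_loss` is a hypothetical name, and the time and parameter inputs are folded into `step` for brevity.

```python
import numpy as np

def rollout_loss(step, C, m):
    """Least-squares multi-step (rollout) loss for one coefficient trajectory.

    step: model map c -> next c (time and parameter inputs assumed folded in)
    C:    reference coefficients, shape (n_steps + 1, N_rb)
    m:    rollout length
    """
    n_start = C.shape[0] - m  # start indices whose m-step rollout stays in range
    total = 0.0
    for k in range(n_start):
        c = C[k]
        for j in range(1, m + 1):
            c = step(c)                           # model prediction of c_{k+j}
            total += np.sum((c - C[k + j]) ** 2)  # squared error against the data
    return total / (n_start * m)

# Reference trajectory generated by an exact linear decay c_{k+1} = 0.9 c_k.
C = np.array([0.9 ** k * np.ones(3) for k in range(20)])
exact = rollout_loss(lambda c: 0.9 * c, C, m=4)    # zero up to round-off
biased = rollout_loss(lambda c: 0.95 * c, C, m=4)  # errors compound over the rollout
```

Compared with a single-step loss ($m = 1$), the rollout penalizes the accumulation of error over several model applications. A smooth loss of this kind is amenable to a quasi-Newton optimizer such as L-BFGS, which Table 1 lists for all three examples.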

    Table 2.  Notation used throughout.

    | Symbol | Definition |
    |---|---|
    | $ \Omega \subset \mathbb{R}^d $ | spatial domain |
    | $ T $ | end time of the simulation |
    | $ u({\mathit{\boldsymbol{x}}}, t; {\mathit{\boldsymbol{\mu}}}) $ | PDE solution |
    | $ F $ | differential operator |
    | $ {\mathit{\boldsymbol{\mu}}} $ | vector of $ P $ parameters, $ \mu_1, \dots, \mu_P $ |
    | $ V^h $: $ \phi_1, \dots, \phi_{N_h} $ | full-order basis on mesh $ h $ |
    | $ V^{\text{rb}} $: $ \psi_1, \dots, \psi_{N_{\text{rb}}} $ | reduced basis on mesh $ h $ |
    | $ t_k $, $ k = 1, \dots, N_t $ | time at timestep $ k $, with $ t_{N_t} = T $ |
    | $ N_t $ | number of timesteps to the end time |
    | $ \mathcal{D} $ | parameter space |
    | $ N_\text{s} $ | number of POD samples |
    | $ \mathcal{D}_{\text{POD}} $ | POD sample space |
    | $ S_j $ | data matrix for parameter sample $ {\mathit{\boldsymbol{\mu}}}_j $ |
    | $ U $ | data matrix combining all compressed samples |
    | $ N_{\text{rb}_j} $ | first-stage (SVD) basis functions for parameter sample $ j $ |
    | $ \hat{n}_i $ | number of retained singular values for parameter sample $ i $ |
    | $ \epsilon_{{\mathit{\boldsymbol{\mu}}}} $ | stopping criterion, first stage |
    | $ \epsilon_t $ | stopping criterion, second stage |
    | $ {\mathit{\boldsymbol{\alpha}}}(t; {\mathit{\boldsymbol{\mu}}}) $ | vector of full-order basis coefficients |
    | $ {\mathit{\boldsymbol{c}}}(t; {\mathit{\boldsymbol{\mu}}}) $ | vector of reduced basis coefficients |
    | $ u^{\text{rb}} $ | reduced order PDE solution |
    | $ \mathcal{N}_R $ | mapping from a parameter to the RB coefficients |
    | $ L_T $ | least-squares single-step loss |
    | $ L_T^m $ | least-squares multi-step loss |
    | $ \mathcal{T} $ | training set specifying time nodes and parameters |
    | $ \mathcal{B} $ | mini-batch, defined as a subset of $ \mathcal{T} $ |
    | $ \mathcal{N}_T $ | the neural network mapping |
    | $ \mathcal{D}_{\text{train}} $ | training set of parameters |
    | $ N_{\text{train}} $ | number of training samples |
    | $ \mathcal{F} $ | training data features |
    | $ \mathcal{T} $ | training data targets |
    | $ L $ | number of layers in a ResNet |
    | $ R_i $ | ResNet block |
    | $ I $ | identity |
    | $ A^{(j)} $ | dense NN layer |
    | $ W^{(j)} $ | weight matrix |
    | $ b^{(j)} $ | bias vector |
    | $ \sigma $ | activation function |
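
The two-stage POD suggested by the notation above (per-sample data matrices $S_j$, first-stage tolerance $\epsilon_{\boldsymbol{\mu}}$, combined matrix $U$, second-stage tolerance $\epsilon_t$) can be sketched as follows. The energy-based truncation rule and the scaling of first-stage modes by their singular values before stacking are assumptions; they are common choices in hierarchical POD but are not confirmed by this section. `truncate` and `two_stage_pod` are hypothetical names.

```python
import numpy as np

def truncate(s, eps):
    """Smallest rank n whose discarded energy sum_{i>n} s_i^2 / sum_i s_i^2 is <= eps."""
    captured = np.cumsum(s ** 2) / np.sum(s ** 2)
    n = int(np.searchsorted(captured, 1.0 - eps)) + 1
    return min(n, len(s))

def two_stage_pod(snapshot_matrices, eps_mu, eps_t):
    """Two-stage POD sketch returning a reduced basis Psi of shape (N_h, N_rb).

    Stage 1 compresses each S_j (N_h x N_t) to its leading left singular vectors,
    scaled by the singular values; stage 2 takes an SVD of the stacked matrix U.
    """
    compressed = []
    for S in snapshot_matrices:
        Phi, s, _ = np.linalg.svd(S, full_matrices=False)
        n_j = truncate(s, eps_mu)                   # \hat{n}_j retained modes
        compressed.append(Phi[:, :n_j] * s[:n_j])   # scale column i by s_i
    U = np.hstack(compressed)
    Psi, s, _ = np.linalg.svd(U, full_matrices=False)
    return Psi[:, : truncate(s, eps_t)]

# Synthetic snapshots: every S_j lies in the same 3-dimensional subspace.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
snapshots = [A @ rng.standard_normal((3, 20)) for _ in range(4)]
Psi = two_stage_pod(snapshots, eps_mu=1e-10, eps_t=1e-10)
c = Psi.T @ snapshots[0]   # reduced coefficients (Euclidean projection)
```

Compressing each parameter sample first keeps the second SVD small: its input $U$ has only $\sum_j \hat{n}_j$ columns instead of one column per stored snapshot.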
