Multi-fidelity Generative Deep Learning Turbulent Flows

In computational fluid dynamics, there is an inevitable trade off between accuracy and computational cost. Low-fidelity simulations with coarse discretizations are computationally inexpensive, however, the resulting flow fields are often inaccurate. Alternatively, high-fidelity simulations can yield accurate predictions but at exponentially higher computational cost. In this work, a novel multi-fidelity deep generative model is introduced for the surrogate modeling of high-fidelity turbulent flow fields given the solution of a computationally inexpensive but inaccurate low-fidelity solver. The resulting surrogate is able to generate physically accurate turbulent realizations at a computational cost magnitudes lower than that of a high-fidelity simulation. The deep generative model developed is a conditional invertible neural network, built with normalizing flows, with recurrent LSTM connections that allow for stable training of transient systems with high predictive accuracy. The model is trained with a variational loss that combines both data-driven and physics-constrained learning. This deep generative model is applied to non-trivial high Reynolds number flows governed by the Navier-Stokes equations including turbulent flow over a backwards facing step at different Reynolds numbers and turbulent wake behind an array of bluff bodies. For both of these examples, the model is able to generate unique yet physically accurate turbulent fluid flows conditioned on an inexpensive low-fidelity solution.


Introduction
The numerical simulation and analysis of turbulent fluid flows is of great importance to many scientific and engineering domains. Over the past several decades computational fluid dynamics (CFD) has become an integral component of academia and industry. However high-accuracy fluid simulation remains a computationally demanding task particularly at high Reynolds numbers for which the flow is turbulent. This has led to a hierarchy of simulation models to predict fluid flow ranging from the fast but typically inaccurate Reynolds-Averaged Navier-Stokes (RANS) to the fully resolved but super-computer demanding direct numerical simulation (DNS) [1]. Large-eddy simulation (LES) has become a work horse method for scientific and industrial analysis since it can achieve both reasonable accuracy and computational requirement. Often only a section of the entire simulation domain is of interest or requires a greater degree of accuracy. Such examples include boundary layers, turbulent wakes behind a suspended or wall mounted objects, the interface between two fluids, shock boundaries, etc. This principle that different physical scales are of interest in different locations of the simulation domain has led to the development of various multiscale/multilevel methods [2]. Multiscale methods typically combine simulations at different resolutions to increase the accuracy of the simulation with minimal computational overhead.
Multiscale computational fluid dynamic methods constitute a rich and well developed field that encompasses many different methodologies that approach the multiscale aspect through different philosophies. Of particular interest are adaptive multilevel methods which focus on resolving different scales based on the complexity of the fluid flow [2]. Such approaches use a hierarchy of grids at various resolutions to resolve particular areas of the simulation domain at various levels of accuracy. This includes methods that use self-adaptive meshes in which the discretization of the simulation domain is evolved to meet specific resolution criteria [3,4,5], global hybrid methods such as very large eddy simulation (VLES) [6] or detached eddy simulation (DES) [7] and zonal methods for which prespecified regions of the flow domain are resolved with higher accuracy to capture relevant physics [8,9,10]. We take inspiration from these multiscale models to develop a deep learning model that takes advantage of simulations ran at multiple scales to predict high-fidelity turbulent fluid flow. This deep learning model replaces costly high-fidelity simulation enabling us to obtain fast yet accurate turbulent statistics given a coarse simulation.
Machine learning in CFD, specifically the modeling of the N-S equations, has gained a growing interest in recent years with a wide variety of methods ranging from Kalman filters to deep neural networks. These applications can be broken down into several major categories including: RANS turbulence modeling, LES sub-scale grid modeling, flow control and direct flow prediction. Machine learning based turbulence modeling for RANS simulation seeks to approximate the Reynolds-Stress term in the RANS equations at an accuracy that is higher than the traditionally used closure models through the incorporation of prior physical knowledge and highfidelity information [11,12,13,14,15]. Similarly, machine learning LES models seek to achieve the same goal of providing a sub-scale grid model that predicts the contribution of neglected turbulent length scales at a higher accuracy than the traditional methods [16,17,18]. These approaches are both very promising, however still rely on pre-existing physical assumptions, approximations and resolutions which fundamentally limit their predictive capability [19]. Another area of interest has been the use of machine learning models to build a controller to yield a particular fluid response [20,21].
The final category we discuss is direct fluid flow prediction where the machine learning model is used to predict the state variables of the fluid flow directly. This includes the use of machine learning to approximate fluid flows for graphical simulations [22,23,24], prediction of steady-state flows [25,26], prediction of oscillating/unsteady flows [21,27,28,29], and the super-resolution, compression or reproduction of various fluid systems [30,31,32,33]. While machine learning has become a popular tool to predict the behavior of fluids, we note that the majority of the test cases considered are focused on simple non-turbulent problems. Many works that predict turbulent flows are largely focused on qualitative results (e.g. computer graphics). This is expected due to the shear complexity of N-S turbulence that poses a challenging problem for even traditional numerical methods let alone machine learning models. Given that the vast majority of fluid flows of interest are turbulent in nature, much work is still needed to push the application of machine learning to practical fluid flow problems of engineering concern.
In this work, we accelerate the prediction of high-fidelity turbulent flows given a computationally inexpensive low-fidelity simulation through generative deep learning. Although similar ideas have been presented in past literature, the proposed model differs in several respects. First, we are interested in the prediction of physical turbulent fluid flow governed by the Navier-Stokes equations differing from the simpler inviscid Euler equations used in computer graphics [22,23,24]. Second, in a similar spirit, we are interested in recovering accurate time-averaged and turbulent statistics as oppose to fluid flows that are just visually pleasing. Third, in this work the input of our model is an inexpensive low-fidelity simulation that provides a coarse yet fairly inaccurate prediction. This contrasts to many works in machine learning for turbulent applications where compressed [30,31] or sub-sampled [32,33] fields of the high-fidelity target are used as the input. Some auto-regressive models, such as the deep neural network (DNN) in [29], are in fact even more dependent on a high-fidelity simulation which is needed to start the time-series prediction. The reliance upon a direct/coarsened high-fidelity field as a model input contains much richer and more accurate information than a low-fidelity simulation since it is being sampled from a space for which the physics simulated is significantly more precise. While this makes the machine learning problem significantly easier, it also results in models that need an expensive high-fidelity simulation to derive an input for making predictions. Thus the applicability of such models remains questionable. Fourth, we are interested in developing a surrogate model that can be used to predict multiple flows with different boundary conditions as opposed to just learning a single flow which is essential for justifying the model's training cost. Lastly, in contrast to past deterministic approaches [30,31,33], our generative model learns the probability distribution of high-fidelity flow fields conditioned on the low-fidelity simulation allowing for predictive probabilistic estimates.
This paper makes the following novel contributions to the integration of deep learning with CFD: (a) A multi-fidelity deep generative model is proposed for the prediction of physical high-fidelity fluid flow from a low-fidelity solution. (b) A novel invertible neural network architecture is proposed to model the distribution of possible high-fidelity fluid flow solutions conditioned on the low-fidelity observation. (c) A backwards Kullback-Leibler (KL) divergence loss is used that allows for physicsconstrained and standard data-driven training of the generative model. (d) The model is deployed and evaluated for surrogate modeling of turbulent flows at different Reynolds numbers and varying boundary conditions. The remainder of this paper is structured as follows. In Section 2, the problem of multi-fidelity generative modeling turbulent fluid flows is introduced and discussed. In Section 3, the generative invertible neural network architecture is introduced with details of each component of the model. Following in Section 4, the variational training of the generative model is outlined as well as the tuning of the model's hyper-parameters. The first numerical example, in Section 5, investigates the surrogate modeling of turbulent flow over a backwards facing step at different Reynolds numbers. The second numerical example, in Section 6, focuses on the prediction of turbulent wake behind an array of bluff bodies in varying locations. In Section 7, the computational cost of both training and testing the proposed deep learning model is discussed. Lastly, conclusions and discussion are provided in Section 8. All code, trained models and data used in this work is open-sourced for full reproducibility. 1 1 Code available upon publication.

Multi-fidelity Generative Modeling Fluid Flows
Multiscale fluid simulation methods seek to strike an ideal balance between predictive accuracy and computational requirement. In particular, zonal/hybrid methods couple a low-fidelity simulation with a high-fidelity simulation that is only evaluated in an area of interest. This is most commonly done through the use of RANS or unsteady RANS in the low-fidelity region and LES in the high-fidelity region. Here, we consider the use of a very large eddy simulation (VLES) simulation (a LES simulation where the majority of the kinetic energy is unresolved due to a coarse grid) and a LES simulation on a finer mesh for the low-and high-fidelity areas, respectively. As depicted in Fig. 1a, this results in two coupled simulations which are solved simultaneously with information being passed through the boundary of the high-fidelity simulation domain. The objective in this work is to replace this high-fidelity simulation zone with a fast generative deep learning model which can quickly predict a high-fidelity realization given the low-fidelity simulation as illustrated in Fig. 1b. We refer to this framework as a multi-fidelity generative model due to the distinct different physical scales resolved by the input and output. While the scope of the numerical examples explored in this work is focused on the use of low-fidelity and high-fidelity LES simulations, everything discussed in this work can be extended to other multi-fidelity models using different coarse/fine simulation schemes. We also note that there is no limit on the size of the prediction area by the deep learning model, i.e. it can be the entire simulation domain if necessary. However, in this work, we are motivated out of engineering needs where such zonal approaches are extremely applicable.
(b) Multi-fidelity deep generative turbulence. Simply learning a single solution of a PDE with a deep learning model has little practical benefit when a numerical solver exists due to the time and computational investment needed to tune and train the model. Thus we are interested in surrogate modeling turbulence in multiple flows with varying boundary conditions (e.g. obstacle position or inlet velocity). This is of particular interest for various engineering tasks including fluid-structure design/optimization, inverse modeling and uncertainty quantification. To formalize the problem of interest consider an incompressible flow governed by the Navier-Stokes (N-S) equations: where {u j , p} are the velocity components and pressure, respectively, being resolved within the spatial domain Ω. ν ef f is the effective kinematic viscosity which can represent the true dynamic viscosity, in the case of DNS, as well as turbulent dissipation from length scales not resolved, in the case of LES and URANS. Γ denotes the boundary of the domain of interest for which the boundary operator B imposes the desired boundary conditions. The initial state of the system is defined by {u 0 , p 0 }. As depicted in Fig. 1b, we wish to build a deep generative model to infer from a low-fidelity flow field the corresponding high-fidelity realizations. Due to their past success for modeling physical systems [35,36,37], we will choose to use a convolution based generative model with learnable parameters θ. The use of convolutions implies that the data is placed onto a structured Euclidean grid, akin to that of pixels in images. For example, given a two-dimensional incompressible fluid flow field, the prediction of a single time-step n would have a low-fidelity input x n = {u l , p l } ∈ R 3,d l1 ,d l2 and a high-fidelity output y n = {u h , p h } ∈ R 3,d h1 ,d h2 both of which span the same domain Ω ∈ Ω as depicted by the dashed boxes in Fig. 1b. Although omitted, this is easily extendable to one-and three-dimensional fluid flows as well. Given that d l1 , d l2 < d h1 , d h1 , this requires the model to predict length scales not recovered by the coarse simulation making this problem ill-posed and motivating the need of a generative probabilistic model to predict the density p(y n |x n ) as opposed to a single deterministic solution.
Remark 1. The inclusion of a low-fidelity simulator as an input to the deep learning surrogate allows for important information regarding the boundary conditions of the flow and approximate flow properties to be provided to the model. This simplifies the learning task significantly by providing a physical coarse estimate of the flow, which is important for the prediction of the solution of the highly non-linear N-S equations at high Reynolds numbers. While we have shown in our past work that deep learning surrogates can successfully model many complex physical systems independently [35,36,37], the systems of interest in these works are far less complex than the turbulent N-S equations.
Given that turbulence is a transient phenomenon, predicting at a single time-step is not sufficient, thus we wish to predict an entire time-series high-fidelity Y = y 1 , y 2 , . . . , y N given the respective low-fidelity observations X = x 1 , x 2 , . . . , x N . Although extensions can be made to simplify the model presented, we will assume that the time-step size, ∆t, of the low-fidelity input and high-fidelity output is the same (i.e. for each input there is one output). The objective for this surrogate is for fluid flow applications for which the boundary conditions are stochastic such that b (x, t) ∼ p(b), where p(b) is an empirical, analytically known or an unknown probability distribution. This spans problems including the modeling of a flow at different Reynolds numbers, different domain boundary conditions, flow through varying geometries or different initial conditions making this relevant to a vast number of fluid mechanics research studies. Given that we wish to predict an entire time-series of high-fidelity realizations, Y , we pose the following definition for the generative multi-fidelity surrogate for flows with a stochastic boundary.
, these simulators are used to collect a training set of low-and high-fidelity simulation data The problem of interest is training a generative surrogate to learn p θ (Y |X) and compute the predictive conditional density p θ (Y * |X * , D) of the high-fidelity flow field Y * for any low-fidelity flow time-series X * for a given boundary condition b * (x, t) ∼ p(b).

Generative Normalizing Flows
Deep generative models provide a flexible probabilistic framework with the most fundamental formulation centered around the use of random latent variables, z, in a deep learning model (i.e. a neural network) to allow for the likelihood of the model's output, y, to be expressed as the following marginal: in which θ denotes the model's parameters. In this work, the model's output, y, is the high-fidelity flow field we wish to predict, however in this particular section y should be interpreted as a much more abstract output encompassing a wide variety of machine learning problems. The latent variables are specifically designed such that their distribution is simple for sampling. However, this marginal is typically not practical to train due to the large number of samples needed from p θ (z) to approximate the marginalization. Hence, generative models such as variational auto-encoders (VAEs) [38] as well as generative adversarial networks (GANs) [39] approximate this likelihood through variational inference or by a min-max adversarial game, respectively. In this work, we will utilize normalizing flows which in recent years have gained increasing attention due to their extension to invertible neural networks (INNs) for tasks such as variational inference and generative modeling [40,41,42,43,44]. Generative normalizing flows provide a bijective mapping between an unknown likelihood density of the observations p θ (y) and a known latent density p θ (z). Typically, p θ (y) can be viewed as the unknown likelihood of a system for which we have a finite number of observations, i.e. training data. Let us consider a mapping with a tractable Jacobian determinant, henceforth referred as the Jacobian, which allows for the likelihood to be expressed w.r.t. the latent density as follows: which is nothing more than the change of variables formula. This implies that the model can be trained by maximizing the likelihood of p θ (y) (unknown) through the latent variables assigned a simple distribution p θ (z) a-priori (typically Gaussian). As depicted in Fig. 2a, we use f θ (·) to denote the learnable function, with a tractable Jacobian, that transforms observation to latent variables. To generate samples of y i ∼ p θ (y), samples are drawn form the latent distribution z i ∼ p θ (z) which are then transformed using the inverse of the model f −1 θ (·). However, the requirement for a tractable Jacobian as well as a function that can efficiently be inverted for sampling is not trivial. Normalizing flows address this challenge by using a series of change of variable transformations [45,46], each of which has a tractable Jacobian and is invertible. This allows for the log of the likelihood to be written as a summation of Jacobians: in which h 0 ≡ y and h K ≡ z. The core ideas of normalizing flows can be extended to constructing generative deep neural network models. A particular subset of flow-based deep learning models we are interested in are coupling layer normalizing flows first proposed in NICE [40] and Real NVP [41]. A coupling layer is a carefully designed function such that the inverse mapping and the Jacobian can be easily calculated. These layers can then be stacked, just like layers of a neural network to form an expressive model with a tractable Jacobian and inverse. To increase the expressive capabilities of normalizing flow models, various transformations have been proposed to envelope coupling layers such as 1 × 1 convolutions in the generative flow (Glow) model [42]. Such flow-based models have the unique advantage of not needing an auto-encoder structure or discriminator, greatly simplifying the hyper-parameter search and increasing robustness against mode collapse. As depicted in Fig. 2b, further extensions can be made to learn conditional likelihoods, p θ (y|x), for standard machine learning problems and surrogate modeling of physical systems [36,47].

Transient Multi-fidelity Glow
As discussed in Section 2, we are interested in the prediction of a high-fidelity flow, Y = y 1 , y 2 , . . . , y N , given the corresponding solution of a low-fidelity simulation, In our past work [37], we formulated a deep convolutional auto-regressive model for modeling the evolution of a transient PDE as a Markov chain. To increase the predictive capability of our model and integrate the low-fidelity observations, in this work we will use a deep recurrent neural network (RNN) formulation which is a standard approach for time-series predictions in deep learning [48]. While our model will still predict a single time-step at a time, latent information is passed between time-steps that the model can learn. The computational graph of this RNN with recurrent features τ n is depicted in Fig. 3. The likelihood for the entire time-series can be decomposed as follows: in which the recurrent features, τ n−1 , carry information from past time-steps x 1:n−1 [48]. This requires some initialization for τ 0 to be defined. In this work, these states are made random with a known distribution as discussed in Section 3.3.1, however there are many alternatives in RNN literature such as making them constant (i.e. delta function density function) or making them learnable parameters. Implementing the RNN framework, our goal is to develop a generative model that can learn the conditional density, p θ (y n |x n , τ n−1 ), in a single high-fidelity time-step. This poses the following three design requirements: a) the core of our model must be generative allowing for probabilistic modeling of the likelihood, b) we must formulate a method for encoding the low-fidelity inputs x into features that can condition the generator and c) recurrent connections need to be integrated into the heart of the generative model to condition it on temporal information. To this end, we present a novel Transient Multi-fidelity Glow (TM-Glow) model for probabilistic surrogate modeling of dynamical systems illustrated in Fig. 4. TM-Glow is built around the Glow model proposed by Kingma et al. [42] which will be the core generative INN for modeling the conditional likelihood. This model is depicted in the right column of Fig. 4a and the blue boxes in Fig. 4b. Glow is designed to provide a multiscale encoding of the high-fidelity fields, y n , into a set of random latent variables, z n , represented by the orange boxes in Fig. 4. To address the second design requirement, we use the convolutional conditional encoder proposed by Zhu et al. [36] which conditions Glow model on the low-fidelity input, x n , through a set of learnable features. This conditional encoder is shown in the left column of Fig. 4a and the pink boxes in Fig. 4b. Lastly, to allow for temporal conditioning of the Glow model, recurrent connections are integrated into novel LSTM affine coupling blocks discussed in Section 3.3.1. These LSTM based operations allow for recurrent features to flow in and out of the generator illustrated by the green boxes in Fig. 4b.  As shown in Fig. 4a, the TM-Glow core component is the multiscale Glow model comprised of squeeze, LSTM Affine Block and split operations discussed in detail in Section 3.3. Superscript numbers enclosed by parenthesis are used to denote variables at different TM-Glow model levels. We emphasize that TM-Glow is an INN, thus this model can evaluate the conditional likelihood exactly through the change of variables: in which {h n k } K k=1 is used to denote the hidden layers of TM-Glow that are the inputs/outputs of the various invertible operations discussed in Section 3.3 and specifically in Table 1. The forward pass of the model, f θ (·), encodes the high-fidelity obser-vation y n into a set of random latent variables z n = z (1),n , z (2),n , . . . , z (k d ),n , z (e),n . The backward/inverse pass of the model, f −1 θ (·), generates a sample of y n by sampling each random latent variable. The novel LSTM Affine block contains recurrent connections between time-steps conditioning the INN on the latent states of previous time-steps τ n−1 = τ (1),n−1 , ..., τ (k d ),n−1 . The dense convolutional encoder, detailed in Section 3.4, encodes a low-fidelity input into conditional feature maps, ξ n = ξ (1),n , ξ (2),n , . . . , ξ (k d ),n that are injected into the multiscale glow at each dimensional level as depicted in Fig. 4b. The use of the recurrent connections in the LSTM block as well as the conditioning encoder results in the following directed graphical representation of the model in Fig. 5.

Multiscale Glow
Our model is centered around a multiscale structure to promote the discovery of lowdimensional representation of the physics that govern the system. As seen in Fig. 4, a multiscale Glow model originally proposed by Dinh et al. [41] is employed to generate a flow field realization. In Fig. 4a, this is the right column of the model comprised of Squeeze, LSTM Affine Block and Split operations each of which is invertible. We remind the reader that the goal of each of these operations is to provide a computationally efficient but descriptive mapping between the high-fidelity flow field and the random latent variables. As previously discussed in Eq. (4), this is achieved through the series of transformations between the hidden layers {h n k } K k=1 . These transformations are precisely the operations discussed in the subsequent sections and listed in Table 1.

LSTM Affine Block
The core component of the generative portion of the TM-Glow model is the LSTM affine block, which is a novel extension of the conditional affine coupling layers [36,47] designed specifically for transient time-series prediction. The LSTM affine block is comprised of three different sub-components illustrated in Fig. 6: an unnormalized conditional affine block, a stack of conditional affine blocks and a conditional LSTM affine block. The core component of all these blocks are affine coupling layers [40], a specially designed function that allows for an efficient inversion and Jacobian calculation. As depicted in Fig. 7a, half of the input, h 2 k−1 , is modified by the scale and translation parameters, s and t, respectively, calculated from a coupling neural network (coupling NN). As implemented in Zhu et al. [36], this coupling NN is a shallow dense convolutional network with an input of the other half original feature map, h 1 k−1 , and the conditional input ξ (i),n , which are simply concatenated together. This coupling NN contains the learnable parameters that can be updated using any gradient decent method. As detailed in Table 1, the retention of input to the coupling NN allows for a simple inversion and Jacobian calculation.
This conditional affine coupling layer is further extended with a convolutional LSTM (convLSTM) depicted in Fig. 7b for transient problems. ConvLSTM is a variation of the traditional LSTM structure that employs convolutional operations [49], making it better suited for convolutional models such as TM-Glow. This input to the ConvLSTM is the same as the input of the coupling NN in the conditional coupling layer, which is conditioned on ξ (i),n . Following the standard ConvLSTM formulation, the recurrent features have two states, in , which correspond to the LSTM hidden and cell state, respectively. The output of the LSTM, out , is passed to the subsequent time-step and a (i) out is used as an input to the coupling NN of the affine layer. The initial states of the hidden and cell states at the first time-step are assigned the following densities: The resulting coupling layer is conditioned on both the current low-fidelity input as well as past time-step states. In the recent work of Kumar et al. [44], residual connections between generative flow models are also proposed which are implemented by simply using the previous latent variables as an input to the shallow neural network in the split operation, as detailed in Section 3.3.3. The proposed use of ConvL-STM affine layer prevents a vanishing gradient and enables much more descriptive recurrent feature maps to be learned.  In the coupling layer blocks ActNorm is used which was originally proposed by Kingma and Dhariwal [42] as an alternative to batch-normalization. ActNorm applies an invertible normalization to each feature channel, detailed in Table 1, that allows for smaller batch-sizes to be used without performance desegregation seen in traditional batch-normalization. The last essential component of the affine coupling blocks is the 1 × 1 convolution also originally proposed in the Glow model [42]. Due to this convolutional operation being 1 × 1, it can be efficiently inverted and has a trivial Jacobian as detailed in Table 1. The purpose of this convolution is to permute the feature maps between coupling layers. Since, the coupling layers used in Fig. 7 only operate on half of the input data, permutation between layers is essential to increase the expressibility of the model. Table 1: Invertible operations used in the generative normalizing flow method of TM-Glow. Being consistent with the notation in [42], we assume the inputs and outputs of each operation are of dimension h k−1 , h k ∈ R c×h×w with c channels and a feature map size of [h × w]. Indexes over the spatial domain of the feature map are denoted by h(x, y) ∈ R c . The coupling neural network and convolutional LSTM are abbreviated as N N and LST M , respectively. Time-step superscripts have been neglect for clarity of presentation.

Operation
Forward Inverse Log Jacobian

Conditional
Affine Layer

Squeeze
As seen in Fig. 7, the affine coupling layer requires two inputs for which only one is modified to allow for efficient inversion. To form these two inputs, a squeeze operation is applied to the feature maps which reduces the dimensions of the feature map by a half and increases the number of channels by a factor of two. In this work, we use the squeeze method originally proposed by Dinh et al. [41] and also implemented in the Glow model [42]. As depicted in Fig. 8a, the image is separated using a checkerboard pattern resulting in four sub-sampled versions. Note that this differs from the conditional Glow model by Zhu et al. [36] which implements a chunk based squeeze where the image is separated by four quadrants. We found that the checkered approach had better performance likely due to the checkerboard squeeze providing a sub-sampled version of the full image rather than a local quadrant to the affine coupling layer.

Split
Unlike standard convolutional operations, the affine coupling layer is volume preserving meaning that the number of output elements must be the same as the input.
Retaining the total dimensionality input through all layers of the model is not ideal for a convolutional model since this increases the computational and memory cost of the model. Thus we use the multiscale architecture proposed by Dinh et al. [41], which is illustrated clearly in Fig. 4b. This multiscale flow model factors out half of the current feature maps at multiple intervals of the architecture which are then treated as random latent variables [42,36]. A single split operation is illustrated in Fig. 8b in which the density of these latent variables is taken to be a fully-factorizable Gaussian with mean and standard deviation governed from the remaining features using a shallow neural network. When the split is executed in the inverse direction, the hyper-parameters are dependent on the features being provided from deeper within the model as seen in Table 1. This dependence on deeper feature therefore conditions the random latent variables on both conditional features representing the coarse simulation input, x n , as well as the recurrent features τ n−1 . As an example to illustrate this point, consider a TM-Glow with a model depth of k d = 3 as illustrated in Figs. 4b. Each of the four random latent variables and high-fidelity output for a single time-step can be described as the following conditional distributions: z (e),n ∼ p θ z (e),n |ξ (3),n , z (3),n ∼ p θ z (3) |z (e) , z (2),n ∼ p θ z (2),n |z (3),n , ξ (3),n , τ (3),n−1 , z (1),n ∼ p θ z (1,n) |z (2),n , ξ (2),n , τ (2),n−1 , y n ∼ p θ y n |z (1),n , ξ (1),n , τ (1),n−1 , which clearly is a hierarchical modeling of the distribution y n ∼ p (y n |x n , τ n−1 ) for which TM-Glow was designed to learn.
As discussed by Dinh et al. [41], this multiscale architecture has multiple intrinsic benefits. The first is that it results in the model learning intermediate representations of the output field, with deeper latent variables representing more global characteristics and shallower ones representing finer details. Additionally this permutes the loss across multiple layers of the network which can improve training and predictive accuracy. With respect to modeling physical systems, such a multiscale architecture is well suited as a vast number of physical phenomena are multiscale in nature. Specifically in fluids, it is well known that turbulence occurs at multiple length and time scales making TM-Glow well suited for fluid flow prediction.

Remark 2.
A particularly interesting attribute of the Glow model is the presence of random latent variables at multiple levels in the generator. This characteristic is absent from traditional VAE or GANs models for which the random latent variables are only present at one level of the model, typically the lowest-dimensional. This unique architecture arises out of necessity, but allows for the generative model to learn probabilistic densities at multiple scales. In the context of physical systems, this could allow the model to learn stochastic phenomena at varying length scales which lends itself nicely to many multiscale systems.

Low-Fidelity Conditioning
To condition the generative model on the low-fidelity fluid field at multiple levels, a densely connected convolutional encoder is used. This convolutional encoder, illustrated on the right side of Fig. 4a, is comprised of encoding and dense blocks following the approach originally taken by Zhu et al. [36]. Examples of the encoding and dense blocks are illustrated in Fig. 9 which have been used successfully for modeling many physical systems in the past [35,36,37,50]. The encoding blocks down-scale the feature maps forcing the model to learn low-dimensional representations while the densely connected blocks increase predictive accuracy of the model and have better performance than standard residual connections [51]. The feature maps are taken from multiple levels of the convolutional encoder, up-scaled and passed to the affine coupling blocks conditioning the generator. These are denoted by ξ (i),n in Fig. 4, passing detailed high-dimensional features towards the beginning of the encoder and global low-dimensional features towards the end. Figure 9: Dense block with a growth rate and length of 2. Residual connections between convolutions progressively stack feature maps resulting in 12 output channels in this schematic. Standard batch-normalization [52] and Rectified Linear Unit (ReLU) activation functions are used [53] in junction with the convolutional operations. Convolutions are denoted by the kernel size k, stride s and padding p.

TM-Glow Training
One of the key benefits of using INNs is the ability to calculate the likelihood of the data exactly with respect to the latent variables. This makes data-driven training straight forward as one can simply pose the optimization as the minimization of the negative of the log likelihood in Eq. (5) [40,41,42,54]. However, since this encodes the output of the model to the latent parameters, this training does not allow physical-constraints to be imposed on the generated samples of the model. In the work of Zhu et al. [36], in which physics-constrained learning is used in the absence of data, the reverse Kullback-Leibler(KL) divergence is used as an optimization objective. The reverse KL divergence poses the optimization though the generated samples of the INN, which allows for physical-constraints to be imposed on the produced realizations.
Due to the complex dynamics of the N-S equations at high-Reynolds numbers, physics-constrained learning turbulent fluid flows through a PDE based loss alone poses a difficult optimization objective. Thus we will use here a semi-supervised extension of the reverse KL-divergence loss that allows both supervision with data as well as additional physics-constrained components. Consider a training set of , then the loss is as follows: in which we have made use of the fact that the KL divergence is additive for independent distributions. p θ (Y |X) is the density of TM-Glow with parameters θ for a single time-step. p β (Y |X) is an energy based density function with a controllable parameter β representing the true high-fidelity targets. Note that the expectation is calculated using the samples of the generative model y ∼ p θ , requiring a backward pass of the INN which is the opposite direction of the standard maximum likelihood approach. Currently this loss is posed across the entire time-series, however we desire it to be expressed in terms of single time-steps to make it computationally tractable with TM-Glow. First, we pose the energy-based density as a product of independent distributions at each individual time-step p β (Y |X) = N n=1 p β (y n |x n ). This is a similar form as the definition of the model's likelihood in Eq. (6), p θ (Y |X) = N n=1 p θ (y n |x n , τ n−1 ). The loss for a time-series of N time-steps can be written as: The first term is an entropy promoting term, log p θ y n d |x n d , τ n−1 d , encouraging diversity in the models samples and avoiding mode collapse. A unique advantage using an INN is that the entropy, H(y n d |x n d , τ n−1 d ) = −E p θ log p θ y n d |x n d , τ n−1 d , can be evaluated exactly though the change of variables in Eq. (3) as oppose to approximating [55,56] or learning it [57]. The second term, the negative log energy density, − log p β (y|x), encourages consistency between the models generated samples and the specified physical-constraints. In this work, we use the Boltzmann distribution to model p β (y n d |x n d ) which is standard in energy based models [58]: in which V P DE (·) is a PDE based potential discussed in further detail in Section 4.1. Z β is a normalizing constant which does not impact the optimization and thus neglected. β is a tunable parameter that corresponds to the inverse temperature in the Boltzmann distribution that controls the strength of the potential in the backward KL loss. The resulting form of the reverse KL divergence follows: In practice, the expectation in the KL divergences is taken as point estimate during training. Due to the large number of times this loss is evaluated during the stochastic optimization of our model, the effects of such point estimates have been empirically shown to be minimal [38].

Physics-Constrained Potential
The potential V P DE represents imposed physical constraints one wishes to impose on the model's samples. Similar to past physics-constrained literature [36,37], we will use the governing equations to aid the formulation of this potential. Within this work, we pose V P DE in terms of three components: which consists of the residual of the Poisson equation for pressure, the divergence free constraint for incompressible flow and two L 1 supervised learning terms. The first is between the predicted state variables of the model, denoted by y n , and the observed high-fidelity solution y n HF . The second is between the root-mean-square (RMS) of the fluctuation states predicted by the model, y , and the observed highfidelity RMS values of y HF . This term can be interpreted as matching the turbulent intensity between the predicted time-series and the high-fidelity observables. The RMS of the states is defined as follows, which is a time-averaged quantity and thus has no time-step index. n s is the number of nodes in the predicted high-fidelity spatial domain. Both residual loss terms are scaled by the cell volume, v c = ∆x · ∆y, to help balance each loss component. While the potential resembles forms of other data and PDE based constrained loss functions [22,33], ours is posed in probabilistic framework for learning the full distribution of solutions opposed to a single deterministic prediction.
The PDE residual terms are evaluated using the model's predictions and constrains the predictions to be physically realizable. To evaluate the gradients we use the same methods successfully used in our past works for various physical systems [36,37]. Efficient finite difference based convolutions to approximate first-order gradients: as well as second-order gradients: These smoothed second-order accurate finite difference approximations are based on image processing filters such as the Sobel filter 2D convolutions [59] which have been found to improve training stability over pure finite-difference calculations. The convolutional filter approach allows for efficient computation of these gradients during training that directly integrates itself into the computational graph for backpropagation. In this work, since we are predicting a sub-domain for which we do not know the complete boundary conditions, we only compute the PDE constraint terms on the deep nodes of the predicted domain ignoring the boundary values. The L 1 term helps stabilize the PDE based losses which can be unstable due to their gradients as well as encouraging turbulence in the predicted fluid flow. Similar L 1 losses are used in GAN models for time-series predictions to increase time-series accuracy and continuity [60,61]. Pseudocode for the training process is outlined in Algorithm 1 for a single training case but easily extends to a full training data-set. The pseudocode for sampling of TM-Glow is also outlined in Algorithm 2, from which statistics are then computed in traditional Monte Carlo fashion.

Hyper-parameter Tuning
TM-Glow contains a large set of hyper-parameters including model depth, the number of affine coupling layers, coupling neural network depth, learning rate, mini-batch size, etc. which are all coupled together making an extensive hyper-parameter search extremely difficult. While automated methods exist to aid this search, we opted to take a simpler approach by empirically finding a reasonable model architecture that Approx. mean flow field for epoch = 1 to M do τ 0 ∼ p (τ 0 ) ; Sample initial recurrent state for n = 1 to N do y n , τ n , log p(y n |x n , fits the desired needs of the problems of interest (e.g. predictive capability, memory consumption of the model, stability, etc.) and do a more extensive search on ones deemed more important. First various model depths are tested by adjusting the number of affine coupling layers in each LSTM affine block, denoted by k c in Fig. 6. Each model is trained on a small data-set (32 flows from the second numerical example in Section 6) to try to keep the computational cost of the hyperparameter search reasonable. To quantify the accuracy of each model, the following time-averaged prediction mean squared errors (MSE) are used for a validation set of n test = 16 flows: in which the expected value of the model's prediction is estimated using 20 model samples. The first error assesses the accuracy of the mean flow magnitude, the second assesses the accuracy of the predicted turbulent kinetic energy (TKE). The test error of the models considered are plotted in Fig. 10. We find that there is a trade off between average velocity and turbulent energy accuracy and larger models begin to over fit on this small training data-set. Based on these results, we select a TM-Glow model with k d = 3 and k c = 16. Essential TM-Glow and training hyper-parameters are outlined in Table 2. The resulting model contains 1.7 million learnable parameters. Back-propagation through time (BPTT) [48] occurs at 10 time-step intervals due to the memory constraints of the GPU used to train the model. Although ideally the loss should be calculated for the entire time-series with a single back-propagation, this results in a large computational graph that requires significant memory on the GPU making it not practical. This idea of back-propagation at regular intervals has been employed successfully in our past work for various dynamic PDEs [37]. An observed side effect of using smaller BPTT intervals is the convergence of the recurrent states for later time-steps resulting in predicted fields converging as well to a similar prediction for all samples. To prevent this the initial recurrent state, τ 0 , is averaged with the current recurrent state after every BPTT. This borrows the idea of using various recurrent features at different timescales in hierarchical RNNs for neural language processing [62,63]. Although not necessary, this algorithmic heuristic helps prevent the information from the initial state being lost which was found to improve the model's accuracy and sample diversity. For additional details, we direct the reader to the source code. The inverse temperature parameter, β, in the energy density controls the balance between the model satisfying the physics-based potential and the model's entropy. Given this parameter close relation to the model's probabilistic nature, reliability diagrams of the model's predictions are used to assess the predictive uncertainty quality. Several models with different β values are trained on the same small data-set of 32 flows from the second numerical example in Section 6 used when calibrating the model depth. For a small validation data-set of 16 flows, for each model we compute the empirical density function for each of the model's output fields over all samples, time-steps and validation cases at each spatial location independently. The values of the predicted density function at several quantiles are then compared to the empirical density function of the high-fidelity data which is then averaged over the spatial domain and plotted in Fig. 11 for each state variable. Interestingly, unlike Zhu et al. [36], the predicted quantiles all match fairly well with the highfidelity data with apparently little sensitivity to β. Based on these results, a range of β = [200, 500] appears to be likely the most reliable from which we opted to use the upper bound of β = 500.
Remark 3. For each numerical example, additional fine tuning is certainly possible to obtain the highest level of accuracy. However, in this work we will not be performing any case specific tuning to demonstrate that decent results can be obtained using TM-Glow for multiple problems of different nature and dimensionality.

Ablation Study
An ablation study is performed to investigate the impact each component has on the model's predictive accuracy. Additionally, the model is also trained using the standard maximum likelihood approach for INNs, by maximizing Eq. (7), to act as the traditional base-line. The same training/validation data-set used for the accuracy and uncertainty calibration studies was also used here. As listed in Table 3, we train several models using variants of the propose backward KL loss and compute the mean squared error of various flow-field quantities across the validation data-set. Again, 20 model samples are used to compute the expected value of each predicted flow quantity from which the error is computed.
First we note that training the model through the traditional maximum likelihood approach generally yields worse results than the backwards KL losses with the exception of some of the time-averaged mean flow quantities. Additionally, the large residual errors for the maximum likelihood training indicated that the instantaneous flow fields are non-physical. Interestingly, the proposed loss does not produce the most accurate mean flow or turbulent statistics. This appears to be due to the inclusion of the Poisson pressure residual loss, which enforces physical coupling of the output fields. Without this PDE loss, the model has more freedom and can achieve greater accuracy of the flow statistics. However, this comes at the cost of having nonphysical instantaneous flow field realizations which is indicated by the increase in the time-average pressure residual. Given that we are interested in predicting physical fluid flow, we believe that inclusion of the Poisson residual is essential even at the sacrifice of the time-average statistics. Table 3: Ablation study of the impact of different parts of the backward KL loss. As a base-line we also train TM-GLow using the standard maximum likelihood estimation (MLE) approach. The mean square error (MSE) of various flow field quantities for various loss formulations are listed. The lowest values for each error are bolded.

Turbulent Flow over a Backwards Step
We first apply the proposed model to surrogate modeling turbulent flow over a backward step at different Reynolds numbers, a classical benchmark problem in computational fluids. As illustrated in Fig. 12, the feature of interest is the flow separation that occurs following the step. Such phenomena can be found in a surprisingly large number of systems including heat exchanges, flow around buildings, combustion engines and aerodynamic elements [65,66]. The Reynolds number of the flow is governed by the inlet velocity u 0 , viscosity ν = 0.0002 and the height of the step h = 1. In this benchmark, the inlet boundary condition is varying in magnitude and thus varying the Reynolds number of the flow. Here we are interested in predicting the recirculation region, marked by the green box in Fig. 12, for different Reynolds number. This region is the typical area of study for this flow due the presence of flow separation, Kevin-Helmholtz instability and turbulent flow with various eddy formations. Figure 12: Flow over a backwards step. The green region indicates the recirculation region TM-Glow will be used to predict. All domain boundaries are no-slip with the exceptions of the uniform inlet and zero gradient outlet. The total outlet simulation length is made to be double that of the prediction range to negate effects of the boundary condition on this zone.
The low-fidelity simulator that will be the input of the model has a mesh characteristic resolution of l c = h/12 and the target high-fidelity field has a resolution of l c = h/32 as shown in Fig. 13. Thus TM-Glow will be provided an input of [24 × 96] and predict a field [64 × 256] both with a time-step size of ∆t = 0.5. The resulting model input for a single time-step is x n = {u n l , p n l } ∈ R 3,24,96 with an output y n = {u n h , p n h } ∈ R 3,64,256 . The full training data set consists of fluid flows evenly distributed between Reynolds number 5000 to 50000 each consisting of 80 time-steps. Simulations were performed using the OpenFOAM finite volume solver using standard Smagorinsky LES sub-grid scale models [67]. During training we augment these time-series by splitting them in half into two time-series of 40 time-steps to artificially create more flow training cases. Training input and output data is normalized to a standard unit Gaussian. Further details on the computational cost of the low-fidelity and high-fidelity simulations along with the training of TM-Glow are discussed in Section 7.  Table 4 along with the error obtained from naively interpolating the low-fidelity solution to the high-fidelity mesh. TM-Glow is able to produce time-average statistics that are far more accurate than the low-fidelity solution as expected.    To illustrate the improvement TM-Glow is able to produce, we plot several timesteps of the velocity magnitude for several time-series samples of the model in Fig. 15 as well as the Q-criterion (also known as the elliptic Okubo-Weiss criterion for 2D flows) [68,69] in Fig. 16. Additional, samples of each state variable for this numerical test case are illustrated in Figs. 17 and 18. TM-Glow clearly generates a fluid flow that is far closer to the high-fidelity solution compared to the low-fidelity simulation both in the magnitude of the fluid velocity but also the predicted vortex structure. The variance of the sampled time-series appears to vary depending on Reynolds number. For example, the test case Re = 27500, which exists in the center of the training data range, has a noticeably larger sample diversity compared to the edge case of Re = 47500. We believe that this is largely due to a lack of multiple flows at the same Reynolds number in the training data-set, thus the inclusion of multiple training flows at the same Reynolds numbers would increase sample diversity. In general, the model's samples produce accurate fluid flow and turbulent statistics as illustrated in Figs. 19 and 20. In Fig. 19, the mean flow profiles are plotted of the state variables along with the predicted uncertainty for two test flows. Following in Fig. 20, the turbulent kinetic energy and Reynolds shear stress profiles are illustrated. TM-Glow is able to make dramatic improvements to the flow statistics for turbulent flows differing in Reynolds numbers by almost an order of magnitude using only 32 fluid simulations to learn.

Turbulent Flow around an Array of Cylinders
While the prediction of a flow at different Reynolds numbers is a practical test case, the reality is that the underlying flow structures have a relatively similar form. Thus for our second numerical example, we wish to stress this model further by investigating the prediction of a flow where the underlying flow structures are varying dramatically between test cases. A classical fluid mechanics benchmark is the flow around a cylinder, however in its traditional form its not up to the complexity we are interested in. Thus for a more challenging problem, we will consider the prediction of turbulent wake behind an array of cylinders with a stochastic location. Flow around multiple bluff bodies is important due to its various applications in engineering including: wind flow around urban structures [70], water flow around bridge pylons [71,72], wake from an array of wind turbines [73,74], modern offshore structures [75], heat transfer applications, etc. Depicted in Fig. 21, in this case study five cylinders are randomly placed within a specified area of a channel with a fixed uniform inlet velocity. The sub-domain we wish predict is the wake region directly behind the cylinder array in which the majority of the turbulence exists. Differing from the previous surrogate model where the Reynolds number was varying, the physical boundary of this flow is changing resulting in very different fluid structures in the predictive sub-domain. The bulk Reynolds number of the flow, set at a constant value Re = 5000, is governed by the inlet velocity u 0 = 1, viscosity ν = 0.0002 and the cylinder diameter d = 1. This numerical example is akin to flow optimization problems for which a structure is optimized to yield desired flow properties. The predicted flow fields for both a low-fidelity and corresponding high-fidelity finite volume simulation are shown in Fig. 22 for two different cylinder arrays to demonstrate the difference in the resolved flow features. Figure 21: Flow around array of bluff bodies. The red region indicates the area for which the bodies can be placed randomly. The green region indicates the wake zone that we will use TM-Glow to predict a high-fidelity response from a low-fidelity simulation. The low-fidelity simulator that will be the input of the model has a mesh characteristic resolution of l c = 5d/16 and the target high-fidelity field has a characteristic resolution of l c = 5d/64 as shown in Fig. 23. The mesh is structured in the wake region allowing for this data to be directly used with our convolutional generative model. Thus TM-Glow will be provided an input of [16 × 16] and predict a field [64 × 64] both with a time-step size of ∆t = 0.5. The model input for this numerical example is x n = {u n l , p n l } ∈ R 3,16,16 with an output y n = {u n h , p n h } ∈ R 3,64,64 . The full training data set consists of fluid flows with cylinders randomly placed in different configurations. Just like the previous numerical example, simulations were performed using the OpenFOAM finite volume solver using standard Smagorinsky LES sub-grid scale models [67]. During training we augment these time-series by splitting them in half into two time-series of 40 time-steps to artificially create more flow training cases. Training input and output data is normalized to a standard unit Gaussian. Additional details on the computational cost of training of TM-Glow for this numerical example can also be found in Section 7.  Fig. 24. The test errors of various flow field quantities are listed in Table 5 along with the error obtained from naively interpolating the low-fidelity solution to the high-fidelity mesh. TM-Glow is able to produce time-average statistics that are far more accurate than the low-fidelity solution as expected. As the training data set increases, we do see improvements in the flow statistics as we would expect. We note though, that even on the smallest data set large improvements over the low-fidelity simulation can still easily be obtained. For the remaining results we will use the model trained on 96 flows to illustrate the highest accuracy model obtained.  Table 5: Cylinder array test error of various time-averaged flow field quantities of the low-fidelity solution interpolated to the high-fidelity mesh and TM-Glow trained on different training data set sizes. Lower is better. TM-Glow errors were averaged over 20 samples from the model. The training time of each data set size is also listed. Similar to the previous numerical example, we plot several time-steps of the velocity magnitude for several time-series samples of the model in Fig. 25. Additional, samples of each state variable for this numerical test case are illustrated in Figs. 26 and 27. Although the low-fidelity simulation differs significantly from the high-fidelity solution, we can see TM-Glow is able to produce fluid realizations that qualitatively appear similar to the high-fidelity. In this particular example, the low-fidelity solution exhibits nearly laminar flow due to the coarse discretization used which TM-Glow is able to correct. Additionally, the fluid flows sampled from TM-Glow appear much more diverse than that seen in the previous numerical example. Profiles of time-averaged flow quantities and turbulent statistics are plotted in Figs. 28 and 29 for two test flows. Indeed we can see that TM-Glow is able to improve both with reasonable uncertainty bounds as well. In general, TM-Glow is able to yield an accurate prediction for the time-averaged flow field. However, the model seems to consistently under-predict the turbulent intensity of the flow field. This could be improved through some ad hoc tuning of the loss by weighting the RMS term more heavily.

Computational Cost Analysis
In the following section, the computational cost associated with the training and prediction of TM-Glow will be discussed. The cost of a surrogate needs to be low enough to justify its use, which includes the training cost for deep learning models. To compare processes ran on different hardware and CPU cores, we adopt the measure of a service unit (SU) hour, which is equal to a single CPU core hour or a single GPU hour. As shown in Table 6, both the low-fidelity and high-fidelity simulations were ran on CPUs while the deep generative model used a single GPU. Differences between CPU models were neglected since computation of TM-Glow is bottle-necked by the GPU. The comparison of CPU consumption versus a GPU is not trivial due to the fundamental hardware differences between the two and an in depth investigation using energy consumption or floating point operations is beyond the intended scope of this paper. Hence, we use this simple definition resembling that used by the Extreme Science and Engineering Discovery Environment (XSEDE). 2 For both numerical examples, OpenFOAM finite volume simulator was used due to its extensive validation and efficiency [67]. Both the low-and high-fidelity simulations used standard LES Smagorinsky sub-grid scale model [76] with default parameters. When the high-fidelity simulations were parallelized between CPUs, Open-FOAM's in house domain decomposition algorithm "scotch" was used to partition the meshes. Additionally, the fluid flows are solved between time t = [0, 80] for both resolutions but only t = [40,80] is used as training/testing data. This is done to ensure the flow fields sampled were of fully developed turbulence.

Turbulent Flow over a Backwards Step
The low-fidelity and high-fidelity simulations for the flow over backwards facing step consisted of a mesh with a resolution of ∆x, ∆y = h/12 and ∆x, ∆y = h/32, respectively. A sub-section of both meshes are plotted in Fig. 13 to illustrate the resolution difference. This results in the low-fidelity and high-fidelity meshes containing 22k and 145k cells, respectively. A single low-fidelity simulation takes about 4.5 minutes on a single CPU core while a high-fidelity simulation takes about 42 minutes on 8 CPU cores. This increase in compute time is due to both the increased number of mesh elements but also a reduction in time-step size to ensure the simulation remains stable. In Fig. 30, the required SUs needed to train TM-Glow for various data set sizes is plotted. It is clear that the majority of the computation is expended on obtaining the high-fidelity training data. In Table 7, the predictive computational cost of TM-Glow is compared to that of a high-fidelity solver. We classify the computational cost of the surrogate's prediction as the cost of the low-fidelity solver as well as the prediction of 20 model samples. Here we can see that both in terms of SU hours and raw wall-clock time, the surrogate is significantly more efficient than the highfidelity solver. The majority of the computational cost of the surrogate is from the low-fidelity simulation with minimal overhead from TM-Glow.

Turbulent Flow around an Array of Cylinders
The low-fidelity and high-fidelity simulations for the flow over a cylinder array consisted of a mesh with a resolution of ∆x, ∆y = 5d/16 and ∆x, ∆y = 5d/64, respectively. A sub-section of the meshes are plotted in Fig. 23 to illustrate the resolution difference. This results in the low-fidelity and high-fidelity meshes containing 9k and 125k cells, respectively. A single low-fidelity simulation takes about 3.1 minutes on a single CPU core while a high-fidelity simulation takes about 32 minutes on 8 CPU cores.
In Fig. 30, the required SUs needed to train TM-Glow for various data set sizes is plotted. Additionally, in Table 8, the predictive computational cost of TM-Glow is compared to that of a high-fidelity solver. Again, we classify the computational cost of the surrogate's prediction as the cost of the low-fidelity solver as well as the prediction of 20 model samples. Similar to the flow over a backwards step, we note that the surrogate is much faster than the high-fidelity simulation. Based on these results, as long as the number of training cases needed to achieve the desired accuracy is reasonable compared to the total number of predictions needed by the surrogate, the use of this model is well justified. This is of course problem dependent, however for many optimization and inverse settings this will be easily satisfied.   Table 6.

Conclusion
The application of machine learning methods to CFD requires significant advances to extend such models to realistic problems. In this work we investigate the prediction of fully-turbulent systems using deep learning. We proposed a multi-fidelity approach for which a computationally inexpensive low-fidelity solver is used as a conditional input to a deep generative model that predicts fluid realizations at high-fidelity resolution and accuracy. The model, Transient Multi-fidelity Glow (TM-Glow), is a conditional invertible neural network that allows for the analytical evaluation of the likelihood though the change of variables formula. TM-Glow is trained using variational backwards KL divergence loss which allows for the seamless combination of data-driven and physics-constraint based learning. This model was demonstrated on two numerical examples to surrogate model turbulent flow at different Reynolds numbers as well as a stochastic boundary. With just the low-fidelity solution, TM-Glow was able to predict diverse samples of turbulent flow time-series that produce accurate mean field/turbulent statistics with error bars for uncertainty quantification.
The multi-fidelity aspect of our model is a key ingredient. The low-fidelity input provides critical information to the generative model such as information regarding boundary conditions, mean flow properties and general flow field structure. While this low-fidelity simulation is typically inaccurate, it is a reliable starting point for the model to extrapolate from. The prediction from low-to high-fidelity is a significantly simpler problem compared to a blind high-fidelity flow prediction allowing for reduced training data-set sizes and training time. For this reason, we believe that deep learning has significant potential in multilevel/multi-fidelity modeling of a vast number of physical systems where it can be used on even very high-dimensional complex phenomena due to a low-fidelity solver aiding the machine learning model. In this spirit, future steps to be investigated include the extension of this model to other multi-fidelity physical systems. Additionally, as the deep learning field evolves, more modern architectures and training techniques could be integrated into the model to increase its predictive capability. Regardless of potential future directions, TM-Glow demonstrated that modern generative deep learning methods can be used effectively for multi-fidelity modeling of complex dynamical systems.