Parameter identification techniques applied to an environmental pollution model

The retrieval of parameters related to an environmental model is explored. We address computational challenges occurring due to a significant numerical difference of up to two orders of magnitude between the two model parameters we aim to retrieve. First, the corresponding optimization problem is poorly scaled, causing minimization algorithms to perform poorly (see Gill et al., practical optimization, AP, 1981,401pp ). This issue is addressed by proper rescaling. Difficulties also arise from the presence of strong nonlinearity and ill-posedness which means that the parameters do not converge to a single deterministic set of values, but rather there exists a range of parameter combinations that produce the same model behavior. We address these computational issues by the addition of a regularization term in the cost function. All these computational approaches are addressed in the framework of variational adjoint data assimilation. The used observational data are derived from numerical simulation results located at only two spatial points. The effect of different initial guess values of parameters on retrieval results is also considered. As indicated by results of numerical experiments, the method presented in this paper achieves a near perfect parameter identification, and overcomes the indefiniteness that may occur in inversion process even in the case of noisy input data.


(Communicated by Kok Lay Teo)
Abstract. The retrieval of parameters related to an environmental model is explored. We address computational challenges occurring due to a significant numerical difference of up to two orders of magnitude between the two model parameters we aim to retrieve. First, the corresponding optimization problem is poorly scaled, causing minimization algorithms to perform poorly (see Gill et al.,practical optimization, AP,1981,401pp). This issue is addressed by proper rescaling. Difficulties also arise from the presence of strong nonlinearity and ill-posedness which means that the parameters do not converge to a single deterministic set of values, but rather there exists a range of parameter combinations that produce the same model behavior. We address these computational issues by the addition of a regularization term in the cost function. All these computational approaches are addressed in the framework of variational adjoint data assimilation. The used observational data are derived from numerical simulation results located at only two spatial points. The effect of different initial guess values of parameters on retrieval results is also considered. As indicated by results of numerical experiments, the method presented in this paper achieves a near perfect parameter identification, and overcomes the indefiniteness that may occur in inversion process even in the case of noisy input data.
1. Introduction. In the atmospheric boundary layer (ABL), air pollution has increasing importance in urban, industrialized regions. The pollutant transport is frequently affected by many dynamic processes such as advection, diffusion etc. due to the role played by turbulence of atmospheric motion. Many studies have been carried out to understand the mechanisms of the pollutant transport and to predict transport processes in the ABL by various means such as field measurements, laboratory experiments, analytical studies, and numerical models. For the purpose of pollutant prediction, numerical modeling is by far the most useful and effective method. Difficulties in estimating model parameters resulting from both laboratory experiments and field observations still prevent numerical modeling of pollutant transport from achieving a high level of predictability. In order to tackle this problem, the authors have used Levenberg-Marquardt algorithm coupled with the generalized eigenfunction expansion method to solve an inverse problem and obtained a statistical agreement with the true value of the model parameters [25]. However there remain some challenges we face in practical application. First, the computation of sensitivities required for performing iterative algorithm can be very costly, rending it computationally impracticable for large-scale problems (as is the case for a typical meteorological model, e.g. 10 7 variables). This is particularly true if a straightforward way of computing the gradient (or sensitivity) is implemented in the following manner: perturbing each control variable in turn and estimating the change in the cost function; Second, it is indeed convenient to apply eigenfunction to expand unknown function (or generalized Fourier series expansion) in a certain direction. However, the eigenfunctions depend heavily on the boundary conditions of the problem. Computing the eigenfunction will become quite a difficult task when the boundary of the region of interest does not have a regular shape, which will affect the application of method itself and further extension. In this case, exploring and selecting a more effective method becomes very important. For such a situation, the variational adjoint method [14,1,16,2] turns out to be the most adequate option.
The variational adjoint approach is commonly used to compute the sensitivity coefficients and the gradient of the cost function. The adjoint method of sensitivity computation is particularly useful and has proven its efficiency in operational meteorology and other branches of geosciences [18,17]. The variational adjoint method can yield the exact gradient of the cost function with respect to the control variables by integrating the adjoint model only once backwards in time. Such a backward integration of the adjoint model is of similar complexity to a single integration of the forward model. Another key advantage of adjoint variational data assimilation is the possibility to minimize the cost function using standard unconstrained minimization algorithms [30].
In this paper, we provide an application of variational adjoint method to the retrieval of two model parameters namely the diffusivity and surface friction velocity. This theoretical exploration will not only present an alternative means of solving the problem, but, more significantly, provide an insight into practical measurements for some important model parameters. According to the requirement of variational adjoint method, we will transform the retrieval of the relevant parameters into an optimal control problem constrained by a partial differential equation (PDE), and the parameters to be estimated can be regarded as additional controls during the procedure of minimizing the cost functional [12,27,24]. Some challenges remain to be addressed. First, this optimization problem is poorly scaled, and minimization algorithms that fail to rescale properly will fail to converge [8]. This issue can be solved by proper rescaling. Second, difficulties arise from the presence of strong nonlinearity and ill-posedness which means that the parameters do not converge to a single deterministic set of values, but rather there exists a range of complementary combinations that produce the same model behavior. We will deal with this lack of identifiability by resorting to inclusion of a regularization term in the cost function.
The main contribution of this paper is to (a) develop a variational adjoint assimilation system using the simulated observational data from only two spatial points can retrieve the expected parameters, i.e., diffusivity and surface friction velocity related to the ABL; (b) numerical experiments reveal that the regularization term included in cost function plays a key role in the process of estimation of parameters when noise is present in measurement data; (c) studying the effects of different initial guess values on the retrieval results. The outcome of this paper will contribute to the reduction of the practical field experimentation carried out for the determination of the relevant model parameters. The remainder of this paper is organized as follows. In section 2 we provide the optimal control system based on the adjoint method. In section 3, numerical simulation experiments are performed, and finally conclusions are provided in section 4.
2.1. Problem description. The forward (or direct) problem used in this paper is the same as the one used in [25], which is based on the following 2-D mathematical model: c| t=0 = 0 (2.1.1d) where c is the time-averaged pollutant concentration, and K 1 and K 2 are the diffusivities along horizontal and vertical directions, respectively, given by and U (z)is the velocity profile that depends on vertical height z like and was assumed to be greater than zero here, while the parameters u and u 0 are given by k is the von Karman's constant with the value of 0.4, z 0 is the surface roughness and L is the Monin-Obukov length, aiming to construct the stability-dependent function H L . When H L is greater than zero, the stratification is stable; while if H L is smaller than zero, the profile behavior is unstable. And (µ * ) 0 is the surface friction velocity.
For the sake of illustrating our method, we select K 1 ,(µ * ) 0 as the retrieved parameters due to their significant numerical difference of up to two orders of magnitude. Assume K 1 ,(µ * ) 0 are given, the evolution of state variable with space and time can be obtained by solving the problem (2.1.1). This pertains to the direct problem. On the contrary, given a set of prior measurements, we are required to determine the parameters K 1 and (µ * ) 0 , which belong to the inverse problem. The present research effort aims to retrieve the parameters K 1 and (µ * ) 0 in the framework of variational adjoint data assimilation.
In order to determine simultaneously an optimal K 1 ,(µ * ) 0 such that a reasonable solution satisfies the observed measurement data, we must first define an objective function: which measures the discrepancy between the solution given by the original model (2.1.1) for the continuous assimilations and the known observation data. The spatial domain is referred to as Here Q represents spatial position(x, z) , and N s stands for the number of spatial observational locations (in present study, N s = 2). While the second term appearing in the RHS of (2.1.7) is referred to as regularization term which can stabilize the minimization process, and γ is a regularization parameter. This kind of regularization term which we add to the objective function J describes a priori constraint one would like the state function to be subject to, instead of including a constraint for the inverse solution itself like the one frequently used, [K 2 1 + (µ * ) 2 0 ]. The main reasons for adopting this regularization procedure stem from the following considerations: (1). The effect of [K 2 1 +(µ * ) 2 0 ] tends to seek for the minimum-norm solution. And when the noisy components in the observation data lead to the changes of solution which will induce larger H 1 0 -seminorm of state function, we attempt to use the H 1 0seminorm constraint of state function, rather than the minimum-norm constraint of inverse solution. It is shown experimentally later that imposing such a-priori constraint will lead to a stable solution within a suitable range of observational noise; (2). From the point of view of physics, the H 1 0 -seminorm is essentially a component part of energy functional derived in variational setting from the equations (2.1.1a-d). Addition of this term to the cost functional is equivalent to performing a bound constrained minimization of energy seminorm. As for the [K 2 1 + (µ * ) 2 0 ] , it is feasible in mathematics to regard it as a constraint, but its physical meaning is unknown. So it is natural to select stable functional in the current study; (3). The second term in the RHS of (2.1.7) can also serve as a forcing term of the adjoint equation (2.2.6) and the relevant smooth information of the state function c(x, z, t) is passed on to the calculated gradient obtained using the adjoint equation.
For simplicity of notation, we denoteK 1 ,(µ * ) 0 by P = [P 1 , P 2 ] . So solving K 1 ,(µ * ) 0 from the problem (2.1.1) and the corresponding observation data will be transformed into the following optimal control problem: Find P opt such that where the control variable P is related to the state variable c(x, z, t) via the problem (2.1.1). The classical way to solve a problem such as (2.1.8) is usually called gradient-type descent method. So the key issue is to find an efficient way to calculate the gradient, with gradient components denoted as ∇ K1 J and ∇ (µ * )0 J of J with respect to the parameters K 1 and (µ * ) 0 . Their calculation can be done by using variational adjoint method [31].

2.2.
Calculation of gradient and variational adjoint method. In this section, we show that the calculation of gradient will be related to the derivation of the adjoint problem. The whole process can be divided into the following steps.
Step one, the sensitivity equation (or tangent linear equation). Let us consider the perturbation of P : P →P = P + αP , where α is a perturbation parameter tending to zero, and theP stands for the direction of perturbation.
and c →c = c + αĉ, whereÛ (z) ,K 2 , andĉ may be defined as follows: c is the solution of the sensitivity equation Step two, the change of the cost function. According to the definition of gradient, we have On the other hand, Step three, the adjoint problem.
Applying the integration by parts, and assuming that the following relations (adjoint problem) holds Then the following equation can be obtained Step four, the gradient of cost function with respect to P = [K 1 , (µ * ) 0 ]. Comparing the RHS of eqs. (2.2.3) and (2.2.7), the gradient of cost function with respect to the unknown parameters is followed: Step five, gradient test. The gradient obtained above by the adjoint method [20] should satisfy numerically the relation below where the ∇J(P ) = [ ∂J ∂P1 ; ∂J ∂P2 ], denoted as g . And the variation of function with perturbation α can be seen in Figure 1.
Having the gradient of the cost function at our disposal, we can use various gradient-based unconstrained minimization algorithms. Compared with the steepest descent method and Newton method, evident advantages of the conjugate gradient method are its convergence rate, and low memory requirements [19,22]. To obtain the optimal parameter P = [K 1 , (µ * ) 0 ] , the main steps of the current nonlinear conjugate gradient (NCG) method include: a). Start with an initial guess P g , if the J less than a given ε (stopping criterion), then stop, otherwise continue the next step; b). Solve the direct problem (2. , we obtain the desired gradient g (k) = ( ∂J ∂P1 , ∂J ∂P2 ) (k) ; d). Determine the conjugate descent direction D (k) = −g (k) + β k−1 D (k−1) by  e). Choose the step size ρ (k) = argminJ(P (k−1) + ρ (k) D k ) by linesearch; f). The k−th iteration formulation can be expressed as P (k) = P (k−1) +ρ (k) D (k) . For a detailed description, please see Figure 2 below. Note that the adjoint model used in the next numerical experiments is derived through the discretize-thendifferentiate approach [9,7], in which one first discretizes the direct model (2.1.1), then differentiates it by hand and obtains the discrete version of the adjoint model. Different from the existing nonlinear conjugate gradient method (ncg)included in the published optimal tool-Poblano V1.0 [4], the present nonlinear conjugate gradient method (NCG) introduces the classical golden section search, and considers the update β + k−1 = max{0, β k−1 } [22], which can be competitive for the current problem in comparison with the algorithm (ncg) , as well as unconstrained minimization algorithms (lbfgs) and (tn) included in Poblano. This can be found from the Table 1, which is derived based on the whole observation data and without taking into account the regularization term temporarily. This is sufficient to outline the differences between the NCG and other methods related to Poblano.  Figure 2. Flow chart of parameter estimation with NCG method 3. Numerical experiments. As mentioned above in Section 2, we are required to solve the direct problem (2.1.1). Since its analytical solution is very difficult to obtain, so we had to employ finite difference numerical discretization. The Alternating Direction Implicit (ADI) method, first introduced by Peaceman and Rachford, is a finite difference method for solving the heat equation or the diffusion equation [23]. From then on, it is used frequently to the numerical solution of parabolic, hyperbolic [15] and elliptic partial differential equations. It can be viewed as an iterative method to solve a higher dimensional problem by solving a series of lower dimensional problems repeatedly. For the current 2D case, the ADI iteration process from n to n+1 can be separated into two parts, the x-axis sweeping at time level n and the z-axis sweeping at time level n + 1 2 . We use uniform grid in both space and time. It is assumed that the discrete functions are defined on an N × M -grids in space, i.e. Ω = [0, A] × [0, H]. The following notation will be used: x j = j∆x, z i = i∆z, t n = n∆t, c n ij = c(x j , z i , t n ). Here ∆x = A N , and ∆z = H M . And the central difference scheme is used for the first-and second-order derivatives of c with respect to x. While the vertical diffusion term ∂ ∂z (K 2 ∂c ∂z ) is approximated specially by the following formula at time level n, When performing the numerical computation, where diffusion terms and convection term are in time implicit and explicit, respectively. Based on the numerical scheme of the forward problem, the discretized version of the adjoint problem can be formed through careful modification of boundary conditions, forced terms and so on, and solved with almost the same routine as that of the forward problem. For the purpose of the numerical experiments, we assume that the true parameters are K 1 = 50m 2 s −1 , (µ * ) 0 = 0.38ms −1 , and in order to ensure the stability of numerical computation, the other related parameters are also set as, respectively, A = 6000m, H = 1120m, (x e , z e ) = (100m, 115m), L = −71m, z 0 = 0.6m, M = N = 40, T = 560s, ∆t = 56s. The Dirac function δ(x − x c ) in source term [29] is approximated by . (3.1) In this paper, we take k = 20 .
In order to retrieve the parameters P , we use simulated measurements in the forthcoming analysis, which can be obtained from the solution of the direct problem (2.1.1) at the two locations Q 1 (3, 3) := Q 1 (3∆x, 3∆z) and Q 2 (3, 4) , respectively. More concretely, given the true value of P as seen above, the corresponding solution c(x, z, t) of the direct problem (2.1.1) is derived from which two time series c obs (Q 1 , t n ) = c(Q 1 , t n ) and c obs (Q 2 , t n ) = c(Q 2 , t n ) can be extracted to serve as observation data set. This can be regarded as a set of observation data without noise, called observation data I. Additionally, if noise is present, then another set of observation data, called observation data II, can be also obtained in the following manner: c(3, I, t n ) obs noise = c(3, I, t n ) + σ * rand(1, n), I = 3, 4 (3.2) where σ is referred to as the noise level. We first present a verification of the convergence of the current algorithm using the observation data II, where σ = 0.02. The initial parameter guesses are set as (1, 1), (10, 1), (30,10), (20,3) and (30, 1), respectively. As seen from Figure 3, the parameter values will tend to evolve towards the ideal initial value (50, 0.38) with minimization iterations, which shows that the algorithm is very robust with respect to variations in the initial parameter guesses, although the initial guesses for the parameters are far from the exact parameters. Next we will illustrate the role of regularization term in improving the accuracy of the retrieved result.
The initial guesses for the unknown parameters are taken as(1, 1), i.e. P guess = (1, 1). In this situation, we consider four groups of experiments. i), using observation data I without regularization term (σ = 0, γ = 0); ii), using observation data I with regularization term (σ = 0, γ = 0); iii), using observation data II without regularization term (σ = 0, γ = 0); iv), using observation data II with regularization term (σ = 0, γ = 0). Table 2 presents the results obtained for the estimated parameters, wherein the descent process of the cost function with respect to the number of minimization iterations also provided, see Figure 4. In addition, the retrieval process of two parameters K 1 and (µ * ) 0 with iterations can be shown in We see that in the case of initial guess value (1, 1) the expected results are obtained by including the regularization term. In particular, when dealing with noisy data [11], employing the regularization technique will help us get the desired result, otherwise the recovery of the parameters may not be as good as it should be, see Figure 6, even when performing a larger number of iterations (not limited to 15 iterations as in the present test). Generally in practical application, the related parameters cannot be known in advance and no information is provided. With this objective in mind, another set of initial guess value of parameters are also taken into account, i.e., (20,20). We see that there is still a significant difference between the guess value and the true parameter (50, 0.38). The retrieval results are presented in Table 2, which illustrates that the retrieval results match well the true value, as for the descent process of the cost function versus the minimization iterations, a similar behavior to the case of initial guess value (1, 1) can be found (not shown here). This indicates that the current method has good stability. The successful model tests demonstrate that parameter estimation with variational adjoint data assimilation technique is feasible due to the addition of the regularization term allowing the identification of poorly known model parameters using the simulated observational data originating only from two spatial locations at different time instants. For the computational complexity of the current numerical algorithm of variational adjoint method, the most costly portion is related to the calculation of objective function (2.1.7) and its gradient (2.2.8-9) in each minimization iteration, while the key point requires one forward model run and additionally only a single integration of the adjoint dynamical model backwards in time to obtain the required gradient. Solving forward model (two sweeps in x and z direction, respectively) is measured by the computational operation count (flops), i.e., O(nM 2 ) , where n stands for the number of time levels, and M is the number of spatial grids in x and z direction, respectively. And we also note in current study that the integration of the adjoint model backwards in time can be equivalent to several forward model simulations and thus remains computationally expensive. This is mainly due to the forcing term in the adjoint model originating from the model-data differences, and therefore one needs to store the whole model trajectory to run the adjoint model. For the objective function, its operation count is approximated by O(n) without taking into account the regularization term, and O(nM 2 ) when the regularization term is included. When it comes to the gradients of the objective function with respect to two parameters K 1 and (µ * ) 0 , their operation counts are then O(nM 2 ), respectively. The computation becomes expensive particularly when spatial mesh resolution increases. This may not be an active factor for the current method. However, with the fast development of the reduced-order techniques [3,26], it is also possible to use this method in the framework of inverse problem of reduced-order model. This issue will be addressed in follow-up research efforts.

4.
Conclusions. Motivated by the work in [25] of parameter estimation, we have developed an alternative approach in the framework of variational adjoint data assimilation. Here when there is no priori information about the uncertain parameters to be relied on, we have successfully combined the variational adjoint data assimilation with a regularization technique to construct a scheme that is capable of recovering near-perfect parameter values, therefore improving our ability to predict future pollutant transport. To date the method has only been developed and tested in the current 2D convection-diffusion model that has two uncertain parameters to be determined, but the results of this study are extremely positive, which may be viewed as a first step towards solving practical optimal control problem  by using more realistic observation data and models in order to assess the practical utility of the method. In addition, particular attention should also be paid to the regularization technique. The combination of the Tikhonov regularization with the discrepancy principle can indeed overcome to some degree the difficulties arising frequently due to the strong nonlinearity and the ill-posedness which result in the parameters failing to converge to a unique value, yielding instead various combinations that result in similar model behavior (i.e. lack of identifiability). In our numerical experiments, when a typical minimization is encountered, which is characterized by a fast decrease in the cost function during the first few tens of iterations, followed by a slow decrease (almost flat) behavior, even if the solution keeps improving during this slow convergence period, practical considerations should prompt us to stop the minimization. As for the Tikhonov regularization, despite its success when equipped with an appropriate regularization parameter, it is yet a challenge how to determine this regularization parameter except when a clear corner of the L shape function is observable for L-curve method. As we all know, this is one of the most active fields of inverse problem research [10,21,13]. The regularization parameter in this paper was determined after a large number of trials and errors [6,28], while some more advanced methods [28,5] are not involved in current study. This topic will constitute the subject of future research.