CONVERGENCE AND QUASI-OPTIMALITY OF AN $L^2$-NORM BASED ADAPTIVE FINITE ELEMENT METHOD FOR NONLINEAR OPTIMAL CONTROL PROBLEMS

Abstract. This paper investigates the convergence and quasi-optimality of an adaptive finite element method for control constrained nonlinear elliptic optimal control problems. We derive a posteriori error estimates for the control, the state and the adjoint state variables measured in $L^2$-norms, where bubble functions are an effective tool for proving the global lower error bound. We then prove a contraction property, from which convergence follows. Furthermore, we show that if the grids are kept sufficiently mildly graded, the adaptive finite element method converges with the optimal rate and is quasi-optimal. In addition, some numerical results are presented to verify our theoretical analysis. 2020 Mathematics Subject Classification. 49J20, 65N30, 65N12.


1.
Introduction. Since the pioneering work of Babuška and Rheinboldt [2], adaptive finite element methods have been applied successfully in engineering and scientific computation. An adaptive finite element method uses error information computed from the discrete solution to decide whether the solution is accurate enough and, if not, where the mesh should be refined. Hence the core of adaptive finite element methods is a posteriori error estimation.
Dörfler [15] introduced a marking strategy that selects the set of elements to be refined on the basis of the error indicators (in the present setting, indicators for the control, the state and the adjoint state); since then, adaptive finite element algorithms have become an active topic of research. He imposed a fineness assumption on the initial mesh $\mathcal{T}_{h_0}$ to prove the reduction of the energy error, and in later investigations Morin, Nochetto and Siebert [30] removed this assumption. Moreover, they introduced the interior node property in order to prove the convergence of adaptive finite element methods [31]. For further pioneering work on adaptive finite element methods, see [3,5,9,13,16,17,22], where mainly linear elliptic optimal control problems were investigated.
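Dörfler's criterion can be stated concretely: given squared local indicators $\eta_T^2$ and $\theta\in(0,1]$, choose a set $\mathcal{M}$ with $\sum_{T\in\mathcal{M}}\eta_T^2\ge\theta\sum_{T}\eta_T^2$. The following Python sketch is only an illustration, assuming the squared indicators are already available as a list; the function name `doerfler_mark` is hypothetical. Taking the largest indicators first is the usual greedy way of keeping the marked set small.

```python
def doerfler_mark(eta2, theta):
    """Return indices of a set M with sum(eta2[M]) >= theta * sum(eta2).

    eta2  -- squared local error indicators, one per element
    theta -- Doerfler (bulk) marking parameter in (0, 1]
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    total = sum(eta2)
    # Visit elements from the largest indicator to the smallest.
    order = sorted(range(len(eta2)), key=lambda i: eta2[i], reverse=True)
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break  # the bulk criterion is already satisfied
        marked.append(i)
        acc += eta2[i]
    return marked


print(doerfler_mark([0.5, 0.1, 0.3, 0.1], 0.6))  # [0, 2]
```

With θ = 0.6 the two largest indicators already carry more than 60% of the total, so only elements 0 and 2 are marked.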
Convergence and quasi-optimality are the two key issues for adaptive finite element methods. Notably, Mekchay and Nochetto [29] extended the convergence result of Morin, Nochetto and Siebert [31] to general second order linear elliptic partial differential equations by introducing the total error, namely the sum of the energy error and the oscillation. This provided a valuable basis for later convergence analyses. Meanwhile, Binev, Dahmen and DeVore [4] were the first to establish optimality, and many authors have since studied this property. For example, Carstensen and Hoppe [6] established convergence and quasi-optimality of the adaptive Raviart-Thomas mixed finite element method, and Gong and Yan [20] analyzed the convergence of an adaptive finite element method for elliptic optimal control problems with pointwise control constraints.
Nonlinear optimal control problems arise in many fields of scientific research and engineering. Chen and Lu [11] investigated adaptive fully discrete finite element methods for semilinear parabolic quadratic boundary optimal control problems. Gaevskaya, Hoppe, Iliash and Kieweg [17] developed an adaptive finite element method for a class of distributed optimal control problems with control constraints and found that the requirement $h_0\ll 1$ on the initial mesh $\mathcal{T}_{h_0}$ is not restrictive for the convergence analysis of adaptive finite elements for nonlinear problems. Chen, Gong, He and Zhou [10] studied an adaptive finite element method for a class of nonlinear eigenvalue problems whose energy functional may be nonconvex and proved the convergence of the adaptive finite element approximations.
Leng and Chen [23] proved the convergence and quasi-optimality of an adaptive finite element method for optimal control problems with integral control constraints. In this paper we extend the result of [23] to a nonlinear optimal control problem with an integral control constraint, with the errors measured in $L^2$-norms. We follow the idea of [27] to derive reliable and efficient a posteriori error estimates, and the idea of [19] to prove the upper and global lower bounds of the errors, where bubble functions play a key role. Moreover, a contraction property of the adaptive finite element method is obtained under a mild assumption on the initial mesh $\mathcal{T}_{h_0}$, similar to those in [10,14,20,21,23,24,25,29,31]. Furthermore, we establish the optimal convergence rate. The quasi-optimality is the most difficult part of the analysis; to prove it, we follow the ideas of [14,21,23]. Finally, we provide some numerical experiments to verify our theoretical analysis.
The rest of the paper is organized as follows. In Section 2 we introduce the nonlinear optimal control problem under investigation and the basic notation. In Section 3 we derive the a posteriori error estimates and present an adaptive algorithm. In Section 4 we use quasi-orthogonality and a discrete local upper bound to prove the convergence of the adaptive finite element method, and in Section 5 we establish its quasi-optimality. Finally, some numerical simulations are given to verify our theoretical analysis.

2.
Nonlinear optimal control problem. In this section we first introduce some basic notation and then state the nonlinear optimal control problem under consideration.
Let $\mathcal{T}_h$ be a regular triangulation of $\Omega$ such that $\bar\Omega=\bigcup_{T\in\mathcal{T}_h}\bar T$, where $T$ denotes an element of $\mathcal{T}_h$. Let $\mathcal{T}_{h_0}$ be the initial partition of $\bar\Omega$ into disjoint triangles. Applying newest-vertex bisection to $\mathcal{T}_{h_0}$, we obtain a class $\mathbb{T}$ of conforming partitions. For $\mathcal{T}_h,\tilde{\mathcal{T}}_h\in\mathbb{T}$, we write $\mathcal{T}_h\subset\tilde{\mathcal{T}}_h$ to indicate that $\tilde{\mathcal{T}}_h$ is a refinement of $\mathcal{T}_h$, and we set $h_T=|T|^{1/2}$. Following [14], we define the continuous piecewise linear mesh function $h_{\mathcal{T}_h}$: for any vertex $z$ of $\mathcal{T}_h$, $h_{\mathcal{T}_h}(z)$ is the average of $h_T$ over all $T\in\mathcal{T}_h$ with $z\in T$. If the meshes are kept sufficiently mildly graded [21], we have the following properties.
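Lemma 2.1. [21] For some constants $c$ and $C$ and a fixed constant $\mu$, there holds
$$c\,h_T\le h_{\mathcal{T}_h}(x)\le C\,h_T\quad\forall x\in T,\ T\in\mathcal{T}_h,\qquad \|\nabla h_{\mathcal{T}_h}\|_{L^\infty(\Omega)}\le\mu,$$
where the class of all grids satisfying these conditions is denoted by $\mathbb{T}_\mu$.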
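The mesh function of [14] is easy to evaluate on a concrete triangulation. The Python sketch below assumes NumPy and a mesh stored as vertex coordinates plus vertex-index triples (a hypothetical data layout); it computes $h_T=|T|^{1/2}$ and the nodal averages that determine the continuous piecewise linear function $h_{\mathcal{T}_h}$, whose gradient is the quantity bounded by $\mu$.

```python
import numpy as np


def mesh_function_nodal(points, triangles):
    """Nodal values of the continuous piecewise linear mesh function.

    On each triangle h_T = |T|^{1/2}; at a vertex z the value is the average
    of h_T over all triangles T that contain z.
    Returns (nodal_values, h_T).
    """
    p = np.asarray(points, dtype=float)
    t = np.asarray(triangles, dtype=int)
    a, b, c = p[t[:, 0]], p[t[:, 1]], p[t[:, 2]]
    # |T| from the cross product of two edge vectors
    area = 0.5 * np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                        - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    h_T = np.sqrt(area)
    total = np.zeros(len(p))
    count = np.zeros(len(p))
    for k in range(3):  # scatter h_T to the three vertices of each triangle
        np.add.at(total, t[:, k], h_T)
        np.add.at(count, t[:, k], 1.0)
    return total / count, h_T


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-2.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3)]
h_nodes, h_T = mesh_function_nodal(pts, tris)
print(h_T)      # sqrt(0.5) and 1.0
print(h_nodes)  # shared vertices 0 and 2 get the average of both values
```

Averaging at vertices is what makes $h_{\mathcal{T}_h}$ continuous even though $h_T$ jumps between neighboring elements of different size.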
Then the nonlinear optimal control problem can be stated as the minimization over $u\in U_{ad}$ of the cost functional subject to the nonlinear state equation. It is well known [26,27] that this problem has at least one solution $(y,u)$, and that if $(y,u)$ is a solution, then there exists a co-state $p\in V$ such that the triplet $(y,p,u)$ satisfies the optimality conditions (7)-(9). Owing to the coercivity of $a(\cdot,\cdot)$, we define the solution operator $S: L^2(\Omega)\to H_0^1(\Omega)$ of (4) by $S(f+u)=y$, and let $S^*$ be the adjoint of $S$ such that $S^*(y-y_d)=p$. Let $V_h$ be the space of continuous piecewise linear finite elements with respect to the partition $\mathcal{T}_h\in\mathbb{T}$, and let $U_h$ be the space of piecewise constant finite elements with respect to $\mathcal{T}_h$. The standard finite element discretization of the nonlinear optimal control problem is then given by (10)-(11). Similarly, (10)-(11) has at least one solution $(y_h,u_h)$, and if $(y_h,u_h)$ is a solution of (10)-(11), then there exists a co-state $p_h\in V_h$ such that the triplet $(y_h,p_h,u_h)$ satisfies the discrete optimality conditions (12)-(14). Based on [12,14,21,27], we recall the following lemmas, which are needed to derive $L^2$-norm a posteriori error estimates for the control, the state and the adjoint state variables.
Lemma 2.4. [14,27] For all $v\in H^1(\Omega)$, $\mathcal{T}_h\in\mathbb{T}$ and $T\in\mathcal{T}_h$, we have

3.
A posteriori error estimation. In this section we recall a residual-based a posteriori error estimate for nonlinear elliptic equations. For the model problem introduced in Section 2, we obtain a reliable and efficient a posteriori error estimate. At the end of this section, an adaptive finite element algorithm is introduced.
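We now define the error indicators: $\eta(\cdot)$ denotes error indicators and $\mathrm{osc}(\cdot)$ the data oscillations. For $\mathcal{T}_h\in\mathbb{T}$ and $T\in\mathcal{T}_h$ we define the local indicators for $u_h\in U_h^{ad}$ and $y_h,p_h\in V_h$, where $[(\nabla y_h)\cdot n]$ denotes the jump of $\nabla y_h$ across the edges of $T$ and $n$ is the outward unit normal on $\partial T\setminus\partial\Omega$; moreover, $\bar f_T$ is the $L^2$-projection of $f$ onto the piecewise constant space, i.e. $\bar f_T=\frac{1}{|T|}\int_T f$. For $\omega\subset\mathcal{T}_h$, we set $\eta^2_{1,\mathcal{T}_h}(p_h,\omega)=\sum_{T\in\omega}\eta^2_{1,\mathcal{T}_h}(p_h,T)$, and we define $\eta^2_{2,\mathcal{T}_h}(u_h,y_h,\omega)$, $\eta^2_{3,\mathcal{T}_h}(y_h,p_h,\omega)$ and $\mathrm{osc}^2_{\mathcal{T}_h}(y_h-y_d,\omega)$ similarly.
Lemma 3.1. Let $\mathcal{T}_h\in\mathbb{T}_\mu$ and let the conditions of Lemma 2.2 hold. Then, for sufficiently small $\mu$, we have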
Proof. Let $y^h,p^h\in V$ be intermediate variables satisfying the following equations. Employing Galerkin orthogonality, the approximation properties, Lemma 2.2 and $\|\nabla h_{\mathcal{T}_h}\|_\infty\le\mu$, we obtain, for sufficiently small $\mu$, estimates for nonlinear elliptic optimal control problems analogous to Lemma 3.1 in [21]. Theorem 3.1 of [28] shows that $|||y-y^h|||\le C\|y-y^h\|_1\le C\|u-u_h\|_0$ for nonlinear elliptic optimal control problems. Combining this with Lemma 4.4 in [7], we deduce the corresponding conclusions for nonlinear optimal control problems, and the triangle inequality yields a further bound. Combining the above estimates with the triangle inequality and Lemma 2.1, we readily obtain the assertion of Lemma 3.1.
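The element averages $\bar f_T=\frac{1}{|T|}\int_T f$ entering the data oscillations can be computed with a low-order quadrature rule. The Python sketch below assumes NumPy, a vectorized callable `f(x, y)` and the same hypothetical mesh layout (coordinates plus vertex-index triples); it uses the edge-midpoint rule, which is exact for quadratic polynomials.

```python
import numpy as np


def element_averages(f, points, triangles):
    """Piecewise constant L^2 projection: fbar_T = (1/|T|) * integral of f over T.

    Uses the edge-midpoint quadrature rule, which is exact for polynomials of
    degree <= 2, so fbar_T is exact for quadratic f.
    f must accept NumPy arrays x, y and return an array of the same shape.
    """
    p = np.asarray(points, dtype=float)
    t = np.asarray(triangles, dtype=int)
    a, b, c = p[t[:, 0]], p[t[:, 1]], p[t[:, 2]]
    mids = ((a + b) / 2, (b + c) / 2, (c + a) / 2)
    # integral_T f = |T| * (mean of f at the midpoints); dividing by |T|
    # leaves just the mean of the three midpoint values.
    return sum(f(m[:, 0], m[:, 1]) for m in mids) / 3.0


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(element_averages(lambda x, y: x * y, pts, [(0, 1, 2)]))  # approximately 1/12
```

For smooth data the oscillation $\|h_T(f-\bar f_T)\|_{0,T}$ is of higher order, which is why it can be separated from the error indicators.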
We are now in a position to derive a posteriori error estimates for the control, the state and the adjoint state variables.
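Theorem 3.2. Let $(u,y,p)\in U_{ad}\times H_0^1(\Omega)\times H_0^1(\Omega)$ be the solution of (7)-(9) and $(u_h,y_h,p_h)\in U_h^{ad}\times V_h\times V_h$ be the solution of (12)-(14). Then we have the a posteriori upper error bound and the global lower error bound, where $c$ and $C$ depend only on the shape regularity of $\mathcal{T}_h$.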
Proof. Arguing as in Lemma 7.3.1 of [27], we derive an estimate in which $p^h$ is the solution of (3.2). It therefore remains to bound $\|p^h-p_h\|_0^2$. Let $\xi=p^h-p_h$ and $\xi_I=\pi_h\xi$, where $\pi_h$ is the standard Lagrange interpolation operator. It then follows from Lemma 2.1, Lemma 2.3 and (8) that the desired bound holds, where the embedding $\|v\|_{0,\infty}\le C\|v\|_2$ and the bound $\|p_h\|_0\le C$ have been used, and $\delta$ is a positive constant.
Similarly, letting $\tilde\xi=y^h-y_h$, we obtain the corresponding estimate for the state, where $\phi(\cdot)\in W^{2,\infty}(\Omega)$ has been used. The desired upper bound then follows by combining (20)-(22).

Next we derive the global lower error bound by means of the standard bubble functions [1,19]. As in Lemma 3.7 of [19], one can prove that there exists a polynomial $w_T\in H_0^2(T)$ satisfying (23) and (24). Combining (23) and (24) with (2.8) and Lemma 2.4, we obtain a local estimate, where $\phi(\cdot)\in W^{2,\infty}(\Omega)$, the embedding $\|v\|_{0,\infty,T}\le C\|v\|_{2,T}$ and the bound $\|p_h\|\le C$ have been used; the remaining terms are estimated similarly. Applying the Cauchy inequality together with (25)-(27), we obtain a lower bound for the element residuals.

To deal with the jump terms, we use the edge bubble functions defined in [19]. As in [18,19], one can prove that there exists a polynomial $w_{\partial T}$ satisfying (29) and (30), and an analogous bound follows from (13) and Lemma 2.4. Applying the Cauchy inequality together with (29)-(30), we obtain a lower bound for the jump terms, where $\phi(\cdot)\in W^{2,\infty}(\Omega)$ and $w_{\partial T}\in H_0^2(\Omega)$ have been used. Combining (28) and (32) gives the lower bound for the first group of terms, and the remaining terms can be bounded in the same way. This completes the proof of Theorem 3.2.

Theorem 3.2 provides reliable and efficient a posteriori error estimates for the sum of the $L^2$-norm errors in the control, the state and the co-state variables. We now introduce the adaptive finite element algorithm studied in this paper.
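Returning briefly to the lower-bound argument in the proof of Theorem 3.2: bubble functions of the required kind are easy to write down. A standard choice, assumed here rather than taken from [19], is the cubic element bubble $b_T=27\lambda_1\lambda_2\lambda_3$ built from barycentric coordinates; its square $b_T^2$ vanishes together with its gradient on $\partial T$ and therefore lies in $H_0^2(T)$, the regularity required of $w_T$. The Python sketch below evaluates both; the function names are illustrative.

```python
def barycentric(x, y, verts):
    """Barycentric coordinates (l1, l2, l3) of the point (x, y) in triangle verts."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    l2 = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    l3 = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    return 1.0 - l2 - l3, l2, l3


def element_bubble(x, y, verts):
    """Cubic bubble b_T = 27 l1 l2 l3: vanishes on the boundary, equals 1 at the centroid."""
    l1, l2, l3 = barycentric(x, y, verts)
    return 27.0 * l1 * l2 * l3


def squared_bubble(x, y, verts):
    """b_T^2: it and its gradient 2 b_T grad(b_T) vanish on the boundary, so it is in H_0^2(T)."""
    return element_bubble(x, y, verts) ** 2


T = ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0))
print(element_bubble(2.0 / 3.0, 1.0 / 3.0, T))  # centroid: 1 up to rounding
print(squared_bubble(1.0, 0.0, T))              # edge midpoint: 0.0
```

Multiplying a residual by such a bubble localizes it to a single element, which is the mechanism behind the local lower bounds.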
Algorithm 3.1. Adaptive finite element algorithm for nonlinear optimal control problems: (0) Given an initial mesh $\mathcal{T}_{h_0}$, construct the finite element spaces $U_{h_0}^{ad}$ and $V_{h_0}$. Select a marking parameter $0<\theta\le 1$ and set $k:=0$.
(1) Solve the discrete nonlinear optimal control problem (12)-(14) on $\mathcal{T}_{h_k}$ to obtain the approximate solution $(u_{h_k},y_{h_k},p_{h_k})$.
(2) Compute the local error estimators; in the subsequent refinement, additional elements are generally refined in order to ensure that $\mathcal{T}_{h_{k+1}}$ is conforming.
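Algorithm 3.1 follows the usual SOLVE, ESTIMATE, MARK, REFINE cycle. A generic Python skeleton of this cycle with Dörfler marking by $\theta$ is sketched below; `solve`, `estimate` and `refine` are hypothetical user-supplied callables (not part of the paper), and the toy one-dimensional usage only exercises the control flow, not the optimal control problem itself.

```python
def adaptive_loop(mesh, solve, estimate, refine, theta, max_iter, tol=0.0):
    """SOLVE -> ESTIMATE -> MARK -> REFINE loop in the spirit of Algorithm 3.1.

    solve(mesh)          -> discrete solution on mesh         (hypothetical)
    estimate(mesh, sol)  -> list of squared local indicators  (hypothetical)
    refine(mesh, marked) -> refined conforming mesh           (hypothetical)
    Returns the final mesh and the history of total estimator values.
    """
    history = []
    for _ in range(max_iter):
        sol = solve(mesh)                    # step (1)
        eta2 = estimate(mesh, sol)           # step (2)
        total = sum(eta2)
        history.append(total ** 0.5)
        if total ** 0.5 <= tol:
            break
        # Doerfler marking: largest indicators first until theta * total is reached
        order = sorted(range(len(eta2)), key=lambda i: eta2[i], reverse=True)
        marked, acc = [], 0.0
        for i in order:
            if acc >= theta * total:
                break
            marked.append(i)
            acc += eta2[i]
        mesh = refine(mesh, marked)          # conformity closure is refine's job
    return mesh, history


# Toy 1D usage: a "mesh" is a list of interval lengths, indicators are h^2.
def bisect(mesh, marked):
    chosen = set(marked)
    return [c for i, h in enumerate(mesh)
            for c in ((h / 2, h / 2) if i in chosen else (h,))]


final, hist = adaptive_loop([1.0], lambda m: None,
                            lambda m, s: [h * h for h in m],
                            bisect, theta=1.0, max_iter=3)
print(len(final), hist)
```

Keeping marking separate from refinement mirrors the algorithm: the analysis only needs the Dörfler property of the marked set, whatever closure the refinement routine adds for conformity.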
4. Convergence analysis. In this section we prove the convergence of the adaptive algorithm. We first establish some properties of the error indicators and the data oscillations that are essential for the proofs of convergence and quasi-optimality.
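Lemma 4.1. Let $(u,y,p)\in U_{ad}\times H_0^1(\Omega)\times H_0^1(\Omega)$ and $(u_h,y_h,p_h)\in U_h^{ad}\times V_h\times V_h$ be the solutions of (7)-(9) and (12)-(14), respectively. Then we have the a posteriori upper bound, where $C$ depends only on the shape regularity of $\mathcal{T}_h$.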
Proof. Applying (18), the triangle inequality and Lemma 2.1, we obtain a first estimate. Similarly to the proof of Lemma 3.3 in [21], we deduce a second one, and analogous bounds hold for the remaining terms. The assertion of Lemma 4.1 follows from (33)-(36) and the upper bound in Theorem 3.2.
Next, we give a stability result for the error indicators; similar results can be found in Lemma 3.4 of [21], Lemma 4.1 of [23] and Proposition 3.3 of [8].
where $\omega_T$ denotes the patch of elements that share an edge with $T$.
Proof. We first prove (37); (40) can be proved similarly. By the trace inequality (see [1,21,25]), for $T\in\mathcal{T}_h$, $\mathcal{T}_h\in\mathbb{T}$ and arbitrary $v\in H^1(\Omega)$, we have (41). Together with the inverse estimates, (41) yields a bound on the edge terms. Recalling (1) in Lemma 1, we obtain a further estimate. Recalling the definition of $\eta_{1,\mathcal{T}_h}(p_h,T)$ and using the triangle inequality, we obtain the corresponding bound for $T\in\mathcal{T}_{h_k}$. The desired result (37) then follows from (42)-(44). Next we prove (38); (39) can be proved similarly. Applying inequality (15) of Lemma 2.4 on the common edge $T\cap T'$ of two neighboring elements, we obtain (45). Recalling the definition of $\eta_{2,\mathcal{T}_h}(u_h,y_h,T)$, Lemma 2.3 and Lemma 3.1, and using the triangle inequality, we obtain (46) for $T\in\mathcal{T}_{h_k}$. Substituting (45) into (46) yields (38).
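For the reader's convenience, the scaled trace inequality usually invoked at this step reads as follows, with $C$ depending only on the shape regularity of $\mathcal{T}_h$; we assume (41) is of this form.

```latex
\|v\|_{0,\partial T}^2 \le C\left(h_T^{-1}\,\|v\|_{0,T}^2 + h_T\,\|\nabla v\|_{0,T}^2\right),
\qquad v \in H^1(T),\ T \in \mathcal{T}_h .
```

The powers of $h_T$ follow from mapping to the reference element, which is why edge residuals can be traded for element quantities at the cost of one factor $h_T^{\pm 1}$.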

Lemma 4.3.
For $\mathcal{T}_h\in\mathbb{T}$, let $\mathcal{M}_h\subset\mathcal{T}_h$ be the set of marked elements and let $\tilde{\mathcal{T}}_h\in\mathbb{T}$ be the refinement of $\mathcal{T}_h$ obtained by refining the marked elements. Then we have (47)-(50).
Proof. We only prove (49) and (50); the proofs of (47) and (48) are similar to that of (49). Employing Young's inequality with parameter $\sigma$ and (39), we obtain (51). For $T\in\mathcal{T}_h\setminus\mathcal{M}_h$ we deduce an estimate which leads to (52). Adding (52) to (51) and rearranging terms gives the desired result (49). Next, for arbitrary $T\in\mathcal{T}_h\cap\tilde{\mathcal{T}}_h$, using (40) in Lemma 4.2 we obtain (53), where Young's inequality has been applied and $\mathrm{osc}_{\mathcal{T}_h}(y-y_d,T)=\mathrm{osc}_{\tilde{\mathcal{T}}_h}(y-y_d,T)$. Summing (53) over $T\in\mathcal{T}_h\cap\tilde{\mathcal{T}}_h$ yields the desired result (50).
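The Young inequality with parameter $\sigma$ used in the proof above is, in its standard form, the following; we assume this is the version meant. For real $a,b$ and $\sigma>0$, expanding $(\sqrt{\sigma}\,a-b/\sqrt{\sigma})^2\ge 0$ gives the first inequality, and adding $a^2+b^2$ to both sides gives the second.

```latex
2ab \le \sigma a^2 + \sigma^{-1} b^2
\quad\Longrightarrow\quad
(a+b)^2 \le (1+\sigma)\,a^2 + (1+\sigma^{-1})\,b^2 .
```

The second form is what typically lets an indicator on the refined mesh be bounded by $(1+\sigma)$ times an indicator on the coarse mesh plus a perturbation term.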
For convenience, we introduce the following notation for $\omega\subset\mathcal{T}_{h_k}$. The main obstacle in proving convergence is the lack of Galerkin orthogonality, which is a key ingredient of standard convergence proofs. We therefore resort to quasi-orthogonality, a tool commonly used for adaptive mixed and nonconforming finite element methods [24]. The basic relation (54) clearly holds, and we obtain the quasi-orthogonality in Lemma 4.4 by estimating the last term of (54).
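Relations of this type come from expanding a square. As an illustration (assuming the control errors are measured in the $L^2$-norm, and not claiming that this is (54) verbatim), write $u-u_{h_k}=(u-u_{h_{k+1}})+(u_{h_{k+1}}-u_{h_k})$ and expand $\|u-u_{h_k}\|_0^2$:

```latex
\|u-u_{h_{k+1}}\|_0^2
= \|u-u_{h_k}\|_0^2 - \|u_{h_{k+1}}-u_{h_k}\|_0^2
  - 2\,\big(u-u_{h_{k+1}},\,u_{h_{k+1}}-u_{h_k}\big).
```

Under Galerkin orthogonality the last inner product would vanish; quasi-orthogonality only requires that it be bounded by a small multiple of the error and estimator terms.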

Lemma 4.4.
For any $\varepsilon>0$ and $\mathcal{T}_{h_k},\mathcal{T}_{h_{k+1}}\in\mathbb{T}$, the estimates (55) and (56) hold, where $h_0=\max_{T\in\mathcal{T}_{h_0}}h_T$.
Proof. We only prove (55); (56) can be proved in a similar way. Let $y^{h_{k+1}}$ satisfy (16) with right-hand side $f+u_{h_{k+1}}$; then we obtain the splitting (57). Following the proof of Lemma 4.3 in [23], we estimate the second term on the right-hand side of (57) in several steps. Applying (2), (18) and the triangle inequality, we obtain a first bound, and the same argument yields a second one. The triangle inequality then gives two further estimates, and combining all of these bounds controls the second term. For the first term on the right-hand side of (57) we obtain a similar bound. Combining (57), (58) and (59), and then using (54), (60) and (61), we obtain (55).
Theorem 4.6. Let $(u,y,p)\in U_{ad}\times H_0^1(\Omega)\times H_0^1(\Omega)$ be the solution of (7)-(9) and let $(u_h,y_h,p_h)\in U_h^{ad}\times V_h\times V_h$ be the solution of (12)-(14) generated by Algorithm 3.1, where the remaining conditions are the same as in Theorem 4.5. Then the following estimate holds.
Proof. It follows directly from Lemma 4.1 and Theorem 4.5 that an intermediate bound holds. Combining this with Lemma 3.1 yields the assertion of Theorem 4.6.
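The step from contraction to convergence is elementary. Assuming, as the abstract indicates, that Theorem 4.5 is a contraction of the generic form below, where $\mathcal{E}_k$ is only a placeholder for the combination of errors and estimators used there and $\gamma\in(0,1)$, induction on $k$ gives geometric decay:

```latex
\mathcal{E}_{k+1}^2 \le \gamma\,\mathcal{E}_k^2 \quad (k\ge 0)
\quad\Longrightarrow\quad
\mathcal{E}_k^2 \le \gamma^k\,\mathcal{E}_0^2 \;\longrightarrow\; 0
\quad (k\to\infty).
```

Convergence of the individual errors then follows once the upper bound of Lemma 4.1 relates them to $\mathcal{E}_k$.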

5.
Quasi-optimality analysis. In this section we consider the quasi-optimality of the adaptive finite element method. We first introduce some notation. For any $\mathcal{T}_h,\tilde{\mathcal{T}}_h\in\mathbb{T}$, let $\#\mathcal{T}_h$ denote the number of elements in $\mathcal{T}_h$, and let $\mathcal{T}_h\oplus\tilde{\mathcal{T}}_h$ be the smallest common conforming refinement of $\mathcal{T}_h$ and $\tilde{\mathcal{T}}_h$, which satisfies the cardinality property of [15,30,31]. Following [14,21,23,24], we define a function approximation class $\mathcal{A}_s$ and the associated quantities. To establish the quasi-optimality of the adaptive finite element method we need, as in [8], a local upper bound for the distance between two nested discrete solutions, because here the errors can only be estimated by the indicators on the refined elements, without a buffer layer.
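For newest-vertex bisection, the property of the overlay $\mathcal{T}_h\oplus\tilde{\mathcal{T}}_h$ that is usually invoked in quasi-optimality arguments is the cardinality bound below, known from the literature on newest-vertex bisection; we assume this is the property meant above.

```latex
\#(\mathcal{T}_h \oplus \tilde{\mathcal{T}}_h)
\le \#\mathcal{T}_h + \#\tilde{\mathcal{T}}_h - \#\mathcal{T}_{h_0}.
```

This bound lets the size of a common refinement be controlled by the sizes of the two meshes, which is how the number of marked elements is eventually compared with the approximation class.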
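Lemma 5.1. Let $(u,y,p)\in U_{ad}\times H_0^1(\Omega)\times H_0^1(\Omega)$ be the solution of (7)-(9). For sufficiently small $\mu$, let $\mathcal{T}_h\in\mathbb{T}_\mu$ and $\mathcal{T}_h\subset\tilde{\mathcal{T}}_h\in\mathbb{T}$, and let $(u_h,y_h,p_h)\in U_h^{ad}\times V_h\times V_h$ and $(\tilde u_h,\tilde y_h,\tilde p_h)\in\tilde U_h^{ad}\times\tilde V_h\times\tilde V_h$ be the solutions of (12)-(14) on $\mathcal{T}_h$ and $\tilde{\mathcal{T}}_h$, respectively. Then the local upper bound holds, where $\mathcal{R}_h:=\mathcal{R}_{\mathcal{T}_h\to\tilde{\mathcal{T}}_h}$ is the set of elements of $\mathcal{T}_h$ that are refined in passing from $\mathcal{T}_h$ to $\tilde{\mathcal{T}}_h$.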
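Proof. From the optimality condition (14) we obtain a relation between $u_h$ and $\tilde u_h$, and consequently a splitting of the control error. Combining (72)-(76) and Lemma 2.2 with $H^2$-regularity, we derive a bound for the first term. Next we estimate the second term on the right-hand side of (70). Under the corresponding assumption, we set $v_h=\pi_{\mathcal{T}_h}\tilde u_h$, where $\pi_{\mathcal{T}_h}$ is the $L^2$-projection onto $P_0(\mathcal{T}_h)$, and obtain an estimate for arbitrary $T\in\mathcal{T}_h$, where $I_{\tilde{\mathcal{T}}_h}$ denotes the $L^2$-projection onto $P_0(\tilde{\mathcal{T}}_h)$, which acts as the identity on $\tilde U_h$.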
Combining (70), (77) and (78), we infer (79). Employing (73)-(76) and the triangle inequality, we deduce (80). Similarly to Lemma 2 in [14], we obtain (81). Combining (79)-(81) completes the proof.
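The projection $v_h=\pi_{\mathcal{T}_h}\tilde u_h$ used in the proof of Lemma 5.1 is simple to compute because $\mathcal{T}_h\subset\tilde{\mathcal{T}}_h$: every element of $\tilde{\mathcal{T}}_h$ lies inside exactly one element of $\mathcal{T}_h$. The Python sketch below assumes a hypothetical `parent` map from fine to coarse element indices and returns, on each coarse element, the area-weighted mean of the fine values.

```python
def project_to_coarse(fine_values, fine_areas, parent, n_coarse):
    """L^2 projection onto P0 of the coarse mesh of a piecewise constant on a refinement.

    fine_values[j], fine_areas[j] -- value and area of fine element j
    parent[j]                     -- index of the coarse element containing j
    On each coarse element the result is the area-weighted mean of its children,
    which preserves the integral over every coarse element.
    """
    num = [0.0] * n_coarse
    den = [0.0] * n_coarse
    for v, a, k in zip(fine_values, fine_areas, parent):
        num[k] += a * v
        den[k] += a
    return [n / d for n, d in zip(num, den)]


print(project_to_coarse([1.0, 3.0, 5.0], [0.25, 0.75, 0.5], [0, 0, 1], 2))  # [2.5, 5.0]
```

On coarse elements that are not refined the projection returns the fine value unchanged, so $v_h$ and $\tilde u_h$ can differ only on $\mathcal{R}_h$, which is what localizes the estimate.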
The next lemma establishes the Dörfler property on the set $\mathcal{R}_h=\mathcal{R}_{\mathcal{T}_{h_k}\to\tilde{\mathcal{T}}_h}$, which is used to bound the number of marked elements.
Lemma 5.2. Assume that the marking parameter satisfies $\theta\in(0,\theta^*)$ with $\theta^*=C$, where $C$ is a constant. Then the Dörfler property holds on the set $\mathcal{R}_h$.
Proof. Combining (82) with the upper bound in Theorem 4.5, we obtain a first estimate. Employing the triangle inequality and Young's inequality, we then derive the result with the help of Lemma 5.1. By a dominance property similar to Remark 2.1 in [8], we infer one bound for $T\in\mathcal{R}_h$ and another for $T\in\mathcal{T}_h\setminus\mathcal{R}_h$. For $T\in\mathcal{T}_h\cap\tilde{\mathcal{T}}_h$, (50) of Lemma 4.3 and the inverse estimates yield a further estimate. Combining (83), (84) and (86) completes the proof.
Lemma 5.3. Assume that the marking parameter $\theta$ satisfies the conditions of Lemma 5.2. Let $(u,y,p)\in U_{ad}\times H_0^1(\Omega)\times H_0^1(\Omega)$ be the solution of (7)-(9) and $(u_{h_k},y_{h_k},p_{h_k})\in U_{h_k}^{ad}\times V_{h_k}\times V_{h_k}$ be the solution of (12)-(14) generated by Algorithm 3.1. Then, for $\mu$ small enough, the set of marked elements $\mathcal{M}_{h_k}\subset\mathcal{T}_{h_k}$ satisfies the cardinality bound, provided that $(u,y,p,y_d)\in\mathcal{A}_s$.
Proof. Let $\nu^2$ be defined in terms of $\delta$, $\beta$ and the total error $e^2$, where $\delta$ is defined in Lemma 5.2 and $\beta$ is defined below. Then, for any $(u,y,p,y_d)\in\mathcal{A}_s$, there exist $\mathcal{T}_{h_\nu}\in\mathbb{T}$ and a discrete solution $(u_{h_\nu},y_{h_\nu},p_{h_\nu})$ satisfying the approximation estimate associated with $\mathcal{A}_s$. Next, let $\tilde{\mathcal{T}}_h=\mathcal{T}_{h_\nu}\oplus\mathcal{T}_{h_k}$ be the smallest common refinement of $\mathcal{T}_{h_\nu}$ and $\mathcal{T}_{h_k}$, and let $(\tilde u_h,\tilde y_h,\tilde p_h)$ be the corresponding solution of (12)-(14). To deduce the desired result, we establish the following inequalities, using notation in which $e^2_{\mathcal{T}_h}$ and $\mathrm{osc}^2_{\mathcal{T}_h}(\mathcal{T}_h)$ are defined analogously. Clearly, (91) holds for all $v\in U_{ad}$, $v_{h_\nu}\in U_{h_\nu}^{ad}$ and $v_h\in U_h^{ad}$. Applying Young's inequality and (91), we obtain a bound for the first group of terms, and the remaining terms are treated in the same way. Combining (91)-(94) then gives the desired estimate. For all $T\in\mathcal{T}_{h_\nu}$, setting $\tilde{\mathcal{T}}_h^T:=\{T'\in\tilde{\mathcal{T}}_h: T'\subset T\}$, we derive

6.
Numerical experiments. In this section we present two numerical examples to verify our theoretical analysis.

Example 1. We consider the nonlinear optimal control problem subject to the state equation

For the same error level, that is, the same actual accuracy, the adaptive refinement process requires less computing time than the uniform refinement process. In Figures 1-2, we show the exact state, the numerical state, the exact co-state and the numerical co-state on adaptively refined grids with θ = 0.3 after 15 adaptive loops for Example 1, generated by Algorithm 3.1; Figure 3 shows the adaptive grids after 5 and 13 steps with θ = 0.3 (15 adaptive iterations in total) for Example 1. We observe that larger gradients occur in certain regions, although the solutions are smooth. Moreover, the grid is refined at the center of the domain, while the solutions may have larger gradients near the boundary.
In Figure 4, we plot the numerical state and co-state on uniformly refined grids (θ = 1) after 15 loops for Example 1, generated by Algorithm 3.1, and Figure 5 shows the adaptively refined triangulation after 5 adaptive iterations with θ = 0.4 and the uniformly refined triangulation after 5 uniform iterations. The uniformly refined grids appear denser, but comparing Figures 1-2 with Figure 4 shows that the adaptive finite element method may deliver much smaller errors than uniform refinement. Comparing Figure 3 with Figure 5, the adaptive iteration is more efficient than the uniform iteration, since the uniform iteration requires solving on larger grids, which increases the computing time. Hence the adaptive finite element method has some advantages in the numerical approximation process.
Figure 5. The adaptive grids after 5 steps with θ = 0.4 and the uniform refinement (θ = 1) after 5 steps for Example 1 generated by Algorithm 3.1.

In Figure 6, we show the convergence history of the total error indicators for 15 adaptive loops with θ = 0.3 and θ = 0.4, together with the total error indicators for uniform refinement (θ = 1). The upper two plots in Figure 6 show an error reduction with slope -1, which is the optimal convergence rate expected for linear finite elements. Meanwhile, we compare the errors with the error estimators and observe optimal second-order convergence of the errors; since the number of degrees of freedom N grows like $h^{-2}$ in two dimensions, a slope of -1 with respect to N corresponds to $O(h^2)$.
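The slope read from such convergence plots can be computed directly from pairs (number of degrees of freedom, error or estimator) by a least-squares fit in log-log scale. The Python sketch below assumes NumPy; the data in the usage line are synthetic and not taken from the experiments above.

```python
import numpy as np


def observed_rate(ndofs, errors):
    """Least-squares slope of log(error) against log(#DoFs).

    In two dimensions #DoFs ~ h^{-2}, so a slope of -1 corresponds to O(h^2).
    """
    x = np.log(np.asarray(ndofs, dtype=float))
    y = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Synthetic data (not from the experiments above): error = 3 / N
N = [100, 400, 1600, 6400]
print(observed_rate(N, [3.0 / n for n in N]))  # close to -1
```

Fitting over all iterations is less sensitive to pre-asymptotic fluctuations than the rate between two consecutive meshes.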
Example 2. We consider the same nonlinear optimal control problem as in Example 1 with α = 0.1 and Ω = (0, 1) × (0, 1), whose exact solution is known. In Figure 7, we plot the numerical state and co-state on adaptively refined grids with θ = 0.3 after 27 adaptive loops for Example 2, generated by Algorithm 3.1. We observe that large gradients are concentrated in certain regions, where the adaptive finite element method may deliver much smaller errors than uniform refinement. In Figure 8, we present the adaptive grids after 15 and 25 adaptive refinements with Dörfler's marking parameter θ = 0.3 (27 adaptive loops in total). The grids cluster around the regions with much larger gradients, which is consistent with Figure 7. In addition, the grids concentrate near the points where f and $y_d$ have singularities. Hence adaptive refinement is particularly effective near singular points.
Figure 9 compares the convergence histories for Example 2: the left plot corresponds to adaptive refinement with Dörfler's marking parameter θ = 0.3 and 27 adaptive loops, and the right plot to uniform refinement (θ = 1). For adaptive refinement, the errors decrease at the optimal $L^2$-norm rate, whereas uniform refinement exhibits reduced convergence orders because of the singularities of the solutions.