On a Parabolic-Hyperbolic Filter for Multicolor Image Noise Reduction

We propose a novel PDE-based anisotropic filter for noise reduction in multicolor images. It is a generalization of Nitzberg&Shiota's (1992) model being a hyperbolic relaxation of the well-known parabolic Perona&Malik's filter (1990). First, we consider a `spatial' molifier-type regularization of our PDE system and exploit the maximal $L^{2}$-regularity theory for non-autonomous forms to prove a well-posedness result both in weak and strong settings. Again, using the maximal $L^{2}$-regularity theory and Schauder's fixed point theorem, respective solutions for the original quasilinear problem are obtained and the uniqueness of solutions with a bounded gradient is proved. Finally, the long-time behavior of our model is studied.


Introduction
Image processing (also referred to as digital image processing) is one of central tasks in the image science. It includes, but is not limited to denoising, deblurring, decomposition or segmentation of images with appropriate edges (cf. [13, p. 5]). Since the presence of noise is unavoidable due to the image formation process, recording and/or transmission (cf. [32, p. 259]), in practice, a noise reduction technique must be applied before any further processing steps can reasonably be performed.
One of the earlier systematic theories dates back to Marr and Hildreth [25] (cf. [16, p. 182]) and incorporates a low-pass filtering as a noise reduction tool. For a detailed overview of the historical literature, we refer the reader to the comprehensive article by Alvarez et al. [1]. After a decade of gradual improvements and developments by Canny [14], Witkin [38] and many other authors, the field has been revolutionized by Perona & Malik [29] in early 90s, when they proposed their famour anisotropic Perona & Malik image filter. Their development marked a new era in image processing -the era of (timedependent) partial differential equations (PDEs).
In the analytic sense, it can be observed the Equation (1) is ill-posed due to its connection with the reverse heat equation (cf. [37, pp. 15-19]). Surprisingly, numerical discretizations have been observed to be stable ( [4, pp. 20-21], [21]), though undesirable staircaising effects still occur sometimes (cf. [24, p. 176]). Some of the numerical studies have though been critically perceived by other authors (viz. [16, p. 185]). Another major drawback of Perona & Malik's filter is that it can break down if being applied to images contaminated with an irregular noise such as the white noise. Indeed, in such situations, ∇u becomes unbounded almost everywhere in G and the diffusion collapses (see [16, p. 183]). Unfortunately, despite of numerous numerical results, no rigorous analytic theories are known in the literature for Equation (1). As an alternative for Equation (1), Catté et al. [16] proposed to consider a spaceconvolution regularization called the 'selecting smoothing' given by ∂ t u = div g(∇ σ u)∇u in (0, ∞) × G, g(∇ σ u) · n = 0 on (0, ∞) × Γ, u(0, ·) =ũ 0 in Ω, (2) where ∇ σ u (σ > 0) denotes the gradient operator applied to the convolution of u with a multiple of Gaussian pdf in space (see Section 2.1 below). Heuristically, the convolution is meant to play the role of a low-pass filter which iteratively smoothes the image at the scale of t 1/2 before recomputing the diffusivity matrix. Under a C ∞ -smoothness assumption on the scalar function g, forũ 0 ∈ L 2 (G), Catté et al. [16] proved an existence and uniqueness theorem for Equation (2) in the class of weak solutions H 1 0, T ; L 2 (G) ∩L 2 0, T ; H 1 (G) together with a C ∞ -regularity of solutions in (0, T )×G for any T > 0. They also developed a finite-difference numerical scheme and presented some illustrations of its performance. As later discovered by Amann [3, p. 1030], Equation (2) results in smoothing of sharp edges and, therefore, produces unwanted blurring effects. A more detailed discussion and some numerical illustrations can be found in [4,Sections 1 and 2].
To overcome the smearing effect of Equation (2), Amann [3,4] studied a memory-type regularization of Perona & Malik's equation (1) given by where θ ∈ L s loc (0, ∞; R + ) for some s > 1. For C 2 -domains G (which rules out rectangular images), 1 < p, q < 1 such that 2 p + d q < 1 and g ∈ C 2− (R d , (0, ∞) with C 2− denoting the space of functions with locally bounded difference quotients up to order 2, the initial data u 0 ∈ H 2,q Neu := u ∈ H 2,q (G) ∂u ∂n = 0 on Γ were shown to admit a unique strong solution u ∈ H 1,p 0, T ; L q (G) ∩ L p 0, T ; H 2,q Neu (G) with a maximal existence time T * > 0, which is even infinite if θ has a compact support in (0, ∞). Moreover, it has been proved the solution continuously depends on the data in the respective topologies and a maximum principle for u has been shown, etc. The proof is based on the maximal L p -regularity theory. A generalization of Equation (3) has also been studied and the results of numerical experiments have been presented.
An abstract linear version of Equation (3) was studied by Prüss in [30] and Zacher in [39]. We refer the reader to the fundamental monograph [31] by Prüss for further details on this and similar problems.
Cottet & El Ayyadi [17] studied the initial-boundary value problem together with the periodic boundary conditions for u for the case d = 2. Here, F is a function mapping R d into the space of positive semidefinite (d×d)-matrices, σ > 0 and L 0 is uniformly positive definite. Equation (4) has first been proposed in a similar form and without any mathematial justification by Nitzberg & Shiota in [27]. For the initial data [17] showed the existence of a unique solution for any T > 0, which, moreover, continuosly depends on the data in a certain topology. The proof is based on a convolution-like time discretization and a priori estimates.
where g, F are scalar C 1 -functions such that g is positive non-increasing and F is bounded together with its first derivative. The main disadvantage of Equation (5) over Equation (4) is that the former is not genuinely anisotropic in sense of [ with v 0 ≥ 0, the sequence of numerical solutions was shown to subconverge to a 'weak' as the lattice size goes to 0, whence an existence theorem for Equation (5) follows. As pointed out by Amann [4, p. 20], their proof is only valid in 2D. For a Hölder-space treatment of Equation (5), we refer the reader to Belahmidi's PhD thesis [9,Chapter 4], for which the author assumes, in particular, and Ω ∈ C 2,α for α > 0, thus ruling out both rectangular images and rough noise patterns.
Equation (5) shares a certain degree of similarity with the equations of compressible and incompressible fluids. Recently, Hieber & Murata [22] studied a fluid-rigid interaction problem for a compressible fluid and used the maximal L p -regularity theory to prove the local well-posedness. For an overview on the recent developments in the theory of parabolic systems we refer the reader to the same paper [22].
For the sake of completeness, one should also mention the vast literature studying various image filters incorporating the total variation functional as originally proposed by Rudin et al. [32], which turned out to perform particularly well in practice. Omitting the time-independet case (cf. Remark 1 below), the total variation counterpart of Perona & Malik's Equation (1) is given by together with appropriate boundary conditions, where ∇u |∇u| formally denotes the gradient/subdifferential of the total variation functional evaluated at u. Without being exhaustive, we refer the reader to the well-posedness and long-time behavior studies [5,6,12] and the references therein.
In the present paper, we revisit Equation (4). In Section 2, we derive a multicolor, genuinely anisotropic generalization of Equation (4) and discuss the choice of parameters for our new image filter. In Section 3, we present a well-posedness theory for the multicolor version of Equation (4) for σ > 0. In contrast to [17], we obtain a more regular solution under a weaker data regularity assumption. In Section 4, we consider the limiting case σ = 0. First, we prove the existence of mild and/or strong solutions using the classical variational theory for parabolic equations. Again, our approach requires less regularity than in the earlier work [10] and is valid in any space dimension. Under an additional assumption, we further prove the solutions are unique and continuously depend on the data. Next, we study the long-time behavior of our model and prove the exponential stability under a uniform positive definiteness condition on the diffusivity function. In the appendix Section A, we briefly summarize the classical maximal L 2 -regularity for non-autonomous forms as well as its recent improvements.

Filter Description
In this section, we present a multicolor generalization of the monochromic PDE image filter proposed by Nitzberg & Shiota [27] and further developed by Cottet & El Ayyadi [17]. Our filter is more comprehensive than the monochromic one since it takes into account possible local correlactions between the color components. Besides, we provide some geometric intuition and a connection to diffusion processes to justify the logic of our filter.

PDE Based Image Filtering
Let G be a bounded domain of R d (d ∈ N) with n : ∂G → R d standing for the unit outer normal vector. Typically, d = 2 and G = (0, L 1 ) × (0, L 2 ). Let u 0 : G → R k with u 0 = (u 0 1 , . . . , u 0 k ) T denote initial color intensity of the image at point x = (x 1 , x 2 , . . . , x d ) T ∈ G measured with respect to an additive k-color space (e.g., the RGB space with k = 3).
In most practical situations, not the original image u 0 but a corrupted version of it, say,ũ 0 is known. Various pollution scenarios can occur ranging from noise effects and blurring to missing parts, etc. Here, we want to restrict ourselves to the situation that u 0 is distorted by an additive noise ε, i.e., For a probability space (Ω, F , P), the noise ε can be modeled as an F -measurable random variable taking its values in a closed subspace of L 2 (G, R k ), e.g., a Gaussian random field on G. The goal is to reconstruct or at least to 'optimally' estimate the (unknown) original image u 0 based on the noisy observationũ 0 .
We outline the following abstract approach (known as the scale-space theory) to constructing such estimators based on general techniques of semiparametric statistics (cf. [37,Section 1.2.2]). First, a deterministic semiflow S(t) t≥0 on L 2 (G, R k ) referred to as a 'scale-space' is introduced. There are various rationales behind a particular selection S(t) t≥0 . For example, S(t) t≥0 can be designed such that any 'reasonable' unpolluted image u 0 can be approximated by one of the stationary points of S(t) t≥0 . In this case, an estimateû 0 of u 0 is given bŷ Another example is given when S(·) is selected to play the role of a kernel smoothing operator from the nonparametric statistics (cf. [34,Chapter 8]). In this case, the evaluation time T in Equation (7) roughly represents the (reciprocal) bandwidth and is typically selected to minimize the asymptotic mean integrated square error (AMISE) as a function of the design size n (e.g., the number of pixels available).

Remark 1.
Among other popular approaches such as the low-pass filtering, morphological multiscale analysis, neural networks, Bayesian techniques, etc., one should mention the penalized nonparametric regression. Given a noise imageũ 0 (x) from Equation (6), the filtered image is obtained by minimizing the penalized objective functional with a regularization parameter λ > 0. Here, the first term measures the L 2 -goodness of fit between the noisy and the filtered images and can alternatively be replaced with any other L p -norm. The second represents a Tychonoff-regularization associated with a stronger topology. The typical choices are where TV(u) stands for the total variation of u.  (8) is typically given as a (parameter-)elliptic partial differential equation or inclusion.
In the following, we use a synthesis of these two approaches to put forth the semiflow S(t) t≥0 . The latter is also referred to as a C 0 -operator semigroup (cf., e.g., [8,Chapter 4] or [11,Chapter 5]). As a matter of fact, one can not expect the filter to be able to perfectly reconstruct the original image. At the same time, the filter should be designed the way it performs 'well' on a certain class or set of images. Assuming ε is only locally autocorrelated, a natural choice is to let the evolution associated with semiflow be driven by a partial differential equation (PDE). In the following, we briefly outline our PDE model.
Motivated by the standard approach adopted in the theory of transport phenomena (cf. [37, p. 2]), let u(t, ·) ∈ L 2 (G, R k ) denote the color intensity at time t ≥ 0 after applying the semiflow to the initial noisy measurementũ 0 . In physical applications, u is usually a scalar variable representing the heat or material concentration density, etc. Further, let J(t, ·) ∈ L 2 (G, R k×d ) denote the 'color flux' tensor at time t ≥ 0. Intuitively speaking, J(t, ·) represents the direction the color intensity is flowing into to compensate for local distortions caused by the noise. Assuming that there are no other sources of color distortion, we exploit the divergence theorem to obtain the following conservation or continuity equation Since Equation (9) is underdetermined, a so-called constitutive equation establishing a relation between u and J is needed. In many applications, one adopts the well-known Fick's law of diffusion, which postulates P(t, ·) to be proportional to −∇u(t, ·), i.e., Here, ∇u stands for the Jacobian of u and the symmetric fourth-order tensor H(t, ·) ∈ R (k×d)×(k×d) plays the role of diffusivity tensor and can be interpreted as a symmetric linear mapping from R k×d into itself. With this in mind, Equation (9) rewrites as If H is a constant tensor, Equation (11) is referred to as the homogeneous diffusion. Otherwise, Equation (11) is still underdetermined and a futher constitutive relation between H and ∇u is indispensable. In physics, this equations models the properties of the medium the diffusion is taking place in and/or the properties of the substance which is diffusing. In image processing, some other principles are adopted. See Section 2.2.1 below for details. Assuming, for example, for an appropriate response function F : R k×d → R (k×d)×(k×d) and plugging Equation (12) into (11) As a parabolic PDE system, Equation (13) exhibits an infinite signal propagation speed. Since any practically relevant selection of the diffusivity function F violates the causality principle, the equation even turns out to be ill-posed. Besides, no direct intuition on how to select the stopping time T from Equation (7) is provided. This motivated Nitzberg & Shiota [27] and Cottet & El Ayyadi [17] to consider a hyperbolic relaxation of Equation (12) for the particular case k = 1. For a positive relaxation parameter τ > 0, they replaced Equation (12) with the first-order hyperbolic equation They called their regularization a 'time-delay', which is, strictly speaking, not correct since it rather has the form of a relaxation. At the same time, Equation (14) can be viewed as a first-order Taylor approximation with respect to τ of the delay equation Equation (14) together with (11) yields a nonlinear PDE system Recall that H∇u stands for the tensor-matrix multiplication (cf. Equation (10)). In fact, Equations (15)-(16) are very much reminiscent of the well-known Cattaneo system (cf. [15]) of relativistic heat conduction. Formally speaking, Equation (14) 'converges' to (12) as τ → 0. Equations (15)-(16) can be viewed as parabolic-hyperbolic PDE system or a nonlinear Gurtin & Pipkin heat equation (cf. [20]). Indeed, solving Equation (16) for H and plugging the result into Equation (15), we obtain a memory-type equation which, after being differentiated with respect to t, yields a quasilinear wave equation with a Kelvin & Voigt damping and memory-time coefficients. Next, appropriate boundary conditions for (u, H) T need to be prescribed. Neither Dirichlet, nor periodic boundary conditions seem to be adequate for the most applications. In contrast to that, a nonlinear Neumann boundary condition turns out both to be mathematically sound and geometrically intuitive. Equation (18) states that the color flow on the boundary vanishes in the normal direction. As for the initial conditions, we prescribe Here, the fourth-order tensorH 0 can be chosen to be symmetric and positive definite, e.g.,H 0 = αδ iI δ jJ j,J=1,...,d i,I=1,...,k for a small parameter α > 0. Collecting Equations (15)- (19), we arrive at an initial-boundary value problem for the quasilinear PDE system Equations (20)- (23) can be viewed as a mixed parabolic-hyperbolic system. Additionally, the boundary condition in Equation (21) is nonlinear. Hence, neither the classical hyperbolic solution theory (viz. [28] or [35]), nor the classical parabolic solution theory (see, e.g., [8]) are directly applicable.
To make Equations (20)-(23) better feasible, Cottet & El Ayyadi considered in [17] a regularization of ∇u in Equation (21) though a spatial mollification. For u ∈ L 2 (G, R k ) and a kernel ρ ∈ L ∞ loc (R d , R), the convolution u * ρ is given by Selecting now a fixed mollifier ρ ∈ W 1,∞ R d , R with ρ ≥ 0 a.e. in R d and R d ρ(x)dx = 1, e.g., ρ can be the Gaussian pdf, as well as a bandwidth σ > 0, we define for u ∈ L 2 (G, R k ) the nonlocal operator as a regularization of the gradient operator. With (ρ σ ) σ>0 being a delta sequence, ∇ σ is a regular approximation of the ∇-operator. Replacing ∇ with ∇ σ in Equation (21), we arrive at the following system of partial integro-differential equations

Parameter Selection
In this subsection, we discuss the choice of parameters τ , F andH 0 . Also, depending on whether the model (20)-(23) or (24)-(27) is adopted, a kernel ρ and a regularization parameter σ > 0 need or need not to be selected. Here, we restrict ourselves to the limiting case σ = 0.

Response function F
For d, k ∈ N, let the space R k×d of real (k × d)-matrices be equipped with the Frobenius scalar product For a fixedD ∈ R k×d , we can thus define an orthogonal projection operator PD ⊥ : R k×d → R k×d onto the orthogonal complement ofD via Obviously, PD ⊥ can be viewed as an element of R (k×d)×(k×d) . A first choice of F could be In addition to being unsmooth at zero and thus possibly leading to technical difficulties when treating Equations (20)-(23) analytically or numerically, this particular choice of the nonlinearity does not seem to meet practical requirements which are desirable for an image filter. Indeed, as reported in [17, Section III.A], the resulting system possesses too many undesirable stationary points and exhibits a too fast convergence speed leading to such a serious drawback that a rather high amount of noise is retained. That is why a regularization of Equation (29) based on a contrast threshold parameter should be adopted. Motivated by [17,Equation (21)], we let For a discussion on the particular choice of the coefficient 3 2 and a connection to a neural network model, we refer to [17,Section IV]. With the color variables being each rescaled to lie in the interval [−1, 1], the contrast threshold s is usually selected as 5% to 10% of the image width or height. Note that the choice of function F in Equation (30)

TensorH 0
As for the initial diffusivity tensorH 0 , assuming the noise ε in Equation (6) is weakly autocorrelated, we can selectH under an additional uniform positive definiteness condition on Cov ε(·) . Since Cov ε(·) is not known in practice, the value of Cov ε(x) for a particular x ∈ G can be estimated by computing the sample covariance matrix of ∇ũ 0 evaluated over an appropriate neighborhood of x. We refer to [14,25], [34,Chapter 8] for a discussion on the optimal neighborhood size.

Relaxation time τ
Repeating the calculations in [17, Section III.A] and [38], any particular selection of the parameter τ can be shown to imply that any graphical pattern occurring on scales smaller than √ τ vanishes asymptotically, i.e., converges to its spatial mean. If no prior information on the minimum pattern size is available, statistical methods need to be employed to estimate the former.
3 The regular case σ > 0 In this section, we provide a well-posedness theory for Equations (24)- (27). In contrast to [17], nonlinear Neumann and not linear periodic boundary conditions are presribed in Equation (26). Being more adequate for practical applications, they are mathematically more challenging since the evolution is now driven by an operator with a time-varying domain. Thus, a standard application of Faedo & Galerkin method is rather problematic. Therefore, we propose a new solution technique based on the maximum L 2 -regularity theory for non-autonomous sesquilinear forms due to Dautray & Lions [18,Chapter 18, §3] as well as its recent improvement by Dier [19]. Another major advantage of our approach over [17] is that we get a much more regular solution under weaker smoothness assumptions on the initial data. Our results have a certain degree of resemblance to [16], where Equations (24)- (27) were studied for τ = 0 and k = 1. In contrast to [16], we consider both the weak and strong settings without requiring F to be a C ∞ -function. Without loss of generality, let Gũ 0 dx = 0. Otherwise, replace Equation (27) with and solve the resulting system for u. Later, by adding 1 |G| Gũ 0 dx to u, a solution to the original system is obtained.
We consider the space R (k×d)×(k×d) of real fourth-order tensors. Similar to Equation (28), we equip R (k×d)×(k×d) with the Frobenius inner product With all norms being equivalent on the finite dimensional space R (k×d)×(k×d) by virtue of Riesz' theorem, the Frobenius norm (·) : (·) is equivalent with the operator norm Throughout this subsection, we require the following assumption on ρ and F. Assumption 2. Let the functions F : R k×d → S ≥0 (R k×d ) as well as ρ : R d → R be weakly differentiable and let their first-order weak derivatives be essentially bounded by a positive number c > 0.
Proof. Using Hölder's inequality and Assumption 2, we can estimate for any x ∈ G where |G| < ∞ is the standard Lebesgue measure of G. Hence, for any u,û ∈ L 2 (G, R k ), we get which finishes the proof.
We let Then (V, H, V ′ ) is a Gelfand triple. For a tensor-valued function H ∈ L ∞ G, S ≥κ (R k×d ) for some κ > 0, we further consider the bilinear form Using Assumption 2 and the second Poincaré's equality, we can estimate a(u, u; H) ≥ κ ∇u 2 L 2 (G,R k×d ) ≥ κC P u 2 V for any u ∈ V, where C P = C P (G) > 0 is the Poincaré's constant. Hence, A(H) is continuously invertible and self-adjoint. From the elliptic theory, we know A(H) to generalize the Neumann boundary conditions in (26) associated with the PDE in Equation (24). Further, we know that the maximum domain of the strong realization of A(H) is a dense subspace of H.

Remark 5.
If H ∈ C 1 Ḡ , S ≥κ (R k×d ) for κ > 0, the elliptic regularity theory implies if G ∈ C 2 (cf. [23, Lemma 3.6] for the case G is a rectangular box). This regularity for H can be assured by selecting a regular convolution kernel ρ and a smooth nonlinearity F as well as considering smooth initial data H 0 for H. (24)-(27) can be written in the following abstract form:

With the notation introduced above, Equations
where the Neumann boundary conditions are now incorporated into the definition of operator A(H).

Proof. Equivalent formulation and a priori estimates: Solving Equation (35) for H and plugging the result into Equation (34), Equations (34)-(36) reduce to
We prove several a priori estimates we later used in the proof. For any D ∈ R k×d , Equation (38) together with Assumption 2 imply and, therefore, H(t, ·) ∈ S ≥κ (R k×d ) for a.e. t ∈ [0, T ] a.e. in G. On the other hand, by virtue of Equation (38) and Assumption 2, For t ∈ (0, T ], multiplying Equation (37) with u in L 2 (0, t; H), we obtain Hence, using Equation (39), we arrive at Constructing a fixed point mapping: Consider the Banach space equipped with the standard topology. Now, we define an operator F :X →X which maps eachũ ∈ X to the unique weak solution of Equations (37)-(38) (See Equation (75) for the embedding above.) By virtue of Equations (39) and (40), the bilinear non-autonomous form t → a ·, ·; H(u) (t, ·) is uniformly coercive and bounded. Hence, by Theorem (18), the operator F is well-defined.
Proving the contraction property of F : To show the contraction property, similar to the classical existence and uniqueness theorem of Picard & Lindelöf, we first equip the Banach space X with an equivalent norm with a positive constant L to be selected later. For the sake of simplicity, we keep the same notation for this new isomorphic space. Forũ 1 ,ũ 2 ∈X , let u 1 := F (ũ 1 ) and u 1 := F (ũ 2 ).
By definition,ū := u 1 − u 2 solves then the non-autonomous linear (w.r.t.ū) problem For t ∈ (0, T ], multiplying Equation (43) withū in L 2 (0, t; H), we get where C Lip is the Lipschitz constant of the mapping F from Lemma 4. Combining the estimates from Equations (45) and (46), we arrive at ū(t, ·) H ≤ L Multipying Equation (47) with exp(−Lt), we estimate Hence, taking the maximum over t ∈ [0, T ] on the left-hand side of Equation (48), we find which implies X is a contraction. By virtue of Banach's fixed point theorem, F posseses then a unique fixed point u ∈ X . Hence, applying Lemma 4 to Equation (38) and recalling Equation, we further get Taking into account Equation (42) as well as the equivalence between Equations (34)- (36) and (37) Proof. For the unique weak solution (u, H) T , consider the linear initial value problem The associated non-autonomous form is of bounded variation since for 0 ≤ s ≤ t ≤ T and u, v ∈ V. Theorem 19 applied to Equation (49) yields then u ∈ W 1,2 (0, T ; H) and div H∇u ∈ L 2 (0, T ; H).
Hence, (u, H) T is also a strong solution.

Limiting case σ = 0
In this remaining section, we want to obtain a solution theory for the original PDE system (20)- (23). In contrast to the regularized case, a slightly stronger assumption on the function F is required here. It is precisely the one used in [17,Equation (4)].
Assumption 10. Let the function F : R k×d → S ≥0 (R k×d ) be weakly differentiable such that F together with its weak Jacobian are essentially bounded by a positive number c > 0. Now, we introduce the nonlinear mapping F : Obviously, F is well-defined. Indeed, F(∇u) is strongly measurable as a composition of two strongly measurable functions and essentially bounded by virtue of Assumption 10. Unlike F σ , generally speaking, F is not Lipschitzian from H 1 and L ∞ . At the same time, due to Assumption 10, we trivially have: With the notations of Section 3, the abstract form of Equations (20)-(23) reads as We adopt the following solution notions for Equations (51)-(53). Note that the regularity condition on H differs from the one employed in the regularized case.   This together with the fact u 0 ∈ H combined with Theorem 18 yields a unique solution u ∈ Y to Equation (54). Hence, F is well-defined as a self-mapping on Y .
Showing the continuity of F : For an arbitrary, but fixedũ ∈ Y consider a sequence (ũ n ) n∈N ⊂ Y such thatũ n →ũ in Y as n → ∞. Further, let u := F (ũ) and u n := F (ũ n ) for n ∈ N. We want to show u n → u in Y as n → ∞. Note that the sequential continuity of F is equivalent with the regular continuity since Y is separable.
Letū n := u − u n . By definition,ū n solves the Cauchy problem Due to the Lipschitz continuity of F (cf. Lemma 11), we have Hence, by virtue of Equation (55), Using Assumption 10 to verify we apply Lebesgue's dominated convergence theorem to Equation (61) to obtain Hence, since div is a continuous linear mapping between the Hilbert spaces L 2 (G, R k×d ) and V ′ , we find div H(ũ) − H(ũ n ) ∇ũ n → 0 in L 2 (0, T ; V ′ ) as n → ∞.
Therefore, by virtue of Theorem 18, This together with the assumption u 0 ∈ V enables us to deduce u ∈ MR a (H) by virtue of Theorem 18.
Extra regularity: Applying [33, Theorem 4] to the following family of elliptic problems the desired regularity follows.
Concerning the uniqueness for Equations (51)-(53), no existence results are known in the literature both for weak and strong solutions in sense of Definition 12 (cf. [10]). Same is true for the quasilinear heat equation in non-divergence form (see [7]), etc. Nonetheless, under a boundedness condition for ∇u, the following uniqueness result can be proved.
In a similar fashion, we can prove: Corollary 16. Under the conditions of Theorem 15, for any T > 0, there exists a constant C > 0 such that For a global weak solution (u, H) T to Equations (51)-(53) given in Theorem 13, consider the energy functional Theorem 17. In addition to the assumptions of Corollary 14, there may exist a number ω > 0 such that F(D) ∈ S ≥ω (R k×d ).
Hence, the exponential decay of E is a direct consequence of Gronwall's inequality.

A Maximal L 2 -Regularity for Non-Autonomous Forms
In this appendix, we briefly summarize the theory of maximal L 2 -regularity for nonautonomous forms. We start with the classical theory dating back to Dautray & Lions (cf. [18]), which furnishes the existence and uniqueness of weak solutions. Further, we present a recent theory developed by Dier in [19] guaranteeing the existence of strong solutions under a boundedness assumption on the variation of non-autononous form associated with the 'elliptic' part of evolution problem. Let H and V be separable Hilbert spaces such that V is continuously and densely embedded into H. For T > 0, we consider the initial value probleṁ u(t) + A(t)u(t) = f (t) in L 2 (0, T ; V ′ ), u(0) = u 0 ∈ H. Additionally, let a be uniformly coercive, i.e., there may exist some number α > 0 such that Re a(u, u; t) ≥ α u 2 V for any u ∈ V and a.e. t ∈ [0, T ]. With V ′ denoting the antidual of V, the linear bounded operator A(t) : V → V ′ associated with a(·, ·; t) for t ∈ [0, T ] is defined as A(t)u, v V ′ ;V := a(u, v; t) for u, v ∈ V.
A classical result due to Lions states the following well-posedness result in the class of weak solutions. For the weak solution to be strong, additional assumptions on the non-autonomous form a are required. In the following, let the non-autonomous form a be of bounded variation, i.e., there may exist a nondecreasing function g : [0, T ] → [0, ∞) such that |a(u, v; t) − a(u, v; s)| ≤ g(t) − g(s) u V v V for all u, v ∈ V and 0 ≤ s ≤ t ≤ T.