A DUAL EM ALGORITHM FOR TV REGULARIZED GAUSSIAN MIXTURE MODEL IN IMAGE SEGMENTATION

Abstract. A dual expectation-maximization (EM) algorithm for the total variation (TV) regularized Gaussian mixture model (GMM) is proposed in this paper. The algorithm is built upon the EM algorithm with TV regularization (EM-TV) model, which combines statistical and variational methods for image segmentation. Inspired by the projection algorithm proposed by Chambolle, we give a dual algorithm for the EM-TV model. The related dual problem is smooth and can be easily solved by a projection gradient method, which is stable and fast. Given the parameters of the GMM, the proposed algorithm can be seen as a convergent forward-backward splitting method, and it can be easily extended to many other applications. Numerical results show that our algorithm provides high-quality segmentation results with fast computation speed. Compared with well-known statistics-based methods such as the hidden Markov random field with EM (HMRF-EM) method, the proposed algorithm has better performance. The proposed method can also be applied to MRI segmentation, e.g. in the SPM8 software, and improves the segmentation results.


1. Introduction. Image segmentation is a basic part of image processing with a long history [22]. Images may come from every corner of the world, and some natural images contain noise and are not good enough to segment directly. Many methods for image segmentation have appeared after several decades of development; one of the most important is the statistical approach. In such a method, a region together with its noise can be approximately represented by a Gaussian mixture model (GMM) [26,17,23,39,41]. A common way of solving the GMM is the well-known expectation-maximization (EM) algorithm, first introduced in [16]. The GMM-EM method has been studied extensively, e.g. in [9,30,38,41]. However, when using the GMM-EM method in image segmentation, one drawback is its sensitivity to noise due to the lack of regularization. To address this flaw, the authors of [41] applied a hidden Markov random field (HMRF) to the GMM and solved it by the EM algorithm (HMRF-EM), which achieves stable segmentation results under noise. In fact, the local spatial positions of neighboring pixels are taken into account in HMRF, which makes the segmentation more robust to noise. However, the classification provided by HMRF may produce zigzags near the boundaries due to improper regularization [25], especially when the noise level is high. One may observe this phenomenon in the numerical experiments section.
As for regularization, total variation (TV) is one of the most successful tools in inverse problems. TV was introduced by Rudin, Osher and Fatemi in [33], known as the ROF model. In fact, Mumford and Shah introduced a multi-region segmentation model in [31] whose regularization term can also be treated as TV, as in [20,35]. In [12], Chan and Vese proposed the CV model, which combines TV regularization with the level set method [32] for two-region image segmentation; some extensions can be found in [1,11]. It is well known that there are many fast algorithms for TV-based minimization problems, such as the dual method [10], the split Bregman method [21], the augmented Lagrangian method [37], graph-cut (max-flow) methods [7,8], the continuous max-flow method [40] and so on. Besides, there is a natural theoretical connection between TV and the length of curves [19]. From denoising experiments one can see that, as a regularization term, TV preserves jumps between classes while suppressing noise, which means that TV is good at preserving edges and removing noise.
In [26], Liu et al. proposed an image segmentation model called the EM-TV model by integrating EM and TV. They proposed a unified variational functional which brings the EM algorithm and TV regularization together, and the model inherits the advantages of both the statistical and PDE methods. In their work, the minimization problem was solved by a splitting scheme in which an L^1 penalty term was used to keep the structure of EM. Unlike the traditional L^2 penalty, the convergence of this algorithm is unknown due to the lack of strict convexity of L^1. Besides, they introduced extra auxiliary variables and many penalty parameters in the algorithm. In real implementations, it is important yet difficult to find good values for these parameters in such a splitting scheme; bad parameters may cause the algorithm to fail to converge or produce undesirable segmentation results.
In this paper, we introduce a new method to solve the EM-TV model. The idea comes from the Chambolle projection method [10]. With the help of Fenchel duality, we obtain a smooth dual problem for the EM-TV model. This dual problem can be easily solved by a projection gradient method, which is fast and stable; moreover, we can prove the convergence of this algorithm. Compared with the two auxiliary variables introduced in [26], we only introduce one dual variable during the iteration, and the solution is given by an explicit formula. The proposed algorithm has few control parameters, which makes it more stable.
Our method is related to the global minimization method described in [5]. Let us point out the main differences between the two methods. First, the theoretical foundations are totally different: our method is built upon a statistical model, while theirs takes a deterministic viewpoint. In this paper we solve the EM-TV model, whereas [5] studied the Potts model with an approximation method. Thus, our model can segment data with the same mean but different variances, owing to the properties of the GMM; in this case [5] cannot work, since the mean is the only factor it uses for segmenting images. Second, the authors of [5] did not prove the convergence of their algorithm, while in this paper we show mathematically that our algorithm converges.
The rest of the paper is organized as follows. In section 2, we introduce the basic ideas of GMM, EM and the EM-TV model. In section 3, we present the dual algorithm for solving the EM-TV model. Section 4 describes the discretization of our algorithm. In section 5, numerical results and comparisons are presented. Finally, we summarize our algorithm.
2.1. GMM and EM. The GMM and the EM algorithm were first introduced in statistics and can be used in image segmentation. A GMM is a probabilistic model representing the distribution of multiple classes. The EM algorithm is a two-step iterative algorithm: the first step, usually called the E-step, uses the current estimate of the parameters to calculate the maximum a posteriori (MAP) function; the second step, usually called the M-step, uses the function from the E-step to update the parameters.
Here we take a gray image as an example. Let I(x) be a gray image, where x ∈ Ω ⊂ R^2. A GMM distribution for image pixels can be represented as

p(I(x); Θ) = Σ_{k=1}^{K} α_k p(I(x); μ_k, Σ_k),

where K is the number of mixture components and α_k is the weight of the k-th component, satisfying Σ_{k=1}^{K} α_k = 1. Here p(I(x); μ_k, Σ_k) is the Gaussian density function, and μ_k, Σ_k are the mean and variance of the k-th component, respectively. Usually, K is an integer given by the user as a prior.
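As a quick numerical illustration (the function name and parameter values below are our own, not from the paper), the mixture density can be evaluated as:

```python
import numpy as np

def gmm_density(x, alphas, mus, sigmas):
    """Evaluate the mixture density sum_k alpha_k * N(x; mu_k, sigma_k^2)
    at intensity value(s) x; the weights alphas must sum to 1."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for a, mu, s in zip(alphas, mus, sigmas):
        total += a * np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
    return total

# A two-class mixture: a dark region around 0.2 and a bright one around 0.8.
p_dark = gmm_density(0.2, alphas=[0.5, 0.5], mus=[0.2, 0.8], sigmas=[0.05, 0.05])
```

Since the weights sum to one and each component is a probability density, the mixture integrates to one over the intensity axis.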
It is easy to obtain φ* and Θ* from H(φ, Θ) by the alternating scheme

φ^{m+1} = arg min_φ H(φ, Θ^m),    Θ^{m+1} = arg min_Θ H(φ^{m+1}, Θ).

The two steps of this iteration correspond exactly to the E-step and M-step of the EM algorithm, respectively.
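For a one-dimensional GMM on pixel intensities, one such alternating step can be sketched in NumPy using the standard closed-form E/M updates (a minimal sketch; all names are ours, not the paper's notation):

```python
import numpy as np

def em_step(I, alphas, mus, sigmas):
    """One EM iteration for a 1-D GMM on pixel intensities I (flattened).
    E-step: posterior responsibilities phi; M-step: closed-form updates."""
    I = I.ravel()
    # E-step: phi_k(x) proportional to alpha_k * N(I(x); mu_k, sigma_k^2)
    lik = np.stack([a * np.exp(-(I - m) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
                    for a, m, s in zip(alphas, mus, sigmas)])      # shape (K, N)
    phi = lik / lik.sum(axis=0, keepdims=True)
    # M-step: weighted mixture weights, means and standard deviations
    w = phi.sum(axis=1)
    alphas = w / I.size
    mus = (phi @ I) / w
    sigmas = np.sqrt((phi * (I[None, :] - mus[:, None]) ** 2).sum(axis=1) / w)
    return phi, alphas, mus, sigmas
```

Iterating this function alternates the E-step and M-step until the parameters stabilize.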
By applying the first-order optimality condition and the Lagrange multiplier technique, we can easily obtain the solutions of the E-step and M-step as follows.

E-step: for x ∈ Ω, k = 1, 2, · · · , K,

φ^k(x) = α_k p(I(x); μ_k, Σ_k) / Σ_{l=1}^{K} α_l p(I(x); μ_l, Σ_l).

M-step: for k = 1, 2, · · · , K,

α_k = (1/|Ω|) ∫_Ω φ^k(x) dx,   μ_k = ∫_Ω φ^k(x) I(x) dx / ∫_Ω φ^k(x) dx,   Σ_k = ∫_Ω φ^k(x) (I(x) − μ_k)^2 dx / ∫_Ω φ^k(x) dx.

2.2. TV regularization. TV regularization was first introduced in the ROF model [33], which aims at noise removal.
In this paper, we use the rotation-invariant isotropic TV,

TV(u) = ∫_Ω |∇u| dx = ∫_Ω √(u_x^2 + u_y^2) dx,

due to its better performance compared with anisotropic TV [28]: it provides more precise boundaries.
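A discrete version of the isotropic TV, using forward differences with Neumann boundary conditions (a sketch under our own discretization choices), can be computed as:

```python
import numpy as np

def isotropic_tv(u):
    """Discrete isotropic TV with forward differences; the last
    row/column differences are set to zero (Neumann boundary)."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :-1] = u[:, 1:] - u[:, :-1]   # horizontal forward difference
    dy[:-1, :] = u[1:, :] - u[:-1, :]   # vertical forward difference
    return np.sqrt(dx ** 2 + dy ** 2).sum()
```

For a piecewise-constant image, this sum measures the jump height times the discrete edge length, which is why TV favors short, clean boundaries.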
2.3. EM-TV model. In [26], Liu et al. proposed the EM-TV model, which combines the GMM-EM method with TV regularization, and obtained good results. The GMM-EM algorithm clusters pixels only by their intensities and does not bring any spatial or geometric information into the segmentation. In [26], the authors added the TV regularization to the energy (2) to get the EM-TV model

(5)    E(φ, Θ) = H(φ, Θ) + γ Σ_{k=1}^{K} TV(φ^k),

where γ is a regularization parameter.

3. The proposed method. In this section, we discuss how to solve (5) efficiently. An alternating scheme is applied to solve for φ and Θ:

φ^{m+1} = arg min_φ E(φ, Θ^m),    Θ^{m+1} = arg min_Θ E(φ^{m+1}, Θ).

For the minimization with respect to φ, we propose a dual method due to the presence of the TV norm. This dual algorithm is built upon [10]. Compared with the original dual algorithm [10], the EM-TV model contains segmentation constraints and a nonlinear entropy regularization, which make the problem more difficult. For brevity, we denote our method as Dual EM-TV.
3.1. Minimization of φ. For brevity, in the rest of this section we omit the superscript m of (f^k)^m(x) and write it as f^k(x).
Note that there is a constraint on φ. We use the Lagrange multiplier method and obtain an unconstrained energy minimization problem with energy E_L(φ, Θ^m). It is easy to see that E_L(φ, Θ^m) is convex with respect to φ when φ^k(x) > 0 for all k ∈ {1, 2, · · · , K} and x ∈ Ω. Moreover, this energy is separable, so we can deal with each k independently. The intersection of the domains of the terms is nonempty, which means that the subgradient of E_L(φ, Θ^m) equals the sum of the subgradients of its terms. Thus the necessary and sufficient condition for problem (8) is that its minimizer φ* = ((φ*)^1, (φ*)^2, · · · , (φ*)^K) satisfies the associated optimality inclusion. Here ∂TV(u) is the subdifferential of TV at u, defined as

∂TV(u) = { v : TV(w) ≥ TV(u) + ⟨v, w − u⟩, ∀ w },

where ⟨·, ·⟩ denotes the corresponding L^2 inner product. It is well known that the convex conjugate TV* is the "characteristic function" of a convex set B, i.e. TV*(v) = 0 if v ∈ B and +∞ otherwise. More details about the dual representation of the TV norm can be found in the Chambolle projection method [10], as well as in [18,24]. Introducing g^k, we obtain the equivalent form, which indicates that (g*)^k is the minimizer of an energy F(g^k). By (11), if g^k ∉ B then F(g^k) = +∞, so (14) can be written as the constrained minimization problem (15). According to the definition of B, computing (g*)^k can be transferred to solving problem (16) in a dual variable η^k, where g^k and η^k are related through the divergence operator, g^k = γ div η^k with |η^k| ≤ 1. We can use the projection gradient method to solve (16), which gives the iteration scheme (17), where the operator Proj_B̄(ξ) projects ξ onto the convex set B̄ and ∆t is a proper time step.
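To illustrate the projection gradient idea, here is a minimal sketch on the plain ROF dual problem treated in [10] — not the full EM-TV dual, which additionally carries the entropy term and segmentation constraints; all function names and parameter values are our own:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary; returns (gx, gy)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

def chambolle_rof(f, lam, n_iter=200, tau=0.125):
    """Projected gradient on the ROF dual: ascend along grad(div p - f/lam),
    then project the dual field p back onto the unit ball |p| <= 1.
    The denoised image is recovered as u = f - lam * div p."""
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        px2, py2 = px + tau * gx, py + tau * gy
        norm = np.maximum(1.0, np.sqrt(px2 ** 2 + py2 ** 2))
        px, py = px2 / norm, py2 / norm   # pointwise projection onto |p| <= 1
    return f - lam * div(px, py)
```

Each iteration is one explicit gradient step on the smooth dual objective followed by a cheap pointwise projection, which is exactly why the dual approach is fast and stable.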
As for the Lagrange multiplier, from (13) we can give the closed-form solution of λ(x) once (g*)^k(x) is known. Thus, we set the Lagrange multiplier to λ^m(x) and fix λ = λ^m(x) when solving for η. Substituting λ back into (17), we obtain the iteration for (η^k)^n.
3.2. Minimization of Θ. At the m-th iteration, we fix (φ^k)^{m+1} and minimize the energy with respect to Θ. This optimization problem is the same as in the classical EM model, so Θ can be updated by (4).
4. Discretization. The discrete gradient operator ∇_d : F_1 → F_2 with Neumann boundary condition can be defined componentwise by

(∇_d u)_{i,j} = ((∇_d u)^1_{i,j}, (∇_d u)^2_{i,j}),
(∇_d u)^1_{i,j} = u_{i+1,j} − u_{i,j} if i < N, and 0 if i = N,
(∇_d u)^2_{i,j} = u_{i,j+1} − u_{i,j} if j < M, and 0 if j = M.

It is not difficult to calculate the discrete adjoint operator: the discrete divergence div_d : F_2 → F_1 is defined as the negative adjoint of ∇_d, i.e. ⟨∇_d u, p⟩ = −⟨u, div_d p⟩. The projection operator Proj_{B̄_d} is discretized accordingly. The corresponding discrete scheme of the main iteration becomes (21).
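Because the discrete scheme relies on div_d being the negative adjoint of ∇_d, it is worth checking the identity ⟨∇_d u, p⟩ = −⟨u, div_d p⟩ numerically. The implementations below are illustrative, not the paper's code:

```python
import numpy as np

def grad_d(u):
    """Forward differences with Neumann boundary (zero on the last row/column)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div_d(px, py):
    """Backward differences with the matching boundary terms, so that
    div_d is exactly the negative adjoint of grad_d."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

# Verify <grad_d u, p> = -<u, div_d p> on random fields.
rng = np.random.default_rng(0)
u = rng.standard_normal((7, 9))
px, py = rng.standard_normal((7, 9)), rng.standard_normal((7, 9))
gx, gy = grad_d(u)
lhs = (gx * px + gy * py).sum()
rhs = -(u * div_d(px, py)).sum()
```

The identity holds exactly (up to floating-point error), which is what makes the dual iteration consistent with the primal TV energy.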

Convergence analysis.
For the η-sub problem, we have the following convergence result.
Proposition 1. Assume f is bounded by M, i.e. |f_ij| ≤ M. Then for all 0 < ∆t < (K/(4γ)) exp(−(2M + 8γ)), the sequence generated by (21) converges to some η* ∈ G, provided the fixed-point set G of the iteration scheme (21) is nonempty. Here K is the number of classes and γ is the regularization parameter.
We put the proof of Proposition 1 in the appendix section.
5. Experiment results. In this section, we demonstrate the segmentation results. The computing platform is a laptop equipped with an Intel Core i7-8550U CPU @ 1.80GHz, running Matlab R2010b. We take the HMRF-EM method [36,41] as the counterpart. In the HMRF method, the Gibbs prior is applied as a regularization, with the Gibbs distribution given by P(x) = exp(−U(x))/Z, where Z is a normalization constant. The prior energy function is set to U(x) = γ Σ_{y∈N_x} (1 − Class(x, y)), where N_x denotes the neighborhood of the point x and Class(x, y) = 1 if x and y belong to the same class and 0 otherwise. In our numerical experiments, the size of N_x in the regularization term of HMRF-EM is chosen as |N_x| = 4, 8, 12, 20 neighborhoods, respectively; for short, we call these HMRF-EM 4n, HMRF-EM 8n, HMRF-EM 12n and HMRF-EM 20n. The code for HMRF-EM was obtained from https://se.mathworks.com/matlabcentral/fileexchange/37530-hmrf-em-image in [36].

One can use the Armijo rule [42] to choose the step size ∆t in (21) by line search. The theoretical upper bound of ∆t appearing in Proposition 1 can be set as the Armijo rule's lower bound (α_min in [42]). One can also set the initial ∆t to some proper constant if fast convergence is preferred. In this paper, we do not further study the optimal step size and simply use constants as step sizes. There are two parameters, δ and γ, in our model. δ controls the smoothness of the classification function. In our experiments δ is not very sensitive, so it can be chosen in a wide range; generally speaking, the heavier the noise, the larger δ should be. We choose δ from 1 to 1000. γ is the regularization parameter; it affects the smoothness of the boundary.
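A hedged sketch of the Armijo-type backtracking mentioned above (the sufficient-decrease constant c, the shrink factor beta and the floor t_min standing in for α_min are illustrative choices of ours, not values from [42]):

```python
import numpy as np

def armijo_step(f, grad_f, x, t0=1.0, beta=0.5, c=1e-4, t_min=1e-8):
    """Backtracking (Armijo) line search for a gradient step x - t*grad_f(x):
    shrink t by beta until f decreases by at least c * t * ||grad_f(x)||^2,
    or until t reaches the lower bound t_min."""
    g = grad_f(x)
    fx = f(x)
    gg = float(np.sum(g * g))
    t = t0
    while t > t_min and f(x - t * g) > fx - c * t * gg:
        t *= beta
    return t
```

For the dual iteration (21), f would play the role of the dual objective and the returned t the role of ∆t; here we only demonstrate the mechanism on a toy quadratic.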

5.2. Comparison of Dual EM-TV, Original EM-TV and HMRF-EM.
To compare the segmentation results, we use the segmentation accuracy index (SAI) to measure the classification quality:

SAI = N_c / N_t,

where N_c is the number of correctly labeled points and N_t is the total number of points. The CPU times and accuracy results are listed in Tables 1 and 2. The segmentation comparisons are displayed in Figures 1, 2, 3 and 4. In Figures 1 and 3, the first row contains the original image, the noisy image and the energy curve of the Dual EM-TV method; the second and third rows show the ground truth and the segmentation results of Dual EM-TV, Original EM-TV [26], HMRF-EM 4n, HMRF-EM 8n and HMRF-EM 20n, respectively. Figures 2 and 4 show enlargements of the red square areas marked in Figures 1 and 3.
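The SAI is simply the fraction of correctly labeled points; a one-line sketch (the function name is ours):

```python
import numpy as np

def segmentation_accuracy(labels, ground_truth):
    """SAI = N_c / N_t: the fraction of pixels whose predicted label
    matches the ground-truth label."""
    labels = np.asarray(labels)
    ground_truth = np.asarray(ground_truth)
    return float((labels == ground_truth).mean())
```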
One can see that our algorithm is faster than the HMRF-EM model and provides better segmentation results. Compared with the original EM-TV method, the proposed method produces slightly better results and costs much less CPU time.

5.3. Comparison of Dual EM-TV and the CV method. In this subsection, we show an example in which our model segments an image with two regions that have the same mean but different variances.
We first synthesize an image with two regions whose means are both close to 128 but whose variances differ; see Figure 5. The histograms in Figure 5(b-d) clearly show the difference in variance. We then apply the GMM-EM method to Figure 5(a) and obtain the segmentation result in Figure 6(a). Next, using the parameters obtained from GMM-EM, we apply our Dual EM-TV method; the segmentation result is shown in Figure 6(b). Figure 6(c) is the segmentation result of the CV model. As one can see, the GMM-EM based methods successfully segment the target image, while the CV model cannot, since the mean is the only factor it uses to separate regions.

5.4. Natural image segmentation. Here we show more comparisons of the Dual EM-TV and HMRF-EM methods in Figure 7. We add Gaussian noise with mean 0 and standard deviation 50/255. In Figure 7, the first and second columns contain the original and noisy images; the segmentation results of Dual EM-TV, HMRF-EM 4n, HMRF-EM 8n and HMRF-EM 12n are displayed in the right four columns, respectively. One can see that our algorithm obtains very good segmentations.
The parameters are as follows: ∆t = 0.0001, γ = δ = 1.

We use the Dice metric (DM) to compare the segmentation accuracy on MRI data:

DM = 2 N_ag / (N_a + N_g),

where N_ag denotes the number of voxels correctly assigned to the k-th class by both the ground truth and the segmentation algorithm, and N_a and N_g are the numbers of voxels assigned to the k-th class by the algorithm and the ground truth, respectively. We compare our results with the newsegment method in SPM8; the DM values are listed in Table 3, where WM denotes white matter and GM denotes gray matter. As one can see, our method is slightly better.
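The per-class Dice metric can be computed as follows (a small sketch with our own function name):

```python
import numpy as np

def dice(seg, gt, k):
    """Dice metric for class k: DM_k = 2 * N_ag / (N_a + N_g), where
    N_ag counts voxels assigned to class k by both segmentation and
    ground truth, and N_a, N_g count each assignment separately."""
    a = (np.asarray(seg) == k)
    g = (np.asarray(gt) == k)
    return 2.0 * np.logical_and(a, g).sum() / (a.sum() + g.sum())
```

A Dice value of 1 means perfect overlap for that class; 0 means no overlap.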
6. Discussion and conclusion. In this paper, we proposed a dual algorithm for solving the EM-TV model. By combining the dual formulation, the projection gradient method and the Lagrange multiplier method, we obtain an elegant dual EM algorithm which is fast and stable.

Here, we discuss how the noise level affects our model and the segmentation results. Figures 8-10 show two-class segmentation results; the noise is additive Gaussian white noise with standard deviations of 120/255, 240/255, 300/255 and 390/255, respectively. In Figure 11, we show further results. The dual problem we derive is smooth and thus easier to solve. Compared with the HMRF-EM model, our algorithm is faster and has better segmentation performance, since the isotropic TV cannot be handled by an HMRF-type algorithm. Based on the experimental results, our algorithm produces good segmentations under high levels of noise, on both color and gray images. The proposed method can also be applied to SPM8 and yields good results. Our method can be extended to non-parametric segmentation mixture models, such as kernel methods, which are parameter free. We leave these as future research.

7. Appendix A. To keep this paper self-contained, we give the proof of Proposition 1.
Let T : F_1 → F_2 be an operator. We recall the following notions.

Definition 7.2 (Non-expansive operator). T is non-expansive if T is 1-Lipschitz continuous, i.e., ||Tx − Ty|| ≤ ||x − y|| for all x, y ∈ F_1.

Definition 7.4 (Firmly non-expansive operator). T is firmly non-expansive if and only if T is a 1/2-averaged non-expansive operator. An equivalent definition of firm non-expansiveness is that ||Tx − Ty||^2 ≤ ⟨x − y, Tx − Ty⟩ for all x, y ∈ F_1.

In this paper, the spaces F_1, F_2 are often taken as R or R^n. Next, we give some properties of the operators T_1, T_2, T_20 appearing in our algorithm.

Proposition 2. T_1 is firmly non-expansive.

Proof. T_1 is a projection operator, which is a special proximal operator, and thus it is firmly non-expansive (for more details see [15]).
Proposition 3. If f is bounded by M, then T_20 is β-Lipschitz continuous with β = (8γ/K) exp(2M + 8γ). To prove Proposition 3, we need the following lemmas, in which K is the number of classes and γ is the regularization parameter.
Proof. We can see that exp(z) acts pointwise at each (i, j) position, so we may consider the one-dimensional case: exp(z) − exp(y) = exp(ξ)(z − y) for some ξ between z and y, by the mean value theorem. Since exp is monotone, |exp(z) − exp(y)| ≤ exp(max(z, y)) |z − y|. In our algorithm, z = −γ div_d η^n, so z ∈ [−4γ, 4γ]. Letting K_2 = exp(4γ), we obtain, for all g_1, g_2 ∈ F_1, ||exp(g_1) − exp(g_2)|| ≤ K_2 ||g_1 − g_2||.

Now we show the proof of Proposition 3.

Proof. By Lemmas 7.5-7.8, combining the bounds of the composed operators yields the Lipschitz constant β = (8γ/K) exp(2M + 8γ) for T_20.

Lemma 7.5. Under the conditions of Proposition 3, exp(−(f^k + 1 + λ^m)) is bounded by K_0 = exp(2M + 4γ).

Proof. Since |η| ≤ 1, we have γ div_d η ∈ [−4γ, 4γ] by the definition of div_d; together with |f_ij| ≤ M, this gives the bound, and we denote K_0 = exp(2M + 4γ).

Lemma 7.6. ∇_d is linear and ||∇_d g|| ≤ K_1 ||g|| with K_1 = √8.

Proof. It is easy to check that ∇_d is linear. Moreover, each component of ∇_d g is a difference of two entries of g, (a − b)^2 ≤ 2(a^2 + b^2), and each entry of g appears in at most four such differences, which means ||∇_d g|| ≤ K_1 ||g|| with K_1 = √8.

Inverse Problems and Imaging, Volume 00, No. 0 (2018)

Figure 12. Results of SPM8 with one slice of MR images. First row: left, MRI with 0% RF and 0% noise; right, MRI with 40% RF and 9% noise. Second and third rows: the first and third columns are the newsegment method for the left and right images, respectively; the second and fourth columns are the proposed method for the left and right images, respectively.
So there exists a subsequence of η^n, denoted by η^{n_j}, that converges to some η*.