A VARIATIONAL APPROACH TO EDGE DETECTION

Abstract. In this paper, using the variational framework and elements of topological asymptotic analysis, we derive an algorithm for edge detection in a digital image in which an optimal value for the threshold is computed automatically. In order to examine this algorithm, we perform a simple experiment on synthetic images composed of two objects with different values of intensity and size. In this case, we are able to find an exact condition which has to be satisfied so that the edge of the object with lower contrast is detected. At the end, we compare the results of numerical experiments obtained by application of our algorithm and of the algorithm proposed by Desolneux et al. [7, 8]. We indicate some similarities between these two approaches to edge detection and discuss their differences.


1. Introduction. Detection of edge positions in a digital image is one of the most intensively investigated problems since the beginning of studies on computer vision. Generally, it is closely related to the problem of image segmentation, as edges can be equated with boundaries of objects or coherent segments in an image. The early research on the edge detection problem was mainly an attempt to define an appropriate differential operator characterizing an edge, one that would work similarly to the human visual system. In the original paper [14], Marr and Hildreth proposed to consider an edge as the set of points where the Laplacian of the Gaussian kernel convolved with a given image changes sign. Although the idea to use this operator for edge detection was supported by neurophysiological experiments, it turned out that it does not work well in practice. One can show, for instance, that this operator also returns false edges corresponding to local minima of the image gradient magnitude. This limitation was later overcome by Canny [4], who proposed to define an edge as the set of points where the directional derivative of an image convolved with the Gaussian kernel attains a local maximum. Of course, besides these two classical approaches to edge detection, there is also a huge number of other, still widely used algorithms that have been developed based on similar criteria. Their detailed description can be found in any fundamental book on image processing (see, e.g., Jain [13] or Pratt [18]). However, the main drawback of all these methods, both in application to real-world problems and in further research towards a better understanding of visual perception, is the need to adjust a number of parameters for each processed image. In practice, these parameters are either determined by application of some heuristic methods based on histogram analysis or manually selected by the user based on experience from earlier experiments.
In [7, 8], Desolneux, Moisan and Morel introduced a different approach to the problem of edge detection. Their idea was inspired by the basic perception principle of Helmholtz, which states that whenever some large deviation from randomness occurs, a structure is perceived (cf. Desolneux et al. [8]). Based on this observation, the authors proposed a global criterion for deciding whether an edge is meaningful or not. This criterion was the basis for the construction of a method whose main advantage over those discussed above is the possibility of edge detection without prior setting of parameters. In this paper, we also propose a different view of the problem of edge detection than the authors of the methods mentioned at the beginning of this introduction. That is, instead of trying to define a more appropriate differential operator to indicate positions of edges, we rather aim at deriving a formal approach that allows us to find these positions automatically. The approach that we propose is based on the variational framework and uses ideas of topological asymptotic analysis. In its original formulation (see, e.g., Sokołowski et al. [19], Garreau et al. [11], Feijóo et al. [10]), this theory investigates the variation of a given objective functional depending on some domain in R^n with respect to the subtraction from it of a small ball. This variation is a scalar function, called the topological gradient or the topological derivative, and its largest negative values indicate positions where a small ball should be removed in order to obtain the steepest descent of the functional. In our considerations, we build on the idea of Amstutz [2], who proposed to modify the definition of the topological gradient and investigate the variation of a given functional with respect to a change of certain material properties, and not of the domain topology. For the edge detection problem, we consider a functional whose definition is inspired by the Mumford-Shah segmentation model [15].
We investigate its variation with respect to perturbations of the edge indicator function, which are obtained by setting values of this function close to zero at positions of small balls covering an optimal edge. Such a formulation is closely related to the original concept of topological optimization, as a solution of some differential equation defined in a perforated domain may be approximated by a solution of this equation in the whole domain but with nearly isolated inclusions. Besides, such an approach allows us to overcome some technical difficulties and provides a link to the theory of identification of small inhomogeneities from boundary measurements (Cedio-Fengya et al. [6], Vogelius et al. [20], Capdeboscq et al. [5]).
Recently, the idea of the topological asymptotic expansion has been applied to the problems of image segmentation and line segment detection in Grasmair et al. [12] and Beretta et al. [3]. The functional considered in these papers is the Γ-convergence approximation of the Mumford-Shah model [15]. In the first paper, the authors investigate its variation with respect to perturbations of the edge indicator function by small balls, whereas in the second one, by thin strips. In both papers, the derived topological asymptotic expansion is the basis for the construction of an algorithm computing an approximate minimizer of the proposed functional. However, the problem of finding an optimal threshold value for the edges set is not considered there. We would like to remark that a preliminary version of the algorithm introduced in the present paper has already been proposed in [16]. That work, however, contains some inaccuracies which we correct here. Moreover, we have added new results and completed missing proofs.
In order to precisely formulate the problem of edge detection which we aim to solve, we first need to define what we mean by a digital image and by the set of edges contained therein. Let Ω be a set of pixels, each of size δ × δ, whose centres form nodes of a regular grid. We define a digital image as a bounded function f : Ω → R and assume that the values of this function are constant within each pixel in Ω. We denote the set of centres of all pixels in Ω by X and assume that the gradient ∇u of a function u : Ω → R is defined by the standard central finite difference scheme on the grid with nodes in X. Throughout this paper, we use the following definition of the set of edges:

Definition 1.1. Let u be a smoothed version of the image f. Then the edges set of f is the set of all pixels in Ω with centres x ∈ X such that |∇u(x)|^2 is greater than some positive constant (a threshold).
The above definition is exactly the one assumed when using the basic technique for edge detection that consists in thresholding the gradient magnitude of a smoothed image. Naturally, such a characterization of an edge is very simplified and limited, among other things, by the fact that it treats an edge as a two-dimensional set, and not as a curve of dimension one. Nevertheless, in this paper we restrict ourselves to this concept. Our goal is to construct an algorithm for edge detection in which an optimal value of the threshold is computed automatically.
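Definition 1.1 reduces, computationally, to thresholding the squared gradient magnitude of a smoothed image. A minimal sketch in Python; the function name and the use of numpy's gradient routine are our own illustration, not part of the paper:

```python
import numpy as np

def edge_set(u, threshold):
    """Edge set in the sense of Definition 1.1: the mask of pixels where
    |grad u|^2 exceeds a positive threshold.

    np.gradient uses central differences in the interior of the grid
    (and one-sided differences at the boundary), which matches the
    discretization assumed in the text.
    """
    gy, gx = np.gradient(u)        # partial derivatives along rows and columns
    grad_sq = gx ** 2 + gy ** 2    # squared gradient magnitude |grad u|^2
    return grad_sq > threshold     # boolean mask of edge pixels
```

For a vertical step edge, the mask marks the two pixel columns adjacent to the jump, which reflects the two-dimensional (thickened) notion of an edge adopted above.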
In Section 2, we reformulate the problem of edge detection in the continuous setting and introduce the results of topological asymptotic analysis of the considered functional. In Section 3, based on these results, we construct an algorithm for edge detection in a digital image and prove the existence of a unique value for the optimal threshold. In Section 4, we examine the proposed algorithm in a simple experiment and draw some analogies between its operation and human detection of visual information. Finally, we compare the results of numerical experiments obtained by application of our algorithm and of the probabilistic one of Desolneux et al. [7, 8]. We indicate some similarities between these two approaches to edge detection and discuss their differences.
2. Topological asymptotic expansion. In this section, we introduce the results of topological asymptotic analysis which will form the basis for the construction of the algorithm. To derive these results, we need to reformulate the considered problem in the continuous setting. Here, we assume that Ω is an open and bounded subset of R^2 with Lipschitz boundary ∂Ω, and that f : Ω → R is a given function in L^∞(Ω). Furthermore, we assume that Y is a finite but not fixed set of points in Ω such that for some δ > 0, dist(y, ∂Ω) ≥ δ/2 for all y ∈ Y, and |y_1 − y_2| ≥ δ for all y_1, y_2 ∈ Y with y_1 ≠ y_2. Now, let ε > 0 be a sufficiently small number, such that ε < δ/4, and define the piecewise constant function α_ε : Ω → R by

α_ε(x) = α_1 for x ∈ ⋃_{y∈Y} B_ε(y),  and  α_ε(x) = α_0 for x ∈ Ω \ ⋃_{y∈Y} B_ε(y),

where B_ε(y) denotes a small ball of radius ε centred at y ∈ Y. Here α_0 and α_1 are positive constants such that α_1 ≪ α_0. We consider the problem of minimization of the functional

(2)  J(u, α_ε) = 1/2 ∫_Ω (u − f)^2 dx + 1/2 ∫_Ω α_ε |∇u|^2 dx + β πε^2 |Y|,

where |Y| denotes the cardinality of the set Y and β ≥ 0 is a parameter playing the role of a threshold, which will be clarified further on. The first term in J ensures that its minimizer with respect to u is close to f, the second term imposes stronger smoothing of f in Ω \ ⋃_{y∈Y} B_ε(y), and the presence of the third term in J enforces a minimal number of points in Y. We remark that α_ε(x) = α_0 for all x ∈ Ω implies Y = ∅ and

(3)  J(u, α_0) = 1/2 ∫_Ω (u − f)^2 dx + α_0/2 ∫_Ω |∇u|^2 dx.

We note that under a slightly different but essential assumption on the set Y, it has been proven in Grasmair et al. [12] that if the last term in J is β 2ε|Y| and α_1 = o(ε^2), then J converges in the sense of the Γ-limit as ε → 0 to the Mumford and Shah functional [15]. This means that the term 2ε|Y| in the model considered in [12] approximates the one-dimensional Hausdorff measure of the one-dimensional edges set. Therefore, one might suspect that it could be shown in a similar way that the last term in (2) converges to the two-dimensional Hausdorff measure of the edges set, understood in the sense of Definition 1.1.
However, we note that due to the conditions imposed on the set Y at the beginning of this section, which are needed for the construction of our algorithm, the functional J defined in (2) converges to (3) as ε → 0, which can be shown in a straightforward way.
Let us first consider the problem of minimization of J(·, α_ε) when one variable, the function α_ε, is fixed. Using standard methods of the calculus of variations, it can be shown that the functional J(·, α_ε) attains a unique minimum at the function u_ε ∈ H^1(Ω), which is the solution of the equation

(4)  ∫_Ω (u_ε − f) ϕ dx + ∫_Ω α_ε ∇u_ε · ∇ϕ dx = 0

for all ϕ ∈ H^1(Ω). The existence and uniqueness of a solution to the above equation can be proven by the Lax-Milgram theorem. We note here that for this proof, the assumption α_1 > 0 is necessary to ensure the coercivity of the bilinear form on the left-hand side of (4).
Similarly, by u_0 ∈ H^1(Ω) we denote the unique minimizer of the functional J(·, α_0), which is the unique solution of the equation

(5)  ∫_Ω (u_0 − f) ϕ dx + α_0 ∫_Ω ∇u_0 · ∇ϕ dx = 0

for all ϕ ∈ H^1(Ω). Moreover, the function u_0 is also a solution, in the sense of distributions in Ω, to the equations

(6)  −α_0 Δu_0 + u_0 = f in Ω,  ∂u_0/∂n = 0 on ∂Ω,

where n is the unit normal vector exterior to the boundary ∂Ω.
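The smoothing step (6) is a linear elliptic problem, so for small images a direct sparse solve suffices. Below is a sketch of a finite difference discretization with reflecting (zero Neumann) boundary conditions; the function names are ours and the grid spacing is taken as δ = 1:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def smooth_image(f, alpha0):
    """Solve -alpha0 * Laplacian(u) + u = f with du/dn = 0 on the boundary,
    i.e. the strong form (6), via the standard 5-point stencil."""
    m, n = f.shape

    def lap1d(k):
        # 1D second-difference matrix; the modified corner entries
        # implement the reflecting (Neumann) boundary condition
        L = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(k, k)).tolil()
        L[0, 0] = -1.0
        L[-1, -1] = -1.0
        return L.tocsr()

    # 2D Laplacian acting on the row-major raveled image (Kronecker sum)
    L2 = sp.kronsum(lap1d(n), lap1d(m))
    A = (sp.eye(m * n) - alpha0 * L2).tocsr()
    return spla.spsolve(A, f.ravel()).reshape(m, n)
```

Since the system matrix is an M-matrix, the discrete solution obeys a maximum principle: u stays within the range of f, and a constant image is reproduced exactly.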
Proof. This proposition can be proven in a similar way as in Grasmair et al. [12].

For fixed β ≥ 0 and ε > 0, the asymptotic expansion in Proposition 2.1 provides a method for finding an approximate minimizer of the functional J with respect to the second variable α_ε. It is not difficult to observe that in order to obtain the steepest descent of J, one should include in the set Y admissible points y ∈ Ω for which |∇u_0(y)|^2 attains the next maximal values and such that

α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2 > β.

We remark that in the continuous setting, this method allows us to find only an approximate minimizer of J, which in general is not unique.
Assume now that the set Y is fixed and observe that the functional J in α_ε can be written in the form (9). We introduce the definition of the derivative of J(u_0, α_0) at Y, given by (10), where h_ε : Ω → R is a sequence of functions with support in ⋃_{y∈Y} B_{ε^2+ε}(y) satisfying certain normalization conditions. Using the properties of the function h_ε, the result in Lemma 2.1 and the fact that |∇u_0| is bounded on any compact subset of Ω, we obtain the limit (11). The definition (10) can be seen as a generalization of the classical Fréchet-Volterra derivative (see Donsker and Lions [9] or Nashed and Hamilton [17]) and provides the total rate of change of the functional J as α_0 varies in Y but remains unchanged in Ω \ Y. Since we wish to keep α_0 and α_1 constant, to ensure the optimality condition we choose β so that the right-hand side of (11) equals zero.
In the next section, we propose an algorithm for edge detection in a digital image that has been developed based on the results presented above.
3. Algorithm for edge detection. Here we return to the discrete formulation of the edge detection problem introduced in the first section of this paper. That is, we assume that the image domain Ω is a set of pixels, each of size δ × δ, whose centres form nodes of a regular grid. The digital image f : Ω → R is a function which assigns to each pixel in Ω some gray level value. We recall that the set of centres of all pixels in Ω is denoted by X and note that for the algorithm we can simply take δ = 1. Further, in order not to complicate the notation, by ∇u_0(x) we will denote the discrete gradient of a numerical solution to the problem (6) at the point x ∈ X, computed by the standard central finite difference scheme on a regular grid with nodes in X.
Let P(X) be the set of all subsets of X and assume that [0, b] is a non-empty interval in R with

(12)  b = α_0 (α_0 − α_1)/(α_0 + α_1) max_{x∈X} |∇u_0(x)|^2.

We define the function G : P(X) × [0, b] → R by

(13)  G(Y, β) = −∑_{y∈Y} ( α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2 − β )

and observe that this function is a discrete version of the dominant term in the asymptotic expansion given in Proposition 1. Moreover, throughout this and the next section we will use the notation Y_β for the set of all points y ∈ X satisfying the inequality

(14)  α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2 > β.

Now we can prove the following result:

Proposition 2. For every fixed β ∈ [0, b], the set Y_β is a minimizer of G(·, β) over P(X).

Proof. To prove this proposition, we will show that G(Y_β, β) ≤ G(Y, β) for all Y ∈ P(X). Let Y be an arbitrary set in P(X). Using the definition (13) and the fact that for every y ∈ Y_β the summand α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2 − β is positive, whereas for every y ∉ Y_β it is non-positive, we conclude that adding to Y any point of Y_β and removing from Y any point outside Y_β does not increase the value of G, which proves the claim.

The above result allows us to find a minimizer of G over all sets Y ∈ P(X) when the value of β is fixed. However, we note that this minimizer is not unique. It is not difficult to observe that G(Y_β, β) = G(Ȳ_β, β) when Ȳ_β ⊃ Y_β is a set in P(X) such that for every y ∈ Ȳ_β \ Y_β, the value α_0 (α_0 − α_1)/(α_0 + α_1)|∇u_0(y)|^2 is equal to β. We also observe that in some cases, for two different values of β we may obtain the same set Y which minimizes G. This fact can be illustrated by considering, for instance, a synthetic image with a black object on a white background, where for all values of β greater than or equal to 0 and less than α_0 (α_0 − α_1)/(α_0 + α_1) min_{x∈X}(|∇u_0(x)|^2 : |∇u_0(x)| > 0), the same set of edges will be detected. Nevertheless, combining the result in Proposition 2 with the fact that the last term in the asymptotic expansion (8) depends linearly on |Y|, we conclude that for a fixed value of β, the functional J attains its discrete minimum if Y ∈ P(X) is a minimizer of G(·, β) with the least cardinality, i.e., when Y = Y_β.
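In code, the edge set Y_β and the discrete objective can be sketched as follows, using our reconstructed reading of (13) and (14), G(Y, β) = −∑_{y∈Y}(c|∇u_0(y)|^2 − β) with c = α_0(α_0 − α_1)/(α_0 + α_1); all names are illustrative:

```python
import numpy as np

def Y_beta(grad_sq, alpha0, alpha1, beta):
    """Mask of the edge points satisfying (14): c * |grad u0|^2 > beta."""
    c = alpha0 * (alpha0 - alpha1) / (alpha0 + alpha1)
    return c * grad_sq > beta

def G(mask, grad_sq, alpha0, alpha1, beta):
    """Discrete objective: G(Y, beta) = -sum over Y of (c*|grad u0|^2 - beta).
    Y_beta collects exactly the points with positive summands, so it
    minimizes G over all subsets of X (cf. Proposition 2)."""
    c = alpha0 * (alpha0 - alpha1) / (alpha0 + alpha1)
    return -np.sum(c * grad_sq[mask] - beta)
```

A quick check of Proposition 2 is to compare G at Y_β with G at arbitrary random subsets of X: the former never exceeds the latter.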
Further, we will show that there exist a unique β ∈ [0, b] and a set Y ∈ P(X) for which the functional J attains its discrete minimum and such that the optimality condition

β = 1/(2|Y_β|) ∑_{y∈Y_β} α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2

is satisfied, i.e., β is a fixed point of the function F̃ defined by the right-hand side above. First, we note that the interval [0, b] with its natural order is a complete lattice, and that F̃(β) ∈ [0, b] for all β ∈ [0, b]. Now let β_1, β_2 ∈ [0, b] be such that β_1 < β_2, and denote by Y_{β_1} and Y_{β_2} the sets of points in P(X) satisfying the inequality (14) with β_1 and β_2, respectively. We have F̃(β_1) ≤ F̃(β_2), and note that F̃(β_1) − F̃(β_2) = 0 if Y_{β_1} = Y_{β_2}. Therefore, we have shown that the function F̃ is non-decreasing. From the Knaster-Tarski fixed point theorem it then follows that the set of fixed points of F̃ is non-empty and, moreover, is a complete lattice. This means that F̃ has a least and a greatest fixed point. Now assume that β_1 and β_2 ∈ [0, b] are two arbitrary fixed points of F̃ such that β_1 < β_2, and let us again denote by Y_{β_1} and Y_{β_2} the sets of points in P(X) satisfying the inequality (14) with β_1 and β_2, respectively. From Proposition 2 it follows that G(Y_{β_1}, β_1) ≤ G(Y, β_1) and G(Y_{β_2}, β_2) ≤ G(Y, β_2) for all Y ∈ P(X). Furthermore, the assumption that β_1 and β_2 are fixed points of F̃ ensures that Y_{β_2} ⊊ Y_{β_1}. It remains to show that the value of J corresponding to the least fixed point does not exceed the value corresponding to any other fixed point. Therefore, we have shown that the thesis of Proposition 3 holds with β being the least fixed point of F̃ and Y_β its corresponding set of points in P(X).
We summarize the above results in the following algorithm:

Algorithm 3.1. For a given image f and parameters α_0 and α_1, perform the following steps:
1. Solve the problem (6) to obtain approximate values of u_0 at all x ∈ X.
2. Find the least fixed point β of the function F̃ : [0, b] → [0, b], where b is given by (12).
3. Find the set of edge points Y_β.

Inverse Problems and Imaging, Volume 10, No. 2 (2016), 499-517
In order to solve the problem (6) numerically, one can derive the standard finite difference discretization on a regular grid with nodes in X. This yields a system of algebraic equations which can be efficiently solved using, e.g., the preconditioned conjugate gradient method. The least fixed point β of the function F̃ can be found by the iteration β_0 = 0, β_n = F̃(β_{n−1}) for n = 1, 2, ..., with the stopping criterion β_n − β_{n−1} = 0. The convergence of the sequence {β_n} to the least fixed point of F̃ is guaranteed by the Kleene fixed point theorem.
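The fixed point iteration can be sketched as follows, taking F̃(β) to be half the mean of c|∇u_0|^2 over the points currently above the threshold (our reading of the optimality condition); the function name is illustrative:

```python
import numpy as np

def optimal_threshold(grad_sq, alpha0, alpha1, max_iter=100):
    """Iterate beta_n = F~(beta_{n-1}) from beta_0 = 0 until the value no
    longer changes; by the Kleene fixed point theorem the sequence increases
    to the least fixed point of the non-decreasing map F~."""
    c = alpha0 * (alpha0 - alpha1) / (alpha0 + alpha1)
    v = c * np.ravel(grad_sq)
    beta = 0.0
    for _ in range(max_iter):
        above = v[v > beta]          # current candidate edge points Y_beta
        if above.size == 0:
            break
        new_beta = 0.5 * above.mean()
        if new_beta == beta:         # stopping criterion beta_n - beta_{n-1} = 0
            break
        beta = new_beta
    return beta
```

For example, four points with values c|∇u_0|^2 equal to 1, 1, 3, 3 give the iterates 0 → 1 → 1.5, and 1.5 is a fixed point, so only the two high-contrast points end up in Y_β.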

4. A simple experiment.
In this section, we examine the proposed method for edge detection by performing a simple experiment on synthetic images. For this experiment, we consider images composed of two circular objects, O_1 and O_2, with constant gray intensity values. Our aim is to derive conditions which need to be satisfied so that the edge of the object with lower contrast is detected by Algorithm 3.1. All considered cases are illustrated in Figure 1. In the image presented in Figure 1(a), the object O_2 seems to be more perceptually meaningful than O_1, thus we are interested in the relation between the contrasts of these two objects that should be satisfied so that the edge of object O_1 is classified by Algorithm 3.1 as meaningful. The case in Figure 1(b) is a bit different. The two objects presented in this image have the same low contrast but different sizes; however, both seem to be meaningful. We would like to verify whether Algorithm 3.1 will detect their edges. Considering the images in Figures 1(c) and 1(d), we ask what size an object with lower contrast should have in order for Algorithm 3.1 to classify its edge as meaningful.
To analyse all the cases mentioned above, we denote by Y_1 and Y_2 the sets of edge points of objects O_1 and O_2, respectively. Assume that the edges of both objects are detected by Algorithm 3.1. Then there exists β such that F(β) = 0 with

F(β) = β − 1/(2|Y_β|) ∑_{y∈Y_β} α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2.

Denoting by µ_1 = min_{y∈Y_1}(|∇u_0(y)|^2) and µ_2 = min_{y∈Y_2}(|∇u_0(y)|^2) the minimal contrasts of objects O_1 and O_2, respectively, we get that

(15)  β ≥ α_0 (α_0 − α_1)/(α_0 + α_1) · (|Y_1| µ_1 + |Y_2| µ_2)/(2(|Y_1| + |Y_2|)).

On the other hand, the optimal parameter β satisfies the inequality

(16)  α_0 (α_0 − α_1)/(α_0 + α_1) |∇u_0(y)|^2 > β

for all y ∈ Y_β. Thus, combining (15) with (16) and denoting µ = min(µ_1, µ_2), we obtain

(17)  µ > (|Y_1| µ_1 + |Y_2| µ_2)/(2(|Y_1| + |Y_2|)).

The above inequality is a necessary condition for the detection of both objects, O_1 and O_2, by Algorithm 3.1. As a corollary, we get that if (17) does not hold, then the object with lower contrast will not be detected. Let us consider all cases mentioned at the beginning of this section. First, assume that µ_1 < µ_2 and that the sizes of the two objects are arbitrary (see Figures 1(c) and 1(d)). Then, according to condition (17), the edge of object O_1 will be detected by our algorithm if

µ_1 > |Y_2| µ_2 / (|Y_1| + 2|Y_2|).

We note that in the particular case when |Y_1| = |Y_2| (see Figure 1(a)), the above relation reduces to µ_1 > µ_2/3, which roughly means that the contrast of object O_1 must be at least one third of the contrast of O_2 in order for its edge to be detected. Next, assume that µ_1 = µ_2 (see Figure 1(b)). Then, from (17) it follows that the edge of object O_1 (and of O_2 as well) will be detected if α_1 < α_0. Therefore, the edges of two objects with µ_1 = µ_2 will always be detected by Algorithm 3.1, independently of their sizes and intensity values.
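The detection condition derived above is easy to evaluate programmatically. A small helper (the function name is hypothetical, and the inequality µ_1 > |Y_2|µ_2/(|Y_1| + 2|Y_2|) is the form we obtained from (17)):

```python
def lower_contrast_edge_detected(mu1, mu2, n1, n2):
    """Check whether the edge of the lower-contrast object O1 (mu1 < mu2)
    is detected, where n1 = |Y1| and n2 = |Y2| count the edge points."""
    return mu1 > n2 * mu2 / (n1 + 2 * n2)
```

For objects of equal size (n1 = n2) this reduces to mu1 > mu2 / 3, the one-third rule noted above; a large low-contrast object (n1 large) tolerates a much smaller contrast ratio.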
In a similar way as above, we can derive a condition that has to be satisfied so that the edge of the p-th object among m present in an image f is detected by Algorithm 3.1. This observation leads to the definition of a meaningful edge in the sense of our approach:

5. Comparison of results. In this section, we compare and discuss the results of numerical experiments obtained by application of Algorithm 3.1 and of the approach proposed by Desolneux et al. [7, 8]. To make this paper self-contained, we briefly introduce the basic notation of the latter. Following Desolneux et al. [8, Sec. 9.3], we define a level line at the level λ > 0 as the boundary of a set of pixels in Ω for which the values of the image f satisfy f(x) ≤ λ (or f(x) ≥ λ). The probability for a point on any level line to have contrast larger than µ > 0 is defined by

H(µ) := (1/M) |{x ∈ Ω : |∇f(x)| ≥ µ}|,

where M denotes the number of pixels x in Ω such that |∇f(x)| > 0. Furthermore, by N_llp we denote the total number of pieces of level lines in the image f, computed at quantized levels λ_1, ..., λ_k. In the case when an image ranges from 0 to 255, the quantization step q is equal to 1. The number N_llp is computed in the following way. For each level line L_i with length l_i, we count its number of pieces, sampled at pixel rate. Then, we have N_llp = ∑_i l_i (l_i − 1)/2.

In Figure 5, we present the results of edge detection obtained by application of the algorithm of Desolneux et al. [7, 8]. In Figure 6, we also present the results of edge detection using Algorithm 3.1 for a fixed value α_0 = 1 and two different values of α_1 (i.e., α_1 = 0.1 and α_1 = 0.3). We observe that the results for α_1 = 0.1 are similar to those for α_1 = 0.001 (see Figure 4). In the second experiment, we tested the performance of both algorithms on a set of synthetic images composed of two circular objects with different intensity values and sizes but the same background intensity equal to 255. The images used for this experiment are presented in Figure 7. The gray intensity of the brighter disks in the upper row of Figure 7 is equal to 200 and of the darker ones to 150, whereas the gray intensities of the corresponding disks in the lower row are equal to 240 and 10, respectively.
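The two quantities quoted from Desolneux et al. translate directly into code; a sketch with our own function names:

```python
import numpy as np

def contrast_probability(grad_mag, mu):
    """H(mu): the fraction of pixels with |grad f| >= mu among the M pixels
    where |grad f| > 0."""
    M = np.count_nonzero(grad_mag > 0)
    return np.count_nonzero(grad_mag >= mu) / M

def number_of_pieces(lengths):
    """N_llp: a level line of length l, sampled at pixel rate, contributes
    l*(l - 1)/2 pieces; sum over all quantized level lines."""
    return sum(l * (l - 1) // 2 for l in lengths)
```

For instance, two level lines of lengths 3 and 4 contribute 3 + 6 = 9 pieces in total.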
In order to solve the problem (5) numerically, we have applied the standard finite difference discretization. The derived system of algebraic equations has been solved using the Matlab function pcg, which is an implementation of the preconditioned conjugate gradient algorithm. To obtain a solution with a tolerance for the relative residual equal to 10^{-6}, the algorithm needed on average 50 iterations for α_0 = 10 and 5 iterations for α_0 = 0.01. For comparison, in Figures 12 and 13 we present the results of edge detection obtained by application of the algorithm of Desolneux et al. [7, 8] to slightly smoothed versions of the images in Figure 7, for parameters ε = 0.1 and ε = 0.01, respectively. In both cases we set q = 1.
To summarize, we first note that both approaches considered in this section theoretically fail when applied to detect edges in clean images composed of objects with constant intensity values. The approach proposed in this paper requires setting some positive value for the regularization parameter α_0 and smoothing the input image f. Otherwise, if α_0 = 0, then a minimizer of G is Y = ∅, and therefore we obtain the trivial solution. Such a problem occurs in the approach proposed by Desolneux et al. [7, 8] as well. When one wants to detect an edge in an image composed of a single object with constant gray intensity, the probability for a point on any level line to have contrast larger than the minimal positive value of |∇f| over all x ∈ Ω is always equal to 1, and therefore no edge is ε-meaningful when ε = 1. Moreover, by the same argument, we can deduce that in the case when an input image contains more than one object, the edge of the one with the lowest contrast will not be detected.
Although there are still a few parameters involved in both algorithms, numerical experiments confirm that small changes of their default values do not have much influence on the obtained results, and in fact, the edge detection procedure can be carried out automatically for a large class of images. In order to apply Algorithm 3.1, the only parameter whose value indeed has to be adjusted is α_0. We note that our approach does not involve a parameter corresponding to the quantization step q in the algorithm of Desolneux et al. [7, 8].
Finally, we note that one of the most important features common to both approaches is their global character. That is, the edge detection result depends on the choice of the window within which a given algorithm is applied. In the case of our approach, this fact is confirmed by numerical experiments (see, e.g., the results in Figure 3) and can also be verified analytically by considering simple synthetic images as in Section 4.

Figure 13. Results of edge detection obtained by application of the algorithm of Desolneux et al. [7, 8] to the corresponding images in Figure 7, with ε = 0.01 and q = 1.