A MINIMAL SURFACE CRITERION FOR GRAPH PARTITIONING



1. Introduction. Given a graph $G = (V, E)$ with non-negative edge weights $\{w_{ij}\}_{(i,j) \in E}$, we consider the problem of "optimally" partitioning the vertex set, $V$, into $k$ subsets. The challenges to partitioning are that datasets are typically large-scale and high-dimensional, common mathematical formulations of the problem lead to NP-hard problems, and the criterion of optimality is application dependent. Here we propose a model for graph partitioning based on an extension of the Beltrami energy to graphs. To motivate this model, we consider the following isoperimetric problems.

1.1. Motivating minimal surface problems. Plateau's problem is to find a minimal surface with a prescribed boundary condition [48]. That is, if $\Omega \subset \mathbb{R}^2$ is a bounded open subset of the plane and $f : \partial\Omega \to \mathbb{R}$, Plateau's problem is to find the parametric surface, $u(x)$, with minimal surface area that satisfies the prescribed boundary conditions $u(x) = f(x)$ for $x \in \partial\Omega$. Physically, this solution can be interpreted as finding the soap film that covers a closed wire, described by the set $\{(x, f(x)) : x \in \partial\Omega\} \subset \mathbb{R}^3$. One approach to solving this problem is to introduce the functional
$$ J[u] := \int_\Omega \sqrt{1 + |\nabla u|^2}\, dx + \alpha \int_{\partial\Omega} |u - f|\, ds, $$
where $\alpha > 0$, and minimize over functions of bounded variation, $\min_{u \in BV(\Omega)} J[u]$. The first term of $J$ is the area of the surface $(x, u(x))$, while the second term penalizes where $u$ violates the boundary condition but allows for a discontinuity at the boundary. It can be shown that the infimum of this problem is attained and, for convex domains, some regularity results for the minimizer can be shown [30]. A related problem is to find the surface with minimal area that encloses a volume $V$ above the plane and is zero on the boundary, $u(x) = 0$ for $x \in \partial\Omega$. This problem can be similarly expressed as the optimization problem
$$ \min \left\{ \int_\Omega \sqrt{1 + |\nabla u|^2}\, dx + \alpha \int_{\partial\Omega} |u|\, ds \;:\; u \in BV(\Omega),\ u \ge 0,\ \int_\Omega u\, dx = V \right\}. \tag{1} $$
Again we can interpret this problem in terms of a soap film. We imagine a closed wire in the shape of $\partial\Omega$ that lies flat on the Euclidean plane. The wire is covered in a film of soap and a person slowly inflates ("blows on") the soap film to form a bubble enclosing volume $V$. The shape of the bubble is then described by the parametric surface $(x, u(x)) \in \mathbb{R}^{2+1}$. Of course, if the person blows too much air at the soap film, then the soap film will "bubble out" and no longer be a parametric surface of this form. At this value of $V$, the regularity of the solution to (1) breaks down and there is a discontinuity at the boundary. We don't dwell on these issues here, as we will consider a discrete version of (1), where the questions of well-posedness and regularity are much simpler.
As an illustration, for V = 1 and a domain Ω in the shape of a "pinched oval" (see Figure 1(left)), the optimal surface is plotted in Figure 1(center). Details of this computation are discussed in Section 5.4.
We now informally consider a "foam" consisting of $k$ interacting bubbles of equal volume that are confined to a domain $\Omega$. If bubble $i$ is contained in $\Omega_i$ with $\Omega = \cup_{i=1}^k \Omega_i$, satisfying $\Omega_i \cap \Omega_j = \emptyset$ for $i \neq j$, then we could solve (1) on each subdomain, $\Omega_i$, to obtain the shape of bubble $i$, say $u_i(x)$ for $x \in \Omega_i$. The total surface area of the foam is given by
$$ \sum_{i=1}^k \int_{\Omega_i} \sqrt{1 + |\nabla u_i|^2}\, dx. \tag{2} $$
Of course, in this scenario the bubbles are also going to exert a force on one another to try to reduce the overall surface tension. For example, if one of the $\Omega_i$ is very small, then bubble $i$ will exert a force on its neighbors and the domain $\Omega_i$ will enlarge so as to decrease the overall surface tension. To model this behavior, we consider minimizing the total surface area (2) over all $k$-partitions of the domain $\Omega = \cup_{i=1}^k \Omega_i$ such that $u_i$ is the solution to (1) on each subdomain, $\Omega_i$. This motivates the following isoperimetric question: what $k$-partition of the domain minimizes the sum of the minimal surface areas, (2)? That is, we seek to solve the optimization problem
$$ \min_{\Omega = \cup_{i=1}^k \Omega_i} \ \sum_{i=1}^k\ \min_{u_i} \left\{ \int_{\Omega_i} \sqrt{1 + |\nabla u_i|^2}\, dx + \alpha \int_{\partial\Omega_i} |u_i|\, ds \;:\; u_i \ge 0,\ \int_{\Omega_i} u_i\, dx = V \right\}. \tag{3} $$
Here, the minimum is taken over all disjoint $k$-partitions of $\Omega$. For the pinched oval domain, we plot the optimal partition for $k = 2$ in Figure 1(right). On each subdomain, $\Omega_i$, we also plot the surface $u_i^\star(x)$ which attains the minimum in the isoperimetric problem (1).
There are two interesting limits for (3).
1. In the limit that $V \to 0$, the first term in the objective function in (1) is equivalent to the Dirichlet energy. Optimal Dirichlet partitions have been considered in a variety of papers [5,6,7,8,9,10,12,14,15,32,33,34,45,46]. One difference is the constraint; we consider the $L^1$ constraint and not the $L^2$ constraint, since it is convex and has a physical interpretation in terms of the minimal surface problem. If, in addition, $\alpha \to \infty$, (1) is equivalent to the torsion problem and can also be interpreted via the expected exit time of Brownian motion from the domain [13,61].
2. For large $V$, the optimal solution is almost flat on each partition component and there is a large discontinuity at the interfaces of the partition components. The problem limits to the classical problem of finding a partition into $k$ equal-volume components for which the surface area of the interface between components is minimal [28,31,47,56].
Finally, we remark that Pólya and Szegő also considered generalized energies of the form in (1) in their study of symmetrization techniques [49].
In this paper, we formulate graph analogues of (1) and (3) and show that these problems are well-posed. To solve the bi-level optimization problem efficiently, we propose a primal-dual hybrid gradients method for the computation of the ground state of each partition (inner problem) and a rearrangement algorithm to find the optimal partitions (outer problem). We prove that the primal-dual hybrid gradients method is convergent and that the iterations of the rearrangement algorithm do not increase the objective value. We apply the proposed graph-partitioning method to a variety of example problems in graph partitioning coming from manifold discretizations, clustering and transductive learning, and image segmentation.

1.2. Outline. In Section 2, we give a brief background on the use of the Beltrami energy in image analysis and introduce terminology in order to formulate the graph analogues of (1) and (3). In Section 3, we introduce the graph analogue of the Beltrami energy, formulate a graph partitioning problem based on this energy, and give a rearrangement algorithm for solving the outer problem. In Section 4, we show how the inner problem can be solved using primal-dual optimization methods. In Sections 5 and 6, we apply these methods to a variety of graph partitioning examples. Finally, we conclude in Section 7 with a discussion.

2. Background. In this section, we briefly review how the Beltrami framework has been used previously for image analysis and introduce some graph terminology in preparation for the introduction of the Beltrami graph functional.
2.1. Beltrami framework and minimal surfaces. The Beltrami framework was introduced in the late 1990s for geometric image regularization. Adopting this geometric viewpoint amounts to interpreting objects such as images, shapes, or vector fields as surfaces embedded in some higher dimensional space [58]. The Beltrami framework associates the spatial coordinates along with the features by defining a Monge surface, say X : (x, y) → (x, y, I(x, y)) for a 2D scalar image. The Beltrami energy then measures the area of this surface and its minimization amounts to regularizing the image without losing important edge details.
More formally, the Beltrami embedding associates the spatial coordinates with the features. For a scalar function $I : \Omega \to \mathbb{R}$, we define the diffeomorphism $X : \Omega \subset \mathbb{R}^n \to \Omega \times \mathbb{R}$ by $x \mapsto (x, I(x))$. The embedding space of the manifold $\mathcal{M} \subset \Omega \times \mathbb{R}$ has coordinates with dimensions of a different nature, namely spatial components versus feature components. The relative scaling between these components is defined by a metric tensor $h_{ij}$ that incorporates tuning of the aspect ratio by a factor $\beta$: $h_{ij} := \operatorname{diag}(1, \ldots, 1, \beta^2)$. A metric tensor $g_{\mu\nu}$ associated with the original domain $\Omega$ is obtained through a pullback, $g_{\mu\nu} := \beta^2 (\nabla I \otimes \nabla I) + \mathrm{Id}_n$, where $\otimes$ denotes the outer (tensor) product and $\mathrm{Id}_n$ is the $n \times n$ identity matrix. The determinant of the metric tensor, $g(I) := \det g_{\mu\nu} = 1 + \beta^2 |\nabla I|^2$, can be seen as a generalized edge detector. Finally, the Beltrami energy integrates this edge detector to compute the volume of the manifold $\mathcal{M}$:
$$ J_{\beta,\Omega}[I] := \int_\Omega \sqrt{1 + \beta^2 |\nabla I|^2}\, dx. $$
The Beltrami energy is a measure of image regularity and generalizes to more complicated (co)domains [37]. The Beltrami functional is superficially similar to the regularized TV functional [52]. However, while regularized TV can be viewed as the mere smoothing of the TV functional for numerical purposes, e.g., $\int_\Omega |\nabla u| \approx \int_\Omega \epsilon \sqrt{1 + \epsilon^{-2} |\nabla u|^2}$ with $\epsilon \ll 1$, the ability of the Beltrami functional to interpolate between the TV and Dirichlet functionals has proven beneficial in many instances [4,21,27,36,39,41,50,51,53,57,58,60,63].
For $\beta = 1$, $J_{\beta,\Omega}[u]$ is the area of the Monge surface given by the graph $(x, u(x)) \in \mathbb{R}^{d+1}$, where $x \in \Omega \subset \mathbb{R}^d$. Consequently, properties of $J_{1,\Omega}$ are central to the study of minimal surfaces [24,30].

2.2. A discrete calculus for graphs. In this section we recall several definitions of graph-based differential operators, which we need in order to generalize the Beltrami energy to graphs. Let $G = (V, E, w)$ be an undirected weighted graph with vertex set $V = \{i\}_{i=1}^N$, edge set $E \subset V \times V$, and non-negative edge weights $w : E \to \mathbb{R}_{\ge 0}$. For notational convenience, we extend $w$ to $V \times V$ by setting $w(i,j) = 0$ if $(i,j) \notin E$.

2.2.1. Inner products and norms on graphs.

Definition 2.1. For functions $f, g : V \to \mathbb{R}$, we define the $L^2$ inner product
$$ \langle f, g \rangle_V := \sum_{i \in V} f(i)\, g(i). $$

Definition 2.2. For skew-symmetric edge functions $u, v : E \to \mathbb{R}$, we define the $L^2$ inner product
$$ \langle u, v \rangle_E := \frac{1}{2} \sum_{(i,j) \in E} u(i,j)\, v(i,j). $$

Definition 2.3 (Partial norms). For $f : V \to \mathbb{R}$, $S \subset V$, and $p \ge 1$, we define the partial $L^p$ norm
$$ \|f\|_{S,p} := \Big( \sum_{i \in S} |f(i)|^p \Big)^{1/p}. $$
We recover the $L^2(V)$ norm for $S = V$ and $p = 2$.

Definition 2.4 (Magnitude). We define the magnitude $|v| \in L^2(V)$ of $v \in L^2(E)$ by
$$ |v|(i) := \Big( \sum_{j \in V} v(i,j)^2 \Big)^{1/2}. $$

2.2.2. Differential operators on graphs. We define the following operators on graphs, which can be viewed as discrete differential operators.
Definition 2.5 (Graph gradient). For $f : V \to \mathbb{R}$, define the graph gradient to be the edge function $\nabla_w f : E \to \mathbb{R}$ with entries consisting of weighted vertex differences,
$$ (\nabla_w f)(i,j) := \sqrt{w(i,j)}\, \big( f(j) - f(i) \big). $$

Definition 2.6 (Graph divergence). For skew-symmetric $v : E \to \mathbb{R}$, define the graph divergence $\operatorname{div}_w v : V \to \mathbb{R}$ by
$$ (\operatorname{div}_w v)(i) := \sum_{j \in V} \sqrt{w(i,j)}\, v(j,i). $$
It is easy to verify that the graph gradient and divergence are adjoint with respect to the vertex and edge $L^2$ inner products,
$$ \langle \nabla_w f, v \rangle_E = \langle f, \operatorname{div}_w v \rangle_V. \tag{5} $$

Definition 2.7 (Graph Laplacian). The graph Laplacian, $\Delta_w : L^2(V) \to L^2(V)$, is defined as the composition of graph divergence and graph gradient,
$$ (\Delta_w f)(i) := (\operatorname{div}_w \nabla_w f)(i) = \sum_{j \in V} w(i,j) \big( f(i) - f(j) \big). $$

Definition 2.8 (Dirichlet energy). The Dirichlet energy of $f : V \to \mathbb{R}$ is
$$ \frac{1}{2} \langle \nabla_w f, \nabla_w f \rangle_E = \frac{1}{4} \sum_{(i,j) \in E} w(i,j) \big( f(j) - f(i) \big)^2. $$

Definition 2.9 (Total variation). The isotropic total variation of $f : V \to \mathbb{R}$ is
$$ \mathrm{TV}_w(f) := \big\| \, |\nabla_w f| \, \big\|_{V,1} = \sum_{i \in V} \Big( \sum_{j \in V} w(i,j) \big( f(j) - f(i) \big)^2 \Big)^{1/2}. $$
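These graph-calculus definitions translate directly into sparse linear algebra. The sketch below is our own minimal rendering (a weighted incidence-style gradient matrix on a small path graph); the sign and normalization conventions are one consistent choice, not necessarily the paper's, and the function name `graph_gradient` is our own.

```python
# A minimal sketch of the discrete graph calculus of Section 2.2,
# using SciPy sparse matrices (our conventions, for illustration only).
import numpy as np
import scipy.sparse as sp

def graph_gradient(n, edges, w):
    """Sparse operator G with (G f)[e] = sqrt(w_e) * (f[j] - f[i]) for edge e = (i, j)."""
    m = len(edges)
    rows = np.repeat(np.arange(m), 2)
    cols = np.array(edges).ravel()
    vals = np.column_stack([-np.sqrt(w), np.sqrt(w)]).ravel()
    return sp.csr_matrix((vals, (rows, cols)), shape=(m, n))

# Path graph on 4 vertices with unit weights
edges = [(0, 1), (1, 2), (2, 3)]
w = np.ones(3)
G = graph_gradient(4, edges, w)
f = np.array([0.0, 1.0, 1.0, 0.0])

grad = G @ f                    # weighted vertex differences along edges
dirichlet = 0.5 * grad @ grad   # Dirichlet energy (edge-list convention)
L = (G.T @ G).toarray()         # positive semi-definite graph Laplacian
```

With this convention `G.T @ G` recovers the familiar (positive semi-definite) combinatorial Laplacian of the path, which is the form used for the step-size condition in Section 4.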

3. Proposed geometric graph partitioning model. With the definitions of inner products and differential operators on graphs from Section 2 in hand, we next define the Beltrami energy on graphs.
In this section, we assume that G = (V, E, w) is a given weighted graph, where the weights are determined by the application. These weights then propagate to the graph objects defined in Section 2.
3.1. Definition and properties of the graph Beltrami energy.

Definition 3.1. We define the graph Beltrami energy of $u : V \to \mathbb{R}$ as
$$ J_\beta[u] := \sum_{i \in V} \sqrt{1 + \beta^2 |\nabla_w u|(i)^2}. \tag{6} $$

Lemma 3.2. The graph Beltrami energy, $J_\beta$, defined in (6), is a proper, closed, convex function on $L^2(V)$.
Proof. $J_\beta$ is finite and continuous on $L^2(V)$, hence proper and closed; it remains to show convexity. For each $i \in V$, define $g_i(u) := \sqrt{1 + \beta^2 |\nabla_w u|(i)^2}$. If we can show that the $g_i$, for $i = 1, \ldots, N$, are convex functions, then $J_\beta$ is convex, since $J_\beta = \sum_{i=1}^N g_i$. To see that $g_i : \mathbb{R}^N \to \mathbb{R}$ is convex, we define $h_i(y) := \sqrt{1 + \beta^2 \sum_j y_j^2}$. This $h_i$ is convex and non-decreasing on $\mathbb{R}^N_+$. We then define $\gamma_i : \mathbb{R}^N \to \mathbb{R}^N_+$ with components $(\gamma_i)_j(u) := \sqrt{w(i,j)}\, |u(j) - u(i)|$. Since the $(\gamma_i)_j$, for $j = 1, \ldots, N$, are convex and non-negative and $g_i = h_i \circ \gamma_i$, $g_i$ is convex.
Remark 1. For a graph which corresponds to a square lattice, the graph Beltrami energy approximates the Beltrami energy as considered in Section 1. Namely, let $\Omega \subset \mathbb{R}^2$, let $V$ correspond to the set of points of a square lattice, $\{(x_i, y_i)\}_{i \in \mathbb{Z}^2}$, which lie in $\Omega$, and let the edge set, $E$, connect vertices to their four nearest neighbors with weight $w(i,j) = \frac{1}{2} h^{-1}$, where $h$ is the lattice mesh size. Then, for $\beta = 1$, the graph Beltrami energy approximates the surface area of the discretized parameterized surface $(x_i, y_i, u_i)$, where $u_i = u(x_i, y_i)$ for some $u \in H^1(\Omega)$.
Remark 2. We note that varying $\beta$ allows interpolating between the graph Dirichlet energy ($\beta \to 0$) and the graph total variation ($\beta \to \infty$):
$$ \lim_{\beta \to 0} \frac{J_\beta[u] - N}{\beta^2} = \frac{1}{2} \sum_{i \in V} |\nabla_w u|(i)^2, \qquad \lim_{\beta \to \infty} \frac{J_\beta[u]}{\beta} = \sum_{i \in V} |\nabla_w u|(i) = \mathrm{TV}_w(u). $$
This graph-based definition makes the benefits of Beltrami regularization available for data defined on graphs and applies, e.g., to clustering or segmentation, with numerous applications in machine learning.
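The two limits are easy to verify numerically. The snippet below evaluates a dense-matrix version of $J_\beta$ on a toy graph; the per-vertex aggregation of the squared gradient is our own assumed normalization, chosen to be internally consistent.

```python
import numpy as np

def beltrami_energy(u, W, beta):
    """J_beta[u] = sum_i sqrt(1 + beta^2 * |grad u|(i)^2), where
    |grad u|(i)^2 = sum_j W[i, j] * (u[j] - u[i])^2 (our normalization)."""
    diffs = u[None, :] - u[:, None]         # diffs[i, j] = u[j] - u[i]
    grad_sq = (W * diffs**2).sum(axis=1)    # per-vertex squared gradient magnitude
    return np.sqrt(1.0 + beta**2 * grad_sq).sum()

# Toy graph: path on 3 vertices with unit weights
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
u = np.array([0.0, 1.0, 3.0])
N = len(u)
grad_sq = (W * (u[None, :] - u[:, None])**2).sum(axis=1)
dirichlet = 0.5 * grad_sq.sum()             # beta -> 0 limit
tv = np.sqrt(grad_sq).sum()                 # beta -> infinity limit

small = (beltrami_energy(u, W, 1e-4) - N) / 1e-8   # approx. Dirichlet energy
large = beltrami_energy(u, W, 1e6) / 1e6           # approx. total variation
```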
3.2. Minimal Beltrami energy. Motivated by the minimal surface problems discussed in the introduction (Section 1), we consider the following isoperimetric graph problem. Let $S \subset V$ be a vertex subset. On $S$, we consider the vertex function $u^\star$ which has unit $L^1(V)$ norm, vanishes on the complement $S^c := V \setminus S$, and has minimal Beltrami energy, i.e., attains
$$ J_\beta[S] := \min \left\{ J_\beta[u] \;:\; u \ge 0,\ \|u\|_{V,1} = 1,\ u|_{S^c} = 0 \right\}. \tag{8} $$
Here $J_\beta$ is defined in (6).
Lemma 3.3. The sets $U := \{ u \in L^2(V) : u \ge 0,\ \|u\|_{V,1} = 1 \}$ and $U_0 := \{ u \in U : u|_{S^c} = 0 \}$ are both closed and convex.
Proof. The set $U$ is simply the standard probability simplex and is thus closed and convex. $U_0$ is closed and convex since it is the intersection of closed convex sets.
It now follows from Lemmata 3.2 and 3.3 that (8) is a convex optimization problem, so there exists at least one solution. The respective minimizers of these two problems are related by z = βu , while the optimal values are not equal.
In terms of the graph $(V, E, w)$ described in Remark 1, we interpret $J_\beta[S]$ for $\beta = 1$ as the minimal surface area of a unit mass distributed over the set $S \subset V$.
3.3. Minimal Beltrami energy graph partitioning. We now consider a vertex partition of the graph, $V = \cup_{\ell=1}^k V_\ell$. To this vertex partition, we associate an energy which is the sum of the minimal Beltrami energies, $\sum_{\ell=1}^k J_\beta[V_\ell]$. We then consider the Beltrami graph partitioning problem
$$ \min_{V = \cup_{\ell=1}^k V_\ell} \ \sum_{\ell=1}^k J_\beta[V_\ell]. \tag{9} $$

Remark 4. Since there are only a finite number of partitions, the minimum in (9) is attained by some partition.
3.4. Boundary conditions relaxation. An interesting route to rendering (9) accessible to efficient computational optimization has been proposed for the closely related Dirichlet energy graph partitioning problem in [10,46]. The approach consists of relaxing the Dirichlet boundary conditions of the inner minimization problem by replacing them with a penalty. This method also appears in the enforcement of non-natural boundary conditions in the finite element method [2]. Indeed, the boundary conditions are equivalently obtained by imposing that the partial $L^q$ norm of $u$ vanish on $S^c$:
$$ u|_{S^c} = 0 \iff \|u\|_{S^c,q}^q = 0. \tag{10} $$
Instead of imposing this constraint strictly, one can penalize violations by including the partial $L^q$ norm in the optimization problem. Natural choices include $q = 2$ (quadratic penalty) and $q = 1$ (absolute error).
The relaxed partitioning model then becomes
$$ \min_{V = \cup_{\ell=1}^k V_\ell} \ \sum_{\ell=1}^k J_\beta^\alpha[V_\ell], \tag{11} $$
where the inner model (8) is relaxed to
$$ J_\beta^\alpha[S] := \min_{u \in U} \ J_\beta[u] + \frac{\alpha}{q} \|u\|_{S^c,q}^q. \tag{12} $$

Proposition 1. Fix $\beta > 0$ and $S \subset V$. For any $\alpha > 0$, the relaxed problem (12) is a convex optimization problem and thus has a minimizer. As $\alpha \to \infty$, the minimum value of $J_\beta^\alpha$ converges to the minimum value of $J_\beta$,
$$ \lim_{\alpha \to \infty} J_\beta^\alpha[S] = J_\beta[S]. $$
Also, there exists a sequence of values $\alpha_k$ with corresponding minimizers $u_k$ of (12) such that the $u_k$ converge to a minimizer of (8).
Proof. The relaxed boundary-condition term, $\frac{\alpha}{q} \|u\|_{S^c,q}^q$, is closed and convex, $J_\beta$ is closed and convex (Lemma 3.2), and the constraint set $U$ is closed and convex (Lemma 3.3), so (12) is a convex optimization problem.
We compute $\frac{d}{d\alpha} J_\beta^\alpha[S] = \frac{1}{q} \|u_\alpha\|_{S^c,q}^q \ge 0$, where $u_\alpha$ is a minimizer for $J_\beta^\alpha$, so $J_\beta^\alpha[S]$ is non-decreasing in $\alpha$. For any finite $\alpha$, every $u \in U_0$ is admissible for (12) with vanishing penalty, so $J_\beta^\alpha[S] \le J_\beta[S]$. For the reverse inequality, we consider an increasing sequence $\alpha_k \to \infty$ with corresponding minimizers $u_k \in U$ of (12). Since $U$ is compact, there exists a $\tilde{u}$ and a subsequence of $\{u_k\}_k$ (labeled using the same index) such that $u_k \to \tilde{u}$. Since $J_\beta^{\alpha_k}[S] \le J_\beta[S]$, the penalty terms $\frac{\alpha_k}{q} \|u_k\|_{S^c,q}^q$ remain bounded while $\alpha_k \to \infty$; passing to the limit we see that $\|\tilde{u}\|_{S^c,q}^q = 0$, so $\tilde{u} \in U_0$ is admissible for (8). Thus we have that $\lim_k J_\beta^{\alpha_k}[S] \ge J_\beta[\tilde{u}] \ge J_\beta[S]$. This holds for any limit point of a sequence of minimizers.
3.5. Rearrangement algorithm for the bi-level optimization problem (11). With the relaxation of the boundary condition in place, a local solution of the bi-level optimization problem can be found using the following rearrangement algorithm, extended from [46].
Repeat the following two steps until stationarity:
1. Given a partition $V = \cup_{\ell=1}^k V_\ell$, compute a minimizer $u_\ell$ of (12) with $S = V_\ell$ for each $\ell = 1, \ldots, k$. (This is the inner problem for (11).)
2. Given the minimizers $u_\ell$, reassign partitions by the "winner takes all" rule,
$$ V_\ell^+ := \Big\{ i \in V \;:\; \ell = \arg\max_{1 \le m \le k} u_m(i) \Big\}, \qquad \ell = 1, \ldots, k, $$
with ties broken arbitrarily.
The relaxation of the boundary conditions in (12) is crucial for this algorithm, as it causes the ground states to have overlapping support, so that the arg max in the reassignment step is non-trivial. This two-step iterative algorithm is also reminiscent of the structure of popular clustering algorithms like k-means [40] and expectation-maximization (EM) for Gaussian mixture models [22]. This iterative algorithm also shares many attributes with diffusion generated motion by mean curvature [25,29,42,43,44,62].

Proposition 2. The iterations of the rearrangement algorithm do not increase the objective value of (11).

Proof. Let $V_\ell$ and $V_\ell^+$ denote the old and updated vertex partitions, respectively, and $u_\ell$ and $u_\ell^+$ the corresponding minimizers of the inner problem (12). The old objective satisfies
$$ \sum_{\ell} \Big( J_\beta[u_\ell] + \frac{\alpha}{q} \|u_\ell\|_{V_\ell^c,q}^q \Big) \ \ge\ \sum_{\ell} \Big( J_\beta[u_\ell] + \frac{\alpha}{q} \|u_\ell\|_{(V_\ell^+)^c,q}^q \Big) \ \ge\ \sum_{\ell} \Big( J_\beta[u_\ell^+] + \frac{\alpha}{q} \|u_\ell^+\|_{(V_\ell^+)^c,q}^q \Big), $$
which is the value of the updated objective. The first inequality holds by construction of the reassignment step, and the second inequality holds by the optimality of $u_\ell^+$ in (12).
In practice, we observe that the rearrangement algorithm converges in a small number of iterations.
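The outer loop of Section 3.5 can be sketched as follows. The `inner_solver` below is a hypothetical stand-in (a smoothed, mass-normalized indicator of the component), not the primal-dual solver of Section 4; it only serves to exercise the two-step structure on a tiny path graph.

```python
import numpy as np

def rearrangement(inner_solver, labels, k, max_iter=50):
    """Outer rearrangement loop: solve the (relaxed) inner problem on each
    component, then reassign every vertex by the winner-takes-all rule."""
    for _ in range(max_iter):
        U = np.stack([inner_solver(labels == l) for l in range(k)])  # k x N
        new_labels = U.argmax(axis=0)            # winner takes all
        if np.array_equal(new_labels, labels):   # stationary partition
            break
        labels = new_labels
    return labels

# Hypothetical inner solver on a path graph: smoothed, normalized indicator
# (a crude proxy for the ground state with relaxed boundary conditions).
def inner_solver(mask):
    u = np.convolve(mask.astype(float), [0.25, 0.5, 0.25], mode="same")
    return u / u.sum()

labels0 = np.array([0, 1, 1, 0, 0, 0])
labels = rearrangement(inner_solver, labels0, k=2)
```

Note how the relaxation matters here too: the smoothing gives each proxy ground state overlapping support, so the arg max is non-trivial at every vertex.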
4. Primal-dual methods for solution of the inner problem. The computation of the minimizers $u_\ell$ satisfying (12) with $S = V_\ell$ for each $\ell = 1, \ldots, k$ is the computational bottleneck of the proposed optimization scheme in Section 3.5. Thus, it is important to make this step as efficient as possible. By Proposition 1, these inner problems are convex, and a huge corpus of algorithms applies to such convex optimization problems. Recently, primal-dual splitting algorithms have gained particular traction [24], most notably in the context of TV and $L^1$-type problems in imaging, e.g., [16,26,65,66]. Zosso and Bustin have applied this theory to the particular case of the Beltrami functional [58], used as an interesting regularizer in imaging problems that interpolates between the classical $H^1$ and TV regularizers [68].
In Subsection 4.1 we show how the inner problem (12) can be formulated as a saddle point problem. In Subsection 4.2 we summarize a general family of algorithms which we adapt to our problem in Subsections 4.3-4.5. In Subsection 4.6 we present a semi-supervised extension to our algorithm.

4.1. Saddle point formulation. We recall the definition of the convex conjugate (a.k.a. Legendre–Fenchel transform) of a function:
$$ f^*(\varphi) := \sup_{x} \ \langle x, \varphi \rangle - f(x). $$
The convex conjugate of a closed convex function is again a closed convex function. Also, the biconjugate, $f^{**} := (f^*)^*$, is the largest closed convex function with $f^{**} \le f$. As the following theorem shows, convex problems of a certain form can be rewritten as a primal-dual problem using the Legendre–Fenchel transform. This saddle point formulation of the problem will be exploited by the numerical methods employed here.

Theorem 4.2. Let $F : W \to \mathbb{R}$ be a closed and convex functional on the set $W$, $G : X \to \mathbb{R}$ a closed and convex functional, and $K : X \to W$ a continuous linear operator. Then we have the following equivalence:
$$ \min_{x \in X} \ F(Kx) + G(x) \ = \ \min_{x \in X} \ \sup_{\varphi \in W^*} \ \langle Kx, \varphi \rangle - F^*(\varphi) + G(x), \tag{16} $$
where $x$ and $\varphi$ are the primal and dual variables, respectively, $F^*$ is the convex conjugate of $F$, and $W^* = \operatorname{dom} F^*$ is the effective domain.
Comparing (12) with the primal-dual model in Theorem 4.2, we make the following identifications: the linear operator is $K = \nabla_w$, and
$$ F(v) := \sum_{i \in V} \sqrt{1 + \beta^2 |v|(i)^2}, \qquad G(u) := \frac{\alpha}{q} \|u\|_{S^c,q}^q. $$
In particular, we thus define the reflexive spaces $X = U$ and $W = L^2(E)$. The Legendre–Fenchel transform $F^*$ of the function $F$ is found as
$$ F^*(\varphi) = \begin{cases} -\sum_{i \in V} \sqrt{1 - \beta^{-2} |\varphi|(i)^2}, & |\varphi|(i) \le \beta \ \text{for all } i \in V, \\ +\infty, & \text{otherwise}, \end{cases} $$
which is closed convex; its effective domain is $W^* = \{ \varphi \in L^2(E) : |\varphi|(i) \le \beta \ \forall i \in V \}$. In Lemma 3.2 it was shown that $F$ is convex. Thus, using Theorem 4.2, we now get the equivalent primal-dual problem
$$ \min_{u \in U} \ \sup_{\varphi \in W^*} \ \langle \nabla_w u, \varphi \rangle_E + \sum_{i \in V} \sqrt{1 - \beta^{-2} |\varphi|(i)^2} + \frac{\alpha}{q} \|u\|_{S^c,q}^q. \tag{18} $$
Further, using the adjoint relation (5) of the graph differential operators, (18) can be rewritten as
$$ \min_{u \in U} \ \sup_{\varphi \in W^*} \ \langle u, \operatorname{div}_w \varphi \rangle_V + \sum_{i \in V} \sqrt{1 - \beta^{-2} |\varphi|(i)^2} + \frac{\alpha}{q} \|u\|_{S^c,q}^q. \tag{19} $$
Thus, by Theorem 4.2, (19) and (12) are equivalent.

4.2. Alternating direction minimization. Having established the general primal-dual saddle point problem (16), the saddle point can now be found iteratively, for example by the backward-backward splitting algorithm of Chambolle and Pock [16]. This algorithm has also been classified as a modified primal-dual hybrid gradient method (PDHGM) in [26]. The iterative steps are
$$ \varphi^{n+1} = \arg\min_{\varphi \in W^*} \ F^*(\varphi) - \langle K \bar{x}^n, \varphi \rangle + \frac{1}{2 r_1} \|\varphi - \varphi^n\|^2, $$
$$ x^{n+1} = \arg\min_{x \in X} \ G(x) + \langle K x, \varphi^{n+1} \rangle + \frac{1}{2 r_2} \|x - x^n\|^2, $$
$$ \bar{x}^{n+1} = x^{n+1} + \theta \big( x^{n+1} - x^n \big). \tag{20} $$
The first two steps are the proximal updates of the dual and primal variable, respectively. The third, extra-gradient step is an over-relaxation of the primal update in order to overcome the step-size shortening typical of first-order methods. The relaxation parameter $\theta = 0$ corresponds to the Arrow–Hurwicz algorithm, whereas faster convergence can be achieved for $\theta = 1$ [16]. In general, $O(1/N)$ convergence, where $N$ is the number of iterations, has been shown for fixed $r_1, r_2$ such that
$$ r_1 r_2 \|K\|^2 < 1, \tag{21} $$
where $\|K\|$ is the operator (induced) norm of the linear operator $K$; equivalently, $\|K\|^2 = \|K^* K\|$ is the induced norm of the composition $K^* K$. It should be mentioned that the above PDHG algorithm was preceded by the TV minimization of Zhu and Chan [65,66], which solves the minimization steps only approximately, through an explicit step of gradient ascent/descent followed by projection onto the convex sets $W^*$ and $U$, respectively, and without the extra-gradient step:
$$ \varphi^{n+1} = P_{W^*}\big( \varphi^n + r_1 K x^n \big), \qquad x^{n+1} = P_{U}\big( x^n - r_2 K^* \varphi^{n+1} \big), \tag{22} $$
where $P_S(\cdot) = \arg\min_{s \in S} \|s - \cdot\|_2^2$ denotes Euclidean projection onto the closed convex set $S$. This is an instance of the Arrow–Hurwicz–Uzawa algorithm [1]. Our algorithm will combine Chambolle and Pock's implicit minimization steps with Zhu and Chan's projections onto convex sets.
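As a concrete, self-contained illustration of the general iteration (not the graph-specific instance developed below), the sketch applies PDHG with $\theta = 1$ to a tiny problem, $\min_x \tfrac{1}{2}\|x - b\|^2 + \lambda |x_2 - x_1|$, whose minimizer can be computed by hand ($[1, 3]$ for $b = [0, 4]$, $\lambda = 1$). All variable names and step sizes are our own choices.

```python
import numpy as np

# min_x 0.5*||x - b||^2 + lam*|K x|  as  min_x F(Kx) + G(x), with
# F = lam*|.| (prox of F* is projection onto {|phi| <= lam}) and
# G = 0.5*||. - b||^2 (its prox has a simple closed form).
K = np.array([[-1.0, 1.0]])
b = np.array([0.0, 4.0])
lam = 1.0
r1 = r2 = 0.5                  # r1 * r2 * ||K||^2 = 0.5 < 1, cf. the step-size condition
x = np.zeros(2)
xbar = x.copy()
phi = np.zeros(1)
for _ in range(5000):
    # dual proximal step: resolvent of F* reduces to a clip (projection)
    phi = np.clip(phi + r1 * (K @ xbar), -lam, lam)
    # primal proximal step: closed form for the quadratic G
    x_new = (x - r2 * (K.T @ phi) + r2 * b) / (1.0 + r2)
    # extra-gradient (over-relaxation) step with theta = 1
    xbar = 2.0 * x_new - x
    x = x_new
# x is now close to the hand-computed minimizer [1, 3]
```

The fixed point of this iteration satisfies exactly the optimality conditions of the saddle point problem, which is easy to verify by substituting $x = [1, 3]$, $\varphi = 1$ into the two proximal steps.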

4.3. Primal-dual hybrid gradients for the inner problem. The saddle-point problem of the inner optimization is given in (18). We adapt the general primal-dual hybrid gradients algorithm in (20) to (18) as follows:
$$ \varphi^{n+1} = \arg\min_{\varphi \in W^*} \ -\sum_{i \in V} \sqrt{1 - \beta^{-2} |\varphi|(i)^2} - \langle \nabla_w \bar{u}^n, \varphi \rangle_E + \frac{1}{2 r_1} \|\varphi - \varphi^n\|_E^2, \tag{23a} $$
$$ u^{n+1} = \arg\min_{u \in U} \ \frac{\alpha}{q} \|u\|_{S^c,q}^q + \langle \nabla_w u, \varphi^{n+1} \rangle_E + \frac{1}{2 r_2} \|u - u^n\|_V^2, \tag{24a} $$
$$ \bar{u}^{n+1} = u^{n+1} + \theta \big( u^{n+1} - u^n \big). $$
We identify in our case $K = \nabla_w$ and, more importantly, $K^* K = \Delta_w$. The general $O(1/N)$ convergence guarantee (21) of [16] applies if
$$ r_1 r_2 \, \rho(\Delta_w) < 1, \tag{25} $$
where $\rho(\Delta_w)$ denotes the spectral radius of the graph Laplacian.
In the following subsections, we show how the respective minimization subproblems (23a) and (24a) can be solved efficiently.

4.4. Dual variable update. We first address the update of the dual variable $\varphi$. The minimization problem of the dual variable update (23a) can be rewritten as
$$ \varphi^{n+1} = \arg\min_{\varphi \in W^*} \ -\sum_{i \in V} \sqrt{1 - \beta^{-2} |\varphi|(i)^2} - \langle \nabla_w \bar{u}^n, \varphi \rangle_E + \frac{1}{2 r_1} \|\varphi - \varphi^n\|_E^2. \tag{26} $$
We propose to solve this in an iteratively reweighted least squares (IRLS) approach as follows. We observe that the objective in (26) without the square root in the denominator is quadratic in $\varphi$. We fix the current estimate of $\varphi$ in the square root to obtain a weighted least squares problem; this weight is then updated, and the process is repeated until convergence:
$$ s^t(i) := \sqrt{1 - \beta^{-2} |\psi^t|(i)^2}, \tag{27a} $$
$$ \psi^{t+1} := \arg\min_{\psi \in W^*} \ \sum_{i \in V} \frac{|\psi|(i)^2}{2 \beta^2 s^t(i)} - \langle \nabla_w \bar{u}^n, \psi \rangle_E + \frac{1}{2 r_1} \|\psi - \varphi^n\|_E^2. \tag{27b} $$
Our focus is on the solution of the fixed-point minimization step (27b).

Lemma 4.3. For small enough $r_1$, the solution to (27b) is given by
$$ \psi^{t+1}(i,j) = \Big( 1 + \frac{r_1}{\beta^2 s^t(i)} \Big)^{-1} \Big[ \varphi^n(i,j) + r_1 (\nabla_w \bar{u}^n)(i,j) \Big]. \tag{28} $$

Proof. The solution to the unconstrained optimization problem
$$ \arg\min_{\psi} \ \sum_{i \in V} \frac{|\psi|(i)^2}{2 \beta^2 s^t(i)} - \langle \nabla_w \bar{u}^n, \psi \rangle_E + \frac{1}{2 r_1} \|\psi - \varphi^n\|_E^2 \tag{29} $$
is given by (28). It remains to show that, for sufficiently small $r_1$, $\psi^{t+1} \in W^*$ and is therefore the solution of the constrained problem (27b) as well. In (28), for each $i$, the coefficient in front of the square brackets is less than 1. A geometric illustration of the contraction is given in Figure 2(a). A sufficient condition for $\psi^{t+1} \in W^*$ is $\varphi^n + r_1 (\nabla_w \bar{u}^n) \in W^*$, or equivalently:
$$ \big| \varphi^n + r_1 \nabla_w \bar{u}^n \big|(i) \le \beta \quad \text{for all } i \in V. \tag{30} $$
We compute for each $i \in V$:
$$ \big| \varphi^n + r_1 \nabla_w \bar{u}^n \big|(i) \le |\varphi^n|(i) + r_1 \big| \nabla_w \bar{u}^n \big|(i). $$
Since $\varphi^n \in W^*$, the condition is met for $r_1$ small enough relative to $\max_i |\nabla_w \bar{u}^n|(i)$, which can be bounded in terms of $d_i = \sum_j w(i,j)$, the degree of vertex $i$ in the graph.
Lemma 4.4. For $r_1$ sufficiently small, the IRLS iterates (27) converge to the unique minimizer of (26).

Proof. Using Lemma 4.3 and denoting $\alpha(i,j) := \varphi^n(i,j) + r_1 (\nabla_w \bar{u}^n)(i,j)$ for $(i,j) \in E$, the iterates in (27) can be written
$$ \psi^{t+1}(i,j) = \Big( 1 + \frac{r_1}{\beta^2 \sqrt{1 - \beta^{-2} |\psi^t|(i)^2}} \Big)^{-1} \alpha(i,j). $$
We then have that
$$ |\psi^{t+1}|(i) = \Big( 1 + \frac{r_1}{\beta^2 \sqrt{1 - \beta^{-2} |\psi^t|(i)^2}} \Big)^{-1} |\alpha|(i). $$
Thus, in the iterative scheme (27) there are no dependencies between the vertices. We claim that the mappings
$$ f_i(x) := \Big( 1 + \frac{r_1}{\beta^2 \sqrt{1 - \beta^{-2} x^2}} \Big)^{-1} |\alpha|(i), \qquad i \in V, $$
are contraction mappings on the interval $[0, \beta]$ and hence, by the contraction mapping theorem, each has a unique fixed point in $[0, \beta]$. To show that $f_i$ is a contraction mapping, we compute $f_i'$. From (30), we have that $|\alpha|(i) \le \beta$, and thus $|f_i'(x)|$ can be bounded by a function $g(\xi)$ of the rescaled variable $\xi = \beta x / (2 r_1)$. A plot of the function $g(\xi)$ reveals that it is bounded by $0.16$ for all $\xi \ge 0$. We then have $|f_i'(x)| \le 0.08\, \beta^2 r_1 < 1$ for every $x$, implying that $f_i$ is a contraction mapping. In practice we take the value of $r_1$ to be much smaller than the sufficient condition given in Lemma 4.4 and have not encountered any convergence issues.

4.5. Primal variable update. We now turn our attention to the minimization problem (24a), associated with the proximal update of the primal variable $u$. There are two cases to be considered, corresponding to the two choices $q = 2$ and $q = 1$ of the penalty term of the relaxed Dirichlet boundary condition. As will be shown shortly, both problems admit a simple closed-form solution $u_q^\circ$ for the unconstrained problem with $u \in L^2(V)$. To approximate the optimal solution within $U$, we then project $u_q^\circ$ onto $U$, which is a reasonable approach for practical purposes, given the convexity of the involved functionals and of the set $U$. Indeed, this mimics the projected gradient steps in the primal update of the Arrow–Hurwicz–Uzawa algorithm (22), modulo the backward (implicit) update used here instead of an explicit, forward gradient descent step. We first have to find the respective $u_q^\circ$.

4.5.1. Primal variable update for $q = 2$. Let us consider the relaxed problem over $u \in L^2(V)$:
$$ u_2^\circ = \arg\min_{u \in L^2(V)} \ \frac{\alpha}{2} \|u\|_{S^c,2}^2 + \langle \nabla_w u, \varphi^{n+1} \rangle_E + \frac{1}{2 r_2} \|u - u^n\|_V^2. \tag{31} $$
The minimization problem is quadratic and admits the following closed-form solution:
$$ u_2^\circ(i) = \frac{u^n(i) - r_2 (\operatorname{div}_w \varphi^{n+1})(i)}{1 + \alpha r_2 \, \chi_{S^c}(i)}, \tag{32} $$
where $\chi_{S^c}$ is the characteristic function of the complement of the current partition component, and the update can be computed by simple vertex-wise operations.
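The quadratic-penalty proximal step indeed reduces to vertex-wise arithmetic. The sketch below uses our own notational assumptions (in particular, the sign of the divergence term depends on the chosen adjoint convention) and verifies the closed form against the first-order optimality condition.

```python
import numpy as np

def primal_update_q2(u_n, div_phi, chi_Sc, r2, alpha):
    """Vertex-wise closed form for the quadratic-penalty proximal step
    min_u (alpha/2)*||u||^2_{S^c,2} + <u, div phi> + (1/(2*r2))*||u - u_n||^2
    (our reconstruction of the update; all names are illustrative)."""
    return (u_n - r2 * div_phi) / (1.0 + alpha * r2 * chi_Sc)

# Check first-order optimality: alpha*chi*u + div_phi + (u - u_n)/r2 = 0
rng = np.random.default_rng(0)
u_n = rng.normal(size=5)
div_phi = rng.normal(size=5)
chi = np.array([0.0, 0.0, 1.0, 1.0, 1.0])   # last three vertices lie in S^c
r2, alpha = 0.3, 2.0
u = primal_update_q2(u_n, div_phi, chi, r2, alpha)
residual = alpha * chi * u + div_phi + (u - u_n) / r2
```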

4.5.2. Primal variable update for $q = 1$. Let us in turn consider the relaxed problem over $u \in L^2(V)$ for the $q = 1$ based problem:
$$ u_1^\circ = \arg\min_{u \in L^2(V)} \ \alpha \|u\|_{S^c,1} + \langle \nabla_w u, \varphi^{n+1} \rangle_E + \frac{1}{2 r_2} \|u - u^n\|_V^2. \tag{34} $$
We first note that an equivalent minimization problem is given by
$$ u_1^\circ = \arg\min_{u \in L^2(V)} \ \alpha \|u\|_{S^c,1} + \frac{1}{2 r_2} \Big\| u - \big( u^n - r_2 \operatorname{div}_w \varphi^{n+1} \big) \Big\|_V^2, \tag{35} $$
which is an $L^1$-type problem solved by shrinkage:
$$ u_1^\circ(i) = \begin{cases} \operatorname{shrink}\big( u^n(i) - r_2 (\operatorname{div}_w \varphi^{n+1})(i),\ \alpha r_2 \big), & i \in S^c, \\ u^n(i) - r_2 (\operatorname{div}_w \varphi^{n+1})(i), & i \in S, \end{cases} \tag{36} $$
where the soft-thresholding operator is defined as
$$ \operatorname{shrink}(z, \tau) := \operatorname{sign}(z) \max\big( |z| - \tau, 0 \big). \tag{37} $$
Again, we see that all operations are simple and act vertex-wise.

4.5.3. Probability simplex projection. Here, $P_U : L^2(V) \to U$ denotes the projection onto the probability simplex:
$$ P_U(u) := \arg\min_{v \in U} \ \|v - u\|_V^2. \tag{38} $$
To avoid special considerations regarding negative components, we first uniformly shift the function $u$ so that it is non-negative at each vertex:
$$ u_+ := u - \min_{j \in V} u(j). $$
Such a translation happens along a direction orthogonal to the probability simplex, and the projection is thereby not altered. The actual projection algorithm then practically distinguishes between two cases, depending on the $L^1$ norm of the non-negative $u_+ \in L^2(V)$. If $\|u_+\|_{V,1} > 1$, then $P_U(u)$ is obtained by an appropriate soft-thresholding of $u_+$ [23]. On the other hand, if $\|u_+\|_{V,1} \le 1$, then the projection is obtained by distributing the lacking mass over all vertices, uniformly:
$$ P_U(u) = u_+ + \frac{1 - \|u_+\|_{V,1}}{|V|}, \tag{39} $$
where $|V|$ denotes the number of vertices in $V$. Different cases of this projection are illustrated for a 2-dimensional $u$ in Figure 2(b). An efficient strategy for finding the appropriate thresholding parameter $\lambda$ is given in [23].
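A standard sort-based projection in the spirit of [23] implements the two cases just described: the shift and uniform mass redistribution are exactly the steps above, while the thresholding branch uses the usual sorted cumulative-sum search for $\lambda$. This is our own sketch, not the paper's Matlab implementation.

```python
import numpy as np

def project_simplex(u):
    """Euclidean projection of u onto U = {u >= 0, sum(u) = 1}."""
    u_plus = u - u.min()           # shift along the all-ones direction, normal to the simplex
    deficit = 1.0 - u_plus.sum()
    if deficit >= 0:               # too little mass: distribute it uniformly
        return u_plus + deficit / len(u_plus)
    # too much mass: soft-threshold, finding lambda with sum(max(u_plus - lambda, 0)) = 1
    s = np.sort(u_plus)[::-1]
    css = np.cumsum(s)
    idx = np.arange(1, len(s) + 1)
    rho = np.nonzero(s - (css - 1.0) / idx > 0)[0][-1]
    lam = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(u_plus - lam, 0.0)
```

For example, `project_simplex(np.array([2.0, 0.0]))` returns `[1.0, 0.0]`, and a point already on the simplex is returned unchanged.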

4.6. Semi-supervised extension. The proposed partitioning algorithm lends itself to a straightforward semi-supervised extension. Indeed, in Section 3.5 we defined the rearrangement algorithm as the outer step of the bi-level optimization scheme: given the $k$ minimizers $u_\ell$, $\ell = 1, \ldots, k$, each vertex is reassigned by the "winner takes all" rule. It is easy to modify this rearrangement step to keep fixed the given assignments of a subset of vertices. The motivation for such a semi-supervised extension is twofold. First, the semi-supervised extension allows prior knowledge to be integrated into the partitioning problem. In the context of data clustering, this is known as transductive learning, where the dataset to be clustered consists of both labeled and unlabeled points.
Second, when a graph is a discretization of a manifold with boundary, it is desirable to enforce extra conditions on some "boundary vertices", e.g., Dirichlet boundary conditions. A Dirichlet boundary condition can be weakly enforced on a set of vertices using the same method; this is explained with an example in Section 5.4.

5. Experiments I: Graphs from discretized manifolds. In this first experimental section, we focus on the partitioning of graphs which are discretizations of manifolds. We consider several examples that have been previously studied and explore a few new problems. These graphs are local in the sense that vertices are adjacent only to their nearest neighbors. They are particularly valuable examples because they are easy to visualize, and the existence of analytical solutions in some cases provides instructional insight into the nature of the proposed method. Graphs which are not discretizations of manifolds are considered in Section 6.
The proposed experiments gradually increase in complexity. We start by partitioning a small graph representing a discretized line segment. We then increase the problem dimension and partition the discretization of a square domain with both Neumann and periodic boundary conditions. We then partition barbell and ring-like subdomains of the square graph using Dirichlet boundary conditions, to illustrate the use of the semi-supervised extension. Finally, we bi-partition a discretization of the cube with periodic boundary conditions. In each case, along with the locally optimal partition $V_\ell$ and a composite of ground states, $\max_\ell u_\ell$, we also report the value of the objective function minus the value $N \cdot k$, i.e., $\sum_{\ell=1}^k J_\beta^\alpha[V_\ell] - N \cdot k$.

In this section and in Section 6, we present results based on a Matlab implementation of the algorithms described in Sections 3 and 4. All computations were done on a 16-core 2.90 GHz Intel Xeon desktop computer with 128 GB of RAM.

5.1. Line graph. We start by constructing a line graph (the discretization of a line segment) with 20 vertices. We fix the parameter $\beta = 1$ and consider the bi-partition $k = 2$. A partition $V_\ell$ is randomly initialized and our partitioning algorithm is applied for different values of $\alpha$ and $q$, respectively. The evolution of the partitions and their minimizing $u_\ell$ is illustrated in Figure 3. [Figure 3: Partitioning a line segment, for different $\alpha$ and $q$. In each column, the topmost subplot shows the initial (random) partition (colored circles) and the corresponding optimal $u_i$. From one row to the next, the partition is rearranged according to Section 3.] These results suggest that the parameter $q$, the norm which penalizes the violation of the Dirichlet boundary condition, does not heavily affect the minimal partition, while the parameter $\alpha$, the coefficient of the penalization, plays an important role: too large an $\alpha$ effectively pins the minimizing $u_\ell$ in local minima, while too small an $\alpha$ does not enforce the desired boundary conditions.
In this simple example, the iterative algorithm converges in only 5 outer iterations.

5.2. Square lattice graph. We consider a square domain with free boundary conditions. To this end we construct a 120 × 120 square grid and a graph where each (non-boundary) vertex is connected to its four nearest neighbors; vertices at the edge are connected to their three nearest neighbors, and the four corner vertices are connected to their two nearest neighbors. Edge weights $w$ are taken to be constant. For this graph, the optimal partitions are non-trivial. In Figure 4, we show some (locally) optimal partitions for $k \in \{3, 5, 7\}$, computed with our algorithm. We use constant $\beta = 4$ and $q = 1$ for all $k$, while the parameter $\alpha$ increases with the number of partitions. These results can be compared to the numerical experiments conducted in [5,6,7,32] for optimal Dirichlet partitions (the limit $\beta \to 0$ with an $L^2$ constraint on the ground state). Interestingly, we find that the partitions obtained for finite $\beta$ are very similar to the optimal Dirichlet partitions. Namely, for $k = 3$, there appears to be a continuous family of 3-partitions, with all components meeting at a triple junction, with similar values of energy. The partition for $k = 5$ with smallest energy is similar to a Dirichlet partition in [32]. Some of the obtained partitions also bear striking resemblance to steady-state configurations of Renyi-entropy minimizing space partitioning processes [19, Fig. 5].

5.3. Torus graph (square lattice graph with periodic boundary conditions). We consider the 110 × 127 torus graph, obtained from a truncated square lattice by identifying opposing "boundary vertices". The aspect ratio $2 : \sqrt{3}$ is chosen such that regular hexagons can potentially tile the manifold with the correct periodicity. The edge weights, $w$, are taken to be constant. We initialize the partition with a Voronoi tessellation for randomly chosen generators. In Figure 5, we plot the $k = 4$, 9, 16, and 25 partitions computed for constant $\beta = 4$ and $q = 1$, while the parameter $\alpha$ increases with the number of partitions. We refer the reader to [9] for the analogous Dirichlet partitioning problem.
To illustrate the periodicity of the minimizers, we also show a tiling of the obtained partitions in Figure 6. Indeed, for k ∈ (2N)², the doubly-periodic square partitioning problem admits a tiling by regular hexagons, and our numerical experiments suggest that this is the optimal partition for the Beltrami energy. It has been proven that the hexagonal tiling minimizes perimeter subject to a fixed volume constraint [31]. It has also been conjectured that the optimal partitions for the Dirichlet objective function are hexagonal [15]; see also [10,46] for numerical support of this conjecture.

5.4. Discretizations of a barbell and annulus.
After having considered domain partitioning with free and periodic boundary conditions, we now make use of the semi-supervised extension to partition subdomains of the square with Dirichlet boundary conditions. To this end, we again construct a square grid with edge weights as in Section 5.2. We relax the Dirichlet boundary condition by introducing an additional partition component label, say 0 ∉ {1, . . . , k}, to indicate which vertices lie outside the intended domain. The labels of these vertices are kept unchanged during the rearrangement step, in which we take u_0 ≡ 0 so that this label does not compete with the others.
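The masked rearrangement step described above can be sketched as follows. The array `scores` of per-vertex ground-state values and the function name `rearrange` are hypothetical illustrations, not the paper's implementation; the two rules taken from the text are that u_0 ≡ 0 and that "outside" vertices keep label 0.

```python
import numpy as np

def rearrange(scores, outside):
    """Assign each vertex the label of its largest ground-state value.
    scores : (k+1) x |V| array of hypothetical values u_m(i), m = 0..k.
    outside: boolean mask of vertices outside the intended domain."""
    scores = scores.copy()
    scores[0, :] = 0.0      # u_0 = 0: the outside label never competes
    labels = np.argmax(scores, axis=0)
    labels[outside] = 0     # frozen: outside vertices stay in component 0
    return labels

rng = np.random.default_rng(0)
scores = rng.random((4, 10))        # k = 3 components plus the outside label
outside = np.zeros(10, dtype=bool)
outside[:3] = True                  # first three vertices lie outside
labels = rearrange(scores, outside)
print(labels)
```

Since the inside scores are strictly positive while row 0 is held at zero, inside vertices always receive a genuine component label 1, . . . , k.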
As a first example, we look at the "pinched oval" or "barbell" domain as illustrated in Figure 1. We compute locally optimal partitions for k = 3, 4 and 5. Representative local minimizers are shown in Figure 7.
Our second example is a punctured disk: an annulus with a hole of varying size. Depending on the relative size of the hole, the optimal partitions differ significantly, as shown in Figure 8 for k = 8. For a relatively large hole radius, r = 0.5 (half the disk radius), the eight components of the locally optimal partition are simply distributed radially around the ring. For a very small puncture, r = 0.05, on the other hand, we find a less symmetric partition. In fact, for small hole sizes and too small a value of α, a single partition component can "surround" the puncture. The most interesting locally optimal […]

5.5. Periodic volume partitioning. We consider partitions of the triply-periodic volume [0, 2π]³. We recall the triply-periodic minimal surface found by Hermann Schwarz [55], referred to as the P-surface ("primitive") and approximated implicitly by the zero level-set of the function

(40) ψ(x, y, z) = cos(x) + cos(y) + cos(z).

This minimal surface bi-partitions the volume into two equally-sized components such that the interface is triply periodic and necessarily has zero mean curvature. Indeed, ψ(x, y, z), defined in (40), is an element of the eigenspace spanned by {cos(x), cos(y), cos(z)} of the Laplacian on the periodic cube, with corresponding eigenvalue 1, the smallest non-trivial eigenvalue. Any linear combination of these eigenfunctions has two equal-volume nodal domains and hence defines a partition with equal-volume components. It stands to reason that these partitions are good candidate minimizers of the Beltrami energy.
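The equal-volume property of the nodal domains of ψ in (40) is easy to verify numerically; the following sketch samples ψ on the same 26 × 26 × 26 periodic grid used for the torus graph and checks that the positive and negative nodal domains occupy (essentially) equal fractions of the volume.

```python
import numpy as np

# Sample psi(x, y, z) = cos(x) + cos(y) + cos(z) on a periodic grid and
# measure the volume fraction of the nodal domain {psi > 0}.
n = 26
t = 2 * np.pi * np.arange(n) / n
x, y, z = np.meshgrid(t, t, t, indexing="ij")
psi = np.cos(x) + np.cos(y) + np.cos(z)

frac_positive = np.mean(psi > 0)
print(frac_positive)  # ~ 0.5
```

The antisymmetry ψ(x + π, y + π, z + π) = −ψ(x, y, z) pairs up grid points of opposite sign, which is why the two nodal domains have equal discrete volume.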
We consider the 26 × 26 × 26 torus graph, the three-dimensional analogue of the two-dimensional torus graph considered in Section 5.3. The edge weights, w, are taken to be constant. Starting from a randomly initialized partition, our algorithm typically converges to one of two local optima, shown in Figure 9: a planar bisection of the cube, or a surface very similar to the P-surface. The P-surface minimum has a smaller objective function value than the planar bisection. For comparison, we also plot the zero level-set of the function in (40), which approximates the P-surface very well.
This result is interesting because the P-surface is a stationary point for an optimization problem which agrees with our proposed partitioning model only in the TV limit (β → ∞); see (7). That our computations indicate this surface is also optimal for finite values of β suggests that the P-surface is a minimizer for a larger class of optimization problems.
6. Experiments II: Data graphs. We now turn our attention to the clustering of datasets via the partitioning of associated weighted (non-lattice) graphs. Following convention, in this section we refer to "partition components" as "clusters". The first two subsections investigate two benchmark datasets for clustering.
6.1. Five moons. We begin by considering the five moons clustering problem, a collection of points in the plane arranged in the shape of five moons. We assign edge weights using a Gaussian kernel, w(i, j) = exp(−d²(x_i, x_j)/σ²), where d(·, ·) is the Euclidean distance and σ > 0 is a scale parameter.

6.2. MNIST handwritten digits. The MNIST handwritten digit dataset consists of 70,000 28 × 28 greyscale images of handwritten digits 0 to 9. As input we used the similarity matrix for the MNIST dataset obtained from the website of Z. Yang [64]. We symmetrize this matrix via W̃_ij = max{W_ij, W_ji}.
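The two graph constructions above — Gaussian-kernel weights for a point cloud and max-symmetrization of a similarity matrix — can be sketched together. The toy point cloud, the choice of σ, and the 5-nearest-neighbor sparsification (a common way an asymmetric similarity matrix arises) are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))   # toy point cloud (not the five moons data)
sigma = 1.0

# Gaussian kernel weights w(i, j) = exp(-d^2(x_i, x_j) / sigma^2).
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-d2 / sigma**2)

# Keep only each point's 5 nearest neighbors -> asymmetric sparsity pattern
# (column 0 of the sort is the point itself, so it is skipped).
keep = np.argsort(-W, axis=1)[:, 1:6]
W_knn = np.zeros_like(W)
np.put_along_axis(W_knn, keep, np.take_along_axis(W, keep, axis=1), axis=1)

# Max-symmetrization as applied to the MNIST similarity matrix.
W_sym = np.maximum(W_knn, W_knn.T)
print(np.allclose(W_sym, W_sym.T))  # True
```

Taking the elementwise maximum (rather than the average) preserves every neighbor relation present in either direction, which keeps the graph well connected.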
First, we perform naive (fully unsupervised) clustering with our proposed algorithm. The purity obtained, as defined in [64], varies considerably between individual runs, and we show the best result (in terms of purity) over several dozen runs without supervision. In each case, the algorithm converges in approximately 20 iterations. For parameters q = 1, β = 2, α = 10⁻⁵, the best observed purity (without supervision) is 96.23%, which is comparable to the performance of state-of-the-art clustering algorithms. Figure 11 gives a graphical display of the quality of the output. The first row shows the blurry averaged image within each cluster, while the second row shows the representative image for each cluster (the image at which the cluster's ground state achieves its maximum). In this particular run, the representative image of the '3' cluster is actually an '8', indicative of the strong ambiguity between these classes in terms of graph separation. The second-best representative of this cluster (where the ground state is second largest) is truly a '3' and is shown as 3bis. To further illustrate the diversity of handwriting samples, for each cluster we show the misclassified image with the largest ground state value along with its true label. Visual inspection shows that the misclassifications are "intuitively explainable".
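For concreteness, a minimal sketch of the purity score, assuming the standard majority-class definition (each cluster is credited with its most frequent true label); the definition actually used is that of [64].

```python
import numpy as np

def purity(clusters, truth):
    """Fraction of points covered by the majority true label of their cluster."""
    total = 0
    for c in np.unique(clusters):
        members = truth[clusters == c]
        total += np.bincount(members).max()  # size of the majority class
    return total / len(truth)

# Tiny illustrative example: one point of class 0 lands in cluster 1.
truth    = np.array([0, 0, 0, 1, 1, 1, 2, 2])
clusters = np.array([0, 0, 1, 1, 1, 1, 2, 2])
print(purity(clusters, truth))  # 0.875
```

A perfect clustering has purity 1, but note that purity alone rewards over-segmentation: assigning every point its own cluster also scores 1.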
In Table 1, we display the confusion matrix for the obtained clusters, in both absolute and relative numbers. The columns represent the ground truth labels and the rows the labels assigned by the proposed algorithm; each column thus gives the distribution of a true class across our clusters. Again, the confusion matrix highlights the classification difficulties discussed above.
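A minimal sketch of the confusion matrix in the layout of Table 1 (rows: assigned labels, columns: ground-truth labels), on a tiny illustrative example:

```python
import numpy as np

def confusion(assigned, truth, k):
    """C[a, t] counts points with assigned label a and true label t,
    so each column of C sums to the size of that true class."""
    C = np.zeros((k, k), dtype=int)
    for a, t in zip(assigned, truth):
        C[a, t] += 1
    return C

truth    = np.array([0, 0, 1, 1, 2, 2])
assigned = np.array([0, 1, 1, 1, 2, 2])
C = confusion(assigned, truth, 3)
print(C.sum(axis=0))  # [2 2 2] -- column sums equal the class sizes
```

Off-diagonal mass in column t shows into which clusters the true class t "leaks", which is exactly how the (3, 5, 8) and (4, 9) confusions are read off Table 1.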
In addition to the unsupervised clustering, we perform instances of transductive learning by making use of the semi-supervised algorithm extension (§4.6) and the true labels of a small portion of the digit images. We apply the rearrangement algorithm to the MNIST dataset, using n = 20 random initializations and an increasing percentage of labelled data points. See Figure 12 for a scatter plot of the attained purity and objective function values, and Table 2 for numerical values. For each percentage of labels, we report the lowest, average, and highest purity, the lowest, average, and highest objective function value, and the purity of the partition with the lowest objective function value (which would be the reasonable best guess). The objective function value for the ground truth labels is 1.8913 · 10⁻⁴. We observe that in the unsupervised case, state-of-the-art purity can be achieved, but overall the results are very diverse. As soon as a few true labels are provided, the spread is dramatically reduced and the average purity greatly improved. We note that a low objective function value does not necessarily signify high purity. In particular, the ground truth labels have a higher objective function value than any of the 2.5, 5, or 10% semi-supervised results.
6.3. Image segmentation. Finally, we consider the formulation of image segmentation as a graph partitioning problem. Automatic analysis of images often requires the identification of coherent regions of interest via efficient and robust algorithms, a task called image segmentation. Due to the non-locally repetitive nature of the patterns in natural images, correlations between pairs of pixels extend well beyond neighboring pixels. Graphs are therefore a natural representation: we identify the pixels of the image with the vertices V, while the edges E model the correlations between pairs of pixels. Image segmentation thereby becomes a partitioning task for the graph G = (V, E).
Region-based image segmentation looks for homogeneity (read: least intensity variance) within the partition components. The canonical region-based image segmentation model is the Chan-Vese model (CV) [17], which strives for region homogeneity while keeping the regions' interface short:

(41) E_CV(φ, µ_1, µ_2) = ∫_Ω |I − µ_1|² H(φ) dx + ∫_Ω |I − µ_2|² (1 − H(φ)) dx + β ∫_Ω |∇H(φ)|,

where µ_i denotes the mean feature of each region, the level-set function φ : Ω → R is positive in object regions, negative in background regions, and zero on the object boundaries, and H is the Heaviside function. The last, total variation term is the co-area-formula equivalent of the boundary length, and its coefficient β balances data fidelity against boundary regularity. A simple graph partitioning model corresponding to such region-based image segmentation (including the multiphase case k > 2) can be obtained by extending the square-lattice graph, strengthening the weight between spatially distant but feature-wise similar pixels:

(42) [weight function not recovered]

where N(i) denotes the 4-connected von Neumann neighborhood of pixel i, I(i) ∈ R^d denotes the value of pixel i, ‖I(j) − I(i)‖² is the squared L²-distance of the pixels in feature space, and p ∈ (0, 1) balances the relative weight of the region-homogeneity term versus the boundary-length regularization stemming from the lattice graph.

Table 1. MNIST confusion matrix (first five rows shown; the final entry of the row for cluster 4 did not survive extraction):

        0     1     2     3     4     5     6     7     8     9
0    6841     1    38     3     3    22    25     2    21    19
1       5  7812    56    19    49     4    17    88    73    23
2       6    22  6692    44     3     3     1    22    20     7
3       1     6    18  6845     1   137     0     1   138    92
4       1     9     6     2  6510     6     5    20    29

The columns represent the ground truth labels and the rows the labels assigned by the proposed algorithm. Each column sums to the number of digits in the corresponding class and represents the distribution of the true labels across our partitions. We see that the algorithm does generally well, but exhibits some confusion among the digits (3, 5, 8), (4, 9), and […].
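Since the weight function (42) is not reproduced above, the sketch below uses one plausible form assumed purely for illustration: 4-neighbor lattice edges carrying weight p, plus uniformly subsampled long-range edges weighted by a Gaussian similarity exp(−λ‖I(j) − I(i)‖²) in feature space. This is not necessarily the paper's (42).

```python
import numpy as np

rng = np.random.default_rng(2)
h, w, lam, p = 8, 8, 15.0, 1e-4
I = rng.random((h, w, 3))          # toy RGB image with values in [0, 1]^3
feat = I.reshape(-1, 3)

def vid(i, j):
    return i * w + j

edges = {}
# lattice (boundary-length) part: 4-neighbor edges of constant weight p
for i in range(h):
    for j in range(w):
        if j + 1 < w:
            edges[(vid(i, j), vid(i, j + 1))] = p
        if i + 1 < h:
            edges[(vid(i, j), vid(i + 1, j))] = p
# region-homogeneity part: uniformly subsampled long-range feature edges
# (a drawn pair may coincide with, and then overwrite, a lattice edge)
for _ in range(200):
    a, b = rng.integers(0, h * w, size=2)
    if a != b:
        d2 = float(np.sum((feat[a] - feat[b]) ** 2))
        edges[tuple(sorted((int(a), int(b))))] = np.exp(-lam * d2)

print(len(edges))
```

The small p keeps the lattice term a weak regularizer relative to the feature similarities, mirroring the role of p described in the text.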
The optimal partition aligns flat regions of u with homogeneous regions of the image, and gradients of u with gradients of I. A fully connected graph is not practicable, so we uniformly subsample the non-local connectivities at a very low rate.
As an example, we consider segmenting the 318 × 212 RGB input image, I : V → [0, 1]³, shown in Figure 13(a). From this image, we construct a graph using (42) with parameters p = 10⁻⁴ and λ = 15, and the remaining parameter set to 0.01. The long-range connections are subsampled at about 80 edges per pixel. In Figure 13(b), we give a scatter plot of the pixel distribution in color space. We apply the graph partitioning algorithm with parameters k = 3, α = 0.1, β = 1, and q = 2. The resulting segmented image is shown in Figure 13(c), and the associated ground states are shown in Figure 13(d).
The color-based image segmentation problem shown here is, of course, more efficiently solved with state-of-the-art segmentation models. However, it is easy to see how the current graph construction (42) can be generalized to more interesting feature distances (e.g., patch-based or using other texture descriptors) that extend well beyond the Chan-Vese model. Also, by non-uniformly subsampling the long-range connectivities, we can make the region-based segmentation more local and thus seamlessly handle images affected by bias, for example as in [11,38,67].
7. Discussion and further directions. In this paper we introduced a novel non-convex graph partitioning objective based on the sum of the minimal discrete Beltrami energies of the partition components. This objective is a discrete analogue of an energy that appears in classical minimal surface problems. We adapted primal-dual convex optimization methods to compute the minimal Beltrami energy for each component of a given partition, and proposed a rearrangement algorithm to find locally optimal graph partitions. The methods are illustrated on a variety of example graph partitioning problems in Sections 5 and 6. Similar to partitions obtained by minimizing the sum of Dirichlet energies of the partition components [46], the objective proposed here naturally provides confidences for label assignments and consequently produces a representative for each cluster. We view the Beltrami energy considered here as an interesting generalization of, and interpolation between, the total variation and Dirichlet energies that have previously been used for graph partitioning (see (7)). In fact, we observe in computational experiments that several of the optimal partitions obtained are insensitive to the interpolation parameter β; our experiments support the hypothesis that certain partitions are optimal for a whole class of objective functions. This phenomenon has recently been studied for certain point-set configurations on a sphere which minimize a class of energies generalizing the inverse squared energy appearing in Thomson's problem [18]. Configurations that minimize such classes of energies are referred to as "universally optimal". It would not be surprising if certain partitions were likewise minimizers for a class of objective functions including the Beltrami energy.
It is interesting to observe that many of the manifold partitions in Figures 4-8 feature 120° triple junctions, even in cases where a 90° quadruple junction would be more symmetric; see, e.g., k = 4 in Figure 7. This behavior, referred to as the "equal angle property", is well known in crystal grain growth [35] and has typically been associated with stable configurations of interface motion by mean curvature [43,44,54]. This observation further strengthens the intuition that the bi-level rearrangement algorithm employed in this work is closely related to the MBO scheme and, more generally, to motion by mean curvature stemming from diffusion-threshold dynamics.
The proposed computational methods for solving the bi-level optimization problem (11) are based on primal-dual optimization methods for the inner problem and a rearrangement algorithm for the outer problem. We view this application as an interesting demonstration of the power and flexibility of primal-dual methods for solving a broad class of convex optimization problems. However, in the present application, the inner problem is the computational bottleneck, and it seems doubtful that algorithmic advances will make partitioning methods requiring the (approximate) solution of such non-local energies competitive with more local strategies [3,62]. It is plausible, however, that many local heuristics for partitioning can be viewed as approximations of partitions minimizing a non-local energy, such as the one studied here.
Many interesting research directions remain. One obvious direction would be to formulate the analogous partitioning problem in the continuum; proving the existence and studying the properties of optimal partitions of surfaces is likely to be challenging. The rearrangement algorithm proposed in Section 3.5 has the same structure as expectation-maximization (EM) algorithms, and making this connection concrete, i.e., defining the correct probability distribution, is a very interesting direction of pursuit. Finally, there is an intriguing relationship between the Dirichlet partitioning problem and an antagonistically-interacting random walker model [19,20,45,46]. It would be of interest to establish a connection between a non-linear diffusion process, such as the one studied in [59], and the Beltrami partitioning problem.