Diagonal stationary points of the Bethe functional

We investigate stationary points of the Bethe functional for the Ising model on a $2$-dimensional lattice. Such stationary points are also fixed points of message passing algorithms. In the absence of an external field, one expects for symmetry reasons that the fixed points have constant means at all sites. This is shown not to be the case. There is a critical value of the coupling parameter, equal to the phase transition parameter on the computation tree, see [13], above which fixed points appear whose means vary from site to site but are constant on diagonals of the lattice, hence the term “diagonal stationary points”. A rigorous analytic proof of their existence is presented. Furthermore, computer-obtained examples of diagonal stationary points which are local maxima of the Bethe functional, and hence stable equilibria for message passing, are shown. The smallest such example was found on the $100 \times 100$ lattice.

1. Introduction. Inference from graphical models is a general term for the wide range of problems which arise in AI, machine vision, statistical physics and many other fields. This multitude of aspects is a consequence of the fact that many theoretical and real-world problems can be reduced to the problem of inferring a probability distribution from an appropriate graphical model (e.g. Bayesian networks, random Markov fields, Tanner graphs, etc., see [17]). There is always the brute force approach: summing over the states of all nodes leads to the desired distribution. This is only a theoretical solution, however, since the number of summands grows exponentially with the size of the graphical model. A computable approach to the problem is embodied by a class of heuristic algorithms, generally known as belief propagation or message passing, which are widely used in statistical inference, combinatorial optimization, image processing and other applications [17]. These algorithms iterate some abstract beliefs, and if they converge this allows a fast approximation of the desired probability distribution. The interest in those techniques derives from the work of Pearl [9], and their importance was emphasized by the Turing Award given to Pearl in 2011 for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning. As we will see, belief propagation algorithms are closely connected with Bethe approximation.
Bethe approximation is a technique which originated in statistical physics, see [2]. To explain the idea behind this approach let us consider an Ising model with the energy function $U(\sigma)$, where $\sigma = (\sigma_i)_{i\in V}$ is a random configuration of spins and $U$ is assumed to be a sum of terms which depend on spins at single sites or pairs of adjacent sites. As a general convention, we will denote by $\sigma = (\sigma_i)$ random variables indexed by the sites and taking values $\pm 1$, and by $s = (s_i)$ particular configurations of such values. Given a probability distribution $P$ on the set of all configurations we define the entropy $S_P = -\sum_s P(s)\ln P(s)$ and the average energy $\langle U(\sigma)\rangle_P = \sum_s P(s)U(s)$.
These two terms allow one to define the free energy functional $P \mapsto S_P + \langle U(\sigma)\rangle_P$. As is well known, the free energy functional has a unique maximum, which is attained at the exponential distribution. Even though the exponential distribution is given explicitly, the formula is not usable in practice since it involves summing over all configurations, whose number is exponential in the size of the model. As noted, this is a common problem with inference from graphical models. To address this, Bethe approximation is defined on a much smaller space of pseudo-marginals $(b_i(s_i))$, $(b_{ij}(s_i, s_j))$, where $i, j$ run over all pairs of neighboring sites. When a probability distribution $P$ on the set of configurations is given, one can define actual marginals $b_i(s_i) := P(\sigma_i = s_i)$ and $b_{ij}(s_i, s_j) := P(\sigma_i = s_i, \sigma_j = s_j)$; however, pseudo-marginals are only required to satisfy natural marginalization conditions (see Eq. 5) and may not derive from any probability distribution on the set of configurations. Given a system of pseudo-marginals $b$, the entropy part of the Bethe functional is $-\sum_{\{i,j\}}\sum_{s_i,s_j} b_{ij}(s_i,s_j)\ln b_{ij}(s_i,s_j) + \sum_i (d_i - 1)\sum_{s_i} b_i(s_i)\ln b_i(s_i)$, with the first summation extended over all pairs of closest neighbors $i, j$ and with $d_i$ denoting the number of neighbors of $i$. Then the Bethe functional is the sum of the expected value of $U(\sigma)$ computed from $b$ and this entropy part. Note that the expected value of $U(\sigma)$ is defined given a pseudo-marginal due to the assumption that $U$ is a sum of terms which depend on spins at most at two neighboring sites, while the entropy term has been modified in a way that may appear arbitrary. It turns out that the Bethe functional equals the free energy functional for Ising models on trees, see [14] (4.13). The equality means that if a pseudo-marginal is an actual marginal derived from a probability distribution $P$ on the set of all configurations, then both functionals are equal. The Bethe functional is defined on the space of all pseudo-marginals, whose dimension grows polynomially with the size of the model, and hence is computable.
The idea is that knowing the global maximizer of the Bethe functional provides an approximation of the marginals for the exponential distribution. The precise formulation of the concepts introduced here is worked out in Section 2.
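The variational characterization of the exponential distribution can be checked by brute force on a very small graph. The following is a minimal sketch, assuming the sign convention in which the exponential distribution is proportional to $\exp(U(s))$ and the free energy functional is $S_P + \langle U(\sigma)\rangle_P$; the 4-cycle, the field `h` and all function names are illustrative choices, not taken from the paper.

```python
import itertools
import math

def energy(s, edges, J, h):
    # U(s) = J * sum over edges of s_i s_j + sum_i h_i s_i
    # (assumed convention: the exponential distribution is proportional to exp(U)).
    return J * sum(s[i] * s[j] for i, j in edges) + sum(hi * si for hi, si in zip(h, s))

def gibbs(edges, n_sites, J, h):
    # Enumerate all 2**n_sites configurations; feasible only for tiny models.
    configs = list(itertools.product((-1, 1), repeat=n_sites))
    weights = [math.exp(energy(s, edges, J, h)) for s in configs]
    Z = sum(weights)
    return configs, [w / Z for w in weights], Z

def free_energy(P, configs, edges, J, h):
    # Entropy plus average energy of the distribution P.
    entropy = -sum(p * math.log(p) for p in P if p > 0)
    avg_U = sum(p * energy(s, edges, J, h) for p, s in zip(P, configs))
    return entropy + avg_U

# Toy model: a 4-cycle with a small, arbitrary external field.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
h = [0.1, 0.0, -0.2, 0.0]
configs, P_gibbs, Z = gibbs(edges, 4, 0.5, h)
F_gibbs = free_energy(P_gibbs, configs, edges, 0.5, h)
F_uniform = free_energy([1 / len(configs)] * len(configs), configs, edges, 0.5, h)
```

At the exponential distribution the functional equals $\ln Z$, and any other distribution (here the uniform one) gives a smaller value. The enumeration over `2**n_sites` configurations is exactly the exponential cost that motivates the Bethe approximation.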
The basic connection between Bethe approximation and belief propagation algorithms is that fixed points of belief propagation algorithms are stationary points of the Bethe functional, see [3,17]. This is natural since the classical belief propagation algorithm can be seen as a discrete approximation of the gradient flow of the Bethe functional, see [16]. Local maxima of the Bethe functional can be effectively found by those algorithms, while on the other hand the analysis of belief propagation must largely rely on the understanding of the Bethe functional [11]. Since the global maximum of the Bethe functional is the object of interest, clearly the analysis of the set of local maxima is relevant to evaluating and improving performance of belief propagation methods.
It appears that the theory of the Bethe approximation has lagged behind its algorithmic development. In the literature one can find local results concerning fixed points of belief propagation and their stability [5,6,7,10]. Studies of the global aspect of the problem, such as the structure of the fixed point set have been almost exclusively the domain of experimentation.
From the dynamical point of view, one can consider the gradient flow of the Bethe functional. This makes the system formally very simple, but complications arise because of the high dimension of the phase space. From the algorithmic point of view, the desirable situation is convergence to the global maximum and the danger lies in the existence of local maxima.
1.1. The goals of this paper. We deal with an Ising model on a finite two-dimensional square lattice with periodic boundary conditions, i.e. on the toral lattice. Formally, the sites are elements of the set $\{0, \dots, n-1\} \times \{0, \dots, n-1\}$ wrapped doubly periodically so that neighbors are calculated mod $n$. We will denote such a graph by $T_n$. The interaction between neighbors is given by $J\sigma_i\sigma_j$ for a fixed $J > 0$, where $\sigma_i$ are variables of a random Markov field which take values $\pm 1$. One can also add the sum over all sites of the terms $h_i\sigma_i$, in which case the vector $(h_i)$ is the external field.
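The toral lattice $T_n$ can be written down directly from this description. The sketch below uses hypothetical helper names and assumes $n \geq 3$, so that the four neighbors of a site are distinct.

```python
def torus_neighbors(n):
    # Sites are pairs (x, y) in {0, ..., n-1}^2; neighbors are computed mod n.
    nbrs = {}
    for x in range(n):
        for y in range(n):
            nbrs[(x, y)] = [((x + 1) % n, y), ((x - 1) % n, y),
                            (x, (y + 1) % n), (x, (y - 1) % n)]
    return nbrs

def torus_edges(n):
    # Each edge is an unordered pair of neighboring sites.
    edges = set()
    for site, neighbors in torus_neighbors(n).items():
        for other in neighbors:
            edges.add(frozenset((site, other)))
    return edges
```

For $n \geq 3$ every site has degree 4 and the graph has $2n^2$ edges.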
This paper grew out of a simple question posed during our research on the previous work [12]: if the external field possesses a certain symmetry, does the same symmetry hold for stationary points of the Bethe functional? After some initial investigation the problem was narrowed to the situation in which the external field is simply 0 and the question becomes whether Bethe stationary points necessarily have constant means for all sites of the lattice.
Surprisingly, the answer to even such a simple question appeared unknown and even more surprisingly turned out to be negative. The goal of this paper is to describe a class of non-constant stationary points, prove their existence by rigorous analytic methods and study some of their properties by a combination of analytic and numerical methods.
1.2. Critical coupling and phase transitions. A difficulty in studying the set of stationary points of the Bethe functional lies in the fact that, except for special cases, the functional is not convex (strictly speaking, using the sign convention of this paper: its negative is not convex). One paper which deals with this problem is [13]. For a given graph, its approach is to construct the computation tree, or a universal covering in the language of topology, which is generally infinite. For Markov fields on infinite graphs one expects a phenomenon known as phase transition, which occurs when $J$ exceeds a critical value $J_c$. The main result of [13] is that the absence of a phase transition on the computation tree implies the uniqueness of the Bethe stationary point and convergence of the belief propagation algorithm. A similar construction for the Bethe approximation had been known in statistical mechanics, see [1], Chap. 4. Rather than "computation tree", the term used there is "Cayley tree", and in the case of zero external field the critical value $J_c = \frac{1}{2}\ln 2$ is obtained. This is equivalent to $\tanh J_c = \frac{\exp(J_c)-\exp(-J_c)}{\exp(J_c)+\exp(-J_c)} = \frac{1}{3}$. In this context, let us state our main result.
Theorem 1.1. In the case of zero external field, the following alternative holds. If $\tanh(J) \leq 1/3$, then there is only one stationary point of the Bethe functional, with all singleton components $b_i$ equal to $1/2$. On the other hand, for every $J$ with $\tanh(J) > 1/3$ there exists $n(J)$ such that for every $k \geq n(J)$ there is a stationary point of the Bethe functional with values of singleton components $b_i$ both greater and smaller than $1/2$, depending on $i$.
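The threshold $\tanh J_c = 1/3$ can be illustrated numerically. The sketch below is not the paper's argument: it iterates the standard cavity recursion on a tree in which every vertex has four neighbors, $h \mapsto 3\,\mathrm{artanh}(\tanh J \tanh h)$, and reads off the resulting constant mean. Identifying this value with the constant-mean Bethe stationary point on the lattice is an assumption of the sketch, and the function name is hypothetical.

```python
import math

J_C = 0.5 * math.log(2.0)  # critical coupling quoted in the text

def constant_mean(J, iterations=2000, h0=1.0):
    # h is the cavity field acting on a spin from three of its four neighbors.
    t = math.tanh(J)
    h = h0
    for _ in range(iterations):
        h = 3.0 * math.atanh(t * math.tanh(h))
    # Each of the four neighbors contributes the effective field u to the spin.
    u = math.atanh(t * math.tanh(h))
    return math.tanh(4.0 * u)
```

The linearization of the recursion at $h = 0$ has slope $3\tanh J$, so the zero solution loses stability exactly when $\tanh J$ exceeds $1/3$. Below the threshold the iteration collapses to zero mean; above it a positive constant mean appears. Convergence is slow very close to $J_c$, so the fixed iteration count is only adequate away from the threshold.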
Stationary points of Theorem 1.1 will be referred to as diagonal points since as will become clear from the proof the values of b i are constant on diagonals of the lattice.
We conclude first that the result of [13] is sharp in this case (Ising model with zero external field): for $J \leq J_c$ there is no phase transition on the computation tree and for $J > J_c$ there are examples with multiple stationary points. But Theorem 1.1 goes further in stating not only that multiple stationary points exist, but that they are different from the obvious ones: one with constant positive means and the other its negative image.
1.3. Stability. Both from the point of view of the dynamical systems theory and applications, the important issue is the stability of diagonal fixed points. For the Bethe gradient flows stability of a fixed point is clearly equivalent to its being a strict local maximum and the same holds for discrete versions of belief propagation, see [5]. Here, we have only been able to obtain a numerically-supported result.
Finding 1. For $J = 0.5$ and $n = 100$ there exists a pseudo-marginal on $T_n$ with singleton components constant on diagonals, but otherwise taking values both greater and less than $1/2$, which is a strict local maximum of the Bethe functional with zero external field.
Finding 1 is less robust than Theorem 1.1. First, it requires the use of a computer with floating-point arithmetic and we have made no attempt to rigorously verify either the correctness of the program or the numerical error. Secondly, the example is only for a particular choice of parameters. A key question here, which we cannot answer, is whether stable equilibria can be obtained for $J$ arbitrarily close to $J_c$.
These results and particularly Finding 1 are bad news for algorithmic uses of belief propagation. Even in the arguably simplest setting spurious local maxima appear. Although in case of no external field it is clear what the global maximum is, under the actual circumstances one can easily see how finding it could become a computationally hopeless task. On the other hand, it offers a non-trivial dynamical system to research.
Let us take a quick glimpse into the possibilities. Owing to the diagonal symmetry, if an example of the type we consider exists on a toral lattice $T_n$, it can also be realized on $T_{kn}$ for any natural $k$, or on the infinite lattice $\mathbb{Z}^2$ by periodic repetition on blocks of $n$ consecutive diagonals. Hence, if indeed stable equilibria for zero external field appear for any $J > J_c$ and $T_n$ sufficiently big, there would be infinitely many on the infinite lattice for any $J > J_c$, all essentially different, i.e. not related by shifts. This would be analogous to the Newhouse phenomenon, see [8], and while an infinite lattice is needed, it would still be striking given the complete homogeneity of the phase space in the absence of an external field.
1.5. Plan of the work. In Section 2 we present a setting for our work and develop methods which will be used in the sequel. That part of the work is done in a general setting of Ising models on arbitrary graphs and allows for an external field. The most important result is Proposition 2 which provides a necessary and sufficient condition for Bethe stationary points without reference to message passing, but relying instead on a harmonicity-type condition expressed in terms of the neighbor average function.
In Section 3 the discussion is specialized to the case of a two-dimensional toral lattice and no external field. This leads to the basic formula (8) and the statement of Theorem 3.3 about a connection between the neighbor average function and the arithmetic mean.
In Section 4 we build on those results to construct the diagonal stationary points from Theorem 1.1. This is followed by numerical studies which visualize these points and illustrate difficulties inherent in studying their stability. This section ends with the description of numerical evidence in favor of Finding 1.
The last part of the work is an appendix in which we group calculations which are important for the verification of our work, but somewhat complicated and detached from the main line of the argument.
With the exception of Sections 4.2 and 4.3, which are based on numerical work, rigorous analytic proofs are provided. We tried as much as possible to use the terminology and conventions of [14] in order to make the work accessible to application-oriented readers. A dynamicist will have no difficulty recognizing that "exponential families" are "Gibbs distributions". Perhaps more confusingly, in combinatorial optimization one maximizes things instead of minimizing as is the convention in thermodynamic formalism, hence the reversed signs in several formulas.

2. Ising models in general setting. We work with an undirected graph $G$ with no self-connections. Its nodes will be denoted with small Latin subscripts such as $i$ and the set of all vertices will be denoted with $V$. The edge set $E$ is identified with a subset of unordered pairs of vertices $\{i, j\}$. For each $i$ we consider its neighborhood $N(i) = \{j \in V : \{i, j\} \in E\}$. With each site we associate a variable $\sigma_i$ which takes values $\pm 1$. A vector $s := (s_i)_{i\in V}$, $s_i = \pm 1$, is called a configuration, and we also denote $s_0 = \sum_{\{i,j\}\in E} s_i s_j$.
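The family of exponential distributions [14] parametrized by the canonical parameters $f = (f_i)_{i\in V\cup\{0\}}$ on the set of all configurations is given by
$G_f(s) = \frac{1}{Z(f)}\exp\Big(\sum_{i\in V\cup\{0\}} f_i s_i\Big)$, (1)
where $Z(f) = \sum_s \exp\Big(\sum_{i\in V\cup\{0\}} f_i s_i\Big)$ is the normalizing factor known as the partition function. Given a probability distribution $P$ on the set of configurations the expected values will be written as $\langle\cdot\rangle_P$. Recall that we denote by $\sigma_i$ random variables and by $(s_i)$ a certain configuration. We will also use covariance: $\mathrm{Cov}_P(\sigma_i,\sigma_j) = \langle\sigma_i\sigma_j\rangle_P - \langle\sigma_i\rangle_P\langle\sigma_j\rangle_P$. The following identities are well known for all $i, j \in V\cup\{0\}$ (with $\sigma_0 = \sum_{\{i,j\}\in E}\sigma_i\sigma_j$):
$\frac{\partial \ln Z}{\partial f_i} = \langle\sigma_i\rangle_{G_f}$ (2)
and $\frac{\partial^2 \ln Z}{\partial f_i\,\partial f_j} = \mathrm{Cov}_{G_f}(\sigma_i,\sigma_j)$. These formulas offer a convenient way for expressing the marginals of the exponential distribution, which are a particular case of pseudo-marginals and can be used to express the Bethe functional.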
We rely on the following: Proposition essentially follows from Theorem 3.3 [14], but the difference is that we fixed some of the canonical parameters, which requires an additional argument. The formal proof is given in Appendix A.
Proposition 1 implicitly defines the function given by where f is characterized uniquely by the conditions

2.1. Pseudo-marginals and the Bethe variational problem. We now proceed to define formally pseudo-marginals, which have been informally discussed in the introduction. By a pseudo-marginal we mean a system of positive singleton quantities $(b_i(s_i))_{i\in V}$ and their pairwise counterparts $(b_{ij}(s_i,s_j))_{\{i,j\}\in E}$, subject to the local consistency conditions for all $i \in V$ and $s_i = \pm 1$:
$\sum_{s_i} b_i(s_i) = 1$ and $\sum_{s_j} b_{ij}(s_i,s_j) = b_i(s_i)$ for $j \in N(i)$. (5)
The space of all pseudo-marginals will be denoted with $B$. One can easily see that $B$ can be parametrized by the variables $b_i := b_i(+1)$, $i\in V$, and $b_{ij} := b_{ij}(+1,+1)$, $\{i,j\}\in E$. Given a pseudo-marginal and a set of canonical parameters $(f_i)_{i\in V\cup\{0\}}$ we can define the Bethe functional on $b \in B$:
$H_G(b,f) = \sum_{i\in V}\sum_{s_i} f_i s_i b_i(s_i) + \sum_{\{i,j\}\in E}\sum_{s_i,s_j} b_{ij}(s_i,s_j)\big[f_0 s_i s_j - \ln b_{ij}(s_i,s_j)\big] + \sum_{i\in V}(|N(i)|-1)\sum_{s_i} b_i(s_i)\ln b_i(s_i)$, (6)
where the summation with respect to $s_i, s_j$ is always meant to range over the values $\pm 1$. In the future, we will often use the same letter for a graph and its vertex set, so $H_G$ could also be written as $H_V$. The Bethe Variational Problem discussed above consists in finding the stationary pseudo-marginals $\{(b_i), (b_{ij})\}$ under a given ensemble of canonical parameters $(f_i)$.

Pairwise maximality.
Definition 2.1. A pseudo-marginal is called pairwise maximal, given a set of canonical parameters (f i ), provided that its pairwise components (b ij ) maximize the Bethe functional under its fixed singleton components (b i ).
Finding the maximizing value of $b_{ij}$ given $b_i$, $b_j$ is tantamount to maximizing the Bethe functional in the case of a subgraph $W$ which consists of $i$, $j$ and the edge between them. Since $b_i$, $b_j$ are fixed, the first and third sums in Eq. (6) become irrelevant and hence the problem is solved by taking the marginal of the exponential distribution on $W$ whose singleton means are $2b_i - 1$ and $2b_j - 1$ (cf. Proposition 1). The concept of pairwise maximality was examined in [16], from where an algebraic formula for $f^W$ can be derived. We will not need it and will simply introduce a function $B_{\max}$ defined by $b_{ij} = B_{\max}(b_i, b_j)$ for all edges $\{i,j\}$ and a pairwise maximal pseudo-marginal.
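Pairwise maximality on a single edge can also be computed numerically without the algebraic formula of [16]. The following is a minimal sketch under stated assumptions: the edge table is parametrized by the singleton means $m_i = 2b_i - 1$, $m_j = 2b_j - 1$ and the correlation $\xi = \langle\sigma_i\sigma_j\rangle$, the edge contribution to the functional is taken to be $J\xi$ plus the entropy of the $2\times 2$ table, and the maximizer is located by bisection on the derivative. The function names are hypothetical.

```python
import math

def pair_table(mi, mj, xi):
    # 2x2 pseudo-marginal with singleton means mi, mj and correlation xi.
    return {(si, sj): (1 + mi * si + mj * sj + xi * si * sj) / 4.0
            for si in (-1, 1) for sj in (-1, 1)}

def pairwise_maximal_xi(mi, mj, J, iterations=200):
    # Admissible correlations keep all four table entries positive.
    lo = abs(mi + mj) - 1.0
    hi = 1.0 - abs(mi - mj)
    for _ in range(iterations):
        xi = 0.5 * (lo + hi)
        b = pair_table(mi, mj, xi)
        # Derivative in xi of J*xi + entropy(b); it is strictly decreasing,
        # because the entropy is strictly concave.
        grad = J - 0.25 * math.log(b[1, 1] * b[-1, -1] / (b[1, -1] * b[-1, 1]))
        if grad > 0:
            lo = xi
        else:
            hi = xi
    return 0.5 * (lo + hi)
```

Two sanity checks follow from the stationarity condition $e^{4J} = b_{++}b_{--}/(b_{+-}b_{-+})$: for zero means the maximizing correlation is $\tanh J$, and for $J = 0$ the table factorizes, so $\xi = m_i m_j$.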

The Bethe Variational Problem on a subgraph.
If W is a subgraph V , we will write ∂W for the set of the vertices of W which are connected to a vertex not in W , • there exists a unique set of canonical parameters Proof. The first claim follows by the inspection of formula (6), since all terms which depend on the variables b i , b ij occur in the same form in both H V and H W .
For $b_i$ such that $i \in \partial W$ this is no longer true, since the orders of vertices which occur in the third term of (6) are different and $b_i$ also enters into $b_{ik}(s_i, s_k)$ for $k \notin W$, and such terms are missing in $H_W$. Note that the first term in $H_W$ can be rewritten as $\sum_{i\in W}\sum_{s_i} f_i s_i b_i(s_i)$ and these are the only terms which depend on $f_i$. Hence, $f_i$ for $i \in \partial W$ can be adjusted in a unique way to bring about the equality of partial derivatives which is referred to in the second claim. The third claim when $W = V$ reduces to the well-known fact that for trees the unique stationary point of the Bethe functional is the marginal of the corresponding exponential distribution, see Theorem 4.2(b) in [14]. In the general situation, the third claim follows from this and the second claim for some $f^W$. Since the mean values are $(2b_i - 1)$ for $i \in \partial W$ while at the remaining vertices $f^W_i = f_i$, $f^W$ must be in the form given by Proposition 1.

Neighbor average function and the criterion of stationarity.
Definition 2.3. For every i ∈ V consider the star graph C i which consists of i, its neighbors from N (i) and exactly the edges in the form {i, j}, j ∈ N (i).

Definition 2.4. Define the neighbor average function
Sometimes f 0 will be skipped when its value is fixed and not important in the context.
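The neighbor average function can be evaluated numerically on a star graph with four leaves. The sketch below is an illustration under stated assumptions, not the paper's formula: given the leaf means, the coupling $J = f_0$ and the field `f_c` at the center, it searches for leaf fields that reproduce the prescribed leaf means in the exponential distribution on the star, and returns the resulting mean at the center. It uses nested bisection; all function names are hypothetical.

```python
import itertools
import math

def leaf_field(m_leaf, m_center, J, iterations=80):
    # Field f on a leaf giving it mean m_leaf when the center has mean m_center.
    p_plus = 0.5 * (1.0 + m_center)
    lo, hi = -30.0, 30.0
    for _ in range(iterations):
        f = 0.5 * (lo + hi)
        mean = p_plus * math.tanh(f + J) + (1.0 - p_plus) * math.tanh(f - J)
        if mean < m_leaf:
            lo = f
        else:
            hi = f
    return 0.5 * (lo + hi)

def center_mean(fields, J, f_c=0.0):
    # Mean at the center after summing out the leaves.
    plus = math.exp(f_c) * math.prod(math.cosh(J + f) for f in fields)
    minus = math.exp(-f_c) * math.prod(math.cosh(J - f) for f in fields)
    return (plus - minus) / (plus + minus)

def neighbor_average(leaf_means, J, f_c=0.0, iterations=80):
    # Outer bisection on the center mean: a consistent value is one that is
    # reproduced by the leaf fields it induces.
    lo, hi = -1.0, 1.0
    for _ in range(iterations):
        m_c = 0.5 * (lo + hi)
        fields = [leaf_field(m, m_c, J) for m in leaf_means]
        if center_mean(fields, J, f_c) > m_c:
            lo = m_c
        else:
            hi = m_c
    return 0.5 * (lo + hi)

def star_means_bruteforce(fields, J, f_c=0.0):
    # Exact means on the star by enumeration; index 0 is the center.
    Z, sums = 0.0, [0.0] * (len(fields) + 1)
    for s in itertools.product((-1, 1), repeat=len(fields) + 1):
        U = J * s[0] * sum(s[1:]) + f_c * s[0] + sum(f * x for f, x in zip(fields, s[1:]))
        w = math.exp(U)
        Z += w
        for k, x in enumerate(s):
            sums[k] += w * x
    return [x / Z for x in sums]
```

A simple consistency check is to pick leaf fields, compute all means by enumeration, and confirm that `neighbor_average` recovers the center mean from the leaf means alone. The bisection bracket of $\pm 30$ for the leaf fields is an arbitrary cutoff that is adequate unless the leaf means are extremely close to $\pm 1$.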
The neighbor average function provides a local condition for stationarity among pairwise maximal pseudo-marginals, as shown by the following properties.
Lemma 2.5. For any star subgraph $C_i$, any choice of canonical parameters as well as a vector $(m_j)_{j\in N(i)}$ with $m_j \in (-1, 1)$ for all $j$, the marginal of the exponential distribution $G_{\Psi_{N(i)\cup\{i\},N(i)}((m_j),(f_0,f_i))}$ is the unique conditional stationary point and a maximizer of $H_{C_i}(b, (f_j))$ in the set of pseudo-marginals on the star graph $C_i$ whose singleton means on $N(i)$ agree with $(m_j)$.
Proof. Since C i is a star graph and thus a tree, by the fact already used in the proof of the third claim of Lemma 2.2 the marginal of the exponential distribution is the

The difference is
We get a Corollary: Corollary 1. Among all pairwise maximal pseudo-marginals (cf. Definition 2.1) on V with singleton means (m j ) fixed except for j = i, the unique maximizer of the Bethe functional is given by the condition This follows from Lemma 2.5 since H V is the sum of H Ci and terms which are There is a simple characterization of stationary points of the Bethe functional in terms of the neighbor average function.
So, stationary points of the Bethe functional are effectively characterized only through their singleton components.
Proof. First assume that $b$ is stationary. Then the third claim of Lemma 2.2 can be applied to all subgraphs which are trees. For a subgraph which consists of a pair of vertices and an edge between them, this reduces to the condition of pairwise maximality. For a star graph $C_i$ the vector of canonical parameters $f^{C_i}$ mentioned in the third claim of Lemma 2.2 is the same as $F$ in Definition 2.4. Hence the marginals are the same, which implies the equality of the claim of Proposition 2.
To prove the opposite implication, suppose that b is not stationary. Then there is an arc of pseudo-marginals b(t), with b(0) = b such that Replace b(t) with an arc of pairwise maximal pseudo-marginals (for proof of the differentiability of b max cf. [16]), namely In other words, we keep the singleton components from the original b(t) and adjust the pairwise components to gain pairwise maximality. By definition, for all t and hence d dt What we have proved is that b is also non-stationary on the submanifold of pairwise maximal pseudo-marginals. In other words, if we define For that site i consider an arc of pairwise maximal pseudo-marginals which varies only b i (t) and leaves all other (b j ) j =i fixed. Then the only b ij (t) that vary are those which correspond to the edges of V which come out of i. From the first claim of Lemma 2.2 we conclude that b restricted to the star graph C i is not a stationary point of H Ci .
On the other hand, we are assuming the equality which means that it implies the same singleton means as G F from Definition 2.4. The pairwise components (b ij ) j∈N (i) are also maximizing since b was pairwise maximal. Thus, b restricted to the star graph C i is the actual marginal of G F . Then it is stationary for H Ci by the Lemma 2.5. This contradiction ends the proof. Proof. Pick a j ∈ N (i) \ {k}. Using conditional expectations Furthermore, it is known that conditional expectations of exponential distributions on a subgraph W are again exponential with the conditions playing the role of boundary conditions on ∂W . In this case, W is just the edge from i to j and the exponential distribution is readily computed to give Combining these formulas we see that with m j , m i , f 0 fixed the only variable is F(m) j which must therefore be fixed as well.
Let G f be any exponential distribution with the vector f of canonical parameters. As we have already observed, for any i ∈ V This is always non-negative when f 0 ≥ 0 as a consequence of the Fortuin-Kasteleyn-Ginibre (FKG) inequality, see [4]. As a corollary, we get the following Fact which we refer to as the monotonicity of expectations: Fact 1 leads to a proposition concerning the monotonicity of the neighbor average functions.
where the final inequality follows from the monotonicity of A with respect to the last variable, which has been proven at the beginning. This ends the proof of Proposition 3.
3. Special case of the Ising model. In this section we will consider the Ising model on a torus without external field.
We will chiefly be interested in the lattice made into a graph by assuming connections between the closest neighbors. This will serve as graph $V$ from the general setting. Furthermore, the potential takes the simple form $U(s) = J\sum_{\{i,j\}\in E} s_i s_j$. In relation to the general setting, the canonical parameter $f_0 = J$ and all others are 0.
3.1. Exponential distribution of the cross subgraph.
3.1.1. The partition function. We will now establish certain properties of the neighbor average function by direct calculations. Star graphs (cf. Definition 2.3) $C_i$ are isomorphic for all $i$ and we will refer to such a graph as a cross. We will label its vertices $c$ for the central one and 1 through 4 for the others. Consider the partition function of the Gibbs distribution on the cross, $Z = \sum_{(s_i)} \exp U(J, (s_i))$ with $U(J, (s_i)) = J s_c \sum_{i=1}^{4} s_i + \sum_i f_i s_i$, where $i = c, 1, 2, 3, 4$. Note the presence of the additional linear term in the energy in spite of our hypothesis of no external field.
Proof. The proof is by inspection. The factor 16 corresponds to removing 1/2 from the hyperbolic cosines. Then, after multiplying out, on the right-hand side one gets the sum of 32 exponentials and each of them is $\exp(U(J, (s_i)))$ for a particular configuration $(s_i)_{i=c,1,\dots,4}$. To observe this it helps to notice that the terms resulting from the first product correspond exactly to those configurations with $s_c = +1$.
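The counting argument in this proof can be replayed numerically. The closed form used below, $Z = 16\big[e^{f_c}\prod_{i=1}^{4}\cosh(f_i + J) + e^{-f_c}\prod_{i=1}^{4}\cosh(f_i - J)\big]$, is an assumption reconstructed from the description in the proof (a factor 16, two products of hyperbolic cosines, the first one collecting the configurations with $s_c = +1$); it is compared with a direct sum over the 32 configurations of the cross. Function names are hypothetical.

```python
import itertools
import math

def cross_partition_bruteforce(J, f):
    # f = (f_c, f_1, f_2, f_3, f_4); the spins are ordered the same way.
    total = 0.0
    for s in itertools.product((-1, 1), repeat=5):
        U = J * s[0] * sum(s[1:]) + sum(fi * si for fi, si in zip(f, s))
        total += math.exp(U)
    return total

def cross_partition_closed(J, f):
    # Assumed closed form: sum out the four leaves for each value of s_c.
    f_c, leaves = f[0], f[1:]
    plus = math.exp(f_c) * math.prod(math.cosh(fi + J) for fi in leaves)
    minus = math.exp(-f_c) * math.prod(math.cosh(fi - J) for fi in leaves)
    return 16.0 * (plus + minus)
```

The agreement of the two functions for arbitrary parameters reflects the fact that, once $s_c$ is fixed, the sum over each leaf spin contributes an independent factor $2\cosh(f_i + J s_c)$.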

Symmetric form of the means on the cross.
To simplify further calculations, we assume that m 1 = m 2 and m 3 = m 4 . From Proposition 1, we get (J, 0)) . Note that f c is always 0 since we will be looking for stationary points with 0 canonical parameters.
Notice that $f_1 = f_2$ and $f_3 = f_4$. Indeed, if $f_1 \neq f_2$, then we consider the canonical parameters $f'_1 = f_2$, $f'_2 = f_1$, $f'_3 = f_3$, $f'_4 = f_4$ and denote this set of canonical parameters by $f'$. By symmetry, the means at vertices 1 and 2 under $G_{f'}$ are those under $G_f$ with the two vertices exchanged. But since $m_1 = m_2$, this is the same as the original set of $\langle\sigma_i\rangle_{G_f}$, and $f_2 = f'_1 = f_1$ from the uniqueness in Proposition 1. We will write $m = m_1$, $M = m_3$ and $f = f_1$, $F = f_3$. Also, adopt the notations $c_\pm = \cosh(J \pm f)$ and $C_\pm = \cosh(J \pm F)$. In the future, we will also use $s_\pm$, $S_\pm$, $t_\pm$, $T_\pm$, replacing cosh with sinh and tanh, respectively. Then,

Expected values.
As it is well known (cf. Eq. (2)) expected values with respect to the exponential distribution are obtained by , for i = c, 1, · · · , 4. Using our notations this leads to Here Now recall the formula When we use this to transform C − c − and C + c + , the result is Taking into account the identity we arrive at where in getting to the final term we have also used the identity sinh(2γ) = 2 cosh(γ) sinh(γ).

3.2. Consequences of formula (8). We proceed to derive properties of the neighbor average function. The claim of Theorem 3.3 is equivalent to showing that the second difference $\Delta_2(J, f, F)$ is non-negative or zero, respectively. That is given explicitly by formula (8). The only difficulty is that the formula is given in terms of the canonical parameters $f, F$, while the hypothesis of the Theorem involves the means $m, M$. The proof of Theorem 3.3 therefore contains some amount of detail not important for the main line of the paper and can be found in the Appendix.
holds and m i = 2b i − 1 are the singleton means.
Observe that Theorem 1.1 then follows in the light of Proposition 2.

Construction.
We construct a pseudo-marginal which is pairwise maximal and hence defined by its singleton means. The largest positive mean $t_0$ is taken on some diagonal, then a smaller but still positive mean $t_1$ on both adjacent diagonals, then an even smaller $t_2$ on the two diagonals one more step away from the initial one, and so on until for some $k$ we have $t_k = 0$. The construction is illustrated in Fig. 1. The neighbor average condition from Proposition 4 is equivalent to
$t_\ell = A(t_{\ell-1}, t_{\ell-1}, t_{\ell+1}, t_{\ell+1}, 0)$ (9)
for $0 < \ell < k$. Recall that the final zero argument of $A$ corresponds to the fact that there is no external field. If these conditions can be satisfied, then the means are further defined by symmetry. First, we set $t_{k+\ell} = -t_{k-\ell}$ for $0 < \ell \leq k$. This results in $t_{k+1} = -t_{k-1}$, which means that (9) also holds for $\ell = k$. Also, $t_{2k} = -t_0$ and (9) holds for $k < \ell < 2k$ by symmetry. Finally, $t_{2k+\ell} = -t_\ell$ for $0 \leq \ell < 2k$, which leads to $t_{4k-1} = -t_{2k-1} = t_1$, consistent with our original setting. The neighbor average conditions are then automatically satisfied and so Proposition 4 will follow.
4.1.1. The possibility of satisfying (9). It remains to show that conditions (9) can be satisfied for any $J > J_c$ with some $k$. Pick $J$ so that $\tanh(J) > \frac{1}{3}$, specify $t_1 \in (0, \hat{m}(J))$ and set $t_0 = A(t_1, t_1, t_1, t_1, 0)$.
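The symmetry extension described in the construction is purely combinatorial and can be sketched independently of how the values $t_0, \dots, t_k$ are obtained. In the sketch below the input profile is a made-up placeholder (it is not a solution of the neighbor average conditions), the diagonal index of a site $(x, y)$ is taken to be $x + y$ modulo $4k$, and the function names are hypothetical.

```python
def extend_profile(t_half):
    # t_half = [t_0, ..., t_k] with t_k == 0; returns [t_0, ..., t_{4k-1}].
    k = len(t_half) - 1
    if t_half[k] != 0:
        raise ValueError("the last entry t_k must be 0")
    t = list(t_half) + [0.0] * (3 * k - 1)
    for l in range(1, k + 1):
        t[k + l] = -t[k - l]      # t_{k+l} = -t_{k-l} for 0 < l <= k
    for l in range(0, 2 * k):
        t[2 * k + l] = -t[l]      # t_{2k+l} = -t_l for 0 <= l < 2k
    return t

def diagonal_means(t_full, n):
    # Means on the n x n torus, constant along the diagonals x + y = const.
    period = len(t_full)
    if n % period != 0:
        raise ValueError("n must be a multiple of the period 4k")
    return [[t_full[(x + y) % period] for y in range(n)] for x in range(n)]

# Placeholder profile with k = 3, for illustration only.
t_full = extend_profile([0.8, 0.6, 0.3, 0.0])
lattice = diagonal_means(t_full, 12)
```

With this indexing every site on diagonal $\ell$ has two neighbors on diagonal $\ell - 1$ and two on diagonal $\ell + 1$, which is why the stationarity condition only involves $t_{\ell-1}$, $t_\ell$ and $t_{\ell+1}$. The period of the extended profile is $4k$, so the lattice size must be a multiple of $4k$.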
Further construction depends on the following Lemma.
A visualization of this diagonal stationary point of the Bethe functional is presented in Fig. 1.
Eq. (11) shows a vector of means, constant on diagonals of $T_{16}$, which according to Propositions 2 and 4 is a stationary point of the Bethe functional. One can also check this numerically. For this purpose let $B_0$ be the original vector of means $(t_\ell)$ (see Eq. (11)), and $B_\eta$ the following vector where $t_\ell$ are given by Eq. (11), $X_\ell$ are positive numbers drawn independently from the uniform distribution on the interval $[0, 1]$ and $\eta$ is a real parameter. For $B_\eta$ constructed in this way one can calculate the Bethe functional from Eq. (6) according to the following rules: the singleton components are obtained from the means, and the pairwise components are then determined from the requirement of pairwise maximality. Calculated values of the Bethe functional for $B_\eta$ as a function of $\eta$ are presented in Fig. 2. Fig. 2 may misleadingly suggest that the diagonal stationary point is a local maximum. By the method of Section C one can show that this is not the case. This shows the danger of trying to determine the type of a stationary point by random experimentation: not one of our perturbations detected the non-definiteness of its Hessian. Only after the computations described in Section C could one find a direction $P$ along which the stationary point is unstable. Values of the negative Bethe functional calculated in the direction of $P$ are presented in Fig. 3. So, while it is much more likely to find a stable direction, as illustrated by Fig. 2, Fig. 3 shows that at least one unstable direction given by $P$ also exists.
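A random-perturbation experiment of this kind can be sketched as follows. Several points are assumptions of the sketch rather than statements from the text: the Bethe functional is taken in the standard form for zero external field on the torus (edge terms $J\xi_{ij}$ plus edge entropies, minus three times the site entropies, since every site has four neighbors), the perturbed vector is taken to be the base vector plus $\eta X$, and the base point is the zero-mean stationary point at $J = 0.1 < J_c$ rather than the values of Eq. (11), which are not reproduced here. All names are hypothetical.

```python
import math
import random

def pair_term(mi, mj, J, iterations=100):
    # max over xi of J*xi + entropy of the edge table (pairwise maximality).
    lo, hi = abs(mi + mj) - 1.0, 1.0 - abs(mi - mj)
    for _ in range(iterations):
        xi = 0.5 * (lo + hi)
        pp, mm = 1 + mi + mj + xi, 1 - mi - mj + xi
        pm, mp = 1 + mi - mj - xi, 1 - mi + mj - xi
        if J - 0.25 * math.log(pp * mm / (pm * mp)) > 0:
            lo = xi
        else:
            hi = xi
    xi = 0.5 * (lo + hi)
    table = [(1 + mi + mj + xi) / 4, (1 - mi - mj + xi) / 4,
             (1 + mi - mj - xi) / 4, (1 - mi + mj - xi) / 4]
    return J * xi - sum(p * math.log(p) for p in table)

def site_entropy(m):
    return -sum(p * math.log(p) for p in ((1 + m) / 2, (1 - m) / 2))

def bethe_functional(means, J):
    # Assumes n >= 3 so that "right" and "down" list every edge exactly once.
    n = len(means)
    total = 0.0
    for x in range(n):
        for y in range(n):
            m = means[x][y]
            total -= 3.0 * site_entropy(m)
            for other in (means[(x + 1) % n][y], means[x][(y + 1) % n]):
                total += pair_term(m, other, J)
    return total

random.seed(1)
n, J = 4, 0.1
X = [[random.random() for _ in range(n)] for _ in range(n)]

def perturbed(eta):
    # Zero-mean base point plus eta times a fixed random positive vector.
    return [[eta * X[x][y] for y in range(n)] for x in range(n)]

H0 = bethe_functional(perturbed(0.0), J)
H_values = {eta: bethe_functional(perturbed(eta), J) for eta in (-0.05, -0.02, 0.02, 0.05)}
```

In this toy setting every perturbation lowers the functional, consistent with a local maximum. As the text stresses, for a saddle such a test along random directions can look just as convincing, so it cannot replace an analysis of the Hessian.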

4.3. Existence of a local maximum of diagonal type. Let us consider the evidence for Finding 1. To begin with, we provide the listing of the means of a diagonal stationary point (cf. Eq. 13) which was found by the methods of Section 4.1 for $J = 0.5$ and $k = 25$. Unlike the example from Section 4.2 (given by Eq. 11), this stationary point is stable. The stability of the stationary point given by Eq. (13) was verified using two numerically-based methods. Firstly, we checked that the negative Hessian matrix of the Bethe functional is positive definite by the method of Section C. The resulting value of $\chi$ was $\chi = 9.9956 \cdot 10^{-9} > 0$, which implies stability of the stationary point given by Eq. (13). Secondly, we checked that after a small perturbation (in this case it was a positive "bump" $\varepsilon = 10^{-5}$ added to all terms of Eq. (13)) the pseudo-marginal returns to its non-perturbed form under the iterated neighbor average function. This test, which is explained in Fig. 5 in an algorithmic form, is very sensitive, because it breaks the symmetry of sign flipping. In the absence of stability this should lead to convergence to the constant-mean positive solution $\hat{m}(J)$. That happens for the saddle point given by Eq. (11).
Notice that by Corollary 1 replacing the mean at the center of any cross by the value of its neighbor average function increases the value of the Bethe functional, so local maxima attract their neighborhoods under the iteration of this procedure. On the other hand, for a saddle convergence could only occur if the initial point was chosen on the stable invariant manifold under the scheme, which has probability 0.
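The iteration described here can be sketched on a small torus. The following is an illustration under stated assumptions and is not the procedure of Fig. 5: the neighbor average is evaluated numerically by nested bisection on the cross with zero field at the center, the sites are updated one at a time in a fixed raster order, and the demonstration uses $J = 0.1$, below the critical coupling, where by Theorem 1.1 the only stationary point has zero means. All names are hypothetical.

```python
import math
import random

def neighbor_average(leaf_means, J, iterations=50):
    # Center mean of the star whose leaf means are prescribed (center field 0).
    def leaf_field(m_leaf, m_c):
        p = 0.5 * (1.0 + m_c)
        lo, hi = -30.0, 30.0
        for _ in range(iterations):
            f = 0.5 * (lo + hi)
            if p * math.tanh(f + J) + (1.0 - p) * math.tanh(f - J) < m_leaf:
                lo = f
            else:
                hi = f
        return 0.5 * (lo + hi)

    lo, hi = -1.0, 1.0
    for _ in range(iterations):
        m_c = 0.5 * (lo + hi)
        fields = [leaf_field(m, m_c) for m in leaf_means]
        plus = math.prod(math.cosh(J + f) for f in fields)
        minus = math.prod(math.cosh(J - f) for f in fields)
        if (plus - minus) / (plus + minus) > m_c:
            lo = m_c
        else:
            hi = m_c
    return 0.5 * (lo + hi)

def sweep(means, J):
    # Replace each mean, in place, by the neighbor average of its four neighbors.
    n = len(means)
    for x in range(n):
        for y in range(n):
            nbrs = [means[(x + 1) % n][y], means[(x - 1) % n][y],
                    means[x][(y + 1) % n], means[x][(y - 1) % n]]
            means[x][y] = neighbor_average(nbrs, J)
    return means

random.seed(0)
means = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(4)]
for _ in range(8):
    sweep(means, 0.1)
largest = max(abs(m) for row in means for m in row)
```

Below the critical coupling the means contract towards zero. The same loop, started from a slightly perturbed stationary point above the critical coupling, is the kind of test the text uses to distinguish a local maximum from a saddle; it is considerably slower on a $100 \times 100$ lattice in pure Python.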
where P ranges over all probability distributions on the set of configurations. As a consequence of Theorem 3.3 of [14] we get the claim of Proposition 1 in the situation when V = V ∪ {0} and provided that m belongs to the interior of the correlation polytope M.
Consider the set First observe that W is non-empty. This follows from the Theorem quoted above once we notice that the vector m can be extended bŷ so that m ∪m ∈ M o , where M o means the interior of the correlation polytope. This can be realized by a Bernoulli distribution.
The condition σ j G f ∪f = m j for j ∈ V can be seen as a system of equations with variables f andf . By the implicit function theorem it can be locally solved forf in terms of f provided that the matrix is non-singular, where j, ∈ V . That is the case for the covariance matrix in the Ising model. Thus, W is open and one can also conclude that the map f → m is a local diffeomorphism. Suppose thatf ∞ ∈ ∂W and letf n be a sequence from W which converges tof ∞ , whilef n are the corresponding vectors supported on V for which the condition which defines W holds. Observe that for every j ∈ V the sequence (f j,n ) n∈N is bounded. Indeed, the canonical parameters f 0,n form a convergent sequence and are bounded, which means that the coupling between different nodes in the Ising model remains bounded. In that case we can see from formula (1) that f j,n tending to +∞, or −∞, would result in σ j G fn tending to 1, −1, resp., contrary to the fact that this quantity remains fixed at the value of m j . Without loss of generality, then, f j,n → f j,∞ for all j ∈ V . Writing f ∞ = (f j,∞ ) j∈V and f ∞ = f ∞ ∪f ∞ , by continuity we conclude that Since W is non-empty, open and closed it must be all of R |V |−|V |+1 . This proves the existence part of Proposition 1.
To see uniqueness, fixf and recall the observation that f → m is a local diffeomorphism. It is also proper, that is the pre-image of any compact set in (−1, 1) |V | is compact. This again follows from the observation that with f 0 fixed, f j tending to ±∞ implies σ j G f tends to ±1. By topology, a local homeomorphism which is proper and onto a simply-connected space is a global homeomorphism.
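Proposition 1 can be checked numerically on the cross. The sketch below is an illustration under stated assumptions, not the method of the proof: it fixes the coupling J and a zero field at the center, and recovers the four leaf parameters from prescribed leaf means by gradient ascent on the concave function $f \mapsto \sum_i m_i f_i - \log Z(f)$, whose gradient is the difference between the prescribed and the current means. The step 1/4 is used because the largest eigenvalue of the 4×4 covariance matrix is at most its trace, which is at most 4. The function names and the test values are hypothetical.

```python
import math
from itertools import product

# All 32 spin configurations (s0, s1, s2, s3, s4) of the cross.
CONFIGS = list(product((-1, 1), repeat=5))


def leaf_means(J, f):
    """Means of the four leaf spins for the cross with coupling J,
    zero field at the center and leaf fields f[0..3]."""
    Z = 0.0
    sums = [0.0] * 4
    for s0, *leaves in CONFIGS:
        exponent = sum((J * s0 + fi) * si for fi, si in zip(f, leaves))
        w = math.exp(exponent)
        Z += w
        for i, si in enumerate(leaves):
            sums[i] += w * si
    return [s / Z for s in sums]


def solve_leaf_fields(J, m, max_iter=5000, tol=1e-13):
    """Find f with leaf_means(J, f) close to m by gradient ascent, step 1/4."""
    f = [0.0] * 4
    for _ in range(max_iter):
        residual = [mi - ci for mi, ci in zip(m, leaf_means(J, f))]
        if max(abs(r) for r in residual) < tol:
            break
        f = [fi + 0.25 * r for fi, r in zip(f, residual)]
    return f


if __name__ == "__main__":
    J = 0.5
    f_true = [0.3, -0.2, 0.5, 0.1]
    m = leaf_means(J, f_true)          # means produced by known parameters
    f_rec = solve_leaf_fields(J, m)    # parameters recovered from the means
    print(max(abs(a - b) for a, b in zip(f_true, f_rec)))  # recovery error
```

Recovering the same parameters from their own means is consistent with the uniqueness claim; it does not replace the topological argument, which covers all means in the open cube at once.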
Lemma B.2. If 0 < m ≤ m(J) and m = M, then inequality (14) holds, with equality only when m = m(J).
The proof of inequality (14) in the general setting of Theorem 3.3 is based on the next proposition. We postpone the proof of Proposition 5 and finish the proof of Theorem 3.3. Given m, M which satisfy its hypothesis, we fix m ≤ M for definiteness. Inequality (14) holds when m := M. For a smaller value of m, the term δ becomes non-zero, so that cosh(δ) > 1, while γ does not increase according to Proposition 5, so a strict inequality follows.
Concerning the conditions for equality in Theorem 3.3, one has to have either sinh(γ) = 0 in formula (8)
B.0.1. Proof of Proposition 5. We return to the general setting of the exponential distribution on the cross, with canonical parameters $f = (J, 0, f_i)$, $i = 1, \dots, 4$, and means $m_i = \langle \sigma_i \rangle_{G_f}$, $i = c, 1, \dots, 4$. We will be interested in the derivatives $\frac{\partial \langle \sigma_i \rangle_{G_f}}{\partial f_j}$.
Lemma B.3. For every $1 \le i, j \le 4$ and any $f$, and the covariance matrix is symmetric.
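The derivatives of the means with respect to the canonical parameters can be examined numerically. For an exponential family, the derivative of a mean with respect to a canonical parameter equals the corresponding covariance, $\langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle$, which is symmetric in i and j; the sketch assumes this standard identity is the one relevant here. The Python code below compares central finite differences with the covariance obtained by enumerating the 32 configurations of the cross. The parameter values and function names are illustrative assumptions.

```python
import math
from itertools import product

# All 32 spin configurations (s0, s1, s2, s3, s4) of the cross.
CONFIGS = list(product((-1, 1), repeat=5))


def moments(J, f):
    """Return (means, covariance matrix) of the four leaf spins for the
    cross with coupling J, zero center field and leaf fields f[0..3]."""
    Z = 0.0
    first = [0.0] * 4
    second = [[0.0] * 4 for _ in range(4)]
    for s0, *leaves in CONFIGS:
        w = math.exp(sum((J * s0 + fi) * si for fi, si in zip(f, leaves)))
        Z += w
        for i in range(4):
            first[i] += w * leaves[i]
            for j in range(4):
                second[i][j] += w * leaves[i] * leaves[j]
    means = [x / Z for x in first]
    cov = [[second[i][j] / Z - means[i] * means[j] for j in range(4)]
           for i in range(4)]
    return means, cov


def finite_difference_jacobian(J, f, h=1e-5):
    """Central-difference approximation of d<sigma_i>/d f_j."""
    jac = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        up = list(f)
        down = list(f)
        up[j] += h
        down[j] -= h
        m_up, _ = moments(J, up)
        m_down, _ = moments(J, down)
        for i in range(4):
            jac[i][j] = (m_up[i] - m_down[i]) / (2 * h)
    return jac


if __name__ == "__main__":
    J, f = 0.5, [0.3, -0.2, 0.5, 0.1]
    _, cov = moments(J, f)
    jac = finite_difference_jacobian(J, f)
    err = max(abs(jac[i][j] - cov[i][j]) for i in range(4) for j in range(4))
    print(err)  # discrepancy between finite differences and covariances
```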
Lemma B.4. If $F \ge f \ge 0$ and $q = -2, 0$, then $\frac{\partial p_q(\phi)}{\partial F} < 0$,
Proof. To see the first claim, note that for all configurations in $E_q$ the value of depends on $F$ through the term $F(\sigma_3 + \sigma_4) = Fq$. Hence, if $q \le 0$, then $p_q(\phi)$ has the form $g(f, F)/Z(J, 0, f, f, F, F)$ where $\frac{\partial g(f, F)}{\partial F} \le 0$. On the other hand, by the monotonicity of expectations, since all canonical parameters are non-negative. As for the second claim, the conditional exponential distribution is given by the exponential distribution on the sub-graph which consists of nodes c, 1, 2 with canonical parameters 0, f, f, respectively, and again the expected value of $\sigma_0$ is non-negative since $f \ge 0$.
Proof. We start by observing that Similarly, m(ϕ) = q∈{−2,0,2} Now the conditional expectations with respect to E q no longer depend on F , so Subtracting, we get From Lemma B.4 and the monotonicity of expectations all three terms are seen to be non-negative.
Proof. In terms of the partial derivatives with respect to the canonical parameters $f_i$, we can write Gϕ ∂f 2 . In view of Lemmas B.6 and B.5, $\frac{dF}{df} < -1$. Hence, the sum $f + M(f)$ decreases to its final value $\gamma(M)$, which proves Proposition 5.
It is important to observe that v(α) and c(α, α + 1) are always positive.
From here we first infer that all $u_{\alpha,\beta}$ are strictly positive. To this end choose $\alpha_0$ so that the diagonal set $\{u_{\alpha_0,\beta}\}_{\beta \in \mathbb{Z}_n}$ contains the largest possible number of zero elements. Since the $c(\alpha, \alpha+1)$ are all positive and the $u_{\alpha,\beta}$ non-negative, condition (19) implies that $u_{\alpha,\beta}$, $u_{\alpha,\beta+1}$ are both 0 (ditto for the $\alpha - 1$ diagonal, but we shall not need it). Thus, the $\alpha + 1$ diagonal set contains more zero elements, unless all or none of the $u_{\alpha_0,\beta}$ were 0. All zeros are not possible, since that would violate the constraint, so none were 0.
Notice that we now also know that λ > 0.