A NEW INTERPRETATION OF THE PROGRESSIVE HEDGING ALGORITHM FOR MULTISTAGE STOCHASTIC MINIMIZATION PROBLEMS

Abstract. The progressive hedging algorithm of Rockafellar and Wets for multistage stochastic programming problems can be viewed as a two-block alternating direction method of multipliers. This correspondence yields several useful results. In particular, it provides a new proof of convergence for the progressive hedging algorithm with flexibility in the selection of primal and dual step lengths, and it leads to a new progressive hedging algorithm for solving risk-averse stochastic optimization problems with cross constraints.


1. Introduction. Recently, Rockafellar and Wets developed a theoretical framework for the multistage stochastic variational inequality (SVI), which includes the multistage stochastic optimization (MSO) program as a special case. Their progressive hedging algorithm (PHA) [9], developed in 1991, serves as a major computational tool in this framework. In the paper of Rockafellar and Sun [8], the PHA was extended to solve monotone SVIs, where the proof of convergence depends on interpreting the PHA as a proximal point method of Moreau and Rockafellar [4, 5] applied to a partial inverse, in the sense of Spingarn [11], of a certain monotone mapping. This paper provides an alternative proof of convergence for the PHA by interpreting it, when applied to MSO problems, as a two-block alternating direction method of multipliers (ADMM). This interpretation provides flexibility in choosing step lengths, particularly in the dual step, for the PHA. In addition, the correspondence between the PHA and the ADMM can widen the applicability of the PHA. For instance, if the stochastic optimization problem includes constraints that are not decomposable with respect to the scenarios, then the original PHA can be modified by considering an ADMM, as we do in Section 3 of this paper.

This paper is organized as follows. In Section 2, we establish the correspondence between the PHA and a two-block ADMM after a general introduction to the PHA and its background. In Section 3, we consider an application of the PHA in risk-averse stochastic optimization and reduce this problem to a multistage stochastic program with cross constraints. We show that this new problem can be solved by a new progressive hedging scheme, whose convergence we establish by invoking results on the ADMM. The paper concludes in Section 4 with a few remarks.
2. PHA for multistage stochastic optimization. Consider a finite set Ξ of scenarios ξ = (ξ_1, . . . , ξ_K) ∈ R^{m_1} × · · · × R^{m_K} =: R^m, composed of elements ξ_k that are regarded as being revealed sequentially in K stages. Each scenario ξ has a known probability p(ξ) > 0, and these probabilities add to one; in this way Ξ is a probability space. Our attention is directed to mappings that designate responses to the scenarios in Ξ in the notation

    x(·) : ξ ↦ x(ξ) = (x_1(ξ), . . . , x_K(ξ)) ∈ R^{n_1} × · · · × R^{n_K} =: R^n.

The linear space L^2 consisting of all such functions x(·) from Ξ to R^n is given the expectation inner product

    ⟨x(·), y(·)⟩ = E_ξ[x(ξ)^T y(ξ)] = Σ_{ξ∈Ξ} p(ξ) x(ξ)^T y(ξ),

which makes it into a finite-dimensional Hilbert space, where "T" stands for the transpose. Note that by x(·) we mean a function (mapping) from Ξ to R^n, while we use x(ξ) to represent the image of x(·) for the realization ξ. Our real interest centers on mappings x(·) that are nonanticipative in the sense that the response x_k(ξ) at stage k depends only on the portion (ξ_1, . . . , ξ_{k−1}) of the scenario ξ realized in earlier stages. That is, under nonanticipativity,

    x(ξ) = (x_1, x_2(ξ_1), x_3(ξ_1, ξ_2), . . . , x_K(ξ_1, ξ_2, . . . , ξ_{K−1})).
We capture this condition as a linear constraint by requiring x(·) to belong to the nonanticipativity subspace N of L^2,

    N = { x(·) ∈ L^2 : x_k(ξ) does not depend on ξ_k, . . . , ξ_K }.
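Since the projection onto N is central to what follows, a minimal sketch may help. Everything below (the two-stage data and the function name) is our own illustration, not code from the paper: the projection replaces each stage-k component by its conditional expectation over the scenarios sharing the same history, which for a first-stage component is simply the plain expectation.

```python
# Sketch (our illustration): projection of x(.) onto the
# nonanticipativity subspace N for a two-stage problem with a
# scenario-independent first-stage decision x1.

def project_onto_N(scenarios):
    """scenarios: list of (p, x1, x2); returns projected copies."""
    # Stage 1: all scenarios share the empty history, so the
    # projection replaces x1 by its plain expectation.
    x1_bar = sum(p * x1 for p, x1, _ in scenarios)
    # Stage 2: x2 may depend on xi_1, so it is left unchanged here
    # (each scenario forms its own bundle in this toy example).
    return [(p, x1_bar, x2) for p, _, x2 in scenarios]

scenarios = [(0.5, 1.0, 3.0), (0.5, 3.0, 5.0)]
projected = project_onto_N(scenarios)
# Stage-1 component becomes 0.5*1.0 + 0.5*3.0 = 2.0 in both scenarios.
```

In a problem with more stages, the same averaging is applied bundle by bundle at every stage, which is exactly the conditional-expectation structure of P_N.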
In our modeling framework x(·) arises as a feasible solution to the following MSO problem:

(1)    minimize E_ξ[g(x(ξ), ξ)] subject to x(ξ) ∈ C_ξ for all ξ ∈ Ξ, x(·) ∈ N,

where g(x(ξ), ξ) is a finite-valued, convex and smooth function of x(ξ) for all ξ ∈ Ξ and C_ξ is the feasible set for x(ξ). In order not to confuse it with the true feasible set of (1) imposed on x(·), we call C_ξ the admissible set of x(ξ); it is assumed to be convex and closed for all ξ ∈ Ξ. Let us denote

    C = { x(·) ∈ L^2 : x(ξ) ∈ C_ξ for all ξ ∈ Ξ }.

Then Problem (1) can be written as an optimization model in L^2:

(2)    minimize G(x(·)) := E_ξ[g(x(ξ), ξ)] subject to x(·) ∈ C ∩ N.

The optimality condition, both necessary and sufficient, for problem (2) can be written as an SVI in the sense of Rockafellar and Wets [10]:

(3)    −∇G(x(·)) ∈ N_{C∩N}(x(·)),

where N_S stands for the normal cone to a set S. Under certain constraint qualifications, e.g., ri C ∩ N ≠ ∅, or simply C ∩ N ≠ ∅ if the sets C_ξ are all polyhedral, the following equality holds:

    N_{C∩N}(x(·)) = N_C(x(·)) + N_N(x(·)).
Since N is a subspace, N_N(x(·)) = N^⊥ =: M. Under the constraint qualification, the SVI (3) is therefore equivalent to the following extensive form of SVI:

(4)    −∇G(x(·)) − w(·) ∈ N_C(x(·)), x(·) ∈ N, w(·) ∈ M.
The progressive hedging algorithm for problem (2) corresponds to applying the proximal point algorithm [5] to a maximal monotone mapping T derived from the gradients and normal cones in (4); for details the reader is referred to Rockafellar and Sun [8]. It proceeds as follows.

Algorithm 1 (PHA). Choose r > 0 and initial points x^0(·) ∈ N, w^0(·) ∈ M. For ν = 0, 1, 2, . . . :
Step 1. For each ξ ∈ Ξ, solve the scenario subproblem
    x̂^ν(ξ) = argmin_{x ∈ C_ξ} { g(x, ξ) + w^ν(ξ)^T x + (r/2) ‖x − x^ν(ξ)‖^2 }.
Step 2. Set x^{ν+1}(·) = P_N(x̂^ν(·)), the projection of x̂^ν(·) onto N.
Step 3. Set w^{ν+1}(·) = w^ν(·) + r (x̂^ν(·) − x^{ν+1}(·)).
Rockafellar and Wets [9] showed that the above algorithm generates a sequence converging to a solution of (2) as long as such a solution exists. There is no restriction on the value of r other than positivity; however, a suitable choice of r is crucial for fast convergence of Algorithm 1, see [8].
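To make the three steps concrete, here is a minimal sketch of the iteration on a toy instance of our own (not from the paper): minimize E_ξ[(x − ξ)^2] over a single scenario-independent decision x, for which Step 1 has the closed form argmin_x { (x − ξ)^2 + w x + (r/2)(x − x̄)^2 } = (2ξ − w + r x̄)/(2 + r).

```python
# A hedged PHA sketch (our illustration) on min_x E_xi[(x - xi)^2]
# with one nonanticipative (scenario-independent) decision x.
def pha(scenarios, r=1.0, iters=200):
    """scenarios: list of (p, xi).  Returns the hedged decision x_bar."""
    w = {i: 0.0 for i, _ in enumerate(scenarios)}   # dual, kept with E[w] = 0
    x_bar = 0.0
    for _ in range(iters):
        # Step 1: scenario subproblems (decomposable across scenarios),
        # in closed form for this quadratic objective.
        x_hat = {i: (2.0 * xi - w[i] + r * x_bar) / (2.0 + r)
                 for i, (p, xi) in enumerate(scenarios)}
        # Step 2: project onto N (here: take the expectation).
        x_bar = sum(p * x_hat[i] for i, (p, _) in enumerate(scenarios))
        # Step 3: dual update with step length r.
        for i, _ in enumerate(scenarios):
            w[i] += r * (x_hat[i] - x_bar)
    return x_bar

x_star = pha([(0.25, 0.0), (0.75, 4.0)])
# Minimizer of E[(x - xi)^2] is E[xi] = 0.25*0 + 0.75*4 = 3.0.
```

On this instance the iterates contract toward E[ξ] geometrically, and one can check after each iteration that the duals keep the zero-expectation property w(·) ∈ M.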
With this background on the PHA in hand, we next introduce the (two-block) ADMM. Given two proper convex functions f_1(x) and f_2(v), both from R^n to (−∞, +∞], consider a minimization problem of the following form:

(5)    minimize f_1(x) + f_2(v) subject to x − v = 0.
Its augmented Lagrangian function is

    L_σ(x, v, w) = f_1(x) + f_2(v) + w^T (x − v) + (σ/2) ‖x − v‖^2,  σ > 0.

The ADMM applied to problem (5) proceeds as follows.

Algorithm 2 (ADMM). Choose σ > 0, a dual step length τ > 0, and initial points v^0, w^0. For ν = 0, 1, 2, . . . :
Step 1. x^{ν+1} = argmin_x L_σ(x, v^ν, w^ν).
Step 2. v^{ν+1} = argmin_v L_σ(x^{ν+1}, v, w^ν).
Step 3. w^{ν+1} = w^ν + τσ (x^{ν+1} − v^{ν+1}).
For the more general constraint x = Av in Hilbert spaces, Gabay and Mercier [3] showed that the ADMM generates a sequence {x^ν, v^ν} → (x^*, v^*), a solution of (5), provided f_1 is strictly convex and A is an isomorphism. Later, Fazel et al. [2] removed the strict convexity requirement on f_1 and extended the convergence conditions to f_1 being convex and the constraint being of the form Ax + Bv = c: as long as A^T A or B^T B is nonsingular, the convergence result holds. This condition is of course satisfied by Problem (5).
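The two-block iteration can also be sketched on a scalar toy instance of our own (not from the references): f_1(x) = (x − a)^2 and f_2(v) = λ|v| with the constraint x = v, so that both subproblems have closed forms and the v-step is a soft-thresholding.

```python
# A hedged ADMM sketch (our illustration) for problem (5) with
# f1(x) = (x - a)^2, f2(v) = lam*|v|, constraint x = v.
def admm(a, lam, sigma=1.0, tau=1.0, iters=300):
    x = v = w = 0.0
    for _ in range(iters):
        # Step 1: argmin_x (x - a)^2 + w*(x - v) + (sigma/2)(x - v)^2
        x = (2.0 * a - w + sigma * v) / (2.0 + sigma)
        # Step 2: soft-thresholding of x + w/sigma at level lam/sigma
        z = x + w / sigma
        v = max(abs(z) - lam / sigma, 0.0) * (1.0 if z >= 0 else -1.0)
        # Step 3: dual update with step length tau*sigma
        w += tau * sigma * (x - v)
    return v

sol = admm(a=2.0, lam=1.0)
# Minimizer of (x - 2)^2 + |x| is x = 1.5 (where 2(x - 2) + 1 = 0).
```

The role of τ here is exactly the dual-step flexibility discussed below; τ = 1 is the choice under which the ADMM will match the PHA.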
We next show that, for a special choice of f_1 and f_2, with A = I and the same starting point, the PHA and the ADMM generate the same sequence when applied to solve the MSO (2).
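Under our reading, the identification can be summarized as follows (a sketch in our notation, using the indicator functions of C and N):

```latex
% Identification of the PHA with a two-block ADMM (a sketch):
f_1(x(\cdot)) = \mathbb{E}_\xi\big[g(x(\xi),\xi)\big] + \delta_C(x(\cdot)),
\qquad
f_2(v(\cdot)) = \delta_N(v(\cdot)),
\qquad
x(\cdot) - v(\cdot) = 0.
```

With σ = r and τ = 1, Step 1 of the ADMM then splits into the scenario subproblems of Step 1 of the PHA, Step 2 reduces to the projection P_N onto N, and the dual update recovers Step 3 of the PHA.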
Note that recent progress on the ADMM allows σ to change with ν as long as it stays bounded away from zero, which offers more flexibility in choosing parameters of the PHA to speed up convergence. We do not explore this direction in this paper; instead, we next consider a new PHA that can be used in risk-averse models of MSO and cannot be handled by the original version of the PHA.
3. PHA for MSO with cross constraints. In risk-averse applications of the MSO models, additional constraints of the form

(6)    B x(·) ≤ b

often arise, where B is a linear operator from L^2 to R^s defined through

(7)    B x(·) = E_ξ[B(ξ) x(ξ)],  with B(ξ) ∈ R^{s×n} for all ξ.
We call system (6) a cross constraint since it involves all scenarios rather than a single one. In particular, the expectation of x(·) is such a linear operator from L^2 to R^n, defined as

    E[x(·)] = E_ξ[x(ξ)] = Σ_{ξ∈Ξ} p(ξ) x(ξ).

We call the following problem the MSO with cross constraints (MSOCC):

    minimize E_ξ[g(x(ξ), ξ)] subject to x(·) ∈ C ∩ N, B x(·) ≤ b.

An example arises in the recent work of Rockafellar [6] and Sun et al. [12] on risk measure optimization. Suppose we would like to minimize the mean absolute deviation of g(x(ξ), ξ), which is a risk-averse measure for g(x(ξ), ξ); see Ang et al. [1] for details. Then we need to solve

    minimize E_ξ[g(x(ξ), ξ)] + λ E_ξ[ |g(x(ξ), ξ) − E_ξ[g(x(ξ), ξ)]| ] subject to x(·) ∈ C ∩ N,

where 0 < λ < 1 and g(x(ξ), ξ)_+ := max{g(x(ξ), ξ), 0}. This problem can be equivalently converted, by introducing an auxiliary variable y ∈ R, into a problem (8) whose objective involves such positive-part terms (see details in [12]). Since the objective function of that problem is not smooth, we introduce new variables t ∈ R and u(ξ) ∈ R for every ξ, together with new constraints, and change the objective function to E_ξ[y + λu(ξ) + t]. Then it is not hard to see that the third constraint in (8) is a cross constraint in the enlarged space of (y, t, x(·), u(·)). Since the constraint B x(·) ≤ b generally depends on all ξ, as shown in (7), the system (6) is a barrier to the decomposability of the PHA over the scenario space (see Step 1 of Algorithm 1). Thus, the current form of the PHA cannot be applied. In the following, we propose a strategy which preserves the decomposability of the PHA by introducing slack variables.
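One natural way such a slack-variable strategy can restore decomposability is sketched below; this is our reading of the idea, not the paper's exact reformulation. The coupled quantity B(ξ)x(ξ) is copied into a scenario-wise slack z(ξ), so that the cross constraint acts only on z(·):

```latex
% A hedged sketch of a slack-variable reformulation of the MSOCC:
\min_{x(\cdot),\, z(\cdot)} \ \mathbb{E}_\xi\big[g(x(\xi),\xi)\big]
\quad \text{s.t.} \quad
x(\cdot) \in C \cap N, \qquad
z(\xi) = B(\xi)\,x(\xi) \ \ \forall \xi \in \Xi, \qquad
\mathbb{E}_\xi[z(\xi)] \le b.
```

The constraint z(ξ) = B(ξ)x(ξ) is decomposable scenario by scenario, while the remaining cross constraint involves only z(·) and can be treated in the second ADMM block.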
Based on the convergence results for the ADMM, it is straightforward to show the convergence of Algorithm 3. We summarize the result in a theorem.
Theorem 3.1. Suppose that g(x(ξ), ξ) is convex and smooth for all ξ and that problem (9) satisfies a constraint qualification, i.e., one of the following two conditions holds.
It should be noted that Theorem 3.1 can also be proved by viewing Algorithm 3 as a more general PHA with respect to z(·) and the corresponding subspace N̄, and then employing the Rockafellar-Sun approach in [8]: taking Algorithm 3 with τ = 1 and σ = r and treating it as a special proximal point method, Theorem 3.1 follows, and we can obtain stronger results on the rate of convergence, as follows.
The proof closely parallels the one in [8] and is omitted for brevity.

4. Conclusions. The progressive hedging algorithm for MSO generates the same sequence of iterates as the ADMM if both methods start from the same initial point and the ADMM takes τ = 1 and σ = r. This fact, on one hand, offers flexibility in selecting step lengths for the PHA and, on the other hand, strengthens the convergence result for the ADMM by providing a q-linear rate for linear-quadratic MSO problems. The connection between the two methods can also yield new algorithms for MSO with cross constraints. We believe this idea can be further used to develop PHA variants for more complex MSO problems.