Quantitative stability analysis of stochastic mathematical programs with vertical complementarity constraints

This paper studies the quantitative stability of stochastic mathematical programs with vertical complementarity constraints (SMPVCC) with respect to perturbations of the underlying probability distribution. We first show, under moderate conditions, that the optimal solution set-mapping is outer semicontinuous and the optimal value function is Lipschitz continuous with respect to the probability distribution. We then investigate the outer semicontinuity of the M-stationary points by employing a reformulation of stationary points and some stability results on stochastic generalized equations. Particular focus is given to discrete approximation of probability distributions, where both the case in which the sample is chosen by a fixed procedure and the case in which it is chosen by a random procedure are considered. The technical results lay a theoretical foundation for approximation schemes for solving SMPVCC.


1. Introduction. A mathematical program with equilibrium constraints (MPEC) is an optimization problem whose constraints include variational inequalities or complementarity systems. MPECs play an important role in many fields, such as engineering design, economic equilibrium, multilevel games, and mathematical programming theory itself, and they have received much attention in the optimization community. We refer to [10,11,17,21,22] for details about the basic theory, effective algorithms, and important applications of MPECs.
In practice, MPECs often involve stochastic data, which motivates one to consider stochastic MPECs (SMPECs). Since the first paper on SMPECs by Patriksson and Wynter [13], many researchers have paid attention to this class of optimization problems; see for example [1, 6-8, 18, 19].
In this paper, we focus on stochastic mathematical programs with vertical complementarity constraints (SMPVCC):

$$\min_{z} \; E_P[f(z, \xi)] \quad \text{s.t.} \quad \min\{E_P[F_1(z, \xi)], \dots, E_P[F_l(z, \xi)]\} = 0, \quad z \in Z, \qquad (1)$$

where the minimum in the constraint is taken componentwise, $Z$ is a nonempty, closed and convex subset of $\mathbb{R}^n$, $f : \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}$ and $F_i : \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}^m$, $i = 1, \dots, l$, are continuous functions, $\xi : \Omega \to \Xi$ is a vector of random variables defined on the probability space $(\Omega, \mathcal{F}, P)$ with support set $\Xi \subset \mathbb{R}^q$, and $E_P[\cdot]$ denotes the expected value with respect to the probability distribution $P$. SMPVCC (1) can be viewed as an extension of deterministic mathematical programs with vertical complementarity constraints [5,17] to the case in which some random data are involved, or as an extension of stochastic mathematical programs with equilibrium constraints (SMPEC) [7] in which the complementarity constraints are replaced with general vertical complementarity constraints. Birbil et al. [1] were apparently the first to study problem (1), and they proposed a sample-path optimization method. They investigate the convergence of optimal solutions and stationary points when the underlying functions are approximated by sample-path based simulation.
The motivation of the approximation schemes in [1] is to replace the underlying multivariate probability distribution P with another distribution Q for which the expectation E_Q[·] is easier to calculate. Quantitative stability analysis with respect to the probability distribution may then provide a theoretical guarantee for the choice of the approximating distribution Q, for example, the choice of the sample size for sample average approximation with a given tolerance. This kind of analysis was pioneered by Römisch and has been well studied in past decades; see the monograph [16] and the references therein. In recent work, Liu et al. [7] study the impact of changes of the probability distribution P on optimal values and optimal solutions of SMPECs. Here we extend the works [7,16] to SMPVCC. As far as we are concerned, the contributions of the paper can be summarized as follows.
• By defining a suitable metric, we show the Lipschitz continuity of optimal solutions and the optimal value of SMPVCC (1) with respect to the probability distribution. Compared to [7], where the authors only show the existence of the Lipschitz constant, we present an explicit estimate of the Lipschitz modulus.
• By employing some new results on stochastic generalized equations [9] and the reformulation of stationary points [5], we study the stability of the M-stationary points of SMPVCC (1).
• As a special case, we focus on discrete approximation of the true probability distribution P. Rather than focusing on the empirical distribution approximation, in which the sample is independent and identically distributed [7], we consider both the case in which the sample is chosen by a fixed procedure and the case in which it is chosen by a random procedure. Moreover, we provide an upper bound on the approximation errors in terms of the Hausdorff distance between the support sets of the discrete approximation distribution and the true probability distribution.
The rest of the paper is organized as follows. In section 2, we present some basic definitions and recall the stability results on stochastic generalized equations. In section 3, we study the stability of optimal solutions, optimal values and M-stationary points of SMPVCC (1) with respect to perturbations of the probability distribution. In section 4, we focus on the case in which the true probability distribution is approximated by a discrete probability distribution.

2. Preliminaries. For vectors $a, b \in \mathbb{R}^n$, $a^T b$ denotes the scalar product, $\|\cdot\|$ denotes the Euclidean norm of a vector, and $d(z, D) := \inf_{z' \in D} \|z - z'\|$ denotes the distance from a point $z$ to a set $D$. For two compact sets $C$ and $D$, $\mathrm{dist}_{\mathbb{D}}(C, D) := \sup_{z \in C} d(z, D)$ denotes the deviation of $C$ from $D$ and $\mathrm{dist}_{\mathbb{H}}(C, D) := \max\big(\mathrm{dist}_{\mathbb{D}}(C, D), \mathrm{dist}_{\mathbb{D}}(D, C)\big)$ denotes the Hausdorff distance between $C$ and $D$. Moreover, $C + D$ denotes the Minkowski addition of the two sets, that is, $\{c + d : c \in C, d \in D\}$. For a function $g : \mathbb{R}^s \to \mathbb{R}^s$, we use $\nabla g(z)$ to denote the transposed Jacobian of $g$ at $z$. If $g$ is a real-valued function, $\nabla g(z)$ denotes the gradient of $g$ at the point $z$.
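The deviation and the Hausdorff distance defined above are easy to evaluate when both sets are finite. The following Python sketch is only an illustration under that assumption; the function names `deviation` and `hausdorff` and the two example point sets are hypothetical and do not come from the paper.

```python
import numpy as np

def deviation(C, D):
    """Deviation of C from D: the largest distance from a point of C to the set D.

    C and D are finite point sets given as arrays of shape (number of points, dimension).
    """
    C = np.asarray(C, dtype=float)
    D = np.asarray(D, dtype=float)
    # pairwise[i, j] is the Euclidean distance between C[i] and D[j]
    pairwise = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
    # inner min: distance from each point of C to D; outer max: sup over C
    return float(pairwise.min(axis=1).max())

def hausdorff(C, D):
    """Hausdorff distance: the larger of the two one-sided deviations."""
    return max(deviation(C, D), deviation(D, C))

# Hypothetical example: C is a subset of D, so only D deviates from C.
C = [[0.0, 0.0], [1.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
dev_CD = deviation(C, D)
dev_DC = deviation(D, C)
haus = hausdorff(C, D)
```

The example shows that the deviation is not symmetric, which is why the Hausdorff distance takes the maximum of both directions.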
Let Ψ : X → Y be a set-valued mapping. Ψ is said to be closed at x̄ if x_k ∈ X, x_k → x̄, y_k ∈ Ψ(x_k) and y_k → ȳ imply ȳ ∈ Ψ(x̄). Ψ is said to be upper semicontinuous (usc for short) at x̄ ∈ X if for every ε > 0, there exists a constant δ > 0 such that where N_Z(z̄) denotes the normal cone of the convex set Z at z̄.
Consider the following stochastic generalized equations (SGE): and its perturbation where Γ : X × Ξ → Y and G : X → Y are closed set-valued mappings, X and Y are subsets of Banach spaces X and Y, and E_Q[·] denotes the expected value with respect to the probability distribution Q. Let Γ(x, ξ) be defined as above and let σ(Γ(x, ·), u) be its support function. Let X be a compact subset of X. Define The following stability result for the solution set of the SGE is due to Liu et al. [9, Theorem 3.1].
Lemma 2.2. Consider the stochastic generalized equations (2) and its perturbation (3). Let X be a compact subset of X, and let S(P) and S(Q) denote the sets of solutions of (2) and (3) restricted to X, respectively. Assume: (a) Y is a Euclidean space and Γ is a set-valued mapping taking convex and compact set-values in Y; (b) Γ is upper semicontinuous with respect to x for every ξ ∈ Ξ and bounded by a P-integrable function κ(ξ) for x ∈ X; (c) G is upper semicontinuous; (d) S(Q) is nonempty for Q ∈ P(Ω) with dl(Q, P) sufficiently small. Then for any ε > 0, there exists a δ > 0 such that

3. Stability analysis. Let P(Ξ) denote the set of probability measures on (Ξ, F). Assume Q ∈ P(Ξ) and consider

$$\min_{z} \; E_Q[f(z, \xi)] \quad \text{s.t.} \quad \min\{E_Q[F_1(z, \xi)], \dots, E_Q[F_l(z, \xi)]\} = 0, \quad z \in Z, \qquad (4)$$

which is regarded as a perturbation of SMPVCC (1). We study the convergence of problem (4) to SMPVCC (1) when Q converges to P under some metric. Specifically, we study the relationship between the perturbed problem (4) and the original problem (1) in terms of optimal values, optimal solutions and stationary points.
In probability theory, various metrics have been introduced to quantify the closeness of two probability distributions; see [3,15]. One class of them consists of the metrics with ζ-structure, which subsumes a number of interesting metrics. Let P, Q ∈ P(Ξ) and let $\mathcal{G}$ be a family of real-valued bounded measurable functions on Ξ. The metric with ζ-structure is defined as

$$\mathrm{dl}_{\mathcal{G}}(P, Q) := \sup_{g \in \mathcal{G}} \big| E_P[g(\xi)] - E_Q[g(\xi)] \big|. \qquad (5)$$

With different sets $\mathcal{G}$, $\mathrm{dl}_{\mathcal{G}}(\cdot)$ covers a wide range of metrics in probability theory, such as the total variation metric and the Wasserstein metric. In this section, we consider a metric tailored to problem (1), that is, the metric with

$$\mathcal{G} := \{ f(z, \cdot) : z \in Z \} \cup \{ F_{i,j}(z, \cdot) : z \in Z, \ i = 1, \dots, l, \ j = 1, \dots, m \}. \qquad (6)$$

We present the stability of SMPVCC (1) under the metric $\mathrm{dl}_{\mathcal{G}}(\cdot)$. To this end, the following assumptions are needed.
Assumption 3.1. There exists a neighborhood U P of P under the metric dl G (·) such that, for any Q ∈ U P , the set of feasible points to problem (4) is nonempty.
Assumption 3.2. There exists a neighborhood U P of P under the metric dl G (·) and positive constant β such that for any Q ∈ U P and z ∈ Z where F Q denotes the set of feasible points to problem (4).
Assumption 3.1 requires the feasible set of the perturbed problem to be nonempty. Assumption 3.2 is an error bound condition for the system of vertical complementarity constraints. Error bound conditions play a key role in the theoretical analysis of, and numerical methods for, optimization problems, and they have been well studied in past decades. For the case that l = 2, Assumption 3.1 is known as natural type error bound whereas inequality (7) is known as S-type error bound of the complementarity constraint. Moreover, if E_Q[F_1(z, ξ)] := z and E_Q[F_2(z, ξ)] is a Lipschitz continuous uniform P-function, Assumption 3.1 holds [2]. We refer readers interested in the topic to the monograph [2], the survey paper by Pang [12] and the recent paper [4] on error bounds for variational inequalities and complementarity problems.
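For finitely supported distributions and a finite family of test functions, a metric with ζ-structure reduces to a maximum of finitely many differences of weighted sums. The Python sketch below illustrates this under stated assumptions: the function name `zeta_distance`, the quadratic family, the grid over z and the two coin distributions are all hypothetical choices made for illustration, and a grid only approximates a supremum over a continuum of z.

```python
import numpy as np

def zeta_distance(functions, xi_p, w_p, xi_q, w_q):
    """Largest gap |E_P[g] - E_Q[g]| over a finite family of functions g.

    P puts weights w_p on the points xi_p, and Q puts weights w_q on xi_q.
    """
    xi_p, w_p = np.asarray(xi_p, dtype=float), np.asarray(w_p, dtype=float)
    xi_q, w_q = np.asarray(xi_q, dtype=float), np.asarray(w_q, dtype=float)
    gaps = [abs(np.dot(w_p, g(xi_p)) - np.dot(w_q, g(xi_q))) for g in functions]
    return float(max(gaps))

# Hypothetical family: g_z(xi) = (z - xi)^2 for z on a grid of Z = [0, 1].
z_grid = np.linspace(0.0, 1.0, 11)
family = [lambda xi, z=z: (z - xi) ** 2 for z in z_grid]

xi_p, w_p = [0.0, 1.0], [0.5, 0.5]   # P: fair coin on {0, 1}
xi_q, w_q = [0.0, 1.0], [0.4, 0.6]   # Q: slightly biased coin on {0, 1}

d_pq = zeta_distance(family, xi_p, w_p, xi_q, w_q)
d_pp = zeta_distance(family, xi_p, w_p, xi_p, w_p)
```

In this example the gap equals 0.1·|2z − 1| for each z, so the maximum over the grid is attained at the endpoints z = 0 and z = 1.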

Optimal solutions.
We are now ready to show the stability of SMPVCC (1) when the probability distribution Q approximates the true one P under the metric $\mathrm{dl}_{\mathcal{G}}(\cdot)$. For simplicity, let
• $\mathcal{F}_P$ and $\mathcal{F}_Q$ denote the sets of feasible points of problems (1) and (4);
• $S_P$ and $S_Q$ denote the sets of optimal solutions of problems (1) and (4);
• $V_P$ and $V_Q$ denote the optimal values of problems (1) and (4).
Theorem 3.3. Let Assumptions 3.1-3.2 hold. Suppose that Z and Ξ are compact sets. Then the following assertions hold: (i) $$\mathrm{dist}_{\mathbb{H}}(\mathcal{F}_Q, \mathcal{F}_P) \le \beta^* \, \mathrm{dl}_{\mathcal{G}}(P, Q), \qquad (8)$$ where $\beta^* := lm\beta$, l, m are the dimensions of F(z, ξ), and β is the error bound constant in Assumption 3.2; (ii) $\limsup_{Q \to P} S_Q \subseteq S_P$; (iii) if for any ξ ∈ Ξ, f(z, ξ) is Lipschitz continuous in z with bounded modulus L, then $$|V_P - V_Q| \le L^* \, \mathrm{dl}_{\mathcal{G}}(P, Q), \qquad (9)$$ where $L^* := 1 + L\beta^*$ and $\beta^*$ is given in part (i); (iv) if, in addition, problem (1) satisfies the growth condition, that is, there exists a positive constant γ such that where $L^*$ is given in part (iii).
Proof. Part (i). The proof of part (i) is an extension of the proof of [7, Proposition 3.1]. By Assumption 3.2, there exists a positive constant β such that for any z ∈ F_Q d(z, F_P) where the equality follows from the feasibility of z for problem (4) and m, l are the dimensions of F(z, ξ). Since z ∈ F_Q is arbitrary, $\mathrm{dist}_{\mathbb{D}}(\mathcal{F}_Q, \mathcal{F}_P) \le lm\beta \, \mathrm{dl}_{\mathcal{G}}(P, Q)$.
In a similar way, one can show that $\mathrm{dist}_{\mathbb{D}}(\mathcal{F}_P, \mathcal{F}_Q) \le lm\beta \, \mathrm{dl}_{\mathcal{G}}(P, Q)$. Consequently, (8) holds. Part (ii) follows from part (i) and the compactness of Z.
Part (iii). Let z 1 ∈ S Q and z 2 ∈ S P . Denote the projections of z 1 and z 2 on the set F P and F Q by z p 1 and z p 2 respectively.
where the first inequality follows from the fact that z p 2 ∈ F Q and the third inequality follows from the definition of dl G (·) and the Lipschitz continuity of f (z, ξ) with bounded modulus L.
If V P > V Q , Summarizing the discussion above, we arrive at (9).
Part (iv). For any z ∈ S Q , the growth condition implies where the second inequality follows from (9) and L * is given in part (iii). Then d(z, S P ) ≤ L * γ dl G (P, Q).
(10) follows from the arbitrariness of z ∈ S Q .
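The triangle-inequality argument behind the optimal value estimate can be checked numerically on a toy instance. The Python sketch below is only an illustration under loudly stated assumptions, none of which come from the paper: it assumes the constraint has the form min{E[F_1(z, ξ)], E[F_2(z, ξ)]} = 0 with l = 2 and m = n = 1, and it uses the hypothetical data F_1(z, ξ) = z, F_2(z, ξ) = z − ξ and f(z, ξ) = (z − ξ)^2 on Z = Ξ = [0, 2]. In this instance the feasible set is the single point max(E[ξ], 0), so the error bound constant is 1 and the bound that the argument yields is (1 + L) times the distance between the two distributions.

```python
import numpy as np

def expect(values, weights):
    """Expectation of a finitely supported random quantity."""
    return float(np.dot(weights, values))

def solve_toy(xi, w):
    """Solve the toy instance (hypothetical data, for illustration only).

    The constraint min{z, z - E[xi]} = 0 leaves the single feasible point
    z = max(E[xi], 0), so the optimal value is E[(z - xi)^2] at that point.
    """
    z_star = max(expect(xi, w), 0.0)
    return z_star, expect((z_star - xi) ** 2, w)

xi = np.array([0.0, 1.0, 2.0])        # common support inside Xi = [0, 2]
w_p = np.array([0.2, 0.5, 0.3])       # weights of P
w_q = np.array([0.25, 0.5, 0.25])     # weights of Q, a small perturbation of P

z_p, v_p = solve_toy(xi, w_p)
z_q, v_q = solve_toy(xi, w_q)

# Distance over the family {f(z, .)} and {F_i(z, .)}: z runs over a grid of
# Z = [0, 2] plus the two solutions, so the bound below holds exactly.
z_pts = np.concatenate([np.linspace(0.0, 2.0, 21), [z_p, z_q]])
gap_f = max(abs(expect((z - xi) ** 2, w_p) - expect((z - xi) ** 2, w_q)) for z in z_pts)
gap_F2 = abs(expect(xi, w_p) - expect(xi, w_q))   # F_1 does not depend on xi
dl = max(gap_f, gap_F2)

L = 4.0   # bound on |d/dz (z - xi)^2| = |2 (z - xi)| over [0, 2] x [0, 2]
value_gap = abs(v_p - v_q)
bound = (1.0 + L) * dl
```

Here `value_gap` is far smaller than `bound`, which is consistent with the estimate being a worst-case Lipschitz bound rather than a sharp one.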

M-stationary points.
It is well known that SMPVCC (1) is generically nonconvex due to the combinatorial nature of its constraints. This motivates us to undertake stability analysis of stationary points, in addition to that of optimal values and optimal solutions. In this section, we focus on the stability of M-stationary points of SMPVCC (1). By introducing some slack and auxiliary variables, the first order optimality conditions of problem (1) which characterize M-stationarity can be reformulated as a constrained stochastic generalized equation [5]: where (z, u, y, x, v, α, β) ∈ W and 0 denotes the vector whose components are all 0. This means that w ∈ W is an M-stationary pair if and only if it is a solution of the stochastic generalized equation (11), and hence studying the stability of the stationary points amounts to studying that of the stochastic generalized equation.
Similarly, the stationary pair of problem (4) can be characterized by the following stochastic generalized equation: To characterize the stability of M-stationary points, we need to redefine the set G . Let W * be a compact subset of W. Define We are now ready to study the convergence of the M-stationary points of problem (4) to the true counterpart of problem (1).
Theorem 3.4. Assume: (a') there exist a neighborhood U_P of P under the metric dl_F(·) and a compact set W* ⊂ W such that, for every Q ∈ U_P, the set of stationary pairs of problem (4), denoted by W_Q, is nonempty and W_Q ⊆ W*; (b') for any ξ ∈ Ξ, f(z, ξ) and F_{i,j}(z, ξ), i = 1, · · · , l, j = 1, · · · , m, are Lipschitz continuously differentiable and their Lipschitz moduli are bounded by a positive constant L. Then for any ε > 0, there exists a δ > 0 such that

Proof. The thrust of the proof is to apply Lemma 2.2 to the stochastic generalized equation (11) and its perturbation (12). To this end, we verify the hypotheses of Lemma 2.2.
Observe first that, since Φ(·) is single valued, it is convex and compact valued, which verifies (a). The upper semicontinuity of Φ(·) and its integrable boundedness follow from condition (b'), which verifies (b). Condition (c) follows from the upper semicontinuity of the normal cone, while (d) coincides with (a'). The proof is complete.
4. Discrete approximation. Discrete approximation of probability distributions is an important approach to solving stochastic optimization problems. The well-known Monte Carlo and quasi-Monte Carlo methods in stochastic programming are fundamentally based on discrete approximation. In this section, we focus on the case of approximating the distribution P with a discrete probability distribution P_N.
Let Ξ_N := {ξ_1, · · · , ξ_N} be a subset of Ξ and {Ξ_1, · · · , Ξ_N} be a Voronoi tessellation [14] of Ξ, that is, are pairwise disjoint subsets forming a partition of Ξ. Define where p_i = Prob(Ξ_i) for i = 1, · · · , N. We call P_N defined by (13) the Voronoi projection of the probability distribution P on the space P(Ξ_N). Indeed, it has been shown that the Voronoi projection P_N converges to P under the Wasserstein metric. The Wasserstein metric $\mathrm{dl}_W : \mathcal{P}(\Xi) \times \mathcal{P}(\Xi) \to \mathbb{R}$ is defined as $$\mathrm{dl}_W(P, Q) := \sup_{g \in \mathcal{G}_W} \big| E_P[g(\xi)] - E_Q[g(\xi)] \big|,$$ where $\mathcal{G}_W$ = {g : g is Lipschitz continuous with Lipschitz modulus L_q(g) ≤ 1}.
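When P itself is finitely supported, the Voronoi projection can be computed by sending the mass of each support point to its nearest center. The Python sketch below illustrates this under stated assumptions: the function name `voronoi_projection`, the uniform grid standing in for P and the two centers are hypothetical, and ties are broken toward the lowest index so that the cells are pairwise disjoint. The cost of this nearest-center assignment is an upper bound on the Wasserstein distance defined above, because every 1-Lipschitz g changes by at most the distance moved.

```python
import numpy as np

def voronoi_projection(points, probs, centers):
    """Project a finitely supported distribution onto a set of centers.

    points:  array of shape (K, q), support of P
    probs:   array of shape (K,), weights of P
    centers: array of shape (N, q), the discrete set Xi_N
    Returns the weights p_i on the centers, the cost of moving each point
    to its nearest center, and the largest distance moved.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    probs = np.asarray(probs, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)          # ties go to the lowest index
    weights = np.bincount(nearest, weights=probs, minlength=len(centers))
    moved = dist.min(axis=1)               # distance from each point to its center
    return weights, float(np.dot(probs, moved)), float(moved.max())

# Hypothetical example: P is uniform on a grid of [0, 1], Xi_N = {0.25, 0.75}.
points = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
probs = np.full(101, 1.0 / 101)
centers = np.array([[0.25], [0.75]])

weights, transport_cost, max_moved = voronoi_projection(points, probs, centers)
```

The average distance moved (`transport_cost`) never exceeds the largest distance moved (`max_moved`), which mirrors the idea of bounding the approximation error by a distance between the support sets.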
The following result characterizes the convergence of P_N to P under the Wasserstein metric.
Proposition 1. [14, Lemma 4.9] Let P ∈ P(Ξ) be fixed and P_N be defined as in (13). Then where
Proposition 1 provides an upper bound on dl_W(P, P_N) in terms of the Hausdorff distance between the support sets Ξ and Ξ_N. Based on Proposition 1, we can present the quantitative stability of the optimal solutions and optimal values as follows.
Corollary 1. Let Assumptions 3.1-3.2 hold. Suppose that (a) Ξ is a compact set and Ξ_N is a subset of Ξ; (b) for any ξ ∈ Ξ, f(z, ξ) and F_{i,j}(z, ξ), i = 1, · · · , l, j = 1, · · · , m, are Lipschitz continuous in z with bounded modulus L. Then the following assertions hold: (ii) if, in addition, problem (1) satisfies the growth condition, that is, there exists a positive constant γ such that

Proof. By the Lipschitz assumption (b) and the definitions of the sets G in (6) and G_W in (14), for any g ∈ G, g/L ∈ G_W. Then dl_G(P, P_N) ≤ L dl_W(P, P_N). The rest follows directly from Theorem 3.3 and Proposition 1.
In general, the upper bound in Theorem 3.3 is much tighter than the one in Corollary 1. However, the upper bound in Corollary 1 is much easier to estimate than the one in Theorem 3.3.
If the discrete set Ξ N is randomly chosen, i.e., ξ 1 , · · · , ξ N is an independent and identically distributed (iid) sample, then we are able to employ the large deviation theorem to establish an exponential rate of convergence as stated in the proposition below.