An approximation scheme for stochastic programs with second order dominance constraints

It is well known that the second order dominance relation between two random variables can be described by a system of stochastic semi-infinite inequalities indexed by $\mathcal R$, the set of all real numbers. In this paper, we show that the index set can be reduced to the support set of the dominated random variable, strengthening a similar result established by Dentcheva and Ruszczyński [9] for discrete random variables. Viewing the semi-infinite constraint as an extreme robust risk measure, we relax it by replacing it with the entropic risk measure, regarding the latter as an approximation of the former in an optimization problem with second order dominance constraints.
To solve the entropic approximation problem, we apply the well known sample average approximation method to discretize it. Detailed analysis is given to quantify both the entropic approximation and the sample average approximation for various statistical quantities, including the optimal value, the optimal solutions and the stationary points obtained from solving the sample average approximated problem. The numerical scheme provides an alternative to the mainstream numerical methods for this important class of stochastic programs.


1. Introduction. Stochastic optimization models with stochastic second order dominance (SSD) constraints, introduced by Dentcheva and Ruszczyński [9], provide important optimization models with effective constraints for risk control. Over the past decade, there has been increasing research on the subject, ranging from the theory of optimality and duality to numerical methodology and practical applications; see [7,8,9,10,12,16,17,21,23] and the references therein.
Stochastic dominance is a fundamental concept in decision theory and economics [26]. For a random variable $\xi(\omega)$ with finite first moment, denote
$$F_1(\xi(\omega), s) := \mathrm{Prob}\{\xi(\omega) \leq s\} \quad \text{and} \quad F_2(\xi(\omega), t) := \int_{-\infty}^{t} F_1(\xi(\omega), s)\, ds.$$
Then $\xi_1(\omega)$ is said to dominate $\xi_2(\omega)$ in the second order, denoted by $\xi_1(\omega) \succeq_2 \xi_2(\omega)$, if
$$F_2(\xi_1(\omega), t) \leq F_2(\xi_2(\omega), t), \quad \forall t \in \mathbb{R}.$$
The objective of this paper is to propose an approximation scheme for the following stochastic program with stochastic second order dominance constraints:
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad G(x,\xi) \succeq_2 Y(\xi), \qquad (1)$$
where $X$ is a nonempty compact subset of $\mathbb{R}^n$, $\xi : \Omega \to \Xi$ is a vector of random variables defined on the probability space $(\Omega, \mathcal{F}, P)$ with support set $\Xi \subset \mathbb{R}^m$, $f, G : \mathbb{R}^n \times \Xi \to \mathbb{R}$ are Lipschitz continuous functions and, for every $\xi \in \Xi$, $G(\cdot, \xi) : \mathbb{R}^n \to \mathbb{R}$ is concave and continuously differentiable; $Y(\xi)$ is a random variable, and $\mathbb{E}[\cdot]$ denotes the expected value with respect to the random variable $\xi$. For simplicity of discussion, we make a blanket assumption that $f$ and $G$ are $P$-integrable. Since $F_2(\xi(\omega), t) = \mathbb{E}[(t - \xi(\omega))_+]$ by changing the order of integration, (1) may be reformulated as
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad \mathbb{E}[(t - G(x,\xi))_+] \leq \mathbb{E}[(t - Y(\xi))_+], \; \forall t \in \mathbb{R}. \qquad (2)$$
Here and later on we write $(a)_+$ for $\max\{a, 0\}$, and we refer to
$$\mathbb{E}[(t - G(x,\xi))_+] \leq \mathbb{E}[(t - Y(\xi))_+], \; \forall t \in \mathbb{R} \qquad (3)$$
as the reformulation of second order dominance. Then, problem (2) can be expressed as a stochastic semi-infinite programming (SSIP) problem:
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad \mathbb{E}[H(x, t, \xi)] \leq 0, \; \forall t \in \mathbb{R}, \qquad (4)$$
where $H(x, t, \xi) := (t - G(x,\xi))_+ - (t - Y(\xi))_+$. It is well known that the above SSIP problem does not satisfy Slater's constraint qualification, a condition that a stable numerical method often relies on. Consequently, a so-called relaxed form of the SSIP is proposed in [9]:
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad \mathbb{E}[H(x, t, \xi)] \leq 0, \; \forall t \in T, \qquad (5)$$
where $T$ is a compact subset of $\mathbb{R}$.
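The reformulated constraint lends itself to a direct empirical check. The sketch below is our own illustration, not from the paper: the helper `ssd_violation` and the toy choices of G and Y are assumptions. It estimates E[(t − G)_+] and E[(t − Y)_+] from samples and reports the largest violation over a grid of values of t.

```python
# Hypothetical helper (not from the paper): empirical check of the constraint
# E[(t - G)_+] <= E[(t - Y)_+] over a grid of t values.
import numpy as np

def ssd_violation(g_samples, y_samples, grid):
    """Largest violation of E[(t - G)_+] <= E[(t - Y)_+] over t in `grid`."""
    g = np.asarray(g_samples, dtype=float)
    y = np.asarray(y_samples, dtype=float)
    viol = -np.inf
    for t in grid:
        lhs = np.mean(np.maximum(t - g, 0.0))   # E[(t - G)_+]
        rhs = np.mean(np.maximum(t - y, 0.0))   # E[(t - Y)_+]
        viol = max(viol, lhs - rhs)
    return viol

rng = np.random.default_rng(0)
y = rng.uniform(2.0, 3.0, size=5000)   # benchmark Y(xi), here U[2, 3]
g = y + 0.5                            # G >= Y pointwise, so G dominates Y
print(ssd_violation(g, y, grid=y[:200]) <= 0.0)   # True: no violation
```

Indexing the grid by the sampled values of Y, rather than a fine grid over the whole real line, mirrors the reduction of the index set to Y(Ξ) established in Section 2.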
The following question is whether there is a subset T of R such that the semi-infinite constraints of problems (4) and (5) are equivalent; in other words, whether the index set R can be reduced to T without affecting the semi-infinite constraints. This question has been partly addressed by Dentcheva and Ruszczyński [9], where they show the equivalence with T = Y(Ξ) when Ξ is a finite set.
Here we first extend their result to the case when ξ is continuously distributed and then propose an approximation scheme based on the entropic risk measure to approximate the stochastic semi-infinite constraints. Specifically, by treating t as a random parameter, we use the entropic risk measure of E[H(x, t, ξ)] to approximate the essential supremum of E[H(x, t, ξ)] (w.r.t. t) and then discretize the entropic risk measure (which has two layers of mathematical expectations, w.r.t. ξ and t) by the well known sample average approximation method. The main advantages of the proposed method are two-fold: (a) instead of discretizing ξ and t separately, we do so through a single sampling of ξ and then transfer the discretization to t; (b) the resulting optimization problem has a single deterministic constraint. To ensure the approximation schemes are theoretically well founded, we carry out detailed convergence analysis for the statistical quantities of the optimal value, the optimal solutions and the stationary points obtained from solving each of the approximated problems.
The rest of the paper is structured as follows. Section 2 provides the theoretical background for setting the index set T as the image of the random variable Y(ξ) and sketches the entropic approximation of the semi-infinite constraints. Section 3 details the stability analysis of the entropic approximated problem with respect to variation of the entropic parameter. Section 4 discusses the sample average approximation of the entropic approximated program and investigates convergence of the optimal value, the optimal solutions, and the stationary points obtained from the sample average approximated problem as the sample size increases. Section 5 reports some numerical test results.
Throughout this paper, we use the following notation. For vectors $a, b \in \mathbb{R}^n$, we write $a^T b$ for the scalar product, $\|\cdot\|$ for the Euclidean norm of a vector and $\|\cdot\|_\infty$ for the maximum norm of continuous functions. When $\mathcal{D}$ is a set in $\mathbb{R}^n$, $\|\mathcal{D}\| := \sup_{D \in \mathcal{D}} \|D\|$. We write $d(x, \mathcal{D})$ for the distance from a point $x$ to a set $\mathcal{D}$ and $\mathcal{B}(x, \delta)$ for the closed ball centered at $x$ with radius $\delta$; in particular, $\mathcal{B}$ represents the closed unit ball in a finite dimensional space.
For a sequence of subsets $\{C_k\}$ in a metric space, we follow the standard notation [30] by using $\limsup_{k\to\infty} C_k$ and $\liminf_{k\to\infty} C_k$ to denote its outer limit and inner limit, that is,
$$\limsup_{k\to\infty} C_k := \{x : \exists\, k_j \to \infty,\; x_{k_j} \in C_{k_j},\; x_{k_j} \to x\}$$
and
$$\liminf_{k\to\infty} C_k := \{x : \exists\, x_k \in C_k,\; x_k \to x\}.$$
Finally, a set-valued mapping $A$ is said to be outer semicontinuous at $\bar{x}$ if $\limsup_{x \to \bar{x}} A(x) \subseteq A(\bar{x})$.

2. Reformulation. In this section, we first address the question as to whether we can reduce the index set R to a compact set T in the SSIP without affecting the semi-infinite constraints. The next proposition gives an affirmative answer and says that T can indeed be chosen as the image of Y(ξ) over the support set Ξ. It extends an earlier result by Dentcheva and Ruszczyński [9], where ξ is a discretely distributed random variable.
Proposition 1. The semi-infinite system
$$\mathbb{E}[H(x, t, \xi)] \leq 0, \quad \forall t \in \mathbb{R} \qquad (7)$$
is equivalent to the semi-infinite system
$$\mathbb{E}[H(x, t, \xi)] \leq 0, \quad \forall t \in Y(\Xi),$$
that is, the index set in system (7) can be effectively reduced to Y(Ξ) without affecting its solutions.
Proof. Given the reformulation (3) of second order dominance, it suffices to show that the constraints indexed by Y(Ξ) imply those indexed by R. We do so by going through three cases. In the first case, since F_2(Y, ·) is continuous, by taking a limit on both sides of the inequality above we arrive at the required inequality.
In the second case, F_2(G, ·) is continuous and the claim follows likewise. In the remaining case, there exist two adjacent convex subsets of Y(Ξ), denoted by T_1 and T_2, such that T_1 ∩ T_2 = ∅ and t_1 := sup T_1 ≤ t ≤ t_2 := inf T_2. We first show the required inequality at t_1 and t_2. If t_1, t_2 ∉ Y(Ξ), we may verify the formulae above by an analysis similar to that of case 1 for t_2 and case 2 for t_1. For t ∈ (t_1, t_2), we have by the convexity of F_2(G, ·) that, for any 0 ≤ λ ≤ 1, F_2(G, λt_1 + (1−λ)t_2) ≤ λF_2(G, t_1) + (1−λ)F_2(G, t_2); the equality t = λt_1 + (1−λ)t_2 holds by taking λ = (t_2 − t)/(t_2 − t_1). Summarizing the analysis above, we conclude that the required inequality holds for all t ∈ R. The proof is complete.
In what follows, we focus on the case where T = Y(Ξ) is a compact set. Letting h(x, t) := E[H(x, t, ξ)], we may rewrite problem (5) as
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad h(x, t) \leq 0, \; \forall t \in T, \qquad (9)$$
and further as
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad \sup_{t \in T} h(x, t) \leq 0. \qquad (10)$$
Of course, writing the semi-infinite constraints as above does not bring us any numerical convenience. Our plan here is to consider an approximation of $\sup_{t \in T} h(x, t)$ through the entropic risk measure so that the constraint can be handled more conveniently numerically. This kind of approximation scheme was recently considered by Anderson et al. [1] using CVaR and by Liu and Xu [25] with the entropic approximation. The rationale behind the scheme is that if we regard $\sup_{t \in T} h(x, t)$ as an extremely robust risk measure, then the CVaR of h(x, t) is a relaxation depending on the confidence level to be set. This kind of relaxation may not significantly affect the optimal decision (as can be shown through stability analysis) but does allow one to take a few more samples near the extremum to calculate the CVaR (as opposed to a single sample for the extremely robust constraint) and in that way "smooth up" or "stabilize" the numerical calculation. Here we take a similar initiative but adopt the entropic risk measure approximation, in that the latter is a smooth function and fits a broader class of random functions. For a random variable $Z \in L^\infty$, the entropic risk measure is defined as
$$e_\gamma(Z) := \frac{1}{\gamma} \log \mathbb{E}[\exp(\gamma Z)].$$
It is well known (see e.g. [25]) that
$$\lim_{\gamma \to \infty} e_\gamma(Z) = \operatorname{ess\,sup} Z$$
in the case when ess inf Z > −∞, where ess sup (inf) denotes the essential supremum (infimum). The following lemma states how the entropic risk measure approximation works for a general random function.
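The limiting behavior of the entropic risk measure is easy to observe numerically. The following is a small illustration of our own (the discrete random variable Z with uniform weights is a toy assumption): the value $(1/\gamma)\log \mathbb{E}[\exp(\gamma Z)]$ increases toward ess sup Z as γ grows.

```python
# Entropic risk measure e_gamma(Z) = (1/gamma) * log E[exp(gamma * Z)],
# computed with a log-sum-exp shift for numerical stability.
import numpy as np

def entropic_risk(z, gamma):
    z = np.asarray(z, dtype=float)
    m = z.max()                       # shift by max(z) to avoid overflow
    return m + np.log(np.mean(np.exp(gamma * (z - m)))) / gamma

z = np.array([-1.0, 0.2, 0.7, 1.0])   # ess sup Z = 1.0 (uniform weights)
for gamma in (1.0, 10.0, 100.0, 1000.0):
    print(round(entropic_risk(z, gamma), 4))
# the printed values increase toward ess sup Z = 1.0
```

The shift by max(z) is the standard log-sum-exp trick; without it, exp(γZ) overflows already for moderate γ.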
Lemma 2.1. [25, Proposition 2.1] Let h : R^n × R^k → R be a continuous function and X be a subset of R^n. Let ξ : Ω → R^k be a random variable defined on the probability space (Ω, F, P) with support set Ξ. Let τ(x), F(·) and Ω_x denote respectively the essential supremum, the cumulative distribution function and the support set of the random variable h(x, ξ(ω)). Then, for each fixed x ∈ X, the entropic risk measure of h(x, ξ) converges to τ(x) as γ → ∞. Assume in addition that (c) holds and (d) for any fixed small positive number ε, there exists δ(ε) ∈ (0, 1) such that

where X_ε := {x ∈ X : Diam(Ω_x) > 2ε}. Then the convergence is uniform on X_ε. Following Lemma 2.1, we may consider an approximation of problem (10) by
$$\min_{x \in X} \; \mathbb{E}[f(x,\xi)] \quad \text{s.t.} \quad e_\gamma(x) \leq 0, \qquad (11)$$
where
$$e_\gamma(x) := \frac{1}{\gamma} \log \int_T \exp(\gamma h(x, t))\, P_0(dt).$$
Here and later on, P_0 is the distribution of Y(ξ) induced by the distribution P of ξ.
3. Stability Analysis. We begin by looking into the impact of the parameter γ on the optimal value and the optimal solutions of problem (11) as γ increases. This may be covered by classical stability analysis of parametric programming. Here we include some detailed analysis because we can exploit the specific structure of the entropic approximation to establish some useful quantitative results.
The following lemma presents a quantitative relationship between the two feasible sets under the Hausdorff distance.
Lemma 3.1. Let F and F(γ) denote the feasible sets of problems (10) and (11) respectively. Suppose that: (a)–(b) for any fixed small positive number ε, there exists δ(ε) ∈ (0, 1) such that condition (12) holds, where F(·) denotes the cumulative distribution function of −h(x, ζ) and Ω_x is the support set of the random variable h(x, t); (c) there exist a positive number δ and a point x̄ ∈ X such that condition (13) holds. Then there exists a positive number τ* such that the estimate (14) holds, where ∆_γ → 0 as γ → ∞.
Proof. Observe that F ⊆ F(γ). Thus, by the definition of the Hausdorff distance H, we only need to estimate D(F(γ), F). Since G(·, ξ) is concave and h(x, t) is convex in x, we can deduce the required estimate from condition (13). The boundedness of τ* is ensured by the compactness of the set X. Moreover, under conditions (a)–(b), it follows by [25, Lemma 2.1] that ∆_γ → 0 as γ tends to infinity. The proof is complete.
Lemma 3.1 says that the feasible set F(γ) converges to F under the Hausdorff distance as γ tends to infinity. Condition (12) is similar to the so-called consistent tail behavior condition for the CVaR approximation of the essential supremum of a random function in [1]; see [25] for more details, where a simple example is presented. Condition (13) is needed to ensure the quantitative estimate (14). Using Lemma 3.1, we can present the stability of the optimal value and the optimal solutions of problem (11).
Theorem 3.2. Let X be a compact set, let X(γ) and X* denote the sets of optimal solutions of problems (11) and (10) respectively, and let v(γ) and v* be the corresponding optimal values. Then (i) lim_{γ→∞} D(X(γ), X*) = 0 and lim_{γ→∞} v(γ) = v*; (ii) if, in addition, problem (9) satisfies condition (13) and f(·, ξ) is Lipschitz continuous with integrably bounded modulus L(ξ), then there exists a positive number K such that |v(γ) − v*| ≤ K∆_γ, where ∆_γ is defined as in Lemma 3.1.
Proof. The proof is similar to that of [25, Theorem 1]. Here we give a sketch for completeness. Part (i). It suffices to show convergence of the optimal solutions, since the objective function E[f(x, ξ)] is continuous and independent of γ. Assume for a contradiction that there exists x_k ∈ X(γ_k) such that x_k → x* as k → ∞ and x* ∉ X*. Then x* ∈ F and there exists x̂ ∈ X* such that E[f(x̂, ξ)] − E[f(x*, ξ)] < 0, which contradicts (15). Part (ii). Let x_γ and x_0 be optimal solutions of problems (11) and (10) respectively. By the continuity of the feasible set F(γ) under the Hausdorff distance established in Lemma 3.1, there exists x̂_γ ∈ F close to x_γ, whence v* ≤ v(γ) + K∆_γ, where K := τ*E[L(ξ)]. Swapping the positions of F(γ) and F under the Hausdorff distance, we obtain v(γ) ≤ v* + K∆_γ. The conclusion follows.

Stationary points. We now move on to investigate the case when we only obtain a stationary point, rather than an optimal solution, from solving the entropic approximation problem (11). This is motivated by the case when f(x, ξ) is not convex in x. We start by defining the stationary points of problems (10) and (11).
Let ψ : R^n → R be a locally Lipschitz continuous function. Recall that the Clarke subdifferential of ψ at x, denoted by ∂ψ(x), is defined as
$$\partial \psi(x) := \mathrm{conv}\Big\{ \lim_{k \to \infty} \nabla \psi(x_k) : x_k \to x, \; x_k \in D \Big\},$$
where D denotes the set of points near x at which ψ is Fréchet differentiable, ∇ψ(x) denotes the gradient of ψ at x and 'conv' denotes the convex hull of a set; see [6] for details. Let
$$T^*(x) := \arg\max_{t \in T} h(x, t) \quad \text{and} \quad \theta(x) := \max_{t \in T} h(x, t). \qquad (16)$$
Since G(·, ξ) is concave, h(x, t) is convex in x and hence it is Clarke regular (see [6]). Consequently,
$$\partial \theta(x) = \Big\{ \int_{T^*(x)} \partial_x h(x, t)\, \mu(dt) : \mu \in \mathcal{P}[T^*(x)] \Big\}, \qquad (17)$$
where P[S] signifies the collection of probability measures supported on S and the integral is in the sense of Aumann [2]. By [31, Chapter 2, Theorem 9],
$$\partial_x h(x, t) = \mathbb{E}[\partial_x H(x, t, \xi)], \qquad (18)$$
where ∂_x H(x, t, ξ) denotes the Clarke subdifferential of H(·, t, ξ) at x. Obviously, for any ξ ∈ Ξ, ∂_x H(x, t, ξ) is outer semicontinuous in (x, t), and so is ∂_x h(x, t).
A feasible point x is said to be a stationary point of problem (10) if there exists ρ such that
$$0 \in \partial \mathbb{E}[f(x, \xi)] + \rho\, \partial \theta(x) + \mathcal{N}_X(x) \qquad (19)$$
and
$$0 \leq \rho \perp -\theta(x) \geq 0, \qquad (20)$$
where N_X(x) denotes the Clarke normal cone to X at x for x ∈ X, N_X(x) = ∅ when x ∉ X, and "⊥" denotes perpendicularity of two vectors. Likewise, a feasible point x is said to be a stationary point of problem (11) if there exists ρ_γ such that
$$0 \in \partial \mathbb{E}[f(x, \xi)] + \rho_\gamma\, \partial e_\gamma(x) + \mathcal{N}_X(x) \qquad (21)$$
and
$$0 \leq \rho_\gamma \perp -e_\gamma(x) \geq 0. \qquad (22)$$
Since the logarithmic function log(·) is strictly differentiable and h(x, t) is convex in x, by [6, Theorems 2.3.9 and 2.7.2],
$$\partial e_\gamma(x) = \int_T \mu(x, t; \gamma)\, \partial_x h(x, t)\, P_0(dt), \quad \text{where} \quad \mu(x, t; \gamma) := \frac{\exp(\gamma h(x, t))}{\int_T \exp(\gamma h(x, t'))\, P_0(dt')}.$$
In what follows, we show that stationary points satisfying (21)–(22) converge to stationary points satisfying (19)–(20). To this end, we need the following intermediate result, which characterizes the asymptotic behavior of the subdifferential ∂e_{γ_k}(x_k) as k → ∞.
Proposition 2. Let γ_k → ∞ and {x_k} ⊂ X be a sequence converging to x̄. Suppose that the Lipschitz moduli of f(·, ξ) and G(·, ξ) are bounded by a positive constant L. Then
$$\limsup_{k \to \infty} \partial e_{\gamma_k}(x_k) \subseteq \partial \theta(\bar{x})$$
if one of the following conditions holds: (a) T*(x̄) contains finitely many points, where T*(x̄) is defined as in (16); (b) T*(x̄) is not a finite set but the probability P_0{T*(x̄)} is strictly positive and ∂_x h(·, t) is outer semicontinuous at x̄ uniformly for t ∈ T*(x̄).
Proof. By the definition of h(x, t), it is easy to show that h(·, t) is Lipschitz continuous over X with modulus L under the assumption that the Lipschitz moduli of f(·, ξ) and G(·, ξ) are bounded by L. Consider condition (a). Let us label the points in T*(x̄) as t_1, ..., t_M and let ε be a small positive number. Since ∂_x h(x, t) is outer semicontinuous in (x, t), there exists a positive number δ such that
$$\partial_x h(x, t) \subseteq \partial_x h(\bar{x}, t_i) + \varepsilon \mathcal{B}, \quad \forall (x, t) \in \mathcal{B}(\bar{x}, \delta) \times \mathcal{T}(t_i, \delta),$$
where T(t_i, δ) denotes the δ-neighborhood of t_i. Since there are a finite number of points in T*(x̄), we may set δ sufficiently small such that T(t_i, δ) ∩ T(t_j, δ) = ∅ for i ≠ j. Let k_0 be sufficiently large such that for k ≥ k_0, x_k ∈ B(x̄, δ). Let
$$\alpha_i(\bar{x}) := \lim_{k \to \infty,\, \delta \to 0} \int_{\mathcal{T}(t_i, \delta)} \mu(x_k, t; \gamma_k)\, \partial_x h(x_k, t)\, P_0(dt).$$

It is known that
$$\lim_{k \to \infty} \int_{T \setminus \bigcup_{i=1}^M \mathcal{T}(t_i, \delta)} \mu(x_k, t; \gamma_k)\, P_0(dt) = 0;$$
see for example [22, Theorem 5.3]. Together with (24), we see that the first term ∆(δ) at the right hand side (rhs for short) of the inequality above goes to zero as k → ∞. Moreover, by (25)–(26), the second term tends to zero as ε → 0 (hence δ → 0) and k → ∞.
Consider condition (b). Let ε be a small positive number. Under condition (b), there exists a positive number δ such that ∂_x h(x, t) ⊆ ∂_x h(x̄, t) + εB for all x ∈ B(x̄, δ) and t ∈ T*(x̄). Moreover, T*(·), as the set of optimal solutions, is outer semicontinuous on X; see [30, Theorem 1.17]. Hence
$$\bigcup_{x \in \mathcal{B}(\bar{x}, \delta)} T^*(x) \subseteq T^*(\bar{x}) + \varepsilon \mathcal{B}.$$
Let T_ε := T \ (T*(x̄) + εB). Then the estimate (28) holds, where the inequality follows from (6). Let R_1, R_2 and R_3 denote the three terms at the rhs of inequality (28) in turn. By [22, Theorem 5.3], together with (24), there exists a sufficiently large k_1 such that, when k ≥ k_1, x_k ∈ B(x̄, δ) and R_1 vanishes in the limit. By the continuity of µ(x, t; γ) and the fact that the probability Prob_{P_0}{T*(x̄)} is not zero, it follows by [22, Theorem 5.3] that there exist constants C > 0 and k_2 ≥ k_1 controlling R_2, which implies that R_2 tends to zero. Now let us estimate R_3. Let η_k := ∫_{T*(x̄)} µ(x_k, t; γ_k) P_0(dt). By (29) and (31), there exists k_3 ≥ k_2 such that for any k ≥ k_3, η_k ≥ 1 − (C + 1)ε. Let ν_k(t) := µ(x_k, t; γ_k)/η_k for t ∈ T*(x̄).
Then we may view ν_k(t) as the density function of a probability measure over the set T*(x̄). By the definition of ∂θ(·), the corresponding integral ∫_{T*(x̄)} ν_k(t) ∂_x h(x_k, t) P_0(dt) approaches ∂θ(x̄); the last inequality is due to [15, Theorem 3.5], which permits the exchange of integration and D. Let ∆_k denote the deviation of this integral from ∂θ(x̄). Then ∆_k → 0, and the first term at the rhs of the inequality goes to zero under condition (b). The proof is complete.
With this proposition, we are ready to state the main stability result of this section.
Theorem 3.3. Let X be a compact set, γ_k → ∞ and (x_k, ρ_k) be a KKT pair of problem (11) with γ = γ_k. Let (x*, ρ*) be an accumulation point of the sequence {(x_k, ρ_k)}. Under the conditions of Proposition 2, (x*, ρ*) is a KKT pair of problem (10).
The theorem assumes convergence of the Lagrange multiplier ρ_k. A sufficient condition for this is condition (13), under which we can easily show that ρ_k is bounded.

4. Sample average approximation method. In practice, the distribution of ξ is often unknown, or it is numerically too expensive to calculate the expected values. Instead, it might be possible to obtain samples of the random vector ξ from past data. This motivates us to find an approximate optimal solution to problem (11) on the basis of the sampling information. This kind of statistical approximation is underpinned by the classical law of large numbers in statistics.
Let $\xi^1, \cdots, \xi^N$ be an independent and identically distributed sample of ξ, let $t_j := Y(\xi^j)$ for $j = 1, \cdots, N$, and let
$$h_N(x, t_j) := \frac{1}{N} \sum_{k=1}^{N} H(x, t_j, \xi^k).$$
We consider the following sample average approximation (SAA for short) problem:
$$\min_{x \in X} \; \frac{1}{N} \sum_{i=1}^{N} f(x, \xi^i) \quad \text{s.t.} \quad \frac{1}{\gamma} \log\Big( \frac{1}{N} \sum_{j=1}^{N} \exp(\gamma h_N(x, t_j)) \Big) \leq 0. \qquad (34)$$
We call (34) the SAA problem and (11) the true problem. Let x_N be an optimal solution of problem (34). We analyze the convergence of x_N as the sample size N increases.
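Under our reading of (34), the discretization can be assembled in a few lines: a single sample of ξ supplies both the scenarios ξ^k and the index points t_j = Y(ξ^j), leaving one smooth deterministic constraint. The helper name and the toy models for G and Y below are illustrative assumptions, not the paper's test problems.

```python
# Sketch of the single entropic SAA constraint in (34), under our reading of
# the scheme: one sample of xi yields both the scenarios xi^k and the index
# points t_j = Y(xi^j).
import numpy as np

def saa_entropic_constraint(x, xi, G, Y, gamma):
    """(1/gamma) * log((1/N) * sum_j exp(gamma * h_N(x, t_j))), t_j = Y(xi^j)."""
    t = Y(xi)                      # index points t_j, shape (N,)
    g = G(x, xi)                   # G(x, xi^k), shape (N,)
    y = Y(xi)                      # Y(xi^k), shape (N,)
    # h_N(x, t_j) = (1/N) sum_k [(t_j - G(x, xi^k))_+ - (t_j - Y(xi^k))_+]
    h = (np.maximum(t[:, None] - g[None, :], 0.0)
         - np.maximum(t[:, None] - y[None, :], 0.0)).mean(axis=1)
    m = h.max()                    # log-sum-exp stabilization
    return m + np.log(np.mean(np.exp(gamma * (h - m)))) / gamma

rng = np.random.default_rng(1)
xi = rng.uniform(2.0, 3.0, size=400)
Y = lambda s: s                    # benchmark Y(xi) = xi
G = lambda x, s: x * s             # toy payoff with scalar decision x
print(saa_entropic_constraint(1.5, xi, G, Y, gamma=100.0) <= 0.0)  # True: G >= Y
```

Note that the pairwise (N × N) evaluation of H is the price of transferring the discretization of t to the sample of ξ; for moderate N it is a cheap vectorized computation.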
Proof. Under condition (a), we can easily show uniform convergence of f_N(·) to E[f(·, ξ)] and of e^N_γ(·) to e_γ(·); see for example [11, Corollary 3.7]. The convergence of the optimal value and the optimal solutions then follows from well known stability results in nonlinear programming (see e.g. [36, Lemma 2.2] and the references therein); see [25, Lemma 3.1] for a similar proof.
We now move on to investigate the case when we obtain a stationary point rather than an optimal solution from solving the sample average approximation problem (34).
A feasible point x is said to be a sample average approximated stationary point (SAA-stationary point) of problem (34) if there exists ρ_N such that the corresponding KKT conditions hold. By [6, Proposition 2.3.3], ∂e^N_γ(x) ⊆ Φ_N(x), where Φ_N(x) denotes the sample average counterpart (37) of the rhs of the formula for ∂e_γ(x) and ∂_x H(x, t, ξ) is defined in (18). Proof. Let Φ_N(x) be defined as in (37) and x_N ∈ X_N(γ). By taking a subsequence if necessary, we assume for simplicity of notation that x_N converges to x*; we will then arrive at the desired inclusion (40). To this end, we define a perturbed subdifferential mapping, where ε is a positive number. We will demonstrate that the lim sup is contained in the perturbed set and then pass to the limit. By the definition of D, the deviation splits into two terms; let R^N_1 and R^N_2 denote the two terms at the rhs of the inequality above in turn. By the properties of D, and under the uniform Lipschitz continuity of G(·, ξ) and the outer semicontinuity of ∂_x H(·, ·, ξ), it follows by [32, Theorem 2] that R^N_1 vanishes almost surely (a.s. for short). Thus, for N sufficiently large, the deviation is bounded by ∫ ∆_{ε,δ}(t) P_0(dt), where ∆_{ε,δ}(t) := D(∂^{ε+δ}_x h(x*, t), ∂_x h(x*, t)). Notice that ∆_{ε,δ}(t) is bounded; by Lebesgue's Dominated Convergence Theorem, the limit of ∫ ∆_{ε,δ}(t) P_0(dt) is 0 a.s. Next, we estimate R^N_2. Since ∂_x h(x, t_i) is uniformly bounded on X × T and, by [31, Chapter 6, Proposition 7], $\frac{1}{N}\sum_{i=1}^N \varphi_N(\cdot, t)$ and $\frac{1}{N}\sum_{i=1}^N \psi_N(\cdot, t)$ converge to the same limit, the rhs of the inequality goes to zero as N → ∞. By [32, Theorem 1], the remaining term also vanishes. Driving ε to zero, we obtain (40) as desired.
5. Numerical results. In this section, we investigate the numerical performance of the approximation scheme (34). We do so by applying it to an academic example and a practical portfolio selection problem with empirical data. For a fixed parameter γ and sample size N, problem (34) is a standard nonlinear programming problem. By exploiting NLP solvers for problem (34), we implement the algorithm as follows.
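The algorithm listing itself is not reproduced here. The following is a hedged sketch of how we read it from the surrounding text: solve the SAA problem (34) for the current γ with an NLP solver (fmincon in the paper, SciPy's SLSQP below), enlarge γ by the step-size λ, and stop once the entropic constraint value stabilizes within the tolerance ε. The toy model (G(x, ξ) = xξ, Y(ξ) = ξ, ξ uniform on [2, 3]) and the exact update and stopping rules are our assumptions where the text is incomplete.

```python
# Hedged reconstruction of the gamma-updating loop (Algorithm 1 in the paper).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
xi = rng.uniform(2.0, 3.0, size=300)         # samples of xi ~ U[2, 3]

def e_gamma(x, gamma):
    """Entropic SAA constraint of (34) for the toy model G(x, xi) = x * xi."""
    t = xi                                   # t_j = Y(xi^j) = xi^j
    g = x[0] * xi                            # G(x, xi^k)
    h = (np.maximum(t[:, None] - g[None, :], 0.0)
         - np.maximum(t[:, None] - xi[None, :], 0.0)).mean(axis=1)
    m = h.max()
    return m + np.log(np.mean(np.exp(gamma * (h - m)))) / gamma

f = lambda x: (x[0] - 0.5) ** 2              # toy objective (our choice)
eps, lam, gamma = 1e-5, 2.0, 100.0           # settings reported in this section
x = np.array([0.5])
for _ in range(25):
    # Step 1: solve the SAA problem (34) for the current gamma
    res = minimize(f, x, bounds=[(0.0, 2.0)],
                   constraints={"type": "ineq",
                                "fun": lambda x, g=gamma: -e_gamma(x, g)})
    # Step 2: stop if the constraint value has stabilized, else enlarge gamma
    if abs(e_gamma(res.x, lam * gamma) - e_gamma(res.x, gamma)) < eps:
        break
    x, gamma = res.x, lam * gamma
print(round(res.x[0], 3))   # close to 1, the dominance boundary of the toy model
```

For this toy model, G(x, ξ) dominates Y(ξ) in second order exactly when x ≥ 1, so the constrained minimizer of the objective sits near the boundary x = 1.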
Since we use the approximation of the constraints as the termination condition, for any given positive number ε, Lemma 2.1 and the proof of Theorem 4.1 ensure that Algorithm 1 terminates in finitely many steps. However, the number of iterations goes to infinity as ε → 0. We use the random number generator rand in Matlab R2012a to generate the samples of ξ and the solver fmincon in Step 1. The tolerance ε, step-size λ and initial value of γ are set to 10^{-5}, 2 and 100 respectively. Example 1. We consider a variation of [33, Example 5.1], where ξ is a random variable uniformly distributed over the interval [2, 3].
It is rather complicated to work out the feasible set precisely; here we only need the optimal solution of problem (44). Observe first that, for every x ∈ [1, 3], G(x, ξ) dominates G(1, ξ) in the first order, as the cumulative distribution function of the former is strictly smaller than that of the latter. This implies that G(x, ξ) dominates G(1, ξ) in the second order for x ∈ [1, 3]. Thus x* = 5√2/4 ≈ 1.7678 is the optimal solution. We apply Algorithm 1 to solve the problem and report the results in Table 1, where N denotes the sample size, γ the parameter in the entropic approximation, Appr.Sol the approximate optimal solution, Appr.Val the approximate optimal value, and t the execution time in seconds. The results show that we can obtain an acceptable approximate optimal solution with a small sample size for this problem because the range of the support set is just 1. Increasing the sample size does not help improve the approximate solution, due to computational errors.

Example 2. We now consider a portfolio optimization problem. To simplify the discussion, we ignore transaction fees and denote the total value of the portfolio by f(x, r) := r_1 x_1 + r_2 x_2 + · · · + r_n x_n, where r_i is the random return of asset i. Moreover, we set G(x, r) = f(x, r) and use the equally weighted portfolio as the benchmark strategy y(r). The corresponding stochastic program with second order dominance constraints is denoted by (45). We carry out numerical tests of problem (45) on a historical data set of 10 assets (Aberdeen Asset Management plc, Admiral Group PLC, AMEC PLC, Anglo American PLC, Antofagasta PLC, AstraZeneca PLC, Aviva PLC, Babcock International Group PLC, BAE Systems PLC and Barclays PLC) over a time horizon of 4 years (from 7th Dec 2009 to 18th Nov 2013), with a total of 1001 records of historical stock returns (obtained from http://finance.google.com with adjustment for stock splitting). We have carried out out-of-sample tests with a rolling window of 500 days; that is, we use the first 500 records to calculate the optimal portfolio strategy for day 501 and then move on a rolling basis. Figure 1 depicts the performance of the second order dominance model (45). As expected, the cumulative wealth from the optimal strategy performs better than that of the benchmark strategy, with the margin increasing from 0 to about 0.15 at the end of the observed time period, i.e. 15% of the initial total investment.
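The portfolio model can be sketched along the same lines. Since the historical data set is not reproduced here, the code below substitutes synthetic Gaussian returns; the equally weighted portfolio serves as the benchmark y(r), and the dominance constraint is imposed through the entropic SAA constraint of (34). All numerical choices (means, volatilities, γ) are illustrative assumptions.

```python
# Sketch of the SSD-constrained portfolio model under stated assumptions:
# synthetic Gaussian returns stand in for the historical data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N, n = 500, 5                                    # sample size, number of assets
R = rng.normal(loc=[0.02, 0.015, 0.01, 0.005, 0.0], scale=0.05, size=(N, n))

def entropic_ssd(x, gamma=100.0):
    g = R @ x                                    # portfolio return G(x, r)
    y = R.mean(axis=1)                           # equally weighted benchmark
    h = (np.maximum(y[:, None] - g[None, :], 0.0)
         - np.maximum(y[:, None] - y[None, :], 0.0)).mean(axis=1)
    m = h.max()
    return m + np.log(np.mean(np.exp(gamma * (h - m)))) / gamma

obj = lambda x: -np.mean(R @ x)                  # maximize expected return
cons = [{"type": "eq", "fun": lambda x: x.sum() - 1.0},
        {"type": "ineq", "fun": lambda x: -entropic_ssd(x)}]
x0 = np.full(n, 1.0 / n)                         # start from the benchmark
res = minimize(obj, x0, bounds=[(0.0, 1.0)] * n, constraints=cons)
print(res.x.round(3))                            # SSD-constrained weights
```

Starting from the benchmark itself is convenient because the benchmark is feasible by construction (h vanishes there), so the solver can only improve the expected return subject to the dominance constraint.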