Linear Programming Formulations of Deterministic Infinite Horizon Optimal Control Problems in Discrete Time

This paper is devoted to a study of infinite horizon optimal control problems with time discounting and time averaging criteria in discrete time. We establish that these problems are related to certain infinite-dimensional linear programming (IDLP) problems. We also establish asymptotic relationships between the optimal values of problems with time discounting and long-run average criteria.


Introduction
The linear programming (LP) approach to control systems is based on the fact that the occupational measures generated by admissible controls and the corresponding solutions of a dynamical system satisfy certain linear equations that represent the system's dynamics in an integral form. The idea of such linearization has been explored extensively in both deterministic and stochastic settings (see, e.g., [5], [8], [9], [13], [24], [30], [31] and, respectively, [1], [12], [14], [15], [16], [17], [18], [21], [23], [25], [27], [29], [33] as well as references therein). In [15] and [16] in particular, the validity of LP formulations of deterministic infinite time horizon problems of optimal control with time average and time discounting criteria was proved for systems evolving in continuous time (note that other approaches/techniques for dealing with deterministic optimal control problems on the infinite time horizon have been studied, e.g., in [4], [7], [10], [34]; see also references therein). In the present paper, we show that the LP formulations of problems of optimal control with time average and time discounting criteria are valid for systems evolving in discrete time.
Note that some of the results of [15] and [16] were obtained under certain technical assumptions. For example, the statement implying the validity of the LP formulation of the long run average optimal control problem (see Theorem 2.6 in [16]) was proved under the assumption that the dependence of the control set on the state variables is Lipschitz continuous. These assumptions can be significantly relaxed when dealing with discrete time systems. In particular, the result about the validity of the LP formulation of the long run average optimal control problem in discrete time is established in this paper under the assumption that the dependence of the control set on the state variables is upper semicontinuous. It is also worth noting that the results in [16] (see also Remark 4.5 in [15]) are stated with the use of the relaxed controls formalism, which plays no role in the treatment of discrete time systems.
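AMS subject classification: 49N15, 93C55.
We consider the discrete time control system
$$y(t+1) = f(y(t), u(t)), \quad t \in T, \qquad y(0) = y_0, \qquad y(t) \in Y, \qquad u(t) \in U(y(t)). \tag{1}$$
Here $Y$ is a given nonempty compact subset of $\mathbb{R}^m$, $U(\cdot) : Y \rightsquigarrow U_0$ is an upper semicontinuous compact-valued mapping to a given compact metric space $U_0$, and $f(\cdot, \cdot) : \mathbb{R}^m \times U_0 \to \mathbb{R}^m$ is a continuous function.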
Note that the last two constraints of (1) can be rewritten as one:
$$u(t) \in A(y(t)),$$
where the map $A(\cdot) : Y \rightsquigarrow U_0$ is defined by the equation
$$A(y) := \{u \in U(y) : f(y, u) \in Y\}.$$
As can be readily verified, the map $A(\cdot)$ is upper semicontinuous and its graph $G$,
$$G := \{(y, u) : y \in Y,\ u \in A(y)\},$$
is a compact subset of $Y \times U_0$.
A control $u(\cdot)$ and the pair $(y(\cdot), u(\cdot))$ will be called an admissible control and, respectively, an admissible process if the relationships (1) are satisfied. The sets of admissible controls will be denoted by $\mathcal{U}(y_0)$ or $\mathcal{U}_S(y_0)$, depending on whether the problem is considered on the infinite time horizon ($t \in T := \{0, 1, \dots\}$) or on a finite time sequence ($t \in \{0, \dots, S-1\}$, where $S$ is a positive integer).
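The notion of an admissible process can be illustrated numerically. The following is a minimal sketch only, assuming the dynamics have the form y(t+1) = f(y(t), u(t)) (the recursion used in the proof of Proposition 2.1) and that u ∈ A(y) means u ∈ U(y) together with f(y, u) ∈ Y. The scalar system, the set Y = [0, 1], the control set U(y) = [-1, 1] and the helper names `rollout` and `is_admissible` are hypothetical choices made for the illustration; they do not come from the paper.

```python
from typing import Callable, List, Sequence


def rollout(f: Callable[[float, float], float], y0: float,
            controls: Sequence[float]) -> List[float]:
    """Trajectory y(0), ..., y(S) generated by y(t+1) = f(y(t), u(t))."""
    ys = [y0]
    for u in controls:
        ys.append(f(ys[-1], u))
    return ys


def is_admissible(f, in_Y, in_U, y0: float, controls: Sequence[float]) -> bool:
    """Check y(0) in Y and u(t) in A(y(t)) for t = 0, ..., S-1.

    Assumption: u in A(y) means u in U(y) and f(y, u) in Y.
    """
    if not in_Y(y0):
        return False
    y = y0
    for u in controls:
        if not in_U(y, u):
            return False
        y = f(y, u)
        if not in_Y(y):
            return False
    return True


# Hypothetical example data: Y = [0, 1], U(y) = [-1, 1], f(y, u) = (y + u) / 2.
f = lambda y, u: 0.5 * y + 0.5 * u
in_Y = lambda y: 0.0 <= y <= 1.0
in_U = lambda y, u: -1.0 <= u <= 1.0
```

In this toy system the control sequence (0, 0, 1) is admissible from y(0) = 1, whereas the single control u(0) = -1 applied at y(0) = 0 is not, because it drives the state out of Y.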
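Consider the optimal control problem
$$\min_{u(\cdot) \in \mathcal{U}(y_0)} \sum_{t=0}^{\infty} \alpha^t g(y(t), u(t)) =: V_\alpha(y_0), \tag{2}$$
where $g : \mathbb{R}^m \times U_0 \to \mathbb{R}$ is a continuous function and $\alpha \in (0, 1)$ is a discount factor. Consider also the optimal control problem
$$\min_{u(\cdot) \in \mathcal{U}_S(y_0)} \sum_{t=0}^{S-1} g(y(t), u(t)) =: V(S, y_0). \tag{3}$$
Everywhere in the paper, it is assumed that
A1. The set $\mathcal{U}(y_0)$ is not empty (that is, there exists at least one admissible control).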
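The two criteria can be evaluated along a given process as follows. This is a hedged sketch: it assumes the discounted cost is the sum of α^t g(y(t), u(t)) (the quantity J α used in the proof of Proposition 2.1) and the finite horizon cost is the undiscounted sum over t = 0, ..., S − 1. The function names, the constant running cost and the sample process are hypothetical. The function `tail_bound` gives the elementary estimate M α^N / (1 − α) for the part of the discounted sum beyond time N when |g| ≤ M.

```python
from typing import Callable, Sequence, Tuple

Process = Sequence[Tuple[float, float]]  # pairs (y(t), u(t)), t = 0, ..., N-1


def discounted_cost(g: Callable[[float, float], float],
                    process: Process, alpha: float) -> float:
    """Partial sum of alpha^t * g(y(t), u(t)) over the given process."""
    return sum(alpha ** t * g(y, u) for t, (y, u) in enumerate(process))


def total_cost(g: Callable[[float, float], float], process: Process) -> float:
    """Undiscounted sum of g(y(t), u(t)) over the given process."""
    return sum(g(y, u) for (y, u) in process)


def tail_bound(M: float, alpha: float, N: int) -> float:
    """Upper bound on |sum_{t >= N} alpha^t g| when |g| <= M."""
    return M * alpha ** N / (1.0 - alpha)


# Hypothetical data: constant running cost 1 along a three-step process.
g_one = lambda y, u: 1.0
proc = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
```

With α = 0.5 the three-term partial sum is 1.75, the full discounted sum is 2, and the difference 0.25 coincides with `tail_bound(1.0, 0.5, 3)`.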
As shown below (see Propositions 2.1 and 2.3), the minima in (2) and (3) are achieved if A1 is satisfied. To obtain our main results, we use a stronger assumption:
A2. The set $A(y)$ is not empty for any $y \in Y$.
This assumption implies non-emptiness of the set of admissible controls $\mathcal{U}(y)$ for any $y \in Y$ (systems that satisfy such a property are called viable; see [3]).
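Along with optimal control problems (2) and (3), let us consider two infinite-dimensional (ID) linear programming (LP) problems:
$$\min_{\gamma \in W_\alpha(y_0)} \int_G g(y, u)\gamma(dy, du) \tag{4}$$
and
$$\min_{\gamma \in W} \int_G g(y, u)\gamma(dy, du) := g^*, \tag{5}$$
where $W_\alpha(y_0)$ and $W$ are subsets of $P(G)$ (here and in what follows $P(G)$ stands for the space of probability measures on Borel subsets of $G$) defined by the equations
$$W_\alpha(y_0) := \Big\{\gamma \in P(G) : \int_G \big(\alpha\varphi(f(y, u)) - \varphi(y) + (1-\alpha)\varphi(y_0)\big)\gamma(dy, du) = 0 \ \ \forall \varphi \in C(Y)\Big\} \tag{6}$$
and
$$W := \Big\{\gamma \in P(G) : \int_G \big(\varphi(f(y, u)) - \varphi(y)\big)\gamma(dy, du) = 0 \ \ \forall \varphi \in C(Y)\Big\}. \tag{7}$$
Note that (4) and (5) are indeed LP problems since both the objective functions and the constraints defining $W_\alpha(y_0)$ and $W$ are linear in the "decision variable" $\gamma$. Note also that $W$ can be obtained from $W_\alpha(y_0)$ by setting $\alpha = 1$.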
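The linear constraint that defines W can be checked by hand on a finite example. The sketch below assumes that W consists of the probability measures γ on G for which the integral of ϕ(f(y, u)) − ϕ(y) vanishes for every continuous ϕ (the identity used in the proof of Proposition 5.6). The three-state system, the helper names and the two sample processes are hypothetical. On a finite state set the indicator functions span C(Y), so testing them is enough.

```python
from fractions import Fraction
from typing import Callable, Dict, Iterable, Sequence, Tuple

Measure = Dict[Tuple[int, int], Fraction]


def occupational_measure(process: Sequence[Tuple[int, int]]) -> Measure:
    """Empirical (time-average) measure of a finite process [(y(t), u(t))]."""
    weight = Fraction(1, len(process))
    gamma: Measure = {}
    for pair in process:
        gamma[pair] = gamma.get(pair, Fraction(0)) + weight
    return gamma


def defect(gamma: Measure, f: Callable[[int, int], int],
           phi: Callable[[int], int]) -> Fraction:
    """Integral of phi(f(y, u)) - phi(y) with respect to gamma."""
    return sum((w * (phi(f(y, u)) - phi(y)) for (y, u), w in gamma.items()),
               Fraction(0))


def in_W(gamma: Measure, f: Callable[[int, int], int],
         states: Iterable[int]) -> bool:
    """Test the constraint on the indicator function of every state."""
    return all(defect(gamma, f, lambda y, s=s: int(y == s)) == 0
               for s in states)


# Hypothetical example: Y = {0, 1, 2}, u in {0, 1}, f(y, u) = (y + u) mod 3.
f = lambda y, u: (y + u) % 3
cycle = occupational_measure([(0, 1), (1, 1), (2, 1)])  # one full period
segment = occupational_measure([(0, 1), (1, 1)])        # not periodic
```

The measure generated by one full period of the cycle 0 → 1 → 2 → 0 satisfies the constraint, while the measure of the two-step segment does not: for ϕ(y) = y its defect telescopes to (ϕ(y(2)) − ϕ(y(0)))/2 = 1.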
In the paper, we prove that, under Assumption A2,
$$(1-\alpha)V_\alpha(y_0) = \min_{\gamma \in W_\alpha(y_0)} \int_G g(y, u)\gamma(dy, du) \tag{8}$$
and the limits $\lim_{\alpha \uparrow 1} \min_{y \in Y} (1-\alpha)V_\alpha(y)$ and $\lim_{S \to \infty} \min_{y \in Y} \frac{1}{S} V(S, y)$ exist and are equal to $g^*$:
$$\lim_{\alpha \uparrow 1} \min_{y \in Y} (1-\alpha)V_\alpha(y) = \lim_{S \to \infty} \min_{y \in Y} \frac{1}{S} V(S, y) = g^*. \tag{9}$$
It is worth mentioning that there exists an extensive literature devoted to the relationship between the limits of the sums $(1-\alpha)\sum_{t=0}^{\infty} \alpha^t b_t$ as $\alpha \uparrow 1$ and $\frac{1}{S}\sum_{t=0}^{S-1} b_t$ as $S \to \infty$ for a real sequence $\{b_t\}$, showing that these limits may not exist (see, e.g., [6], where relationships between the corresponding lower and upper limits were investigated). However, provided that the sequence $\{b_t\}$ is bounded, the existence of one of these limits implies the existence of the other and their equality (see, e.g., [32]). In the context of optimal control in discrete time, relationships between the lower and upper limits of $(1-\alpha)V_\alpha(y)$ and $\frac{1}{S}V(S, y)$ were studied, e.g., in [26] and [28]. The (full) aforementioned limits may not exist, and, as was shown in [26] (without the assumption about the compactness of the set of admissible states $Y$), these limits, even if they exist, may be different. As mentioned above, in this paper we establish that, under Assumption A2, the limits of the minima over the initial conditions of $(1-\alpha)V_\alpha(y)$ and $\frac{1}{S}V(S, y)$ exist and are equal to the optimal value of the IDLP problem (5).
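The relationship between discounted and long-run averages of a bounded sequence can be seen on a simple example. The sketch below compares the discounted (Abel) mean, taken here to be (1 − α) times the sum of α^t b(t), with the arithmetic (Cesàro) mean of the first S terms. The alternating sequence b(t) = t mod 2 and the function names are hypothetical choices; the infinite sum is truncated, which for |b| ≤ 1 changes the result by at most α raised to the number of retained terms.

```python
from typing import Callable


def abel_mean(b: Callable[[int], float], alpha: float, n_terms: int) -> float:
    """(1 - alpha) * sum_{t < n_terms} alpha^t * b(t) (truncated series)."""
    total, weight = 0.0, 1.0
    for t in range(n_terms):
        total += weight * b(t)
        weight *= alpha
    return (1.0 - alpha) * total


def cesaro_mean(b: Callable[[int], float], S: int) -> float:
    """(1 / S) * sum_{t < S} b(t)."""
    return sum(b(t) for t in range(S)) / S


# Hypothetical bounded sequence 0, 1, 0, 1, ...
b = lambda t: t % 2
```

For this sequence the discounted mean equals α/(1 + α) in closed form, so both means tend to 1/2, in line with the statement that for bounded sequences the existence of one limit implies the existence of the other and their equality.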
The paper is organized as follows. Section 2 contains some preliminary results used in the sequel. In Section 3, we introduce discounted and "non-discounted" occupational measures and we reformulate problems (2) and (3) in terms of minimization over the sets of such measures. In Section 4, we establish that (8) is valid, and in Section 5 we prove the validity of (9). In Section 5, we also establish asymptotic properties of the sets of discounted and non-discounted occupational measures. In Section 6, we prove auxiliary results that are used in Sections 4 and 5.

Preliminaries
Everywhere in this and the following sections, it is assumed that A1 is satisfied.
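Proposition 2.1 The minimum in (2) is achieved.
Proof. For an admissible process $(y(\cdot), u(\cdot))$, denote $J_\alpha(u, y_0) := \sum_{t=0}^{\infty} \alpha^t g(y(t), u(t))$. Let $u_k(\cdot)$, $k = 1, 2, \dots$, be a minimizing sequence of controls and let $y_k(\cdot)$ be the corresponding sequence of trajectories. By using the diagonalization argument and taking into account compactness of $G$, we can find convergent subsequences (we do not relabel) $u_k(t) \to \bar{u}(t)$ and $y_k(t) \to \bar{y}(t)$ for all $t$. By passing to the limit in the relation $y_k(t+1) = f(y_k(t), u_k(t))$ as $k \to \infty$ we conclude that the process $(\bar{y}(\cdot), \bar{u}(\cdot))$ is admissible. For any natural $N$ we have
$$|J_\alpha(u_k, y_0) - J_\alpha(\bar{u}, y_0)| \le \sum_{t=0}^{N} \alpha^t |g(y_k(t), u_k(t)) - g(\bar{y}(t), \bar{u}(t))| + \sum_{t=N+1}^{\infty} \alpha^t |g(y_k(t), u_k(t)) - g(\bar{y}(t), \bar{u}(t))|.$$
Take $\varepsilon > 0$ and find $N$ large enough so that the second sum does not exceed $\varepsilon/2$ for all $k$ (this is possible since $g$ is bounded on $G$); then the first sum can be made less than $\varepsilon/2$ by taking sufficiently large $k$. Therefore, $J_\alpha(u_k, y_0) \to J_\alpha(\bar{u}, y_0)$ as $k \to \infty$, which implies that the process $(\bar{y}(\cdot), \bar{u}(\cdot))$ is optimal.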
Proposition 2.2 The optimal value function V α (·) is lower semicontinuous.
Proof. Take a sequence $y_{0k} \to y_0$ as $k \to \infty$ such that $V_\alpha(y_{0k}) < \infty$. Let $u_k(\cdot)$ be the corresponding sequence of minimizing controls, that is, controls such that $V_\alpha(y_{0k}) = J_\alpha(u_k, y_{0k})$. We want to show that $\liminf_{k \to \infty} V_\alpha(y_{0k}) \ge V_\alpha(y_0)$. Without loss of generality, assume that $\liminf_{k \to \infty} V_\alpha(y_{0k})$ is reached on the same sequence $y_{0k}$. Again, using the diagonalization argument and passing to a subsequence, we can assume that $u_k(t)$ converges to an admissible control $\bar{u}(t)$ for all $t$. Using the same argument as in the proof of Proposition 2.1, we can show that $\lim_{k \to \infty} J_\alpha(u_k, y_{0k}) = J_\alpha(\bar{u}, y_0)$. We have
$$V_\alpha(y_0) \le J_\alpha(\bar{u}, y_0) = \lim_{k \to \infty} J_\alpha(u_k, y_{0k}) = \lim_{k \to \infty} V_\alpha(y_{0k}),$$
which is the required inequality.

Proposition 2.3 The minimum in (3) is achieved and the optimal value function $V(S, \cdot)$ is lower semicontinuous.
Proof. The fact that the minimum in (3) is achieved is obvious (since it is a finite-dimensional problem on a compact set), and the fact that $V(S, \cdot)$ is lower semicontinuous is proved similarly to Proposition 2.2.
Proof. The proof follows from the fact that the functions V α (·) and V (S, ·) are lower semicontinuous.
Proposition 2.5 For any $y \in Y$ such that $V_\alpha(y) < \infty$, the following equation is valid:
$$V_\alpha(y) = \min_{u \in A(y)} \{g(y, u) + \alpha V_\alpha(f(y, u))\}. \tag{10}$$
Proof. The proposition is the well-known dynamic programming principle for problem (2). For completeness of the exposition, we reproduce its proof in Section 6.
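On a finite system the dynamic programming principle can be used directly to compute the discounted optimal value function by fixed-point iteration. The sketch below assumes the standard Bellman form, in which V(y) is the minimum over admissible u of g(y, u) + α V(f(y, u)); this form is an assumption of the example rather than a quotation. The three-state system, the cost g(y, u) = y and the function name `value_iteration` are hypothetical. In the example every state has admissible controls, so the viability assumption A2 holds.

```python
from typing import Callable, Dict, Iterable, List


def value_iteration(states: List[int],
                    controls: Callable[[int], Iterable[int]],
                    f: Callable[[int, int], int],
                    g: Callable[[int, int], float],
                    alpha: float, n_iter: int = 500) -> Dict[int, float]:
    """Iterate V <- min_u { g(y, u) + alpha * V(f(y, u)) } starting from V = 0."""
    V = {y: 0.0 for y in states}
    for _ in range(n_iter):
        V = {y: min(g(y, u) + alpha * V[f(y, u)] for u in controls(y))
             for y in states}
    return V


# Hypothetical example: Y = {0, 1, 2}, A(y) = {0, 1}, f(y, u) = (y + u) mod 3.
states = [0, 1, 2]
controls = lambda y: [0, 1]
f = lambda y, u: (y + u) % 3
g = lambda y, u: float(y)
V = value_iteration(states, controls, f, g, alpha=0.9)
```

Here the optimal policy moves to state 0 and stays there, so V(0) = 0, V(2) = 2 and V(1) = 1 + 0.9 · 2 = 2.8. The minimum over initial states of (1 − α)V is 0, which is also the smallest attainable long-run average cost in this toy system.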
For a lower semicontinuous function ψ : Y → IR, let H ψ (y) be defined as follows Then equation (10) can be written as which resembles the Hamilton-Jacobi-Bellman equation for continuous time systems; see, e.g., [4].
To describe convergence properties of occupational measures, we introduce the following metric on $P(G)$:
$$\rho(\gamma', \gamma'') := \sum_{j=1}^{\infty} \frac{1}{2^j} \left| \int_G q_j(y, u)\gamma'(dy, du) - \int_G q_j(y, u)\gamma''(dy, du) \right|,$$
where $q_j(\cdot)$, $j = 1, 2, \dots$, is a sequence of Lipschitz continuous functions dense in the unit ball of the space of continuous functions $C(G)$ from $G$ to $\mathbb{R}$. This metric is consistent with the weak* convergence topology on $P(G)$, that is, a sequence $\gamma_k \in P(G)$ converges to $\gamma \in P(G)$ in this metric if and only if
$$\lim_{k \to \infty} \int_G q(y, u)\gamma_k(dy, du) = \int_G q(y, u)\gamma(dy, du)$$
for any $q \in C(G)$. Note that the sets $W_\alpha(y_0)$ and $W$ are compact in this topology.
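Using the metric $\rho$, we can define the "distance" $\rho(\gamma, \Gamma)$ between $\gamma \in P(G)$ and $\Gamma \subset P(G)$ and the Hausdorff metric $\rho_H(\Gamma_1, \Gamma_2)$ between $\Gamma_1 \subset P(G)$ and $\Gamma_2 \subset P(G)$ as follows:
$$\rho(\gamma, \Gamma) := \inf_{\gamma' \in \Gamma} \rho(\gamma, \gamma'), \qquad \rho_H(\Gamma_1, \Gamma_2) := \max\Big\{\sup_{\gamma \in \Gamma_1} \rho(\gamma, \Gamma_2),\ \sup_{\gamma \in \Gamma_2} \rho(\gamma, \Gamma_1)\Big\}.$$
Note that, although, by some abuse of terminology, we refer to $\rho_H(\cdot, \cdot)$ as a metric on the set of subsets of $P(G)$, it is, in fact, a semimetric on this set: $\rho_H(\Gamma_1, \Gamma_2) = 0$ implies $\Gamma_1 = \Gamma_2$ if $\Gamma_1$ and $\Gamma_2$ are closed, but the equality may fail if at least one of these sets is not closed.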
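For finitely supported measures and finite families of measures these distances are straightforward to compute. The sketch below assumes the usual definitions: ρ is a weighted sum, with weights 2^(-j), of the absolute differences of the integrals of the test functions q j; the distance from a measure to a set is the infimum of ρ over the set; and the Hausdorff distance is the larger of the two one-sided suprema. Only finitely many test functions are used, so the computed quantity is a truncation of ρ and in general only a pseudometric. All names and data are hypothetical.

```python
from typing import Callable, Dict, Sequence

Measure = Dict[float, float]  # atom -> probability


def integral(q: Callable[[float], float], gamma: Measure) -> float:
    """Integral of q with respect to a finitely supported measure."""
    return sum(w * q(x) for x, w in gamma.items())


def rho(m1: Measure, m2: Measure,
        test_functions: Sequence[Callable[[float], float]]) -> float:
    """Truncated metric: sum over j of 2^(-j) * |int q_j dm1 - int q_j dm2|."""
    return sum(abs(integral(q, m1) - integral(q, m2)) / 2 ** j
               for j, q in enumerate(test_functions, start=1))


def dist_to_set(gamma: Measure, family: Sequence[Measure], qs) -> float:
    """Distance from a measure to a finite family (inf becomes min)."""
    return min(rho(gamma, other, qs) for other in family)


def hausdorff(family1: Sequence[Measure], family2: Sequence[Measure], qs) -> float:
    """Larger of the two one-sided distances between finite families."""
    return max(max(dist_to_set(m, family2, qs) for m in family1),
               max(dist_to_set(m, family1, qs) for m in family2))


# Hypothetical data: point masses at 0 and 1, their mixture, one test function.
qs = [lambda x: x]
d0, d1, mix = {0.0: 1.0}, {1.0: 1.0}, {0.0: 0.5, 1.0: 0.5}
```

With this data the distance from the point mass at 0 to the family consisting of the point mass at 1 and the mixture is 0.25, while the Hausdorff distance between the two families is 0.5, which shows that the one-sided distances need not coincide.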
Introduce the following notation for the sets of occupational measures: Due to (13) and (14), problems (2) and (3) and min respectively.
Note that from Proposition 4.1 it follows that Let LS be the class of bounded lower semicontinuous functions from Y to IR. Note that V α (·) ∈ LS if Assumption A2 is satisfied. In fact, in this case From this point on, it is everywhere assumed that Assumption A2 is indeed satisfied.
Our first main result is the following theorem.

Theorem 4.3
The optimal values in problems (4) and (18) coincide and are equal to the optimal value of (2) multiplied by (1 − α), that is, Moreover, the supremum in (18) is reached at ψ = V α .

Corollary 4.4
The following equality is valid where $\overline{\mathrm{co}}$ stands for the closure of the convex hull of the corresponding set.
Proof. Due to (4) and (15) Since the latter is valid for any continuous g, it proves the validity of (23).
Remark 4.5 Note that problem (18) can be shown to be equivalent to the problem dual to the IDLP problem (4) (see Appendix of [15]), with the equality of the optimal values being a part of the duality relationships between these two problems.

Validity of (9)
Let us introduce the following notation: where the minimization is over admissible controls and over the initial conditions in Y .
The main results of this section are Theorems 5.1 and 5.7 below. In Theorem 5.1 we, in particular, establish existence and equality of the limits in (9). The proof is broken down into a series of propositions and lemmas.
Proof. Take any ψ ∈ LS. Integrating the inequality Taking minimum with respect to γ ∈ W and supremum with respect to ψ ∈ LS, we conclude that Let us show the opposite inequality. Define that is, compared to (25), supremum in the formula above is taken with respect to continuous, rather than lower semicontinuous bounded functions. It is clear that therefore µ * C < ∞.
be a sequence of functions in C(Y ) with the following properties: (i) any finite collection of functions from this sequence is linearly independent on Y , (ii) for any ψ ∈ C(Y ) and any δ > 0 there exist N and scalars λ N i , i = 1, . . . , N such that sup It's easy to see that the setQ is compact and for any j = 1, 2, . . . the point (g * − 1 j , 0) does not belong toQ, where 0 is the zero element of l 1 (otherwise, g * is not the minimum in (5)). Due to Hahn-Banach separation theorem (see, e.g., [11], Section V.2) there exists a sequence (κ j , λ j ) ∈ IR × l ∞ (where λ j = (λ j 1 , λ j 2 , . . . )) such that where δ j > 0 for all j and ψ λ j := ∞ i=1 λ j i φ i . From the last formula it is easy to see that κ j ≥ 0. Let us show that, in fact, κ j > 0. Indeed, if it was not the case and κ j = 0, then we would have which is a contradiction to (29). Thus, κ j > 0. Dividing (30) through by κ j we obtain Therefore, g * ≤ µ * C . Taking into account inequalities (26) and (28) we conclude that g * = µ * . Proof. Let us show that Indeed, let α i ↑ 1, y i ∈ Y and γ i ∈ W α i (y i ) be such that γ i → γ. We have f (y, u)) − ϕ(y))γ i (dy, du).
The following two lemmas, proved in the Appendix, are discrete-time analogs of [19], Lemma 3.5 (ii) and [20], Lemma 3.8. For v ∈ IR the notation [v] stands for the integer part of v.
Lemma 5.4 Let $g : T \to \mathbb{R}$ be a function such that $|g(t)| \le M$ for all $t$. Let $\alpha \in (0, 1)$ and Then for any $\varepsilon > 0$ there exists a positive integer $T \ge \frac{\varepsilon}{(4M + 4|\sigma| + \varepsilon)(-\ln \alpha)}$ satisfying
Lemma 5.5 Let $g : T \to \mathbb{R}$ be a function such that $|g(t)| \le M$ for all $t$. Let $t$ be an arbitrary positive integer and For any $\varepsilon > 0$ there exists $t^* \in \{0, \dots, t-1\}$ such that
Proposition 5.6 The limit $\lim_{S \to \infty} G_S$ exists and is equal to $g^*$.
Proof. Let us show first that lim sup Take a sequence $S_i \to \infty$ as $i \to \infty$ and let $\gamma_i \in \Gamma_{S_i}$ be such that $\gamma_i \to \gamma$. Since $\gamma_i \in \Gamma_{S_i}$, there exist an initial condition $y_{0i}$ and a control $u_i(\cdot) \in \mathcal{U}_{S_i}(y_{0i})$ such that for the corresponding trajectory $y_i(\cdot)$ and any $\varphi \in C(Y)$ we have
$$\int_G (\varphi(f(y, u)) - \varphi(y))\gamma_i(dy, du) = \frac{1}{S_i} \sum_{t=0}^{S_i - 1} (\varphi(y_i(t+1)) - \varphi(y_i(t))) = \frac{1}{S_i} (\varphi(y_i(S_i)) - \varphi(y_{0i})).$$
Therefore,
$$\int_G (\varphi(f(y, u)) - \varphi(y))\gamma(dy, du) = \lim_{i \to \infty} \int_G (\varphi(f(y, u)) - \varphi(y))\gamma_i(dy, du) = \lim_{i \to \infty} \frac{1}{S_i} (\varphi(y_i(S_i)) - \varphi(y_{0i})) = 0,$$
since $\varphi$ is bounded on the compact set $Y$. Thus, $\gamma \in W$, i.e., inclusion (38) holds, which implies that Take a sequence $\alpha_i \uparrow 1$. Due to Proposition 5.3 there exists a sequence of initial conditions $y_{0i}$, controls $u_i(\cdot) \in \mathcal{U}(y_{0i})$ and the corresponding trajectories $y_i(\cdot)$ such that where $\lim_{i \to \infty} \xi_i = 0$. Applying Lemma 5.4 with $\sigma = g^* + \xi_i$ and $\varepsilon = \sqrt{-\ln \alpha_i}$ we conclude that there exists a sequence $S_i$ such that $S_i \ge K/\sqrt{-\ln \alpha_i}$ ($K$ is a constant independent of $i$) and therefore, $\liminf_{S \to \infty} G_S \le g^*$. Together with (39) this implies that lim inf The latter means that where $\lim_{i \to \infty} \eta_i = 0$. Let us apply Lemma 5.5 in which $S_i$ plays the role of $t$ and $\sigma = g^* + \eta_i$. Set $\varepsilon = 1/S_i$, denote the value corresponding to $t^*$ by $t_i$ and $l(S_i) := S_i - t_i$. We conclude that $l(S_i) \to \infty$ as $i \to \infty$ and Let $\tilde{u}_i(\cdot) = u_i(t_i + \cdot)$, $\tilde{y}_i(\cdot) = y_i(t_i + \cdot)$. Note that $(\tilde{u}_i, \tilde{y}_i)$ is an admissible process. It follows from (42) that which, along with (41), completes the proof of the proposition.
Combining the assertions of Propositions 5.2, 5.3, and 5.6, we complete the proof of Theorem 5.1.
The theorem below asserts convergence of the sets of occupational measures Γ α and Γ S defined in Section 2 to W given by (43).
The proof of this relation is based on formula (43) and the weak* separation theorem. It follows the same steps as the proof of Proposition 6.1 in [15], starting with formula (6.6). The only difference is that the parameter C, approaching 0 in [15], should be replaced with α, approaching 1. We do not reproduce this proof here.
The proof of the second equality of the theorem, lim S→∞ ρ H (co Γ S , W ) = 0, is very similar to the proof of (46). Namely, Proposition 5.6 can be written in terms of occupational measures as The rest of the proof follows from (47) and (48) using the weak* separation theorem, along the lines of [15], as described above.
In the case if 0 <T < 1, then 1