TIME-INCONSISTENT OPTIMAL CONTROL PROBLEMS WITH REGIME-SWITCHING

Abstract. In this paper, a time-inconsistent optimal control problem is studied for diffusion processes modulated by a continuous-time Markov chain. In the performance functional, the running cost and terminal cost depend not only on the initial time, but also on the initial state of the Markov chain. By modifying the method of multi-person game, we obtain an equilibrium Hamilton-Jacobi-Bellman equation under proper conditions. The well-posedness of this equilibrium HJB equation is studied in the case where the diffusion term is independent of the control variable. Furthermore, a time-inconsistent linear-quadratic control problem is considered as a special case.


1. Introduction. It is well-known that Bellman's optimality principle plays a key role in classical optimal control theory. However, there are many examples, namely the so-called time-inconsistent control problems, in which this principle does not hold, such as optimal control problems with non-exponential discounting and mean-variance portfolio selection (see, e.g., [7] and [1]). In the seminal work [18], Strotz studies a cake-eating problem within a game-theoretic framework where the players are the agent and his/her future selves, and seeks a subgame perfect Nash equilibrium point for this game. Strotz's work has been pursued by many others; see [14], [13], [11] and [12], among others.
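As a concrete instance of the non-exponential discounting mentioned above, the following sketch (our own toy numbers, not taken from the cited works) shows the preference reversal that makes hyperbolic discounting time-inconsistent, while an exponential discounter's ranking never reverses:

```python
import math

def hyperbolic(delay, k=1.0):
    # hyperbolic discount factor: 1 / (1 + k * delay)
    return 1.0 / (1.0 + k * delay)

def exponential(delay, r=0.1):
    # exponential discount factor: exp(-r * delay)
    return math.exp(-r * delay)

def preferred(discount, d1):
    # Reward 100 after d1 time units vs. reward 110 after d1 + 1 units.
    v_soon = 100 * discount(d1)
    v_late = 110 * discount(d1 + 1)
    return "soon" if v_soon > v_late else "late"

print(preferred(hyperbolic, 10))   # -> late : patient while both rewards are far away
print(preferred(hyperbolic, 0))    # -> soon : impatient once the choice is imminent
print(preferred(exponential, 10))  # -> soon
print(preferred(exponential, 0))   # -> soon : exponential ranking never reverses
```

The flip between the first two lines is exactly why the agent's future selves disagree with the present self, motivating the game-theoretic formulation.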
In recent years, research on time-inconsistent optimal control in the stochastic continuous-time setting has attracted increasing attention. In [7] and [8], a precise definition of the equilibrium concept in continuous time is provided for the first time. Following the notion of equilibrium strategy in those papers, [3] studies the time-inconsistent control problem in a general Markovian framework, and derives an extended HJB equation as well as a verification theorem. In [4], Markowitz's problem with state-dependent risk aversion is investigated by utilizing the extended HJB equation obtained in [3].
Another approach to the time-inconsistent control problem, namely the method of multi-person game, is developed in [21, 22]. In these papers, the running cost and terminal cost functions depend on the initial time in some general way. A brief description of the method of multi-person game is as follows. Let T > 0 be the fixed time horizon. Take a partition Π = {t_k | 0 ≤ k ≤ N} of the time interval [0, T] with 0 = t_0 < t_1 < · · · < t_N = T, and with mesh size ‖Π‖ = max_{1≤k≤N}(t_k − t_{k−1}). Consider an N-person differential game: for k = 1, 2, · · ·, N, the k-th player controls the system on [t_{k−1}, t_k), starting from the initial state (t_{k−1}, X(t_{k−1})), which is the terminal state of the (k − 1)-th player, and tries to minimize/maximize his/her own performance functional. Each player knows that the later players will do their best, and will modify their control systems as well as their cost functionals. In the performance functional, each player discounts the utility in his/her own way. Then, for any given partition Π, a Nash equilibrium strategy is constructed for the corresponding N-person differential game. Finally, it can be shown that as the mesh size ‖Π‖ approaches zero, the Nash equilibrium strategy of the N-person differential game converges to the desired time-consistent solution of the original time-inconsistent problem. By this method, [21] considers a deterministic time-inconsistent linear-quadratic control problem. Considering a controlled stochastic differential equation with deterministic coefficients, [22] investigates a time-inconsistent problem with a general cost functional and derives an equilibrium HJB equation. For more research following the method of multi-person game, we refer the reader to [23], [26] and [19].
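The backward-induction construction can be mimicked in discrete time. The sketch below is a toy cake-eating game under quasi-hyperbolic β-δ discounting with log utility, entirely our own illustration rather than the setting of [21, 22]: each player, knowing the later players' equilibrium policies, consumes a fixed fraction of the remaining cake, and the fractions are computed by backward induction over the players.

```python
# Toy cake-eating game: N players, player k consumes c_k from the remaining
# cake and values log(c_k) + beta * sum_{j>=1} delta^j log(c_{k+j}).
# With log utility, every player's subgame perfect policy is a fixed
# fraction of the remaining cake, found by backward induction.

def equilibrium_fractions(N, beta=0.7, delta=0.95):
    fractions = [0.0] * N
    fractions[N - 1] = 1.0             # the last player eats what is left
    for k in range(N - 2, -1, -1):
        # B weights the log of the cake bequeathed to later players:
        # future consumptions scale linearly with what player k leaves.
        B = beta * sum(delta ** j for j in range(1, N - k))
        fractions[k] = 1.0 / (1.0 + B)
    return fractions

def play(N, x0=1.0, **kw):
    fractions, x, path = equilibrium_fractions(N, **kw), x0, []
    for f in fractions:
        c = f * x
        path.append(c)
        x -= c
    return path

consumption = play(5)
print([round(c, 4) for c in consumption])   # equilibrium consumption path
print(round(sum(consumption), 10))          # whole cake eaten (-> 1.0)
```

In the continuous-time method sketched above, this discrete game is played on the subintervals of Π and one then lets ‖Π‖ → 0; here N is fixed and the equilibrium is exact.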
In this paper, we study the time-inconsistent optimal control problem for diffusion processes with regime-switching. More specifically, the drift and diffusion terms of the diffusion process are modulated by an exogenous continuous-time finite-state Markov chain. This kind of process has been extensively used in finance and actuarial science to describe the volatile financial market in the long run; see [5], [27], [25], [17] and [20], among others. Similar to [22], the objective of this paper is to derive an equilibrium Hamilton-Jacobi-Bellman equation for the time-inconsistent control problem. However, the running cost and terminal cost in this paper are allowed to depend on both the initial time and the initial state of the Markov chain. To handle the time-inconsistency caused by the Markov chain, we need to modify the method of multi-person game as follows (see Section 3 for more details): in each time interval [t_{k−1}, t_k), the k-th player only controls the system up to the first time the Markov chain jumps in [t_{k−1}, t_k), and then some other player controls the system for the rest of [t_{k−1}, t_k). Obviously, the second player in [t_{k−1}, t_k) arrives randomly, as the Markov chain may not jump during [t_{k−1}, t_k). Thus, given a partition of the planning horizon, we study a multi-person game with random arrivals.
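To quantify the "random arrival" of the second player: given α(t_{k−1}) = i, the holding time of the chain in state i is exponential with rate q_i, so the probability that control changes hands inside a subinterval of length Δ is 1 − e^{−q_iΔ} ≈ q_iΔ for small Δ. A quick numerical check (the rate below is an illustrative value of our own):

```python
import math, random

def handover_prob(q_i, dt):
    # P(xi_k < t_k | alpha(t_{k-1}) = i): the holding time in state i is
    # exponential with rate q_i, so a jump occurs within dt with
    # probability 1 - exp(-q_i * dt) ~ q_i * dt for small dt.
    return 1.0 - math.exp(-q_i * dt)

def mc_estimate(q_i, dt, n=200_000, seed=1):
    # Monte Carlo check: draw holding times and count jumps before dt.
    rng = random.Random(seed)
    return sum(rng.expovariate(q_i) < dt for _ in range(n)) / n

q_i, dt = 2.0, 0.1
print(round(handover_prob(q_i, dt), 4))   # 1 - e^{-0.2} -> 0.1813
print(abs(mc_estimate(q_i, dt) - handover_prob(q_i, dt)) < 0.01)
```

As the mesh shrinks, each handover probability is of order q_i‖Π‖, which is what keeps the random arrivals tractable in the limit.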
The remainder of this paper is organized as follows. Section 2 introduces the time-inconsistent optimal control problem. Section 3 studies the multi-person differential game and derives an equilibrium Hamilton-Jacobi-Bellman equation. Section 4 shows the well-posedness of the equilibrium HJB equation in the special case where the diffusion term is independent of the control variable. Section 5 investigates a time-inconsistent linear-quadratic control problem with regime-switching.

2. The time-inconsistent optimal control problem. Let T > 0 be a fixed finite time horizon, W(·) be a 1-dimensional standard Brownian motion, and α(·) be a homogeneous, irreducible continuous-time Markov chain taking values in a finite set A = {1, 2, · · ·, M}. Denote by Q = (q_{ij})_{M×M} the generator of the Markov chain, where −q_{ii} = q_i > 0 for i ∈ A. We assume that α(·) is RCLL and independent of the Brownian motion. Let (Ω, F, F, P) be a filtered probability space on which W(·) and α(·) are defined, where the filtration F ≡ {F_t}_{0≤t≤T} is the augmentation under P of F^{W,α}.

Given a partition Π = {t_k | 0 ≤ k ≤ N} of [0, T], let ξ_k denote the first jump time of the Markov chain after t_{k−1}, capped at t_k. If ξ_k < t_k, then it is the first time the Markov chain jumps in (t_{k−1}, t_k). If ξ_k = t_k, then either no jump occurs in (t_{k−1}, t_k] or there is a jump exactly at t_k. Before we proceed, let us give a remark on preference-(t, i) in the performance functional (2). Preference-t is deterministic and transient, i.e., the decision-maker holds preference-t only at time t. However, this is not the case for preference-i: the decision-maker changes preference-i randomly and does keep preference-i for a while. To capture the randomness of preference-i, we shall consider the following multi-person game: given α(t_{k−1}) = i, Player (k), who holds preference-(t_{k−1}, i), controls the system from t_{k−1} to ξ_k. If ξ_k < t_k and α(ξ_k) = i′, then some other player with preference-(t_{k−1}, i′), denoted by Player (k̃), controls the system on [ξ_k, t_k). If ξ_k = t_k, then Player (k) controls the system on the whole time interval [t_{k−1}, t_k).
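For intuition, the state dynamics just described can be simulated directly. The sketch below (two regimes, with an illustrative generator and GBM-type coefficients of our own choosing, not the paper's) couples an Euler-Maruyama step for X with the jump times of α:

```python
import random

def simulate(T=1.0, n=1000, x0=1.0, seed=0):
    # One path of dX = b(alpha) X dt + sig(alpha) X dW, alpha a 2-state CTMC.
    rng = random.Random(seed)
    Q = [[-1.0, 1.0], [2.0, -2.0]]        # illustrative generator (2 states)
    b = [0.05, -0.02]                      # drift coefficient per regime
    sig = [0.2, 0.4]                       # volatility per regime
    dt = T / n
    x, a = x0, 0
    t_jump = rng.expovariate(-Q[a][a])     # first jump time of the chain
    for step in range(n):
        t = step * dt
        while t_jump <= t:                 # process any chain jumps up to t
            a = 1 - a                      # two states: each jump flips
            t_jump += rng.expovariate(-Q[a][a])
        dW = rng.gauss(0.0, dt ** 0.5)
        x += b[a] * x * dt + sig[a] * x * dW   # Euler-Maruyama step
    return x

print(round(simulate(), 6))                # terminal value for this seed
```

The restriction of a path to [t_{k−1}, ξ_k) is exactly the segment controlled by Player (k) in the game above.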
3.1. Definition of time-consistent equilibrium strategy. We give the following definition of the time-consistent equilibrium strategy of Problem (N).

Definition 3.1. A map Ψ : [0, T] × R^n × A → U is called a time-consistent equilibrium strategy of Problem (N) if, for any i ∈ A, Ψ(·, ·, i) is continuous; for any (x, i) ∈ R^n × A, the equation admits a unique solution X̄(·) ≡ X̄(·; 0, x, i, Ψ(·)); and there exist two families of maps Ψ^Π(·, ·, ·), Ψ̃^Π(·, ·, ·) : [0, T] × R^n × A → U, parameterized by partitions Π ∈ P_0[0, T], such that the following properties hold for any i ∈ A and (t, x) in any compact subset of [0, T] × R^n.

3.2. Multi-person game. In this section, we introduce and solve a multi-person differential game associated with the partition Π. We first introduce some notation. Let S^n be the set of all n × n symmetric real matrices. For any (τ, t) ∈ D[0, T], j, i ∈ A and (x, u, p, P) ∈ R^n × U × R^n × S^n, let H(τ, j, t, x, i, p, P) denote the associated Hamiltonian. Similar to [22], we define the domain of H as the set of all (τ, j, t, x, i, p, P) such that H(τ, j, t, x, i, p, P) > −∞.
Assumption 3.1. The map ψ(·) is well-defined and has the needed regularity.
3.2.1. Player (N). Player (N) controls the system on [t_{N−1}, ξ_N). Given α(t_{N−1}) = i, the state process of Player (N) is given accordingly. We shall give the cost functional of Player (N) later, and first consider the optimization problem for Player (Ñ), who controls the system on [ξ_N, t_N]. Let us introduce the following control problem starting from some fixed initial time z ∈ [t_{N−1}, t_N]. Given α(z) = i′, the objective is to find a strategy ũ^{N,i′}(·) minimizing the cost functional, with X̃^{N,i′}(·) ≡ X̃^{N,i′}(·; z, x, i′, ũ^{N,i′}(·)) being the unique solution to the corresponding SDE. Noting that i′, the second variable of g(·) and h(·), is the state of the Markov chain at the initial time z, the optimization problem given by (5)-(7) is time-inconsistent. However, we look for the pre-commitment optimal strategy (i.e., i′ in the second variables of g(·) and h(·) is fixed as a parameter).
If the system of PDEs admits a unique classical solution Ṽ^{N,i′}(·, ·, ·), then, by arguments similar to those in [24] and [9], and using the generalized Itô formula (see, e.g., [2]), one can verify (see, e.g., [16] and [6]) that the pre-commitment optimal strategy ũ^{N,i′}(·) is obtained. For s ∈ [z, t_N], τ ∈ [0, s], i, j ∈ A and x ∈ R^n, define the function Θ̃^{N,i′}(τ, i, ·, ·, ·), which can be regarded as the cost on [s, t_N] of the player with preference-(τ, i). Similar to [22, Section 4.1], by using the techniques of FBSDEs, the function Θ̃^{N,i′}(τ, i, ·, ·, ·) is given by the classical solution (if it exists) of the associated PDE, where we have suppressed (s, x, j) in Ṽ^{N,i′}_x(·) and Ṽ^{N,i′}_{xx}(·). Given F_{ξ_N}, the optimization problem for Player (Ñ) is given by (5)-(7) with (z, x, i′) replaced by (ξ_N, X^{N,i}(ξ_N), α(ξ_N)), and she/he seeks the pre-commitment optimal strategy. Therefore, conditioned on F_{ξ_N}, the cost of Player (Ñ) is determined; taking the expectation then gives the cost of Player (N). Thus, we have the following optimization problem for Player (N).
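The chain component of the generalized Itô (Dynkin) formula invoked here can be sanity-checked numerically: for f : A → R, the function u(t, i) = E[f(α(t)) | α(0) = i] satisfies du/dt = Qu, i.e., u(t) = e^{tQ}f. The sketch below (two-state chain with an illustrative generator of our own) compares a truncated Taylor series for e^{tQ}f with a Monte Carlo estimate:

```python
import random

Q = [[-1.0, 1.0], [2.0, -2.0]]     # illustrative generator of a 2-state chain

def expm_times(Q, f, t, terms=60):
    # u(t) = exp(tQ) f via a truncated Taylor series of matrix-vector products
    out, term = f[:], f[:]
    for n in range(1, terms):
        term = [t / n * sum(Q[r][c] * term[c] for c in range(2)) for r in range(2)]
        out = [out[r] + term[r] for r in range(2)]
    return out

def mc(Q, f, t, i, n=100_000, seed=3):
    # Monte Carlo estimate of E[f(alpha(t)) | alpha(0) = i]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s, a = 0.0, i
        while True:
            s += rng.expovariate(-Q[a][a])   # holding time in state a
            if s > t:
                break
            a = 1 - a                        # two states: each jump flips
        total += f[a]
    return total / n

f = [1.0, 0.0]                     # f = indicator of state 0
u = expm_times(Q, f, 0.7)
print(round(u[0], 6))              # closed form: 2/3 + exp(-2.1)/3 ~ 0.707485
print(abs(mc(Q, f, 0.7, 0) - u[0]) < 0.01)
```

The q_{ij}-weighted sum appearing in the PDEs of this section is exactly the generator term Qu of this identity.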
Remark 3.1. Although Player (N) controls the system only on [t_{N−1}, ξ_N), after taking the expectation with respect to ξ_N, his/her optimization problem is equivalent to a standard control problem on [t_{N−1}, t_N], and the state process is a standard diffusion process without regime-switching.
Noting that Problem (C_N) is a standard optimal control problem, if the PDE admits a unique classical solution, then the optimal strategy ū^{N,i}(·) of Player (N) is obtained, and X̄^{N,i}(·) is the optimal state process of Player (N). Given the optimal pair (X̄^{N,i}(·), ū^{N,i}(·)) of Player (N), we can define a function which, similarly, is the cost on [t, t_N] of the player with preference-(τ, j). It can be given by the classical solution (if it exists) of the following PDE.

3.2.2. Player (N − 1). Player (Ñ−1) controls the system on [ξ_{N−1}, t_{N−1}). Similarly, to state the optimization problem for Player (Ñ−1), we consider a control problem starting from some fixed initial time z. If the associated system of PDEs admits a unique classical solution Ṽ^{N−1,i′}(·, ·, ·), then the pre-commitment optimal strategy ũ^{N−1,i′}(·) is obtained. Given the optimal pair (X̃^{N−1,i′}(·), ũ^{N−1,i′}(·)), we can define the cost of the player with preference-(τ, i). Given F_{ξ_{N−1}}, the optimization problem for Player (Ñ−1) is stated analogously, and she/he seeks the pre-commitment optimal strategy. Therefore, conditioned on F_{ξ_{N−1}}, the cost of Player (Ñ−1) is determined. Thus, we have the following optimization problem for Player (N − 1).
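Once the control is fixed, the PDEs appearing in this section are linear parabolic equations of Feynman-Kac type. As a minimal numerical sketch (constant illustrative coefficients of our own choice, not the paper's equations), the following solves v_t + b v_x + ½σ²v_xx = 0 with terminal data h(x) = x² by an explicit backward finite-difference march, and checks it against the closed-form solution (x + bτ)² + σ²τ, τ = T − t:

```python
# Explicit backward march for v_t + b v_x + 0.5 sigma^2 v_xx = 0, v(T,x)=x^2.
b, sigma, T = 0.1, 0.3, 0.5
nx, nt = 101, 100
xs = [-2.0 + 4.0 * i / (nx - 1) for i in range(nx)]
dx, dt = xs[1] - xs[0], T / nt
assert sigma ** 2 * dt / dx ** 2 <= 0.5    # explicit-scheme stability (CFL)

def exact(t, x):
    # Feynman-Kac closed form for terminal data h(x) = x^2
    tau = T - t
    return (x + b * tau) ** 2 + sigma ** 2 * tau

v = [x * x for x in xs]                    # terminal condition v(T, x) = x^2
t = T
for _ in range(nt):
    t -= dt
    new = [0.0] * nx
    new[0], new[-1] = exact(t, xs[0]), exact(t, xs[-1])  # exact boundary data
    for i in range(1, nx - 1):
        vx = (v[i + 1] - v[i - 1]) / (2 * dx)
        vxx = (v[i + 1] - 2 * v[i] + v[i - 1]) / dx ** 2
        new[i] = v[i] + dt * (0.5 * sigma ** 2 * vxx + b * vx)
    v = new

i0 = nx // 2                               # grid point x = 0
print(abs(v[i0] - exact(0.0, xs[i0])) < 1e-3)
```

The coupled regime-switching system adds a q_{ij}-weighted zeroth-order coupling between the regimes but is discretized in the same way.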
If the PDE admits a unique classical solution, then the optimal strategy of Player (N − 1) is given by ū^{N−1,i}(·). Given the optimal pair (X̄^{N−1,i}(·), ū^{N−1,i}(·)) of Player (N − 1), we define the function analogously.
3.2.3. Player (k). Given α(t_{k−1}) = i, the state process of Player (k) is given accordingly. Player (k̃) controls the system on [ξ_k, t_k). We consider an optimization problem starting from some fixed z ∈ [t_{k−1}, t_k]. Given α(z) = i′, the objective is to find a strategy ũ^{k,i′}(·) minimizing the associated cost functional. If the system of PDEs admits a unique classical solution, then the pre-commitment optimal strategy for the optimization problem (17)-(19) is given by ũ^{k,i′}(·). Given F_{ξ_k}, the optimization problem for Player (k̃) is given by (17)-(19) with (z, x, i′) replaced by (ξ_k, X^{k,i}(ξ_k), α(ξ_k)), and she/he seeks the pre-commitment optimal strategy. Therefore, conditioned on F_{ξ_k}, the cost of Player (k̃) is determined. Thus, we have the following optimization problem for Player (k).
and X^{k,i}(·) is given by (16) with ξ_k replaced by t_k.
If the PDE has a unique classical solution, then the optimal strategy ū^{k,i}(·) of Player (k) is obtained. The associated cost function is given by the classical solution (if it exists) of the corresponding PDE.

3.3. Equilibrium HJB equation. This subsection is devoted to finding the equation that characterizes the time-consistent equilibrium strategy and the equilibrium value function in the sense of Definition 3.1. Define Θ^Π as follows. It is easy to check that the corresponding differential relation holds, or, in integral form, the analogous identity. By the definition of Θ^k(τ, j, t, x, i), k = 2, · · ·, N, we have, for any (t, x) ∈ [t_{k−1}, t_k) × R^n and i ∈ A, Θ^k(t_{k−1}, i, t, x, i) = V^{k,i}(t, x), and thus the corresponding identity holds. Therefore, for (τ, t) ∈ D[0, T], j, i ∈ A and x ∈ R^n, the stated representation holds. Thus, for t ∈ [t_{k−1}, t_k), Θ^{Π,i}(τ, j, t, x, i) and Θ^Π(τ, j, t, x, i) have the same terminal values. Therefore, if all the coefficients are bounded and the equation is uniformly elliptic, then the corresponding estimate holds uniformly for (τ, t) ∈ D[0, T], i, j ∈ A, x ∈ R^n. Assume that there exists a constant K > 0 such that the stated bound holds. By backward induction, it is easy to see that Θ^Π(τ, j, t, x, i), Θ^Π_x(τ, j, t, x, i) and Θ^Π_{xx}(τ, j, t, x, i) are continuous with respect to τ. Now, assume that there exists a function Θ(·, ·, ·, ·, ·) such that the convergence holds uniformly for any i, j ∈ A, (τ, t) ∈ D[0, T] and x ∈ R^n. Letting ‖Π‖ → 0 in (26), we obtain an equation for Θ(τ, j, t, x, i), which is the integral form of the differential equation (32). Similar to [22], we call (32) the equilibrium HJB equation. In the following, we show that if (32) has a unique classical solution, then we can obtain a time-consistent equilibrium strategy and the corresponding equilibrium value function.

4. Well-posedness of the equilibrium HJB equation. In this section, we discuss the well-posedness of the equilibrium HJB equation (32). Similar to [22], we assume that the control does not enter the diffusion term of the state equation. In this case, the equilibrium HJB equation becomes (44), with h_{ji}(τ, x) = h(τ, j, x, i) and g_{ji}(τ, t, x, p) = g(τ, j, t, x, i, ψ(t, i, t, x, i, p)). To study (44), we shall need the following notation. Let C^0(R^n) be the space of all continuous functions ϕ : R^n → R such that sup_{x∈R^n} |ϕ(x)| < ∞, and let C^α(R^n), 0 < α < 1, be the space of all continuous functions ϕ : R^n → R for which, in addition, the Hölder seminorm sup_{x≠y} |ϕ(x) − ϕ(y)|/|x − y|^α is finite. Furthermore, let C^{1+α}(R^n) and C^{2+α}(R^n) be the spaces of all functions ϕ : R^n → R whose derivatives up to order one (resp. two) belong to C^0(R^n), with the highest-order derivatives belonging to C^α(R^n). Let C(D[t_{k−1}, t_k]; C^α(R^n)) be the set of all continuous functions f : D[t_{k−1}, t_k] → C^α(R^n). Similarly, we define C([t_{k−1}, t_k]; C^{m+α}(R^n)) and C(D[t_{k−1}, t_k]; C^{m+α}(R^n)), respectively, for m = 1, 2. Furthermore, let 𝒞^α(R^n) be the space of all matrix functions ϕ(·) ≡ (ϕ_{ji}(·))_{M×M} such that ϕ_{ji}(·) ∈ C^α(R^n) for all i, j ∈ A. The norm on 𝒞^α(R^n) is defined entrywise in (45). Similarly, for m = 0, 1, 2, we define the spaces 𝒞^{m+α}(R^n), 𝒞([t_{k−1}, t_k]; C^{m+α}(R^n)) and 𝒞(D[t_{k−1}, t_k]; C^{m+α}(R^n)). The norms on these spaces, denoted by ‖·‖_S with S being the corresponding space, are defined similarly to (45). We make the following hypotheses for equation (44).
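For intuition about these Hölder spaces, the seminorm sup_{x≠y} |ϕ(x) − ϕ(y)|/|x − y|^α can be estimated on a grid. In this sketch (our own one-dimensional example), ϕ(x) = |x|^{1/2} lies in C^{1/2} with seminorm 1 on [0, 1] (attained against y = 0), while its Lipschitz (α = 1) quotient blows up at the origin:

```python
# [f]_alpha = sup_{x != y} |f(x) - f(y)| / |x - y|^alpha, estimated on a grid.

def holder_seminorm(f, xs, alpha):
    best = 0.0
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y) ** alpha)
    return best

f = lambda x: abs(x) ** 0.5
xs = [i / 200 for i in range(201)]      # uniform grid on [0, 1]

half = holder_seminorm(f, xs, 0.5)      # C^{1/2} seminorm: 1, attained at y = 0
lip = holder_seminorm(f, xs, 1.0)       # Lipschitz quotient: ~ 1/sqrt(h) near 0
print(round(half, 6))                   # -> 1.0
print(lip > 10)                         # -> True (about sqrt(200) ~ 14.1)
```

This is why membership in C^α is a genuinely weaker requirement than differentiability, and why the Schauder-type theory used below is phrased in these spaces.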
Assumption 4.1. The stated boundedness and continuity conditions hold for all (τ, t, x, p) ∈ D[0, T] × R^n × R^n and i, j ∈ A. Furthermore, (Σ^i(t, x))^{−1} exists for all (t, x) ∈ [0, T] × R^n and i ∈ A, and there exist constants λ_0, λ_1 > 0 such that the corresponding uniform ellipticity bounds hold.

We first give an a priori estimate of a solution to (44).
Proof. Fix a partition Π = {t_k | 0 ≤ k ≤ N} of [0, T] with mesh size ‖Π‖ sufficiently small. We prove the result by backward induction.
Step 1: Show the existence and uniqueness of Θ(τ, t, x) on D[t_{N−1}, t_N] × R^n.
Step 2: Show the existence and uniqueness of Θ(τ, t, x) for (τ, t, x) ∈ [0, t_{N−1}] × [t_{N−1}, t_N] × R^n.
Step 3: For k = 1, · · ·, N − 1, show the existence and uniqueness of Θ(τ, t, x) for (τ, t, x) ∈ D[t_{k−1}, t_k] × R^n, and then for (τ, t, x) ∈ [0, t_{k−1}] × [t_{k−1}, t_k] × R^n. The proof of this step is similar to Steps 1 and 2.
Let us consider Step 2 first. Since we have obtained Θ(t, t, x) for (t, x) ∈ [t_{N−1}, t_N] × R^n in Step 1, (44) becomes a standard linear parabolic system (parameterized by (τ, j)), which admits a unique solution under Assumption 4.1; see, e.g., [10, Chapter 9].