A non-exponential discounting time-inconsistent stochastic optimal control problem for jump-diffusion

In this paper, we study general time-inconsistent stochastic control models which are driven by a stochastic differential equation with random jumps. Specifically, the time-inconsistency arises from the presence of a non-exponential discount function in the objective functional. We consider equilibrium, instead of optimal, solution within the class of open-loop controls. We prove an equivalence relationship between our time-inconsistent problem and a time-consistent problem such that the equilibrium controls for the time-consistent problem coincide with the equilibrium controls for the time-inconsistent problem. We establish two general results which characterize the open-loop equilibrium controls. As special cases, a generalized Merton's portfolio problem and a linear-quadratic problem are discussed.

1. Introduction. We consider in this paper stochastic control problems when the system under consideration is governed by a SDE of the following type        dX (s) = b (s, X (s) , u (s)) ds + σ (s, X (s) , u (s)) dW (s) + Z c (s, X (s−) , u (s−) , z)Ñ (ds, dz) , and for any fixed initial pair (t, x), the objective is to maximize the expected utility functional J (t, x; u (·)) = E t,x T t ν (t, s) f (s, u (s)) ds + ν (t, T ) h (X (T )) , over the set of the admissible controls. In the above model b, σ, c, f and h are deterministic functions. Especially, ν (t, s) f (s, u (s)) is the discounted local utility and ν (t, T ) h (X (T )) is the terminal utility, where ν (·, ·) represents the discount function. The common assumption in most of the existing literature is that the discount rate of time preference is constant over time, leading to the exponential form of the discount function: ν (t, s) = e −δ(s−t) and ν (t, T ) = e −δ(T −t) ,

ISHAK ALIA
where δ > 0 is some constant which represents the discount rate. There is a very good reason for this assumption: it is easy to see that, with the above form of the discount function, the optimal control problem (1)-(2) is time-consistent in the sense that Bellman's optimality principle is satisfied. Therefore, the dynamic programming approach can be applied directly, and one may derive a closed-loop representation of the optimal control via a Hamilton-Jacobi-Bellman (HJB) equation. However, results from experimental studies contradict this assumption (see, for example, [1] or [22]), indicating that the discount rate changes over time and that discount rates for the near future are much lower than discount rates for times further away in the future. In particular, Ainslie [1] performed empirical studies on human and animal behavior and found that discount functions are almost hyperbolic; that is, they decrease like a negative power of time rather than an exponential. On the other hand, Strotz [31] showed that as soon as the discounting is non-exponential, discounted utility models become time-inconsistent in the sense that they do not admit a Bellman optimality principle. Therefore, an optimal control might not remain optimal as time goes by. However, since time-consistency is important for a rational decision maker, some researchers began to look for time-consistent strategies for non-exponential discounting optimal control problems. The main approach is to formulate the time-inconsistent problem in a game-theoretic framework and look for Nash equilibrium solutions. For a detailed introduction see, e.g., [31], [29], [28], [16], [4], [20], [13], [14], [15], [6], [8], [9], [23], [34], [36], [35], [37], [39], [12], [33] and the references therein. In particular, Ekeland and Lazrak [13] and Ekeland and Pirvu [14] investigated the optimal consumption-investment problem under hyperbolic discounting for deterministic and stochastic models.
Among their very important contributions, they provided a precise definition of the feedback equilibrium control in continuous time, using a spike variation formulation. In addition, they derived an extended HJB equation (an extension of the standard Hamilton-Jacobi-Bellman equation displaying a nonlocal term) along with a verification theorem that characterizes feedback equilibria. An extended HJB equation was also derived by Marín-Solano and Navas [23], who investigated a consumption-investment problem with a non-constant discount rate for both naive and sophisticated agents. Björk and Murgoci [6] generalized the extended HJB equation method to a quite general class of time-inconsistent stochastic control problems. In addition, they proved that for every time-inconsistent problem there exists an associated time-consistent problem such that the optimal control and the optimal value function for the consistent problem coincide with the equilibrium control and value function, respectively, for the inconsistent problem. Yong [34] provided an alternative approach, based on a discretization of time for the game, in his deterministic time-inconsistent linear-quadratic (LQ) model, and constructed an equilibrium solution via a class of coupled Riccati-Volterra equations. Yong [36], still by discretization of time for the game, investigated a class of general discounting time-inconsistent stochastic optimal control problems and derived the so-called equilibrium HJB equation along with a verification theorem that characterizes closed-loop equilibrium controls. Following Yong's approach, Zhao et al. [40] studied the consumption-investment problem under a general discount function and a logarithmic utility function. Hu et al. ([17], [18]) dealt with another kind of time-inconsistent stochastic LQ control problems.
In their model the time-inconsistency arises from the presence of a quadratic term of the expected state as well as a state-dependent term in the objective functional. Among their achievements, they introduced the concept of the Nash equilibrium control within the class of open-loop controls. Using a duality method, they characterized the open-loop equilibrium control via a stochastic system that includes a flow of forward-backward stochastic differential equations (FBSDEs), whose solvability remains a challenging open problem except in some special cases. Djehiche and Huang [11] extended [17] by characterizing equilibrium controls via a Pontryagin-type stochastic maximum principle. More recently, Hu et al. [19] extended the work [18] by incorporating control constraints. Finally, in a series of papers, Basak and Chabakauri [5], Czichowsky [10] and Björk et al. [7] studied the mean-variance problem, which is also time-inconsistent. This paper studies time-consistent solutions to the general discounting time-inconsistent stochastic optimal control problem (1)-(2). Specifically, we adopt a game-theoretic approach to handle the time-inconsistency and we aim to characterize the open-loop Nash equilibrium controls. Differently from most of the existing literature, in order to characterize the equilibrium controls, we begin by establishing a relationship between our time-inconsistent optimal control problem and a time-consistent problem, such that the equilibrium controls for the consistent problem coincide with the equilibrium controls for the time-inconsistent problem. As a consequence, any optimal control of the time-consistent problem coincides with an equilibrium control of the time-inconsistent problem. By using the standard approaches of classical stochastic control theory (i.e.
the stochastic maximum principle (SMP) and dynamic programming (DP)), we establish two general results which characterize the equilibrium controls: (i) The first one is a verification theorem associated with a standard HJB equation. (ii) The second result provides a complete characterization of the equilibrium controls via a necessary and sufficient condition in the form of a stochastic maximum principle. Finally, to illustrate our results, we discuss two concrete examples: (1) In the first example, we consider a consumption-investment problem with general discounting. We apply the verification theorem (Theorem 4.1) to derive the equilibrium consumption and investment strategies in state feedback form. (2) In the second example, we discuss a general discounting LQ model. We apply the stochastic maximum principle in Theorem 4.2 to derive the Nash equilibrium solution in linear feedback form via a standard Riccati equation.
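The source of the time-inconsistency discussed above can be checked in a few lines: an exponential discount factor satisfies the semigroup identity ν(t, s) = ν(t, τ)ν(τ, s) for any intermediate date τ, which is exactly what Bellman's principle needs, while a hyperbolic discount factor does not. A minimal numerical illustration (the rates δ and k below are arbitrary choices of ours, not values from the paper):

```python
import math

def nu_exp(t, s, delta=0.1):
    # exponential discount factor nu(t, s) = exp(-delta * (s - t))
    return math.exp(-delta * (s - t))

def nu_hyp(t, s, k=1.0):
    # hyperbolic discount factor nu(t, s) = 1 / (1 + k * (s - t))
    return 1.0 / (1.0 + k * (s - t))

t, tau, s = 0.0, 1.0, 3.0

# Exponential discounting factorizes across the intermediate date tau,
# so preferences evaluated at t and re-evaluated at tau agree.
gap_exp = abs(nu_exp(t, s) - nu_exp(t, tau) * nu_exp(tau, s))

# Hyperbolic discounting does not factorize: a plan chosen at t need not
# remain optimal at tau, which is the time-inconsistency studied here.
gap_hyp = abs(nu_hyp(t, s) - nu_hyp(t, tau) * nu_hyp(tau, s))

assert gap_exp < 1e-12 and gap_hyp > 1e-3
```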
The rest of the paper is organized as follows. In the second section, we formulate our problem and give the necessary notation and preliminaries. In Section 3, we formulate the objective and present the main result of this work. In Section 4, we present some general results on equilibria. Finally, in Section 5, we discuss two special cases.
2. Formulation of the problem. Throughout this paper $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$ is a filtered probability space such that $\mathcal{F}_0$ contains all $\mathbb{P}$-null sets, $\mathcal{F}_T = \mathcal{F}$ for an arbitrarily fixed finite time horizon $T > 0$, and $(\mathcal{F}_t)_{t\in[0,T]}$ satisfies the usual conditions. We assume that $(\mathcal{F}_t)_{t\in[0,T]}$ is generated by a one-dimensional standard Brownian motion $(W(t))_{t\in[0,T]}$ and an independent Poisson measure $N$ on $[0,T]\times Z$, where $Z \subseteq \mathbb{R}\setminus\{0\}$. We assume that the compensator of $N$ has the form $\mu(dt, dz) = \theta(dz)\,dt$ for some positive, $\sigma$-finite Lévy measure $\theta$ on $Z$, endowed with its Borel $\sigma$-field $\mathcal{B}(Z)$. We suppose that $\int_Z 1 \wedge |z|^2\,\theta(dz) < \infty$ and write $\tilde{N}(dt, dz) = N(dt, dz) - \theta(dz)\,dt$ for the compensated jump martingale random measure of $N$. Obviously, we have
$$\mathcal{F}_t = \sigma\left[W(s): 0 \le s \le t\right] \vee \sigma\left[N([0,s]\times A): 0 \le s \le t,\ A \in \mathcal{B}(Z)\right] \vee \mathcal{N},$$
where $\mathcal{N}$ denotes the totality of $\mathbb{P}$-null sets, and $\sigma_1 \vee \sigma_2$ denotes the $\sigma$-field generated by $\sigma_1 \cup \sigma_2$. In addition, we use the following notation: 1. For a function $f$, we denote by $f_x$ (resp. $f_{xx}$) the gradient or Jacobian (resp. the Hessian) of $f$ with respect to the variable $x$.
3. $\mathcal{S}^n$: the set of $(n \times n)$ symmetric matrices. 4. $\mathcal{S}^n_-$: the subset of all negative definite matrices of $\mathcal{S}^n$.
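The integrability condition $\int_Z 1 \wedge |z|^2\,\theta(dz) < \infty$ above can be verified numerically for a concrete Lévy measure; the stable-like density $\theta(z) = z^{-1-\alpha}$ on $(0,\infty)$ used below is an illustrative choice of ours, not one made in the paper:

```python
import numpy as np

alpha = 0.5                               # stability-like index in (0, 2)
z = np.logspace(-8, 4, 200_000)           # grid over (0, inf), truncated at 1e4
theta = z ** (-1.0 - alpha)               # Levy density theta(z) = z^(-1-alpha)
integrand = np.minimum(1.0, z ** 2) * theta

# trapezoid rule by hand (avoids version-dependent numpy helpers)
val = float(((integrand[1:] + integrand[:-1]) * 0.5 * np.diff(z)).sum())

# For alpha = 0.5 the exact value of the untruncated integral is 8/3; the
# small deficit comes from cutting the upper tail of the grid at 1e4.
assert abs(val - 8.0 / 3.0) < 0.05
```

Near zero the integrand behaves like $z^{1-\alpha}$ and near infinity like $z^{-1-\alpha}$, so the condition holds for any $\alpha \in (0, 2)$.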
We consider on the time interval $[0, T]$ the following controlled stochastic differential equation with random jumps (SDEJ),
$$dX(s) = b(s, X(s), u(s))\,ds + \sigma(s, X(s), u(s))\,dW(s) + \int_Z c(s, X(s-), u(s-), z)\,\tilde{N}(ds, dz), \quad X(0) = x_0,$$
where $u: [0, T] \times \Omega \to U$ represents the control process, $X^{x_0, u(\cdot)}(\cdot)$ is the controlled state process and $x_0 \in \mathbb{R}^n$ is regarded as the initial state.
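For intuition, the controlled SDEJ can be simulated with an Euler scheme in which the compensated jump increment is a compound-Poisson sum minus its intensity-weighted mean; the coefficients b, σ, the jump law and the constant control below are illustrative choices of ours, not specifications from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def b(s, x, u):      # drift (illustrative)
    return -0.5 * x + u

def sigma(s, x, u):  # diffusion (illustrative)
    return 0.2

lam, jump_mean, jump_std = 1.0, 0.1, 0.05   # compound-Poisson jump law
T, n = 1.0, 1000
dt = T / n

x = np.empty(n + 1)
x[0] = 1.0
u = 0.0                                      # constant control for the sketch
for k in range(n):
    s = k * dt
    dW = rng.normal(scale=np.sqrt(dt))
    dN = rng.poisson(lam * dt)               # number of jumps in [s, s + dt)
    jumps = rng.normal(jump_mean, jump_std, size=dN).sum() if dN else 0.0
    # compensated jump increment: subtract the intensity-weighted mean jump
    x[k + 1] = (x[k] + b(s, x[k], u) * dt + sigma(s, x[k], u) * dW
                + jumps - lam * jump_mean * dt)
```

Because the Lévy measure here is finite (compound Poisson), the compensation term reduces to λ times the mean jump size per unit time.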
As time evolves, we need to consider the controlled stochastic differential equation starting from the situation $(t, X^{x_0, u(\cdot)}(t))$ over the remaining horizon $(t, T]$. According to the above game perspective, the concept of the Nash equilibrium control $\hat{u}(\cdot)$ can be intuitively described as follows: (i) $\hat{u}(\cdot) \in \mathcal{U}[0, T]$.
(ii) Suppose that every player $s$, for $s > t$, will use the strategy $\hat{u}(s)$. Then the optimal choice for player $t$ is that he/she also uses the strategy $\hat{u}(t)$. However, the problem with this "definition" is that the individual player $t$ does not really influence the outcome of the game at all: he/she only chooses the control at the single point $t$, and since this is a time set of Lebesgue measure zero, neither the control dynamics nor the functional $J(t, X^{x_0, u(\cdot)}(t); u(\cdot))$ will be influenced. Therefore, to characterize the Nash equilibrium control, we need a more practical definition. In this paper, we follow Hu et al. [17], who provided the following definition of the so-called open-loop Nash equilibrium control.
In the rest of this paper, we sometimes simply call $\hat{u}(\cdot)$ an equilibrium control instead of an open-loop Nash equilibrium control when there is no ambiguity.
3.1. An equivalent time-consistent problem. In this subsection, we provide a surprising link between the time-inconsistent Problem (N) and a time-consistent control problem. Specifically, by using simple arguments, we prove that an admissible control $\hat{u}(\cdot) \in \mathcal{U}[0, T]$ is an equilibrium control for Problem (N) if and only if $\hat{u}(\cdot)$ is an equilibrium control for a standard time-consistent optimal control problem. To this end, we introduce the following stochastic optimal control problem.
For a given $(t, x) \in [0, T] \times \mathbb{R}^n$, any admissible control satisfying (9) is called an optimal control for Problem (C) at $(t, x)$. The function $\tilde{V}(\cdot, \cdot)$ defined by (9) is called the value function of Problem (C).
Remark 3. Note that $\tilde{J}(t, x; u(\cdot))$ is in standard form (i.e. the local utility $\frac{\nu(s,s)}{\nu(s,T)} f(s, u(s))$ as well as the terminal utility $h(X(T))$ do not depend on $t$). Consequently, Problem (C) is a time-consistent stochastic control problem.
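As a sanity check on Remark 3: in the time-consistent exponential case, the transformed weight ν(s, s)/ν(s, T) is just a positive rescaling of the original weight ν(t, s), so the two objectives share the same maximizers. A small numerical confirmation (δ, T, t, s are arbitrary values of ours):

```python
import math

delta, T = 0.1, 2.0

def nu(t, s):
    # exponential discount function: the time-consistent benchmark case
    return math.exp(-delta * (s - t))

t, s = 0.5, 1.2
w_C = nu(s, s) / nu(s, T)        # local-utility weight in Problem (C)
w_N = nu(t, s)                   # local-utility weight in Problem (N)

# w_C equals w_N times the positive constant exp(delta * (T - t)); rescaling
# the objective by a positive constant leaves its maximizers unchanged.
assert abs(w_C - w_N * math.exp(delta * (T - t))) < 1e-12
```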
To establish the link between the time-inconsistent Problem (N) and the time-consistent Problem (C), we need to introduce the concept of the equilibrium control for Problem (C).
The following theorem is the main result in this work; it ensures that Problem (C) and Problem (N) admit the same equilibrium controls. To prove Theorem 3.3, we need some preliminary results given in the following two lemmas.
Thus, since $\nu(t, T) > 0$, it is not difficult to see that $\hat{u}(\cdot)$ is an equilibrium control for Problem (N) if and only if (10) holds.
Lemma 3.5. Let (H1)-(H2) hold. Then the equality (11) holds. Proof. Let $\hat{u}(\cdot) \in \mathcal{U}[0, T]$ be an admissible strategy and consider the perturbed strategy $u^{t, \varepsilon, v(\cdot)}(\cdot)$ defined by the spike variation (6). Using the Cauchy-Schwarz inequality together with the quadratic growth condition, it follows that the corresponding estimate holds for some constant $K > 0$. Dividing both sides by $\varepsilon$ and letting $\varepsilon \to 0$, we obtain the claim. Remark 4. It seems that the equality (11) in Lemma 3.5 does not hold if the local utility is state dependent (i.e., if we replace $f(s, u(s))$ by $f(s, X(s), u(s))$ in the utility functional (5)).
By Lemma 3.5, we get that (10) holds. Therefore, it follows from Lemma 3.4 that $\hat{u}(\cdot)$ is an equilibrium control for Problem (N).
As a consequence of Theorem 3.3, we have the following result.
Proof. Suppose that $\hat{u}(\cdot) \in \mathcal{U}[0, T]$ is an optimal control for Problem (C) at $(0, x_0)$. Then, by Bellman's principle of optimality, the corresponding inequality holds for any $t$. Accordingly, dividing both sides by $\varepsilon$ and letting $\varepsilon \to 0$, we get that $\hat{u}(\cdot)$ is an equilibrium control for Problem (C). Therefore, it follows from Theorem 3.3 that $\hat{u}(\cdot)$ is an equilibrium control for Problem (N).
Remark 5. Note that the corollary above is of great practical value: in order to obtain an equilibrium control for the time-inconsistent Problem (N), it suffices to find an optimal solution to the standard time-consistent Problem (C). This is quite different from [6] (Proposition 5.1, p. 41), where, in order to formulate the equivalent time-consistent problem, one needs to know the equilibrium control policy $\hat{u}(\cdot, \cdot)$.

4. Characterization of equilibria.
In this section, we present two independent results which characterize the equilibrium controls. The first one is a verification theorem associated with a standard HJB equation and the second one is a stochastic Pontryagin-type maximum principle.
4.1. Verification theorem. In this subsection, we present a stochastic verification theorem which provides a sufficient condition for equilibrium controls of Problem (N).
Define the generalized Hamiltonian function $G$ and consider the Hamilton-Jacobi-Bellman equation associated with Problem (C) (see e.g. [27]). Suppose that $V(\cdot, \cdot)$ is a classical solution to the HJB equation (12). If $(X^{x_0}(\cdot), \hat{u}(\cdot))$ is an admissible state-control pair satisfying (13), then $\hat{u}(\cdot)$ is an equilibrium control for Problem (N).
Remark 6. This approach is direct and the derivation of the equilibrium control is not very complicated. Moreover, the equilibrium control $\hat{u}(\cdot)$ admits a closed-loop representation via (13).

4.2. Stochastic maximum principle.
In this subsection, we present a necessary and sufficient condition for equilibrium controls. We derive this condition by using Theorem 3.3 and a second-order Taylor expansion in the spike variation, in the same spirit as the proof of the stochastic Pontryagin maximum principle [32]. Moreover, we point out that our approach is different from the one followed by Hu et al. ([17], [18]); the stochastic maximum principle here does not involve a family of BSDEs (parameterized by the initial time $t$) as in [17]. Throughout this subsection, an admissible control $u(\cdot)$ is a $U$-valued, $\mathcal{F}_t$-adapted process; let $\hat{u}(\cdot) \in \mathcal{U}[0, T]$ be a fixed admissible control and $\hat{X}^{x_0}(\cdot)$ be the state process corresponding to $\hat{u}(\cdot)$. For some fixed arbitrary $u \in U$, we put the usual increment notation for $\varphi = b, \sigma$. The following assumption, imposed in Tang and Li [32], will be in force throughout this subsection: the derivatives of the coefficients up to second order ($k = 1, 2$) exist and are bounded by $(1 + |u| + |x|)$. Define the Hamiltonian as a map from $[0, T] \times \mathbb{R}^n \times U \times \mathbb{R}^n \times \mathbb{R}^n \times L^2(Z, \mathcal{B}(Z), \theta; \mathbb{R}^n)$ into $\mathbb{R}$ by
$$H(t, x, u, p, q, r(\cdot)) = \langle b(t, x, u), p \rangle + \langle \sigma(t, x, u), q \rangle + \int_Z \langle c(t, x, u, z), r(z) \rangle\, \theta(dz) + \frac{\nu(t, t)}{\nu(t, T)} f(t, u),$$
and let us introduce the adjoint equations involved in the stochastic maximum principle which characterizes the equilibrium controls.
Thus, by setting $v(t) \equiv u$ for an arbitrary $u \in U$, we get (17). Conversely, suppose that $\hat{u}(\cdot)$ is an admissible control for which the variational inequality (17) holds. Then, for any $t \in [0, T]$, $v(\cdot) \in \mathcal{U}^0[0, T]$ and $\varepsilon \in [0, T - t)$, the corresponding inequality holds; dividing both sides by $\varepsilon$ and letting $\varepsilon \to 0$, we conclude, by Lemma 3.4, that $\hat{u}(\cdot)$ is an equilibrium control for Problem (N). This completes the proof.

Remark 7.
Define an $H$-function associated with $(\hat{u}(\cdot), \hat{X}^{x_0}(\cdot), p(\cdot), q(\cdot), r(\cdot, \cdot), P(\cdot), \Gamma(\cdot, \cdot))$ as follows; then easy manipulations show that the variational inequality (17) is equivalent to (18). We have proved that if $\hat{u}(\cdot)$ is an optimal solution of Problem (C), then $\hat{u}(\cdot)$ is an equilibrium control for Problem (N). One may naturally ask whether, conversely, an equilibrium control of Problem (N) is optimal for Problem (C). In this paragraph, we prove that, provided some concavity assumptions are satisfied, any Nash equilibrium control for Problem (N) is indeed optimal for Problem (C). Let us first introduce two additional assumptions. (H3) The control domain $U$ is convex, the map $h$ is concave with respect to $x$, and the Hamiltonian function $H$ is concave with respect to $(x, u)$. (H4) The maps $b$, $\sigma$, $c$ and $f$ are continuously differentiable with respect to $u$.
The following result is comparable with Theorem 3.3. 5. Some applications. In this section, we discuss two special cases of time-inconsistent stochastic control problems to illustrate our results.
5.1. Generalized Merton portfolio problem. We consider a consumption and investment problem associated with a jump-diffusion market and a non-exponentially discounted utility. We apply the verification argument in Theorem 4.1 to derive the equilibrium consumption and investment strategies in state feedback form. Note that, in the absence of Poisson random jumps, this problem was considered by Yong [36], where the equilibrium is, however, defined within the class of closed-loop controls (see [36], Definition 4.1).

Equilibrium solution.
In what follows, we apply the verification argument in Theorem 4.1 to derive the equilibrium strategy. Specifically, we use the standard dynamic programming approach to compute an optimal solution for the equivalent time-consistent problem (30). This optimal solution coincides with an equilibrium strategy for the time-inconsistent consumption-investment problem (29).
Before providing the precise statement of the main result of this subsection, let us introduce the function $F: [0, T] \times [0, 1] \to \mathbb{R}$ given by (31), where $F_\pi(\cdot, \cdot)$ and $F_{\pi\pi}(\cdot, \cdot)$ denote, respectively, the first- and second-order derivatives of $F(\cdot, \cdot)$ with respect to $\pi$. Since $\beta \in (0, 1)$, it is not difficult to see that the function $F(t, \pi)$ is strictly concave with respect to $\pi$.
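Numerically, the interior maximizer of a strictly concave $F(t, \cdot)$ on $[0, 1]$ is the root of $F_\pi(t, \cdot)$, which bisection finds whenever $F_\pi(t, 0) > 0 > F_\pi(t, 1)$. A sketch with a hypothetical linear $F_\pi$ (the paper's $F$ involves the market coefficients and the Lévy measure and is not reproduced here):

```python
def bisect_root(g, lo, hi, tol=1e-12):
    """Root of a strictly decreasing function g with g(lo) > 0 > g(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid          # root is to the right of mid
        else:
            hi = mid          # root is to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical derivative F_pi(pi) = mu_r - s2 * pi: positive at 0 and
# negative at 1, so the concave F has a unique interior maximizer.
mu_r, s2 = 0.3, 0.8
pi_star = bisect_root(lambda pi: mu_r - s2 * pi, 0.0, 1.0)

assert abs(pi_star - mu_r / s2) < 1e-9
```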
Theorem 5.1. The time-inconsistent portfolio problem (29) admits an equilibrium consumption-investment strategy represented by (33)-(34), where $\pi(t)^*$ denotes the unique solution of $F_\pi(t, \pi) = 0$ in $(0, 1)$, $\xi(\cdot)$ is given by (35), and the equilibrium wealth process $\hat{X}^{x_0}(\cdot)$ is given by (36). Proof. Assume for the time being that the conditions of Theorem 4.1 hold. The generalized Hamiltonian function $G$ associated with this problem is

Accordingly, the HJB equation takes the form (37). Because of the terminal condition, we consider an ansatz of the form $V(t, x) = \xi(t)\frac{x^\beta}{\beta}$ for some deterministic function $\xi(\cdot) \in C^1([0, T]; \mathbb{R})$ such that $\xi(T) = 1$. Computing the partial derivatives and substituting $V(t, x)$ into (37) leads to (38). Note that the optimization problem in (38) breaks down into two independent optimization problems, so its solution can be obtained sequentially: we first optimize Equation (38) with respect to $c$, and then with respect to the variable $\pi$.
Since $\beta \in (0, 1)$, the quantity to be maximized in (38) is strictly concave with respect to the control variable $c$. The first-order condition associated with this optimization problem provides a maximizer $\hat{c}(t, x)$. Substituting it into Equation (38) and factoring out the term $\frac{x^\beta}{\beta}$, we obtain (39), where $F(t, \pi)$ is as introduced in (31). Taking into account the constraint $\pi \in [0, 1]$ and the strict concavity of $F(t, \pi)$, we conclude that the maximization problem in (39) has a unique solution $\hat{\pi}(t)$. Moreover, from the definition of the function $F(t, \pi)$, and using simple arguments such as the intermediate value theorem, it is possible to check that: (i) if $F_\pi(t, 1) < 0$ and $F_\pi(t, 0) = \beta(\mu(t) - r(t)) > 0$, then there exists a unique interior point $\pi(t)^* \in (0, 1)$ such that $F_\pi(t, \pi(t)^*) = 0$ and, consequently, $\hat{\pi}(t) = \pi(t)^*$; (ii) if $F_\pi(t, 1) \geq 0$ and $F_\pi(t, 0) = \beta(\mu(t) - r(t)) > 0$, then $F(t, \pi)$ attains its maximum at the boundary point $1$, and consequently $\hat{\pi}(t) = 1$. We now turn our attention to the ODE for $\xi(\cdot)$. Replacing $\pi$ by $\hat{\pi}(t)$ in Equation (39) leads to a Bernoulli equation. To solve it, we introduce a change of variable $y(\cdot)$ and find that $y(\cdot)$ satisfies a linear ODE; a variation-of-constants formula then yields, for $t \in [0, T]$, the expression (35). According to the above derivations and Theorem 4.1, the equilibrium consumption-investment solution is given by (33)-(34). Moreover, the corresponding wealth process solves a SDEJ; by Itô's formula and a change of variable (see [27], Example 1.15, pp. 7-8), we obtain that $\hat{X}(\cdot)$ is given by (36).
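The linearization step used in the proof, substituting $y = \xi^{1-\gamma}$ into a Bernoulli equation $\xi' = a\xi + b\xi^\gamma$, can be checked numerically; the constant coefficients below are hypothetical (the paper's ODE has time-dependent coefficients built from $F(t, \hat{\pi}(t))$):

```python
import math

a, b, gamma = -0.4, 0.3, 0.5     # hypothetical constant coefficients
xi0, T, n = 1.0, 1.0, 100_000
dt = T / n

# explicit Euler on the Bernoulli equation xi' = a*xi + b*xi**gamma
xi = xi0
for _ in range(n):
    xi += (a * xi + b * xi ** gamma) * dt

# closed form via the linearizing substitution y = xi**(1-gamma), which turns
# the Bernoulli equation into the linear ODE y' = (1-gamma)*(a*y + b)
k = 1.0 - gamma
yT = (xi0 ** k + b / a) * math.exp(k * a * T) - b / a
assert abs(xi - yT ** (1.0 / k)) < 1e-3
```

For constant coefficients, the variation-of-constants formula for the linear ODE gives the closed form directly; with time-dependent coefficients one integrates the exponential factor numerically instead.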
Remark 10. Time-consistent solutions for the non-exponential discounting Merton portfolio problem have been well explored using different methods and different concepts of Nash equilibrium. See [6], [23], [15] for the extended HJB equation method, [14], [2] for the duality method, and [36] for a multi-person differential game approach. Although the existing literature has provided mathematically elegant results, the focus is still on financial models without jumps. As far as we know, our paper is the first to find the equilibrium consumption-investment solution under a jump-diffusion model.

Corollary 2.
In the case of the classical form of the discount function (i.e. $\nu(t, s) = e^{-\delta(s-t)}$ for every $(t, s) \in D[0, T]$), the equilibrium consumption and investment strategies specialize accordingly. Remark 11. The equilibrium consumption-investment strategy $(\hat{c}(\cdot), \hat{\pi}(\cdot))$ presented in Corollary 2 coincides with the optimal solution of the classical Merton portfolio problem under a jump-diffusion model (see, e.g., [3], the case without regime switching). This confirms the well-known fact that, for an exponential discount function, the equilibrium strategy is nothing but the optimal strategy. A relevant observation is that the investment strategy $\hat{\pi}(t)$ is independent of the discount function: it is the same for exponential and non-exponential discount functions.
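For the no-jump, exponential-discounting benchmark mentioned in Remark 11, the classical Merton fraction for power utility $x^\beta/\beta$ is the constant $\pi = (\mu - r)/((1-\beta)\sigma^2)$; a one-line check with illustrative market parameters chosen so that the fraction is interior:

```python
# Illustrative parameters (not from the paper); beta in (0, 1) is the
# power-utility exponent, so relative risk aversion is 1 - beta.
mu, r, sigma, beta = 0.07, 0.03, 0.4, 0.5

pi_merton = (mu - r) / ((1.0 - beta) * sigma ** 2)

# with these numbers the fraction lies strictly inside (0, 1)
assert 0.0 < pi_merton < 1.0
```

With other parameter choices the unconstrained fraction can exceed 1, in which case the constraint $\pi \in [0, 1]$ binds at the boundary.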
The case without jumps. If the modeling framework is without jumps, then the function $F(t, \pi)$ and its first- and second-order derivatives reduce to simpler expressions, and the equilibrium consumption-investment strategy given by (33)-(34) reduces accordingly.

5.2.
Time-inconsistent stochastic linear-quadratic problem. We now consider the case where the state equation is linear in both the state and the control, and the gain functional is quadratic in the control. In particular, we apply the stochastic maximum principle in Theorem 4.2 in order to derive the equilibrium solution. The results in this subsection are comparable with some of the results in [34], [36] and [37]. The dynamics over $[0, T]$ are given by the following linear controlled SDEJ,
$$dX(s) = [A(s)X(s) + B(s)u(s)]\,ds + [C(s)X(s) + D(s)u(s)]\,dW(s) + \int_Z [E(s, z)X(s-) + F(s, z)u(s-)]\,\tilde{N}(ds, dz),$$
where the functions $A(\cdot)$, $B(\cdot)$, $C(\cdot)$, $D(\cdot)$, $E(\cdot, \cdot)$ and $F(\cdot, \cdot)$ are deterministic. The gain functional is given by (42), where $R(\cdot) \in C([0, T]; \mathcal{S}^m_-)$, $G \in \mathcal{S}^n_-$ and $X(\cdot) = X^{t, x, u(\cdot)}(\cdot)$ solves the SDEJ above. The control domain is $U = \mathbb{R}^m$ and the set of admissible controls is $\mathcal{U}^0[t, T] = S^2_{\mathcal{F}}(t, T; \mathbb{R}^m)$.

Remark 14.
In the present case, the equivalent time-consistent optimal control problem is: for any $(t, x) \in [0, T] \times \mathbb{R}^n$, maximize (43) over $u(\cdot) \in S^2_{\mathcal{F}}(t, T; \mathbb{R}^m)$.

Equilibrium solution.
In what follows, we apply the stochastic maximum principle in Theorem 4.2 to derive the Nash equilibrium control. For brevity, we henceforth suppress the argument $t$ and write $A$, $B$, $C$, $D$, $E(z)$ and $F(z)$ for $A(t)$, $B(t)$, $C(t)$, $D(t)$, $E(t, z)$ and $F(t, z)$, whenever no confusion arises.
In the context of this problem, the Hamiltonian function $H$ takes the form
$$H(t, x, u, p, q, r(\cdot)) = \langle p, Ax + Bu \rangle + \langle q, Cx + Du \rangle + \int_Z \langle r(z), E(z)x + F(z)u \rangle\, \theta(dz) + \frac{\nu(t, t)}{2\nu(t, T)} \langle Ru, u \rangle.$$
Accordingly, the first- and second-order adjoint equations associated with $(X^{x_0}(\cdot), \hat{u}(\cdot))$ are introduced. Noting that the terminal condition of the second-order adjoint equation (45) is deterministic, it is natural to look for a deterministic solution; moreover, $P(\cdot)$, $\Lambda(\cdot)$ and $\Gamma(\cdot, \cdot)$ are explicitly given in terms of $\Phi$, where, for each $t \in [0, T]$, $\Phi(t, \cdot)$ is the unique solution to a linear SDE and $I_n$ denotes the $(n \times n)$ identity matrix. Consequently, the $H$-function in (18) takes an explicit quadratic form. Remark 15. From the assumption that $G \leq 0$ and $R(t) \leq 0$, it follows that $P(t) \leq 0$ and, consequently, the function $H(t, \hat{X}^{x_0}(t), \cdot)$ is concave with respect to $u$.
As a consequence of Theorem 4.2 we have the following result.
Remark 16. Note that it is not difficult to verify that Assumptions (H1*)-(H4) are satisfied in the framework of the time-inconsistent LQ problem (41)-(42). Therefore, it follows from Theorem 4.3 that the equilibrium control given by (48) is optimal for the equivalent time-consistent LQ problem (43).
We turn our attention to the Riccati equation (47).
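The paper's Riccati equation (47) includes diffusion and jump terms and is not reproduced here; as a sketch of how such terminal-value Riccati ODEs are handled numerically, we integrate the classical scalar deterministic LQ Riccati equation $P' = P^2 B^2 / R - 2AP - Q$, $P(T) = G$, backward in time and compare with its closed form (all coefficients are illustrative, minimization sign convention):

```python
import math

A, B, Q, R, G, T = 0.0, 1.0, 1.0, 1.0, 0.0, 1.0
n = 100_000
dt = T / n

# explicit Euler, stepping backward from the terminal condition P(T) = G
P = G
for _ in range(n):
    P -= (P * P * B * B / R - 2.0 * A * P - Q) * dt

# with A = G = 0 and B = Q = R = 1 the closed form is P(t) = tanh(T - t)
assert abs(P - math.tanh(T)) < 1e-4
```

The same backward-marching scheme applies to matrix-valued Riccati equations, stepping the terminal matrix $G$ back to time 0.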

6. Concluding remarks.
In this paper we have investigated open-loop equilibrium controls for a general discounting time-inconsistent stochastic control problem. We have shown that our time-inconsistent problem is equivalent to a standard optimal control problem, in the sense that the equilibrium controls for the standard problem coincide with the equilibrium controls for the time-inconsistent problem. This link allowed us to characterize the equilibrium controls by using the standard techniques of classical optimal control theory. The concrete examples confirm the applicability of the proposed approach. We believe it would be very interesting to extend this approach to a fairly general class of time-inconsistent stochastic control problems. Research on this topic is in progress and will appear in a forthcoming paper.