An optimal control model of carbon reduction and trading

In this study, we establish a stochastic control model with which a country can formulate a
carbon abatement policy that minimizes its total carbon reduction
costs. Under Merton's consumption framework, and by considering
carbon trading, carbon abatement, and penalties in a synthetic manner, we convert the model into a two-dimensional Hamilton-Jacobi-Bellman
equation. We rigorously prove the existence and uniqueness of its viscosity solution.
We also present numerical results and
discuss the properties of the optimal carbon reduction policy and the
minimum total costs.

1. Introduction. The greenhouse effect and climate change have been major concerns in recent years. Increases in the atmospheric concentrations of greenhouse gases due to human activities are considered to be largely responsible for climate change and global warming. Owing to increased awareness of the severity of climate change, the Kyoto Protocol was signed in 1997, and mandatory emission limits for greenhouse gases, primarily carbon dioxide (CO$_2$), were assigned to the signatory nations ([12]). To implement the Kyoto Protocol in the European Union, the European Commission launched the European Climate Change Program in June 2000. Subsequently, in January 2005, the European Union Emission Trading Scheme (EU ETS) ([5]) began operating. The EU ETS was the first and is still by far the largest international system for trading greenhouse gas emission allowances; it covers around 45% of the total greenhouse gas emissions of the 28 EU countries.
The EU ETS employs a cap-and-trade system, which means that the total amount of permissible greenhouse gas emissions of a country or factory is predetermined for a specific period. If the actual greenhouse gas emissions of a signatory party exceed its limit, i.e., the emissions cap, then it needs to purchase European Union Allowances from other countries or pay penalties for the excess emissions. In addition, signatory parties can sell their spare allowances in the market to make profits ([3,5]). Furthermore, to strengthen cooperation among countries, the signatory parties are allowed to buy international credits from emission-saving projects carried out around the world under the Kyoto Protocol's Clean Development Mechanism and Joint Implementation instruments.
Many studies have investigated different aspects of carbon reduction and trading in recent years. Daskalakis et al. ([8]) studied the three main markets for emission allowances within the EU ETS and developed a framework for the pricing and hedging of futures as well as options on futures. Carmona et al. ([2]) showed that the economic mechanism of carbon allowance price formation can be formulated in the framework of competitive stochastic equilibrium models, and they identified the main drivers of allowance prices. Seifert et al. ([16]) proposed a tractable stochastic equilibrium model that reflects stylized features of the EU ETS and analyzed the resulting CO$_2$ spot price dynamics. Wang et al. ([18]) analyzed the emission reduction pathways of enterprises in practical processes based on China's emission abatement target and developed a framework to derive the magnitude of investment required for each pathway. Zagheni and Billari ([21]) provided a stochastic differential equation model to evaluate, from the perspective of option pricing ([1,13]), the costs incurred by a country in reducing its emissions to meet the cap. Yang and Liang ([19,20]) proposed an optimal control model in which a stochastic process describes the carbon emissions of a country and the objective is to minimize the total costs, including the costs of reducing emissions and the penalties for exceeding the cap under the EU ETS. They obtained the corresponding Hamilton-Jacobi-Bellman (HJB) equations and proved the existence and uniqueness of their classical solutions. In our previous study ([11]), we built a stochastic optimal control model of carbon reduction that accounts for carbon trading, and we obtained a semi-closed-form solution of the corresponding HJB equation under certain conditions.
However, some important factors, such as the penalty mechanism and a country's limited capacity for carbon reduction, were not considered in this preliminary model.
In the present study, we describe the carbon emissions process with a modified environmental impact model and build a stochastic control model for analyzing the carbon reduction policy of a country in a more general setting. In this model, we consider carbon abatement, trading, and penalties, as well as the limited capacity for carbon abatement, in a synthetic manner. Our goal is to find the carbon abatement policy that minimizes the total carbon reduction costs. Under Merton's classical consumption framework ([14,15]), the model is converted into a semilinear degenerate two-dimensional parabolic HJB problem. In general, the value function is not smooth enough to satisfy an HJB equation in the classical sense. It is therefore natural to seek a weak solution that is unique even though it is not smooth. One such weak solution, the viscosity solution, was introduced by Crandall and Lions ([6,7]). In the present study, we rigorously prove that the value function is the unique viscosity solution of the HJB equation, using the dynamic programming principle, Itô's lemma, and the theory of viscosity solutions. Finally, we discuss the properties of the optimal policy and the minimum total costs based on numerical calculations.
The remainder of this paper is organized as follows. In Section 2, under some assumptions, we establish an optimal control model to minimize the total costs while considering carbon emissions abatement, trading, and penalties, as well as a limited capacity for carbon abatement. According to the dynamic programming principle, we reduce this model to the corresponding HJB equation problem. In Section 3, we present the continuity and quadratic growth properties of the value function. In Section 4, we prove the existence and uniqueness of the viscosity solution of the HJB equation problem. In Section 5, we present the numerical results and their analysis. We also discuss the properties of the optimal policy and the minimum total costs. We give our conclusions in Section 6.
2. Model formulation. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$ be a filtered probability space that satisfies the usual conditions. To describe the environmental impact of a country, Commoner ([4]) devised the IPAT equation, which states that the environmental impact ($I$) is the product of population ($P$), affluence ($A$), and technology ($T$). In 1994, Dietz and Rosa ([9]) reformulated the IPAT equation as the "Stochastic Impacts by Regression on Population, Affluence and Technology" (STIRPAT) model, making it suitable for parameter estimation and hypothesis testing. Zagheni and Billari ([21]) provided a stochastic representation of the IPAT equation and obtained a stochastic differential equation model. Within this framework, Yang and Liang ([19]) assumed that the population ($P$) of a country follows a logistic model ([17]) and obtained a modified environmental impact model, which is used in the present study.
Let $I_t$ denote the annual carbon emissions of a country, and suppose that the initial amount of emissions is $I_0 > 0$. In the modified environmental impact model ([19,20]), the carbon emissions of an area are determined by its economy and its population: the economy is represented by the GDP, and the population is described by a logistic model. The process $I_t$ is described by the stochastic differential equation (1), where $a_1$ and $a_2$ are constant parameters representing the effects of the GDP and the population of the country on carbon emissions, respectively, $\mu_1$ is the growth rate of the GDP, and $\sigma_1$ is the volatility of the GDP. The function $f(t) = \frac{\bar\rho\,(P_m - P_0)}{P_0 e^{\bar\rho t} + P_m - P_0}$ is the growth rate of the population at time $t > 0$, where $P_m$ is the carrying capacity of the population of the country, $\bar\rho$ is the intrinsic population growth rate, and $P_0$ is the initial population size. $W^1_t$ is an $\mathcal{F}_t$-adapted standard Brownian motion. The control policy $q_t \ge 0$ is a progressively measurable (with respect to $\{\mathcal{F}_t\}_{t \ge 0}$) process that represents the reduction in the growth rate of carbon emissions. Furthermore, the capability of reducing carbon emissions is normally limited. Thus, we assume that the control policy satisfies $0 \le q_t \le \bar q$, where the upper bound $\bar q$ represents the maximum capacity for carbon abatement.
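The growth rate $f(t)$ is the relative growth rate $\dot P / P$ of the standard logistic population $P(t) = P_m P_0 e^{\bar\rho t} / (P_m - P_0 + P_0 e^{\bar\rho t})$, which starts at $P_0$ and approaches the carrying capacity $P_m$. The short Python sketch below checks this numerically by comparing the closed form of $f(t)$ with a finite-difference derivative of $\ln P(t)$. The function names and parameter values are hypothetical illustrations, not values used in the paper.

```python
import math


def logistic_population(t, p0, pm, rho_bar):
    """Logistic population P(t) with P(0) = p0 and carrying capacity pm."""
    growth = math.exp(rho_bar * t)
    return pm * p0 * growth / (pm - p0 + p0 * growth)


def population_growth_rate(t, p0, pm, rho_bar):
    """f(t) = rho_bar * (pm - p0) / (p0 * exp(rho_bar * t) + pm - p0)."""
    return rho_bar * (pm - p0) / (p0 * math.exp(rho_bar * t) + pm - p0)


def log_derivative(t, p0, pm, rho_bar, h=1e-6):
    """Central finite difference of ln P(t), an independent check on f(t)."""
    upper = math.log(logistic_population(t + h, p0, pm, rho_bar))
    lower = math.log(logistic_population(t - h, p0, pm, rho_bar))
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Hypothetical values (population in arbitrary units), not from the paper.
    p0, pm, rho_bar = 1.3, 1.6, 0.05
    for t in (0.0, 1.0, 5.0):
        print(t, population_growth_rate(t, p0, pm, rho_bar),
              log_derivative(t, p0, pm, rho_bar))
```

When $P_0 < P_m$, the rate $f(t)$ is positive and decreases toward zero as the population saturates, so its contribution to the drift of emissions fades over time.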
In our model, a country needs to meet its carbon reduction target, i.e., there is an emissions cap at the given time $T$. If the amount of carbon emissions is below this cap, then the country can sell its spare allowances; otherwise, it needs to purchase permits for its excess emissions or accept the penalty. We assume that the price process $C_t$ of the emissions allowance in the emissions trading market follows the geometric Brownian motion (2), $dC_t = \mu_2 C_t\,dt + \sigma_2 C_t\,dW^2_t$, where $\mu_2$ is the constant drift parameter and $\sigma_2$ is the constant volatility parameter. $W^2_t$ is an $\mathcal{F}_t$-adapted standard Brownian motion, and we assume that $dW^1_t\,dW^2_t = \rho\,dt$,
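A minimal Monte Carlo sketch of the allowance-price model: it advances $C_t$ with exact log-normal steps of the geometric Brownian motion and builds the increments of $W^1$ and $W^2$ from two independent standard normals so that their correlation is $\rho$. The function names, step counts, and parameter values are hypothetical. As a sanity check, the sample mean of $C_T$ should be close to $c\,e^{\mu_2 T}$, and the average discrete covariation $\sum \Delta W^1 \Delta W^2$ should be close to $\rho T$.

```python
import math
import random


def simulate_path(c0, mu2, sigma2, rho, T, n_steps, rng):
    """Simulate one path of C_t under dC = mu2 C dt + sigma2 C dW2.

    Uses exact log-normal steps and also returns the discrete covariation
    sum(dW1 * dW2), which should be close to rho * T since dW1 dW2 = rho dt.
    """
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    c = c0
    covariation = 0.0
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw1 = sqrt_dt * z1
        # Correlate dW2 with dW1 so that corr(dW1, dW2) = rho.
        dw2 = sqrt_dt * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        c *= math.exp((mu2 - 0.5 * sigma2 * sigma2) * dt + sigma2 * dw2)
        covariation += dw1 * dw2
    return c, covariation


def monte_carlo_summary(c0, mu2, sigma2, rho, T, n_steps, n_paths, seed=0):
    """Return the sample mean of C_T and the mean covariation over n_paths."""
    rng = random.Random(seed)
    total_c = 0.0
    total_cov = 0.0
    for _ in range(n_paths):
        c_T, cov = simulate_path(c0, mu2, sigma2, rho, T, n_steps, rng)
        total_c += c_T
        total_cov += cov
    return total_c / n_paths, total_cov / n_paths


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    mean_c, mean_cov = monte_carlo_summary(
        c0=1.0, mu2=0.05, sigma2=0.3, rho=0.5, T=1.0, n_steps=10, n_paths=20000
    )
    print(f"mean C_T = {mean_c:.4f} (theory {math.exp(0.05):.4f})")
    print(f"mean covariation = {mean_cov:.4f} (theory 0.5000)")
```

Exact log-normal steps avoid the discretization bias of an Euler scheme for $C_t$; the emissions process (1) would need its own step rule, which is not sketched here.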

HUAYING GUO AND JIN LIANG
where $\rho$ is the correlation coefficient. At time $T$, if the amount of emissions of the country exceeds the emissions limit $\bar I$, i.e., $I_T - \bar I > 0$, then the country needs to buy emissions allowances from the market at price $C_T$ or accept the penalty in the form of a fine at price $P$. However, if $I_T - \bar I < 0$, the country has spare allowances, which can be sold at price $C_T$ for a profit. Thus, the cost incurred at the terminal time is $\min(C_T, P)\,(I_T - \bar I)^+ - C_T\,(\bar I - I_T)^+$, where a negative value represents a profit. In this model, the goal of the decision maker is to minimize the total costs spent on carbon reduction activities. Thus, we consider the corresponding value function (3), where $V(I, c, t)$ comprises all of the costs of reducing carbon emissions incurred by the country. These costs have three parts: the costs of the carbon abatement process, the costs or profits of carbon trading activities, and the penalties. $g(\cdot)$ is the price function for reducing the growth rate of carbon emissions, so that $g(q_t)\,dt$ represents the cost incurred by the country on reduction projects during the time interval $[t, t + dt]$. Thus, $\int_t^T g(q_s)\,ds$ is the total cost of the country's carbon abatement processes from $t$ to the terminal time $T$, and $\beta$ is the discount factor. By the dynamic programming principle ([10]), we obtain the corresponding HJB equation (4) with the terminal condition (5). For the abatement cost function $g(\cdot)$, the statistical analysis in [20] shows that the marginal abatement cost increases with the abatement amount. Therefore, in the present study, $g(\cdot)$ is assumed to be the increasing quadratic function given in (6),
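The terminal payoff described above can be written as a small function. The Python sketch below assumes that, for excess emissions, the country rationally pays the cheaper of the market price $C_T$ and the fine $P$, so the unit price of excess emissions is $\min(C_T, P)$, and that spare allowances are credited at $C_T$. The helper `discounted_total_cost`, its left Riemann sum, and the discounting of both terms at rate $\beta$ from time 0 are hypothetical choices for illustration; the abatement cost `g` is passed in as a parameter because its exact form is given by (6).

```python
import math


def terminal_cost(i_T, c_T, i_cap, penalty):
    """Cost at the checking time T (negative values are profits).

    Excess emissions (i_T > i_cap) are covered at the cheaper of the market
    price c_T and the fine penalty; spare allowances are sold at c_T.
    """
    excess = max(i_T - i_cap, 0.0)
    spare = max(i_cap - i_T, 0.0)
    return min(c_T, penalty) * excess - c_T * spare


def discounted_total_cost(q_path, dt, beta, g, i_T, c_T, i_cap, penalty):
    """Discounted abatement cost plus discounted terminal cost for one path.

    Assumption (illustrative): the running cost g(q_s) is discounted by
    exp(-beta * s) and integrated with a left Riemann sum on a uniform grid
    of step dt; the terminal cost is discounted by exp(-beta * horizon).
    """
    horizon = dt * len(q_path)
    running = sum(math.exp(-beta * k * dt) * g(q) * dt for k, q in enumerate(q_path))
    return running + math.exp(-beta * horizon) * terminal_cost(i_T, c_T, i_cap, penalty)


if __name__ == "__main__":
    # Hypothetical numbers: cap 2, market price 5, fine 4.
    print(terminal_cost(3.0, 5.0, 2.0, 4.0))  # one unit over the cap, fine is cheaper
    print(terminal_cost(1.0, 5.0, 2.0, 4.0))  # one spare unit sold at the market price
```

With a cap of 2, a market price of 5, and a fine of 4, one unit of excess emissions costs 4 because the fine is cheaper, while one spare unit earns 5. The payoff is nondecreasing in $I_T$, which is the property used in the monotonicity argument for the value function.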
where $m_1$ is a positive constant. The same assumption was used in ([16]). From (6), and taking into account that the control process is bounded, the optimal control policy $q^*$ is obtained by truncating the unconstrained minimizer of the Hamiltonian in (4) to the interval $[0, \bar q]$.
3. Properties of the value function. We now focus on the value function (3) and show some of its properties in preparation for the subsequent analysis.
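To make the truncation step concrete, the sketch below assumes, for illustration only, that $g(q) = m_1 q^2$ and that the control enters the drift of (1) as $-q I$, so that the $q$-dependent part of the Hamiltonian in (4) is $m_1 q^2 - q\,I\,V_I$. Under these hypothetical forms, the unconstrained minimizer is $I V_I / (2 m_1)$, and $q^*$ is its projection onto $[0, \bar q]$; the code cross-checks this against a brute-force grid search. The exact expressions in the paper may differ.

```python
def hamiltonian_q_part(q, i, v_i, m1):
    """q-dependent part of the Hamiltonian under the assumed forms:
    running cost m1 * q**2 and drift contribution -q * i * V_I."""
    return m1 * q * q - q * i * v_i


def optimal_policy(i, v_i, m1, q_bar):
    """Project the unconstrained minimizer i * v_i / (2 m1) onto [0, q_bar]."""
    unconstrained = i * v_i / (2.0 * m1)
    return min(max(unconstrained, 0.0), q_bar)


def grid_policy(i, v_i, m1, q_bar, n=100000):
    """Brute-force minimizer over a uniform grid on [0, q_bar] (check only)."""
    grid = (q_bar * k / n for k in range(n + 1))
    return min(grid, key=lambda q: hamiltonian_q_part(q, i, v_i, m1))


if __name__ == "__main__":
    # Hypothetical inputs; V_I >= 0 because V increases with I (property (i)).
    for m1 in (0.5, 1.0, 2.0):
        print(m1, optimal_policy(i=2.0, v_i=0.6, m1=m1, q_bar=1.0))
```

Because the assumed Hamiltonian is a convex quadratic in $q$, projecting the unconstrained minimizer onto $[0, \bar q]$ gives the constrained minimizer. In this toy setting, $q^*$ falls as $m_1$ rises, which is consistent with the trend reported for Figure 2.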
Proof. For a fixed policy $0 \le q \le \bar q$ and $(c, t) \in \mathbb{R}^+ \times [0, T]$, we denote by $I^{t,I}_s$, $s \ge t$, the emissions process with $I^{t,I}_t = I$. Suppose that $0 < I_1 \le I_2$; then $I^{t,I_1}_T \le I^{t,I_2}_T$. In addition, because the term in the expectation of the value function (3) increases with $I_T$, the expected total cost starting from $I_2$ under the policy $q$ is no less than that starting from $I_1$. Since $q$ is arbitrary, $V(I_2, c, t) \ge V(I_1, c, t)$, which means that (i) holds.
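For (ii), we first prove that $V(I, c, t)$ is continuous with respect to $I$, uniformly in $t$. Fix $(c, t) \in \mathbb{R}^+ \times [0, T]$ and $I_2 > I_1 > 0$. By the definition of the value function, the difference $V(I_2, c, t) - V(I_1, c, t)$ can be estimated through the terminal payoffs under a common policy. Therefore, for any $0 \le q^* \le \bar q$, a constant $M_1 > 0$ exists that controls this estimate, and thus we obtain a bound on $|V(I_2, c, t) - V(I_1, c, t)|$ that is uniform in $t$. Similarly, using the same techniques, we can prove that $V(I, c, t)$ is continuous with respect to $c$, uniformly in $t$. Next, we prove the continuity of $V(I, c, t)$ in $t$. Fix $(I, c) \in \mathbb{R}^+ \times \mathbb{R}^+$ and $t_2 \ge t_1 \ge 0$, and let $0 \le q^*_1, q^*_2 \le \bar q$ be the optimal controls for the initial times $t_1$ and $t_2$, respectively.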

Similar to (7), an analogous estimate can be verified, where $C_{1,T}$ and $C_{2,T}$ are the processes in (2) at $T$ with initial values $C_{t_1} = c$ and $C_{t_2} = c$, and $I_{1,T}$ and $I_{2,T}$ are the processes in (1) at $T$ with initial values $I_{t_1} = I$ and $I_{t_2} = I$. Furthermore, analogous bounds hold with some constants $M_2 > 0$ and $M_3 > 0$, and it is then not difficult to bound $\mathbb{E}|S_1|$, conditional on $I_{2,t_2} = I$, $C_{2,t_2} = c$, and $I_{1,t_1} = I$, in terms of some constants $M_4 > 0$ and $M_5 > 0$. Thus, from (10), (11), and (12), constants $M_6 > 0$ and $M_7 > 0$ exist such that (13) holds. Substituting (13) into (9) yields an estimate showing that $V(I, c, t)$ is continuous with respect to $t$. Hence, according to the results given above, the value function $V(I, c, t)$ is continuous.
Proof. From the definition (3) of the value function $V(I, c, t)$, we obtain an estimate of $|V(I, c, t)|$, and thus constants $M_1 > 0$ and $M > 0$ exist such that $V$ has at most quadratic growth. In general, the value function is not known to be continuously differentiable, so in the next section we use the concept of the viscosity solution and prove that the value function is the unique viscosity solution of the HJB equation (4).
4. Existence and uniqueness of the viscosity solution. For our problem, the viscosity solution ([10]) is defined as follows. Let $u : \bar{Q}_T \to \mathbb{R}$ be locally bounded. 1. The lower-semicontinuous function $u(x, y, t)$ is a viscosity supersolution of equation (4) in $\bar{Q}_T$ if, for all $\varphi \in C^{2,2,1}(Q_T)$ such that $u - \varphi$ has a local minimum at $(\bar x, \bar y, \bar t) \in Q_T$ with $u(\bar x, \bar y, \bar t) = \varphi(\bar x, \bar y, \bar t)$, we have $\frac{\partial \varphi}{\partial t}(\bar x, \bar y, \bar t) + F\big(D^2\varphi(\bar x, \bar y, \bar t), D\varphi(\bar x, \bar y, \bar t), \varphi(\bar x, \bar y, \bar t), \bar x, \bar y, \bar t\big) \le 0$.
Next, following the method in [10], we obtain the following result, which implies the existence of a viscosity solution to problem (4) and (5). Proof. We have proved the continuity of the value function $V(I, c, t)$, and the terminal condition is obviously satisfied by $V(I, c, t)$. Next, we show that $V(I, c, t)$ is a viscosity subsolution of equation (4) in $\bar{Q}_T$. Suppose that a test function $\varphi \in C^{2,2,1}(Q_T)$ is such that $V - \varphi$ has a local maximum at $(I_0, c_0, t_0) \in Q_T$ and $V(I_0, c_0, t_0) = \varphi(I_0, c_0, t_0)$, which means that $V(I, c, t) \le \varphi(I, c, t)$ in a neighborhood $O(I_0, c_0, t_0) \subset Q_T$. Given $h > t_0$ and an arbitrary constant control $q_t \equiv q$, the dynamic programming principle yields an inequality involving $\varphi(I_h, C_h, h)$, and applying Itô's formula to $\varphi$ gives an expression for $\varphi(I_h, C_h, h)$. Substituting this expression into the inequality, dividing both sides by $h - t_0$, and letting $h \to t_0$, we obtain, since $q$ is arbitrary, $\frac{\partial \varphi}{\partial t}(I_0, c_0, t_0) + F\big(D^2\varphi(I_0, c_0, t_0), D\varphi(I_0, c_0, t_0), \varphi(I_0, c_0, t_0), I_0, c_0, t_0\big) \ge 0$.
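Then, we show that $V(I, c, t)$ is a viscosity supersolution of equation (4) in $\bar{Q}_T$. Similarly, we take a test function $\varphi \in C^{2,2,1}(Q_T)$ such that $V - \varphi$ has a local minimum at $(I_0, c_0, t_0) \in Q_T$ and $V(I_0, c_0, t_0) = \varphi(I_0, c_0, t_0)$, which means that $V(I, c, t) \ge \varphi(I, c, t)$ in a neighborhood $O(I_0, c_0, t_0) \subset Q_T$. By the dynamic programming principle, for every $m \in \mathbb{N}^+$, there exists a control process $0 \le q_m \le \bar q$ for which the dynamic programming relation on $[t_0, t_m]$ holds up to a small error, where $t_m = t_0 + 1/m$ and $I_m$, $C_m$ are the solutions of (1) and (2) with control $q_m$. By Itô's formula, this relation can be rewritten as the expectation of an integral over $[t_0, t_m]$ whose integrand is $e^{-\beta(s - t_0)}$ times $g(q_m) + \partial\varphi/\partial t$ plus the remaining terms of the operator in (4) applied to $\varphi$. Dividing by $t_m - t_0 = 1/m$ and letting $m \to \infty$, we obtain $\frac{\partial \varphi}{\partial t}(I_0, c_0, t_0) + F\big(D^2\varphi(I_0, c_0, t_0), D\varphi(I_0, c_0, t_0), \varphi(I_0, c_0, t_0), I_0, c_0, t_0\big) \le 0$. Thus, $V(I, c, t)$ is a viscosity supersolution of equation (4) in $\bar{Q}_T$. This completes the proof.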
Next, we prove the comparison principle, which enables us to verify the uniqueness of the viscosity solution. By setting $x = \ln I$ and $y = \ln c$, we derive the following from (4) and (5).
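For reference, the substitution $x = \ln I$, $y = \ln c$ with $V(I, c, t) = u(x, y, t)$ transforms the derivatives by the chain rule as shown below, and under the geometric Brownian motion (2) the log-price is an arithmetic Brownian motion. If the diffusion coefficient in (1) is likewise proportional to $I$ (an assumption here, since (1) is not reproduced in this text), the second-order coefficients of the transformed equation become constants on $\mathbb{R}^2$, which is a common reason for this change of variables in comparison proofs.

```latex
\begin{aligned}
I\,V_I &= u_x, & I^2 V_{II} &= u_{xx} - u_x, \\
c\,V_c &= u_y, & c^2 V_{cc} &= u_{yy} - u_y, \\
I c\,V_{Ic} &= u_{xy}, & d(\ln C_t) &= \Big(\mu_2 - \tfrac{1}{2}\sigma_2^2\Big)\,dt + \sigma_2\,dW^2_t .
\end{aligned}
```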
Finally, according to the comparison principle, it is easy to show the following.
Figure 1 shows the relationship between the optimal policy, the initial amount of emissions $I_0$, and the initial price of the emissions allowance, with $I_0$ ranging over $[1.6, 2.3]$. Recall that $I_0 \ge \bar I$, i.e., the initial amount of emissions exceeds the cap. First, the optimal policy curve increases with the price of the emissions allowance in the market, which means that a country should increase its energy-saving and emissions reduction efforts if the initial price of the allowance is high. Indeed, a higher initial price means that the country may face a higher price at the terminal time $T$ and thus pay a higher cost at $T$ if its emissions exceed the limit, so the best policy is to reduce carbon emissions vigorously before the checking time $T$. Second, when the other parameters are the same, a higher initial amount of emissions $I_0$ means that the country needs to implement more energy-saving measures and make greater efforts to reduce carbon emissions at the beginning. Figure 2 shows the optimal policy for different values of $m_1$: the optimal policy is higher when $m_1$ is lower, i.e., the country should increase its carbon reduction efforts when cutting carbon is cheaper. Figure 3 shows the optimal policy for different values of $\bar q$, which represents the capacity for reducing carbon emissions. A country with a higher capacity for reduction is better able to adjust its reduction policy when the price changes. Thus, a country with a lower reduction capacity should increase its carbon reduction efforts to respond to possible future increases in the price of the allowance. Figure 4 shows the optimal policy for different values of the penalty standard $P$. The penalty standard $P$ effectively sets the highest price the country pays per unit of excess emissions at $T$, since it can always accept the fine instead of buying allowances.
A higher value of $P$ means that a country is likely to pay more.
Figure 5 shows the minimum total costs for different values of the initial emission amount $I_0$ and the initial price of the emissions allowance. Under the same conditions, completing more reduction tasks costs more. Thus, more countries can be expected to participate in carbon reduction activities only if carbon emissions obligations are allocated fairly. Figure 6 shows the minimum total costs for different values of $m_1$. Similar to the conclusions given in Lemma 3.1, the results in Figure 6 imply that, under the same conditions, the total cost will be higher for a country with a higher cost of cutting emissions. From a long-term perspective, developing and introducing advanced emissions reduction technologies would help reduce the total emissions reduction costs. Figure 7 shows the minimum total costs for different values of $\bar q$. A country with a higher capacity for emissions reduction incurs lower costs to meet the same target. In general, developing countries have a greater capacity for reducing carbon emissions.
Figure 8 shows the minimum total costs for different values of the penalty standard $P$, which indicates that a higher penalty standard increases the costs of reduction activities for a country.
6. Conclusions. In this study, we built a stochastic optimal control model to study the optimal carbon reduction and trading policy of a country. By considering carbon abatement, trading, and penalties in a synthetic manner, we established a model under Merton's consumption framework and obtained the corresponding HJB equation via the dynamic programming principle. We also proved the existence and uniqueness of the viscosity solution of this equation. Based on numerical calculations, we discussed the properties of the optimal carbon reduction policy and the minimum total costs.
Our results demonstrate that a policy maker should formulate carbon reduction decisions by considering the initial amount of emissions, the price of the allowance, and the penalty level, as well as the capability and efficiency of reducing carbon emissions. According to the results obtained using our model, policy makers can adjust the carbon reduction policy to minimize the total carbon reduction costs.