OPTIMAL STRATEGIES FOR OPERATING ENERGY STORAGE IN AN ARBITRAGE OR SMOOTHING MARKET

Abstract. We characterize cost-minimizing operating strategies for an energy store over a given interval of time [0, T]. The cost functional here can represent, for example, a traditional economic cost or a penalty for time-variation of the output from a storage-assisted wind farm or more general imbalance between supply and demand. Our analysis allows for leakage, operating inefficiencies and general cost functionals. In the case where the cost of a store depends only on its instantaneous power output (or input), we present an algorithm to determine the optimal strategies. A key feature is that this algorithm is localized in time, in the sense that the action of the store at a time t ∈ [0, T] requires cost information over only some, usually much shorter, subinterval of time [t, t_k] ⊂ [t, T].

1. Introduction. Over the coming decades, world energy-markets face significant challenges as they strive to meet their climate-change targets. Renewable generation is set to play a more dominant role in the future electricity-supply markets but the availability of renewable energy is intermittent. This unreliability means that there will always be a need for a quick-reacting back-up, in order to ensure that demand is met. On the other hand, at off-peak demand times, supply may be curtailed (often at a high cost) if the rest of the system is not flexible enough to respond. On the demand side, future demand profiles are likely to look very different from today. Electric heat pumps, for example, could create a marked seasonal peaking of demand during the colder, winter months. Electric vehicle charging and new smart technologies, on the other hand, could change our daily demand patterns. In this introduction we pay particular attention to the British scene because it is the one with which we are most familiar, but the issues apply world-wide.
Electricity storage is one potential option for improving the flexibility and reliability of our electricity system. It could offer services such as peak-shaving, frequency response, reactive-power regulation and the provision of reserve. In Great Britain (GB), grid-scale electricity storage currently comes only in the form of pumped-hydro power plants, which are implemented largely to meet peak demands. Their main source of revenue is through Short-Term Operating Reserve (STOR) and Fast Reserve, which are funded by National Grid. The stores then replenish their supply during the night, when electricity prices are at their lowest [16].
Much research has been dedicated to analyzing storage viability. Some papers focus on the potential social benefits of storage (see, for example, [4,5,8,7]). A common approach here is to solve the unit dispatch problem, i.e. to select a cost-minimizing combination of generators (including storage) to run at each time, in order to ensure that demand is met (or that demand is met with a given probability). An advantage of this method is that it can incorporate the interactions between all assets of the energy system into a single optimization problem, allowing a comparison of the actions of storage against its competing options. A drawback, however, is that the system optimum does not necessarily coincide with each individual firm's most profitable strategy. In many cases, it is not even clear that each firm would make a profit under these solutions. Therefore, a whole-system approach of this kind is generally better suited to questions where the store is not privately owned, but instead owned and operated by a central controller, such as the system operator.
Other research focuses on the profits available to a store which faces stochastic or probabilistic prices. Some of these papers restrict the behaviour of the store to a pre-defined set of operating strategies (e.g. [9,14]). Another approach is to implement Dynamic Programming techniques (e.g. [6,15]). One drawback of this latter approach, however, is that it tends to be computationally heavy. A significant limitation is that it often requires information about prices over the entire time horizon over which the store's operation is to be optimized. Since electricity wholesale prices contain significant idiosyncratic components, this requires a heroic assumption regarding the foresight of its operator. Additionally, such methods do not give much scope for mathematical insight into the dynamics of the solutions.
In this paper, we characterize cost-minimizing strategies of an electricity store, for general cost functionals, in terms of the associated Karush-Kuhn-Tucker multipliers. Further, for a certain natural class of cost functional (see Section 1.2), we present an algorithm to determine these optimal strategies. A natural application here would be to maximize the arbitrage profit available to an electricity store which is operating in the wholesale electricity market. In doing so, we assume that the store can predict an electricity price function p : [0, T ] → R where [0, T ] is the period of time over which we wish to optimize. However, the method only actually uses price information over a smaller time interval, except in special cases or near T, thus reducing the required prediction horizon and the amount of computation needed. It is worth commenting here that given the presence of future markets at various maturities, which coalesce on the spot wholesale price as they approach maturity (see e.g. [13]), the prediction of short-term future prices in this way is not an unreasonable assumption. Another application, as discussed in Section 5.3, is in the use of a store for "smoothing". This could apply, for example, to cases where an energy store is built in conjunction with a wind farm, with the intention of smoothing the wind output from that farm. Such smoothing could be useful for reducing short-term fluctuations of supply.
Our method is derived from standard Calculus of Variations techniques and is intended as an extension to [10]. The main result is Proposition 2.1. Our work differs from [10] in several respects. Firstly, we present our model in a continuous-time setting, rather than the discrete setting employed in [10]. Even if prices (for example) are declared at discrete time intervals, in accordance with the current market system, our approach allows for the input of piecewise constant prices but with continuous variation in the operation of the store. Secondly, [10] allows only for convex cost functions, whereas here we allow for much more general operating costs and, in particular, we remove the convexity assumption. This is realistic: there is little reason to believe that the cost of running a motor, for example, is a convex function of the output power. Although these more general conditions do not guarantee the existence of solutions, we prove that optimal operating strategies of the form given in Proposition 2.1 exist if and only if the algorithm yields both a strategy q* and its associated multiplier function µ*. Finally, we allow the inclusion of slightly more general operating constraints, namely time-varying power and capacity constraints.
The structure of the paper is as follows: Over the remainder of this section, we introduce our storage model and its associated costs. In Section 2, we present the main result, which characterizes the optimal strategies via a reference cost function (or multiplier). In Section 3, we present an algorithm which is a generalisation of that in [10]. The algorithm determines both the optimal strategy and the reference cost for the basic case where the store is constrained by only its power ratings and its capacity constraints. We prove that if an optimal strategy exists of the form of Proposition 2.1, then the algorithm will find it. Section 4 contains a discussion of the time-localization property that is inherent both in the algorithm and in the more general case of the standard storage model (see Section 1.2). In Section 5, we present some applications of the theory and the algorithm. Finally, the conclusions of the paper are summarized in Section 6.
1.1. The storage model. Let [0, T ] be the interval of time over which we want to optimize the actions of the store (with T > 0) and denote by U ⊂ R the set of admissible power outputs associated with the store. The operator of the store then chooses an operating strategy q : [0, T ] → U which allocates, at each time t, the power q(t) that is to enter the store (using the convention that if q(t) < 0, then the store discharges power −q(t) at time t). The power taken from the grid may be larger than q, to allow for inefficiency of conversion to storage. Similarly, if q is negative the power delivered to the grid may be between −q and 0. The constraint set U could, if we wish, vary with time, therefore allowing the inclusion of constraints such as planned closures or transmission congestion into the model. All results presented in this paper generalise naturally to this case but, for simplicity of notation, we will from now on refer only to a time-independent constraint set U. Throughout this paper we insist that 0 ∈ U. Without this assumption, the store would never be permitted to be at rest.
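The level of stored energy at time t, associated with the strategy q, is denoted [q](t) and solves the differential equation

$$\frac{d}{dt}[q](t) = q(t) - \alpha\,[q](t), \qquad [q](0) = \mathcal{E}_0,$$

where α ∈ [0, 1) is the leakage rate of the store and $\mathcal{E}_0$ is the initial level of stored energy. The level of stored energy thus evolves as

$$[q](t) = e^{-\alpha t}\,\mathcal{E}_0 + \int_0^t e^{-\alpha (t-s)}\,q(s)\,ds.$$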

LISA C FLATLEY, ROBERT S MACKAY AND MICHAEL WATERSON
The capacity constraints of the store are characterized by two functions E+, E− : [0, T] → R, so that any strategy q is constrained by the inequalities

$$E^-(t) \le [q](t) \le E^+(t) \quad \text{for all } t \in [0, T].$$

We assume that the initial and terminal levels of the store, denoted $\mathcal{E}_0$ and $\mathcal{E}_T$ respectively, are pre-specified, so that

$$E^-(0) = E^+(0) = \mathcal{E}_0 \quad \text{and} \quad E^-(T) = E^+(T) = \mathcal{E}_T.$$

We also assume, without loss of generality, that {Ė−(t) + αE−(t), Ė+(t) + αE+(t)} ⊂ U for all t ∈ [0, T], since otherwise E could be adjusted so that this is true, without changing the set of admissible strategies (cf. Definition 1.1).
The set of possible levels of the store, at each time t, is represented by the admissible set E(t) := [E−(t), E+(t)]. Often, one may choose to replace the capacity constraints E+ and E− with constants (except at the initial and final times), so that E+ > 0 is the physical size of the store and E− ≥ 0 is the minimum technically feasible level of the store. Choosing to represent E in this more general form allows the store, for example, to participate in multiple markets (such as forward or futures markets).
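The storage dynamics and capacity constraints described above can be explored numerically. The following is a minimal sketch, not part of the original analysis. It assumes that the level obeys d(level)/dt = q − α·level (an assumption that is consistent with the requirement that Ė±(t) + αE±(t) lie in U), that q is piecewise constant on steps of length `dt`, and that the capacity bounds are sampled at the grid times. The function names, the argument names and the tolerance `tol` are illustrative choices.

```python
import math


def simulate_level(q, level0, alpha, dt):
    """Return stored-energy levels at the grid times 0, dt, 2*dt, ...

    q is a list of piecewise-constant charging powers (negative = discharge).
    Assumed dynamics (a modelling assumption): d(level)/dt = q - alpha * level.
    """
    levels = [level0]
    decay = math.exp(-alpha * dt)
    # Exact integral of exp(-alpha * s) over one step; equals dt when alpha == 0.
    gain = dt if alpha == 0 else (1.0 - decay) / alpha
    for power in q:
        levels.append(decay * levels[-1] + gain * power)
    return levels


def is_admissible(q, level0, level_T, alpha, dt, e_minus, e_plus, tol=1e-9):
    """Check the capacity bounds at every grid time and the terminal level."""
    levels = simulate_level(q, level0, alpha, dt)
    within = all(lo - tol <= x <= hi + tol
                 for x, lo, hi in zip(levels, e_minus, e_plus))
    return within and abs(levels[-1] - level_T) <= tol
```

For instance, with no leakage, the strategy `[1, 1, -1, -1]` on unit time steps fills a store of size 2 and empties it again, so it passes the check when the bounds are 0 and 2 and the store must start and finish empty. The check only inspects grid times, which is sufficient here because the level is monotone within each step when q is constant and α = 0; with leakage a finer grid may be needed.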
The total cost associated with a strategy q is denoted C[q] ∈ R. One natural interpretation is as a variable cost in the usual economic sense, which may incorporate factors such as running costs, warming-up costs and costs of conversion to and from stored form, as well as the cost of purchasing power and the payments received for providing power. Later, in Section 5.3, we will consider the use of a store for smoothing the output of a wind farm and, in this case, C will be the total variation of net wind farm output. The aim of this paper is to identify those operating strategies q which minimize the total cost C[q], whilst adhering to all of the physical constraints outlined above.

Definition 1.1 (Admissible and optimal strategies). We say that q : [0, T] → U is an admissible strategy if q is piecewise continuous and satisfies the capacity constraints and the terminal conditions $[q](0) = \mathcal{E}_0$ and $[q](T) = \mathcal{E}_T$. We denote by X the set of all admissible strategies and say that q* ∈ X is an optimal strategy if

$$C[q^*] \le C[q] \quad \text{for all } q \in X.$$

1.2. The "standard storage model". Our standard storage model, which will be investigated in detail in Section 3 and in the examples of Sections 5.2 and 5.3, is a store whose cost may be written in the general form

$$C[q] = \int_0^T L(t, q(t))\,dt$$

for some cost rate function L : [0, T] × U → R. In a price-arbitrage model, for example, L(t, q(t)) may be the cost per unit time of buying enough electrical power from the grid at time t in order to input power q(t) into the store at that time (using the convention that if q(t) < 0, then this corresponds to outputting power −q(t) from the store and selling). If the store is profitable to run, then the minimal cost should, of course, be negative. There may well be costs that are fixed or sunk and so do not vary with how the plant is operating (e.g. paying employees, rent and utilities for the building, interest on loans etc.) but since these do not change the optimisation problem, we have subtracted them off, without loss of generality.
For technical reasons we will require that, for each t ∈ [0, T ], the map λ → L(t, λ) is piecewise differentiable and that the associated partial derivative ∂L(t, λ)/∂λ has a continuous inverse at almost every λ ∈ U. The piecewise differentiability assumption allows for jumps in the cost which may relate, for example, to the cost of switching on an additional motor in order to provide a higher power output.
2. Characterization of optimal strategies. The following proposition provides a characterization of optimal strategies in terms of a multiplier function µ*. Owing to its relation to the arbitrage model outlined in Section 1.2, we will often refer to this as a "reference cost".

Proposition 2.1 (Characterization of optimal strategies). Suppose there exists a piecewise differentiable function µ* : [0, T] → R and a strategy q* ∈ X with the following properties: (i) The strategy q* is a minimizer of (iii) If µ* is not differentiable at t ∈ [0, T], then the following "jump" complementary slackness conditions are satisfied: where µ*(t−) and µ*(t+) are the left and right limits respectively of µ* at t. Then q* is an optimal strategy.
Proof. Let q ∈ X and let 0 = a 1 < . . . < a n < a n+1 = T be a partition such that and q * , q are continuous and µ * is differentiable over each (a i , a i+1 ). Notice that (1) can be equivalently written as Hence, (5) implies The second equality follows from integration by parts and the final equality results from a rearrangement of the first sum on the previous line, together with the condition that Applying the complementary slackness conditions (6)- (11) to the final equality above, we obtain In a simple price-arbitrage model, as discussed in Section 1.2, the multiplier function µ * provides a reference cost per unit of energy. Roughly, if the cost per unit of energy for operating the store is lower than the reference cost, then this indicates the store should charge at that time (and similarly for discharging).
It is worth mentioning here that exactly the same result follows if we apply the Karush-Kuhn-Tucker (KKT) conditions to the capacity constraints of the original minimization problem (3). By following such an approach, one instead looks for minimizers of the relaxed functional The relation between the two approaches is that the reference cost function µ* in Proposition 2.1 can be expressed as an integral of the KKT functions: and, in particular, $\dot{\mu}^*(t) = -\lambda_1(t) + \lambda_2(t)$.
3. Optimal strategies for the standard storage model. In Section 1.2 we introduced our standard storage model. We present here an algorithm to determine µ * , and consequently the optimal strategy q * , in this case. Our algorithm relies on Proposition 2.1 which we recall states that, if we can find the appropriate reference cost function µ * , then the optimal strategy q * ∈ X solves where The regularity assumptions on L, q and µ imply that (12) reduces to a separate minimization problem for each time. Specifically, the optimal strategy q * ∈ X solves Thus, as long as µ * is known, the problem has now simplified into a collection of minimizations over bounded scalars, rather than the original minimization over a set of constrained functions. The challenge therefore is to determine µ * . This is the objective of the algorithm. An important result is Proposition 3.5, which states that if the algorithm does not terminate early, then it does indeed provide an optimal strategy q * .

3.1. Preliminary results and definitions.
We aim to find a piecewise differentiable µ * : [0, T ] → R such that the optimal strategy q * ∈ X solves (12). The conditions of Proposition 2.1 state that µ * should be constant over intervals of time where the level of the store is away from the capacity constraints. This corresponds to intervals of the form [τ k , σ k ) in the algorithm of Section 3.2 and, together with (14), motivates the following construction: Given t ∈ [0, T ] and λ ∈ R, we want to define a quantity u(t, λ) ∈ R as a solution of If we know that µ * is constant over some interval [τ, σ] ⊂ [0, T ], then the task of the algorithm reduces to finding the correct λ and setting q * (t) = u(t, λ) for almost all t ∈ [τ, σ]. Care needs to be taken, however, because there may be multiple minimizers u(t, λ) associated with λ. With this in mind, we denote by M t ⊂ R the set of λ which admit more than one minimizer in (15), and deal with this technicality in the following lemma. Note that, if L is assumed to be strictly convex in its second argument, then minimizers of (5) are of course unique, implying that each M t = ∅.
The following lemma states the important result that u(t, λ) is monotone increasing with λ. This property will be key to the algorithm.
Lemma 3.1 (Monotonicity and regularity properties). For each (t, λ) ∈ [0, T ] × R, let u(t, λ) be any solution of (15). Then, at each t ∈ [0, T ], the mapping λ → u(t, λ) is monotone increasing and piecewise continuous. Moreover, the set M t is finite and discontinuities in λ → u(t, λ) occur only at points in M t .
Proof. Let t ∈ [0, T] and suppose that the map λ → u(t, λ) is not monotone increasing. Then, there exist λ_1 < λ_2 such that u_1 := u(t, λ_1) > u(t, λ_2) =: u_2. But then, The first inequality follows from the definition of u_1 and u_2 as solutions to (15), and the second inequality follows from the supposition. However, the above contradicts the definition of u_2, and we conclude that the map λ → u(t, λ) is monotone increasing at all t ∈ [0, T].

The piecewise continuity of the map λ → u(t, λ) follows immediately from the regularity assumptions on L. Precisely, if for almost all λ ∈ R, the minimizers u(t, λ) lie away from discontinuities of L, then they must satisfy The continuous invertibility assumption on the partial derivative of L therefore implies the piecewise continuity of the map λ → u(t, λ). If, on the other hand, there is a non-degenerate subset W ⊂ R such that λ ∈ W implies u(t, λ) lies at a discontinuity of L, then the above monotonicity property and the piecewise continuity of L imply that the map λ → u(t, λ) must be piecewise constant over W.
In either case, the map λ → u(t, λ) is piecewise continuous. Finally, we prove the finiteness of the set M_t by supposing that the opposite is true. To this end, let A ⊂ R be an infinite set such that, for each λ ∈ A, there are two distinct local solutions x(λ), y(λ) ∈ U \ ∂U to (15), where ∂U denotes the boundary of the set U. Let λ_0 ∈ A be such that there exists a sequence (λ_n)_{n∈N} in R with lim_{n→∞} λ_n = λ_0, and set x(λ_0) = x_0 and y(λ_0) = y_0. Assume without loss of generality that denotes the partial derivative of L with respect to the second argument, and that x(λ) and y(λ) lie away from any discontinuity of and similarly, h′(λ_0) = −y_0. Hence, if x_0 and y_0 are both global minimizers solving (15) but x_0 ≠ y_0, then there exists N ∈ N such that g(λ_n) ≠ h(λ_n) for all n ≥ N. In particular, x(λ_n) and y(λ_n) cannot both be minimizers for n ≥ N. This contradicts the assumption that there exists such a set A.
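The monotonicity and jump structure in Lemma 3.1 can be checked numerically on a toy cost. The sketch below is only illustrative: it assumes that, at a fixed time t, u(t, λ) minimizes L(t, u) − λu over u ∈ U (the leakage-free form suggested by the derivative h′(λ_0) = −y_0 used in the proof), it replaces U = [−1, 1] by a uniform grid, and it uses a made-up nonconvex cost with a fixed switch-on charge. The names `u_of_lambda` and `example_cost` and all numbers are hypothetical.

```python
def u_of_lambda(cost_rate, lam, grid):
    """Grid approximation to a minimizer of cost_rate(u) - lam * u over U.

    Ties are broken towards the smallest minimizer so that the selection
    is deterministic.
    """
    return min(grid, key=lambda u: (cost_rate(u) - lam * u, u))


def example_cost(u):
    """Illustrative nonconvex cost rate at a fixed time t (not from the paper).

    Charging costs 30 per unit power, plus a fixed 5 for switching on a
    second motor above power 0.5; discharging earns 20 per unit power.
    """
    if u >= 0:
        return 30.0 * u + (5.0 if u > 0.5 else 0.0)
    return 20.0 * u


# U = [-1, 1] sampled on a uniform grid, and a range of multipliers lambda.
GRID = [i / 100.0 for i in range(-100, 101)]
LAMBDAS = [float(k) for k in range(0, 61)]
CHOICES = [u_of_lambda(example_cost, lam, GRID) for lam in LAMBDAS]
```

In this example the selected power is non-decreasing in λ and takes only four values (full discharge, rest, half charge, full charge). The jumps occur at λ = 20, 30 and 40, which are exactly the multipliers at which the toy cost admits more than one minimizer, in line with the role of the set M_t in the lemma.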
The definitions which follow will be employed in the algorithm and are written under the assumption that there exists a pair (µ * , q * ) which satisfies the conditions of Proposition 2.1. Condition (6) and inequality (14) motivate the following definition.
Note that, in general, a (λ, τ )−admissible strategy q is not admissible: for a general choice of λ, one of the capacity constraints is likely to be broken at some time in (τ, T ]. The idea of the algorithm is to piece together (λ, τ )−admissible strategies, adjusting λ at times when the store is full or empty, in order to construct the optimal strategy q * ∈ X. The reference cost µ * attains the appropriate value of λ over intervals of time when µ * is constant. In the special case that one can find a (λ, 0)−admissible strategy which is also admissible, then of course we are done, since we may set µ * ≡ λ. Such a strategy satisfies condition (6) at every t ∈ [0, T ].
The key feature of the algorithm is to determine: (i) the intervals of time (τ_k, σ_k) over which µ* is constant and (ii) the value λ_k that µ* attains over these intervals.
We may then set q * to coincide with a (λ k , τ k )-admissible strategy over these intervals. The correct choice of µ * relies on the following characterization of strategies.
In particular, the set X−(τ, m) contains all τ−admissible q whose first violation of the capacity constraints would occur at the lower boundary E− if the level of the store at time τ were known to be m (and similarly for X+(τ, m)). The monotonicity property of Lemma 3.1 implies that Λ−(τ, m) and Λ+(τ, m) are connected subintervals of R satisfying

3.2. The algorithm. We are now in a position to present the algorithm, which is a generalisation of that in [10]. The algorithm will identify an N ∈ N and two increasing sequences of times for each i, and such that $\dot{\mu}^*(t) = 0$ whenever t ∈ (τ_i, σ_i). The multiplier µ* is only allowed to jump in value at times τ_i or σ_i. Fixing τ ∈ [0, T), we suppose that µ* and q* are known over [0, τ] and that the algorithm has already identified all of the times τ_1, . . . , τ_{k−1} and σ_1, . . . , σ_{k−1} which apply over this interval, with τ k < σ k ≤ τ. We further assume without loss of generality that m := [q*](τ) ∈ {E−(τ), E+(τ)} (otherwise, move τ backwards until this is satisfied). Step and define the restriction of q* to (τ_k, T] to be q. The algorithm is complete.
If there is no such q, proceed to step 2.
Step 2. Again, set λ := sup Λ − (τ, m). There are three cases. For each case, we either set τ k = τ and define the pair (µ * , q * ) restricted to an interval (τ k , σ k ], or we proceed to the next step in order to identify τ k > τ . , where e is the exit time associated with q (cf. Definition 3.3). Select the latest such σ and let q ∈ Λ be the (λ, τ )-admissible strategy associated with it.
If σ > τ, then set τ k = τ, σ k = σ and e k = e. Set µ * (t) = λ for all t ∈ (τ k , σ k ] and define the restriction of q * to (τ k , σ k ] to coincide with q. If σ k < T, proceed to step 3b; if σ k = T, proceed to step 4. , where e is the exit time associated with q. Select the latest such σ and let q ∈ Λ be the (λ, τ )−admissible strategy associated with it. If σ = τ, proceed to step 3a.
If σ > τ, then set τ k = τ, σ k = σ and e k = e. Set µ * (t) = λ for all t ∈ (τ k , σ k ] and define the restriction of q * to (τ k , σ k ] to coincide with q. If where e is the exit time associated with q. Select the latest such σ and let q ∈ Λ be the (λ, τ )−admissible strategy associated with it. If σ = τ, proceed to step 3a.
. Select the first time s ∈ (τ, T ] such that, on relabelling m(s) as m, and s as τ, and on determining the new associated σ from steps 1 and 2, we have σ > τ. Set τ k = s and return to step 1.
. Select the first time s ∈ (τ, T ] such that, on relabelling m(s) as m, and s as τ, and on determining the new associated σ from steps 1 and 2, we have σ > τ. Set τ k+1 = s and return to step 1.
If there is no such s, then we have found all intervals (τ i , σ i ) over which µ * is constant and defined the restriction of q * to these intervals. Proceed to step 4.
Step 4. Over each interval (σ i , τ i+1 ], corresponding to the times found in the steps above, and for each t ∈ (σ i , τ i+1 ], define Define the restriction of q * to the union of these intervals to coincide with q. We have now constructed a strategy q * ∈ X.
) → R such that the following conditions hold: ). If such a µ i exists for each i ∈ {1, . . . , N }, then define the restriction of µ * to each of the intervals (σ i , τ i ) to coincide with µ i . We have now constructed a pair (µ * , q * ) such that q * is optimal (see Proposition 3.4).
If, on the other hand, there exists i ∈ {1, . . . , N } with no such µ i , then we have constructed a strategy q * ∈ X but no corresponding multiplier µ * : [0, T ] → R. In this case, q * may not be optimal, and we say the algorithm terminated early.

End of algorithm
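The role played by sup Λ− and inf Λ+ in Steps 1 and 2 can be illustrated on a heavily simplified discrete example. The sketch below is not the algorithm itself. It assumes unit time steps, no leakage, perfect efficiency, a symmetric power rating `q_max`, and the bang-bang rule "charge if the price is below λ, discharge if it is above, rest if equal"; it does not resolve the choice among multiple minimizers at times when the price equals λ, which the full algorithm must handle. The bisection also assumes that the smallest trial λ breaks the lower constraint first and the largest breaks the upper constraint first. All names and numbers are hypothetical.

```python
def first_violation(lam, prices, level, e_minus, e_plus, q_max):
    """Follow the rule 'charge if price < lam, discharge if price > lam'.

    e_minus[k] and e_plus[k] bound the level at the end of step k. Returns
    "lower" or "upper" for the first bound that is broken, else "none".
    """
    for k, price in enumerate(prices):
        if price < lam:
            level += q_max
        elif price > lam:
            level -= q_max
        if level < e_minus[k] - 1e-9:
            return "lower"
        if level > e_plus[k] + 1e-9:
            return "upper"
    return "none"


def bisect_threshold(side, prices, level, e_minus, e_plus, q_max, tol=1e-6):
    """Approximate the sup of lower-violating lam (side="lower") or the inf
    of upper-violating lam (side="upper"), using monotonicity in lam."""
    lo, hi = min(prices) - 1.0, max(prices) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        outcome = first_violation(mid, prices, level, e_minus, e_plus, q_max)
        if side == "lower":
            below = outcome == "lower"
        else:
            below = outcome != "upper"
        if below:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy data: a store of size 2 that starts empty and must finish empty.
PRICES = [30.0, 20.0, 50.0, 40.0]
E_MINUS = [0.0, 0.0, 0.0, 0.0]
E_PLUS = [2.0, 2.0, 2.0, 0.0]
SUP_LOWER = bisect_threshold("lower", PRICES, 0.0, E_MINUS, E_PLUS, 1.0)
INF_UPPER = bisect_threshold("upper", PRICES, 0.0, E_MINUS, E_PLUS, 1.0)
```

In this toy case the two thresholds are separated (approximately 30 and 40), so any λ strictly between them yields a strategy that respects every bound up to the final time; this corresponds to the situation in which the algorithm can stop at Step 1 with a constant multiplier. When the two thresholds coincide, the later steps of the algorithm are needed.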
3.3. Optimality of the strategies generated by the algorithm. The following result states that, whenever the algorithm yields a pair (µ * , q * ), then q * is an optimal strategy, according to Proposition 2.1. The converse is then given by Proposition 3.5: if there is an optimal strategy which satisfies the conditions of Proposition 2.1, then the algorithm will find it.
Proof. Let (µ * , q * ) be the pair which is determined through the algorithm and let be the associated sequences of times, so that µ * is constant over each interval (τ i , σ i ). Assume that the conditions of Proposition 2.1 are satisfied up If the conditions of step 1 are satisfied at τ k , then q * is clearly admissible and satisfies the properties of the proposition. Hence, q * is optimal and the algorithm is complete. Assume therefore that the conditions of step 1 do not hold at τ k so that sup Λ − (τ k , m) = inf Λ + (τ k , m). It is clear that the strategy defined by the algorithm is admissible, and it remains to check that conditions (6)-(11) are satisfied by µ * .
To this end, note that, the algorithm ensures that the conditions are satisfied over the intervals (τ k , σ k ) and (σ k , τ k+1 ], and it remains to check that the conditions hold at σ k . Suppose first that case a) of step 2 is satisfied at time τ k , so that and condition (10) is satisfied at σ k . If, on the other hand, σ k < τ k+1 , then the construction of µ * through the algorithm immediately implies that and condition (11) is satisfied at σ k . If, on the other hand, σ k < τ k+1 , then the construction of µ * through the algorithm immediately implies that . Finally, if case c) holds at τ, then the above two cases together imply that the conditions of Proposition 2.1 are satisfied.
Proposition 3.5. If there is a pair (µ, q) which satisfies the conditions of Proposition 2.1, then the strategy q * ∈ X generated by the algorithm is optimal.
Proof. Let (µ, q) be a pair satisfying the conditions of Proposition 2.1 and let q* be the strategy generated by the algorithm. At the initial time t_0 = 0, there exists an interval [t_0, s) ⊂ [0, T] with two associated options: either (i) µ is constant over this interval, or (ii) µ is varying over this interval, in the sense that $\dot{\mu}(t) \neq 0$ or µ(t+) − µ(t−) ≠ 0 for almost all t ∈ (t_0, s). We will prove the following two claims:

Claim 1. If case (i) holds and [t_0, s) is the largest connected interval over which µ is constant, then the algorithm yields a constant µ* over this interval (and we may adjust µ so that this is correct, without violating the conditions of Proposition 2.1).
This will complete the proof of Proposition 3.5 since, as long as [q](s) = [q*](s), we may re-set the start time as t_0 = s and re-apply these claims to find another interval [t_0, s′) over which the above results hold. Since µ* coincides with µ over [t_0, s′), we conclude that the pair (µ*, q*) generated by the algorithm satisfies the conditions of Proposition 2.1 over this interval (and consequently over all of [0, T] by iteration). We will employ the notation λ ∼ λ′ if the set of (λ, t_0)−admissible functions coincides exactly with the set of (λ′, t_0)−admissible functions over the interval (t_0, s) and we introduce the ordering λ ≺ λ′ if λ ≁ λ′ and λ < λ′. We set m_0 = [q](t_0) and λ* = sup Λ−(t_0, m_0).
To prove Claim 1, suppose that case (i) holds. The monotonicity and continuity properties of Lemma 3.1 imply for θ ∈ R the relations:  (22) implies that the lower capacity constraint is broken by q, which contradicts the assumption that q ∈ X. If µ is not constant over any such interval, then Proposition 2.1 requires that q(t) = u(t, µ(t)) = Ė−(t) + αE−(t) for all t ∈ (s, t′), where u is the map of Lemma 3.1. By the monotonicity of u, this is possible only if µ(t+) − µ(t−) > 0 at some t ∈ (s, t′), since otherwise the fact that λ ∈ Λ−(t_0, m_0) implies that q will break the lower capacity constraint. However, this contradicts condition (8) of the proposition. Hence, we must have λ ∼ λ* or λ ≻ λ*. If λ ≻ λ*, then similar arguments as above lead us to analogous contradictions. We conclude that λ ∼ λ*, as stated in Claim 1 above.
The remainder of Claim 1 states that [q](s) = [q * ](s). Since λ ∼ λ * (and therefore µ and µ * coincide over [t 0 , s)), this is always true if (16) admits a unique minimizer at each t ∈ [t 0 , s). However, if this uniqueness does not hold, then there are potentially two (or more) strategies q 1 , q 2 ∈ X which satisfy the conditions of Proposition 2.1, and we need to make sure that Claim 1 holds regardless of which strategy we choose. Let x 1 , x 2 be the corresponding (λ, t 0 )−admissible strategies and relabel our previous value of s as s 1 , s 2 , so that q i (t) = x i (t) for all t ∈ [t 0 , s i ), for i = 1, 2. Let e 1 and e 2 be the corresponding exit times for x 1 and x 2 respectively (cf. Definition 3.3), with s 1 < e 1 and s 2 < e 2 . As shown above, the corresponding multipliers µ 1 , µ 2 both satisfy that µ i (t) ∼ λ * for all t ∈ [0, s i ), i = 1, 2, but according to our definition for s, the value of each µ i must change at time s i , so that λ i + := µ((s i ) + ) λ. In particular, this means that if x i ∈ X − (t 0 , m 0 ), then we must have [q i ](s i ) = E + (s i ) and λ i + λ, since otherwise Proposition 2.1 tells us that [q i ](s i ) = E − (s i ) and λ i + ≺ λ, which would be a contradiction because, by the monotonicity property of Lemma 3.1, this would mean that q i breaks the lower capacity constraint. Similarly, if x i ∈ X + (t 0 , m 0 ), then we conclude that Assuming for now that x 1 ∈ X − (t 0 , m 0 ), we consider the following (exhaustive list of) cases: (a) x 2 ∈ X + (t 0 , m 0 ) and s 1 ≤ e 2 and s 2 ≤ e 1 ; (b) x 2 ∈ X + (t 0 , m 0 ) and either s 1 > e 2 or s 2 > e 1 ; (c) . If case (a) holds, then [q 1 ](s 1 ) = E + (s 1 ) and λ 1 + λ. However, we can construct a new (λ, t 0 )−admissible x which agrees with x 1 over [t 0 , s 1 ) and which agrees with x 2 over [s 1 , s 2 ). By construction, x ∈ X + (t 0 , m 0 ). 
Thus, the monotonicity property of Lemma 3.1 implies that if λ 1 + λ, then q 1 must break the upper capacity constraint at some time, which contradicts q 1 ∈ X. A similar argument reveals an analogous contradiction for q 2 . Thus, q cannot coincide with x i over [t 0 , s i ), for i = 1, 2 but rather must coincide with some other (λ, t 0 )−admissible x over its corresponding interval [t 0 , s).
Finally, if case (d) holds, we again have that [q 1 ](s 1 ) = E + (s 1 ) and λ 1 + ≺ λ, and we can construct a (λ, t 0 )−admissible x ∈ X + (t 0 , m 0 ) which agrees with x 1 over [t 0 , s 1 ) and agrees with x 2 over [s 1 , T ]. Thus q 1 must break the upper capacity constraint at time T and we conclude that q cannot coincide with x 1 over [t 0 , s 1 ).
A unifying feature of the above cases (a)-(d) is that, if s_1 < s_2, then we arrive at a contradiction. Thus, the pair (µ, q) can satisfy the conditions of Proposition 2.1 only if q agrees with the (λ, t_0)−admissible x which has the latest corresponding update time s. This means that, not only does the algorithm choose λ* ∼ λ over the interval [t_0, σ_1) over which µ* is constant (with σ_1 being the first update time, as defined in the algorithm), but also that σ_1 = s. Analogous results hold if we assume instead that x_1 ∈ X+(t_0, m_0). Thus, the only case not covered above (after relabelling

To prove Claim 2, suppose that case (ii) holds. Then, by Proposition 2.1, the level of stored energy remains fixed at a capacity constraint throughout [0, s) and we may assume, without loss of generality, that [q](t) = E−(t) for all t ∈ [0, s) and consequently that µ is decreasing over this interval. Suppose now that there is a connected interval I ⊂ [t_0, s), containing t_0, such that q(t) = q*(t) for almost all t ∈ I. Then, the algorithm tells us that µ* attains a constant value λ* over all of I and that there is a (λ*, t_0)−admissible x which coincides with q* over [t_0, σ_1). Since q remains at the lower capacity constraint over [t_0, s), we have q(t) ≤ x(t) and µ(t) ≺ λ* for all t ∈ I ∩ [t_0, s). (If, instead, µ(t) ∼ λ* for all t ∈ I ∩ [t_0, s), then we have already dealt with this case in Claim 1.) If x ∉ X+(t_0, m_0), then (22), together with conditions (8) and (11), implies that q breaks the lower capacity constraint before the exit time associated with µ*, thus contradicting the assumption that q ∈ X. If, on the other hand, x ∈ X+(t_0, m_0), then conditions (8) and (11) imply that q breaks the lower capacity constraint before µ* changes value, again contradicting the assumption that q ∈ X. This completes the proof of Claim 2.

Proof. The proof is a direct analogue of that presented in [10], so we do not replicate it here.

The time-localization property.
Consider the standard storage model of Section 1.2. The algorithm presented in Section 3.2 reduces the original problem, an optimization over a constrained set of functions, to a series of new, simpler optimizations over a set of bounded scalars. Moreover, each of these new optimizations is localized in time, in the sense that at each time τ k from the algorithm, one needs cost information only up to the associated exit time e k to know how to operate the store over (τ k , σ k ). Indeed, the same is true for each time t ∈ (τ k , σ k ). We call this property the "time-localization property".

An intuitive illustration.
As an illustration of both the algorithm and the time-localization property, consider a simple arbitrage model of the standard storage model form, with no leakage (α = 0) and with cost rate function

$$L(t, x) = \begin{cases} p(t)\,x/\varepsilon_1, & x \ge 0, \\ \varepsilon_2\,p(t)\,x, & x < 0, \end{cases} \tag{24}$$

where, for each t ∈ [0, T ], p(t) ∈ [0, ∞) is the price of unit energy. Let U = [−q − , q + ] be the range of admissible powers, with q − , q + ∈ (0, ∞) denoting the discharge and charge power ratings of the store respectively. Here ε 1 , ε 2 ∈ (0, 1] are efficiency factors: in order to put one unit of electrical energy into the store, the operator must buy 1/ε 1 ≥ 1 units to account for losses during the charging process; similarly, when discharging one unit of energy from the store, some energy is lost during the discharge process and the operator can sell only a fraction ε 2 of this energy. We also assume that the store starts and finishes empty, so that the initial and final levels of stored energy are both zero.

Both Proposition 2.1 and the algorithm say that the optimal strategy consists of intervals of time over which the multiplier is constant. These intervals are each of one of the following forms: F (the store is full at the end of the interval, without touching the capacity constraints in between the start and end times of the interval); and E (the store is empty at the end of the interval, again not touching the capacity constraints in between the start and end times of the interval). Sometimes the algorithm may yield a multiplier which is constant over longer intervals of time, so that a capacity constraint is attained somewhere away from the end-times, but the same categorization holds on splitting such intervals up into smaller ones. In the complement of these intervals of constant multiplier, the value of the multiplier is allowed to vary but the level of stored energy must remain at one of the capacity constraints.
In the case of our simple arbitrage model above, a multiplier λ can be interpreted as a reference cost which dictates whether the store should be charging, discharging or doing nothing at any given time, according to (15), with

$$q^*(t) = \begin{cases} q^+, & p(t)/\varepsilon_1 < \lambda, \\ 0, & \varepsilon_2\,p(t) < \lambda < p(t)/\varepsilon_1, \\ -q^-, & \lambda < \varepsilon_2\,p(t). \end{cases} \tag{25}$$

Proposition 2.1 states that the reference cost can increase only if the store is full. In terms of (25), this can be interpreted as the operator waiting for prices to rise to a high enough level that it is worth selling energy. Similarly, the reference cost can fall only when the store is empty and the operator is waiting for low buying prices. If λ = p(t)/ε 1 or λ = ε 2 p(t), then we have to choose the appropriate value for q * (t) via the algorithm but, for simplicity, let us assume that the price is always varying so that the set of times at which one of these conditions occurs is of zero measure and therefore the choice of q * at these times does not matter. An alternative approach is to make (24) strictly convex by appending a small strictly convex function to L, e.g. replacing L(t, x) with L(t, x) + δx² for some small δ > 0. Then, the solution to (16) is unique for each multiplier λ and the optimal strategy obtained with this modified cost rate function will converge to q * as δ tends to zero. This approximation is useful for practical computations where p(t) might be constant over intervals.
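The reference-cost rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `threshold_action`, its argument names and the sample prices are hypothetical, and the rule is taken to be "charge when p/ε 1 < λ, discharge when ε 2 p > λ, otherwise do nothing", with ties ignored as in the text.

```python
def threshold_action(price, lam, eff_in, eff_out, q_minus, q_plus):
    """Power chosen by the reference-cost rule at one instant.

    Positive values charge the store, negative values discharge it.
    eff_in and eff_out play the roles of the charge and discharge
    efficiency factors, both assumed to lie in (0, 1].
    """
    if price / eff_in < lam:       # effective buying cost below reference
        return q_plus
    if eff_out * price > lam:      # effective selling price above reference
        return -q_minus
    return 0.0                     # price inside the dead band: stay idle


# Illustrative numbers only (not taken from the paper).
prices = [10.0, 30.0, 40.0]
actions = [threshold_action(p, 30.0, 0.9, 0.9, 2.0, 1.0) for p in prices]
```

Because both efficiency factors are at most one, the two conditions cannot hold at the same time for a non-negative price, so the order of the checks does not matter in this sketch.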
The purpose of steps 1-3b of the algorithm is to identify these intervals (which are labelled (τ k , σ k )) of constant multiplier and to choose the correct value for the multiplier over these times. To illustrate the time-localization property, suppose that the multiplier attains a constant value λ over some interval (τ, σ) and that this interval is of type E. Suppose now that we can accurately forecast prices up to some time h̄ > σ. If, on keeping the multiplier constant and extending the associated strategy (as defined by (25)) up to time h̄, we find that the store becomes full at some time h ∈ (σ, h̄], then any price information beyond time h is irrelevant to the actions of the store over (τ, σ): if prices are relatively high after time h, then the store is empty at time σ but is able to charge up at suitably low prices over the interval (σ, h), in preparation for discharging after time h; if prices are relatively low after time h, then the store can adjust its strategy over (σ, h) by waiting for the cheapest prices at which to charge after time σ. In such a way, we can argue that a store operator needs forecast prices only up to time h in order to determine the optimal strategy over (τ, σ).
In a similar way, if instead the interval (τ, σ) is of type F, then we look for the first time h at which the store would be empty if the multiplier were kept constant and the associated strategy extended up to time h̄. As above, we find that price forecasts over the interval (τ, h) provide enough information in order to optimally operate the store over (τ, σ). Thus, in the case of either type E or type F, the determination of the optimal strategy over (τ, σ) requires price forecasts up until the first time that the store would have reached the opposite capacity constraint to that attained at time σ, if the multiplier were kept constant from time τ onwards.
Moreover, (τ, h) is the shortest interval of time over which we need to forecast prices in order to optimally operate the store over (τ, σ). If, for example, (τ, σ) is of type E but we could only forecast prices up to a time h′ < h, then prices beyond time h′ might be exceptionally high but, as the store is not full at time h′, we may not be able to take full advantage of these high prices. In this situation, it would have been better to increase the multiplier at time τ in order to charge the store in preparation for the subsequent high prices. Similarly, if (τ, σ) is of type F, then prices beyond time h′ might be very low, so that it would be best to choose a low multiplier which allows us to empty the store in preparation for charging during these times. However, if prices beyond h′ are very high, then it is better to maintain a high-valued multiplier in order to ensure that the store is full prior to these high prices.
We call h = h(τ ) the "look-ahead time" associated with time τ. Price information over (τ, h) suffices in order to determine the optimal strategy at time τ (and indeed over the entire interval (τ, σ)). Moreover, (τ, h) is the shortest interval with this property: price forecasts only up to an earlier time h′ < h will not be sufficient to determine the best strategy for the store at time τ.
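For an interval of type E, the look-ahead time can be illustrated with a small discrete-time sketch: starting from an empty store at the end of the interval, keep the multiplier fixed and step through the forecast prices until the store would first be full. The function `look_ahead_steps`, the unit time step and the sample prices are assumptions made for illustration; capacity constraints are deliberately not enforced along the way, and a result of `None` means the forecast window is too short to decide.

```python
def look_ahead_steps(prices, lam, capacity, q_minus, q_plus,
                     eff_in=1.0, eff_out=1.0, dt=1.0):
    """Steps after sigma until the store would first be full (type E case).

    `prices` holds forecasts from sigma onwards; the store is empty at
    sigma and the multiplier `lam` is held fixed throughout.
    """
    level = 0.0
    for k, p in enumerate(prices, start=1):
        if p / eff_in < lam:
            level += q_plus * dt       # charge at full power
        elif eff_out * p > lam:
            level -= q_minus * dt      # discharge at full power
        if level >= capacity:
            return k                   # later prices are never inspected
    return None


forecast = [20.0, 20.0, 35.0, 20.0, 50.0]   # illustrative prices only
h = look_ahead_steps(forecast, 30.0, 3.0, 1.0, 1.0, 0.8, 0.8)
```

In this example the store would be full after four steps, so changing the fifth price (or any later one) leaves the result unchanged, which is the discrete counterpart of the statement that price information beyond h is irrelevant.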

The look-ahead time associated with the algorithm and Proposition 2.1.
For more general versions of the standard storage model, an alternative viewpoint of the time-localization property comes directly from Proposition 2.1 and the algorithm. If the optimal strategy q * is of the form of Proposition 2.1, then steps 1-3 of the algorithm say that, over an interval (τ, σ) of constant multiplier λ, we have λ = sup Λ − (τ, m), where m is the optimal level of stored energy at time τ. To identify λ, choose any λ 1 ∈ R and consider the set of corresponding (λ 1 , t)−admissible q 1 (cf. Definition 3.2). If each q 1 breaks the lower capacity constraint of the store and does not hit the top of the store before doing so, as shown in Figure 1, then λ 1 is too small (by Lemma 3.1). On the other hand, if we choose λ 2 such that each corresponding (λ 2 , t)−admissible q 2 breaks the upper capacity constraint without hitting the bottom of the store beforehand, then λ 2 is too large. The correct choice is λ ∈ (λ 1 , λ 2 ), with a (λ, t)−admissible q which either hits the top of the store (at some time after t) before hitting the lower constraint, or vice versa, at the look-ahead time h. Without changing the multiplier λ at some subsequent time, it is possible that the associated q may break a capacity constraint at time h. Now notice that, to recognise that λ 1 and λ 2 are not the correct choice of multiplier, we only need to determine the corresponding q 1 and q 2 up until the exit times where the capacity constraints are first broken (labelled e 1 and e 2 respectively in Figure 1). As we increase λ 1 (or decrease λ 2 ) to converge to λ, the corresponding time horizons e 1 and e 2 also increase monotonically and are bounded from above by h = h(τ ).
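The search for the multiplier described above can be sketched as a bisection in discrete time. This is a simplified illustration, not the algorithm of Section 3.2: it assumes a perfectly efficient price-taking store, unit time steps and a bang-bang rule, and the names `first_violation` and `bracket_multiplier` are hypothetical. A trial multiplier is rejected as too small if the lower constraint is broken first and as too large if the upper constraint is broken first, and each trial is simulated only up to its own exit time.

```python
def first_violation(prices, lam, level, capacity, q_minus, q_plus, dt=1.0):
    """Return 'low', 'high' or None for the first broken capacity constraint."""
    for p in prices:
        if p < lam:
            level += q_plus * dt       # cheap relative to lam: charge
        elif p > lam:
            level -= q_minus * dt      # expensive relative to lam: discharge
        if level < 0.0:
            return "low"               # stop at the exit time
        if level > capacity:
            return "high"
    return None


def bracket_multiplier(prices, level, capacity, q_minus, q_plus,
                       lam_lo, lam_hi, iterations=40):
    """Shrink [lam_lo, lam_hi] by bisection; stop early on a feasible value."""
    for _ in range(iterations):
        mid = 0.5 * (lam_lo + lam_hi)
        outcome = first_violation(prices, mid, level, capacity, q_minus, q_plus)
        if outcome is None:
            return mid, mid
        if outcome == "low":
            lam_lo = mid               # multiplier too small: raise it
        else:
            lam_hi = mid               # multiplier too large: lower it
    return lam_lo, lam_hi


lo, hi = bracket_multiplier([10.0, 50.0, 20.0, 60.0, 30.0],
                            0.0, 1.0, 1.0, 1.0, 0.0, 100.0)
```

With sampled prices a whole range of multipliers may be feasible, or none at all when the exact optimum requires partial power at a tie, so the returned pair should be read as a bracket rather than a unique value.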
Step 4 of the algorithm corresponds to the case where the corresponding look-ahead time is h = τ. In any case, [τ, h] is the shortest interval of time over which cost information is required in order to decide how to optimally operate the store at time τ.

Figure 1. The thin solid line is the level of stored energy [q](t), plotted over times t ≥ τ, where q is the strategy associated with the constant multiplier λ. The thick solid line represents the level of stored energy associated with the optimal strategy q * , where q * agrees with q over (τ, σ). The look-ahead time associated with time τ is h. The lower and upper dashed lines are respectively [q 1 ](t) and [q 2 ](t). These correspond to λ 1 and λ 2 , and need determining only until times e 1 and e 2 respectively in order to discard these as incorrect choices for the multiplier.

strategy q * , then the time horizon H(t) is the first time at which the store either (i) hits full capacity (at some time after t) then discharges to hit the lower capacity constraint, or (ii) hits empty (at some time after t) then charges to hit the upper capacity constraint. The assumption that we may need to know the entire strategy q * before we can calculate H(t) is obviously not ideal, but as long as our optimisation algorithm moves forwards in time (as it does in Section 3.2), we will know that we have reached our time horizon H(t) as soon as we have met one of these criteria. In a similar way to the look-ahead time, any cost information after time H(t) is irrelevant to the action of the store not only at time t, but at all times within the interval [t, σ(t)), where σ(t) denotes the final time prior to H(t) at which the store is full in case (i) or empty in case (ii).
Even if we were to choose an algorithm which works backwards in time, this property could still be useful since we may try solving the optimization problem over some shortened interval of time [0, H] ⊂ [0, T ]. Applying our criteria will then reveal whether or not H(0) ≤ H. If the inequality holds, then we know that our solution is correct up to time σ(0). Relabelling H(0) as the initial time and setting the level of the store at that time to agree with the level given by our previous optimization, we repeat this procedure until the whole interval [0, T ] has been covered. The efficiency of this procedure of course relies upon making good guesses in advance for the value of each H(t).

(4), where the associated cost rate functions L, L̃ respectively satisfy the conditions of Section 1.2. Assume that there exist optimal strategies q * , q̃ * ∈ X associated with C and C̃ respectively and let t ∈ [0, T ) be such that [q * ](t) = [q̃ * ](t).
Proof. Suppose that there exist times t 1 , t 2 with the required properties and that L(s, ·) = L̃(s, ·) for almost all s ∈ [t, t 2 ]. Assume first that [q * ](t 1 ) = E − (t 1 ) and suppose that the lemma is not true. By the continuity of the map s → [q](s) for each admissible q ∈ X, there must exist at least one interval (a, b) ⊂ [t, t 2 ] such that In particular, we may adapt q̃ * if necessary so that q̃ * (s) = q * (s) for all s ∈ (a, b).
Repeating this argument for all such intervals (a, b), and letting b̄ be the supremum of all such end-points b, implies that the optimal strategy associated with C̃ can be chosen to coincide with q * over (t, b̄). The condition that [q * ](t 1 ) = E + (t 1 ), together with the continuity of each map s → L[q](s), further implies that b̄ ≥ t 1 , hence completing the proof for the case [q * ](t 1 ) = E − (t 1 ). A similar argument holds for the case where [q * ](t 1 ) = E + (t 1 ).

A comparison of update times, exit times, look-ahead times and time horizons.
Consider again the algorithm of Section 3.2 and suppose that we have identified an interval (τ, σ) of constant multiplier λ. Then σ can be chosen to be the first time after τ at which the store is either full or empty (or at which the level of the store takes its prescribed final value, if σ = T ). As defined above, the look-ahead time h = h(τ ) associated with τ is such that (τ, h) is the shortest time-interval over which cost information is required in order to operate the store optimally at time τ (and, in fact, over the entire interval (τ, σ)). The look-ahead time has the property that the level of stored energy at time h lies on the opposite capacity constraint to that at the update time σ.
The time horizon H = H(τ ) associated with τ serves a similar purpose to the look-ahead time, in the sense that cost information over (τ, H) is sufficient to determine the optimal strategy over (τ, σ). However, unlike with the look-ahead time, there is no guarantee that (τ, H) is the shortest such interval. In fact, h = min (H, e), where e = e(τ ) is the exit time associated with τ (cf. Definition 3.2). This is the first time after τ at which a capacity constraint would be broken if the multiplier were not updated before this time. As with the look-ahead time h, both H and e share the characteristic that the level of stored energy at times H and e lies on the opposite capacity constraint to that at the update time σ.
Finally, consider extending the interval (τ, σ), if possible, to the maximum interval (τ, σ̄) over which the multiplier is constant. We call σ̄ = σ̄(τ ) the "update time" associated with τ, since at this time the multiplier must be updated to a new value. The level of stored energy at time σ̄ lies on the same capacity constraint as at time σ. In order to evaluate the optimal strategy over (τ, σ̄), cost information is required up until the associated exit time e. Thus, whilst the look-ahead time indicates the minimum amount of cost information required in order to operate the store at any time t ∈ [τ, σ], the strategy actually remains completely determined by the multiplier λ over the (potentially) longer period of time [τ, σ̄]. On the one hand, the look-ahead times provide a useful method for the real-time operation of the store when uncertainty is introduced into the cost functions, as discussed below; on the other hand, a deterministic optimisation benefits from the fact that the operation of the store can be characterized by a simple rule, determined by λ, up until the update time σ̄.

Real-time stochastic optimization.
Suppose that the cost rate function L(t, ·) of Section 1.2 is not perfectly known in advance for all times t, but rather that it is being frequently updated (at times 0 < s 1 < s 2 < . . . < T, say) as new information becomes available. Then, at time t, having decided on the action of the store over [0, t) and thus having determined the corresponding level of stored energy at time t, we wish to choose q * r : [t, T ] → U in order to minimise the expected cost over all piecewise continuous q : [t, T ] → U that satisfy the capacity constraints of the store. The look-ahead times h(t) associated with each time t suggest that a rolling time-horizon approach should be an appropriate way of dealing with uncertainty of this kind. Beginning at time t = 0, as long as cost predictions are good up until the look-ahead time h(0), then the algorithm of Section 3.2 should provide a good estimate of the best course of action q r : [0, s 1 ) → U for the store up until the first information update time s 1 . One can then restart the algorithm at time t = s 1 , with the level of stored energy at this new initial time given by [q r ](s 1 ). Again, as long as cost predictions are good from time s 1 up until the look-ahead time h(s 1 ), we can apply the algorithm to extend the strategy q r into [0, s 2 ]. Continuing in this way, we can decide on an action for the store at each time t, with the action depending only on cost information that is available from time t until the look-ahead time h(t). This approach has a number of benefits:
(i) Cost information can be frequently updated. This is in contrast to some other approaches in which the cost functions are assumed to follow a fixed probability distribution over all future times.
(ii) Computations need to be carried out only over times [t, h(t)] in order to decide on the best action for the store at time t. This method should, therefore, compare favourably with other rolling time approaches, as well as with methods based on dynamic programming, in terms of the number of computations needed to operate the store in real-time.
(iii) This method contrasts favourably with other rolling-time approaches which explicitly introduce look-ahead times, perhaps based on an assumption of underlying periodicity in the cost rate function L. With the algorithm of Section 3.2, we can avoid making long-term predictions and save on computations in a similar way, but we need not make an a priori periodicity assumption, since the look-ahead times h(t) are implicit in the analysis.

Maximizing arbitrage profits.
We consider here a compressed air energy store (CAES) which operates purely within a single wholesale market. Currently, there are only two commercially operating stores of this kind (one in Alabama USA, and the other in Germany), both of which require an input of gas at the discharge stage. Research into heat storage technology is striving to reduce or even remove this reliance on gas, by replacing the energy provided by igniting gas at the discharge stage with heat energy which is restored from charging the store. The case where the heat of compression is reused and no gas is burnt is called "adiabatic". Our model of a compressed air store has charge and discharge power rates q + , q − respectively and a fixed energy capacity M, so that the set of admissible powers is U = [−q − , q + ] and the admissible domain satisfies E − (t) = 0 and E + (t) = M for all t ∈ (0, T ). We further assume that the store is constrained to start and finish empty, so that E − (0) = E + (0) = E − (T ) = E + (T ) = 0. Typically, leakage from stores of this type is considered to be very low, and we therefore set α = 0. The cost functional C is of the "standard storage model" form introduced in Section 1.1, with a cost rate function L (cf. (4)) defined by

$$L(t, x) = \begin{cases} p(t)\,x, & x \ge 0, \\ \dfrac{1}{\beta}\bigl(p(t) - \gamma\,g(t)\bigr)\,x, & x < 0, \end{cases} \tag{27}$$

where p(t), g(t) ∈ R are the prices per unit energy of electricity and gas respectively at time t, and β, γ are the amounts of electrical and gas power respectively required to generate and sell 1 unit of electrical power. Setting γ = 0 and β > 1 gives us an adiabatic compressed air store with round-trip efficiency 1/β. This model, with γ = 0 and β > 1, would also be a suitable approximation for other storage types, including pumped hydro (assuming that the effects of rainfall and seepage on stored energy in the upper reservoir are negligible), thermal and liquid air energy storage.
An additional multiplicative factor can easily be incorporated into the charge side of the cost function if we wish to model charge and discharge efficiencies separately, rather than a single round-trip efficiency. We are assuming here that the store is a price-taker so that its actions do not impact on the price of electricity. However, market impact could be incorporated by replacing each p(t) with a suitable price of the form p̃(t, q(t)) (see [10] and [11] for a further discussion of this). If the price of electricity is always low enough that p(t)(1 − β) < γg(t) for all t ∈ [0, T ], so that the discharge threshold (1/β) (p(t) − γg(t)) lies below the charge threshold p(t), then the cost functional (27) is convex and an optimal strategy exists. By Propositions 3.5 and 3.6, an optimal strategy can therefore be determined via the algorithm of Section 3.2. For a price-taking arbitrage situation, as we have here, the multiplier µ * can be interpreted as a reference cost: the strategy of the store is to charge at full power if the cost of electricity is sufficiently low that p(t) < µ * (t); to discharge at full power if the cost of electricity (or gas) is sufficiently high (or low) that (1/β) (p(t) − γg(t)) > µ * (t); and to do nothing if prices are not extreme enough to satisfy either of these conditions. The reference cost should not change whilst the store is neither full nor empty; intuitively, there is no incentive to do so. There are analogies here with the literature on investment under uncertainty [12]. When the store is full, however, the reference cost is permitted to increase. At these times, the store operator is waiting until the price of electricity becomes high enough (or the price of gas low enough, but in our illustration we have taken the price of gas to be constant) that it is worth selling power. If the reference cost were instead decreasing at these times, then this would indicate that the store should not have charged up so soon, but rather it should have waited for these more favourable buying conditions.
Similar arguments hold when the store is empty. Figure 2 shows the optimal strategies for a compressed air store with parameters modelled on those of the Huntorf store in Germany: M = 580MWh, q − = 290MW, q + = 72.5MW. The left-hand plot takes parameters β = 0.8, γ = 1.6, which coincides with data available for Huntorf station [1]. The right-hand plot represents two adiabatic stores with γ = 0 and with different efficiencies. We have chosen here round-trip efficiencies 1/β of 0.75 and 0.85 which, according to [2], are the lower and upper efficiencies to be expected from the recent adiabatic research CAES plant, ADELE. The actions of the store are optimized over the course of one year. Electricity prices are APX half-hourly 2014 mid-price index prices (the data are available from [3]). These are a weighted average of all traded prices at any given time and represent an approximation to the spot price at that time. We have used a constant gas price throughout the year, which is set to the average 2014 gas price of £17.56/MWh, as reported by DECC (the UK Department of Energy and Climate Change).
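The reference-cost rule for the gas-fired store can be sketched with the Huntorf-style parameters quoted above. The function name `caes_action` and the reference cost of 40 are hypothetical; the thresholds are the ones stated in the text, namely charge if p < µ and discharge if (1/β)(p − γg) > µ, and the sketch assumes that the discharge threshold never exceeds the charge threshold, so that both conditions cannot hold together.

```python
def caes_action(p, g, mu, beta=0.8, gamma=1.6, q_minus=290.0, q_plus=72.5):
    """Power in MW: positive charges the store, negative discharges it.

    p and g are electricity and gas prices, mu is the reference cost.
    """
    if p < mu:
        return q_plus                      # electricity cheap: charge
    if (p - gamma * g) / beta > mu:
        return -q_minus                    # net selling price high: discharge
    return 0.0                             # otherwise stay idle


gas_price = 17.56                          # constant gas price from the text
actions = [caes_action(p, gas_price, 40.0) for p in (30.0, 50.0, 80.0)]
```

With these numbers the store charges at a price of 30, idles at 50 (where the net discharge price is about 27.4) and discharges at 80 (where it is about 64.9).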
An immediate observation is that each store follows a similar pattern of optimal storage levels: charging at times of low prices and discharging at times of high prices. As the round-trip efficiency of the adiabatic store increases, it operates over longer periods as the lower energy losses mean that the store is able to take greater advantage of price differences. The gas-fired CAES plant follows a strategy which is more closely related to the adiabatic store with a round-trip efficiency of 0.75 and this is further reflected in the similarity of the profits of these two stores (see Table 1). The CAES plant is slightly less active than either adiabatic plant here since the cost of discharging power is usually higher for the former than the latter. More precisely, the cost L(t, x) associated with discharging x MW at time t is higher for an adiabatic store with round-trip efficiency ε than our CAES plant if and only if 2g(t) < p(t) (1/0.8 − ε). With our chosen parameters of g(t) = 17.56 for all t and ε = 0.75, this condition becomes p(t) > 70.24.
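The threshold of 70.24 can be checked directly. The short derivation below assumes, in line with the discharge rule stated earlier, that the discharge cost rate is ε p(t) x for the adiabatic store and (1/β)(p(t) − γ g(t)) x for the gas-fired plant with β = 0.8 and γ = 1.6; since x < 0 when discharging, dividing by x reverses the inequality.

```latex
\begin{align*}
\varepsilon\,p(t)\,x > \tfrac{1}{0.8}\bigl(p(t) - 1.6\,g(t)\bigr)\,x,\quad x < 0
  &\iff \varepsilon\,p(t) < \frac{p(t)}{0.8} - 2\,g(t) \\
  &\iff 2\,g(t) < p(t)\left(\frac{1}{0.8} - \varepsilon\right), \\
g(t) = 17.56,\ \varepsilon = 0.75:\qquad 35.12 < 0.5\,p(t)
  &\iff p(t) > 70.24 .
\end{align*}
```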
We highlight here that, although the price of electricity follows a reasonably periodic pattern (with periods of roughly a day in length), it does not follow that the level of the store is the same at the start and end of each day (see, for example, [17] for a periodic setting in which the optimal solutions are not necessarily periodic). This underlines point (iii) made in Section 5.1, that if we are interested in the real-time operation of the store, then we are likely to obtain a better strategy by implementing the implicit look-ahead times of the algorithm of Section 3.2, rather than introducing explicit look-ahead times with extra constraints imposed at those times at each stage of the optimization. The length of look-ahead times required in order to determine the best strategy at each time is generally longer for the gas-fired CAES plant than for either adiabatic store. In fact, Table 1 indicates that, on average, the operator of a CAES plant needs to predict electricity prices over periods of 87.9 hours, as opposed to 31.9 hours or 21.3 hours for a 0.75 or 0.85 efficient adiabatic store respectively. The lower look-ahead times for the higher efficiency adiabatic store can be explained in a similar way to the difference in optimal strategies -a higher efficiency plant does not need to look as far ahead in time as a lower efficiency plant since it is able to take advantage of smaller price differences, thus resulting in quicker cycling times. Similarly, as observed above, the gas-fired CAES plant is usually less able than a 0.75 efficient adiabatic store to take advantage of smaller price differences, and therefore needs to look further ahead in order to identify price differences which are sufficiently great to encourage the store to cycle.

Storage for smoothing.
Consider the same store that was introduced in Section 5.2 but suppose now that the store operator wishes to "smooth" a supply function g : [0, T ] → R. The supply here might be the power output of a wind farm, or perhaps the difference between supply and demand at some location. Such smoothing could be useful, for example, in controlling the variability of renewable outputs or demand, thus potentially reducing balancing costs and the need for peaking plants. For this example, we will assume no rate constraints, so that the set of admissible powers for the store is U = R, and to simplify exposition we will initially restrict attention to the case of a perfectly efficient store in which there are no energy losses during operation.
In the perfectly efficient case, an optimal smoothing strategy q * ∈ X is now defined as a minimizer of the total variation TV(y) over all functions y : [0, T ] → R of the form y(t) = q(t) − g(t), for all t ∈ [0, T ] and some admissible strategy q ∈ X, where

$$\mathrm{TV}(y) := \sup_{P \in \mathcal{P}} \sum_{i=1}^{n} \bigl| y(t_{i+1}) - y(t_i) \bigr| \tag{28}$$

and where P is the set of all partitions P of the form 0 = t 1 < . . . < t n < t n+1 = T, with n = n(P ). If y is differentiable (or absolutely continuous, thus differentiable in a distributional sense and satisfying the fundamental theorem of calculus), then (28) becomes

$$\mathrm{TV}(y) = \int_0^T \bigl| y'(t) \bigr| \, dt.$$

Alternatively, let G(t) = ∫_0^t g(s) ds be the accumulated supply output up until time t ∈ [0, T ] and consider the diagram in Figure 3. Let T denote the tunnel bounded by −G(t) and M − G(t). The curve through T represents the trajectory Y * (t) := [q * ](t) − G(t) associated with the optimal smoothing strategy q * and is the same as the path taken by a string that is stretched taut through the tunnel, with fixed start and end positions [q * ](0) − G(0) and [q * ](T ) − G(T ) respectively (with the initial and final levels of stored energy both equal to zero in the diagram). In light of the taut string analogy, it is not difficult to see that q * is completely characterized by the following conditions: (i) The net output y * is constant over intervals of time when the store is neither full nor empty. (ii) y * can increase only at times when the store is full. (iii) y * can decrease only at times when the store is empty.
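For sampled data the total variation reduces to a sum of absolute successive differences, which the following sketch computes. The function name `total_variation` and the four-sample supply series are illustrative assumptions; the sign convention y = q − g follows the text, and the sum equals the supremum over partitions only when the signal is treated as piecewise constant between samples.

```python
def total_variation(samples):
    """Sum of absolute successive differences of a sampled signal."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


# Illustrative supply (not real data) and a store strategy that charges
# when supply is high and discharges when it is low.
supply = [3.0, 1.0, 3.0, 1.0]
store_power = [1.0, -1.0, 1.0, -1.0]

unsmoothed = [-g for g in supply]                        # y with no storage
smoothed = [q - g for q, g in zip(store_power, supply)]  # y = q - g
```

Here the stored energy level alternates between 1 and 0 (with unit time steps and an empty start), so the strategy is admissible for any capacity of at least 1, and it makes y constant: the total variation falls from 6 to 0.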
Relabelling −G(t) as E − (t) and M − G(t) as E + (t), we see that the conditions (i)-(iii) on y * coincide exactly with the conditions (6)-(11) on µ * . Hence, the algorithm of Section 3.2 can be used to determine y * and consequently the optimal smoothing strategy q * . In particular, even though the objective function (28) is clearly not of the same form as the standard storage model of Section 1.2, the same time-localization properties continue to hold here. In fact, the update time σ̄(t) and exit time e(t) associated with a time t can now be illustrated clearly as in Figure 3. The exit time e(t) corresponds to the latest point on the tunnel that can be seen by projecting a straight line from (t, [q * ](t)), where [q * ](t) is the level of stored energy at time t, so that The update time σ̄(t) is the latest time prior to e(t) at which the line coincides tangentially with the boundary of the tunnel which is opposite to the boundary through which the line exits at e(t). If the supply function g is known over the interval [t, e(t)], then this straight line corresponds to x * over [t, σ(t)], and thus uniquely determines the optimal strategy q * over that interval.

Consider now a more general case in which the store has a round-trip efficiency ε ∈ (0, 1], in the sense that for each unit of energy discharged from the store, only ε units are dispatched to the grid, due to operational losses. (For simplicity of exposition, we assume no losses during the charging process, although our arguments can be adjusted for more general cases.) As a direct analogue of the above, an optimal smoothing strategy is now one which minimizes (28) over all functions y : [0, T ] → R of the form y(t) = εq(t) − g(t) if q(t) < 0 and y(t) = q(t) − g(t) if q(t) ≥ 0, for all t ∈ [0, T ] and some admissible strategy q ∈ X. Although it is difficult to extend Figure 3 to this more general case, conditions (i)-(iii) still hold.
We omit a formal proof here for brevity of exposition, although some intuition can be gained from the following discussion: If condition (i) does not hold, so that y * is varying at times when the store is neither full nor empty, then the local lack of capacity constraints means that there is freedom for the storage operator to slightly adjust the store's power output in order to improve smoothing over these times. If condition (ii) does not hold, so that the store is full over some period of time but y * is decreasing over that time (so that q * is constant but g is increasing), then the storage operator can improve smoothing by discharging a small amount of energy at the start of this time interval and then refilling back to full by the end of the time interval. (Similarly for case (iii).)

Figure 4. Top row (right): the half-hourly net wind energy output when using a 580MWh, 0.75 efficient store for smoothing the wind energy. Bottom row: optimal stored energy levels when using this store for smoothing (left) and the length of look-ahead time required to decide on the optimal strategy at each time over the year (right).

To illustrate, consider an adiabatic store with a capacity of 580MWh and round-trip efficiency of ε = 0.75, with no rate constraints. Suppose that the store is designed to smooth the wind output of a wind farm whose capacity is approximately 1/500 of the total installed wind capacity for the UK. This corresponds to an installed capacity of approximately 22MW. Taking 2014 half-hourly wind output data for our supply function g (available from Elexon [3]) and using our algorithm to minimize the total variation (28), we obtain a total variation of 14.0. This compares with a total variation of 1515.9 associated with the wind farm output alone, with no storage attached for smoothing. To achieve this level of smoothing, the store requires a maximum charging rate of 7.6MW and a maximum discharging rate of 6.8MW.
The strategy here is very different from the revenue-maximizing strategy of Section 5.2. Firstly, the store no longer operates only at maximum powers but rather continuously adjusts its power output in line with wind generation. Moreover, whilst the prices in Figure 2 loosely follow a daily cycle, here the wind output displays no such periodicity. As such, the optimal smoothing strategy loses the approximate diurnal cycling that was displayed in our arbitrage model of Section 5.2. The average look-ahead timelength required over the course of the year is 1177 hours, with 10th and 90th percentile look-ahead timelengths of 436 hours and 2084 hours respectively. These durations are much higher than those indicated in Table 1. This can be attributed to the large capacity of the store. As seen in Figure 4, the level of stored energy rarely reaches full capacity, so a similar level of smoothing could most likely be attained with a smaller capacity store. Thus, for large intervals of time, the store maintains the net wind-farm power output at a constant level, and there is no incentive to change this level whilst the store is neither full nor empty. This in turn means that the storage operator needs to look a long way into the future in order to determine the next constant level of net wind output it needs to aim for. Reducing the capacity of the store would in turn reduce the look-ahead timelengths. Taking instead a 60MWh store, for example, we obtain an average look-ahead timelength of 130 hours (approximately 5 and a half days). This order of timelength is much better suited to the times over which we can currently make reasonable wind forecasts and so, for practical purposes of real-time operation, a smaller store could well be better here. The maximum charge and discharge rates required in this case are 6.1MW and 4.7MW respectively. The disadvantage of using a smaller capacity store, however, is a lower level of wind-smoothing, as seen in Figure 5.

6. Conclusions.
We have presented a method to determine how to operate an energy store in order to minimize a given cost functional. This method could apply, for example, to revenue maximization or wind-power smoothing problems. Our setting allows for leakage, inefficiencies, time-varying power constraints and general operating costs which are functions of the power output.
A significant benefit associated with this method is the implicit localization in time of the solution, meaning that the operator of the store typically needs to have only limited foresight of future cost information in order to decide on the optimal real-time operation of the store. We have indicated that this property should render our methods useful for stochastic optimization problems.
Another property of our method comes from the Karush-Kuhn-Tucker multipliers associated with the optimization problem. We have shown that the solution can be characterized entirely by the associated multiplier which, over intervals of time when the store is filling or emptying, is constant. In terms of an arbitrage revenue maximization problem, this multiplier may be interpreted as a reference price which indicates whether the store should be charging or discharging energy at each time.
Software implementing the method will be available via http://estoolbox.org. An intended future publication [11] will report on extensions to cases where the store is large enough to influence prices, or where there are multiple competing stores.