THE INVERSE VOLATILITY PROBLEM FOR AMERICAN OPTIONS

. The problem of determining equity volatility from a knowledge of American option prices for a range of exercise (strike) prices and expirations is solved by minimization of a convex functional.


Introduction.
It is common to model a financial asset, such as a stock or a commodity, via the stochastic differential equation where, for each time t, S t (ω) is a random variable representing the price of the financial asset for the trial ω, m is the drift, which relates to the "trend" of the asset, σ is the volatility (the "wobble" in the asset graph), and B t (ω) is a Brownian motion stochastic process used to model the randomness. Financial derivatives are contracts that derive their value from such an underlying asset. In particular, a European put (call) option on a stock is a contractual right to sell (buy) one share of the stock at a specified price K (the strike, or exercise, price) at a specified future time T (the expiration, or expiry, date), and an American option is defined similarly with the additional provision of allowing early exercise of the contract. It is known that the daily financial transactions associated with these forwardlooking instruments give rise to a huge amount of raw data, buried inside of which is the market's best guess as to what the future holds. We are concerned here with the possibility of extracting future-oriented information from this type of data with the aid of certain computational inverse algorithms.
In their Nobel Prize winning paper [3] the economists Fischer Black and Myron Scholes determined that, under certain rather severe restrictions, the arbitragefree price v = v(S, t; K, T ) of a European put contract satisfies a homogeneous partial differential equation (PDE) of diffusion type in time t and the value S of the underlying asset: with boundary conditions v(0, t) = Ke −r(T −t) , v(S, t) → 0 as S → ∞, and a terminal condition v(S, T ) = (K − S) + = max(K − S, 0), where σ is the (assumed constant) volatility, µ is the risk-neutral drift, and r is the short-term interest rate, which for practical purposes may be taken to be the interest rate on a 13-week US government treasury bill. The work of many authors over the years [9,19,22,23,25,26,27] makes it clear that several of the assumptions laid down by Black and Scholes are basically incompatible with modern market data. Notable among these are objections to the constancy of σ. As recently as the early nineteen eighties, assertions such as "the Black-Scholes volatility is constant" seemed to hold true, at least while the market believed it so; but then, after the crash of 1987, volatility was anything but constant, and in fact it has now even become fashionable to speak of, and invest in, the volatility of the volatility. So, at the very least one may regard the volatility as a function of S and t, σ = σ(S, t), and we assume such a structure in the sequel. Now, financial data specifying the market price v of options is readily available in quantity at various strike values K around the current price of the underlying asset (the "spot" price), and for values of the expiration T up to six months or so into the future. Given that the computer models currently used by stock analysts do not project very far into the future, one is led quite naturally to the so-called inverse volatility problem: determine a market-inspired estimate of the future volatility function σ(S, t) from a knowledge of current market prices v of options with different strikes and future expirations. In financial circles this is usually known as the calibration problem.
As we intend to concentrate here on the calibration problem using dividend-free American option price data as our main example, and it is known that under these conditions early exercise of American call options is not advantageous, in the sequel we exclusively consider American put option data. It is not difficult to extend this material to American call and put options on dividend paying stocks. Before proceeding we note that, as has been elegantly observed by Kholodnyi [12], an American put option satisfies the following semilinear Black-Scholes terminal value problem: where h is the Heaviside function. A number of approaches have been proposed to compute volatility surfaces from American option prices. A Tikhonov regularization technique requiring an initial estimate of volatility has been implemented in [7], and constrained optimization methods based upon solving the complementarity problem have been used in [11,29]; a related least squares minimization approach was used in [1]. In [20] a generalized Newton minimization is applied via a quadratic approximation to obtain the implied volatility. Related work on perpetual American options with time homogeneous volatility appears in [2] and alternatives involving so-called jump-diffusion models of the underlying assets are discussed in [5,6].
We approach this task as follows. The value v of an option contract also depends on the exercise (strike) price, K, and the expiration date, T , of the contract, so v = v(S, t; K, T ). In 1994 Bruno Dupire [8] noticed that for European options the function v(S, t; K, T ) satisfies the "dual" Black-Scholes equation (also known also as the Dupire equation) in the variables K and T : In what follows, it is convenient to denote the right hand side term in equation (5) by the function g * (S, t; K, T ), and we can thus rewrite equation (5) as the equation which we call the inhomogeneous Black-Scholes equation. We now show that for American options the price v(S, t; K, T ) satisfies a dual version of the inhomogeneous Black-Scholes equation (which we call the inhomogeneous Dupire equation) in the variables K and T where the function g(S, t; K, T ) is defined below.
If v(S, t; K, T ) satisfies the inhomogeneous Black-Scholes equation (9) with boundary condition (6) and terminal condition (7) in the variables S and t, then the function v(S, t; K, T ) also satisfies the dual equation (10) in the variables K and T , where we define g(S, t; K, T ) = L * BS v 0 .
Proof. As g * (S, t; K, T ) is bounded, by [21,Theorem 4.1] there exists a solution v 0 for the Cauchy problem (11). The function v − v 0 satisfies in the variables S and t, as well as boundary condition (6) and terminal condition (7). Since v − v 0 satisfies the Black-Scholes equation and associated conditions in S and t, by standard theory [4, §2] it must also satisfy the Dupire equation (8) in Remark 1. It is worth noting here that if one defines w = v − v 0 then we have that the American option price v may be written as Here, as w = w(S, t; K, T ) satisfies the classical Black-Scholes conditions by (13), it may be considered to be the the price of the European-style put option associated with the same underlying asset, were such an entity to exist. The function v 0 , which must be non-negative (by the non-negativity of the boundary data and the maximum principle for these parabolic equations), then takes on the rôle of the premium, the necessarily non-negative price difference between the less flexible European and the more flexible American option.
In this paper we present a new variational algorithm for computing, via our inhomogeneous Dupire equation (10), the volatility σ(K, T ) from a knowledge of American option prices at various strikes and expirations. The method used is an adaption of the variational approach involving the minimization of convex functionals used in [18] for numerical differentiation (formulated as an inverse problem), and in [16,17] for solving the inverse groundwater modeling problem. This approach has distinct advantages over traditional least squares techniques in that, as the associated functionals are convex, one typically has unique global minima and stationary points, so that one is not concerned about the steepest descent minimization processes "getting stuck" in spurious local minima. This is an enormous advantage numerically in difficult situations. In addition and for similar reasons, the method is quite adept at the simultaneous recovery of multiple coefficient functions in the differential equation, as we shall see presently.
2. Reconstruction of volatility. For simplicity of exposition we assume that there is no dividend for the underlying asset. Thus the risk-neutral drift µ in (9) and (10) is equal to the interest rate r, and the inhomogeneous Dupire equation (10) can be written as where t 0 is the fixed time at which the option data is gathered, and S 0 = S(t 0 ) is the associated "spot" price. Let T 0 < T 1 < ... < T n be expiration times, and for each expiration T i , 0 ≤ i ≤ n, let K i1 , ..., K imi be the associated strike prices. We assume that the volatility σ = σ(K, T ) and g = g(S 0 , t 0 , K, T ) are piecewise constant in T , i.e.
where λ > 0 is a parameter. For each fixed i, 1 ≤ i ≤ n, we now Laplace transform the inhomogeneous Dupire equation (14) over where the primes indicate differentiation with respect to K. On integrating the first term by parts we get and rearranging terms gives, Finally, on multiplying by the integrating factor we now have an equation in Sturm-Liouville form: where If we can recover the functions P (K), Q(K) and F (K) for each i, 1 ≤ i ≤ n, we can find the volatility σ i (K), and the g i (K), from the following formulae: We now focus attention on a variational approach to the recovery of one such trio of coefficient functions P, Q, F defined on an interval a ≤ K ≤ b. It is assumed that we are given the functions w λ (K) for K in [a, b] and all λ > 0. For positive functions p and q and a function f all defined on [a, b], let c = (p, q, f ). Define w = w λ,c (K) to be the solution to the boundary value problem Let  (28) 3. Properties of the functional G λ . The main properties of the functionals G λ are summarized in the following (d) The second Gâteaux derivative of G λ is given by and (· , · ) denotes the usual inner product in Consequently, from (32) using (b) As p and q are chosen to be positive and λ > 0, from (a) we get (b).
(c) The first Gâteaux derivative of the functional G λ is given by If we can show that the second limit above is zero then we have proved 3.1(c). Let the integral in the second term be denoted by I and note that The first term in the integral I may now be expanded as Substituting the above for ε −1 b a p(w 2 λ,c − w 2 λ,c+εh ) in I we get which approaches zero as ε → 0.
With some additional work one can show that the first and second Gâteaux derivatives of G λ are also Fréchet derivatives. As L p,λq is a positive operator on W 1,2 0 [a, b], we have from Theorem 3.1(d) that G λ (c) ≥ 0 for all c in the convex set D. By [31,Corollary 42.8] the functional G λ is therefore convex on D. We know from Theorem 3.1(b) that G λ has a global minimum (zero) at c = (p, q, f ) if and Define a convex functional G on the domain D defined by (27) with We have G(c) ≥ 0, and G(c) = 0 when c = (P, Q, F ). Moreover, when G(c) = 0 from Theorem 3.1(b) we have that w λi = w λi,c for 1 ≤ i ≤ N . From the uniqueness theorem [13,Theorem 3.5] we know that, under certain (computer-verifiable) conditions on the nature of the flows of certain associated vector fields (which amount here to an admissibility restriction on the data v(K, T )), the condition w λ = w λ,c for at least three distinct values of λ implies that c = (p, q, f ) = (P, Q, F ), so G has a unique global minimum at c = (P, Q, F ), which is also a stationary point for G. By [31, Proposition 42.6(1)] we know that if the convex functional G has a stationary point at a point (p, q, f ) then it must have a global minimum there, and from the foregoing (assuming admissible data) that stationary point must uniquely occur at (P, Q, F ). So, the desired function trio (P, Q, F ) now appears as the unique global minimum of a convex functional with a unique stationary point. In practical numerics this is an important consideration, as many (if not most) least-square type minimization methods suffer greatly from the minimization process getting stuck in spurious local minima. That this cannot happen here is one of the significant advantages of our approach.
4. The algorithm. G(c) is a nonnegative convex functional as it is the sum of nonnegative convex functionals, and it also has a unique stationary point at c = (P, Q, F ). The idea here is that by using G rather than just one of the G λ , in addition to gaining favorable uniqueness properties, we are blending additional timebased data into the inverse problem, and this is intended to improve the wellposedness of the problem. We note in passing from [14] that this inverse recovery is conditionally well-posed in the weak-L 2 sense, so from a theoretical standpoint, the recoveries are expected to be quite stable, which indeed is the case. We minimize this functional for N = 20 using the steepest descent method to recover the coefficients P (K), Q(K) and F (K). The gradients for G at the point c = (p 0 , q 0 , f 0 ) with respect to p, q and f are given by respectively. Instead of using these gradients directly, we use the corresponding Neubergergradients (see [24]) as the regular gradient has numerical problems that are extensively discussed in [18]. In particular, the L 1 -gradient with respect to q is zero on the boundary of [a,b], given that w λ and w λ,c are equal there, and consequently, under this gradient the algorithm is unable to properly recover Q. The Neuberger-gradient process smooths an L r -gradient and preserves boundary data during the descent, an important property not shared by other descent techniques. Our Neuberger-gradient g = ∇ H 1 G can be found from an L r -gradient ∇ L r G by solving the boundary value problem Below is the steepest descent algorithm used to get one descent step in p:  Find w λ (K) , w λ (K) ,α(K), and β λ (K).
Find w λ,c and w λ,c .
Find the L 1 gradient in p, ∇ L 1 ,p G(c).
Find the Neuberger gradient in p, ∇ H 1 ,p G(c).
Find α that gives lowest value of G by changing p(K) to p(K) − α∇ H 1 ,p G(c).
Find w λ,c and w λ,c .
Find the L 1 gradient in q, ∇ L 1 ,q G(c).
Find the Neuberger gradient in q, ∇ H 1 ,q G(c).
Find α that gives lowest value of G by changing q(K) to q(K) − α∇ H 1 ,q G(c).
Find w λ,c and w λ,c . The descent in q and f is similar to that for descent in p.  After computing the corresponding gradients in q and f , the q new (K) and f new (K) are given by

Spot
The specific order of descent is somewhat problem dependent, and different combinations of descents in p, q and f were tried to get the best minimization. Typically one needs more p-descent steps relative to q and f steps as the descent progresses.

5.
Results. One of the most popular American options on US exchanges is the option on Google Inc (GOOG). Put option prices on GOOG were taken from the official website of the Chicago Board Options Exchange (CBOE), for several expirations on April 19, 2013; see Figure 2. The data includes only the near-the-money options as they are the most heavily traded. Furthermore, such data is invariably in the "do not exercise" region, likely due to the following in-built buyer/seller asymmetry: for S 0 , t 0 in the "exercisable" region, if the stock goes down the put buyer can exercise immediately and make a profit, and if it goes up or sideways the buyer can wait, or even exercise, taking advantage of the trader's "time value of money" (for a particular strike K) argument that becomes a consideration only when one is the put owner. As there appear to be no compensating advantages for the put seller, such sellers are likely to be in short supply. So for any given S 0 , t 0 , when K, T is near the boundary of the do-not-exercise region the volume drops to zero, and effectively there is no data available for these values of K, T.
One can observe from the data in Figure 2 that, for fixed T , at the bottom end of the strike range the prices tend to zero, and at the top end they tend to the linear function K − S 0 . So, in our data sets we always, for fixed T , linearly extrapolate to zero on the left end of the K interval, and linearly extrapolate to K − S 0 on the right end. This modification is important as it brings the "American" nature of the data into full focus. To recover the coefficient functions P (K), Q(K) and F (K) in (18) a computer code was written in the programming language C.
We have option prices for discrete strikes and discrete expirations T i , 0 ≤ i ≤ n. We generated the function v(K, T ) by using cubic spline interpolation to form the functions v(K, T i ) for each i, and then for each K, and T in a fixed expiration The work in [15] indicates that linear interpolation is quite acceptable here, as solution surfaces generated by such parabolic equations are similar to those generated by analytic functions in that the general shape is very much determined from a knowledge of relatively few points on the surface. Due to this "rigidity" other methods such as bicubic splines also work well. The derivative v K (K, T ) was found using central differences. For 20 fixed values of λ the functions v(K, T ) and v K (K, T ) were Laplace transformed using (16) to w λ (K) and w λ (K) respectively. The functions p(K) and q(K) were initialized using (17) and (19), and the initial σ i was chosen to be the implied volatility. The function f (K) was initialized to be the zero function. Observe that g(S 0 , t 0 ; K, T ) = 0 for all K, T in the original data range (760 ≤ K ≤ 850 and 7 ≤ T ≤ 153) in (10), which is consistent with an American option and a spot price S 0 in the "do not exercise" region above the free boundary [30, p. 108]. In later  iterations we enforced, after each iteration step, the condition f (S 0 , t 0 ; K, T ) = 0 for all K, T in the data range 760 ≤ K ≤ 850 and 7 ≤ T ≤ 153, as we know beforehand that the option should not be exercised with some time value left in the option. We performed a series of descents in p, q, f , using the aforementioned steepest descent algorithm with Neuberger gradients, until the functional could not be minimized any further. Specifically, descents were performed in the order qf f qqf ppppppf at which point the functional could not be lowered any further by descent in any of p, q and f ; the descent is illustrated graphically in Figure 4. The line minimization of G(c) in α was done using the one-variable Brent minimization technique, by adapting the brent() function code in Numerical Recipes in C [28]. To avoid possible catastrophic cancellation in the Simpson rule formula used in the calculation of the integrals in the formula (28) for the functional G λ , we used the formula (29) instead.
As can be seen in the recovery of the function g(S 0 , t 0 , K, T ) for the maturity interval 14 ≤ T ≤ 21 illustrated in Figures 4 and 5, the recovery settled down after about the fourth iteration.
After running the code we recovered the functions P (K), Q(K) and F (K). From (23) we calculated the volatility σ i for each value of i, and created volatility surface σ(K, T ) from (15) using the curve fitting toolbox in MATLAB. From the volatility surface in Figure 3 one can observe the volatility smile and term structure. To check the accuracy of the method, we priced this option by substituting our recovered volatility σ(K, T ) into the homogeneous (so as to be consistent with pricing data from an American option in the "do not exercise" region as mentioned above) Dupire equation (8) and solving with MATLAB's PDE Toolbox, using the known prices for boundary data. We compared these prices with the actual option prices at six different expirations as shown in Figure 6. As can be seen, the fit is excellent. Similar results were also observed using American put option price data from Facebook, Yahoo, and Baidu at various times.
6. Conclusion. We have shown that volatility can be recovered from published American option prices using steepest descent minimization of a convex functional. This provides a "market view" of future volatility which in principle can be used to price and trade options more effectively.