Data driven recovery of local volatility surfaces

This paper examines issues of data completion and location uncertainty, prevalent in many practical PDE-based inverse problems, in the context of option calibration via recovery of local volatility surfaces. While real data is usually more accessible for this application than for many others, the data is often given only at a restricted set of locations. We show that attempts to "complete missing data" by approximation or interpolation, proposed in the literature, may produce results that are inferior to treating the data as scarce. Furthermore, model uncertainties may arise which translate to uncertainty in data locations, and we show how a model-based adjustment of the asset price may prove advantageous in such situations. We further compare a carefully calibrated Tikhonov-type regularization approach against a similarly adapted EnKF method, in an attempt to fine-tune the data assimilation process. The EnKF method offers reassurance as a different method for assessing the solution in a problem where information about the true solution is difficult to come by. However, the additional advantage of the latter approach turns out to be limited in our context.

1. In cases where the data is scarce, in the sense that it is observed only at a small set of locations compared to the size of a reasonable discretization mesh of a physical domain, there would be many models (solutions to the inverse problem) that explain the data (e.g., [16]). It is then tempting to "complete" the data by some interpolation or other approximation, whereupon the role of an ensuing regularization as a prior is less crucial.
2. There may be a hidden uncertainty in the locations of data, not only in data values (e.g., [19,15]). For instance, engineers often prefer to see data given at regular mesh nodes, so a quiet constant interpolation, moving data items to the nearest cell vertex, is common practice.
3. Data completion may be necessary to obtain a more efficient algorithm [28,25].
4. A quiet data completion is often assumed by mathematicians in order to enable building theory for inverse problems. This includes assumptions of available data on continuous boundary segments [13], or of observed (measured) relationships between unknown functions that are presumed to hold everywhere in a physical domain.
5. There are situations where some form of data completion and other manipulation is necessary because no one knows how to solve the problem otherwise [25].
These observations raise the following questions: (i) when (and in what sense) is it practically acceptable to perform such data manipulations? (ii) in which circumstances can one gain advantage by treating the observed data more carefully? and (iii) how should one assess correctness of a solution that has been obtained with such manipulated data? Our general observation is that researchers occasionally, but not always, seem to get away with such "crimes", in the sense of producing agreeable results. For instance, in [28] the authors obtained agreeable reconstructions so long as the percentage of completed data did not exceed about 50%. Such empirical evidence is relatively rare in the literature, however, and it depends on the problem at hand. More insight is therefore required, and such may be gained by considering applied case studies.
In this article we focus on a model calibration problem in a setting that features both scarce data and uncertainty regarding data location. Further, it allows us to work with real rather than synthetic data, often available through the internet. This problem, which has had tremendous impact in mathematical finance, concerns the determination of the so-called local volatility surface, making use of derivative prices. A good model for the volatility is crucial for many applications ranging from risk management to hedging and pricing of financial instruments.
The classical Black-Scholes-Merton model had subsumed a constant volatility σ [6] in a simplifying stochastic dynamics for the underlying process [24]. However, the constant volatility assumption was quickly contradicted by the actual derivative prices observed in the market. The disagreement between the Black-Scholes model-implied prices for different expiration dates and negotiated strike prices became known as the smile effect. A number of practical parametric as well as nonparametric models have been proposed in this context; see [14] and references therein. The parametric ones try to fulfill different phenomenological features of the observed prices. Yet, in a ground-breaking paper Dupire [11] proposed the use of a function σ that depends on time and the price at that time. For the case of the European call contracts, he replaced the Black-Scholes equation by a PDE of the form

  ∂C/∂τ = (1/2) σ²(τ, K) K² ∂²C/∂K² − r K ∂C/∂K,  (1)

with initial and boundary conditions (for calls) given by

  C(0, K) = (S0 − K)⁺,  C(τ, K) → S0 as K → 0,  C(τ, K) → 0 as K → ∞,  (2)

where τ is time to maturity, K is the strike price, r is the risk-free interest rate, and C = C(τ, K) is the value of the European call option with expiration date T = τ. The parameter S0 is the asset's price at a given date. One ensuing complication in using (1), however, is in the calibration of this model, by which we mean finding a plausible volatility surface σ(τ, K) that matches, or explains, given market data on call option values. The task here is significantly more challenging than in the case where σ is a constant. This paper deals with the computational challenges that this inverse, or inference, problem gives rise to, focusing in particular on the treatment of scarce data and uncertainty in data locations. We show that, contrary to a popular approach in the literature, avoiding data completion is the way to go here. Further, taking data location uncertainty into account, rather than simply ignoring it, improves reconstruction quality.
The forward problem involves finding the values of C satisfying the differential problem (1)-(2) for given σ(τ, K) and S0, evaluated at the points (τ, K) where data values are available. A major difficulty here is that the data are scarce. To explain what we mean by this, suppose we have discretized the PDE using, say, the Crank-Nicolson method on a rectangular mesh that is reasonable in the sense that the essence of the differential solution is retained in the discrete solution. Then the data is scarce in that the number M of degrees of freedom in the discrete C typically far exceeds the number of given data values l: l ≪ M. Moreover, the available data in some typical situations are given at locations that are far from the boundaries of the (truncated) domain on which the approximated PDE problem is solved. See Figures 1 and 5 below for examples of such (real) data sets. Now, if the local volatility surface is discretized, or injected, on the same mesh as that of the forward problem, then there are roughly M degrees of freedom in σ, which is again potentially far larger than the number of data constraints. We can of course discretize σ on a coarser sub-mesh (which in the extreme case would have only one point, thus leading back to a constant volatility), or parameterize the surface in a more involved manner; see [18,17,1,12,7,2,10,4] and references therein for further detail. Here, however, we stick to a straightforward nodal representation of this surface on the full C-mesh in the hope of retaining flexibility and detail, while avoiding artifacts that may arise from restrictive simplifying assumptions. This approach has worked well in geophysical exploration problems [16], among others.
Thus, the problem of finding a volatility surface that explains the data is often significantly under-constrained in practice. This does not make it easy to solve, however, as the ultimate goal is to obtain plausible volatility surfaces that can be worked with, and not just to match data. Our task is therefore to assimilate the data information with the information contained in the PDE model (1), using any plausible a priori information as a prior in the assimilation process. Such a priori information can vary significantly, addressing concerns of adherence to the financial model, relative smoothness of the volatility surface, and numerical stability issues, among others.
One approach that has been relatively popular in financial circles is to apply to the data special interpolation/extrapolation methods that take into account the a priori information of the financial model (e.g., [23]). This is used to obtain data values at all points of the rectangular mesh on which C is defined, and subsequently the "new data" is assimilated with the information yielded by solutions of the Dupire equation for different σ's in order to calibrate (1). An advantage of this data completion approach is that the data is no longer scarce when we get to the redefined inverse problem. This allows for developing some existence and uniqueness theory as well; see [9,1] and references therein. However, a disadvantage is that such data completion constitutes a "statistical crime", as the errors in the new data may no longer be considered as independent random variables; see [28]. In fact, we get two solutions C that are in a sense competing rather than completing one another, since the one satisfying (1), even for the "best" σ, does not necessarily satisfy the data interpolation conditions and vice versa.
In Sections 2 and 5 we therefore examine the performance of this data completion approach against that of a scarce data approach that is based on a carefully tuned Tikhonov-type regularization. We have verified the robustness of our regularization operator by applying also variants of EnKF-like algorithms [22,20,8], further described below, obtaining similar results. Using both synthetic and real data sets, we show that the scarce data approach can give better and more reliable results; in our reported experiments this has happened especially for the real data applications.
We then continue with the scarce data approach. The maximum a posteriori (MAP) functional considered in Section 2 is based on the statistical assumption that the data error covariance matrix is a scalar multiple of the identity. In Section 3 we subsequently consider an algorithm, based on an approach recently proposed in [20] and [22], where we attempt to learn more about the error covariance matrix as the iterative process progresses, using ensemble Kalman filter (EnKF) techniques. Although our problem is time-dependent, the time variable here does not really differ from the other independent variable in the usual sense. In particular, the unknown surface σ depends on both K and τ , unlike for instance the material functions in [20,8,16], which are independent of time. Thus, the EnKF-like methods considered use an artificial time [5]. In Section 3 we find that the EnKF algorithm can be improved in our context by adding smoothing prior penalties, just like in Section 2. The probabilistic setup, although general, is not fully effective as a substitute for prior knowledge that is available in no uncertain terms.
The problem setting used in Sections 2 and 3 regards the asset price S 0 as a known parameter. However, in practice there is uncertainty in this parameter. In fact, we have an observed value which is in the best case an average over a day of trading, so S 0 should be treated as an unknown with an observed mean value and a variance that is relatively easy to estimate. This in turn affects the calibration problem and its solution process. Section 4 deals with this additional complication, which translates into uncertainty in the data locations of a transformed formulation for (1).
In Section 5 we collect our numerical tests, addressing and assessing the various aspects of the methods described earlier. We use synthetic data in Section 5.1 to show the advantage in applying the method of Section 4 for problems with uncertainty in the price S0. In Section 5.2 we use market equity data to fine-tune our regularization functional, as well as to compare Tikhonov-type regularization vs the modified artificial time EnKF. In Section 5.3 we use oil and gas commodity market data to further investigate data completion approaches, showing that the scarce data approach is superior. Conclusions are offered in Section 6.
2. Two approaches for handling scarce data. Below we assume that the parameter S0 is given. This assumption will be modified in Section 4. We then apply a standard transformation changing the independent variable K to the so-called log moneyness variable y = log(K/S0). This is followed by changing the dependent variables of the forward and inverse problems to u(τ, y) = C(τ, S0 exp(y)) and a(τ, y) = σ²(τ, K(y))/2, respectively. We obtain the dimension-less parabolic PDE with no unbounded coefficients

  ∂u/∂τ = a(τ, y) (∂²u/∂y² − ∂u/∂y) − r ∂u/∂y,  (3)

subject to the side conditions

  u(0, y) = S0 (1 − exp(y))⁺,  (4a)
  u(τ, y) → 0 as y → ∞,  (4b)
  u(τ, y) → S0 as y → −∞.  (4c)

We can write (3)-(4) as L(a)u = q, with the linear differential operator L depending on a and operating on u, and with the right hand side q = q(S0) given. Thus, the forward problem involves finding u satisfying this parabolic linear differential problem for a given local variance surface a and price S0.
To find a numerical solution for L(a)u = q we first approximate the domain in y by a finite interval, restricting l_y ≤ y ≤ r_y for two real values satisfying l_y < 0 < r_y. The boundary conditions (4b) and (4c) are then required to hold at r_y and l_y, respectively. Next, we discretize the PDE problem on a mesh with a fixed step Δτ in τ and a fixed step Δy in y. Denote by u_{i,j} the approximation of u(iΔτ, l_y + jΔy) and by a_{i,j} the injection of the surface a at (iΔτ, l_y + jΔy), i = 1, . . . , M_τ, j = 0, 1, . . . , M_y + 1, where (M_y + 1)Δy = r_y − l_y, M_τ Δτ = T. Then, using the Crank-Nicolson method [1], we have the difference relations

  (u_{i,j} − u_{i−1,j}) / Δτ = (1/2) [(Au)_{i,j} + (Au)_{i−1,j}],  i = 1, . . . , M_τ,  j = 1, . . . , M_y,  (5)

where (Au)_{i,j} = a_{i,j} [(u_{i,j+1} − 2u_{i,j} + u_{i,j−1}) / Δy² − (u_{i,j+1} − u_{i,j−1}) / (2Δy)] − r (u_{i,j+1} − u_{i,j−1}) / (2Δy) denotes the discretized spatial operator. An obvious treatment of the initial and (Dirichlet) boundary conditions closes this system of M = M_τ M_y equations that are linear in the variables u_{i,j}. The mesh function u_h for the approximation u can be conveniently reshaped (say, ordered by column) into a vector u_h ∈ ℝ^M, retaining the same notation without confusion.
Similarly, we obtain the mesh function a_h as an injection of a(τ, y), reshaped into a vector if need be. Then we can write (5) as

  L_h(a_h) u_h = q_h,  (6)

where L_h is a sparse, nonsingular M × M matrix and q_h is the mesh injection of q.
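For concreteness, the discretized forward problem can be sketched as follows. This is a minimal Python illustration, assuming the transformed PDE ∂u/∂τ = a(∂²u/∂y² − ∂u/∂y) − r ∂u/∂y with S0-normalized payoff and Dirichlet conditions at the truncated boundaries; function names and parameter values are illustrative, not the code behind the reported experiments.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

def cn_solve(a, r, ly, ry, T, M_tau, M_y):
    """Crank-Nicolson solve of u_tau = a(tau,y)(u_yy - u_y) - r u_y on
    [ly, ry] x [0, T], with payoff initial data and Dirichlet conditions
    u = 1 at y = ly (deep in the money, S0-normalized) and u = 0 at y = ry.

    a : callable a(tau, y) returning the local variance sigma^2/2.
    Returns u of shape (M_tau+1, M_y+2), boundary nodes included."""
    dy = (ry - ly) / (M_y + 1)
    dt = T / M_tau
    y = ly + dy * np.arange(M_y + 2)
    u = np.zeros((M_tau + 1, M_y + 2))
    u[0] = np.maximum(1.0 - np.exp(y), 0.0)   # payoff in log-moneyness

    def A(tau):
        # tridiagonal spatial operator; boundary rows left as zero
        n = len(y)
        lo, mid, up = np.zeros(n - 1), np.zeros(n), np.zeros(n - 1)
        for j in range(1, n - 1):
            aj = a(tau, y[j])
            lo[j - 1] = aj / dy**2 + (aj + r) / (2 * dy)
            mid[j] = -2 * aj / dy**2
            up[j] = aj / dy**2 - (aj + r) / (2 * dy)
        return diags([lo, mid, up], [-1, 0, 1], format="csc")

    I = identity(M_y + 2, format="csc")
    for i in range(M_tau):
        rhs = (I + 0.5 * dt * A(i * dt)) @ u[i]
        u[i + 1] = spsolve(I - 0.5 * dt * A((i + 1) * dt), rhs)
        u[i + 1, 0], u[i + 1, -1] = 1.0, 0.0   # enforce Dirichlet data
    return u
```

Each time step is a sparse tridiagonal solve, so the total cost is proportional to the mesh size M_τ M_y.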
The inverse problem is to find a volatility surface σ, approximated through a_h, that explains given observed data d ∈ ℝ^l. These data values approximate u at l locations in the rectangular domain on which the problem (3)-(4) is defined. Typically, these locations are far from the boundaries and l ≪ M; see Figures 1 and 5. Thus, the data set is sparse, or scarce, and the l × M matrix P which maps grid locations for u_h to those of d, using bilinear interpolation as necessary, has many more columns than rows. The forward operator, which predicts the data for a given a_h, is the matrix-vector product (or projection) P u_h. The inverse problem is to find a plausible a_h for which the predicted and observed data are sufficiently close, as described below. Note that the approximate solution of the inverse problem must be positive at all mesh points. This positivity constraint turns out to hold automatically in all our reported calculations (i.e., it is not an active constraint in the encountered optimization problems).
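The scarce-data projection P can likewise be sketched. Here the mesh function is assumed flattened row-major; the paper reshapes by column, so the indexing convention below is an illustrative assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix

def projection_matrix(tau_d, y_d, taus, ys):
    """Sparse l x M projection P: bilinear interpolation from a rectangular
    (tau, y) mesh to l scattered data locations. The mesh function is
    assumed flattened row-major, index = i * len(ys) + j for node (i, j)."""
    ny = len(ys)
    rows, cols, vals = [], [], []
    for k, (t, w) in enumerate(zip(tau_d, y_d)):
        # enclosing cell indices, clamped to the mesh interior
        i = min(max(np.searchsorted(taus, t) - 1, 0), len(taus) - 2)
        j = min(max(np.searchsorted(ys, w) - 1, 0), ny - 2)
        st = (t - taus[i]) / (taus[i + 1] - taus[i])   # local coordinates
        sy = (w - ys[j]) / (ys[j + 1] - ys[j])
        for di, dj, wt in [(0, 0, (1 - st) * (1 - sy)), (1, 0, st * (1 - sy)),
                           (0, 1, (1 - st) * sy), (1, 1, st * sy)]:
            rows.append(k); cols.append((i + di) * ny + (j + dj)); vals.append(wt)
    return csr_matrix((vals, (rows, cols)), shape=(len(tau_d), len(taus) * ny))
```

Because each data point touches only the four vertices of its enclosing cell, P has exactly 4l nonzeros.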

2.1.
Regularizing the inverse problem. If the data d has Gaussian noise ε ∼ N(0, Γ), where Γ is a symmetric positive definite (SPD) error covariance matrix, then the maximum likelihood (ML) data misfit function is

  φ(a_h) = ‖P u_h(a_h) − d‖²_{Γ⁻¹},  (7)

where for an SPD matrix C we define the vector energy norm ‖x‖_C = √(xᵀ C x). For instance, if Γ = α0⁻¹ I, α0 > 0, then the discrepancy principle (see, e.g., [29,13]) yields the stopping criterion (i.e., we should find a_h to reduce φ until)

  φ(a_h) ≤ l,  (8)

where l is the number of data points.
However, there are in general many surface meshes a_h that would satisfy the conditions (8) (i.e., explain the data). We therefore introduce a regularization operator R(a_h) which is a prior, in the sense that it represents prior knowledge or belief about our sought surface. We then take steps to minimize the MAP merit function

  min_{a_h} φ(a_h) + R(a_h).  (9)

Here we introduce the Tikhonov-type penalty function

  R(a_h) = ΔτΔy [ α1 Σ_{i,j} (a_{i,j} − a0)² + α2 Σ_{i,j} ((a_{i,j} − a_{i−1,j}) / Δτ)² + α3 Σ_{i,j} ((a_{i,j} − a_{i,j−1}) / Δy)² ],  (10)

where α1, α2, α3 ≥ 0 are parameters to be determined.
The terms involving α2 and α3 correspond to smoothness of a(τ, y) in τ and y, respectively. Thus, the values a_{i,j} that are elements of the 2D mesh function a_h should not fluctuate randomly, as they ought to form a reasonably smooth surface. So, we insist on selecting α2, α3 > 0, and will modify a given EnKF algorithm accordingly as well. This penalizes lack of smoothness (or roughness) in the reconstructed local variance surface. Actual values for these parameters, determined by experimentation, are reported in Section 5. Next, the term governed by the weight α1 penalizes the distance from our sought surface to an a priori function a0 that we take to be a constant (not knowing better). A reasonable value for this constant can often be estimated based on the risk category of the asset under consideration. The importance of having this term is that without it the prior may tend to favour surface flatness, whether or not this is realistic. But for the case of scarce data there could be too much freedom in seeking merely a relatively flat surface which still explains the data. Setting α1 = 0 might therefore cause the solution process to behave unstably, as demonstrated further in Section 5.
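As a sketch, the penalty (the α1 closeness term plus the α2, α3 smoothness terms described above) might be evaluated as follows. The mesh-width scaling used here is one reasonable convention and is our assumption, not necessarily the paper's exact normalization.

```python
import numpy as np

def penalty(a, a0, alpha1, alpha2, alpha3, dtau, dy):
    """Tikhonov-type penalty on a 2D mesh function a[i, j] ~ a(tau_i, y_j):
    closeness to a constant prior a0 plus smoothness in tau and in y."""
    d_tau = np.diff(a, axis=0) / dtau       # discrete d/dtau
    d_y = np.diff(a, axis=1) / dy           # discrete d/dy
    return (alpha1 * np.sum((a - a0) ** 2)
            + alpha2 * np.sum(d_tau ** 2)
            + alpha3 * np.sum(d_y ** 2)) * dtau * dy
```

A perfectly flat surface at the prior level incurs zero penalty; any level offset is charged through α1 and any roughness through α2, α3.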
Having determined values α_i, i = 1, 2, 3, for the ensuing experiments, the next question is choosing the relative weight of the data misfit function and the prior, which amounts to determining a value for α0 in the expression Γ = α0⁻¹ I. For completeness we mention that for this purpose we have used the well-known L-curve method [29,13]. Specifically, we applied this method in order to find the optimal α0 with the values of α1, α2 and α3 fixed.
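The L-curve corner can be located by various heuristics. The sketch below picks the point closest to the origin of the normalized log-log plot; this particular corner-detection variant is our assumption, since the text does not specify which one was used.

```python
import numpy as np

def l_curve_corner(alphas, misfits, penalties):
    """Pick the regularization weight at the 'corner' of the L-curve,
    taken here as the point closest to the origin of the normalized
    (log misfit, log penalty) plot."""
    x = np.log(np.asarray(misfits, dtype=float))
    y = np.log(np.asarray(penalties, dtype=float))
    xn = (x - x.min()) / (x.max() - x.min())   # normalize both axes to [0, 1]
    yn = (y - y.min()) / (y.max() - y.min())
    return np.asarray(alphas)[np.argmin(xn**2 + yn**2)]
```

The corner balances the two competing terms: to its left the penalty explodes, to its right the misfit does.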

2.2.
The data completion approach. Another approach to deal with the scarce data is to first interpolate/extrapolate the observed data to all M locations of the u-mesh. This can be done using the Kahale algorithm [23], which represents a prior that reflects financial considerations (namely, maintaining the smile, and more). The inverse problem is subsequently defined on the thus-enriched data set, with the matrix P in (7) becoming the identity, and the Tikhonov-type regularization previously described is applied for its solution.
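For contrast, a naive completion step can be sketched with generic scattered-data interpolation. Note that this stand-in ignores the financial no-arbitrage constraints that the Kahale algorithm [23] is designed to respect; it only illustrates the mechanics of filling the mesh.

```python
import numpy as np
from scipy.interpolate import griddata

def complete_data(tau_d, y_d, prices, taus, ys):
    """Naive data completion: interpolate scattered quotes onto the full
    (tau, y) mesh, linearly inside the convex hull of the data and by
    nearest neighbour outside of it."""
    T, Y = np.meshgrid(taus, ys, indexing="ij")
    pts = np.column_stack([tau_d, y_d])
    filled = griddata(pts, prices, (T, Y), method="linear")
    holes = np.isnan(filled)                 # mesh nodes outside the hull
    filled[holes] = griddata(pts, prices, (T, Y), method="nearest")[holes]
    return filled
```

After this step the "data" covers all M mesh nodes, but its errors are strongly correlated, which is precisely the statistical objection raised below.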
This data completion subsequently allows for the development of a more solid theory for the solution of the corresponding regularized inverse problem.
A potential difficulty with this approach, however, is that the ensuing matching of the interpolated "data values" to a field u_h that approximately satisfies the PDE problem (3)-(4) is applied to data that can no longer be considered to have independent, random noise. The interpolation/extrapolation operator and the PDE discrete Green's function operator could be in conflict. In Sections 5.2 and 5.3 we describe examples using real data sets from different markets which compare data completion using two different techniques to the scarce data approach. The obtained plots in Figures 8, 9, 11 and 12 clearly demonstrate that the data completion approach can give inferior results. In particular, the reconstructions using data completion often agree neither with other curves nor with intuition.
3. Artificial time EnKF-type methods. The homotopy, or continuation, approach of embedding a given problem in a larger one with artificial time, subsequently defining an iterative method by "advancing" in the artificial time, is old; see references in [5]. In our context there have been recent efforts that use such an iterative process to also adaptively learn and improve knowledge of the error covariance [20,8]. A Kalman filter setting is obtained as in [20] by defining, for the ML data misfit function φ(a_h) of (7), the augmented state vector

  â_h = (a_h, P u_h(a_h)),  (11)

together with the artificial dynamics (or prediction) â_h ↦ â_h and the observation operator H = [0  I], so that H â_h returns the predicted data. Further, the ensemble Kalman filter (EnKF) approach applies Monte Carlo approximations in order to obtain cheap estimates for the error covariance matrices that appear in the Kalman filter method.
It is well-known that Kalman smoothing is equivalent to the solution of the corresponding weighted least squares problem, while the Kalman filter end result agrees with that of the Kalman smoother. However, what is not taken directly into account in [20,8] are the additional regularization terms in the MAP merit function (9). In fact, we generalize (10) by considering

  ‖P u_h − d‖²_{D⁻¹} + ‖L_τ a_h‖²_{D_τ⁻¹} + ‖L_y a_h‖²_{D_y⁻¹},  (12)

where L_τ and L_y are the scaled discrete difference operators corresponding to the sums that multiply α2 and α3 there, respectively. The matrices D, D_τ and D_y are corresponding error covariance matrices, with D as yet unknown and to be determined in the EnKF process.
Next, we modify the data misfit part in (12) using the state augmentation approach (11). For this we redefine the matrices L_τ and L_y by padding each with a zero block, L_τ ← [L_τ  0] and L_y ← [L_y  0], so that they act on the augmented variable through its a_h part only. The matrix D is also modified accordingly, and we set the combined covariance to the block diagonal matrix diag(D, D_τ, D_y).
We can then rewrite (12) for the augmented variable â_h of (11) as

  min_{â_h} ‖H â_h − d‖²_{D⁻¹} + ‖L_τ â_h‖²_{D_τ⁻¹} + ‖L_y â_h‖²_{D_y⁻¹}.  (13)

Since (13) is just an enlarged weighted least squares problem, it also corresponds to a Kalman smoother/filter process, and we subsequently build an approximation for it as in [22] by defining and solving a "three-stage" EnKF.
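For orientation, a single generic EnKF analysis step can be sketched as follows. This is the plain perturbed-observation form, not the three-stage variant defined here, and all names are illustrative.

```python
import numpy as np

def enkf_analysis(ensemble, data, G, gamma, seed=1):
    """One perturbed-observation EnKF analysis step: update each ensemble
    member using sample cross-covariances between members and their
    predicted data. G maps a member to predicted data; gamma is the
    observation noise standard deviation."""
    rng = np.random.default_rng(seed)
    A = np.asarray(ensemble, dtype=float)            # J x M members
    D = np.array([G(m) for m in A])                  # J x l predicted data
    Am, Dm = A - A.mean(axis=0), D - D.mean(axis=0)
    J = A.shape[0]
    C_ad = Am.T @ Dm / (J - 1)                       # cross-covariance, M x l
    C_dd = Dm.T @ Dm / (J - 1) + gamma**2 * np.eye(D.shape[1])
    K = C_ad @ np.linalg.inv(C_dd)                   # Kalman gain, M x l
    perturbed = data + gamma * rng.standard_normal(D.shape)
    return A + (perturbed - D) @ K.T                 # analysis ensemble
```

Each analysis step costs J forward evaluations, which is where the (N + 1)J forward-solve count of the algorithm below comes from.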
The details are somewhat tedious but straightforward, so we have gathered them in Appendix A. We are led to the following algorithm, where we let â_h^{(n,j)} denote the augmented state vector of the j-th sample in the n-th iteration.
(ii) Define the sample mean and covariance matrix from the ensemble {â_h^{(n,j)}}, perform the Kalman update using the data d, and check for convergence. Assuming that this algorithm stops after N iterations, it requires (N + 1)J solutions of the forward problem.

4. Uncertainty in the asset price. In a typical image processing application of deblurring, denoising or inpainting, data values are prescribed at pixels, so the measurement locations are known. However, when denoising a point cloud or a surface mesh, for instance, there is no such distinction between a datum value and location: the unknowns are nodal mesh points in 3D, and as such they live in a higher dimensional space (namely, ℝ³ rather than ℝ). This difference affects portability of image processing algorithms to similar problems on surfaces [19].
A more subtle instance of location uncertainty arises in our volatility surface context. The observed value for S0 is actually an average over (say) a day's trading, and as such contains uncertainty whose variance can fortunately be directly estimated. In this section we thus make S0 an unknown that can be adjusted but should not stray too much from a measured (and thus observed) average value Ŝ0.
This affects the regularization prior, which now depends also on S0, so R = R(a_h, S0), and has a term added to (10):

  α5 (S0 − Ŝ0)².  (14)

The additional parameter α5 is determined by the daily price variance. Furthermore, in the dimension-less form (3) of the Dupire PDE problem, which, to recall, allows bounded PDE coefficients and uniform discretization step sizes in (τ, y), the independent log moneyness variable y = log(K/S0) now contains uncertainty as well! We therefore update also the misfit function (7) to read

  φ(a_h, S0) = ‖P(S0) u_h − d‖²_{Γ⁻¹} + α4 Σ_j [(1 − exp(y_j(S0)))⁺ − (1 − exp(y_j(Ŝ0)))⁺]².  (15)
The parameter α4, like α5, is determined from the variance in S0. In fact, in our experiments we have found that it is safe to set α5 = 0 and control the uncertainty penalty through α4 alone. Note that the data projection matrix now also depends on the price, P = P(S0). Our optimization problem replacing (9) is now

  min_{a_h, S0} φ(a_h, S0) + R(a_h, S0).  (16)

To solve the extended optimization problem (16) we apply a splitting method, alternately minimizing (16) for a_h and for S0. When S0 is held fixed, the problem returns to that considered in Sections 2 and 3. When a_h is frozen in turn, the remaining minimization problem for S0 is scalar and causes no difficulty. Furthermore, fortunately the coupling between these variables is weak, so fast convergence of this splitting method is observed in all our experiments.
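The splitting scheme can be sketched in toy form as follows, with BFGS and a scalar minimizer standing in for the gradient descent and lsqnonlin routines used in the experiments; both substitutions are ours.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def alternating_minimize(F, a_init, s_init, n_outer=30):
    """Splitting scheme: alternately minimize a joint objective F(a, s)
    over the surface variable a and over the scalar price variable s."""
    a, s = np.asarray(a_init, dtype=float), float(s_init)
    for _ in range(n_outer):
        a = minimize(lambda v: F(v, s), a, method="BFGS").x   # a-step
        s = minimize_scalar(lambda t: F(a, t)).x              # S0-step
    return a, s
```

When the coupling between a and s is weak, each outer sweep contracts the error by a fixed factor, which matches the fast convergence reported in the experiments.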

5.
Testing our methods on real and synthetic data. In this section we present a number of tests that were performed in order to illustrate the points made in the previous sections, as well as to compare the different methodologies. Specifically, we compare results related to the introduction of uncertainty in the value of the reference price S0 (see Section 4), the effect of data completion as summarized in Section 2.2, the effect of choosing between the EnKF and the Tikhonov-type regularization approach, and the introduction of a penalty associated with the mean value of the local variance surface a_h.
Financial markets provide a large quantity of historical data from equity and commodity prices together with their corresponding derivative prices. This yields a tremendous laboratory for investigating the local volatility surface reconstruction assuming the model of (1)-(2). In everyday practice one is given, for a certain date and current underlying value S0, various quoted option prices with future strikes and maturity times. Our task is then to infer the local volatility surface from such data. Yet, the quality of the data, as far as our models are concerned, depends highly on the number of contracts being traded. In this context, we shall often apply the term liquid contracts to those that correspond to a substantial volume of negotiations, for which our models are expected to provide good results.
The computations for all the examples described in this paper were performed using garden-variety personal computers, with typical runtimes within a few minutes. Since our concern in this work is with addressing several fundamental questions, no special effort was made to optimize the runtime performance of our codes.
The basic setting of our numerical experiments consists of solving the inverse problem described in Section 1, first using synthetic data (tainted by multiplicative Gaussian noise) and then using real data. In the synthetic data examples, we first assume a ground truth local variance surface a true on a fine grid, from which we solve the discretized PDE for u h as described in Section 2. From that we sample data on a coarser mesh (so as to avoid the so-called inverse crime) and then subject the data to noise. In the real data examples we selected publicly available option data from the NY stock exchange. The reconstructions are performed by using the techniques put forth in Sections 2 and 3.
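The synthetic-data recipe just described (fine-mesh solve, coarse-mesh sampling to avoid the inverse crime, multiplicative Gaussian noise) can be sketched as follows; the stride and noise level are illustrative values, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def make_synthetic_data(u_fine, taus_fine, ys_fine, stride=4, rel_noise=0.01):
    """Sample a fine-mesh price field on a coarser mesh and apply
    multiplicative Gaussian noise: u_noisy = u * (1 + rel_noise * eta),
    with eta drawn from N(0, 1)."""
    u_coarse = u_fine[::stride, ::stride]
    eta = rng.standard_normal(u_coarse.shape)
    return taus_fine[::stride], ys_fine[::stride], u_coarse * (1.0 + rel_noise * eta)
```

Generating the data on a different (finer) mesh than the one used for inversion ensures the reconstruction cannot trivially reproduce its own discretization error.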

5.1.
Results for uncertain S0 using synthetic data. In order to test our techniques in situations where we have full control of the unknown volatility surface, we postulate a local volatility surface with known form. We concentrate on the issue of underlying price uncertainty, and as we shall see, this can be well addressed by the method discussed in Section 4. The uncertainty in S0 appears in practice due to possible mismatches among option and underlying price recordings as well as to bid and ask spreads on such observations.

5.1.1. First experiment with underlying price uncertainty. We start by experimenting with the uncertainty about the value of S0 for a ground truth volatility surface σ(τ, y) given by (17). From (17) we produced call option prices by solving the PDE system (3)-(4) discretized over a mesh with step sizes Δτ = 0.005 and Δy = 0.025. The maximum time to maturity is τ_max = 0.5 years, and the minimum and maximum log-moneyness strikes are y_min = −5 and y_max = 5, respectively. We also chose, for simplicity, the true underlying asset price as S0 = 1.0 and the risk-free interest rate as r = 0. The computed call prices u_{i,j} were then polluted by relative (multiplicative) noise, with η_{i,j} drawn from the standard normal distribution N(0, 1). The data is then composed on a coarse mesh by the noisy prices u_{i,j}^{noisy}. As mentioned in Section 4, the minimization of the Tikhonov-type functional (16) is achieved by alternating minimizations, namely, the minimization w.r.t. a_h, which is performed by a gradient descent method as in [3], and w.r.t. S0, which is performed with the Matlab function lsqnonlin. In both stages, the mesh widths used are Δτ = 0.01 and Δy = 0.05, the minimum and maximum log-moneyness values are taken as −5 and 5, respectively, and the maximum time to maturity is 0.5.
We started the algorithm with a_h⁰ ≡ 0.45²/2 and S0⁰ = Ŝ0 = 0.95. During the minimization w.r.t. a_h, we took the parameters in the penalty function as α0 = 10⁵, α1 = 10², α2 = 10⁴, α3 = 1, and α4 = α5 = 0, whereas in the minimization w.r.t. S0, we used α1 = α2 = α3 = α5 = 0, and α4 = 10⁵. The a priori surface was taken as a0 ≡ 0.45²/2. After 8 iterations, the resulting underlying asset price was 0.999, and the normalized ℓ2-distance between the true and the reconstructed local volatility surfaces was 0.13. At the beginning, with S0 = 0.95, the normalized distance was 5.55. Figure 2 compares the reconstructed local volatility surface and the ground truth at each maturity. With the same scarce data set, as the underlying asset price was adjusted, we found a local volatility surface much closer to the original one. Table 1 presents the evolution of the normalized ℓ2-distance between the true and the reconstructed local volatility surfaces, illustrating the latter observation.

In a second experiment, the same form (17) is used for the volatility surface, except for a little bit of smoothing: see the bottom right subfigure in Figure 3. However, we make sure to employ parameters that are close to the ones generally encountered with real data, and we use data that follows a practical grid. The parameters for this example are given in Table 2.

Table 2. Parameters for the example of Figure 3.
S0 (initial spot price): 2500
S_true (optimal spot price): 2200
r (interest rate): 0.25%
T (maximum maturity): 1.8
minimum y: −3.5
maximum y: 3.5
Δτ: 0.1
Δy: 0.1
a priori surface a0: 0.4²/2

Following the same algorithm as before, we obtain that S0 approximates S_true well, and furthermore, the local variance a_h approximates a_true on a coarse grid. This is illustrated in Figures 3 and 4.

Discussion of the synthetic data results. The experiment with synthetic data indicates that including the underlying stock price as one of the unknowns (which corresponds to handling data location uncertainty in y) leads to better results when there is uncertainty about that value. Indeed, Table 1 and Figures 2 and 3 show that the normalized distance between the reconstructions and the true surface decreases considerably when we combine local volatility calibration with the adjustment of the underlying asset price, upon using the algorithm presented in Section 4. In the first experiment, the distance between the initial guess price S0 and the (true) price S_true was relatively small, whereas in the second one, the two prices were significantly apart. In both cases, we can see that after only a few iterations the ground-truth price was well-approximated.

5.2.
Results for real data from equity markets. In this subsection and the next one we consider real data from financial markets. In our experiments we chose options on the Standard and Poors (SPX) index. Figure 5 depicts the locations at which our data set is given. Such options are fairly liquid and thus amenable to the models introduced in Section 1. The data were collected on 19-Jun-2015 and contain prices for 9 different maturities ranging from one day to two years. The parameters for all the models are given in Table 3. Note that the optimal spot price in the table refers to the spot price optimized using the method described in Section 4. The parameters α0, α1, α2 and α3 in the penalty functional (10) or (12) used in this experiment can be found in Table 4.

Figure 6 displays the reconstructed SPX local volatility surfaces at different maturities obtained with three method variants using the original scarce data. Note that the results generated by the Tikhonov-type method (blue) with the a0 penalty and the EnKF (black) are closer to each other, whereas the results without the a0 penalty (green) differ inexplicably.
In Figure 7 we consider SPX local volatility surfaces at different maturities obtained with the methods of Sections 2 and 3, computed using completed data as in [23]. Note that the three method variants produce similar results around y = 0 (the so-called at-the-money region). For |y| > 0.5 (the so-called in-the-money and out-of-the-money regions), if the a_0 penalty is not added then the two wings blow up.

Figure 9 is a zoom-in of Figure 8 to the region around y = 0, where one expects a larger number of liquid contracts. Observe that the plotted curves generally divide into two groups: one obtained from the original (scarce) data and the other from the completed data. This phenomenon is clearest in the figures for the earliest and latest dates, T = 25 and T = 332.

Figure 9. Reconstructed SPX local volatility surfaces obtained with six method variants for different maturities in the at-the-money (y = 0) neighbourhood.
Taken together, Figures 8 and 9 again demonstrate the superiority of the scarce data approach, as well as the regularization methods that employ α 1 > 0.
One concept that is prevalent in practical applications is the so-called implied volatility. It is defined as the volatility that would make a standard call (or put) contract yield the observed price if the classical Black-Scholes formula were used. In other words, it is the constant volatility that, when plugged into (1) with all else being equal, would yield the corresponding quoted price. In Figure 10 we see that the reconstructions of implied volatility from local volatility do better for intermediate-term and long-term maturities. For short maturities, however, the results split into two groups according to whether we use the original scarce data or the completed data.
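Computing an implied volatility from a quoted price amounts to inverting the Black-Scholes formula in its volatility argument. A minimal sketch, using bisection as the root-finder (our choice; the Matlab function blsimpv used later in the paper plays the same role):

```python
from math import log, sqrt, exp, erf

# Implied volatility as described in the text: the constant volatility
# that, plugged into the Black-Scholes call formula, reproduces a quoted
# price.  Bisection works because the call price is increasing in sigma.

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def implied_vol(C, S, K, r, T, lo=1e-6, hi=3.0):
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) < C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# round-trip check: price a call under sigma = 0.2, then invert the price
quote = bs_call(2500.0, 2500.0, 0.0025, 0.5, 0.2)
sigma_hat = implied_vol(quote, 2500.0, 2500.0, 0.0025, 0.5)
```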
It is important to emphasize that the red curves in Figure 10 represent implied volatility, not prices (i.e., the red curves are not simple "observations" to be fitted). In financial markets, a misfit such as the one displayed here for the short maturities may or may not be acceptable, depending on issues such as the bid-ask spread, risk premia and the absence of arbitrage opportunities. Moreover, the simple fact that a problem is under-determined does not mean that we should be able to fit it well with a given model such as (1). In fact, even if such fitting were possible in a different (e.g., interpolation) sense, the prices obtained could be subject to arbitrage opportunities, which would indeed preclude practical use. (There is, incidentally, a well-known and somewhat similar effect when over-fitting training examples in machine learning.) In fact, if only short-term contracts are of interest, then a better agreement among the curves in the first two subplots of Figure 10 can be obtained by simply dropping the data columns related to long-term contracts! This is what some practitioners do in such a case, and we have verified that it significantly improves agreement among the curves, although not necessarily our state of knowledge about the actual financial implications.

Figure 10. Implied (Black-Scholes) volatility corresponding to the local volatility surfaces, obtained with the six method variants (Tikhonov, EnKF and "no a_0", applied to real and completed data) and compared to the market one.
In order to assess the misfit between our reconstructed results and the practically observed implied volatility, we make use of three different figures of merit. Note that our inverse problem solution process does not directly minimize such quantities, so they should only be taken as additional quality-control measures. Let I^L_{i,j} (resp. I^{ba}_{i,j}) denote the implied volatility corresponding to the reconstructed local volatility (resp. to the average of bid and ask option prices) with strike K_i and maturity τ_j. In the SPX example we restrict the strikes to lie between 1890 and 2500. Further, V_{i,j} is the volume of the option with strike K_i and maturity τ_j, and N_vol is the number of contracts that have a nonzero volume. The resulting values are presented in Table 6.

Discussion of the equity data results. From the above experiment we conclude that the real and completed data sets yield quite different results. With the completed data set, if the a_0 penalty is discarded, the two wings of the local volatility surface are not stable, for both methods of Sections 2 and 3. This is apparent in all the figures as well as in Table 5. From the results involving reconstruction of the implied volatility, shown in Table 6, the Tikhonov-type method using the original (scarce) data has the best residuals in all three measures, with the EnKF method a close second.
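The exact definitions of the three figures of merit are given in the text. Purely as an illustration of the kind of volume-weighted quantity involved (our own construction, not one of the paper's three measures), a volume-weighted root-mean-square misfit over contracts with nonzero volume might look like:

```python
from math import sqrt

# Hypothetical volume-weighted RMS distance between implied volatilities
# I^L (from reconstruction) and I^ba (from bid/ask mid prices), weighted
# by traded volume V and restricted to contracts with nonzero volume.

def weighted_rms(I_L, I_ba, V):
    num = sum(v * (il - ib) ** 2 for il, ib, v in zip(I_L, I_ba, V) if v > 0)
    den = sum(v for v in V if v > 0)
    return sqrt(num / den)

# zero-volume contracts are excluded from the measure
err = weighted_rms([0.20, 0.30], [0.20, 0.25], [0.0, 10.0])
```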

5.3. Results for real data from commodities and data completion. Commodities have been traded extensively in markets throughout the world for centuries. In many of those markets, a number of liquid options on such assets are also traded. Here again, data from such markets are freely available, and modelling them is highly relevant for financial analysts and for risk management applications.
In what follows, by residual we mean the merit function (7) with Γ = α_0^{-1} I, α_0 > 0. The penalty function (10) was used with α_1 = α_2 = α_3. A gradient descent method was applied to minimize the resulting Tikhonov-type regularization functional. The parameter α_0 was chosen by a discrepancy-based method inspired by the classical Morozov principle; see [3] for further details.

Figure 11. Reconstructed local volatility for different maturity dates for Henry Hub call option prices, comparing completed-data (green line with pentagram) and scarce-data (blue line) results.

Figure 12. Reconstructed local volatility for different maturity dates for WTI call option prices, comparing completed-data (green line with pentagram) and scarce-data (blue line) results.
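The discrepancy idea behind the choice of α_0 can be illustrated on a scalar toy problem (our construction, not the paper's PDE setting): increase α_0 until the residual of the regularized solution matches an assumed noise level δ.

```python
# Toy Morozov-type discrepancy choice of alpha_0 for the scalar problem
#   min_x (x - d)^2 + alpha * x^2,
# whose solution is x = d / (1 + alpha), so the residual
#   |x - d| = alpha |d| / (1 + alpha)
# grows monotonically with alpha.  Bisection finds the alpha whose
# residual equals the noise level delta.

def residual(alpha, d):
    x = d / (1.0 + alpha)
    return abs(x - d)

def morozov_alpha(d, delta, lo=0.0, hi=1e6):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, d) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha0 = morozov_alpha(d=1.0, delta=0.1)   # here alpha/(1+alpha) = 0.1
```

In the paper the same principle is applied with the full misfit (7) in place of the scalar residual.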
To complete the data (when this alternative was used in this subsection) we applied linear interpolation, taking into account the boundary and initial conditions in (4). The boundary conditions were applied at y = ±5.
When using completed data, we evaluated the local volatility only at the maturity times given in the original data, and interpolated it linearly in time at each step of the minimization. Also, whenever |y| > 0.5 was encountered, we set a(τ, y) = a(τ, 0.5·sign(y)). When using the given scarce data, at each iteration and for each maturity time τ in the data set, we interpolated a(τ, ·) linearly in the intervals [−5, y_min) and (y_max, 5], setting a(τ, −5) = max{0.08, a(τ, y_min)} and a(τ, 5) = max{0.08, a(τ, y_max)}, where y_min and y_max are the minimum and maximum log-moneyness strikes in the data set corresponding to τ.

Figures 11 and 12 display the reconstructed local volatility surfaces for the different maturities, comparing the results obtained with the given data and with the completed data. Figures 13 and 14 compare the implied volatilities of both methods with the market ones, in order to assess how accurate the reconstructions are. One of the main advantages of the local volatility model is its capability of fitting the market implied smile, which bears an important relationship to market risk. The implied volatilities were evaluated using the Matlab function blsimpv, with the interest rate used as the dividend yield.

In all of these experiments we used the mesh widths ∆τ = 1/365 and ∆y = 0.05, the annualized risk-free interest rate was taken as 0.25%, and b = 0 in (3), since futures have no drift. Table 7 displays the parameters obtained in the tests of local volatility calibration with Henry Hub and WTI call prices, with scarce and completed data. In this table, by residual we mean the ℓ2-distance between the evaluated quantity and the data, normalized by the ℓ2-norm of the data.

Discussion of the real commodity data results.
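The scarce-data extrapolation recipe above can be sketched as a small piecewise-linear routine for a single maturity (a minimal implementation of the stated rule; variable names are ours):

```python
# Extend a(tau, .) from the observed log-moneyness strikes to [-5, 5]:
# linear between observed strikes, and linear out to y = -5 and y = 5
# with the boundary values floored at 0.08, as described in the text.

def extend_variance(ys, a_vals):
    # ys: sorted observed log-moneyness strikes; a_vals: variance there
    knots = [-5.0] + list(ys) + [5.0]
    vals = [max(0.08, a_vals[0])] + list(a_vals) + [max(0.08, a_vals[-1])]

    def a(y):
        # piecewise-linear evaluation on [-5, 5]
        for x0, x1, v0, v1 in zip(knots, knots[1:], vals, vals[1:]):
            if x0 <= y <= x1:
                t = (y - x0) / (x1 - x0)
                return (1 - t) * v0 + t * v1
        raise ValueError("y outside [-5, 5]")

    return a

a = extend_variance([-0.5, 0.0, 0.5], [0.04, 0.05, 0.06])
```

For example, a(-5) is floored at 0.08 even though the nearest observed value is 0.04, and values between knots are linear interpolants.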
Comparing the market implied volatilities with those obtained by the two methods in Figures 13 and 14, the results with scarce data adhere much better to the smile than those obtained with completed data. Thus, completing the data can be seen as an unnecessary introduction of noise or inconsistency. This is most noticeable in the implied volatilities at deep in-the-money (y < −0.1) and deep out-of-the-money (y > 0.1) strikes. In these regions, in almost all cases, the results with scarce data practically match the market implied volatility, whereas with data completion the resulting implied volatilities are higher. For financial market practitioners, higher implied volatilities translate into higher risk, so using data completion could lead investors to be more conservative than necessary.
6. Conclusions. The questions of how to treat observed data in order to produce agreeable solutions, and relatedly, how much to trust the quality of a given data set collected by another agent, are prevalent in many nontrivial applied inverse problems. Standard theory appears to have little to contribute to their satisfactory resolution, and more carefully assembled experience is required. In this article we have highlighted some of the issues involved through an important application in mathematical finance, and we have proposed methods that improve on techniques available in the open literature. The problem of constructing a local volatility surface has similarities with some applications in areas such as geophysics and medical imaging, in that it boils down to the calibration of a diffusive PDE problem, reconstructing a distributed parameter function, i.e., a surface, rather than a few unrelated parameter values. The difficulty of dealing with scarce data, which highlights the need for a careful practical selection of a prior, is common, as is the uncertainty in data location, although the latter appears here as dependence only on a scalar additional variable.
The finance problem considered here is distinguished from its more physical comrades in two important aspects. One is the availability of real data: we have experimented here with three different real applications, while most papers appearing in the applied mathematics literature never get to deal with real data at all. The other aspect is the difficulty of assessing the resulting recovered volatilities: there is no physical solution here, data sets are changing daily, and experience rules. In the present setting we had to determine the coefficients of the regularization operators, for instance α i in (8), (10) and (15), by trial and error.
We remark that we are not proposing to get rid of synthetic data altogether: see, in particular, Section 5.1. But we are encouraging care. In fact, in between the two options of scarce real data and synthetic data everywhere, there is the option of using synthetic data at the locations where real data are available. This option is employed routinely in exploration geophysics research, for instance. In general practice, though, synthetic results still tend to "look better" than those for real data.
Of particular interest is the regularization term involving a 0 and α 1 . This term is a penalty on the sought local volatility surface for straying away from a given constant, a 0 , which in turn is estimated based on the type of asset under consideration. The question whether or not this should be done is a deep one and touches upon the very foundations of the model under consideration. It also has a practical unfolding, since the possibility that the volatility, as a function of the asset price, grows at a sufficiently fast rate may be connected (at least in some similar models) to the presence of market bubbles [21].
The EnKF method considered in Section 3 is an adaptation of one of several methods proposed in the literature [20,27], improved by adding smoothing penalty terms. We have also experimented with a similar adaptation of the method in [8]. In both cases the results are not consistently better than those obtained by the Tikhonov-type method of Section 2, which in hindsight is not surprising. However, since such methods are currently very much in vogue (especially in our target application area) we feel that our results in this sense are important: one must have data of sufficient quality for such methods to dominate.
Appendix A. EnKF details. The regularized weighted least-squares problem (13) has a closed-form solution. However, D is unknown. To address this we apply EnKF and replace D by the covariance matrix computed from generated samples. We also replace L_τ â_0 and L_y â_0 in (13) by r_τ and r_y, sampled from N(L_τ â_0, D_τ) and N(L_y â_0, D_y), respectively. In this way, r_τ and r_y can be viewed as two observations like d. Next we explain the calculations in one iteration of the prediction and analysis steps of the EnKF algorithm stated at the end of Section 3.
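Replacing D by an ensemble-based estimate amounts to the usual sample mean and covariance computation, sketched here in plain Python (suitable only for small ensembles; the paper works with the discretized surface vectors):

```python
# Sample mean and covariance of an ensemble {a^(j)}_{j=1..J} of vectors,
# as used to replace the unknown covariance D in the EnKF.

def ensemble_mean(ens):
    J, n = len(ens), len(ens[0])
    return [sum(e[i] for e in ens) / J for i in range(n)]

def ensemble_cov(ens):
    # unbiased estimator with the usual 1/(J-1) normalization
    J, n = len(ens), len(ens[0])
    m = ensemble_mean(ens)
    return [[sum((e[i] - m[i]) * (e[k] - m[k]) for e in ens) / (J - 1)
             for k in range(n)] for i in range(n)]

m = ensemble_mean([[1.0, 2.0], [3.0, 4.0]])
C = ensemble_cov([[1.0, 2.0], [3.0, 4.0]])
```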
Having generated {a_h^{(0,j)}}_{j=1}^J as the initial ensemble, we calculate P u_h(a_h^{(0,j)}) for each ensemble member. This is the first prediction step (i.e., we set n = 0 in the stated algorithm). The sample mean and covariance matrix are then computed from the ensemble. We now calculate the Kalman gain three times to obtain

where W_1 = U L_y^T (L_y U L_y^T + D_y)^{-1}. We continue with the same procedure to calculate U. Let V^{-1} = D_1^{-1} + H^T Γ^{-1} H and W_2 = V L_τ^T (L_τ V L_τ^T + D_τ)^{-1}. Then we have

Hence, defining also W_3 =

Now (19) reads
Reversing this procedure and adjusting notation, we obtain the analysis step 2(b) of the algorithm in Section 3 for n = 0.
In each iteration of the EnKF algorithm, the prediction step maps the samples into the data space. The analysis step then calculates the "distance" between the mapped ensemble and the noisy data. Following the analysis step, the approximated local variance and the estimated option prices are calculated and compared with the original data. We compute the residual and compare it to the one from the previous iteration; Step 2 of the algorithm is repeated until the residual falls below a certain threshold.
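The outer iteration just described can be sketched generically. Here `step` and `resid` are stand-ins for the prediction/analysis machinery, replaced by trivial stubs of our own so the skeleton runs:

```python
# Generic skeleton of the stopping rule described above: repeat Step 2
# until the residual falls below a threshold, or stops decreasing.

def iterate(step, resid, a, tol, max_iter=100):
    r_prev = float("inf")
    r = r_prev
    for n in range(max_iter):
        a = step(a)            # stand-in for prediction + analysis steps
        r = resid(a)           # residual against the data
        if r < tol or r >= r_prev:   # converged, or no further progress
            return a, r, n + 1
        r_prev = r
    return a, r, max_iter

# stub dynamics: each "iteration" halves the residual, starting from 1.0
a_hat, r, n = iterate(step=lambda a: 0.5 * a, resid=abs, a=1.0, tol=1e-3)
```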