Backtestability and the ridge backtest

    *Corresponding author: Carlo Acerbi 

This paper is the private opinion of the author and does not necessarily reflect the policy and views of Morgan Stanley.

We propose a formal definition of backtestability for a statistical functional of a distribution: a functional is backtestable if there exists a backtest function that depends only on the forecast of the functional and on the related random variable, is strictly monotonic in the forecast, and has zero expected value for an exact forecast. We discuss the relationship with elicitability and identifiability, which turn out to be necessary conditions for backtestability. The variance and the expected shortfall are not backtestable for this reason. We compare (absolute) model validation, carried out through hypothesis tests based on backtest functions, with (relative) model selection between competing forecasting models, carried out through scoring functions. We define a backtest to be sharp when it is strictly monotonic with respect to the true value of the functional and not only to its forecast. Sharpness decides whether the expected value of the backtest also determines the size of the prediction discrepancy and not only its significance. We show that the quantile backtest is not sharp and in fact provides no information whatsoever on the true value of the quantile. The expectile backtest is also not sharp; we provide bounds for the true value of the expectile, which are looser for outer confidence levels. We then introduce the notion of ridge backtests, applicable to particular non-backtestable functionals, such as the variance and the expected shortfall, which coincide with the attained minimum of the scoring function of another, elicitable auxiliary functional (the mean and the value at risk, respectively). This permits approximately sharp backtests, up to a small and one-sided sensitivity to the prediction of the auxiliary functional. The ridge mechanism explains why the variance has always been de facto backtestable and allows for similarly efficient ways to backtest the expected shortfall.
We discuss the relevance of this result to the current debate on financial regulation (banking and insurance), where value at risk and expected shortfall are adopted as regulatory risk measures.

    Mathematics Subject Classification: 91G70, 91G60, 91B05, 62G10, 60G25.


    Figure 1.  Testing risk predictions is difficult because the true risk is not observable a posteriori. What is revealed is just one random draw

    Figure 2.  Dependence on $ v $ of the tests $ Z_{ \mathrm{\bf{ES}}} $ and $ Z_2 $ when the prediction for $ \mathrm{\bf{ES}} $ is correct. Dashed lines mark the 5% critical values of the two tests. Note the linear sensitivity of $ Z_2 $ and the muted, quadratic sensitivity of $ Z_{ \mathrm{\bf{ES}}} $. As a result, $ Z_2 $ can easily generate a type I error. Source: [3]

    Figure 3.  The same example in the case of an underestimated prediction $ e = 0.8 \mathrm{\bf{ES}} $. Here $ Z_2 $ can generate a type II error, while $ Z_{ \mathrm{\bf{ES}}} $ cannot. Source: [3]
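The muted, quadratic sensitivity described in these captions has a simple analogue for the variance, the other ridge example named in the abstract. The sketch below is an illustration only and not the paper's test statistic: it assumes the hypothetical form $ (y-m)^2 - s^2 $, where $ s^2 $ is the variance forecast and $ m $ the forecast of the auxiliary mean. Since $ E[(Y-m)^2] = \sigma^2 + (\mu-m)^2 $, an error $ d = m - \mu $ in the auxiliary forecast shifts the expected value by $ d^2 $, which is quadratic and one-sided. The function name, the Gaussian sample and all parameter values are assumptions made for the example.

```python
import numpy as np


def ridge_variance_backtest(y, s2, m):
    """Hypothetical ridge-style backtest for the variance: (y - m)^2 - s2.

    y  : realizations of the random variable
    s2 : forecast of the variance (the functional being tested)
    m  : forecast of the mean (the auxiliary functional)
    """
    y = np.asarray(y, dtype=float)
    return (y - m) ** 2 - s2


# Simulated data with known mean and variance (assumed for illustration).
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
y = rng.normal(mu, sigma, size=200_000)

# The variance forecast is exact; only the auxiliary mean forecast is off by d.
for d in (-0.5, 0.0, 0.5):
    z_bar = ridge_variance_backtest(y, s2=sigma**2, m=mu + d).mean()
    print(f"mean error d = {d:+.1f}: average Z = {z_bar:.3f}, theory d^2 = {d**2:.3f}")
```

Under these assumptions the average is close to zero when the mean forecast is exact and is pushed upward by roughly $ d^2 $ whatever the sign of $ d $, so a small error in the auxiliary forecast has only a second-order effect.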

    Table 1.  Common examples of canonical scoring functions

    | $ \mathrm{\bf{y}} $ | $ S_ \mathrm{\bf{y}}(y,x) $ | $ \mathcal{F}_S $ |
    |---|---|---|
    | $ {\mathit{\boldsymbol{\mu}}} $ | $ (y-x)^2 $ | maximal |
    | $ \mathrm{\bf{q}}_{1/2} $ | $ \lvert y-x \rvert $ | maximal |
    | $ \mathrm{\bf{q}}_\alpha $ | $ \alpha (x-y)_+ + (1-\alpha)(x-y)_- $ | maximal |
    | $ \mathrm{\bf{e}}_\alpha $ | $ \alpha (x-y)_+^2 + (1-\alpha)(x-y)_-^2 $ | maximal |
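As a rough numerical check of Table 1, the sketch below averages each scoring function over a simulated sample and looks for the forecast $ x $ that minimises the average on a grid. It assumes the negative part is defined as $ (a)_- = \max(-a, 0) $; under that assumption the first-order condition for the quantile score reads $ F(x) = 1-\alpha $, so the minimiser is compared with the empirical quantile at level $ 1-\alpha $. The function names, the standard normal sample and the grid are all assumptions made for the example.

```python
import numpy as np


def score_mean(y, x):
    # (y - x)^2, first row of Table 1
    return (y - x) ** 2


def score_quantile(y, x, alpha):
    # alpha * (x - y)_+ + (1 - alpha) * (x - y)_-, with (a)_- = max(-a, 0) assumed
    d = x - y
    return alpha * np.maximum(d, 0.0) + (1 - alpha) * np.maximum(-d, 0.0)


def score_expectile(y, x, alpha):
    # alpha * (x - y)_+^2 + (1 - alpha) * (x - y)_-^2, same convention as above
    d = x - y
    return alpha * np.maximum(d, 0.0) ** 2 + (1 - alpha) * np.maximum(-d, 0.0) ** 2


def argmin_average_score(score, y, grid):
    """Return the grid point that minimises the sample average of the score."""
    averages = np.array([score(y, x).mean() for x in grid])
    return grid[np.argmin(averages)]


rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=20_000)
grid = np.linspace(-4.0, 4.0, 401)  # step 0.02
alpha = 0.1

best_mean = argmin_average_score(score_mean, y, grid)
best_median = argmin_average_score(lambda y, x: score_quantile(y, x, 0.5), y, grid)
best_quantile = argmin_average_score(lambda y, x: score_quantile(y, x, alpha), y, grid)
best_expectile_half = argmin_average_score(lambda y, x: score_expectile(y, x, 0.5), y, grid)

print("mean score minimiser     :", best_mean, "sample mean:", y.mean())
print("median score minimiser   :", best_median, "sample median:", np.median(y))
print("quantile score minimiser :", best_quantile, "sample quantile:", np.quantile(y, 1 - alpha))
print("expectile (alpha = 1/2)  :", best_expectile_half, "sample mean:", y.mean())
```

For $ \alpha = 1/2 $ the expectile score is half the squared error, which is why its minimiser is compared with the sample mean.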

    Table 2.  Common examples of canonical identification functions. $ c\in [-\alpha, 1-\alpha] $, see Remark 2.4

    | $ \mathrm{\bf{y}} $ | $ I_ \mathrm{\bf{y}}(y,x) $ | $ \mathcal{F}_I $ |
    |---|---|---|
    | $ {\mathit{\boldsymbol{\mu}}} $ | $ y-x $ | maximal |
    | $ \mathrm{\bf{q}}_{1/2} $ | $ {\bf{1}}_{\{y>x\}} - {\bf{1}}_{\{y<x\}} + 2c{\bf{1}}_{\{y=x\}} $ | $ F(x) $ continuous at $ \mathrm{\bf{q}}_{1/2} $ |
    | $ \mathrm{\bf{q}}_\alpha $ | $ (1-\alpha) {\bf{1}}_{\{y>x\}}- \alpha {\bf{1}}_{\{y<x\}} + c{\bf{1}}_{\{y=x\}} $ | $ F(x) $ continuous at $ \mathrm{\bf{q}}_{\alpha} $ |
    | $ \mathrm{\bf{e}}_\alpha $ | $ (1-\alpha) (x-y)_- - \alpha (x-y)_+ $ | maximal |

    Table 3.  Common examples of backtest functions. These are unique up to a positive constant (see Corollary 3.9)

    | $ \mathrm{\bf{y}} $ | $ Z_ \mathrm{\bf{y}}(y,x) $ | $ \mathcal{F}_Z $ |
    |---|---|---|
    | $ {\mathit{\boldsymbol{\mu}}} $ | $ y-x $ | maximal |
    | $ \mathrm{\bf{q}}_{1/2} $ | $ {\bf{1}}_{\{y>x\}} - {\bf{1}}_{\{y<x\}} + 2c{\bf{1}}_{\{y=x\}} $ | $ F(x) $ strictly increasing and continuous at $ \mathrm{\bf{q}}_{1/2} $ |
    | $ \mathrm{\bf{q}}_\alpha $ | $ (1-\alpha) {\bf{1}}_{\{y>x\}}- \alpha {\bf{1}}_{\{y<x\}} + c{\bf{1}}_{\{y=x\}} $ | $ F(x) $ strictly increasing and continuous at $ \mathrm{\bf{q}}_{\alpha} $ |
    | $ \mathrm{\bf{e}}_\alpha $ | $ (1-\alpha) (x-y)_- - \alpha (x-y)_+ $ | maximal |
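The two defining properties of a backtest function, zero expected value for an exact forecast and strict monotonicity in the forecast, can be checked numerically for the entries of Table 3. The sketch below is a minimal illustration under stated assumptions: the data are simulated standard normal draws, the negative part is taken as $ (a)_- = \max(-a, 0) $, and $ c = 0 $ is chosen within the allowed range. With that convention the quantile backtest has expected value $ 1-\alpha-F(x) $, so the exact forecast used here is the point where $ F(x) = 1-\alpha $. All function and variable names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def z_mean(y, x):
    # y - x
    return y - x


def z_quantile(y, x, alpha, c=0.0):
    # (1 - alpha) 1{y > x} - alpha 1{y < x} + c 1{y = x}, with c in [-alpha, 1 - alpha]
    return (1 - alpha) * (y > x) - alpha * (y < x) + c * (y == x)


def z_expectile(y, x, alpha):
    # (1 - alpha) (x - y)_- - alpha (x - y)_+, with (a)_- = max(-a, 0) assumed
    d = x - y
    return (1 - alpha) * np.maximum(-d, 0.0) - alpha * np.maximum(d, 0.0)


rng = np.random.default_rng(2)
alpha = 0.1
y = rng.normal(0.0, 1.0, size=100_000)

# Exact quantile forecast under the convention above: F(x) = 1 - alpha.
q_exact = NormalDist().inv_cdf(1 - alpha)

# Average the quantile backtest at a low, an exact and a high forecast.
forecasts = [q_exact - 0.5, q_exact, q_exact + 0.5]
avg_z_quantile = [z_quantile(y, x, alpha).mean() for x in forecasts]

# The mean backtest averaged at the exact forecast (true mean is 0 here).
avg_z_mean_exact = z_mean(y, 0.0).mean()

# For alpha = 1/2 the expectile backtest is half the mean backtest.
avg_z_expectile_half = z_expectile(y, 0.0, 0.5).mean()

for x, z_bar in zip(forecasts, avg_z_quantile):
    print(f"quantile forecast {x:.3f}: average Z = {z_bar:+.4f}")
print(f"mean forecast 0.000: average Z = {avg_z_mean_exact:+.4f}")
```

In this sketch the average of the quantile backtest is positive for the low forecast, close to zero for the exact one and negative for the high one. Its expected value depends on the distribution only through $ F(x) $ at the forecast, which illustrates why its size says nothing about how far away the true quantile lies.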
