General risk measures for robust machine learning

  * Corresponding author: Henri Gérard

The work of the second author was supported by ENPC and Labex Bézout. The work of the third author was supported by the Institut Universitaire de France.

  • A wide array of machine learning problems are formulated as the minimization of the expectation of a convex loss function over some parameter space. Since the probability distribution of the data of interest is usually unknown, it is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights into this problem by using the framework of risk measures developed in quantitative finance. We show that the resulting robust min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on $ \varphi $-divergences and the Wasserstein metric. We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through numerical experiments, we demonstrate that this algorithm scales well on real data sets.

    Mathematics Subject Classification: 46N10, 65K10, 62F35, 68Q32.
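
As a minimal illustration of how a min-max problem can become a convex minimization, consider only the special case of a Kullback-Leibler ball $\{p \in \Delta_N : \sum_{i=1}^{N} p_i\log(p_i/q_i) \leq \epsilon\}$ around the empirical distribution $q$ of $N$ training samples with fixed losses $\ell_1,\dots,\ell_N$ (here $\Delta_N$ denotes the probability simplex). A standard duality argument, not necessarily the derivation used in this paper, gives $\sup_{p} \sum_{i} p_i \ell_i = \inf_{\lambda > 0} \lambda\epsilon + \lambda \log \sum_{i} q_i e^{\ell_i/\lambda}$, so the inner maximization over distributions collapses to a convex problem in the scalar $\lambda$. The Python sketch below uses hypothetical function names and plain lists; it finds the maximizing distribution, an exponential tilting of $q$, by bisection on $\lambda$ and checks it against the dual objective.

```python
import math


def tilted(losses, q, lam):
    """Exponential tilting of q: p_i proportional to q_i * exp(loss_i / lam)."""
    m = max(losses)  # shift exponents by the max loss to avoid overflow
    w = [qi * math.exp((li - m) / lam) for li, qi in zip(losses, q)]
    z = sum(w)
    return [wi / z for wi in w]


def kl(p, q):
    """KL divergence sum_i p_i log(p_i / q_i), using the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_objective(losses, q, eps, lam):
    """lam * eps + lam * log(sum_i q_i exp(loss_i / lam)), computed stably."""
    m = max(losses)
    s = sum(qi * math.exp((li - m) / lam) for li, qi in zip(losses, q))
    return lam * eps + m + lam * math.log(s)


def worst_case_kl(losses, q, eps, iters=100):
    """Return (sup of sum_i p_i loss_i over the KL ball of radius eps, optimal lam).

    KL(tilted(lam) || q) is nonincreasing in lam, so a log-scale bisection
    finds the lam at which the constraint KL <= eps is active.
    """
    m = max(losses)
    top_mass = sum(qi for li, qi in zip(losses, q) if li == m)
    if eps >= -math.log(top_mass):
        return m, None  # ball contains a distribution supported on the max losses
    lo, hi = 1e-8, 1e8
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl(tilted(losses, q, mid), q) > eps:
            lo = mid  # tilted distribution too far from q: increase lam
        else:
            hi = mid
    lam = math.sqrt(lo * hi)
    p = tilted(losses, q, lam)
    return sum(pi * li for pi, li in zip(p, losses)), lam


losses = [0.2, 1.5, 0.7, 3.0]  # hypothetical per-sample losses
q = [0.25] * 4                 # uniform empirical distribution
value, lam = worst_case_kl(losses, q, eps=0.1)
print(value, dual_objective(losses, q, 0.1, lam))  # the two numbers should agree
```

The bisection relies on $\lambda \mapsto$ KL$(p_\lambda \Vert q)$ being nonincreasing. In a full robust training problem the losses $\ell_i$ would depend on the model parameters and $\lambda$ would be optimized together with them; that outer loop is omitted here.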


  • Figure 1.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the iteration number for various values of $ \epsilon $

    Figure 2.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the CPU time for various values of $ \epsilon $ over the first 100 iterations

    Figure 3.  $\mathtt{ionosphere} $ dataset: AUC metric as a function of $ \epsilon $

    Figure 4.  $\mathtt{ionosphere} $ dataset (altered): ROC curve for different values of $ \epsilon $

    Figure 5.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 10% of the data for the training set. Robust model is used with $ \epsilon = 0.001 $

    Figure 6.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 60% of the data for the training set. Robust model is used with $ \epsilon = 0.001 $

    Table 1.  Common perspective functions and their conjugates used to define $\varphi$-divergences

    | Divergence | Notation | $\varphi(t),\ t \geq 0$ | $D_\varphi(p,q)$ | $\varphi^{*}(s)$ | $\tilde\varphi(t)$ |
    |---|---|---|---|---|---|
    | Kullback-Leibler | $\varphi_{kl}(t)$ | $t\log(t) - t + 1$ | $\sum_{i=1}^{N} p_i \log\left(\frac{p_i}{q_i}\right)$ | $e^{s} - 1$ | $\varphi_{b}(t)$ |
    | Burg entropy | $\varphi_{b}(t)$ | $-\log(t) + t - 1$ | $\sum_{i=1}^{N} q_i \log\left(\frac{q_i}{p_i}\right)$ | $-\log(1 - s),\ s < 1$ | $\varphi_{kl}(t)$ |
    | J-divergence | $\varphi_{j}(t)$ | $(t - 1)\log(t)$ | $\sum_{i=1}^{N} (p_i - q_i)\log\left(\frac{p_i}{q_i}\right)$ | no closed form | $\varphi_{j}(t)$ |
    | $\chi^{2}$-distance | $\varphi_{c}(t)$ | $\frac{1}{t}(t - 1)^{2}$ | $\sum_{i=1}^{N} \frac{(p_i - q_i)^{2}}{p_i}$ | $2 - 2\sqrt{1 - s},\ s < 1$ | $\varphi_{mc}(t)$ |
    | Modified $\chi^{2}$-distance | $\varphi_{mc}(t)$ | $(t - 1)^{2}$ | $\sum_{i=1}^{N} \frac{(q_i - p_i)^{2}}{q_i}$ | $\left\{ \begin{array}{ll} -1, & s < -2 \\ s + s^{2}/4, & s \geq -2 \end{array} \right.$ | $\varphi_{c}(t)$ |
    | Hellinger distance | $\varphi_{h}(t)$ | $(\sqrt{t} - 1)^{2}$ | $\sum_{i=1}^{N} (\sqrt{p_i} - \sqrt{q_i})^{2}$ | $\frac{s}{1 - s},\ s < 1$ | $\varphi_{h}(t)$ |
    | $\chi$-divergence of order $\theta > 1$ | $\varphi_{ca}^{\theta}(t)$ | $\lvert t - 1\rvert^{\theta}$ | $\sum_{i=1}^{N} q_i \left\lvert 1 - \frac{p_i}{q_i}\right\rvert^{\theta}$ | $\left\{ \begin{array}{ll} -1, & s < -\theta \\ s + (\theta - 1)\left(\frac{\lvert s\rvert}{\theta}\right)^{\frac{\theta}{\theta - 1}}, & s \geq -\theta \end{array} \right.$ | $t^{1-\theta}\varphi_{ca}^{\theta}(t)$ |
    | Variation distance | $\varphi_{v}(t)$ | $\lvert t - 1\rvert$ | $\sum_{i=1}^{N} \lvert p_i - q_i\rvert$ | $\left\{ \begin{array}{ll} -1, & s \leq -1 \\ s, & -1 \leq s \leq 1 \end{array} \right.$ | $\varphi_{v}(t)$ |
    | Cressie and Read | $\varphi_{cr}^{\theta}(t)$ | $\frac{1 - \theta + \theta t - t^{\theta}}{\theta(1 - \theta)},\ \theta \notin \{0, 1\}$ | $\frac{1}{\theta(1 - \theta)}\left(1 - \sum_{i=1}^{N} p_i^{\theta} q_i^{1-\theta}\right)$ | $\frac{1}{\theta}\left(1 - s(1 - \theta)\right)^{\frac{\theta}{\theta - 1}} - \frac{1}{\theta},\ s < \frac{1}{1 - \theta}$ | $\varphi_{cr}^{1-\theta}(t)$ |
    | Average Value at Risk of level $\beta$ | $\varphi_{\textrm{avar}}^{\beta}(t)$ | $\iota_{[0, \frac{1}{1-\beta}]}(t),\ \beta \in [0, 1[$ | $\sum_{i=1}^{N} \iota_{[0, \frac{1}{1-\beta}]}\left(\frac{p_i}{q_i}\right)$ | $\sigma_{[0, \frac{1}{1-\beta}]}(s) = \left\{ \begin{array}{ll} \frac{s}{1-\beta}, & s \geq 0 \\ 0, & s < 0 \end{array} \right.$ | $\iota_{[1-\beta, +\infty[}(t)$ |
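
The entries of Table 1 can be sanity-checked numerically. The Python sketch below assumes the convention $D_\varphi(p,q) = \sum_{i=1}^{N} q_i\,\varphi(p_i/q_i)$ (the one under which the $D_\varphi$ column agrees with the $\varphi$ column for probability vectors), the adjoint $\tilde\varphi(t) = t\,\varphi(1/t)$ and the conjugate $\varphi^{*}(s) = \sup_{t \geq 0}\{st - \varphi(t)\}$. The dictionary keys, the test vectors $p, q$ and the choice $\theta = 3$ for the $\chi$-divergence are illustrative assumptions. The conjugate is approximated by a crude grid search, which is enough to catch a wrong closed form but is not meant for production use.

```python
import math

# Generators phi(t) for t >= 0, transcribed from Table 1; math.inf marks +infinity.
PHI = {
    "kl": lambda t: 1.0 if t == 0 else t * math.log(t) - t + 1,
    "burg": lambda t: math.inf if t == 0 else -math.log(t) + t - 1,
    "j": lambda t: math.inf if t == 0 else (t - 1) * math.log(t),
    "chi2": lambda t: math.inf if t == 0 else (t - 1) ** 2 / t,
    "mod_chi2": lambda t: (t - 1) ** 2,
    "hellinger": lambda t: (math.sqrt(t) - 1) ** 2,
    "chi_theta3": lambda t: abs(t - 1) ** 3,  # chi-divergence with theta = 3
    "variation": lambda t: abs(t - 1),
}


def divergence(phi, p, q):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))


def adjoint(phi):
    """Adjoint generator t -> t * phi(1 / t), defined for t > 0."""
    return lambda t: t * phi(1.0 / t)


def conjugate(phi, s, t_max=20.0, steps=50_000):
    """Grid approximation of phi*(s) = sup_{0 <= t <= t_max} (s * t - phi(t))."""
    return max(s * (t_max * k / steps) - phi(t_max * k / steps) for k in range(steps + 1))


def closed_divergences(p, q):
    """Closed forms from the D_phi column of Table 1."""
    pq = list(zip(p, q))
    return {
        "kl": sum(pi * math.log(pi / qi) for pi, qi in pq),
        "burg": sum(qi * math.log(qi / pi) for pi, qi in pq),
        "j": sum((pi - qi) * math.log(pi / qi) for pi, qi in pq),
        "chi2": sum((pi - qi) ** 2 / pi for pi, qi in pq),
        "mod_chi2": sum((qi - pi) ** 2 / qi for pi, qi in pq),
        "hellinger": sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in pq),
        "chi_theta3": sum(qi * abs(1 - pi / qi) ** 3 for pi, qi in pq),
        "variation": sum(abs(pi - qi) for pi, qi in pq),
    }


# Closed forms from the phi* column of Table 1 (J-divergence has none).
CONJ = {
    "kl": lambda s: math.exp(s) - 1,
    "burg": lambda s: -math.log(1 - s),  # valid for s < 1
    "chi2": lambda s: 2 - 2 * math.sqrt(1 - s),  # valid for s < 1
    "mod_chi2": lambda s: -1.0 if s < -2 else s + s * s / 4,
    "hellinger": lambda s: s / (1 - s),  # valid for s < 1
    "chi_theta3": lambda s: -1.0 if s < -3 else s + 2 * (abs(s) / 3) ** 1.5,
    "variation": lambda s: -1.0 if s <= -1 else s,  # valid for s <= 1
}

p, q = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]
closed = closed_divergences(p, q)
for name, phi in PHI.items():
    assert math.isclose(divergence(phi, p, q), closed[name], rel_tol=1e-9), name
for name, conj in CONJ.items():
    for s in (-4.0, -1.5, 0.0, 0.5):
        assert abs(conjugate(PHI[name], s) - conj(s)) < 1e-4, (name, s)
# Adjoint pairs from the last column: KL <-> Burg, chi^2 <-> modified chi^2.
for t in (0.3, 1.7, 4.0):
    assert math.isclose(adjoint(PHI["kl"])(t), PHI["burg"](t), rel_tol=1e-9)
    assert math.isclose(adjoint(PHI["chi2"])(t), PHI["mod_chi2"](t), rel_tol=1e-9)
print("Table 1 entries consistent")
```

The same adjoint identity also gives $D_{\tilde\varphi}(p,q) = D_\varphi(q,p)$, which is why the Burg entropy row is the Kullback-Leibler divergence with its arguments swapped.
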

    Table 2.  Parameters of the datasets

    | Name of dataset | $\mathtt{ionosphere} $ | $\mathtt{colon-cancer}$ |
    |---|---|---|
    | Number of observations ($ N $) | 351 | 64 |
    | Number of features ($ d $) | 34 | 2000 |

    Table 3.  $\mathtt{colon-cancer}$ dataset: Values of the AUC for different values of $ \epsilon $

    | Value of $ \epsilon $ | AUC with KL | AUC with Wasserstein |
    |---|---|---|
    | $ \epsilon = 0 $ (LR) | 0.832 | 0.832 |
    | $ \epsilon = 0.001 $ | 0.757 | 0.787 |
    | $ \epsilon = 0.002 $ | 0.750 | 0.770 |
    | $ \epsilon = 0.003 $ | 0.779 | 0.706 |
    | $ \epsilon = 0.004 $ | 0.698 | 0.691 |
    | $ \epsilon = 0.005 $ | 0.868 | 0.831 |
    | $ \epsilon = 0.006 $ | 0.890 | 0.860 |
    | $ \epsilon = 0.007 $ | 0.728 | 0.838 |
    | $ \epsilon = 0.008 $ | 0.809 | 0.768 |
    | $ \epsilon = 0.009 $ | 0.875 | 0.890 |
    | $ \epsilon = 0.01 $ | 0.801 | 0.853 |
    | $ \epsilon = 0.05 $ | 0.786 | 0.794 |
    | $ \epsilon = 0.1 $ | 0.801 | 0.816 |

    Table 4.  $\mathtt{ionosphere} $ dataset (altered): Values of the area under the ROC curve (AUC) for different values of $ \epsilon $

    | Value of $ \epsilon $ | AUC with KL | AUC with Wasserstein |
    |---|---|---|
    | $ \epsilon = 0 $ (LR) | 0.514 | 0.514 |
    | $ \epsilon = 0.001 $ | 0.816 | 0.840 |
    | $ \epsilon = 0.002 $ | 0.804 | 0.835 |
    | $ \epsilon = 0.003 $ | 0.840 | 0.814 |
    | $ \epsilon = 0.004 $ | 0.824 | 0.830 |
    | $ \epsilon = 0.005 $ | 0.815 | 0.829 |
    | $ \epsilon = 0.006 $ | 0.834 | 0.829 |
    | $ \epsilon = 0.007 $ | 0.821 | 0.815 |
    | $ \epsilon = 0.008 $ | 0.835 | 0.815 |
    | $ \epsilon = 0.009 $ | 0.823 | 0.822 |
    | $ \epsilon = 0.01 $ | 0.828 | 0.835 |
    | $ \epsilon = 0.05 $ | 0.815 | 0.826 |
    | $ \epsilon = 0.1 $ | 0.824 | 0.823 |
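
Tables 3 and 4, as well as Figures 3 to 6, report the area under the ROC curve (AUC). As a reminder of what this number measures, the Python sketch below computes it through the equivalent rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as one half (the Mann-Whitney statistic). The function name, the label convention (1 for the positive class) and the example scores are assumptions for illustration; this is not necessarily how the AUC values above were computed.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example (label 1) scores higher than a randomly chosen
    negative one, with ties counted as 1/2 (Mann-Whitney statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp, sn in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and binary labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(auc(scores, labels))  # 5 of the 6 positive/negative pairs are ranked correctly
```

The pairwise loop costs $O(n_{+} n_{-})$ operations, which is negligible at the sizes listed in Table 2 (at most 351 observations); a sorting-based rank formula would be preferable for much larger test sets.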
