Advanced Search
Article Contents
Article Contents

Fused LASSO penalized least absolute deviation estimator for high dimensional linear regression

  • * Corresponding author: Lingchen Kong

    * Corresponding author: Lingchen Kong

This paper was prepared at the occasion of The 10th International Conference on Optimization: Techniques and Applications (ICOTA 2016), Ulaanbaatar, Mongolia, July 23-26,2016, with its Associate Editors of Numerical Algebra, Control and Optimization (NACO) being Prof. Dr. Zhiyou Wu, School of Mathematical Sciences, Chongqing Normal University, Chongqing, China, Prof. Dr. Changjun Yu, Department of Mathematics and Statistics, Curtin University, Perth, Australia, and Shanghai University, China, and Prof. Gerhard-Wilhelm Weber, Middle East Technical University, Ankara, Turkey

The work was supported in part by National Natural Science Foundation of China (11671029), and the Fundamental Research Funds for the Central Universities (2016JBM081).
Abstract Full Text(HTML) Figure(4) / Table(5) Related Papers Cited by
  • The least absolute shrinkage and selection operator (LASSO) has been playing an important role in variable selection and dimensionality reduction for high dimensional linear regression under the zero-mean or Gaussian assumptions of the noises. However, these assumptions may not hold in practice. In this case, the least absolute deviation is a popular and useful method. In this paper, we focus on the least absolute deviation via Fused LASSO, called Robust Fused LASSO, under the assumption that the unknown vector is sparsity for both the coefficients and its successive differences. Robust Fused LASSO estimator does not need any knowledge of standard deviation of the noises or any moment assumptions of the noises. We show that the Robust Fused LASSO estimator possesses near oracle performance, i.e. with large probability, the $\ell_2$ norm of the estimation error is of order $O(\sqrt{k(\log p)/n})$ . The result is true for a wide range of noise distributions, even for the Cauchy distribution. In addition, we apply the linearized alternating direction method of multipliers to find the Robust Fused LASSO estimator, which possesses the global convergence. Numerical results are reported to demonstrate the efficiency of our proposed method.

    Mathematics Subject Classification: Primary: 65K05, 90C90; Secondary: 97K80.


    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Results of RFLASSO when $i = 2, \sigma = 0.001$

    Figure 2.  Results of FLASSO when $i = 2, \sigma = 0.001$

    Figure 3.  Results for RFLASSO when $i = 2, \sigma = 0.001$

    Figure 4.  Results for FLASSO when $i = 2, \sigma = 0.001$

    Table 1.  Iterative scheme of LADMM algorithm for RFLASSO

    LADMM Algorithm for RFLASSO
    Input: $X$ , $Y$ , $\tau$ , $\lambda_1>0, \lambda_2>0, \mu>0, \nu>0$ and $\eta>\rho(\mu X^TX+\nu A^TA)$ , where $\rho(\cdot)$ denotes the spectral radius.
    Select $(\beta^0, \gamma^0, \tau^0, \alpha^0, \theta^0)=(1, 1, 1, 1, 1)$ ;
    For $k=1, 2, \cdots, n $
               Compute $\beta^{k+1}$ by (23),
               Compute $\gamma^{k+1}$ by (24),
               Compute $\tau^{k+1}$ by (25),
               Update $\alpha^{k+1}=\alpha^k-\nu(A\beta^{k+1}-\gamma^{k+1})$ ,
               Update $\theta^{k+1}=\theta^k-\mu(y-X\beta^{k+1}-\tau^{k+1})$ .
    Output: $(\beta^n, \gamma^n, \tau^n, \alpha^n, \theta^n)$ as an approximate solution of (3).
     | Show Table
    DownLoad: CSV

    Table 2.  $\rho(0.5X^TX+0.5A^TA)$ for design matrix with unit column norms

    i1 2 3 4
    $\sigma=0.001$ 5.246 5.289 5.324 5.344
    $\sigma=0.005$ 5.265 5.296 5.333 5.350
    $\sigma=0.01$ 5.273 5.301 5.342 5.356
     | Show Table
    DownLoad: CSV

    Table 3.  The results for RFLASSO with size $(n, p, g) = (360i, 1280i, 160i)$

    i $\sigma$ Iter CPU Obj Error
    $i=1$ $\sigma=0.001$ 156 0.424 31.052 20.3164
    $\sigma=0.005$ 165 0.430 32.276 23.3330
    $\sigma=0.01$ 175 0.463 33.364 23.5509
    $i=2$ $\sigma=0.001$ 150 1.783 48.669 24.9877
    $\sigma=0.005$ 168 1.953 59.415 25.5110
    $\sigma=0.01$ 189 2.113 67.510 31.8536
    $i=3$ $\sigma=0.001$ 169 4.268 87.465 33.2984
    $\sigma=0.005$ 154 4.307 96.761 37.3335
    $\sigma=0.01$ 167 4.109 91.263 34.8521
    $i=4$ $\sigma=0.001$ 166 7.525 123.087 40.0416
    $\sigma=0.005$ 167 7.382 132.316 41.0560
    $\sigma=0.01$ 157 6.833 137.114 44.1079
     | Show Table
    DownLoad: CSV

    Table 4.  Selected results for RFLASSO and FLASSO

    $\sharp{\{~|\widehat{\beta}_i|<0.1, ~i\in G^c}\}$ 2240 2240
    $\text{max}_{i\in G^c}|\widehat{\beta}_i|$ 0.0147 0.0153
    $\sharp{\{~|\widehat{\beta}_i-\beta^*_i|<0.1, ~i\in G}\}$ 320 320
    $\text{max}_{i\in G}|\widehat{\beta}_i-\beta^*_i|$ 0.1447 0.1365
     | Show Table
    DownLoad: CSV

    Table 5.  Selected results for RFLASSO and FLASSO

    $\sharp{\{~|\widehat{\beta}_i|<0.1, ~i\in G^c}\}$ 2240 2372
    $\text{max}_{i\in G^c}|\widehat{\beta}_i|$ 0.0442 0.0997
    $\sharp{\{~|\widehat{\beta}_i-\beta^*_i|<0.1, ~i\in G}\}$ 320 188
    $\text{max}_{i\in G}|\widehat{\beta}_i-\beta^*_i|$ 0.1329 1.7218
     | Show Table
    DownLoad: CSV
  • [1] S. BoydN. ParikhE. ChuB. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning, 3 (2010), 1-122. 
    [2] E. Candès, Compressive sampling, In Proceedings of the International Congress of Mathematicians, 3 (2006), 1433-1452.
    [3] E. Candès and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, 51 (2005), 4203-4215. 
    [4] T. CaiL. Wang and G. Xu, Shifting inequality and recovery of sparse signals, IEEE Transactions on Signal Processing, 58 (2010), 1300-1308. 
    [5] H.B. Chen, Y.J. Wang, G. Wang, Strong convergence of extragradient method for generalized variational inequalities in Hilbert space, Journal of Inequalities and Applications, 2014 (2014), 223, 11 pages.
    [6] W. Deng and W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers, Journal of Scientific Computing, 66 (2016), 889-916. 
    [7] J. Friedman, Multivariate adaptive regression splines, The Annals of Statistics, (1991), 1-67. 
    [8] J. FriedmanT. HastieH. Höfling and R. Tibshirani, Pathwise coordinate optimization, Annals of Applied Statistics, 1 (2007), 302-332. 
    [9] M. Grant, S. Boyd and Y. Ye, CVX: Matlab Software for Displined Convex Programming, 2009.
    [10] D. HanX. Yuan and W. Zhang, An augmented Lagrangian based parallel splitting method for separable convex minimization with applications to image processing, Mathematics of Computation, 83 (2014), 2263-2291. 
    [11] H. Höfling, A path algorithm for the fused lasso signal approximator, Journal of Computational and Graphical Statistics, 19 (2009), 984-1006. 
    [12] E. Koc and H. Bozdogan, Model selection in multivariate adaptive regression splines (MARS) using information complexity as the fitness function, Machine Learning, 101 (2015), 35-58. 
    [13] X. LiL. MoX. Yuan and J. Zhang, Linearized alternating direction direction method of multipliers for sparse group and fused-LASSO models, Computayional Statistics & Data Analysis, 79 (2013), 203-221. 
    [14] J. Liu, S. Ji and J. Ye, SLEP: sparse learning with efficient projections, http://www.public.asu.edu/~jye02/Software/SLEP, 2009.
    [15] H. Ma and Y. Jia, Stability Analysis For Stochastic Differential Equations With Infinite Markovian Switchings, Journal of Mathematical Analysis and Applications, 435 (2016), 593-605.
    [16] S. MaL. Xue and H. Zou, Alternating direction methods for latent variable Gaussian graphical model selection, Neural Computation, 25 (2013), 2172-2198. 
    [17] D. MartinezD. ShihV. Chen and S. Kim, A convex version of multivariate adaptive regression splines, Computational Statistics & Data Analysis, 81 (2015), 89-106. 
    [18] B. Pete and V. Sara, Statistics for High-dimensional Data: Methods, Theory and Applications, Springer, 2011.
    [19] A. Rinaldo, Properties and refinements of the fused Lasso, Annnal of Statistics, 37 (2009), 2922-2952. 
    [20] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), (1996), 267-288. 
    [21] R. TibshiraniM. SaundersS. RossetJ. Zhu and K. Knight, Sparsity and smoothness via the fused Lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2005), 91-108. 
    [22] R. Tibshirani and P. Wang, Spatial smoothing and hot pot detection for CGH Data using the fused Lasso, Biostatistics, 9 (2008), 18-29. 
    [23] T. Wu and K. Lange, Coordinate decent algorithms for Lasso penalized regression, Annals of Applied Statistics, (2008), 224-244. 
    [24] H. WangG. Li and G. Jiang, Robust regression shrinkage and consistent variable selection through the LAD-lasso, Journal of Business and Economic Statististics, 25 (2007), 347-355. 
    [25] L. Wang, L1 penalized LAD estimator for high dimensional linear regression, Journal of Multivariate Analysis, 120 (2013), 135-151. 
    [26] M. WangL. Song and G. Tian, SCAD-penalized least absolute deviation regression in high dimensional models, Communications in Statistics-Theory and Methods, 44 (2015), 2452-2472. 
    [27] X. Wang and X. Yuan, The linearized alternating direction method of multipliers for Dantzig Selector, Siam Journal on Scientific Computing, 34 (2012), A2793-A2811. 
    [28] G. Weberİ. BatmazG. KöksalP. Taylan and F. Yerlikaya-özkurt, CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization, Inverse Problems in Science and Engineering, 20 (2012), 371-400. 
    [29] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Mathematics of Computation, 82 (2013), 301-329. 
    [30] X. ZhangM. Burger and S. Osher, A unified primal-dual algorithm framwork based on Bregman iteration, Journal of Scientific Computing, 46 (2011), 20-46. 
    [31] H. ZhangG. WahbaY. LinM. VoelkerM. FerrisR. Klein and B. Klein, Variable selection and model building via likelihood basis pursuit, Journal of the American Statistical Association, 96 (2014), 659-672. 
    [32] H. ZouT. Hastie and R. Tibshirani, Sparse principle component analysis, Journal of Computational and Graphical Statistics, 15 (2006), 265-286. 
  • 加载中




Article Metrics

HTML views(1362) PDF downloads(437) Cited by(0)

Access History



    DownLoad:  Full-Size Img  PowerPoint