\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

Optimal estimation of $\ell_1$-regularization prior from a regularized empirical Bayesian risk standpoint

Abstract Related Papers Cited by
  • We address the problem of prior matrix estimation for the solution of $\ell_1$-regularized ill-posed inverse problems. From a Bayesian viewpoint, we show that such a matrix can be regarded as an influence matrix in a multivariate $\ell_1$-Laplace density function. Assuming a training set is given, the prior matrix design problem is cast as a maximum likelihood term with an additional sparsity-inducing term. This formulation results in an unconstrained yet nonconvex optimization problem. Memory requirements as well as computation of the nonlinear, nonsmooth sub-gradient equations are prohibitive for large-scale problems. Thus, we introduce an iterative algorithm to design efficient priors for such large problems. We further demonstrate that the solutions of ill-posed inverse problems by incorporation of $\ell_1$-regularization using the learned prior matrix perform generally better than commonly used regularization techniques where the prior matrix is chosen a-priori.
    Mathematics Subject Classification: Primary: 49N45; Secondary: 62F15.

    Citation:

    \begin{equation} \\ \end{equation}
  • [1]

    H. Akaike, A new look at the statistical model identification, IEEE Trans. Auto. Cont., 19 (1974), 716-723.doi: 10.1109/TAC.1974.1100705.

    [2]

    A. Aravkin, An $l_1$-Laplace robust kalman smoother, IEEE Trans. Auto. Cont., 56 (2011), 2898-2911.doi: 10.1109/TAC.2011.2141430.

    [3]

    U. Ascher, E. Haber and H. Huang, On effective methods for implicit piecewise smooth surface recovery, SIAM J. Scient. Comput., 28 (2006), 339-358.doi: 10.1137/040617261.

    [4]

    U. Ascher, K. van den Doel, H. Huang and B. Svaiter, Gradient descent and fast artificial time integration, M2AN, Math. Model. Numer. Anal., 43 (2009), 689-708.

    [5]

    J. A. Bilmes, "Factored Sparse Inverse Covariance Matrices," Proc. IEEE Intl. Conf. on Acous., Spee., and Sig. Proc., 2000.

    [6]

    A. M. Bruckstein, D. L. Donoho and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, 51 (2009), 34-81.doi: 10.1137/060657704.

    [7]

    E. J. Candès and D. L. Donoho, "Curvelets: A Surprisingly Effective Nonadaptive Representation for Objects with Edges," 1999.

    [8]

    E. J. Candès and M. B. Wakin, An introduction to compressive sampling, IEEE In Sig. Proc. Magaz., 25 (2008), 21-30.doi: 10.1109/MSP.2007.914731.

    [9]

    T. Chan and J. Shen, "Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods," SIAM, 2005.

    [10]

    X. Chen and W. Zhou., Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 3 (2010), 765-790.doi: 10.1137/080740167.

    [11]

    M. Cheney, D.Isaacson and J. C. Newell, Electrical impedance tomography, SIAM Review, 41 (1999), 85-101.doi: 10.1137/S0036144598333613.

    [12]

    J. Dahl, L. Vandenberghe and V. Roychowdhury, Covariance selection for non-chordal graphs via chordal embedding, Optim. Meth. and Soft., 23 (2008), 501-520.doi: 10.1080/10556780802102693.

    [13]

    Y. H. Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, Numer. Math., 100 (2005), 21-47.doi: 10.1007/s00211-004-0569-y.

    [14]

    A. d'Aspremont, O. Banerjee and L. E. Ghaoui, First-order methods for sparse covariance selection, SIAM J. Matrix Anal. Appl., 30 (2008), 56-66.doi: 10.1137/060670985.

    [15]

    M. N. Do and M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Trans. on Image Proc., 14 (2005), 2091-2106.doi: 10.1109/TIP.2005.859376.

    [16]

    W. Dong, L. Zhang, G. Shi and X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Proc., 20 (2011), 1838-1857.doi: 10.1109/TIP.2011.2108306.

    [17]

    M. Elad, P. Milanfar and R. Rubinstein, Analysis versus synthesis in signal priors, Inver. Problems, 23 (2007), 947-968.doi: 10.1088/0266-5611/23/3/007.

    [18]

    K. Engan, K. Skretting and J. H. Husφy, Family of iterative ls-based dictionary learning algorithms, ils-dla, for sparse signal representation, Digit. Sig. Proc., 17 (2007), 32-49.doi: 10.1016/j.dsp.2006.02.002.

    [19]

    M. A. T. Figueiredo, R. D. Nowak and S. J. Wright, Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems, IEEE J. on Sig. Proc., 1 (2007), 586-597.

    [20]

    M. A. T. Figueiredo, R. D. Nowak and S. J. Wright, Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems, IEEE J. in Sig. Proc., 1 (2007), 586-597.

    [21]

    J. Friedman, T. Hastie and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9 (2008), 432-441.doi: 10.1093/biostatistics/kxm045.

    [22]

    A. Genkin, D. Lewis and D. Madigan, Large-scale bayesian logistic regression for text categorization, Technometrics, 49 (2007), 291-304.doi: 10.1198/004017007000000245.

    [23]

    M. Girolami, A variational method for learning sparse and overcomplete representations, Neu. Comput., 13 (2001), 2517-2532.doi: 10.1162/089976601753196003.

    [24]

    G. H. Golub, M. T. Heath and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979), 215-223.doi: 10.1080/00401706.1979.10489751.

    [25]

    E. Haber, U. Ascher and D. Oldenburg, Inversion of 3D electromagnetic data in frequency and time domain using an inexact all-at-once approach, Geophysics, 69 (2004), 1216-1228.

    [26]

    E. Haber, D. Oldenburg and A. Celler, Recovery of dynamic parameters in SPECT, IEEE trans. on Nuc. Sci., 44 (1997), 2425-2430.

    [27]

    P. C. Hansen, "Rank Deficient and Ill-Posed Problems," SIAM, Philadelphia, 1998.doi: 10.1137/1.9780898719697.

    [28]

    L. Horesh and E. Haber, Sensitivity computation of the $l_1$ minimization problem and its application to dictionary design of ill-posed problems, Inver. Problems, 25 (2009), 20pp.doi: 10.1088/0266-5611/25/9/095009.

    [29]

    N. Johnson and S. Kotz, "Continuous Multivariate Distributions," Wiley, New York, 1972.

    [30]

    S. M. Kakade, S. Shalev-Shwartz and A. Tewari, "Regularization Techniques for Learning with Matrices," J. Mach. Learning Research, accepted, 2012.

    [31]

    S. Kotz, T. J. Kozubowski and K. Podgórski, "Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance," Birkhäuser, Boston, 2001.

    [32]

    K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T. W. Lee and T. J. Sejnowski, Dictionary learning algorithms for sparse representation, Neu. Comput., 15 (2003), 349-396.doi: 10.1162/089976603762552951.

    [33]

    G. Kutyniok, D. Labate, W. Q. Lim and G. Weiss, Sparse multidimensional representation using shearlets, in "Proc. of SPIE," 5914, (2005), 254-262.doi: 10.1117/12.613494.

    [34]

    M. S. Lewicki and T. J. Sejnowski, Learning overcomplete representations, Neu. Comput., 12 (2000), 337-365.doi: 10.1162/089976600300015826.

    [35]

    A. S. Lewis and M. L. Overton, "Nonsmooth Optimization Via BFGS," SIAM J. Optim., in press, 2011.

    [36]

    Z. Lu, Smooth optimization approach for sparse covariance selection, SIAM J. Optim., 19 (2009), 807-1827.doi: 10.1137/070695915.

    [37]

    J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, Supervised dictionary learning, 21 (2009), 1033-1040.

    [38]

    S. Mallat, "A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way," Academic Press, 2008.

    [39]

    D. Mihovilovic and R. N. Bracewell, Adaptive chirplet representation of signals on time-frequency plane, Electronics Letters, 27 (1991), 1159-1161.doi: 10.1049/el:19910723.

    [40]

    J. F. Murray and K. Kreutz-Delgado, Learning sparse overcomplete codes for images, J. VLSI Sig. Proc., 45 (2006), 97-110.doi: 10.1007/s11265-006-9774-5.

    [41]

    C. Papageorgiou and T. Poggio, "A Trainable Object Detection System: Car Detection in Static Images," Technical Report 1673, mitai, (CBCL Memo 180), October 1999.

    [42]

    A. Pázman, "Foundations of Optimum Experimental Design," Springer-Verlag, New York, 1986.

    [43]

    E. Le Pennec and S. Mallat, Sparse geometric image representations with bandelets, IEEE Trans. on Image Proc., 14 (2005), 423-438.doi: 10.1109/TIP.2005.843753.

    [44]

    G. Peyré and J. Fadili, Learning analysis sparsity priors, in "Proc. of Sampta'11," 2011.

    [45]

    R. Rubinstein, M. Zibulevsky and M. Elad, Double sparsity: learning sparse dictionaries for sparse signal approximation, IEEE Trans. on Sig. Proc., 58 (2010), 1553-1564.

    [46]

    G. Sapiro, "Geometric Partial Differential Equations and Image Analysis," Cambridge, 2001.

    [47]

    O. Scherzer, Scale-space methods and regularization for denoising and inverse problems, Advances in Image and Elect. Phys., 128 (2003), 445-530.

    [48]

    G. Schwarz, Estimating the dimension of a model, The Annals of Stat., 6 (1978), 461-464.doi: 10.1214/aos/1176344136.

    [49]

    D. F. Shanno and K. Phua, Matrix conditioning and nonlinear optimization, Math. Program., 14 (1978), 149-160.doi: 10.1007/BF01588962.

    [50]

    V. Sindhwani and A. Ghoting, "Large-Scale Distributed Non-Negative Sparse Coding and Sparse Dictionary Learning," SIGKDD, 2012.

    [51]

    M. Sugiyama and H. Ogawa, Optimal design of regularization term and regularization parameter by subspace information criterion, Neu. Netw., 15 (2002), 349-361.doi: 10.1016/S0893-6080(02)00022-9.

    [52]

    R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal. Statist. Soc B., 58 (1996), 267-288.

    [53]

    A. N. Tychonoff, Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., 4 (1963), 1035-1038.

    [54]

    E. van den Berg and M. P. Friedlander, Probing the pareto frontier for basis pursuit solutions, SIAM J. Scient. Comput., 31 (2008), 890-912.doi: 10.1137/080714488.

    [55]

    V. Vapnik, "Statistical Learning Theory," Wiley, 1998.

    [56]

    C. R. Vogel, "Computational Methods for Inverse Problem," SIAM, Philadelphia, 2002.doi: 10.1137/1.9780898717570.

    [57]

    H. Wang, B. Li and C. Leng, Shrinkage tuning parameter selection with a diverging number of parameters, J. R. Statist. Soc. B, 71 (2009), 671-683.doi: 10.1111/j.1467-9868.2008.00693.x.

    [58]

    D. P. Wipf and B. D. Rao, Sparse bayesian learning for basis selection, IEEE Trans. on Sig. Proc., 52 (2004), 2153-2164.

    [59]

    Z. J. Xiang, H. Xu and P. J. Ramadge, Learning sparse representations of high dimensional data on large scale dictionaries, 24 (2011), 900-908.

    [60]

    X. Chen and C. Zhang, Smoothing projected gradient method and its application to stochastic linear complementarity problems, SIAM J. Optim., 20, (2009), 627-649.doi: 10.1137/070702187.

  • 加载中
SHARE

Article Metrics

HTML views() PDF downloads(140) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return