October 2012, 8(4): 1057-1069. doi: 10.3934/jimo.2012.8.1057

A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning

1. Institute of Applied Mathematics, Henan University, Kaifeng 475004
2. National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan
3. Department of Mathematics, Nanjing University, Nanjing 210093, China

Received March 2011; Revised January 2012; Published September 2012

The joint feature selection problem arises in many fields, including computer vision, text classification and biomedical informatics. Recent results show that it can be realized by solving an $\ell_{2,1}$-norm regularized minimization problem. However, solving this optimization problem is challenging because the regularization term is nonsmooth. In this paper, we reformulate the problem as an equivalent constrained minimization problem by introducing an auxiliary variable. We split the corresponding augmented Lagrangian function and minimize the subproblems alternately, with respect to one variable while the other is fixed. Moreover, we linearize a subproblem and add a proximal-point term when its closed-form solution cannot be easily derived. The convergence analysis and the relationship with other algorithms are also given. Although the $\ell_{2,1}$-norm is mainly considered, we show that the $\ell_{\infty,1}$-norm penalized learning problem can also be readily solved in our framework. The reported experiments on simulated and real data sets show that the proposed method is effective and promising. The performance comparisons illustrate that the proposed algorithm is competitive with, and even performs slightly better than, the state-of-the-art solver SLEP.
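The alternating scheme summarized above can be illustrated with a small sketch. The Python/NumPy code below is not the authors' algorithm; it is one plausible instance under stated assumptions: all tasks share a single design matrix $A$ (so the model is $\min_X \rho\|X\|_{2,1} + \tfrac12\|AX-Y\|_F^2$, with one column of $Y$ per task), the auxiliary variable is a residual $R$ with the constraint $AX - R = Y$, and the names `prox_l21`, `project_l1_ball`, `prox_linf1`, `lin_adm` and `objective` are hypothetical. The $X$-subproblem has no closed form, so its quadratic term is linearized and a proximal term with parameter $\tau < 1/\lambda_{\max}(A^TA)$ is added, which turns the $X$-step into row-wise group soft-thresholding; the $R$-step and the multiplier update are closed form. Passing the proximal map of the $\ell_{\infty,1}$-norm (computed through projection onto an $\ell_1$-ball) instead shows how the same loop could handle that penalty.

```python
import numpy as np


def prox_l21(V, t):
    """Prox of t*||V||_{2,1}: shrink each row's Euclidean norm by t.

    Rows with norm <= t become exactly zero (a feature dropped for all tasks).
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.zeros_like(norms)
    mask = norms > t
    scale[mask] = 1.0 - t / norms[mask]
    return scale * V


def project_l1_ball(v, z):
    """Euclidean projection of vector v onto {x : ||x||_1 <= z} (sort-based)."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    r = k[u - (css - z) / k > 0][-1]      # number of active entries
    theta = (css[r - 1] - z) / r          # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def prox_linf1(V, t):
    """Prox of t*||V||_{inf,1} (sum of row max-norms) via Moreau decomposition:
    prox_{t||.||_inf}(v) = v - P_{||.||_1 <= t}(v), applied row by row."""
    return np.array([row - project_l1_ball(row, t) for row in V])


def lin_adm(A, Y, rho, prox=prox_l21, beta=1.0, iters=5000):
    """Linearized ADM sketch for min_X rho*||X|| + 0.5*||A X - Y||_F^2.

    Hypothetical splitting (assumption, not necessarily the paper's):
        min rho*||X|| + 0.5*||R||_F^2   s.t.   A X - R = Y.
    """
    X = np.zeros((A.shape[1], Y.shape[1]))
    R = -Y                                 # satisfies A X - R = Y at X = 0
    Lam = np.zeros_like(Y)
    # Proximal parameter: tau < 1 / lambda_max(A^T A) is required.
    tau = 0.99 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        # X-step: linearize beta/2*||A X - R - Y - Lam/beta||^2 at the current X,
        # add beta/(2*tau)*||X - X_k||^2, and solve in closed form with the prox.
        G = A.T @ (A @ X - R - Y - Lam / beta)
        X = prox(X - tau * G, tau * rho / beta)
        # R-step: exact minimizer of 0.5||R||^2 + <Lam, R> + beta/2||A X - Y - R||^2.
        R = (beta * (A @ X - Y) - Lam) / (1.0 + beta)
        # Multiplier update for the constraint A X - R = Y.
        Lam = Lam - beta * (A @ X - R - Y)
    return X


def objective(A, Y, X, rho):
    """rho*||X||_{2,1} + 0.5*||A X - Y||_F^2."""
    return rho * np.linalg.norm(X, axis=1).sum() + 0.5 * np.linalg.norm(A @ X - Y) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 20))      # shared design matrix (assumption)
    W = np.zeros((20, 4))
    W[:3] = rng.standard_normal((3, 4))    # 3 relevant features shared by 4 tasks
    Y = A @ W + 0.01 * rng.standard_normal((60, 4))
    X = lin_adm(A, Y, rho=5.0)
    print("nonzero rows:", np.flatnonzero(np.linalg.norm(X, axis=1)))
```

One can check that with `beta=1` this particular splitting keeps $R + \Lambda = 0$ after the first pass, so the $X$-step coincides with a proximal-gradient step on the original problem; other values of `beta` give a genuinely primal-dual iteration. The values of $\beta$, $\tau$, $\rho$ and the iteration count are illustrative only, not settings taken from the paper.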
Citation: Yunhai Xiao, Soon-Yi Wu, Bing-Sheng He. A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning. Journal of Industrial & Management Optimization, 2012, 8 (4) : 1057-1069. doi: 10.3934/jimo.2012.8.1057
References:
[1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data, Journal of Machine Learning Research, 6 (2005), 1817.

[2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-task feature learning, Machine Learning, 73 (2008), 243.

[3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning, Journal of Machine Learning Research, 4 (2003), 83.

[4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20 (1999), 33. doi: 10.1137/S1064827596304010.

[5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10 (2009), 2899.

[6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, Journal of Machine Learning Research, 6 (2005), 615.

[7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers & Mathematics with Applications, 2 (1976), 17.

[8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems," Springer, (1984).

[9] R. Glowinski and A. Marrocco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet nonlinéaires, Revue Française d'automatique, 2 (1975), 41.

[10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92 (2002), 103. doi: 10.1007/s101070100280.

[11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities, Journal of Computational Mathematics, 21 (2003), 495.

[12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression," in, (2009).

[13] J. Liu, S. Ji and J. Ye, "Multi-Task Feature Learning Via Efficient $l_{2,1}$-Norm Minimization," in, (2009).

[14] M. Kowalski, Sparse regression using mixed norms, Applied and Computational Harmonic Analysis, 27 (2009), 303. doi: 10.1016/j.acha.2009.05.006.

[15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Norm Regularization," Proceedings of the 26th Annual International Conference on Machine Learning, (2009).

[16] A. Nemirovski, "Efficient Methods in Convex Programming," Lecture Notes, (1994).

[17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course," Kluwer Academic Publishers, (2003).

[18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function," CORE report, (2007).

[19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Norms Minimization," Neural Information Processing Systems Foundation, (2010).

[20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection," Technical Report, (2006).

[21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics, 23 (2007), 2507. doi: 10.1093/bioinformatics/btm344.

[22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations, Adv. Comput. Math., (): 10444.

[23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning," in, (2006).

[24] M. H. Xu, Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 134 (2007), 107. doi: 10.1007/s10957-007-9192-2.

[25] J. Yang, Dynamic power price problem: An inverse variational inequality approach, Journal of Industrial and Management Optimization, 4 (2008), 673.

[26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Math. Comput., (). doi: 10.1090/S0025-5718-2012-02598-1.

[27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problems in compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), 250. doi: 10.1137/090777761.

[28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning, Machine Learning, 73 (2008), 221.



