# American Institute of Mathematical Sciences

October  2012, 8(4): 1057-1069. doi: 10.3934/jimo.2012.8.1057

## A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning

 1 Institute of Applied Mathematics, Henan University, Kaifeng 475004 2 National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan 3 Department of Mathematics, Nanjing University, Nanjing 210093, China

Received  March 2011 Revised  January 2012 Published  September 2012

The joint feature selection problem arises in many fields including computer vision, text classification and biomedical informatics. Generally, recent results show that it can be realized by solving a $\ell_{2,1}$-norm involved minimization problem. However, solving the optimization problem is a challenging task due to the non-smoothness of the regularization term. In this paper, we reformulate the problem to an equivalent constrained minimization problem by introducing an auxiliary variable. We split the corresponding augmented Lagrange function and minimize the subproblem alternatively with one variable by fixing the other one. Moreover, we linearize the subproblem and add a proximal-point term when the closed-form solutions are not easily to derived. The convergence analysis and the relatedness with other algorithms are also given. Although the $\ell_{2,1}$-norm is mainly considered, we show that the $\ell_{\infty,1}$-norm penalized learning problem can also be readily solved in our framework. The reported experiments on simulated and real data sets show that the proposed method is effective and promising. The performance comparisons illustrate that the proposed algorithm is competitive with even performs little better than the state-of-the-art solver SLEP.
Citation: Yunhai Xiao, Soon-Yi Wu, Bing-Sheng He. A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning. Journal of Industrial & Management Optimization, 2012, 8 (4) : 1057-1069. doi: 10.3934/jimo.2012.8.1057
##### References:
 [1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata,, Journal of Machine Learning Research, 6 (2005), 1817.   Google Scholar [2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning,, Machine Learning, 73 (2008), 243.   Google Scholar [3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning,, Journal of Machine Learning Research, 4 (2003), 83.   Google Scholar [4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit,, SIAM Journal on Scientific Computing, 20 (1999), 33.  doi: 10.1137/S1064827596304010.  Google Scholar [5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting,, Journal of Machine Learning Research, 10 (2009), 2899.   Google Scholar [6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods,, Journal of Machine Learning Research, 6 (2005), 615.   Google Scholar [7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations,, Computers & Mathematics with Applications, 2 (1976), 17.   Google Scholar [8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems,", Springer, (1984).   Google Scholar [9] R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires,, Revue Francaise d'automatique, 2 (1975), 41.   Google Scholar [10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities,, Mathematical Programming, 92 (2002), 103.  doi: 10.1007/s101070100280.  Google Scholar [11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities,, Journal of Computational Mathematics, 21 (2003), 495.   Google Scholar [12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression,", in, (2009).   Google Scholar [13] J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization,", in, (2009).   Google Scholar [14] M. Kowalski, Sparse regression using mixednorms,, Applied and Computational Harmonic Analysis, 27 (2009), 303.  doi: 10.1016/j.acha.2009.05.006.  Google Scholar [15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization,", Proceedings of the 26th Annual International Conference on Machine Learning, (2009).   Google Scholar [16] A. Nemirovski, "Efficient Methods in Convex Programming,", Lecture Notes, (1994).   Google Scholar [17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course,", Kluwer Academic Publishers, (2003).   Google Scholar [18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function,", CORE report, (2007).   Google Scholar [19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization,", Neural Information Processing Systems Foundation, (2010).   Google Scholar [20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection,", Technical Report, (2006).   Google Scholar [21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics,, Bioinformatics, 23 (2007), 2507.  doi: 10.1093/bioinformatics/btm344.  Google Scholar [22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations,, Adv. Comput. Math., (): 10444.   Google Scholar [23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning,", in, (2006).   Google Scholar [24] M. H. Xu, Proximal alternating directions method for structured variational inequalities,, Journal of Optimization Theory and Applications, 134 (2007), 107.  doi: 10.1007/s10957-007-9192-2.  Google Scholar [25] J. Yang, Dynamic power price problem: An inverse variational inequality approach,, Journal of Industrial and Management Optimization, 4 (2008), 673.   Google Scholar [26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization,, Math. Comput., ().  doi: 10.1090/S0025-5718-2012-02598-1.  Google Scholar [27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing,, SIAM Journal on Scientific Computing, 33 (2011), 250.  doi: 10.1137/090777761.  Google Scholar [28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning,, Machine Learning, 73 (2008), 221.   Google Scholar

show all references

##### References:
 [1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata,, Journal of Machine Learning Research, 6 (2005), 1817.   Google Scholar [2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning,, Machine Learning, 73 (2008), 243.   Google Scholar [3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning,, Journal of Machine Learning Research, 4 (2003), 83.   Google Scholar [4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit,, SIAM Journal on Scientific Computing, 20 (1999), 33.  doi: 10.1137/S1064827596304010.  Google Scholar [5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting,, Journal of Machine Learning Research, 10 (2009), 2899.   Google Scholar [6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods,, Journal of Machine Learning Research, 6 (2005), 615.   Google Scholar [7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations,, Computers & Mathematics with Applications, 2 (1976), 17.   Google Scholar [8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems,", Springer, (1984).   Google Scholar [9] R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires,, Revue Francaise d'automatique, 2 (1975), 41.   Google Scholar [10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities,, Mathematical Programming, 92 (2002), 103.  doi: 10.1007/s101070100280.  Google Scholar [11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities,, Journal of Computational Mathematics, 21 (2003), 495.   Google Scholar [12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression,", in, (2009).   Google Scholar [13] J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization,", in, (2009).   Google Scholar [14] M. Kowalski, Sparse regression using mixednorms,, Applied and Computational Harmonic Analysis, 27 (2009), 303.  doi: 10.1016/j.acha.2009.05.006.  Google Scholar [15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization,", Proceedings of the 26th Annual International Conference on Machine Learning, (2009).   Google Scholar [16] A. Nemirovski, "Efficient Methods in Convex Programming,", Lecture Notes, (1994).   Google Scholar [17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course,", Kluwer Academic Publishers, (2003).   Google Scholar [18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function,", CORE report, (2007).   Google Scholar [19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization,", Neural Information Processing Systems Foundation, (2010).   Google Scholar [20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection,", Technical Report, (2006).   Google Scholar [21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics,, Bioinformatics, 23 (2007), 2507.  doi: 10.1093/bioinformatics/btm344.  Google Scholar [22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations,, Adv. Comput. Math., (): 10444.   Google Scholar [23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning,", in, (2006).   Google Scholar [24] M. H. Xu, Proximal alternating directions method for structured variational inequalities,, Journal of Optimization Theory and Applications, 134 (2007), 107.  doi: 10.1007/s10957-007-9192-2.  Google Scholar [25] J. Yang, Dynamic power price problem: An inverse variational inequality approach,, Journal of Industrial and Management Optimization, 4 (2008), 673.   Google Scholar [26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization,, Math. Comput., ().  doi: 10.1090/S0025-5718-2012-02598-1.  Google Scholar [27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing,, SIAM Journal on Scientific Computing, 33 (2011), 250.  doi: 10.1137/090777761.  Google Scholar [28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning,, Machine Learning, 73 (2008), 221.   Google Scholar
 [1] Hui Gao, Jian Lv, Xiaoliang Wang, Liping Pang. An alternating linearization bundle method for a class of nonconvex optimization problem with inexact information. Journal of Industrial & Management Optimization, 2021, 17 (2) : 805-825. doi: 10.3934/jimo.2019135 [2] Mohammed Abdulrazaq Kahya, Suhaib Abduljabbar Altamir, Zakariya Yahya Algamal. Improving whale optimization algorithm for feature selection with a time-varying transfer function. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 87-98. doi: 10.3934/naco.2020017 [3] Nicholas Geneva, Nicholas Zabaras. Multi-fidelity generative deep learning turbulent flows. Foundations of Data Science, 2020, 2 (4) : 391-428. doi: 10.3934/fods.2020019 [4] Bing Yu, Lei Zhang. Global optimization-based dimer method for finding saddle points. Discrete & Continuous Dynamical Systems - B, 2021, 26 (1) : 741-753. doi: 10.3934/dcdsb.2020139 [5] Noah Stevenson, Ian Tice. A truncated real interpolation method and characterizations of screened Sobolev spaces. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5509-5566. doi: 10.3934/cpaa.2020250 [6] Peter Frolkovič, Viera Kleinová. A new numerical method for level set motion in normal direction used in optical flow estimation. Discrete & Continuous Dynamical Systems - S, 2021, 14 (3) : 851-863. doi: 10.3934/dcdss.2020347 [7] Marion Darbas, Jérémy Heleine, Stephanie Lohrengel. Numerical resolution by the quasi-reversibility method of a data completion problem for Maxwell's equations. Inverse Problems & Imaging, 2020, 14 (6) : 1107-1133. doi: 10.3934/ipi.2020056 [8] Fabian Ziltener. Note on coisotropic Floer homology and leafwise fixed points. Electronic Research Archive, , () : -. doi: 10.3934/era.2021001 [9] José Madrid, João P. G. Ramos. On optimal autocorrelation inequalities on the real line. Communications on Pure & Applied Analysis, 2021, 20 (1) : 369-388. doi: 10.3934/cpaa.2020271 [10] Knut Hüper, Irina Markina, Fátima Silva Leite. A Lagrangian approach to extremal curves on Stiefel manifolds. Journal of Geometric Mechanics, 2020  doi: 10.3934/jgm.2020031 [11] Javier Fernández, Cora Tori, Marcela Zuccalli. Lagrangian reduction of nonholonomic discrete mechanical systems by stages. Journal of Geometric Mechanics, 2020, 12 (4) : 607-639. doi: 10.3934/jgm.2020029 [12] Manuel de León, Víctor M. Jiménez, Manuel Lainz. Contact Hamiltonian and Lagrangian systems with nonholonomic constraints. Journal of Geometric Mechanics, 2020  doi: 10.3934/jgm.2021001 [13] Jesús A. Álvarez López, Ramón Barral Lijó, John Hunton, Hiraku Nozawa, John R. Parker. Chaotic Delone sets. Discrete & Continuous Dynamical Systems - A, 2021  doi: 10.3934/dcds.2021016 [14] Riccarda Rossi, Ulisse Stefanelli, Marita Thomas. Rate-independent evolution of sets. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 89-119. doi: 10.3934/dcdss.2020304 [15] Djamel Aaid, Amel Noui, Özen Özer. Piecewise quadratic bounding functions for finding real roots of polynomials. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 63-73. doi: 10.3934/naco.2020015 [16] Joan Carles Tatjer, Arturo Vieiro. Dynamics of the QR-flow for upper Hessenberg real matrices. Discrete & Continuous Dynamical Systems - B, 2021, 26 (3) : 1359-1403. doi: 10.3934/dcdsb.2020166 [17] Yifan Chen, Thomas Y. Hou. Function approximation via the subsampled Poincaré inequality. Discrete & Continuous Dynamical Systems - A, 2021, 41 (1) : 169-199. doi: 10.3934/dcds.2020296 [18] Wolfgang Riedl, Robert Baier, Matthias Gerdts. Optimization-based subdivision algorithm for reachable sets. Journal of Computational Dynamics, 2021, 8 (1) : 99-130. doi: 10.3934/jcd.2021005 [19] Vito Napolitano, Ferdinando Zullo. Codes with few weights arising from linear sets. Advances in Mathematics of Communications, 2020  doi: 10.3934/amc.2020129 [20] Lisa Hernandez Lucas. Properties of sets of Subspaces with Constant Intersection Dimension. Advances in Mathematics of Communications, 2021, 15 (1) : 191-206. doi: 10.3934/amc.2020052

2019 Impact Factor: 1.366