Article Contents
Article Contents

# A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning

• The joint feature selection problem arises in many fields including computer vision, text classification and biomedical informatics. Generally, recent results show that it can be realized by solving a $\ell_{2,1}$-norm involved minimization problem. However, solving the optimization problem is a challenging task due to the non-smoothness of the regularization term. In this paper, we reformulate the problem to an equivalent constrained minimization problem by introducing an auxiliary variable. We split the corresponding augmented Lagrange function and minimize the subproblem alternatively with one variable by fixing the other one. Moreover, we linearize the subproblem and add a proximal-point term when the closed-form solutions are not easily to derived. The convergence analysis and the relatedness with other algorithms are also given. Although the $\ell_{2,1}$-norm is mainly considered, we show that the $\ell_{\infty,1}$-norm penalized learning problem can also be readily solved in our framework. The reported experiments on simulated and real data sets show that the proposed method is effective and promising. The performance comparisons illustrate that the proposed algorithm is competitive with even performs little better than the state-of-the-art solver SLEP.
Mathematics Subject Classification: Primary: 90C30, 90C25; Secondary: 65K05.

 Citation:

•  [1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata, Journal of Machine Learning Research, 6 (2005), 1817-1853. [2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning, Machine Learning, 73 (2008), 243-272. [3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning, Journal of Machine Learning Research, 4 (2003), 83-99. [4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20 (1999), 33-61.doi: 10.1137/S1064827596304010. [5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10 (2009), 2899-2934. [6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, Journal of Machine Learning Research, 6 (2005), 615-637. [7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers & Mathematics with Applications, 2 (1976), 17-40. [8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems," Springer, New York, 1984. [9] R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires, Revue Francaise d'automatique, informatique, recherche opéretionnelle. Analyse numérique, 2 (1975), 41-76. [10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92 (2002), 103-118.doi: 10.1007/s101070100280. [11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities, Journal of Computational Mathematics, 21 (2003), 495-504. [12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression," in "ACM SIGKDD International Conference On KnowledgeDiscovery and Data Mining", 2009. [13] J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization," in "Comference on Uncertainty in Artificial Intelligence", 2009. [14] M. Kowalski, Sparse regression using mixednorms, Applied and Computational Harmonic Analysis, 27 (2009), 303-324.doi: 10.1016/j.acha.2009.05.006. [15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization," Proceedings of the 26th Annual International Conference on Machine Learning, 2009. [16] A. Nemirovski, "Efficient Methods in Convex Programming," Lecture Notes, 1994. [17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course," Kluwer Academic Publishers, 2003. [18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function," CORE report, 2007; available at http://www.ecore.be/DPs/dp_1191313936.pdf. [19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization," Neural Information Processing Systems Foundation, 2010. [20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection," Technical Report, UC Berkeley, 2006. [21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics, 23 (2007), 2507-2517.doi: 10.1093/bioinformatics/btm344. [22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations, Adv. Comput. Math., DOI 10.1007/s10444-011-9261-9. [23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning," in "SIAM International Conference on Data Mining", 2006. [24] M. H. Xu, Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 134 (2007), 107-117.doi: 10.1007/s10957-007-9192-2. [25] J. Yang, Dynamic power price problem: An inverse variational inequality approach, Journal of Industrial and Management Optimization, 4 (2008), 673-684. [26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Math. Comput. doi: 10.1090/S0025-5718-2012-02598-1. [27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), 250-278.doi: 10.1137/090777761. [28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning, Machine Learning, 73 (2008), 221-242.