April 2018, 14(2): 719-729. doi: 10.3934/jimo.2017071

Least absolute deviations learning of multiple tasks

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
4. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, China

Received: December 2015. Published: September 2017.

Fund Project: The initial version of this work was done while the first author was a Ph.D. candidate at Nanjing University of Science and Technology

In this paper, we propose a new multitask feature selection model based on least absolute deviations. Due to the inherent nonsmoothness of the $l_1$ norm, however, optimizing this model is challenging. To tackle it efficiently, we introduce an alternating iterative optimization algorithm and, under mild conditions, establish its global convergence. Experimental results and comparisons with the state-of-the-art algorithm SLEP demonstrate the efficiency and effectiveness of the proposed approach in solving multitask learning problems.
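For concreteness, a model of the kind described in the abstract (a least absolute deviations loss with joint-sparsity regularization) can be written as follows; this is a reconstruction from the abstract and the $l_{2,1}$-norm references, and the exact formulation is the paper's problem (5):

```latex
\min_{W \in \mathbb{R}^{d \times k}} \; \|XW - Y\|_{1} + \mu \, \|W\|_{2,1},
\qquad
\|W\|_{2,1} \;=\; \sum_{j=1}^{d} \Big( \sum_{t=1}^{k} W_{jt}^{2} \Big)^{1/2},
```

where the $l_1$ data-fitting term gives the least absolute deviations robustness to outliers and the $l_{2,1}$ regularizer zeroes out entire rows of $W$, selecting features shared across all $k$ tasks.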

Citation: Wei Xue, Wensheng Zhang, Gaohang Yu. Least absolute deviations learning of multiple tasks. Journal of Industrial & Management Optimization, 2018, 14 (2) : 719-729. doi: 10.3934/jimo.2017071
References:

[1] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-task feature learning, Mach. Learn., 73 (2008), 243-272.

[2] Y. Bai and M. Tang, Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett., 21 (2014), 909-913.

[3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res., 4 (2003), 83-99.

[4] S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, in Proc. Int. Conf. Learn. Theory, 2777 (2003), 567-580. doi: 10.1007/978-3-540-45167-9_41.

[5] K. P. Bennett, J. Hu, X. Ji, G. Kunapuli and J.-S. Pang, Model selection via bilevel optimization, in Proc. IEEE Int. Joint Conf. Neural Netw., (2006), 1922-1929.

[6] X. Chen, W. Pan, J. T. Kwok and J. G. Carbonell, Accelerated gradient method for multi-task sparse learning problem, in Proc. IEEE Int. Conf. Data Min., (2009), 746-751. doi: 10.1109/ICDM.2009.128.

[7] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615-637.

[8] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comp. Math. Appl., 2 (1976), 17-40. doi: 10.1016/0898-1221(76)90003-1.

[9] P. Gong, J. Ye and C. Zhang, Multi-stage multi-task feature learning, in Adv. Neural Inf. Process. Syst., (2012), 1988-1996.

[10] P. Gong, J. Ye and C. Zhang, Robust multi-task feature learning, J. Mach. Learn. Res., 14 (2013), 2979-3010.

[11] B. He, L. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Math. Program., 92 (2002), 103-118. doi: 10.1007/s101070100280.

[12] Y. Hu, Z. Wei and G. Yuan, Inexact accelerated proximal gradient algorithms for matrix $l_{2,1}$-norm minimization problem in multi-task feature learning, Stat. Optim. Inf. Comput., 2 (2014), 352-367. doi: 10.19139/106.

[13] M. Kowalski, M. Szafranski and L. Ralaivola, Multiple indefinite kernel learning with mixed norm regularization, in Proc. Int. Conf. Mach. Learn., (2009), 545-552. doi: 10.1145/1553374.1553445.

[14] J. Liu, S. Ji and J. Ye, Multi-task feature learning via efficient $l_{2,1}$-norm minimization, in Proc. Uncertainty Artif. Intell., (2009), 339-348.

[15] J. Liu, S. Ji and J. Ye, SLEP: Sparse Learning with Efficient Projections. Available from: http://yelab.net/software/SLEP.

[16] Y. Lu, Y.-E. Ge and L.-W. Zhang, An alternating direction method for solving a class of inverse semidefinite quadratic programming problems, J. Ind. Manag. Optim., 12 (2016), 317-336. doi: 10.3934/jimo.2016.12.317.

[17] F. Nie, H. Huang, X. Cai and C. Ding, Efficient and robust feature selection via joint $l_{2,1}$-norms minimization, in Adv. Neural Inf. Process. Syst., (2010), 1813-1821.

[18] G. Obozinski, B. Taskar and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statist. Comput., 20 (2010), 231-252. doi: 10.1007/s11222-008-9111-x.

[19] A. Quattoni, X. Carreras, M. Collins and T. Darrell, An efficient projection for $l_{1,\infty}$ regularization, in Proc. Int. Conf. Mach. Learn., (2009), 857-864.

[20] Z. Shen, Z. Geng and J. Yang, Image reconstruction from incomplete convolution data via total variation regularization, Stat. Optim. Inf. Comput., 3 (2015), 1-14. doi: 10.19139/121.

[21] S. Xiang, F. Nie, G. Meng, C. Pan and C. Zhang, Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), 1738-1754.

[22] Y. Xiao, S.-Y. Wu and B. He, A proximal alternating direction method for $l_{2,1}$-norm least squares problem in multi-task feature learning, J. Ind. Manag. Optim., 8 (2012), 1057-1069. doi: 10.3934/jimo.2012.8.1057.

[23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, Probabilistic joint feature selection for multi-task learning, in Proc. SIAM Int. Conf. Data Min., (2007), 332-342. doi: 10.1137/1.9781611972771.30.

[24] W. Xue and W. Zhang, Learning a coupled linearized method in online setting, IEEE Trans. Neural Netw. Learn. Syst., 28 (2017), 438-450. doi: 10.1109/TNNLS.2016.2514413.

[25] W. Xue and W. Zhang, Online weighted multi-task feature selection, in Proc. Int. Conf. Neural Inf. Process., (2016), 195-203. doi: 10.1007/978-3-319-46672-9_23.

[26] H. Yang, M. R. Lyu and I. King, Efficient online learning for multitask feature selection, ACM Trans. Knowl. Discov. Data, 7 (2013), 1-27.

[27] L. Yang, T. K. Pong and X. Chen, Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM J. Imaging Sci., 10 (2017), 74-110. doi: 10.1137/15M1027528.

[28] G. Yu, W. Xue and Y. Zhou, A nonmonotone adaptive projected gradient method for primal-dual total variation image restoration, Signal Process., 103 (2014), 242-249. doi: 10.1016/j.sigpro.2014.02.025.


Figure 1.  The first row shows the convergence results of SLEP and LADL21 when $Cov = \mathrm{diag}\{1, 0.25, 0.1, 0.05, 0.01\}$; the second row shows the convergence results when $Cov = \mathrm{diag}\{0.81, 0.64, 0.49, 0.36, 0.25, 0.16, 0.09, 0.04\}$.
Figure 2.  Comparison results of SLEP and LADL21 on the School data
Input: $\mathbf{r}_0$, $\boldsymbol{\lambda}_0$, $\mu$, $\beta$, and $\tau$.
1: while not converged do
2:   Compute $W_{k+1}$ according to (15) for given $\mathbf{r}_k$ and $\boldsymbol{\lambda}_k$.
3:   Compute $\mathbf{r}_{k+1}$ according to (17) for given $W_{k+1}$ and $\boldsymbol{\lambda}_k$.
4:   Update the multipliers $\boldsymbol{\lambda}_{k+1}$ according to (18).
5: end while
Algorithm 1: LADL21, an efficient iterative algorithm for solving the optimization problem in (5).
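The alternating scheme in Algorithm 1 can be sketched in Python as a linearized ADMM on the splitting $\mathbf{r} = XW - Y$. This is only a sketch: the paper's exact updates (15), (17), (18) are not reproduced in this excerpt, so the $W$-step below uses a proximal-gradient surrogate, and the function name `lad_l21` and the single shared design matrix `X` are illustrative assumptions; the inputs mirror those of Algorithm 1.

```python
import numpy as np

def row_shrink(A, t):
    # Row-wise l2 soft-thresholding: proximal operator of t * ||.||_{2,1}
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * A

def soft(A, t):
    # Elementwise soft-thresholding: proximal operator of t * ||.||_1
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def lad_l21(X, Y, mu=0.1, beta=1.0, tau=1.0, iters=200):
    """Sketch of LADL21: min ||XW - Y||_1 + mu * ||W||_{2,1} via ADMM."""
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    R = np.zeros((n, k))          # splitting variable r for XW - Y
    Lam = np.zeros((n, k))        # Lagrange multipliers lambda
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the quadratic term
    for _ in range(iters):
        # W-step: one gradient step on the augmented quadratic, then the
        # row-wise shrinkage that induces joint feature sparsity.
        G = X.T @ (X @ W - Y - R + Lam / beta)
        W = row_shrink(W - G / L, mu / (beta * L))
        # r-step: elementwise soft-thresholding (prox of the LAD loss).
        R = soft(X @ W - Y + Lam / beta, 1.0 / beta)
        # Multiplier update with step length tau.
        Lam = Lam + tau * beta * (X @ W - Y - R)
    return W
```

On noiseless synthetic data with a row-sparse ground truth, this sketch recovers a weight matrix whose residual shrinks as the iterations proceed, mirroring the convergence behavior reported in Figure 1.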
Table 1.  Comparison of the RelErr

(m, n, k)           SLEP      LADL21
(10000, 20, 100)    0.0139    0.0025
(20000, 20, 200)    0.0136    0.0017
(30000, 20, 300)    0.0140    0.0015
(40000, 20, 400)    0.0138    0.0016
(10000, 30, 100)    0.0141    0.0016
(20000, 30, 200)    0.0199    0.0016
(30000, 30, 300)    0.0143    0.0017
(40000, 30, 400)    0.0202    0.0018
(10000, 40, 100)    0.0187    0.0018
(20000, 40, 200)    0.0146    0.0018
(30000, 40, 300)    0.0154    0.0020
(40000, 40, 400)    0.0188    0.0021
(10000, 50, 100)    0.0170    0.0020
(20000, 50, 200)    0.0179    0.0022
(30000, 50, 300)    0.0177    0.0022
(40000, 50, 400)    0.0176    0.0024
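The RelErr values in Table 1 measure how closely each solver recovers the underlying weight matrix. The precise definition is not given in this excerpt; a common choice, assumed here, is the relative Frobenius-norm error:

```python
import numpy as np

def rel_err(W_est, W_true):
    # Relative recovery error in the Frobenius norm
    # (assumed definition of RelErr; not stated in this excerpt).
    return np.linalg.norm(W_est - W_true) / np.linalg.norm(W_true)
```

Under this reading, LADL21's RelErr of roughly 0.002 versus SLEP's 0.014-0.020 corresponds to an order-of-magnitude more accurate recovery across all problem sizes.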
