\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

A variance-reduced stochastic gradient tracking algorithm for decentralized optimization with orthogonality constraints

  • *Corresponding author: Xin Liu

The first author is supported by the National Key R&D Program of China (No. 2020YFA0711900, 2020YFA0711904). The second author is supported in part by the National Natural Science Foundation of China (No. 12125108, 11971466, 12288201, 12021001, 11991021) and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (No. ZDBS-LY-7022).

Abstract
  • Decentralized optimization with orthogonality constraints arises widely in scientific computing and data science. Since the orthogonality constraints are nonconvex, designing efficient algorithms is quite challenging. Existing approaches leverage geometric tools from Riemannian optimization to solve this problem, at the cost of high sampling and communication complexities. To relieve this difficulty, we propose a variance-reduced stochastic gradient tracking (VRSGT) algorithm, built on two novel techniques that waive the orthogonality constraints, which converges to a stationary point at a rate of $ O(1 / k) $. To the best of our knowledge, VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces the sampling and communication complexities simultaneously. In the numerical experiments, VRSGT shows promising performance in a real-world autonomous driving application.

    Mathematics Subject Classification: Primary: 90C06, 90C30; Secondary: 90C15.
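    To make the mechanism concrete, below is a minimal sketch of decentralized gradient tracking on a PCA-type instance, with the orthogonality constraint handled through a quadratic penalty rather than a retraction. Everything here (the ring network, the penalty parameter `beta`, the step size, and the local loss $ -\mathrm{tr}(X^\top A_i X) $) is an illustrative assumption; it is not the paper's exact VRSGT update.

```python
# Minimal sketch (illustrative assumptions throughout): decentralized gradient
# tracking for a PCA-type problem min_X -tr(X^T A X) s.t. X^T X = I, with the
# objective split across n agents and the constraint handled by a quadratic
# penalty instead of a retraction. This is NOT the paper's exact VRSGT update:
# full local gradients are used here, with no sampling or variance reduction.
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 20, 3, 4                          # dimension, rank, number of agents
Z = rng.standard_normal((200, d))           # synthetic data, split row-wise
A_local = [Z[i::n].T @ Z[i::n] / Z[i::n].shape[0] for i in range(n)]

# Doubly stochastic mixing matrix for a ring network
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = W[i, (i - 1) % n] = 0.25

beta, step = 10.0, 1e-3                     # penalty parameter and step size
I_r = np.eye(r)

def local_grad(i, X):
    # Gradient of -tr(X^T A_i X) + (beta/4) * ||X^T X - I||_F^2
    return -2.0 * A_local[i] @ X + beta * X @ (X.T @ X - I_r)

X = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(n)]
G = [local_grad(i, X[i]) for i in range(n)]          # gradient trackers

for _ in range(2000):
    # Consensus step plus descent along the tracked gradient
    X_new = [sum(W[i, j] * X[j] for j in range(n)) - step * G[i]
             for i in range(n)]
    # Tracker update: G asymptotically follows the network-average gradient
    G = [sum(W[i, j] * G[j] for j in range(n))
         + local_grad(i, X_new[i]) - local_grad(i, X[i]) for i in range(n)]
    X = X_new

print("feasibility violation:", np.linalg.norm(X[0].T @ X[0] - I_r))
```

    Roughly, VRSGT replaces the full local gradients above with variance-reduced stochastic estimates, which is what the sampling column of Table 1 below quantifies.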

  • Figure 1.  Comparison of VRSGT for different values of $ \beta $

    Figure 2.  Comparison between VRSGT and DRSGD in solving the decentralized PCA problem

    Figure 3.  Recovery results of four frames from the KITTI dataset with inliers in blue and outliers in red. Inliers and outliers are separated by thresholding each point's distance to the hyperplane recovered by each tested algorithm
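
    For concreteness, the figure's inlier/outlier split can be reproduced by a distance threshold along the lines of the sketch below; the helper name `split_inliers_outliers` and the default `tau` are hypothetical, and the hyperplane is assumed to pass through the origin, as in dual principal component pursuit [43].

```python
# Hypothetical helper: split points by their distance to a recovered hyperplane
# through the origin with unit normal b (the threshold tau is illustrative).
import numpy as np

def split_inliers_outliers(points, b, tau=0.1):
    """points: (N, d) array; b: (d,) unit normal. Returns (inliers, outliers)."""
    dist = np.abs(points @ b)        # |<x, b>| is the distance to the hyperplane
    return points[dist <= tau], points[dist > tau]
```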

    Table 1.  A comparison of sampling and communication complexities to find a first-order $ \epsilon $-stationary point (to be defined in Definition 2.3) for decentralized algorithms for problem (3)

    | Algorithm | Sampling complexity | Communication complexity |
    | --- | --- | --- |
    | DRGTA [5] | $ O(dl / \epsilon) $ | $ O(t / \epsilon) $¹ |
    | DRSGD [5] | $ O(d / \epsilon^2) $ | $ O(t / \epsilon^2) $¹ |
    | VRSGT (this work) | $ O(d\sqrt{l} / \epsilon) $ | $ O(1 / \epsilon) $ |

    ¹ $ t \geq \lceil O(- \log_{\lambda} (2 \sqrt{d})) \rceil $. For sparse networks, $ t $ can be $ O(d^2 \log (d)) $ [5].
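    For intuition on why variance reduction improves the sampling complexity in Table 1, a generic SVRG-style estimator is sketched below; this is a textbook construction, not necessarily the exact estimator inside VRSGT.

```python
# Generic SVRG-style gradient estimator (textbook construction, shown only to
# explain the sampling-complexity gap in Table 1; VRSGT's estimator may differ).
# grads(i, x) returns the gradient of the i-th of l component functions at x;
# full_ref = (1/l) * sum_i grads(i, x_ref) is computed once per snapshot x_ref.
def svrg_gradient(grads, i, x, x_ref, full_ref):
    # Unbiased: averaging over i recovers the full gradient at x.
    # Low variance: the correction term vanishes as x approaches x_ref.
    return grads(i, x) - grads(i, x_ref) + full_ref
```

    Roughly speaking, because the estimator's variance shrinks as the iterates stabilize, the step size need not decay, which removes a factor of $ 1 / \epsilon $ relative to plain stochastic gradients.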

    Table 2.  Recovery errors of VRSGT and DRSGD on selected frames from the KITTI dataset

    | Dataset | Frame | VRSGT | DRSGD |
    | --- | --- | --- | --- |
    | KITTI-CITY-5 | 1 | 1.074e-02 | 1.063e-01 |
    | KITTI-CITY-5 | 45 | 7.997e-03 | 9.618e-02 |
    | KITTI-CITY-48 | 0 | 3.399e-02 | 1.646e-01 |
    | KITTI-CITY-71 | 328 | 4.088e-02 | 2.144e-01 |
  • [1] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. doi: 10.1515/9781400830244.
    [2] M. Arjovsky, A. Shah and Y. Bengio, Unitary evolution recurrent neural networks, in International Conference on Machine Learning, PMLR, 2016, 1120-1128.
    [3] T.-H. Chang, M. Hong, H.-T. Wai, X. Zhang and S. Lu, Distributed learning in the nonconvex world: From batch data to streaming and beyond, IEEE Signal Processing Magazine, 37 (2020), 26-38. doi: 10.1109/MSP.2020.2970170.
    [4] T.-H. Chang, M. Hong and X. Wang, Multi-agent distributed optimization via inexact consensus ADMM, IEEE Transactions on Signal Processing, 63 (2015), 482-497. doi: 10.1109/TSP.2014.2367458.
    [5] S. Chen, A. Garcia, M. Hong and S. Shahrampour, Decentralized Riemannian gradient descent on the Stiefel manifold, in Proceedings of the 38th International Conference on Machine Learning, vol. 139, PMLR, 2021, 1594-1605.
    [6] T. Ding, Z. Zhu, T. Ding, Y. Yang, D. P. Robinson, M. C. Tsakiris and R. Vidal, Noisy dual principal component pursuit, in Proceedings of the 36th International Conference on Machine Learning, 2019.
    [7] B. Gao, X. Liu and Y.-X. Yuan, Parallelizable algorithms for optimization problems with orthogonality constraints, SIAM Journal on Scientific Computing, 41 (2019), A1949-A1983. doi: 10.1137/18M1221679.
    [8] A. Geiger, P. Lenz, C. Stiller and R. Urtasun, Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research, 32 (2013), 1231-1237. doi: 10.1177/0278364913491297.
    [9] D. Hajinezhad and M. Hong, Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization, Mathematical Programming, 176 (2019), 207-245. doi: 10.1007/s10107-019-01365-4.
    [10] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2003.
    [11] M. Hong, D. Hajinezhad and M.-M. Zhao, Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks, in International Conference on Machine Learning, PMLR, 2017, 1529-1538.
    [12] J. Hu, X. Liu, Z.-W. Wen and Y.-X. Yuan, A brief introduction to manifold optimization, Journal of the Operations Research Society of China, 8 (2020), 199-248. doi: 10.1007/s40305-020-00295-9.
    [13] X. Hu and X. Liu, An efficient orthonormalization-free approach for sparse dictionary learning and dual principal component pursuit, Sensors, 20 (2020), 3041. doi: 10.3390/s20113041.
    [14] L. Huang, X. Liu, B. Lang, A. W. Yu, Y. Wang and B. Li, Orthogonal weight normalization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks, in Thirty-Second AAAI Conference on Artificial Intelligence, vol. 32, 2018. doi: 10.1609/aaai.v32i1.11768.
    [15] L.-K. Huang and S. Pan, Communication-efficient distributed PCA by Riemannian optimization, in International Conference on Machine Learning, PMLR, 2020, 4465-4474.
    [16] B. Jiang, S. Ma, A. M.-C. So and S. Zhang, Vector transport-free SVRG with general retraction for Riemannian optimization: Complexity analysis and practical implementation, arXiv:1705.09059.
    [17] D. Kempe and F. McSherry, A decentralized algorithm for spectral analysis, Journal of Computer and System Sciences, 74 (2008), 70-83. doi: 10.1016/j.jcss.2007.04.014.
    [18] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), 2278-2324. doi: 10.1109/5.726791.
    [19] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang and J. Liu, Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent, Advances in Neural Information Processing Systems, 30 (2017).
    [20] Q. Ling, W. Shi, G. Wu and A. Ribeiro, DLM: Decentralized linearized alternating direction method of multipliers, IEEE Transactions on Signal Processing, 63 (2015), 4051-4064. doi: 10.1109/TSP.2015.2436358.
    [21] H. Liu, A. M.-C. So and W. Wu, Quadratic optimization with orthogonality constraint: Explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods, Mathematical Programming, 178 (2019), 215-262. doi: 10.1007/s10107-018-1285-1.
    [22] A. Nedić, A. Olshevsky and M. G. Rabbat, Network topology and communication-computation tradeoffs in decentralized optimization, Proceedings of the IEEE, 106 (2018), 953-976. doi: 10.1109/JPROC.2018.2817461.
    [23] A. Nedić and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE Transactions on Automatic Control, 54 (2009), 48-61. doi: 10.1109/TAC.2008.2009515.
    [24] F. Penna and S. Stańczak, Decentralized eigenvalue algorithms for distributed signal detection in wireless networks, IEEE Transactions on Signal Processing, 63 (2014), 427-440. doi: 10.1109/TSP.2014.2373334.
    [25] S. U. Pillai, T. Suel and S. Cha, The Perron-Frobenius theorem: Some of its applications, IEEE Signal Processing Magazine, 22 (2005), 62-75. doi: 10.1109/MSP.2005.1406483.
    [26] C. Qi, K. A. Gallivan and P.-A. Absil, Riemannian BFGS algorithm with applications, in Recent Advances in Optimization and its Applications in Engineering, Springer, 2010, 183-192. doi: 10.1007/978-3-642-12598-0_16.
    [27] G. Qu and N. Li, Harnessing smoothness to accelerate distributed optimization, IEEE Transactions on Control of Network Systems, 5 (2017), 1245-1260. doi: 10.1109/TCNS.2017.2698261.
    [28] H. Raja and W. U. Bajwa, Cloud K-SVD: A collaborative dictionary learning algorithm for big, distributed data, IEEE Transactions on Signal Processing, 64 (2015), 173-188. doi: 10.1109/TSP.2015.2472372.
    [29] H. Sato, H. Kasai and B. Mishra, Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport, SIAM Journal on Optimization, 29 (2019), 1444-1472. doi: 10.1137/17M1116787.
    [30] W. Shi, Q. Ling, G. Wu and W. Yin, EXTRA: An exact first-order algorithm for decentralized consensus optimization, SIAM Journal on Optimization, 25 (2015), 944-966. doi: 10.1137/14096668X.
    [31] E. Stiefel, Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten, Commentarii Mathematici Helvetici, 8 (1935), 305-353. doi: 10.1007/BF01199559.
    [32] H. Sun, S. Lu and M. Hong, Improving the sample and communication complexity for decentralized non-convex optimization: Joint gradient estimation and tracking, in Proceedings of the 37th International Conference on Machine Learning, PMLR, 2020, 9217-9228.
    [33] E. Vorontsov, C. Trabelsi, S. Kadoury and C. Pal, On orthogonality and learning recurrent networks with long term dependencies, in International Conference on Machine Learning, PMLR, 2017, 3570-3578.
    [34] L. Wang, B. Gao and X. Liu, Multipliers correction methods for optimization problems over the Stiefel manifold, CSIAM Transactions on Applied Mathematics, 2 (2021), 508-531. doi: 10.4208/csiam-am.SO-2020-0008.
    [35] L. Xiao and S. Boyd, Fast linear iterations for distributed averaging, Systems & Control Letters, 53 (2004), 65-78. doi: 10.1016/j.sysconle.2004.02.022.
    [36] N. Xiao, X. Liu and Y.-X. Yuan, A class of smooth exact penalty function methods for optimization problems with orthogonality constraints, Optimization Methods and Software, 37 (2022), 1205-1241. doi: 10.1080/10556788.2020.1852236.
    [37] R. Xin, U. A. Khan and S. Kar, Fast decentralized nonconvex finite-sum optimization with recursive variance reduction, SIAM Journal on Optimization, 32 (2022), 1-28. doi: 10.1137/20M1361158.
    [38] R. Xin, S. Pu, A. Nedić and U. A. Khan, A general framework for decentralized optimization with first-order methods, Proceedings of the IEEE, 108 (2020), 1869-1889. doi: 10.1109/JPROC.2020.3024266.
    [39] J. Xu, S. Zhu, Y. C. Soh and L. Xie, Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes, in 54th IEEE Conference on Decision and Control, IEEE, 2015, 2055-2060. doi: 10.1109/CDC.2015.7402509.
    [40] K. Yuan, Q. Ling and W. Yin, On the convergence of decentralized gradient descent, SIAM Journal on Optimization, 26 (2016), 1835-1854. doi: 10.1137/130943170.
    [41] J. Zeng and W. Yin, On nonconvex decentralized gradient descent, IEEE Transactions on Signal Processing, 66 (2018), 2834-2848. doi: 10.1109/TSP.2018.2818081.
    [42] H. Zhang, S. J. Reddi and S. Sra, Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds, Advances in Neural Information Processing Systems, 29 (2016).
    [43] Z. Zhu, Y. Wang, D. Robinson, D. Naiman, R. Vidal and M. Tsakiris, Dual principal component pursuit: Improved analysis and efficient algorithms, Advances in Neural Information Processing Systems, 31 (2018).
