September 2018, 8(3): 327-336. doi: 10.3934/naco.2018021

Approximate greatest descent in neural network optimization

1. Faculty of Engineering and Science, Curtin University Malaysia, Malaysia
2. Department of Aerospace and Software Engineering, Gyeongsang National University, South Korea

* Corresponding author: King Hann Lim

The authors thank Professor Goh Bean San for his in-depth explanation of AGD.

Received: April 2017. Revised: December 2017. Published: June 2018.

Numerical optimization is required in artificial neural networks to update weights iteratively so that the network can learn. In this paper, we propose the Approximate Greatest Descent (AGD) algorithm to optimize neural network weights in a long-term backpropagation manner. The modification and development of AGD into the stochastic diagonal AGD (SDAGD) algorithm improves the learning ability and structural simplicity of deep learning neural networks. SDAGD is derived from the operation of a multi-stage decision control system and consists of two phases: (1) when the local search region does not contain the minimum point, the iteration is defined at the boundary of the local search region; (2) when the local search region contains the minimum point, the Newton method is approximated for faster convergence. The integration of SDAGD into a multilayer perceptron (MLP) network is investigated with the goal of improving learning ability while keeping the structure simple. Simulation results show that a two-layer MLP trained with SDAGD achieves a misclassification rate of 9.4% on a smaller Modified National Institute of Standards and Technology (MNIST) dataset. MNIST is a database of handwritten digit images suitable for algorithm prototyping in artificial neural networks.
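The two phases above can be combined into a single damped-Newton update: with a spherical local search region of radius R, the relative step size mu = ||g||/R is large far from the minimum (phase 1, the step lands on the region boundary) and vanishes near it (phase 2, the step approaches a Newton step). The following is a minimal NumPy sketch of this idea under those assumptions; the function name sdagd_step, the radius R, and the diagonal Hessian estimate diag_hess are illustrative choices, not code from the paper.

    import numpy as np

    def sdagd_step(w, grad, diag_hess, R=1.0, eps=1e-8):
        # Relative step size: large when the gradient is large (phase 1),
        # small near a minimum (phase 2). Assumed form: mu = ||g|| / R.
        mu = np.linalg.norm(grad) / R
        # Damped diagonal-Newton step, (mu*I + diag(H))^{-1} g, elementwise.
        # Phase 1: mu dominates, so the step has length about R along -g.
        # Phase 2: mu -> 0, so the step approaches the Newton step g / H.
        # np.abs guards against negative curvature estimates.
        return w - grad / (mu + np.abs(diag_hess) + eps)

    # Toy usage on a quadratic f(w) = 0.5 * sum(h * w**2),
    # whose gradient is h * w and whose exact diagonal Hessian is h.
    h = np.array([1.0, 10.0])
    w = np.array([5.0, 5.0])
    for _ in range(50):
        w = sdagd_step(w, grad=h * w, diag_hess=h)
    print(w)  # converges toward the minimum at the origin

In a network, grad and diag_hess would come from backpropagation over a mini-batch (the diagonal Hessian typically via a Gauss-Newton-style backward pass), which is what makes the method stochastic and diagonal.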

Citation: King Hann Lim, Hong Hui Tan, Hendra G. Harno. Approximate greatest descent in neural network optimization. Numerical Algebra, Control & Optimization, 2018, 8 (3) : 327-336. doi: 10.3934/naco.2018021
References:
[1] S. Amari, H. Park and K. Fukumizu, Adaptive method of realizing natural gradient learning for multilayer perceptron, Neural Comput., 12 (2000), 436-444. doi: 10.1162/089976600300015420.
[2] S. Becker and Y. LeCun, Improving the convergence of backpropagation learning with second order methods, Proc. of the Connectionist Models Summer School, (1988), 29-37.
[3] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2 (2009), 1-127.
[4] L. Bottou, Large-scale machine learning with stochastic gradient descent, Proc. of COMPSTAT, (2010), 177-186.
[5] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proc. AISTATS, 9 (2010), 249-256.
[6] B. S. Goh, Greatest descent algorithms in unconstrained optimization, J. Optim. Theory Appl., 142 (2009), 275-289. doi: 10.1007/s10957-009-9533-4.
[7] B. S. Goh, Numerical method in optimization as a multi-stage decision control system, Latest Advances in Systems Science and Computational Intelligence, (2012), 25-30.
[8] Y. LeCun, L. Bottou, G. B. Orr and K. R. Müller, Efficient backprop, in Neural Networks: Tricks of the Trade, Springer, (2012), 9-48.
[9] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature, 521 (2015), 436-444. doi: 10.1038/nature14539.
[10] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), 2278-2323. doi: 10.1109/5.726791.
[11] K. H. Lim, K. P. Seng, L. M. Ang and S. W. Chin, Lyapunov theory-based multilayered neural network, IEEE Transactions on Circuits and Systems II: Express Briefs, 4 (2009), 305-309.
[12] J. Nocedal and S. Wright, Numerical Optimization, 2nd edition, Springer, 2006.
[13] J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Tech. Rep., Carnegie Mellon University, 1994.
[14] J. Sohl-Dickstein, B. Poole and S. Ganguli, Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods, Proc. 31st Int. Conf. Mach. Learn., (2014), 604-612.
[15] D. Stutz, Introduction to Neural Networks, Selected Topics in Human Language Technology and Pattern Recognition WS 12/14, 2014.
[16] H. H. Tan, K. H. Lim and H. G. Harno, Stochastic diagonal approximate greatest descent in neural networks, 2017 International Joint Conference on Neural Networks (IJCNN), (2017), 1895-1898. doi: 10.1109/IJCNN.2017.7966081.

Figure 1.  Structure of a two-layer multilayer perceptron.
Figure 2.  AGD iteration from initial point to minimum point.
Figure 3.  AGD iteration from initial point to minimum point.
Table 1.  Comparison of misclassification rate (MCR) and mean squared error (MSE) for three optimization techniques in a neural network.

    Training Algorithm   Training MCR (%)   Testing MCR (%)   MSE
    SGD                  8.22               12.14             0.40
    SDLM                 8.86               10.19             0.32
    SDAGD                6.46               9.40              0.21