Numerical optimization is required in artificial neural networks to update the weights iteratively so that the network can learn. In this paper, we propose the use of the Approximate Greatest Descent (AGD) algorithm to optimize neural network weights in a long-term backpropagation manner. The modification and development of AGD into the stochastic diagonal AGD (SDAGD) algorithm can improve the learning ability and structural simplicity of deep learning neural networks. SDAGD is derived from the operation of a multi-stage decision control system and consists of two phases: (1) when the local search region does not contain the minimum point, the iteration is defined at the boundary of the local search region; (2) when the local search region contains the minimum point, the Newton method is approximated for faster convergence. The integration of SDAGD into the multilayer perceptron (MLP) network is investigated with the goal of improving learning ability and structural simplicity. Simulation results show that a two-layer MLP trained with SDAGD achieves a misclassification rate of 9.4% on a smaller Modified National Institute of Standards and Technology (MNIST) dataset. MNIST is a database of handwritten digit images suitable for algorithm prototyping in artificial neural networks.
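The two-phase behaviour described above can be sketched as a single relative-step update whose damping term shrinks with the gradient norm: when the gradient is large the step is confined to the boundary of the local search region, and as the gradient vanishes the update approaches a (diagonal) Newton step. The Python snippet below is a minimal illustrative sketch, not the authors' published implementation; the function name `sdagd_step`, the spherical region of radius `radius`, and the damping rule `||g||/radius` are assumptions made for illustration.

```python
import numpy as np

def sdagd_step(w, grad, hess_diag, radius=1.0, eps=1e-8):
    """One illustrative SDAGD-style weight update (sketch, not the paper's code).

    Phase 1 (far from the minimum): the damping ||grad|| / radius is large,
    so the step is restricted to the local search region.
    Phase 2 (near the minimum): the damping vanishes and the update tends
    toward a diagonal Newton step for faster convergence.
    """
    grad_norm = np.linalg.norm(grad)
    damping = grad_norm / radius          # shrinks as the iterate approaches the minimum
    return w - grad / (np.abs(hess_diag) + damping + eps)

# Toy usage on the quadratic loss 0.5 * ||w||^2 (gradient = w, Hessian diagonal = 1)
w = np.array([3.0, -2.0])
for _ in range(20):
    w = sdagd_step(w, grad=w, hess_diag=np.ones_like(w), radius=0.5)
print(w)  # converges toward the minimum at the origin
```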
Table 1. Comparison of MCR and MSE among three optimization techniques in the neural network.

| Training Algorithm | Training MCR (%) | Testing MCR (%) | MSE |
| --- | --- | --- | --- |
| SGD | 8.22 | 12.14 | 0.40 |
| SDLM | 8.86 | 10.19 | 0.32 |
| SDAGD | 6.46 | 9.40 | 0.21 |
Figure: Structure of the two-layer multilayer perceptron.
Figure: AGD iteration from the initial point to the minimum point.