May 2021, 4(2): 131-143. doi: 10.3934/mfc.2021008

Evaluation of parallel and sequential deep learning models for music subgenre classification

1. School of Engineering, Stanford University, Stanford, CA 94305-6106, USA
2. Departments of Computer Science and Mathematics, Trent University, Peterborough, ON K9L 0G2, Canada

Received April 2021; Revised May 2021; Published May 2021

Fund Project: The second author is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

In this paper, we evaluate two deep learning models that integrate convolutional and recurrent neural networks, implementing both sequential and parallel architectures for fine-grained musical subgenre classification. Because of the exceptionally low signal-to-noise ratio (SNR) of our low-level mel-spectrogram dataset, sensitive yet robust learning models are required to produce meaningful results. We investigate the effects of three commonly applied optimizers, dropout, batch normalization, and sensitivity to varying initialization distributions. The results demonstrate that the sequential model specifically requires the RMSprop optimizer, while the parallel model implemented with the Adam optimizer yielded encouraging and stable results, achieving an average F1 score of $0.63$. When all factors are considered, the optimized hybrid parallel model outperformed the sequential model in both classification accuracy and system stability.
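To make the two architecture families concrete, the following is a minimal Keras sketch of a sequential CRNN and a parallel CNN-RNN. All layer sizes, the input shape (128 mel bands x 128 time frames), and the 10-class output are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketches of the two hybrid architectures; all layer sizes,
# the input shape, and the class count are illustrative assumptions.
from tensorflow.keras import layers, models

INPUT_SHAPE = (128, 128, 1)  # (mel bands, time frames, channels), assumed
NUM_CLASSES = 10             # assumed number of subgenres

def sequential_crnn():
    """CNN front end feeding a GRU: the convolutional feature map is
    reshaped into a (time, features) sequence for the recurrent layer."""
    inp = layers.Input(shape=INPUT_SHAPE)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    # Collapse the frequency axis so the time axis drives the RNN:
    # (32, 32, 64) -> (32 steps, 2048 features).
    x = layers.Reshape((32, 32 * 64))(x)
    x = layers.GRU(64)(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inp, out)

def parallel_cnn_rnn():
    """CNN and RNN branches process the spectrogram independently;
    their feature vectors are concatenated before classification."""
    inp = layers.Input(shape=INPUT_SHAPE)
    # Convolutional branch.
    c = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    c = layers.MaxPooling2D(2)(c)
    c = layers.GlobalAveragePooling2D()(c)
    # Recurrent branch: treat time frames as a sequence over mel bands.
    r = layers.Reshape((128, 128))(inp)
    r = layers.GRU(64)(r)
    merged = layers.concatenate([c, r])
    out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)
    return models.Model(inp, out)
```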

Citation: Miria Feng, Wenying Feng. Evaluation of parallel and sequential deep learning models for music subgenre classification. Mathematical Foundations of Computing, 2021, 4 (2) : 131-143. doi: 10.3934/mfc.2021008
References:

[1] M. Browne and S. S. Ghidary, Convolutional neural networks for image processing: An application in robot vision, in AI 2003: Advances in Artificial Intelligence, Lecture Notes in Computer Science, 2903, Springer, Berlin, 2003, 641-652. doi: 10.1007/978-3-540-24581-0_55.
[2] K. Choi, G. Fazekas, M. Sandler and K. Cho, Convolutional recurrent neural networks for music classification, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, LA, 2017. doi: 10.1109/ICASSP.2017.7952585.
[3] J. Chung, C. Gulcehre, K. Cho and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, preprint, arXiv:1412.3555v1.
[4] Y. M. G. Costa, L. S. Oliveira and C. N. Silla Jr., An evaluation of convolutional neural networks for music classification using spectrograms, Applied Soft Computing, 52 (2017), 28-38. doi: 10.1016/j.asoc.2016.12.024.
[5] G. Gessle and S. Åkesson, A Comparative Analysis of CNN and LSTM for Music Genre Classification, Degree Project in Technology, Stockholm, Sweden, 2019. Available from: https://www.diva-portal.org/smash/get/diva2:1354738/FULLTEXT01.pdf.
[6] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9 (2010), 249-256. Available from: http://proceedings.mlr.press/v9/glorot10a.html.
[7] M. Helén and T. Virtanen, Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine, 13th European Signal Processing Conference, Antalya, Turkey, (2005), 1-4. Available from: https://ieeexplore.ieee.org/document/7078147.
[8] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 9 (1997), 1735-1780. doi: 10.1162/neco.1997.9.8.1735.
[9] D. P. Kingma and J. Ba, ADAM: A method for stochastic optimization, preprint, arXiv:1412.6980.
[10] A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, 26th Conference on Neural Information Processing Systems (NeurIPS), 2012.
[11] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, 1 (1989), 541-551.
[12] H. Lee, Y. Largman, P. Pham and A. Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, 23rd Conference on Neural Information Processing Systems (NeurIPS), 2009. Available from: https://ai.stanford.edu/~ang/papers/nips09-AudioConvolutionalDBN.pdf.
[13] B. Logan, Mel frequency cepstral coefficients for music modeling, International Symposium on Music Information Retrieval, 2000.
[14] Z. Nasrullah and Y. Zhao, Music artist classification with convolutional recurrent neural networks, preprint, arXiv:1901.04555v2.
[15] Y. Panagakis, C. Kotropoulos and G. R. Arce, Music genre classification via sparse representations of auditory temporal modulations, 17th European Signal Processing Conference, Glasgow, UK, 2009.
[16] Python Package Index: spotify and spotdl. Available from: https://pypi.org/project/spotify/ and https://pypi.org/project/spotdl/.
[17] A. J. R. Simpson, G. Roma and M. D. Plumbley, Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network, preprint, arXiv:1504.04658.
[18] G. Tzanetakis and P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, 10 (2002), 293-302. doi: 10.1109/TSA.2002.800560.
[19] J. Wülfing and M. A. Riedmiller, Unsupervised learning of local features for music classification, International Society for Music Information Retrieval Conference (ISMIR), 2012. Available from: http://ml.informatik.uni-freiburg.de/former/_media/publications/wuelf2012.pdf.
[20] C. Xu, N. C. Maddage, X. Shao, F. Cao and Q. Tian, Musical genre classification using support vector machines, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, 2003. doi: 10.1109/ICASSP.2003.1199998.
[21] R. Yang, L. Feng, H. Wang, J. Yao and S. Luo, Parallel recurrent convolutional neural networks based music genre classification method for mobile devices, IEEE Access, 8 (2020), 19629-19637. doi: 10.1109/ACCESS.2020.2968170.
[22] M. D. Zeiler, ADADELTA: An adaptive learning rate method, preprint, arXiv:1212.5701.


Figure 1.  Baseline CNN model
Figure 2.  CRNN sequential architecture
Figure 3.  Parallel CNN-RNN architecture
Figure 4.  Visualization of one song from our dataset
Figure 5.  RMSprop learning process on two axes
Figure 6.  Classification accuracy across 50 epochs
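Figure 4 visualizes one mel-spectrogram from the dataset. As a minimal sketch of how such a low-level mel-spectrogram can be computed from an audio file with librosa, the snippet below uses assumed parameter values (file name, sample rate, 128 mel bands); it is not the paper's documented pipeline.

```python
# Sketch of mel-spectrogram extraction with librosa; the file name,
# sample rate, and 128 mel bands are illustrative assumptions.
import librosa
import numpy as np

y, sr = librosa.load("song.mp3", sr=22050)       # hypothetical input file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)   # log scale for dynamic range
print(log_mel.shape)                             # (128 mel bands, n frames)
```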
Table 1.  F1 scores for optimizer evaluation

Optimizer   CNN    CRNN   CNN-RNN
Adam        0.45   0.32   0.63
Adadelta    0.30   0.31   0.35
RMSprop     0.41   0.54   0.60
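A hedged sketch of the kind of optimizer sweep behind Table 1 follows; the toy builder stands in for the paper's CNN/CRNN/CNN-RNN models, and the commented training settings (50 epochs, batch size 32) are assumptions.

```python
# Sketch of an optimizer sweep in Keras; the toy model and training
# settings are assumptions, not the paper's exact configuration.
from tensorflow.keras import layers, models, optimizers

def build_toy_model():
    # Stand-in for the CNN / CRNN / CNN-RNN builders evaluated in the paper.
    inp = layers.Input(shape=(128, 128, 1))
    x = layers.Conv2D(16, 3, activation="relu")(inp)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(10, activation="softmax")(x)
    return models.Model(inp, out)

for name, opt in [("Adam", optimizers.Adam()),
                  ("Adadelta", optimizers.Adadelta()),
                  ("RMSprop", optimizers.RMSprop())]:
    model = build_toy_model()  # fresh weights for each optimizer
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=50, batch_size=32)
```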
Table 2.  Optimal classification accuracy

Model       CNN    CRNN      CNN-RNN
Optimizer   Adam   RMSprop   Adam
Accuracy    0.31   0.57      0.64
Table 3.  Macro F1 scores for effect of regularization

Model     Data         Dropout   Batch Normalization   Dropout + Batch Normalization
CRNN      Train        0.67      1.00                  0.98
          Validation   0.65      0.58                  0.60
          Test         0.62      0.57                  0.41
CNN-RNN   Train        0.65      1.00                  0.90
          Validation   0.65      0.58                  0.60
          Test         0.63      0.61                  0.63
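A minimal sketch of a convolutional block toggling the two regularizers compared in Table 3 is given below; the 0.25 dropout rate and the layer ordering are assumptions, not the paper's stated settings.

```python
# Sketch of a convolutional block with the regularizers from Table 3;
# the dropout rate and placement relative to activation are assumed.
from tensorflow.keras import layers

def conv_block(x, filters, use_dropout=True, use_batchnorm=False):
    x = layers.Conv2D(filters, 3, padding="same")(x)
    if use_batchnorm:
        x = layers.BatchNormalization()(x)  # normalize before the activation
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D(2)(x)
    if use_dropout:
        x = layers.Dropout(0.25)(x)         # assumed rate
    return x
```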
Table 4.  Average F1 scores for effects of initialization methods

Initialization   CNN    CRNN   CNN-RNN
Glorot Normal    0.31   0.63   0.63
Glorot Uniform   0.34   0.60   0.59
Random Normal    0.33   0.45   0.53
Random Uniform   0.33   0.37   0.57
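The four initialization schemes in Table 4 map directly onto Keras initializer classes. A brief sketch follows; applying an initializer only to a final dense layer, and the random-scheme scale parameters, are illustrative simplifications.

```python
# Sketch of the weight-initialization schemes from Table 4; the scale
# parameters and the single-layer application are assumptions.
from tensorflow.keras import initializers, layers

inits = {
    "Glorot Normal": initializers.GlorotNormal(),
    "Glorot Uniform": initializers.GlorotUniform(),
    "Random Normal": initializers.RandomNormal(stddev=0.05),
    "Random Uniform": initializers.RandomUniform(minval=-0.05, maxval=0.05),
}
head = layers.Dense(10, activation="softmax",
                    kernel_initializer=inits["Glorot Normal"])
```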
