In collaboration with Payame Noor University and the Iranian Society of Instrumentation and Control Engineers

Document Type : Research Article

Authors

Department of Computer Engineering and Information Technology, Payame Noor University (PNU), Tehran, Iran

Abstract

This paper addresses the limited accuracy of emotion recognition in Persian speech, which stems from inefficient feature extraction and classification tools. We propose a new method for detecting hidden emotions in Persian speech with higher recognition accuracy. The method comprises four steps: preprocessing, feature description, feature extraction, and classification. In the preprocessing step, the input signal is normalized by conversion to a single-channel vector and resampling. Feature description is performed with Mel-Frequency Cepstral Coefficients (MFCC) and Spectro-Temporal Modulation (STM) techniques, which produce separate feature matrices. These matrices are merged and passed to a Convolutional Neural Network (CNN) for feature extraction. Finally, a Support Vector Machine (SVM) with a linear kernel classifies the emotions. Evaluated on the Sharif Emotional Speech (ShEMO) dataset, the method achieves an average accuracy of 80.9% in classifying emotions in Persian speech.
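To make the pipeline concrete, the following is a minimal Python sketch of the four steps. It is not the authors' implementation: the sampling rate, MFCC order, CNN shape, and the spectro-temporal-modulation proxy (the 2-D Fourier magnitude of a log-mel spectrogram) are all illustrative assumptions, since the paper does not publish code.

# Minimal sketch of the four-step pipeline described in the abstract.
# All hyper-parameters below (sampling rate, MFCC order, CNN shape)
# are illustrative assumptions, not values from the paper.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.svm import SVC

TARGET_SR = 16000  # assumed resampling rate

def preprocess(path):
    # Step 1: librosa converts to a single channel (mono=True) and
    # resamples to TARGET_SR in one call.
    y, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    return y

def describe(y, n_mfcc=13):
    # Step 2: build the MFCC matrix and an STM-like matrix, then merge.
    mfcc = librosa.feature.mfcc(y=y, sr=TARGET_SR, n_mfcc=n_mfcc)
    log_mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=TARGET_SR, n_mels=n_mfcc))
    # STM proxy: 2-D Fourier magnitude of the log-mel spectrogram;
    # the paper's exact STM front end may differ.
    stm = np.abs(np.fft.fft2(log_mel))
    t = min(mfcc.shape[1], stm.shape[1])  # align frame counts
    return np.vstack([mfcc[:, :t], stm[:, :t]]).astype(np.float32)

class FeatureCNN(nn.Module):
    # Step 3: a small CNN used purely as a fixed feature extractor.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())
    def forward(self, x):
        return self.net(x)

def extract(cnn, merged):
    x = torch.from_numpy(merged)[None, None]  # (batch, channel, freq, time)
    with torch.no_grad():
        return cnn(x).numpy().ravel()

# Step 4: a linear-kernel SVM classifies the CNN features, e.g.:
#   cnn = FeatureCNN()
#   X = np.stack([extract(cnn, describe(preprocess(p))) for p in wav_paths])
#   clf = SVC(kernel="linear").fit(X, labels)

Keeping the CNN as a frozen feature extractor and delegating the final decision to a linear SVM mirrors the split the abstract describes between the feature-extraction and classification steps.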

References

[1] Alabsi, A., Gong, W., Hawbani, A. (2022). “Emotion recognition based on wireless, physiological and audiovisual signals: A comprehensive survey”, In International Conference on Smart Computing and Cyber Security: Strategic Foresight, Security Challenges and Innovation, 121-138.
[2] Alghifari, M.F., Gunawan, T.S., Kartiwi, M. (2018). “Speech emotion recognition using deep feedforward neural network”, Indonesian Journal of Electrical Engineering and Computer Science, 10 (2), 554-561.
[3] Badie, A., Moragheb, M.A., Noshad, A. (2021). “An efficient approach to mental sentiment classification with EEG-based signals using LSTM neural network”, Control and Optimization in Applied Mathematics, 6 (1).
[4] Edraki, A., Chan, W.Y., Jensen, J., Fogerty, D. (2019). “Improvement and assessment of spectro-temporal modulation analysis for speech intelligibility estimation”, In Interspeech 2019, 1378-1382.
[5] Edraki, A., Chan, W.Y., Jensen, J., Fogerty, D. (2022). “Spectro-temporal modulation glimpsing for speech intelligibility prediction”, Hearing Research, 108620.
[6] Fahad, M., Deepak, A., Pradhan, G., Yadav, J. (2021). “DNN-HMM-based speaker-adaptive emotion recognition using MFCC and epoch-based features”, Circuits, Systems, and Signal Processing, 40 (1), 466-489.
[7] Horkous, H., Guerti, M. (2021). “Recognition of anger and neutral emotions in speech with different languages”, International Journal of Computing and Digital Systems, 10, 563-574.
[8] Hossin, M., Sulaiman, M.N. (2015). “A review on evaluation metrics for data classification evaluations”, International Journal of Data Mining & Knowledge Management Process (IJDKP), 5, 3-9.
[9] Ke, X., Zhu, Y., Wen, L., Zhang, W. (2018). “Speech emotion recognition based on SVM and ANN”, International Journal of Machine Learning and Computing, 8 (3), 198-202.
[10] Kumbhar, H.S., Bhandari, S.U. (2019). “Speech emotion recognition using MFCC features and LSTM network”, In 2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA), 1-3.
[11] Liu, Z.T., Rehman, A., Wu, M., Cao, W.H., Hao, M. (2021). “Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence”, Information Sciences, 563, 309-325.
[12] Nezami, M.O., Jamshid Lou, P., Karami, M. (2019). “ShEMO: A large-scale validated database for Persian speech emotion detection”, Language Resources and Evaluation.
[13] Panagakis, Y., Kotropoulos, C., Arce, G.R. (2009). “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification”, IEEE Transactions on Audio, Speech, and Language Processing, 18 (3), 576-588.
[14] Pisner, D.A., Schnyer, D.M. (2020). “Support vector machine”, In Machine Learning, 101-121, Academic Press.
[15] Ravanbakhsh, M., Setayeshi, S., Pedram, M.M., Mirzaei, A. (2020). “Evaluation of implicit emotion in the message through emotional speech processing based on Mel-frequency Cepstral coefficient and short-time Fourier transform features”, Advances in Cognitive Science, 22 (2), 71-81.
[16] Siadat, S.R., Voronkov, I.M., Kharlamov, A.A. (2022). “Emotion recognition from Persian speech with 1D Convolution neural network”, In 2022 Fourth International Conference Neurotechnologies and Neurointerfaces (CNN), 152-157.
[17] Tiwari, P., Darji, A.D. (2022). “A novel S-LDA features for automatic emotion recognition from speech using 1-D CNN”, International Journal of Mathematical, Engineering and Management Sciences, 7 (1), 49.
[18] Yadav, S.P., Zaidi, S., Mishra, A., Yadav, V. (2022). “Survey on machine learning in speech emotion recognition and vision systems using a recurrent neural network (RNN)”, Archives of Computational Methods in Engineering, 29 (3), 1753-1770.