Saeed Hashemi; Saeed Ayat
Abstract
Emotion recognition in Persian speech is limited by inefficient feature extraction and classification tools. To address this, we propose a new method for detecting hidden emotions in Persian speech with higher recognition accuracy. The method has four steps: preprocessing, feature description, feature extraction, and classification. In the preprocessing step, the input signal is normalized by converting it to a single-channel vector and resampling it. Feature description uses Mel-Frequency Cepstral Coefficients (MFCC) and Spectro-Temporal Modulation (STM), each of which produces a separate feature matrix. These matrices are merged and passed to a Convolutional Neural Network (CNN) for feature extraction. Finally, a Support Vector Machine (SVM) with a linear kernel classifies the emotions. Evaluated on the Sharif Emotional Speech dataset, the proposed method achieves an average accuracy of 80.9% in classifying emotions in Persian speech.
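The preprocessing step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how channels are combined, which resampling method is used, or what the target sampling rate is. The sketch assumes channel averaging and plain linear interpolation (with no anti-aliasing filter), and the function names `to_mono`, `resample_linear`, and `preprocess` are hypothetical.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse multi-channel frames into a single-channel vector.

    Assumption: channels are averaged; the abstract does not specify this.
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(signal: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample a signal from src_rate to dst_rate by linear interpolation.

    Assumption: linear interpolation stands in for whatever resampler the
    paper uses. No anti-aliasing filter is applied when downsampling.
    """
    if not signal:
        return []
    n_out = max(1, round(len(signal) * dst_rate / src_rate))
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source signal
        left = int(pos)
        if left >= len(signal) - 1:
            # Past the last sample pair: hold the final value.
            out.append(float(signal[-1]))
        else:
            frac = pos - left
            out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out


def preprocess(frames: Sequence[Sequence[float]], src_rate: int, dst_rate: int) -> List[float]:
    """Single-channel conversion followed by resampling."""
    return resample_linear(to_mono(frames), src_rate, dst_rate)
```

For example, `preprocess(stereo_frames, 44100, 16000)` would return a mono vector at 16 kHz, where both rates are illustrative values rather than settings taken from the paper. The resulting vector is the kind of input that the MFCC and STM feature description stage would then consume.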