<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Payame Noor University (PNU)</PublisherName>
				<JournalTitle>Control and Optimization in Applied Mathematics</JournalTitle>
				<Issn>2383-3130</Issn>
				<Volume>8</Volume>
				<Issue>2</Issue>
				<PubDate PubStatus="epublish">
					<Year>2023</Year>
					<Month>12</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Emotion Recognition for Persian Speech Using Convolutional Neural Network and Support Vector Machine</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>85</FirstPage>
			<LastPage>105</LastPage>
			<ELocationID EIdType="pii">9985</ELocationID>
			
<ELocationID EIdType="doi">10.30473/coam.2023.66718.1226</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Saeed</FirstName>
					<LastName>Hashemi</LastName>
<Affiliation>Department of Computer Engineering and Information Technology, Payame Noor University (PNU), Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Saeed</FirstName>
					<LastName>Ayat</LastName>
<Affiliation>Department of Computer Engineering and Information Technology, Payame Noor University (PNU), Tehran, Iran</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2023</Year>
					<Month>01</Month>
					<Day>15</Day>
				</PubDate>
			</History>
		<Abstract>Emotion recognition in Persian speech is limited by inefficient feature extraction and classification tools. To address this, we propose a new method for detecting hidden emotions in Persian speech with higher recognition accuracy. The method involves four steps: preprocessing, feature description, feature extraction, and classification. In the preprocessing step, the input signal is normalized using single-channel vector conversion and signal resampling. Feature description is performed using Mel-Frequency Cepstral Coefficients and Spectro-Temporal Modulation techniques, which produce separate feature matrices. These matrices are then merged and passed to a Convolutional Neural Network for feature extraction. Finally, a Support Vector Machine with a linear kernel function classifies the emotions. The proposed method is evaluated on the Sharif Emotional Speech dataset and achieves an average accuracy of 80.9% in classifying emotions in Persian speech.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Emotion recognition in speech</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Mel-frequency cepstral coefficients</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Convolutional neural network</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Support vector machine</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://mathco.journals.pnu.ac.ir/article_9985_dcd62836f34cab478d8d51ae3292541d.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
