Journal Article

[Abstract]

The ongoing development of audio datasets for many languages has spurred research into smart speech recognition systems. Speech recognition can be applied in many emerging applications, such as smartphone dialing, airline reservations, and automatic wheelchairs. Urdu is the national language of Pakistan and is also widely spoken in other South Asian countries (e.g., India and Afghanistan). We therefore present a comprehensive dataset of spoken Urdu digits from 0 to 9. The dataset contains 25,518 audio samples collected from 740 participants. To evaluate it, we apply several existing classification algorithms as baselines: Support Vector Machine (SVM), Multilayer Perceptron (MLP), and variants of EfficientNet. We further propose a convolutional neural network (CNN) for audio digit classification. Experiments with these models show that the proposed CNN is efficient and outperforms the baseline algorithms in classification accuracy.
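The abstract compares models by classification accuracy over the ten digit classes. As a minimal sketch, assuming each sample carries an integer label from 0 to 9, overall accuracy and a 10 x 10 confusion matrix could be computed as below. The function names and the example labels are hypothetical illustrations, not the authors' evaluation code or results.

```python
from collections import Counter

DIGITS = range(10)  # class labels 0-9, matching the dataset's digit range


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted digit equals the true digit."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred):
    """10x10 matrix; entry [t][p] counts samples of digit t predicted as p."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in DIGITS] for t in DIGITS]


# Hypothetical labels for illustration only (not results from the paper):
# one spoken "9" is misclassified as "8".
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 8, 8]
print(accuracy(y_true, y_pred))  # 0.9
```

The confusion matrix complements the single accuracy figure by showing which digit pairs a model confuses, which is useful when comparing a proposed CNN against baselines such as SVM or MLP.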

[License]

Unknown

Applied Sciences
AUDD: Audio Urdu Digits Dataset for Automatic Audio Urdu Digit Recognition

Malika Bendechache¹, Aisha Chandio², Yao Shen², Irum Inayat³, Teerath Kumar³
¹ ADAPT & Lero Research Centres, School of Computing, Dublin City University, Dublin 9, Ireland
² Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
³ Department of Software Engineering, School of Computing, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
Keywords: audio classification; baseline classification accuracy; digit recognition; speech processing; Urdu dataset classification; Urdu digit dataset
DOI: 10.3390/app11198842
Source: DOAJ

