2014 International Conference on Science & Engineering in Mathematics, Chemistry and Physics | |
Implementation of the Intelligent Voice System for Kazakh | |
Mathematics; Chemistry; Physics | |
Yessenbayev, Zh.^1 ; Saparkhojayev, N.^2 ; Tibeyev, T.^3 | |
Nazarbayev University Research and Innovation System, 53 Kabanbay Batyr Ave., Astana 010000, Kazakhstan^1 | |
ISMB Research Institution, POLITO, Torino 10129, Italy^2 | |
Suleyman Demirel University, 1/1 Abylaikhan St., Kaskelen, Almaty, 040900, Kazakhstan^3 | |
Keywords: Application Servers; Developed countries; Dialog management; Graphical environments; Isolated word recognition; Research and application; Speech technology; Synthesis experiment; | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/495/1/012043/pdf DOI : 10.1088/1742-6596/495/1/012043 | |
Source: IOP | |
【 Abstract 】
Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly applies to the languages of developed countries, such as English, German, Japanese, and Russian. For Kazakh, the situation is less advanced, and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call centers and information desks supporting Kazakh speech. The demand for such a system is clear when the country's large size and small population are considered: landline and cell phones are the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java Speech API and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% word error rate (WER) on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
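The recognition results above are reported as WER. As a minimal sketch of how this metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name and the sample strings are illustrative assumptions and are not taken from the paper or its evaluation scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; not the authors' scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Usage: one substitution and one deletion against four reference words.
wer = word_error_rate("one two three four", "one too three")
```

Under this definition, a 4.1% WER corresponds to roughly four word errors per hundred reference words.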