Journal Article Details
Acoustical science and technology
Multi-modal modeling for device-directed speech detection using acoustic and linguistic cues
article
Hiroshi Sato1  Yusuke Shinohara2  Atsunori Ogawa1 
[1] NTT Corporation; [2] Yahoo Japan Corporation
Keywords: Device-directed speech detection; Multi-modal; Utterance classification; Attention
DOI  :  10.1250/ast.44.40
Subject classification: Acoustics and Ultrasonics
Source: Acoustical Society of Japan
【 Abstract 】

Advances in speech recognition technology have enabled voice-controlled user interfaces. Smart speakers, such as Amazon Echo and Google Home, and smartphones equipped with voice agent services give users hands-free ways to communicate with their smart devices. Hereinafter, we refer to such voice-controlled devices as voice agents. Because voice agents operate in real environments, the observed signals contain noise such as background speech or speech directed at other people. It is therefore indispensable for voice agents to distinguish users' voice queries directed at the system (directed speech) from non-directed speech, and to respond only to the directed speech. Keyword spotting is a common way to deal with this problem: users 'wake up' the system by uttering a predefined keyword or key phrase (such as 'Okay, computer') before providing a query. The system accepts a query spoken directly after the keyword as a device-directed query. Although keyword spotting technology can detect keywords with fairly high accuracy, detecting device-directed queries based only on keywords sometimes results in incorrect responses.
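The title and keywords indicate that the paper combines acoustic and linguistic cues for device-directedness classification. As a minimal sketch of that general idea (not the authors' attention-based model), the following toy example fuses a hypothetical acoustic confidence score with a crude linguistic heuristic; every feature, word list, weight, and threshold here is an illustrative assumption.

```python
# Illustrative sketch only: late fusion of an acoustic score and a linguistic
# score for device-directed speech detection. This is NOT the model proposed
# in the paper; all names, weights, and thresholds are hypothetical.

def linguistic_score(transcript, query_words={"play", "set", "what", "turn"}):
    """Toy linguistic cue: fraction of words that look like device-query words."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    return sum(w in query_words for w in words) / len(words)

def fuse(acoustic_score, ling_score, w_acoustic=0.6, w_linguistic=0.4):
    """Weighted late fusion of the two modality scores (weights are assumed)."""
    return w_acoustic * acoustic_score + w_linguistic * ling_score

def is_device_directed(acoustic_score, transcript, threshold=0.5):
    """Classify an utterance as device-directed if the fused score passes the threshold."""
    return fuse(acoustic_score, linguistic_score(transcript)) >= threshold

# A confident acoustic score plus query-like wording yields a directed decision.
print(is_device_directed(0.9, "turn on the lights"))    # → True
# Low acoustic confidence and conversational wording yields non-directed.
print(is_device_directed(0.2, "I told him yesterday"))  # → False
```

A real system would replace both heuristics with learned models (e.g. neural acoustic embeddings and ASR-based linguistic features), but the late-fusion structure above shows why combining modalities can reject utterances that a keyword detector alone would accept.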

【 License 】

Unknown   

【 Preview 】
Attachment list
Files Size Format View
RO202302200000551ZK.pdf 188KB PDF download