Journal Article Details
EURASIP Journal on Audio, Speech, and Music Processing
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
Empirical Research
Xingwei Liang [1], Ruifeng Xu [2], Zehua Zhang [3]
[1] Konka Group Co., Ltd, Shenzhen, China;School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China;School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China;School of Electronics and Information Engineering, Harbin Institute of Technology, Shenzhen, China;
Keywords: Speaker verification; Keyword spotting; Personalized voice trigger; Flow attention
DOI: 10.1186/s13636-023-00293-8
Received 2023-05-03, accepted 2023-06-19, published 2023
Source: Springer
【 Abstract 】

Personalized voice triggering is a key technology in voice assistants and is the first step a user takes to activate the assistant. It combines keyword spotting (KWS) and speaker verification (SV). Conventional approaches develop the KWS and SV systems separately. This paper proposes a single system, the multi-task deep cross-attention network (MTCANet), that performs KWS and SV simultaneously while effectively using information relevant to both tasks. The proposed framework integrates a KWS sub-network and an SV sub-network to improve performance under challenging conditions such as noisy environments and short-duration speech, and to improve model generalization. At the core of MTCANet are three modules: a novel deep cross-attention (DCA) module that integrates the KWS and SV tasks, a multi-layer stacked shared encoder (SE) that reduces the impact of noise on the recognition rate, and soft attention (SA) modules that let the model focus on pertinent information in the middle layer while preventing vanishing gradients. On the well-off test set, the proposed model improves equal error rate (EER), minimum detection cost function (minDCF), and accuracy (Acc) by 0.2%, 0.023, and 2.28%, respectively, over the well-known SV model emphasized channel attention, propagation, and aggregation in time delay neural network (ECAPA-TDNN) and the advanced KWS model Convmixer.
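
For readers unfamiliar with the equal error rate (EER) reported in the abstract, the following is a minimal sketch of how the metric is commonly approximated: a decision threshold is swept over the trial scores, and the EER is read off where the false acceptance rate (FAR) and the false rejection rate (FRR) are closest. The function name `equal_error_rate` and the toy scores are assumptions made for illustration; this is not the authors' evaluation code, and the abstract does not describe their scoring setup.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a genuine-speaker trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy scores, invented purely for illustration (not data from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With these toy scores, one genuine trial falls below and one impostor trial falls at the threshold of 0.6, so both error rates are 1 in 4. Production toolkits usually interpolate between neighbouring thresholds instead of taking the closest observed point, which matters when the trial list is small.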

【 License 】

CC BY   
© The Author(s) 2023
