期刊论文详细信息
Frontiers in Communication
Siri, you've changed! Acoustic properties and racialized judgments of voice assistants
Communication
Nicole Holliday1 
[1] null;
关键词: voice assistants;    voice quality;    ethnic identification;    language and race;    speech perception;   
DOI  :  10.3389/fcomm.2023.1116955
 received in 2022-12-06, accepted in 2023-03-20,  发布年份 2023
来源: Frontiers
PDF
【 摘 要 】

As speech technology is increasingly integrated into modern American society, voice assistants are a more significant part of our everyday lives. According to Apple, Siri fulfills 25 billion requests each month. As part of a software update in April 2021, users in the U.S. were presented with a choice of 4 Siris. While in beta testing, users on Twitter began to comment that they felt that some of the voices had racial identities, noting in particular that Voice 2 and Voice 3 “sounded black.” This study tests whether listeners indeed hear the different Siri voices as sounding like speakers from different groups, as well as examines voice quality features that may trigger these judgments. In order to test evaluations of the four voices, 485 American English listeners heard each Siri voice reading the Rainbow Passage, via online survey conducted on Qualtrics. Following each clip, listeners responded to questions about the speaker's demographic characteristics and personal traits. An LMER model of normalized ratings assessed the interaction of voice and race judgment revealed that indeed, Voice 2 and Voice 3 were significantly more likely to be rated as belonging to a Black speaker than Voices 1 and 4 (p < 0.001). Per-trait logistic regression models and chi-square tests examining ratings revealed Voice 3, the male voice rated as Black, was judged less competent (X2 = 108.99, x < 0.001), less professional (X2 = 90.97, p < 0.001), and funniest (X2 = 123.39, x < 0.001). Following analysis of listener judgments of voices, I conducted post-hoc analysis comparing voice quality (VQ) features to examine which may trigger the listener judgments of race. Using PraatSauce, I employed scripts to extract VQ measures previously hypothesized to pattern differently in African American English vs. Mainstream American English. VQ measures that significantly affected listener ratings of the voices are mean F0 and H1–A3c, which correlate with perceptions of pitch and breathiness. These results reveal listeners attribute human-like demographic and personal characteristics to synthesized voices. A more comprehensive understanding of social judgments of digitized voices may help us to understand how listeners evaluate human voices, with implications for speech perception and discrimination as well as recognition and synthesis.

【 授权许可】

Unknown   
Copyright © 2023 Holliday.

【 预 览 】
附件列表
Files Size Format View
RO202310105183729ZK.pdf 1287KB PDF download
  文献评价指标  
  下载次数:3次 浏览次数:1次