Audio source separation is a well-known problem in the speech community, and many methods have been proposed to isolate speech signals from a multichannel mixture. In this thesis, we explore a number of techniques that use interchannel phase difference (IPD) features within a tensor factorization framework. IPD features can be extracted on a time-frequency (TF) grid and are a function of the phase characteristics of the mixing process. The goal is therefore to cluster these features and produce TF masks that perform the separation. We first discuss various non-tensor-based methods that are capable of modeling linear and nonlinear IPD trends. We then discuss generalizations to both nonnegative and complex tensor factorizations (NTF, CTF). We show that each method performs best in certain circumstances, and we conclude that more work is needed to devise a generally superior approach.
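As a rough illustration of why IPD features reflect the mixing process, the following minimal sketch computes the IPD for a single frame of a two-channel signal in which the second channel is a circularly delayed copy of the first. This idealized anechoic, single-source setup, the delay value, the frame length, and all function names are assumptions made for illustration and are not taken from the thesis. Under these assumptions the IPD follows a linear trend in frequency, wrapped to one period.

```python
import cmath
import math
import random


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), illustration only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def ipd(spec_a, spec_b):
    """Interchannel phase difference per frequency bin, in the range [-pi, pi]."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spec_a, spec_b)]


def wrap(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return cmath.phase(cmath.exp(1j * angle))


random.seed(0)
N = 64      # frame length in samples (hypothetical)
DELAY = 3   # interchannel delay in samples (hypothetical)

source = [random.gauss(0.0, 1.0) for _ in range(N)]
chan1 = source
# Channel 2 is a circularly delayed copy of channel 1 (idealized mixing).
chan2 = [source[(t - DELAY) % N] for t in range(N)]

observed = ipd(dft(chan1), dft(chan2))
# For a pure delay, the IPD at bin k is 2*pi*k*DELAY/N, wrapped.
predicted = [wrap(2 * math.pi * k * DELAY / N) for k in range(N)]

# Compare on the circle so that +pi and -pi count as the same angle.
max_error = max(abs(wrap(o - p)) for o, p in zip(observed, predicted))
print(f"largest deviation from the linear trend: {max_error:.2e} rad")
```

With several sources, each TF bin would tend to follow the trend of whichever source dominates it, which is what makes clustering the IPD values into TF masks plausible. Real recordings add reverberation and noise, so the trends are no longer exactly linear.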
Phase difference and tensor factorization models for audio source separation