Thesis details
Time-varying networks estimation and Chinese words segmentation
Shu, Xinxin
Keywords: dynamic networks; proximal gradient method; varying-coefficient model; Language Processing; words segmentation
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/50539/Xinxin_Shu.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

This thesis covers two research areas: time-varying network estimation and Chinese word segmentation. Chapter 1 introduces the background of time-varying networks and the structure of the Chinese language, followed by the motivations and goals of the research.

In many biomedical and social science studies, it is important to identify and predict dynamic changes of associations among network data over time. However, few studies address the estimation of time-varying networks, mainly because the extremely large volume of time-varying network data makes computation difficult. In Chapter 2, we propose a varying-coefficient model to incorporate time-varying network data, and impose a piecewise penalty function to capture local features of the network associations. The advantages of the proposed approach are that it is nonparametric, and therefore flexible in modeling dynamic changes of association for network data problems, and that it is capable of identifying the time regions when dynamic changes of associations occur. To achieve local sparsity of network estimation, we implement a group penalization strategy involving overlapping parameters among different groups. We also develop a fast algorithm, based on the smoothing proximal gradient method, which is computationally efficient and accurate. We illustrate the proposed method through simulation studies and fMRI data from children with attention deficit hyperactivity disorder, and show that the proposed method and algorithm efficiently recover dynamic network changes over time.

Digital information has become an essential part of modern life, from scientific research, entertainment, and product marketing to national security. Fast, automatic information extraction is therefore in great demand. The Chinese language is the second most popular language among internet users, but it is still severely under-studied, mainly because of its ambiguous nature. In Chapter 3, we propose a new method for word segmentation in Chinese language processing. Segmentation is crucial for Chinese language processing, since it is the first step toward fast automatic information extraction. One major challenge is that the Chinese language is highly context-dependent and very different from English. We propose a machine-learning model with computationally feasible loss functions that utilize linguistically embedded features. The proposed method is evaluated on Chinese documents from the Peking University corpus. Our numerical study shows that the proposed method performs better than the top-performing existing methods.
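
As a rough illustration of the group penalization and proximal gradient ideas mentioned for Chapter 2, the sketch below runs a plain proximal gradient loop on a least-squares loss with a group penalty. It is not the thesis's algorithm: the abstract describes overlapping groups handled by a smoothing proximal gradient method, whereas this sketch assumes non-overlapping groups, for which the proximal step has a closed form (group soft-thresholding). The function names, the squared-error loss, and the toy data are all assumptions made for illustration.

```python
import math

def group_soft_threshold(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 for NON-overlapping groups.

    Each group is shrunk toward zero as a block; a group whose Euclidean
    norm is at most tau is set exactly to zero (group-level sparsity).
    """
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        scale = 0.0 if norm <= tau else 1.0 - tau / norm
        for i in g:
            out[i] = scale * v[i]
    return out

def proximal_gradient(X, y, groups, lam, step, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum_g ||b_g||_2.

    Hypothetical helper: `step` should not exceed 1 / L, where L is the
    largest eigenvalue of X^T X, for the iteration to converge.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth part: X^T (X b - y).
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by the proximal (shrinkage) step.
        b = group_soft_threshold(
            [b[j] - step * grad[j] for j in range(p)], groups, step * lam
        )
    return b

# Toy example (made-up data): identity design, two groups of two coefficients.
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [3.0, 4.0, 0.1, 0.1]
estimate = proximal_gradient(X, y, groups=[[0, 1], [2, 3]], lam=1.0, step=1.0)
print(estimate)
```

In this toy run the first group is shrunk but kept (roughly 2.4 and 3.2), while the second group, whose norm is below the penalty level, is set exactly to zero. With overlapping groups the proximal step no longer separates this way, which is one reason a smoothing approximation of the penalty is used instead.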

【 Preview 】
Attachments
Files Size Format View
Time-varying networks estimation and Chinese words segmentation 1497KB PDF download
  Document metrics
  Downloads: 21  Views: 24