Journal Article Details
Applied Network Science
Network-theoretic information extraction quality assessment in the human trafficking domain
[1] Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Ste. 1001, Marina del Rey, California, USA (ISNI 0000 0001 2156 6853; GRID grid.42505.36)
Keywords: Information extraction; Structural analysis; Human trafficking; Relational analysis; Network theory; Attributed networks; Artificial intelligence
DOI: 10.1007/s41109-019-0154-z
Source: publisher
【 Abstract 】

Information extraction (IE) is an important problem in the Natural Language Processing (NLP) and Web Mining communities. Recently, IE has been applied to online sex advertisements with the goal of powering search and analytics systems that can help law enforcement investigate human trafficking (HT). Extracting key attributes such as names, phone numbers and addresses from online sex ads is extremely challenging, since such webpages contain boilerplate, obfuscation, and extraneous text in unusual language models. Assessing the quality of an IE system is an important problem, and it is particularly difficult in this domain because gold-standard datasets are lacking. Furthermore, building a robust ground truth from scratch is an expensive and time-consuming task for social scientists and law enforcement to undertake. In this article, we undertake the empirical challenge of analyzing the quality of IE outputs in the HT domain without laboriously annotated ground truths. Specifically, we use concepts from network science to construct and study an extraction graph from IE outputs collected over a corpus of online sex ads. Our studies show that network metrics, which require no labeled ground truths, share interesting and consistent correlations with IE accuracy metrics (e.g., precision and recall) that do require ground truths. Our methods can potentially be applied to compare the quality of different IE systems in the HT domain without access to ground truths.
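
To make the idea of an extraction graph concrete, the following is a minimal sketch and not the article's actual construction. It assumes, purely for illustration, that each ad is linked to every attribute value extracted from it, and that simple structural statistics (connected components and a degree histogram) stand in for the network metrics mentioned above. The function names, the node encoding, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_extraction_graph(extractions):
    """Build an undirected ad-to-attribute graph as an adjacency map.

    `extractions` maps a hypothetical ad id to a list of
    (attribute_type, value) pairs produced by some IE system.
    """
    graph = defaultdict(set)
    for ad_id, pairs in extractions.items():
        ad_node = ("ad", ad_id)
        graph[ad_node]  # ensure ads with no extractions still get a node
        for attr_type, value in pairs:
            attr_node = (attr_type, value)
            graph[ad_node].add(attr_node)
            graph[attr_node].add(ad_node)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def degree_histogram(graph):
    """Map each degree to the number of nodes having that degree."""
    hist = defaultdict(int)
    for neighbors in graph.values():
        hist[len(neighbors)] += 1
    return dict(hist)


# Toy, made-up IE output: two ads share an extracted phone number.
toy = {
    "ad1": [("phone", "555-0100"), ("name", "alice")],
    "ad2": [("phone", "555-0100")],
    "ad3": [("name", "bea")],
}
graph = build_extraction_graph(toy)
components = connected_components(graph)
histogram = degree_histogram(graph)
```

None of these statistics needs labeled ground truth; they depend only on which extracted values co-occur across ads, which is the property the abstract relies on.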

【 License 】

CC BY   
