Technical Report

【Abstract】

Analysts are overwhelmed with information. They have large archives of historical data, both structured and unstructured, and continuous streams of relevant messages and documents that they need to match to current tasks, digest, and incorporate into their analysis. The purpose of the READ project is to develop technologies that make it easier to catalog, classify, and locate relevant information. We approach this task from multiple angles. First, we tackle the issue of processing large quantities of information in reasonable time. Second, we provide mechanisms that allow users to customize their queries based on latent topics exposed from corpus statistics. Third, we assist users in organizing query results, adding localized expert structure over results. Fourth, we use word sense disambiguation techniques to increase the precision of matching user-generated keyword lists with terms and concepts in the corpus. Fifth, we enhance co-occurrence statistics with latent topic attribution to aid entity relationship discovery. Finally, we quantitatively analyze the quality of three popular latent modeling techniques to examine under which circumstances each is useful.

【Preview】

Attachments
Files	Size	Format	View
RO201704210000350LZ	1315KB	PDF	download


Rapid Exploitation and Analysis of Documents

Buttler, D J ; Andrzejewski, D ; Stevens, K D ; Anastasiu, D ; Gao, B
Keywords: ACCURACY; ORGANIZING; PROCESSING; SIMULATION; STATISTICS
DOI : 10.2172/1033748 RP-ID : LLNL-TR-517731 PID : OSTI ID: 1033748 Others : TRN: US201203%%227
Subject classification: Mathematics (general)
United States | English
Source: SciTech Connect


