Technical Report Details
ParaText : scalable solutions for processing and searching very large document collections : final LDRD report.
Crossno, Patricia Joyce ; Dunlavy, Daniel M. ; Stanton, Eric T. ; Shead, Timothy M.
Sandia National Laboratories
Keywords: Classification; Processing; 99 General and Miscellaneous//Mathematics, Computing, and Information Science; Information Retrieval; Algorithms
DOI  :  10.2172/1007321
RP-ID  :  SAND2010-6269
RP-ID  :  AC04-94AL85000
RP-ID  :  1007321
United States | English
Source: UNT Digital Library
【 Abstract 】

This report summarizes the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Toward that end, we designed, implemented, and demonstrated ParaText, a scalable framework for text analysis, as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information-theoretic methods in user analysis and interpretation in cross-language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.

【 Preview 】
Attachments
Files Size Format View
1007321.pdf 6128KB PDF download