Dissertation details
Information Extraction on Para-Relational Data.
Chen, Zhe; Mozafari, Barzan
University of Michigan
Keywords: information extraction; data mining; text mining; Computer Science; Engineering; Computer Science and Engineering
Other source: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/120853/chenzhe_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
Para-relational data (such as spreadsheets and diagrams) is nearly relational data: it shares the important qualities of relational data but does not present itself in a relational format. Para-relational data often conveys highly valuable information and is widely used in many different areas. If we can convert para-relational data into the relational format, many existing tools can be leveraged for a variety of applications, such as data analysis with relational query systems and data integration.

This dissertation aims to convert para-relational data into a high-quality relational form with little user assistance. We have developed four standalone systems, each addressing a specific type of para-relational data. Senbazuru is a prototype spreadsheet database management system that extracts relational information from a large number of spreadsheets. Anthias is an extension of the Senbazuru system that converts a broader range of spreadsheets into a relational format. Lyretail is an extraction system that detects long-tail dictionary entities on webpages. Finally, DiagramFlyer is a web-based search system over a large number of diagrams automatically extracted from web-crawled PDFs. Together, these four systems demonstrate that converting para-relational data into the relational format is possible today, and they also suggest directions for future systems.
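To make the idea of "nearly relational" data concrete, the Python sketch below shows one common case: a spreadsheet whose row headers encode a hierarchy through indentation. Flattening such a sheet produces relational tuples that a query system could load. This is a toy illustration under stated assumptions, not the method used by Senbazuru or Anthias. The sheet contents, the convention that leading spaces mark hierarchy depth, and the function name `to_relational` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical spreadsheet (made-up numbers). Row 0 holds column headers;
# column 0 holds row headers whose leading spaces encode hierarchy depth.
SHEET: List[List[str]] = [
    ["",            "2014", "2015"],
    ["Michigan",    "",     ""],
    ["  Ann Arbor", "120",  "130"],
    ["  Detroit",   "700",  "680"],
    ["Ohio",        "",     ""],
    ["  Columbus",  "850",  "870"],
]

Row = Tuple[Tuple[str, ...], str, float]


def to_relational(sheet: List[List[str]]) -> List[Row]:
    """Flatten an indented-header sheet into (header_path, column, value) tuples.

    Assumption: indentation depth alone determines the parent of each row header.
    """
    col_headers = sheet[0][1:]
    out: List[Row] = []
    stack: List[Tuple[int, str]] = []  # (depth, name) of the current ancestors
    for record in sheet[1:]:
        label = record[0]
        depth = len(label) - len(label.lstrip(" "))
        name = label.strip()
        # Pop siblings and deeper rows until only true ancestors remain.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        path = tuple(n for _, n in stack) + (name,)
        stack.append((depth, name))
        # Emit one tuple per non-empty data cell in this row.
        for header, cell in zip(col_headers, record[1:]):
            if cell.strip():
                out.append((path, header, float(cell)))
    return out


if __name__ == "__main__":
    for row in to_relational(SHEET):
        print(row)
```

Carrying the full header path in each tuple keeps the hierarchy that the spreadsheet expressed only through layout. A real extractor would also need to infer that hierarchy when indentation is absent or inconsistent.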
Attachment: Information Extraction on Para-Relational Data (PDF, 3710 KB)