Thesis Details
Discovery driven analysis on semi-structured text data
Hauguel, Samson A.; Zhai, ChengXiang
Keywords: computer science; Information Science; Data Mining; Text Mining; Online analytical processing (OLAP); discovery driven analysis; Probabilistic latent semantic analysis (PLSA)
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/16180/2_Hauguel_Samson.pdf?sequence=3&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
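
As a rough illustration of what highlighting 'unexpected' values in a cube can mean, the sketch below fits a simple additive model (row effect plus column effect) to a two-dimensional slice and flags cells whose residual is large relative to the overall spread. This is an assumption made for illustration only, not the method used in the thesis: the function name `flag_unexpected`, the additive model, and the `threshold` parameter are all hypothetical.

```python
from statistics import mean, pstdev


def flag_unexpected(table, threshold=2.0):
    """Flag cells of a 2-D slice that deviate from an additive row+column fit.

    Hypothetical helper: `table` is a list of equal-length rows of numbers.
    Returns a list of (row_index, col_index, residual) tuples.
    """
    n_rows, n_cols = len(table), len(table[0])

    # Overall mean plus per-row and per-column means.
    grand = mean([v for row in table for v in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([table[r][c] for r in range(n_rows)])
                 for c in range(n_cols)]

    # Expected value under the additive model, and the residual per cell.
    residuals = [
        [table[r][c] - (row_means[r] + col_means[c] - grand)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]

    # Scale residuals by their population standard deviation.
    spread = pstdev([v for row in residuals for v in row])
    if spread == 0:
        return []  # every cell matches the model exactly

    return [
        (r, c, residuals[r][c])
        for r in range(n_rows)
        for c in range(n_cols)
        if abs(residuals[r][c]) / spread > threshold
    ]


if __name__ == "__main__":
    # A flat slice with one cell that stands out.
    sales = [
        [10, 10, 10, 10],
        [10, 10, 50, 10],
        [10, 10, 10, 10],
        [10, 10, 10, 10],
    ]
    for r, c, res in flag_unexpected(sales):
        print(f"cell ({r}, {c}) deviates by {res}")
```

In a full OLAP setting the same idea would be applied across more dimensions and aggregation levels, so that the analyst is pointed toward the drill-down paths that contain flagged cells.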
