Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By indicating to the analyst which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents with a few structured data fields. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions using the semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonably sized datasets of documents in real time, without any need for pre-computation.
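The abstract does not say how "unexpected" cube values are detected. As a hedged illustration only, the sketch below flags cells in a two-dimensional slice of a cube under a simplified model that is an assumption, not the method of this work: the expected log value of a cell is the grand mean plus a row effect plus a column effect, and a cell is flagged when its residual, divided by the pooled residual standard deviation, exceeds a threshold. The function name `flag_exceptions`, the threshold of 2.5, and the sales data are all hypothetical.

```python
import math
from typing import List, Tuple


def flag_exceptions(
    cube: List[List[float]], threshold: float = 2.5
) -> List[Tuple[int, int, float]]:
    """Flag cells of a 2-D cube slice whose values are 'unexpected'.

    Assumption (hypothetical, not from the paper): expected log value =
    grand mean + row effect + column effect; a cell is flagged when
    |residual| / pooled residual std exceeds `threshold`.
    All cell values must be positive because logs are taken.
    """
    logs = [[math.log(v) for v in row] for row in cube]
    n_rows, n_cols = len(logs), len(logs[0])

    # Fit the additive model on log values using simple means.
    grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in logs]
    col_eff = [
        sum(logs[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]

    # Residual = observed log value minus expected log value.
    residuals = [
        [logs[i][j] - (grand + row_eff[i] + col_eff[j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # Pooled standard deviation (the residuals of this fit sum to zero).
    flat = [r for row in residuals for r in row]
    std = math.sqrt(sum(r * r for r in flat) / len(flat))
    if std < 1e-12:
        return []  # the model fits (almost) perfectly: nothing is surprising

    return [
        (i, j, residuals[i][j] / std)
        for i in range(n_rows)
        for j in range(n_cols)
        if abs(residuals[i][j]) / std > threshold
    ]


# Hypothetical example: 4 products x 4 months of sales that follow a
# product-scale x month-scale pattern, except one cell that is 10x too high.
product_scale = [1.0, 2.0, 3.0, 4.0]
month_scale = [100.0, 120.0, 90.0, 110.0]
cube = [[p * m for m in month_scale] for p in product_scale]
cube[2][1] *= 10

flagged = flag_exceptions(cube)
print(flagged)  # one flagged cell: product 2, month 1
```

Working on log values makes multiplicative patterns (a product that sells twice as much in every month) count as "expected", so only the spike that breaks the pattern is highlighted; in a real DDA system the same idea would be applied to every slice the analyst can drill into.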
Discovery driven analysis on semi-structured text data