2nd Intl. Workshop DMDW'2000 at CAiSE*00
Automated Dimensionality Reduction of Data Warehouses
Computer Science; Social Sciences (General)
Mark Last; Oded Maimon
Others: http://CEUR-WS.org/Vol-28/paper7.pdf PID: 21254
Source: CEUR
【 Abstract 】
A data warehouse is designed to consolidate and maintain all attributes that are relevant for the analysis processes. Due to the rapid increase in the size of modern operational systems, it is becoming neither practical nor necessary to load and maintain every operational attribute in the data warehouse. This paper presents a novel methodology for automated selection of the most relevant independent attributes in a data warehouse. The method is based on the information-theoretic approach to knowledge discovery in databases. Attributes are selected by a stepwise forward procedure aimed at minimizing the uncertainty in the values of key performance indicators (KPIs). Each selected attribute is assigned a score expressing its degree of relevance. Using the method does not require any prior expertise in the domain of the data, and it can be applied equally to nominal and ordinal attributes. An attribute is included in a data warehouse schema if it is found to be relevant to at least one KPI. We demonstrate the applicability of the method by reducing the dimensionality of a direct marketing database.
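The stepwise forward procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes nominal attributes, measures uncertainty as the conditional Shannon entropy of a single KPI, and stops on a simple gain threshold (`min_gain`), whereas the paper may use a different stopping rule. The names `entropy`, `conditional_entropy`, `forward_select`, and `min_gain` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, rows, attrs):
    """H(target | attrs): entropy of the target inside each group of rows
    that share the same values on attrs, weighted by group size."""
    groups = {}
    for row, y in zip(rows, target):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(y)
    n = len(target)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def forward_select(rows, target, candidates, min_gain=1e-9):
    """Greedy forward selection: at each step add the attribute that most
    reduces the remaining uncertainty of the target (assumed criterion).

    Returns a list of (attribute, score) pairs, where score is the entropy
    reduction, in bits, obtained when the attribute was added.
    """
    selected, scores = [], []
    remaining = list(candidates)
    current = entropy(target)  # uncertainty before any attribute is used
    while remaining:
        # Candidate that leaves the least uncertainty given those already chosen.
        best = min(
            remaining,
            key=lambda a: conditional_entropy(target, rows, selected + [a]),
        )
        h = conditional_entropy(target, rows, selected + [best])
        gain = current - h
        if gain <= min_gain:  # assumed stopping rule, not taken from the paper
            break
        selected.append(best)
        scores.append((best, gain))
        remaining.remove(best)
        current = h
    return scores
```

To follow the inclusion rule stated in the abstract, one would run `forward_select` once per KPI and keep the union of the attributes selected for any of them.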