Massive Data bring new opportunities and challenges to data scientists and statisticians.On one hand,Massive Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the size and dimensionality of Massive Data introduce unique statistical challenges and consequences for model misspecification. Some important factors are as follows.Complexity: Since Massive Data are often aggregated from multiple sources, they often exhibit heavy-tailedness behavior with nontrivial tail dependence. Noise: Massive Data usually contain various types of measurement error, outliers, and missing values. Dependence: In many data types, such as financial time series, functional magnetic resonance image(fMRI), and time course microarray data, the samples are dependent with relatively weak signals. These challenges are difficult to address and require new computational and statistical tools. More specifically, to handle these challenges, it is necessary to develop statistical methods that are robust to data complexity, noise, and dependence. Our work aims to make headway in resolving these issues. Notably, we give a unified framework for analyzing high dimensional, complex, noisy datasets having temporal/spatial dependence. The proposed methods enjoy good theoretical properties. Their empirical usefulness is also verified in large-scale neuroimage and financial data analysis.
【 预 览 】
附件列表
Files
Size
Format
View
Large-Scale Nonparametric and Semiparametric Inference for Large, Complex, and Noisy Datasets