Multiple instance regression (MIR) operates on a collection of bags, where each bag contains multiple instances sharing an identical real-valued label. Only few instances, called primary instances, contribute to the bag labels. The remaining instances are noise and outliers observations. The goal in MIR is to identify the primary instances within each bag and learn a regression model that can predict the label of a previously unseen bag. In this thesis, we introduce an algorithm that uses robust fuzzy clustering with an appropriate distance to learn multiple linear models from a noisy feature space simultaneously. We show that fuzzy memberships are useful in allowing instances to belong to multiple models, while possibilistic memberships allow identification of the primary instances of each bag with respect to each model. We also use possibilistic memberships to identify and ignore noisy instances and determine the optimal number of regression models. We evaluate our approach on a series of synthetic data sets, remote sensing data to predict the yearly average yield of a crop and application to drug activity prediction. We show that our approach achieves higher accuracy than existing methods.
【 预 览 】
附件列表
Files
Size
Format
View
Robust fuzzy clustering for multiple instance regression.