For a machine learning effort to succeed, an appropriate model must be chosen. This is a difficult task in which one must balance flexibility, so that the model can capture the complexities of the domain, against simplicity, so that the model does not overfit to irrelevant characteristics of the training data. The optimal model depends not only on the task to which it is applied, but also on the amount of training data available. Copious training data can justify a complex model that includes many of the "true" domain interactions. But when training data is limited, additional simplifications are necessary. Traditional model selection techniques, which require fitting each of a number of hypothesized models to the training data before selecting one, apply in theory but are not feasible when the number of possible models is large. In this thesis, we describe steps in a new direction for automatically adapting model flexibility. Our approach leverages prior knowledge of two forms: (1) qualitative knowledge statements, which describe positive and negative relationships between domain variables, and (2) structural metadata, which provide categorical assignments for each training instance. This prior knowledge is used to implicitly construct a large space of alternative well-formed models. A model adaptation procedure then uses the training data to conduct a directed search through the space of possible models. The search requires that relatively few models be fit to the data. Thus, the search is efficient and the risk of overfitting in the model selection process is minimized. We demonstrate our approaches on a variety of machine learning tasks, including military airspace safety prediction, planning operator construction, sports prediction, and document sentiment analysis.
Toward automatic model adaptation for structured domains