Exploiting insensitivity in stochastic systems to learn approximately optimal policies

How does uncertainty affect a robot attempting to generate a control policy that achieves some objective? How sensitive is the resulting policy to perturbations? These are the central questions addressed in this dissertation. For most real-world robotic systems, the state of the system is observed only indirectly through limited sensor modalities; because the true state of the robot is not fully observable, it must be inferred from partial observations. Further complicating matters, the system may be subject to disturbances that perturb not only the evolution of the system but also the sensor data. Determining policies that effectively and efficiently govern the behavior of the system relative to a stated objective is computationally burdensome and, for many systems, intractable to solve exactly. Much research has therefore been devoted to computing approximately optimal solutions to these partially observable Markov decision processes (POMDPs). The techniques presented herein exploit the inherent insensitivity of POMDPs: small changes to a policy have little impact on solution quality except at a small set of critical points.

First, a hierarchical method for determining nearly optimal policies is presented that achieves temporal and spatial abstraction through local approximations. Using a mixed simulation-and-analytic representation, a directed graph is generated that captures the underlying POMDP structure, yielding a multi-query method in which the structural representation is built offline. The graph is grown by randomly sampling vertices; local policies are then used to connect newly added vertices to the existing graph, and an edge is added whenever the corresponding local policy succeeds. By continuing to extend the graph at each iteration of the algorithm, a sparse representation is obtained. Theoretical and simulation-based results demonstrate the effectiveness of this approach.

The second technique extends this methodology to an anytime algorithm. Adaptive sampling is used to quickly and effectively determine nearly optimal policies: alternating between exploitation and exploration sampling, the structural representation is expanded using an inductive bias derived from the past performance of the sampling algorithm in the neighborhood of a prospective sample. In this way, we preferentially sample policies that are more likely to yield useful exploration and more likely to increase connectivity in lower-cost regions of the space.

Finally, a perturbation analysis framework is developed, serving two purposes. First, the derived analysis supports the hypothesis that POMDPs are often insensitive and identifies when they are not. Second, the framework enables forecasted evolutions to be chained together into a compact representation, providing even greater temporal and spatial abstraction in an analytic representation.
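To make the graph-construction loop of the first technique concrete, the following is a minimal Python sketch. Every name here (sample_belief, belief_distance, local_policy_connects, the toy success threshold) is a hypothetical placeholder standing in for the dissertation's actual belief sampler, belief-space metric, and local controllers; it illustrates the sample-connect-extend pattern under those assumptions, not the actual implementation.

    import math
    import random

    def belief_distance(b1, b2):
        """Euclidean distance between belief vectors (a stand-in metric)."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)))

    def sample_belief(dim=3):
        """Sample a random point on the probability simplex (placeholder sampler)."""
        w = [random.random() for _ in range(dim)]
        s = sum(w)
        return [x / s for x in w]

    def local_policy_connects(b_from, b_to):
        """Stand-in for synthesizing a local policy and verifying, e.g. by
        simulated rollouts, that it drives b_from to (near) b_to."""
        return belief_distance(b_from, b_to) < 0.5  # toy success criterion

    def extend_graph(vertices, edges, n_neighbors=5):
        """One iteration: sample a new vertex and attempt local-policy connections."""
        b_new = sample_belief()
        new_idx = len(vertices)
        vertices.append(b_new)
        # Attempt connections only to the nearest existing vertices,
        # which keeps the resulting representation sparse.
        nearest = sorted(range(new_idx),
                         key=lambda i: belief_distance(vertices[i], b_new))[:n_neighbors]
        for i in nearest:
            if local_policy_connects(vertices[i], b_new):
                edges.append((i, new_idx))  # directed edge: local policy succeeded

    # Offline, multi-query construction: iterate to grow the sparse roadmap.
    vertices, edges = [], []
    for _ in range(100):
        extend_graph(vertices, edges)

The resulting directed graph can then be queried repeatedly, which is what makes the offline construction a multi-query method.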