Dissertation Details
Optimization for stochastic, partially observed systems using a sampling-based approach to learn switched policies
Candido, Salvatore J.
Keywords: Partially Observable Markov Decision Processes (POMDP); robotics; machine learning; motion planning; stochastic control
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/26081/Candido_Salvatore.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

We propose a new method for learning policies for large, partially observable Markov decision processes (POMDPs) that require long time horizons for planning. Computing optimal policies for POMDPs is an intractable problem and, in practice, dimensionality renders exact solutions essentially unreachable even for small real-world systems of interest. For this reason, we restrict the policies we learn to the class of switched belief-feedback policies, in a manner that allows us to introduce domain-expert knowledge into the planning process. This approach has worked well for the systems on which we have tested it, and we conjecture that it will be useful for many real-world systems of interest.

Our approach is based on a value-iteration-like method for learning a switching law. Because the POMDP problem is intractable, we use a Monte Carlo approximation to evaluate system behavior and optimize a switching law based on sampling. We explicitly analyze the sensitivity of expected cost (performance) to the perturbations introduced by our approximations, and use that analysis to avoid approximation errors that could be disastrous when the computed policy is deployed. We demonstrate results on discrete POMDP problems from the literature and on a resource allocation problem modeled after a team of robots attempting to extinguish a forest fire.

We then use our approach to build two algorithms that solve the minimum-uncertainty robot navigation problem, and we demonstrate in simulation that our approach can improve on techniques in the literature in terms of solution quality. Our second approach uses information-theoretic heuristics to drive the sampling-based learning procedure. We conjecture that this is a useful direction toward an efficient, general stochastic motion planning algorithm.
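To make the idea of Monte Carlo evaluation of a switched belief-feedback policy concrete, here is a minimal sketch, assuming a toy two-state "fire" model loosely inspired by the fire-extinguishing example in the abstract. Every name, state model, cost, and parameter below is an illustrative assumption, not the thesis's actual formulation: a Bayes filter tracks the belief that a cell is on fire, a threshold switching law selects between two modes ("wait" and "extinguish"), and the expected cost of each candidate threshold is estimated by sampled rollouts.

```python
import random

# Toy POMDP (hypothetical): state 0 = safe, 1 = on fire, observed noisily.
OBS_NOISE = 0.2   # probability an observation is flipped
P_IGNITE = 0.1    # chance a safe cell ignites each step

def step(state, action, rng):
    """Sample next state, observation, and stage cost."""
    if action == 1:            # "extinguish": always succeeds in this toy model
        nxt, cost = 0, 1.0
    else:                      # "wait": fire persists or may ignite
        nxt = 1 if (state == 1 or rng.random() < P_IGNITE) else 0
        cost = 5.0 if nxt == 1 else 0.0
    obs = nxt if rng.random() > OBS_NOISE else 1 - nxt
    return nxt, obs, cost

def update_belief(b, action, obs):
    """Bayes filter update of b = P(state is on fire)."""
    pred = 0.0 if action == 1 else b + (1 - b) * P_IGNITE   # prediction step
    like_fire = (1 - OBS_NOISE) if obs == 1 else OBS_NOISE  # correction step
    like_safe = OBS_NOISE if obs == 1 else (1 - OBS_NOISE)
    num = like_fire * pred
    den = num + like_safe * (1 - pred)
    return num / den if den > 0 else pred

def switched_policy(belief, threshold):
    """Switching law: act (mode 1) once the belief crosses the threshold."""
    return 1 if belief >= threshold else 0

def mc_cost(threshold, horizon=20, n_rollouts=500, seed=0):
    """Monte Carlo estimate of expected cumulative cost under the policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        state, b = 0, 0.0
        for _ in range(horizon):
            a = switched_policy(b, threshold)
            state, obs, cost = step(state, a, rng)
            total += cost
            b = update_belief(b, a, obs)
    return total / n_rollouts

# Sampling-based "learning": pick the best threshold from a candidate grid.
best_cost, best_threshold = min(
    (mc_cost(t), t) for t in [0.1, 0.3, 0.5, 0.7, 0.9]
)
```

The grid search over thresholds stands in for the thesis's optimization of the switching law; the same rollout estimator could be plugged into any sampling-based search over a richer family of switching laws.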

【 Preview 】
Attachment List
Files                                                                                                               Size    Format  View
Optimization for stochastic, partially observed systems using a sampling-based approach to learn switched policies  1168KB  PDF     download
Document Metrics
Downloads: 17    Views: 13