学位论文

【摘要】

The first part of this research program concernsthe development of customized and easily implementable Probably Approximately Correct (PAC)-learning algorithms for episodic tasks over acyclic state spaces. The defining characteristic of our algorithms is that they take explicitly into consideration the acyclic structure of the underlying state space and the episodic nature of the considered learning task. The first of the above two attributes enables a very straightforward and efficient resolution of the ``exploration vs exploitation' dilemma, while the second provides a natural regenerating mechanism that is instrumental in the dynamics of our algorithms. Some additional characteristics that distinguish our algorithms from those developed in the past literature are (i) their direct nature, that eliminates the need of a complete specification of the underlying MDP model and reduces their execution to a very simple computation, and (ii) the unique emphasis that they place in the efficient implementation of the sampling process that is defined by their PAC property.More specifically, the aforementioned PAC-learning algorithms complete their learning task by implementing a systematic episodic sampling schedule on the underlying acyclic state space. This sampling schedule combined with the stochastic nature ofthe transitions taking place, definethe need for efficient routing policies that will help thealgorithms complete their exploration program while minimizing, in expectation,thenumber of executed episodes.The design of an optimal policy thatwill satisfy a specified pattern of arc visitation requirements inan acyclic stochastic graph, while minimizing the expected numberof required episodes, is a challenging problem, even under theassumption that all the branching probabilities involved are knowna priori. Hence, the sampling process that takes place inthe proposed PAC-learning algorithms gives rise to a novel, veryinteresting stochastic control/scheduling problem, that ischaracterized as the problem of the Optimal Node Visitation(ONV) in acyclic stochastic digraphs. The second part of the work presented herein seeks the systematic modelling and analysis of the ONV problem.The last part of this research program explores the computational meritsobtained by heuristical implementations that result from the integration ofthe ONV problem developments into the PAC-algorithms developed in the first part of this work. We study, throughnumerical experimentation, the relative performance of these resulting heuristical implementations in comparison to (i) the initial version of the PAC-learning algorithms,presented in the first part of the research program, and (ii) standard Q-learning algorithm variations provided in the RL literature. The work presented in this last part reinforces and confirms the driving assumption of this research, i.e., that one can design customized RL algorithms of enhanced performance if the underlying problem structure is taken into account.

【预览】

附件列表
Files	Size	Format	View
Efficient pac-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digaphs.	1203KB	PDF	download


Efficient pac-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digaphs.
Computational complexity;Stochastic control;Approximate dynamic programming;Dynamic programming;PAC learning;Scheduling;Fluid relaxation;Reinforcement learning
Bountourelis, Theologos ; Industrial and Systems Engineering
University:Georgia Institute of Technology
Department:Industrial and Systems Engineering
关键词: Computational complexity; Stochastic control; Approximate dynamic programming; Dynamic programming; PAC learning; Scheduling; Fluid relaxation; Reinforcement learning;
Others : https://smartech.gatech.edu/bitstream/1853/28144/1/theologos_bountourelis_200905_phd.pdf
美国\|英语
来源: SMARTech Repository
PDF


	文献评价指标
	下载次数：16次	浏览次数：21次

【 摘 要 】

【 预 览 】

【摘要】

【预览】