The class of Markov decision processes (MDPs) provides a popular framework which covers a wide variety of sequential decision-making problems. We consider infinite-horizon discounted MDPs with countably infinite state space and finite action space. Our goal is to establish theoretical properties and develop new solution methods for such MDPs by studying their linear programming (LP) formulations. The LP formulations have countably infinite numbers of variables and constraints and therefore are called countably infinite linear programs (CILPs). General CILPs are challenging to analyze or solve, mainly because useful theoretical properties and techniques of finite LPs fail to extend to general CILPs. Another goal of this thesis is to deepen the limited current understanding of CILPs, resulting in new algorithmic approaches to find their solutions.Recently, Ghate and Smith (2013) developed an implementable simplex-type algorithm for solving a CILP formulation of a non-stationary MDP with finite state space. We establish rate of convergence results for their simplex algorithm with a particular pivoting rule and another existing solution method for such MDPs, and compare empirical performance of the algorithms. We also present ways to accelerate their simplex algorithm. The class of non-stationary MDPs with finite state space can be considered to be a subclass of stationary MDPs with countably infinite state space. We present a simplex-type algorithm for solving a CILP formulation of a stationary MDP with countably infinite state space that is implementable (using only finite data and computation in each iteration). We show that the algorithm finds a sequence of policies that improves monotonically and converges to optimality in value, and present a numerical illustration. An important extension of MDPs considered so far are constrained MDPs, which optimize an objective function while satisfying constraints, typically on budget, quality, and so on. For constrained non-stationary MDPs with finite state space, we provide a necessary and sufficient condition for a feasible solution of its CILP formulation to be an extreme point. Since simplex-type algorithms are expected to navigate between extreme points, this result sets a foundation for developing a simplex-type algorithm for constrained non-stationary MDPs.
【 预 览 】
附件列表
Files
Size
Format
View
Analysis and Simplex-type Algorithms for Countably Infinite Linear Programming Models of Markov Decision Processes.