Dynamic programming is a principal method for analyzing stochastic optimal control problems. However, the exact computation of dynamic programming can be intractable in large-scale problems due to the "curse of dimensionality". Various approximate dynamic programmingmethods have been proposed to address this issue and they can often generate suboptimal policies, though it is generally difficult to tell how far these suboptimal policies are from optimal. To this end, this thesis concerns withstudying the stochastic control problems from a duality perspective and generating upper boundson maximal rewards, which complements lower bounds on maximal rewardsthat can be derived by simulation under heuristic policies. If the gap between the lower and upper bounds is small, it implies that the heuristic policy must be close to optimal. The approachof "information relaxation'' considered in this thesis,proposed by Brown et al. 2013, relaxes the non-anticipativity constraint that requires the decisions to depend only on the information available to the decision maker and impose a penalty that punishes such a violation. This thesis further explores theories of information relaxation andcomputational methodsin several stochastic optimal control problems. We first study the interaction of Lagrangian relaxation and information relaxation in weakly coupled dynamic program. A commonly studied approach builds on the property that this high-dimensional problem can be decoupled by dualizing theresource constraints via Lagrangian relaxation. We generalize the information relaxation approach toimprove upon the Lagrangian boundanddevelop a computational method to tackle large-scale problems. Second, weformulate the information relaxation-based duality in an important class of continuous-time decision-making models -- controlled Markov diffusion. We find that this continuous-time model admits an optimal penalty in compact expression -- an Ito stochastic integral, which enables us to construct approximate penalties in simple forms and achieve tight dual bounds, and to facilitate the computation of dual bounds significantly compared with that of the discrete-time model. Finally, we consider the problem of optimal stopping of discrete-time continuous-state partially observable Markov processes. We develop a filtering-based dual approach, which relies on the martingale duality formulation of the optimal stopping problem and the particle filtering technique.
【 预 览 】
附件列表
Files
Size
Format
View
Information relaxation in stochastic optimal control