Journal article details
Journal of Electrical and Computer Engineering Innovations, Vol. 7
Faster Convergence of Modified Policy Iteration for Quantitative Verification of Markov Decision Processes
M.S. Mohagheghi [1]
[1] Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
Keywords: expected rewards; Markov decision processes; modified policy iteration; probabilistic model checking; reachability probabilities
DOI: 10.22061/jecei.2020.5395.216
Source: DOAJ
【 Abstract 】

Background and Objectives: Numerical iterative methods are widely used to compute reachability probabilities and expected rewards in probabilistic model checking of Markov decision processes (MDPs). Several approaches have been proposed to improve the performance of these iterative methods. Reducing the total number of iterations is an important goal, and it is pursued by several techniques in this paper.

Methods: We consider MDPs with different degrees of non-determinism and show that modified policy iteration outperforms the other standard iterative methods as the degree of non-determinism increases. We then propose novel methods to improve the performance of modified policy iteration by defining several criteria that avoid useless iterations and updates. The first criterion uses the graphical structure of the model to determine the number of iterations performed after each policy modification. The second is a dynamic criterion that decides when the policy should be updated. The third approach uses several priority heaps to select the states whose values are updated.

Results: The proposed methods are implemented in the PRISM model checker and applied to several standard case studies. The experiments show that the running times of our approaches are lower than those of the standard and previous methods for most case studies. The main reason is that our approaches reduce the total numbers of iterations and updates, which improves the performance of the method. While the running times are reduced, the precision of the computations is preserved in most cases.

Conclusion: The proposed techniques reduce the number of iterations and accelerate convergence to the optimal policies. They also prioritize the computations to reduce the total number of updates.
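The following Python sketch illustrates the base algorithm that the paper improves: modified policy iteration for maximal reachability probabilities, in which a fixed number of evaluation sweeps under the current policy is followed by a greedy policy improvement step. The MDP representation (transitions, goal) and the parameters eval_sweeps and epsilon are illustrative assumptions, not the PRISM data structures or the paper's exact criteria.

# A minimal sketch of modified policy iteration for maximal reachability
# probabilities in an MDP. Names and the fixed sweep count are assumptions
# for illustration, not the paper's method.
def modified_policy_iteration(transitions, goal, eval_sweeps=5, epsilon=1e-6):
    # transitions[s][a]: list of (probability, successor) pairs;
    # every state, including goal states, is assumed to be a key of transitions.
    states = list(transitions.keys())
    values = {s: (1.0 if s in goal else 0.0) for s in states}
    # Start with an arbitrary policy: the first enabled action in each non-goal state.
    policy = {s: next(iter(transitions[s])) for s in states if s not in goal}

    def backup(s, a, v):
        return sum(p * v[t] for p, t in transitions[s][a])

    while True:
        # Partial policy evaluation: a fixed number of sweeps under the
        # current policy instead of solving the linear system exactly.
        for _ in range(eval_sweeps):
            for s in policy:
                values[s] = backup(s, policy[s], values)

        # Policy improvement: switch each state to its currently best action.
        improved, changed = {}, False
        for s in policy:
            best_a = max(transitions[s], key=lambda a: backup(s, a, values))
            if backup(s, best_a, values) > backup(s, policy[s], values) + epsilon:
                changed = True
            improved[s] = best_a
        policy = improved
        if not changed:
            return values, policy

With a small eval_sweeps this scheme behaves much like value iteration, while large values approach standard policy iteration; the paper's first two criteria essentially replace the fixed sweep count and the unconditional improvement step with a structure-aware count and a dynamic update test.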

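The third criterion prioritizes which states have their values updated. The sketch below shows the general idea with a single priority heap keyed by each state's residual under the current policy; the paper uses several heaps, and its exact selection rule is not reproduced here. The names transitions, pred, policy, threshold, and max_updates are assumptions for illustration.

import heapq
import itertools

# A minimal sketch of priority-driven value updates under a fixed policy:
# always update the state with the largest pending change first, and re-queue
# its predecessors when their values may have become stale.
def prioritized_value_updates(transitions, pred, values, policy,
                              threshold=1e-8, max_updates=100000):
    # transitions[s][a]: list of (probability, successor) pairs
    # pred[s]: set of predecessor states of s
    def backup(s):
        return sum(p * values[t] for p, t in transitions[s][policy[s]])

    tie = itertools.count()  # tie-breaker so the heap never compares states
    heap = []
    for s in policy:
        residual = abs(backup(s) - values[s])
        if residual > threshold:
            heapq.heappush(heap, (-residual, next(tie), s))  # max-heap via negation

    updates = 0
    while heap and updates < max_updates:
        _, _, s = heapq.heappop(heap)
        new_value = backup(s)
        change = abs(new_value - values[s])
        values[s] = new_value
        updates += 1
        if change > threshold:
            # The update may leave predecessors stale; re-queue them with
            # their fresh residuals so the largest pending change is handled next.
            for q in pred.get(s, ()):
                if q in policy:
                    r = abs(backup(q) - values[q])
                    if r > threshold:
                        heapq.heappush(heap, (-r, next(tie), q))
    return values, updates

A state may appear in the heap more than once; popping a stale entry simply recomputes its residual, which keeps the sketch short at the cost of a few redundant backups.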
【 License 】

Unknown   
