Dissertation

【Abstract】

Optimal control for the canonical model of systems with linear dynamics and quadratic operating costs (known as LQ systems) is a well-studied problem in the stochastic control literature. When the true system dynamics are unknown, an adaptive policy is required to learn the model parameters and plan a control policy simultaneously. Addressing this trade-off between accurate estimation and good control is the main challenge in the area of adaptive control. Another important issue is to prevent the system from becoming destabilized (in the sense that its state grows in an uncontrolled fashion) due to lack of knowledge of the system dynamics. Asymptotically optimal approaches have been thoroughly investigated in the literature, but non-asymptotic results are few and rather incomplete. To derive such results, new concepts and technical tools need to be developed for estimation during the stabilization period of the system.
In adaptive control, the system performance is measured by the regret, which is the difference between the cost of the adaptive policy and that of the optimal control designed according to the known dynamics. In this work, we establish non-asymptotic high-probability regret bounds that are optimal, modulo a logarithmic factor, for different LQ systems with and without identifiability assumptions. We also provide high-probability guarantees for a stabilization algorithm based on random linear feedbacks. The results obtained are fairly general, since the assumptions needed concern (i) stabilizability of the matrices encoding the system's dynamics, and (ii) the heaviness of the distribution of the noise vectors.
The study also provides novel results on parameter estimation for possibly unstable Vector Autoregressive (VAR) models. In the classical literature, there are hardly any results for the unstable case, especially regarding finite-sample bounds, which are the subject of this work. Our results express the required sample size as a function of the problem dimension and key characteristics of the true underlying transition matrix and the innovation distribution. To obtain them, appropriate concentration inequalities for random matrices and for sequences of martingale differences are leveraged.
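For orientation, the regret notion described in the abstract can be written out as below. The symbols ($A_0$, $B_0$, $Q$, $R$, $x_t$, $u_t$, $w_t$, $\pi$, $\pi^\star$) are common LQ notation assumed here for illustration only; the abstract fixes no notation, and the thesis may use different conventions or a different normalization of the regret.

```latex
\begin{align*}
  % Linear dynamics with unknown parameters (A_0, B_0) and noise w_{t+1} (assumed notation)
  x_{t+1} &= A_0 x_t + B_0 u_t + w_{t+1}, \\
  % Quadratic operating cost incurred at time t under a policy \pi
  c_t^{\pi} &= x_t^\top Q\, x_t + u_t^\top R\, u_t, \\
  % Regret of the adaptive policy \pi against the optimal control \pi^\star,
  % where \pi^\star is designed with knowledge of (A_0, B_0)
  \mathcal{R}(T) &= \sum_{t=1}^{T} \left( c_t^{\pi} - c_t^{\pi^\star} \right).
\end{align*}
```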

【Preview】

Attachments
Files	Size	Format	View
Non-Asymptotic Adaptive Control of Linear-Quadratic Systems	852KB	PDF	download


Non-Asymptotic Adaptive Control of Linear-Quadratic Systems
Shirani Faradonbeh, Mohamad Kazem; Keener, Robert W.
University of Michigan
Keywords: Non-Asymptotic Adaptive Control; Linear Systems; Finite Time Stabilization; Reinforcement Learning; Unstable Vector Autoregressive; Finite Sample Estimation; Computer Science; Electrical Engineering; Engineering (General); Industrial and Operations Engineering; Mathematics; Statistics and Numeric Data; Engineering; Science; Statistics
Others : https://deepblue.lib.umich.edu/bitstream/handle/2027.42/140882/shirany_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship


	Document metrics
	Downloads: 17	Views: 20
