We study the average cost Linear Quadratic (LQ) control problem with unknown model parameters, also known as the adaptive control problem in the control community. We design an algorithm and prove that, apart from logarithmic factors, its regret up to time T is O(√T). Unlike previous approaches that use a forced-exploration scheme, we construct a high-probability confidence set around the model parameters and design an algorithm that plays optimistically with respect to this confidence set. The construction of the confidence set is based on recent results from online least-squares estimation and leads to an improved worst-case regret bound for the proposed algorithm. To the best of our knowledge this is
Regret Bounds for the Adaptive Control of Linear Quadratic Systems