Optimistic linear programming gives logarithmic regret for irreducible MDPs
Tewari, Ambuj & Bartlett, Peter L. (2008) Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Platt, John, Koller, Daphne, Singer, Yoram, & Roweis, Sam (Eds.) Advances in Neural Information Processing Systems 20 (NIPS), 2008, Cambridge, MA.
Abstract
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates, a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time T is within C(P) log T of the reward obtained by the optimal policy, where C(P) is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis, with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner, but OLP is simpler and its regret bound has a better dependence on the size of the MDP.
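The optimistic step described in the abstract can be illustrated with a small sketch. It is not the paper's full algorithm. It assumes that "close to the estimates" means an L1 ball of a given radius around the estimated next-state distribution, and it solves the resulting linear program with a greedy mass-shifting rule rather than a general LP solver. The names `optimistic_transition`, `p_hat`, `values`, and `radius` are hypothetical and chosen only for illustration.

```python
def optimistic_transition(p_hat, values, radius):
    """Maximize sum(q[i] * values[i]) over probability vectors q
    with sum(|q[i] - p_hat[i]|) <= radius.

    Assumption: closeness is measured in L1 distance. For this
    particular linear program, a greedy rule is optimal: move as much
    probability mass as allowed onto the highest-valued next state,
    taking it from the lowest-valued next states first.
    """
    n = len(p_hat)
    q = list(p_hat)
    best = max(range(n), key=lambda i: values[i])

    # Adding mass m to one state and removing m elsewhere costs 2*m in
    # L1 distance, so at most radius / 2 can be shifted.
    add = min(radius / 2.0, 1.0 - q[best])
    q[best] += add

    # Remove the same amount of mass, starting from the worst states.
    remaining = add
    for i in sorted(range(n), key=lambda i: values[i]):
        if i == best or remaining <= 0:
            continue
        take = min(q[i], remaining)
        q[i] -= take
        remaining -= take
    return q


# Hypothetical estimate over three next states, with made-up values.
p_hat = [0.5, 0.3, 0.2]
values = [1.0, 0.0, 2.0]
q = optimistic_transition(p_hat, values, radius=0.2)
optimistic_value = sum(qi * vi for qi, vi in zip(q, values))
```

In this example the mass shifts from the lowest-valued state to the highest-valued one, so the optimistic value is at least the value under `p_hat`. In a learning loop the radius would typically shrink as a state-action pair is visited more often, so that the optimistic model approaches the estimate.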
ID Code:  45645 

Item Type:  Conference Paper 
Refereed:  Yes 
Additional Information:  A free full-text version of the paper is available from the link above. 
Keywords:  MDPs, Optimistic Linear Programming 
Subjects:  Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600) 
Divisions:  Past > QUT Faculties & Divisions > Faculty of Science and Technology Past > Schools > Mathematical Sciences 
Deposited On:  01 Sep 2011 00:13 
Last Modified:  01 Sep 2011 00:16 