QUT ePrints

REGAL : a regularization based algorithm for reinforcement learning in weakly communicating MDPs

Bartlett, Peter L. & Tewari, Ambuj (2009) REGAL : a regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), McGill University, Montreal.

Abstract

We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of $\tilde{O}(HS\sqrt{AT})$. We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.
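
As a rough illustration of the quantities named in the abstract, the sketch below computes the span of a vector (largest entry minus smallest entry) and the leading term H * S * sqrt(A * T) of the stated bound. It is not the REGAL algorithm itself: the function names and the example bias vector are hypothetical, and the constants and logarithmic factors hidden by the tilde-O notation are simply dropped.

```python
import math


def span(h):
    """Span of a vector: largest entry minus smallest entry."""
    return max(h) - min(h)


def regret_bound_scale(H, S, A, T):
    """Leading term H * S * sqrt(A * T) of the bound.

    Constants and logarithmic factors are omitted (assumption: only
    the scaling in H, S, A and T is of interest here).
    """
    return H * S * math.sqrt(A * T)


if __name__ == "__main__":
    # Hypothetical bias vector for a 4-state MDP (illustrative values only).
    bias = [0.0, 1.5, -0.5, 2.0]
    H = span(bias)
    print("span:", H)
    print("leading term:", regret_bound_scale(H, S=len(bias), A=2, T=10_000))
```

Under this scaling, quadrupling the horizon T doubles the leading term, while doubling the span bound H or the number of states S doubles it directly.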

Impact and interest:

7 citations in Scopus

ID Code: 45708
Item Type: Conference Paper
Keywords: algorithm, optimal regret rate, Markov Decision Process (MDP)
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Schools > Mathematical Sciences
Copyright Owner: Copyright 2009 [please consult the authors]
Deposited On: 06 Sep 2011 08:28
Last Modified: 06 Sep 2011 08:28
