One practical algorithm for both stochastic and adversarial bandits

Seldin, Yevgeny & Slivkins, Aleksandrs (2014) One practical algorithm for both stochastic and adversarial bandits. In 31st International Conference on Machine Learning, 21 - 26 June 2014, Beijing, China.

View at publisher (open access)

Abstract

We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.

Impact and interest:

0 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

ID Code: 88822
Item Type: Conference Paper
Refereed: Yes
Divisions: Current > Schools > School of Mathematical Sciences
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Deposited On: 14 Oct 2015 04:42
Last Modified: 14 Oct 2015 04:42

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page