One practical algorithm for both stochastic and adversarial bandits
![]() |
Published Version
(PDF 812kB)
88822.pdf. |
Open access copy at publisher website
Description
We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.
Impact and interest:
Citation counts are sourced monthly from Scopus and Web of Science® citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.
Full-text downloads:
Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.
ID Code: | 88822 |
---|---|
Item Type: | Chapter in Book, Report or Conference volume (Conference contribution) |
Measurements or Duration: | 9 pages |
ISBN: | 1938-7228 |
Pure ID: | 32649069 |
Divisions: | Past > Institutes > Institute for Future Environments Past > QUT Faculties & Divisions > Science & Engineering Faculty |
Funding: | |
Copyright Owner: | Consult author(s) regarding copyright matters |
Copyright Statement: | This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au |
Deposited On: | 14 Oct 2015 04:42 |
Last Modified: | 02 Mar 2024 01:28 |
Export: EndNote | Dublin Core | BibTeX
Repository Staff Only: item control page