Bayesian controller fusion: Leveraging control priors in deep reinforcement learning for robotics

Rana, Krishan, Dasagi, Vibhavari, Haviland, Jesse, Talbot, Ben, Milford, Michael, & Sünderhauf, Niko (2023) Bayesian controller fusion: Leveraging control priors in deep reinforcement learning for robotics. International Journal of Robotics Research, 42(3), pp. 123-146.

Open access copy at publisher website

Description

We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks: navigation in a vast, long-horizon environment, and a complex reaching task that involves manipulability maximisation. For both domains, simple hand-crafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution, given limitations in analytical modelling, controller miscalibration and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior as the policy gains more experience. More importantly, given the risk aversion of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https://krishanrana.github.io/bcf.
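The fusion step described above composes uncertainty-aware distributional outputs so that the more confident source dominates the action. A minimal sketch of the standard precision-weighted product of two univariate Gaussians illustrates the idea; the function name and scalar setting here are illustrative, not the paper's actual API:

```python
import numpy as np

def fuse_gaussians(mu_prior, sigma_prior, mu_policy, sigma_policy):
    """Precision-weighted product of two Gaussian action distributions.

    The lower-variance (more certain) source pulls the fused mean
    towards itself, so an uncertain RL policy naturally defers to a
    confident control prior early in training, and vice versa once
    the policy becomes confident.
    """
    prec_prior = 1.0 / sigma_prior ** 2    # precision = inverse variance
    prec_policy = 1.0 / sigma_policy ** 2
    var_fused = 1.0 / (prec_prior + prec_policy)
    mu_fused = var_fused * (prec_prior * mu_prior + prec_policy * mu_policy)
    return mu_fused, np.sqrt(var_fused)

# Early in training: the policy is uncertain (large sigma), so the
# fused action stays close to the risk-averse control prior.
mu, sigma = fuse_gaussians(mu_prior=0.2, sigma_prior=0.1,
                           mu_policy=0.8, sigma_policy=1.0)
```

In this toy setting the fused mean lands near the prior's mean of 0.2, and the fused standard deviation is below both inputs, reflecting the combined confidence of the two sources.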

Impact and interest:

11 citations in Scopus
9 citations in Web of Science®
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

ID Code: 239326
Item Type: Contribution to Journal (Journal Article)
Refereed: Yes
ORCID iD:
Rana, Krishan: orcid.org/0000-0002-9028-9295
Haviland, Jesse: orcid.org/0000-0002-1227-7459
Talbot, Ben: orcid.org/0000-0002-5670-1928
Milford, Michael: orcid.org/0000-0002-5162-1793
Sünderhauf, Niko: orcid.org/0000-0001-5286-3789
Additional Information: Funding Information: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Queensland University of Technology (QUT) through the Centre for Robotics and Australian Research Council Centre of Excellence for Robotic Vision (project number CE140100016). This work was partially supported by an Australian Research Council Discovery Project (project number DP220102398).
Measurements or Duration: 24 pages
Keywords: behavioural priors, Deep reinforcement learning, hybrid control, robot control, safe reinforcement learning, sample-efficient learning
DOI: 10.1177/02783649231167210
ISSN: 0278-3649
Pure ID: 130802585
Divisions: Current > Research Centres > Centre for Robotics
Current > QUT Faculties and Divisions > Faculty of Engineering
Current > Schools > School of Electrical Engineering & Robotics
Funding Information: We acknowledge continued support from the Queensland University of Technology (QUT) through the Centre for Robotics. This research was supported by the Australian Research Council Centre of Excellence for Robotic Vision (project number CE140100016). This work was partially supported by an Australian Research Council Discovery Project (project number DP220102398). The authors would like to thank Jake Bruce, Robert Lee, Mingda Xu, Dimity Miller, Thomas Coppin and Jordan Erskine for their valuable and insightful discussions towards this contribution.
Copyright Owner: The Author(s) 2023
Copyright Statement: This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au
Deposited On: 26 Apr 2023 03:51
Last Modified: 09 Feb 2025 10:33