Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Anderson, Peter, Wu, Qi, Teney, Damien, Bruce, Jacob, Johnson, Mark, Suenderhauf, Niko, Reid, Ian, Gould, Stephen, & Van Den Hengel, Anton (2018) Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments. In Oliva, A, Laptev, I, Forsyth, D, & Ramanan, D (Eds.) Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition. Computer Vision Foundation, http://openaccess.thecvf.com/CVPR2018.py, pp. 3674-3683.
Accepted Version (PDF, 9MB): 124633.pdf
Description
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
| ID Code: | 124633 |
|---|---|
| Item Type: | Chapter in Book, Report or Conference volume (Conference contribution) |
| Measurements or Duration: | 10 pages |
| Event Title: | IEEE Conference on Computer Vision and Pattern Recognition |
| Event Dates: | 2018-06-18 - 2018-06-22 |
| Event Location: | Salt Lake City, United States |
| DOI: | 10.1109/CVPR.2018.00387 |
| Pure ID: | 33310987 |
| Divisions: | Past > Institutes > Institute for Future Environments; Past > QUT Faculties & Divisions > Science & Engineering Faculty |
| Copyright Owner: | Consult author(s) regarding copyright matters |
| Copyright Statement: | This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au |
| Deposited On: | 15 Jan 2019 14:26 |
| Last Modified: | 07 Jun 2026 08:54 |