Automatic State Construction using Decision Tree for Reinforcement Learning Agents
Au, Manix & Maire, Frederic D. (2004) Automatic State Construction using Decision Tree for Reinforcement Learning Agents. In Mohammadian, Masoud (Ed.) International Conference on Computational Intelligence for Modelling, Control and Automation - CIMCA'2004, 12–14 July 2004, Sheraton Mirage Hotel, Gold Coast, Australia.
Reinforcement Learning (RL) is a learning framework for modelling an agent and its interaction with its environment through actions, perceptions, and rewards. Intelligent agents should choose an action after every perception so as to maximize their long-term reward. A well-defined framework for this interaction is the partially observable Markov decision process (POMDP) model. Unfortunately, solving POMDPs is intractable. To overcome the problem of partial observability, McCallum introduced the U-tree, an RL algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden states. A U-tree embodies both the agent's policy and its representation of the environment's state. It combines the advantages of instance-based learning with robust statistical tests for separating noise from task structure. In this paper, we consider an alternative approach to selecting features of past events for constructing the state representation. We apply information theory and decision tree techniques to derive a variant of the U-tree in which the relevance of candidate features is assessed by ranking their information gain ratio with respect to the cumulative expected reward. Experiments carried out on three different RL tasks demonstrate that our variant of the U-tree produces a more robust state representation and faster learning. This improved performance can be explained by the fact that the information gain ratio exhibits a lower variance in return prediction than the Kolmogorov-Smirnov statistical test used in the original U-tree algorithm.
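The abstract does not say how returns are discretised or how candidate features of past events are encoded, so the following is only a minimal sketch of ranking candidate features by information gain ratio, under stated assumptions. It assumes that each stored instance is a dict of discrete candidate features (the names `obs[t-1]` and `action[t-1]` are hypothetical) and that the continuous returns are turned into class labels by equal-width binning (an assumed choice, with a hypothetical `n_bins` parameter). It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning of continuous returns (an assumption, not from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gain_ratio(feature_values: Sequence[Hashable],
               target_labels: Sequence[Hashable]) -> float:
    """Information gain of splitting target_labels by a feature, divided by split info."""
    n = len(target_labels)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for f, y in zip(feature_values, target_labels):
        groups[f].append(y)
    # Expected entropy of the target after partitioning on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target_labels) - conditional
    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


def rank_features(instances: List[Dict[str, Hashable]],
                  returns: Sequence[float],
                  n_bins: int = 4) -> List[Tuple[str, float]]:
    """Rank hypothetical candidate features by gain ratio w.r.t. discretised returns."""
    labels = discretize(returns, n_bins)
    names = instances[0].keys()
    scores = {name: gain_ratio([inst[name] for inst in instances], labels)
              for name in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: the return depends only on the previous observation.
    instances = [
        {"obs[t-1]": "A", "action[t-1]": "left"},
        {"obs[t-1]": "A", "action[t-1]": "right"},
        {"obs[t-1]": "B", "action[t-1]": "left"},
        {"obs[t-1]": "B", "action[t-1]": "right"},
    ]
    returns = [1.0, 1.0, 0.0, 0.0]
    print(rank_features(instances, returns, n_bins=2))
```

On this toy data the previous observation fully determines the binned return, so it receives a gain ratio of 1.0, while the previous action carries no information about it and scores 0.0. The division by split information is the standard reason to prefer gain ratio over raw information gain: it stops features with many distinct values (for example, long histories) from being ranked highly merely because they fragment the instances.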
| Item Type: | Conference Paper |
| Keywords: | Automatic State Construction, Reinforcement Learning, Decision Tree |
| Subjects: | Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) |
| Divisions: | Past > QUT Faculties & Divisions > Faculty of Science and Technology |
| Copyright Owner: | Copyright 2004 (please consult author) |
| Deposited On: | 04 Jan 2006 |
| Last Modified: | 29 Feb 2012 23:08 |