Automatic State Construction using Decision Trees for Reinforcement Learning Agents
Au, Manix (2005) Automatic State Construction using Decision Trees for Reinforcement Learning Agents.
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward.
When the environment is partially observable, the agent cannot determine the states with certainty. Such states are called hidden states in the literature. An agent that relies exclusively on its current observations will not always find the optimal policy. For example, a mobile robot travelling down a corridor of identical doors must remember how many doors it has passed in order to reach a specific door.
To overcome the problem of partial observability, an agent uses both current and past (memory) observations to construct an internal state representation, which is treated as an abstraction of the environment.
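The corridor example can be made concrete with a minimal sketch. It is illustrative only and is not taken from the thesis: the class name `DoorCountingAgent`, the observation labels `"door"` and `"wall"`, and the action labels `"forward"` and `"enter"` are all assumptions. The point is that the observation `"door"` is identical at every door, so the agent needs an internal state (here, a simple count) to choose the right action.

```python
from dataclasses import dataclass


@dataclass
class DoorCountingAgent:
    """Hypothetical agent whose internal state is the number of doors passed."""
    target_door: int       # which door (1-based) the agent should enter
    doors_passed: int = 0  # memory built from past observations

    def step(self, observation: str) -> str:
        # Every door looks the same, so the current observation alone
        # cannot tell the agent whether this is the target door.
        if observation == "door":
            self.doors_passed += 1
            if self.doors_passed == self.target_door:
                return "enter"
        return "forward"


# A corridor with three identical doors separated by wall segments.
corridor = ["wall", "door", "wall", "door", "wall", "door"]
agent = DoorCountingAgent(target_door=3)
actions = [agent.step(obs) for obs in corridor]
print(actions)
```

A memoryless policy would have to map the single observation `"door"` to one fixed action, so it could only ever enter the first door or none at all.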
This research focuses on how features of past events are extracted with variable granularity for internal state construction. The project introduces a new method that applies information theory and decision tree techniques to derive a tree structure that represents both the state and the policy. The relevance of a candidate feature is assessed by ranking its Information Gain Ratio with respect to the cumulative expected reward.
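As a rough illustration of ranking candidate features by Information Gain Ratio, the sketch below uses the standard decision-tree definition (information gain divided by the split information). The abstract does not give the exact formulation used in the thesis, so the discretisation of returns into `"high"` and `"low"` labels, the function names `entropy` and `gain_ratio`, and the toy data are all assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the split on `feature_values`, divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the candidate feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: returns discretised into two classes (an assumption for illustration).
returns = ["high", "high", "low", "low"]
feature_a = [1, 1, 0, 0]  # separates high from low returns perfectly
feature_b = [1, 0, 1, 0]  # carries no information about the return

ratio_a = gain_ratio(feature_a, returns)
ratio_b = gain_ratio(feature_b, returns)
print(ratio_a, ratio_b)
```

In this toy case `feature_a` ranks above `feature_b`, so a tree-growing procedure of this kind would prefer to split on `feature_a`.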
Experiments carried out on three different RL tasks have shown that our variant of the U-Tree (McCallum, 1995) produces a more robust state representation and faster learning. This better performance can be explained by the fact that the Information Gain Ratio exhibits a lower variance in return prediction than the Kolmogorov-Smirnov statistical test used in the original U-Tree algorithm.
|Item Type:||QUT Thesis (Masters by Research)|
|Supervisor:||Maire, Frederic & Sitte, Joaquin|
|Keywords:||Reinforcement learning, State, Action, Reward, Policy, Value based method, Policy search method, Automatic state construction, Decision tree, Partial observability, U-Tree, Kolmogorov-Smirnov two sample test, Information gain ratio test|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Science and Technology|
|Department:||Faculty of Information Technology|
|Institution:||Queensland University of Technology|
|Copyright Owner:||Copyright Manix Au|
|Deposited On:||03 Dec 2008 13:54|
|Last Modified:||29 Oct 2011 05:41|