QUT ePrints

Automatic State Construction using Decision Trees for Reinforcement Learning Agents

Au, Manix (2005) Automatic State Construction using Decision Trees for Reinforcement Learning Agents. Masters by Research thesis, Queensland University of Technology.

Abstract

Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward.
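In the standard formulation (the abstract does not spell it out), this objective can be written as maximising the expected discounted return; a minimal sketch, assuming an infinite-horizon setting with a discount factor gamma:

    \pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1}\right], \qquad \gamma \in [0, 1)

where r_{t+1} is the reward received after the action chosen at time t under the policy \pi.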

When the environment is partially observable, the agent cannot determine the state with certainty; such states are called hidden in the literature. An agent that relies exclusively on its current observation will not always find the optimal policy. For example, a mobile robot moving down a corridor of identical doors needs to remember how many doors it has passed in order to reach a specific one.

To overcome the problem of partial observability, an agent uses both current and past (memory) observations to construct an internal state representation, which is treated as an abstraction of the environment.
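As a rough illustration of such an internal state (a sketch only; the class and parameter names are hypothetical, not the thesis's data structures), the current observation can be augmented with a fixed-length memory of recent observations and actions:

    from collections import deque

    class HistoryState:
        """Internal state built from the current observation plus a
        fixed-length memory of past (observation, action) pairs."""

        def __init__(self, history_length=3):
            self.history = deque(maxlen=history_length)

        def update(self, observation, action):
            # Remember what was observed and done at the previous step.
            self.history.append((observation, action))

        def state(self, current_observation):
            # The agent conditions its policy on this tuple rather than on
            # the (possibly ambiguous) current observation alone.
            return (current_observation, tuple(self.history))

In the corridor example above, the number of door observations stored in the memory is what distinguishes otherwise identical doors.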

This research focuses on how features of past events can be extracted, with variable granularity, for the construction of the internal state. The project introduces a new method that applies information theory and decision-tree techniques to derive a tree structure representing both the state and the policy. The relevance of a candidate feature is assessed by ranking its Information Gain Ratio with respect to the expected cumulative reward.
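A minimal sketch of such a ranking, assuming the returns observed at a tree node are discretised into bins and a candidate feature is a test on the agent's history (the function names, binning, and example data are assumptions for illustration, not the thesis's implementation):

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a sequence of discrete labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(return_bins, feature_values):
        """Information Gain Ratio of a candidate feature with respect to
        discretised returns observed at a tree node."""
        total = entropy(return_bins)
        n = len(return_bins)
        cond = 0.0        # conditional entropy of returns given the feature
        split_info = 0.0  # entropy of the feature itself (penalises many-valued splits)
        for v in set(feature_values):
            subset = [r for r, f in zip(return_bins, feature_values) if f == v]
            w = len(subset) / n
            cond += w * entropy(subset)
            split_info -= w * math.log2(w)
        gain = total - cond
        return gain / split_info if split_info > 0 else 0.0

    # Hypothetical example: returns discretised into 'low'/'high', and a binary
    # history feature such as "the previous observation was a door".
    returns = ['high', 'high', 'low', 'low', 'high', 'low']
    feature = [True,   True,   False, False, True,   False]
    print(gain_ratio(returns, feature))  # 1.0: the feature perfectly predicts the return bin

Candidate features with the highest gain ratio would then be preferred as splits when growing the tree.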

Experiments carried out on three different RL tasks have shown that our variant of the U-Tree (McCallum, 1995) produces a more robust state representation and faster learning. This better performance can be explained by the fact that the Information Gain Ratio exhibits a lower variance in return prediction than the Kolmogorov-Smirnov statistical test used in the original U-Tree algorithm.
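For comparison, the splitting criterion of the original U-Tree relies on the Kolmogorov-Smirnov two-sample test applied to the return samples induced by a candidate split; a minimal sketch using SciPy (the return values below are purely illustrative, not data from the thesis experiments):

    from scipy.stats import ks_2samp

    # Returns observed under the two outcomes of a candidate feature test.
    returns_if_true = [4.1, 3.8, 4.5, 4.0, 3.9]
    returns_if_false = [1.2, 1.5, 0.9, 1.4, 1.1]

    statistic, p_value = ks_2samp(returns_if_true, returns_if_false)
    # A small p-value indicates the two return distributions differ, so the
    # original U-Tree would accept the candidate split.
    print(statistic, p_value)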


Full-text downloads:

189 since deposited on 03 Dec 2008
14 in the past twelve months


ID Code: 15965
Item Type: QUT Thesis (Masters by Research)
Supervisor: Maire, Frederic & Sitte, Joaquin
Keywords: Reinforcement learning, State, Action, Reward, Policy, Value based method, Policy search method, Automatic state construction, Decision tree, Partial observability, U-Tree, Kolmogorov-Smirnov two sample test, Information gain ratio test
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Department: Faculty of Information Technology
Institution: Queensland University of Technology
Copyright Owner: Copyright Manix Au
Deposited On: 03 Dec 2008 13:54
Last Modified: 29 Oct 2011 05:41

