Preliminary Exam Talks

Non-parametric Approximate Linear Programming for MDPs

Speaker: Jason Pazis
jpazis at cs.duke.edu
Date: Tuesday, August 28, 2012
Time: 9:00am - 11:00am
Location: D344 LSRC, Duke

Abstract

One of the most difficult tasks in value-function-based methods for reinforcement learning is finding an approximation architecture that is expressive enough to capture the important structure in the value function while not overfitting the training samples. This thesis presents a novel Non-Parametric approach to Approximate Linear Programming (NP-ALP), which requires nothing more than a smoothness assumption on the value function. NP-ALP can make use of real-world, noisy sampled transitions rather than requiring samples from the full Bellman equation, while providing the first known max-norm performance guarantees for ALP and finite-sample convergence under mild assumptions. Additionally, NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
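For context on the linear programming approach the abstract builds on: the exact LP formulation of an MDP minimizes the sum of state values subject to the Bellman inequality at every state-action pair; ALP methods approximate this when the state space is too large to enumerate. Below is a minimal sketch of the exact (tabular) LP on a small hypothetical MDP, not the NP-ALP algorithm itself; the toy transition model, rewards, and SciPy-based solver choice are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP (not from the talk): 3 states, 2 actions.
gamma = 0.9
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # immediate rewards

# Exact LP for the optimal value function (tabular case):
#   minimize  sum_s V(s)
#   s.t.      V(s) >= R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')   for all (s, a)
# Rearranged into linprog's "A_ub @ x <= b_ub" form:
#   gamma * P(.|s, a) @ V - V(s) <= -R(s, a)
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
V_lp = res.x  # optimal value function recovered by the LP

# Cross-check against value iteration on the same MDP.
V = np.zeros(n_states)
for _ in range(2000):
    V = np.max(R + gamma * P @ V, axis=-1)
print(np.allclose(V_lp, V, atol=1e-4))
```

The approximate variant replaces the per-state variables V(s) with a weighted combination of basis functions, which shrinks the LP but introduces the architecture-selection problem the abstract describes; NP-ALP's contribution is to replace that fixed basis with a smoothness assumption.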
Advisor(s): Ronald Parr
Committee: Vincent Conitzer, Mauro Maggioni, Silvia Ferrari