LSPI: Least-Squares Policy Iteration

Introduction

Least-Squares Policy Iteration (LSPI) is a reinforcement learning algorithm designed to solve control problems. It uses value function approximation to cope with large state spaces and batch processing to make efficient use of training data. LSPI has been used successfully to solve several large-scale problems using relatively little training data. This page contains information about LSPI, examples, research papers, and a code distribution that can be used for academic and/or research purposes.
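
To make the two ideas above concrete (a linear value function fitted in batch from a fixed set of samples), here is a minimal Python sketch of the LSPI loop. It is an illustration only, not the code in the distribution. The toy four-state chain, its 0.9 success probability, its rewards, the discount factor, the one-hot features, and the small ridge term delta are all assumptions made for this example; any other feature map could be substituted for phi.

```python
import random

GAMMA = 0.9                   # discount factor (assumed for this sketch)
N_STATES, N_ACTIONS = 4, 2    # toy chain; action 0 = left, 1 = right
K = N_STATES * N_ACTIONS      # number of basis functions


def step(s, a, rng):
    """Assumed toy dynamics: the move succeeds with probability 0.9,
    otherwise it is reversed; reward 1 for landing in a middle state."""
    direction = 1 if a == 1 else -1
    if rng.random() >= 0.9:
        direction = -direction
    s_next = min(max(s + direction, 0), N_STATES - 1)
    return (1.0 if s_next in (1, 2) else 0.0), s_next


def phi(s, a):
    """One-hot features over (state, action) pairs, the simplest linear basis."""
    f = [0.0] * K
    f[s * N_ACTIONS + a] = 1.0
    return f


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def greedy(w, s):
    """Action maximizing the approximate Q-value w . phi(s, a)."""
    def q(a):
        return sum(wi * fi for wi, fi in zip(w, phi(s, a)))
    return max(range(N_ACTIONS), key=q)


def lstdq(samples, w, delta=1e-6):
    """Batch policy evaluation: fit Q-weights for the policy greedy in w."""
    A = [[delta if i == j else 0.0 for j in range(K)] for i in range(K)]
    b = [0.0] * K
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        for i in range(K):
            b[i] += f[i] * r
            for j in range(K):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return solve(A, b)


def lspi(samples, max_iter=20, tol=1e-6):
    """Alternate evaluation and greedy improvement until the weights settle."""
    w = [0.0] * K
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if max(abs(x - y) for x, y in zip(w, w_new)) < tol:
            return w_new
        w = w_new
    return w


# Collect one batch of random transitions, then reuse it in every iteration.
rng = random.Random(0)
samples = []
for _ in range(2000):
    s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)
    r, s_next = step(s, a, rng)
    samples.append((s, a, r, s_next))

weights = lspi(samples)
policy = [greedy(weights, s) for s in range(N_STATES)]
print(policy)
```

The same batch of samples is reused in every iteration, which is where the data efficiency comes from: only the policy used to pick the next-state action changes between iterations. On this toy chain the greedy policy should come out as right, right, left, left, that is, moving toward the rewarded middle states.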


Authors

Michail G. Lagoudakis
Ph.D. Candidate, Department of Computer Science, Duke University
mgl @ cs . duke . edu

Ronald Parr
Assistant Professor, Department of Computer Science, Duke University
parr @ cs . duke . edu

Papers

This is the paper that introduced LSPI:
Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.

A longer journal version is also available:

Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.

Several other papers have been published since then. They are available from Michail's publications page.


LSPI Code Distribution

This is a MATLAB implementation of LSPI, with certain parts written in C. It should run without problems on any Unix or Linux system with MATLAB installed. It has not been tested on Windows.

At the moment, the distribution includes the core LSPI code and two domains, chain and pendulum. Additional domains will be added soon. Check back for updates!

Distribution and use of this code is subject to the following agreement:
This Program is provided by Duke University and the authors as a service to the research community. It is provided without cost or restrictions, except for the User's acknowledgement that the Program is provided on an "As Is" basis and User understands that Duke University and the authors make no express or implied warranty of any kind. Duke University and the authors specifically disclaim any implied warranty of merchantability or fitness for a particular purpose, and make no representations or warranties that the Program will not infringe the intellectual property rights of others. The User agrees to indemnify and hold harmless Duke University and the authors from and against any and all liability arising out of User's use of the Program.

LSPI


CHAIN


PENDULUM


Email Michail at  mgl @ cs . duke . edu  if you encounter any problems.