Options are important instruments in modern finance. In this paper, we investigate reinforcement learning (RL) methods, in particular least-squares policy iteration (LSPI), for the problem of learning exercise policies for American options. We develop finite-time bounds on the performance of the policy obtained with LSPI and compare LSPI and the fitted Q-iteration algorithm (FQI) with the Longstaff-Schwartz method (LSM), the standard least-squares Monte Carlo algorithm from the finance community. Our empirical results show that the exercise policies discovered by LSPI and FQI gain larger payoffs than those discovered by LSM, on both real and synthetic data. Furthermore, we find that for all methods the policies learned from real data generally gain payoffs similar to those of the policies learned from simulated data. Our work shows that solution methods developed in machine learning can advance the state of the art in an important and challenging application area, while demonstrating that computational finance remains a promising area for future applications of machine learning methods.
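For readers unfamiliar with the LSM baseline named above, the following is a minimal sketch of the Longstaff-Schwartz least-squares Monte Carlo method for an American put under geometric Brownian motion. The model parameters and the quadratic polynomial regression basis are illustrative assumptions for this sketch, not details taken from this paper.

```python
import numpy as np

def lsm_american_put(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=100_000, seed=0):
    """Price an American put with least-squares Monte Carlo (LSM).

    Parameter values here follow a common textbook example and are
    assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)

    # Simulate GBM paths: S[t, i] is the price on path i at time step t.
    z = rng.standard_normal((n_steps, n_paths))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S = S0 * np.exp(np.vstack([np.zeros(n_paths),
                               np.cumsum(log_increments, axis=0)]))

    # Cash flows if the option is held to maturity.
    cashflow = np.maximum(K - S[-1], 0.0)

    # Backward induction: at each step, regress the discounted future
    # cash flow on basis functions of the current price (in-the-money
    # paths only) to estimate the continuation value.
    for t in range(n_steps - 1, 0, -1):
        cashflow *= disc
        itm = K - S[t] > 0.0
        if not itm.any():
            continue
        x = S[t, itm]
        # Quadratic polynomial basis; an assumption for this sketch.
        coeffs = np.polyfit(x, cashflow[itm], deg=2)
        continuation = np.polyval(coeffs, x)
        exercise_now = np.maximum(K - x, 0.0)
        # Exercise where the immediate payoff beats the estimated
        # continuation value; overwrite those paths' cash flows.
        ex = exercise_now > continuation
        idx = np.where(itm)[0][ex]
        cashflow[idx] = exercise_now[ex]

    # Discount the step-1 cash flows back to time zero.
    return disc * cashflow.mean()

print(f"LSM price of American put: {lsm_american_put():.4f}")
```

The regression step is what the RL methods studied in this paper replace: LSPI and FQI instead learn an approximate action-value function whose greedy policy determines when to exercise.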