

March 1998
7 x 9, 342 pp., 108 illus.
$63.00/£38.95 (CLOTH)

ISBN-10: 0-262-19398-1
ISBN-13: 978-0-262-19398-6

Series
Adaptive Computation and Machine Learning
Bradford Books
Reinforcement Learning
An Introduction
Richard S. Sutton and Andrew G. Barto

CONTENTS
SERIES FOREWORD
PREFACE
I THE PROBLEM
1 Introduction
    1.1 Reinforcement Learning
    1.2 Examples
    1.3 Elements of Reinforcement Learning
    1.4 An Extended Example: Tic-Tac-Toe
    1.5 Summary
    1.6 History of Reinforcement Learning
    1.7 Bibliographical Remarks
2 Evaluative Feedback
    2.1 An n-Armed Bandit Problem
    2.2 Action-Value Methods
    2.3 Softmax Action Selection
    2.4 Evaluation Versus Instruction
    2.5 Incremental Implementation
    2.6 Tracking a Nonstationary Problem
    2.7 Optimistic Initial Values
    2.8 Reinforcement Comparison
    2.9 Pursuit Methods
    2.10 Associative Search
    2.11 Conclusions
    2.12 Bibliographical and Historical Remarks
3 The Reinforcement Learning Problem
    3.1 The Agent-Environment Interface
    3.2 Goals and Rewards
    3.3 Returns
    3.4 Unified Notation for Episodic and Continuing Tasks
    3.5 The Markov Property
    3.6 Markov Decision Processes
    3.7 Value Functions
    3.8 Optimal Value Functions
    3.9 Optimality and Approximation
    3.10 Summary
    3.11 Bibliographical and Historical Remarks
II ELEMENTARY SOLUTION METHODS
4 Dynamic Programming
    4.1 Policy Evaluation
    4.2 Policy Improvement
    4.3 Policy Iteration
    4.4 Value Iteration
    4.5 Asynchronous Dynamic Programming
    4.6 Generalized Policy Iteration
    4.7 Efficiency of Dynamic Programming
    4.8 Summary
    4.9 Bibliographical and Historical Remarks
5 Monte Carlo Methods
    5.1 Monte Carlo Policy Evaluation
    5.2 Monte Carlo Estimation of Action Values
    5.3 Monte Carlo Control
    5.4 On-Policy Monte Carlo Control
    5.5 Evaluating One Policy While Following Another
    5.6 Off-Policy Monte Carlo Control
    5.7 Incremental Implementation
    5.8 Summary
    5.9 Bibliographical and Historical Remarks
6 Temporal-Difference Learning
    6.1 TD Prediction
    6.2 Advantages of TD Prediction Methods
    6.3 Optimality of TD(0)
    6.4 Sarsa: On-Policy TD Control
    6.5 Q-Learning: Off-Policy TD Control
    6.6 Actor-Critic Methods
    6.7 R-Learning for Undiscounted Continuing Tasks
    6.8 Games, Afterstates, and Other Special Cases
    6.9 Summary
    6.10 Bibliographical and Historical Remarks
III A UNIFIED VIEW
7 Eligibility Traces
    7.1 n-Step TD Prediction
    7.2 The Forward View of TD(λ)
    7.3 The Backward View of TD(λ)
    7.4 Equivalence of Forward and Backward Views
    7.5 Sarsa(λ)
    7.6 Q(λ)
    7.7 Eligibility Traces for Actor-Critic Methods
    7.8 Replacing Traces
    7.9 Implementation Issues
    7.10 Variable λ
    7.11 Conclusions
    7.12 Bibliographical and Historical Remarks
8 Generalization and Function Approximation
    8.1 Value Prediction with Function Approximation
    8.2 Gradient-Descent Methods
    8.3 Linear Methods
    8.4 Control with Function Approximation
    8.5 Off-Policy Bootstrapping
    8.6 Should We Bootstrap?
    8.7 Summary
    8.8 Bibliographical and Historical Remarks
9 Planning and Learning
    9.1 Models and Planning
    9.2 Integrating Planning, Acting, and Learning
    9.3 When the Model Is Wrong
    9.4 Prioritized Sweeping
    9.5 Full vs. Sample Backups
    9.6 Trajectory Sampling
    9.7 Heuristic Search
    9.8 Summary
    9.9 Bibliographical and Historical Remarks
10 Dimensions of Reinforcement Learning
    10.1 The Unified View
    10.2 Other Frontier Dimensions
11 Case Studies
    11.1 TD-Gammon
    11.2 Samuel's Checkers Player
    11.3 The Acrobot
    11.4 Elevator Dispatching
    11.5 Dynamic Channel Allocation
    11.6 Job-Shop Scheduling
REFERENCES
SUMMARY OF NOTATION
INDEX
 