Contents

Copyright

Series Foreword

Preface

1     Introduction and Overview

1.1     Classification Problems and Machine Learning

1.2     Boosting

1.3     Resistance to Overfitting and the Margins Theory

1.4     Foundations and Algorithms

Summary

Bibliographic Notes

Exercises

I     CORE ANALYSIS

2     Foundations of Machine Learning

2.1     A Direct Approach to Machine Learning

2.2     General Methods of Analysis

2.3     A Foundation for the Study of Boosting Algorithms

Summary

Bibliographic Notes

Exercises

3     Using AdaBoost to Minimize Training Error

3.1     A Bound on AdaBoost’s Training Error

3.2     A Sufficient Condition for Weak Learnability

3.3     Relation to Chernoff Bounds

3.4     Using and Designing Base Learning Algorithms

Summary

Bibliographic Notes

Exercises

4     Direct Bounds on the Generalization Error

4.1     Using VC Theory to Bound the Generalization Error

4.2     Compression-Based Bounds

4.3     The Equivalence of Strong and Weak Learnability

Summary

Bibliographic Notes

Exercises

5     The Margins Explanation for Boosting’s Effectiveness

5.1     Margin as a Measure of Confidence

5.2     A Margins-Based Analysis of the Generalization Error

5.3     Analysis Based on Rademacher Complexity

5.4     The Effect of Boosting on Margin Distributions

5.5     Bias, Variance, and Stability

5.6     Relation to Support-Vector Machines

5.7     Practical Applications of Margins

Summary

Bibliographic Notes

Exercises

II    FUNDAMENTAL PERSPECTIVES

6     Game Theory, Online Learning, and Boosting

6.1     Game Theory

6.2     Learning in Repeated Game Playing

6.3     Online Prediction

6.4     Boosting

6.5     Application to a “Mind-Reading” Game

Summary

Bibliographic Notes

Exercises

7     Loss Minimization and Generalizations of Boosting

7.1     AdaBoost’s Loss Function

7.2     Coordinate Descent

7.3     Loss Minimization Cannot Explain Generalization

7.4     Functional Gradient Descent

7.5     Logistic Regression and Conditional Probabilities

7.6     Regularization

7.7     Applications to Data-Limited Learning

Summary

Bibliographic Notes

Exercises

8     Boosting, Convex Optimization, and Information Geometry

8.1     Iterative Projection Algorithms

8.2     Proving the Convergence of AdaBoost

8.3     Unification with Logistic Regression

8.4     Application to Species Distribution Modeling

Summary

Bibliographic Notes

Exercises

III   ALGORITHMIC EXTENSIONS

9     Using Confidence-Rated Weak Predictions

9.1     The Framework

9.2     General Methods for Algorithm Design

9.3     Learning Rule-Sets

9.4     Alternating Decision Trees

Summary

Bibliographic Notes

Exercises

10    Multiclass Classification Problems

10.1    A Direct Extension to the Multiclass Case

10.2    The One-against-All Reduction and Multi-label Classification

10.3    Application to Semantic Classification

10.4    General Reductions Using Output Codes

Summary

Bibliographic Notes

Exercises

11    Learning to Rank

11.1    A Formal Framework for Ranking Problems

11.2    A Boosting Algorithm for the Ranking Task

11.3    Methods for Improving Efficiency

11.4    Multiclass, Multi-label Classification

11.5    Applications

Summary

Bibliographic Notes

Exercises

IV    ADVANCED THEORY

12    Attaining the Best Possible Accuracy

12.1    Optimality in Classification and Risk Minimization

12.2    Approaching the Optimal Risk

12.3    How Minimizing Risk Can Lead to Poor Accuracy

Summary

Bibliographic Notes

Exercises

13    Optimally Efficient Boosting

13.1    The Boost-by-Majority Algorithm

13.2    Optimal Generalization Error

13.3    Relation to AdaBoost

Summary

Bibliographic Notes

Exercises

14    Boosting in Continuous Time

14.1    Adaptiveness in the Limit of Continuous Time

14.2    BrownBoost

14.3    AdaBoost as a Special Case of BrownBoost

14.4    Experiments with Noisy Data

Summary

Bibliographic Notes

Exercises

Appendix: Some Notation, Definitions, and Mathematical Background

A.1    General Notation

A.2    Norms

A.3    Maxima, Minima, Suprema, and Infima

A.4    Limits

A.5    Continuity, Closed Sets, and Compactness

A.6    Derivatives, Gradients, and Taylor’s Theorem

A.7    Convexity

A.8    The Method of Lagrange Multipliers

A.9    Some Distributions and the Central Limit Theorem

Bibliography

Index of Algorithms, Figures, and Tables

Subject and Author Index