| Series Foreword | xi |
|
| Preface | xiii |
|
| 1. | Introduction to Semi-Supervised Learning
Sample Chapter - Download PDF (211 KB) | 1 |
|
| 1.1 | Supervised, Unsupervised, and Semi-Supervised Learning | 1 |
|
| 1.2 | When Can Semi-Supervised Learning Work? | 4 |
|
| 1.3 | Classes of Algorithms and Organization of This Book | 8 |
|
| I. | Generative Models | 13 |
|
| 2. | A Taxonomy for Semi-Supervised Learning Methods
Matthias W. Seeger | 15 |
|
| 2.1 | The Semi-Supervised Learning Problem | 15 |
|
| 2.2 | Paradigms for Semi-Supervised Learning | 17 |
|
| 2.3 | Examples | 22 |
|
| 2.4 | Conclusions | 31 |
|
| 3. | Semi-Supervised Text Classification Using EM
N. C. Nigam, Andrew McCallum and Tom Mitchell | 33 |
|
| 3.1 | Introduction | 33 |
|
| 3.2 | A Generative Model for Text | 35 |
|
| 3.3 | Experminental Results with Basic EM | 41 |
|
| 3.4 | Using a More Expressive Generative Model | 43 |
|
| 3.5 | Overcoming the Challenges of Local Maxima | 49 |
|
| 3.6 | Conclusions and Summary | 54 |
|
| 4. | Risks of Semi-Supervised Learning
Fabio Cozman and Ira Cohen | 57 |
|
| 4.1 | Do Unlabled Data Improve or Degrade Classification Performance? | 57 |
|
| 4.2 | Understanding Unlabeled Data: Asymptotic Bias | 59 |
|
| 4.3 | The Asymptotic Analysis of Generative Smei-Supervised Learning | 63 |
|
| 4.4 | The Value of Labeled and Unlabeled Data | 67 |
|
| 4.5 | Finite Sample Effects | 69 |
|
| 4.6 | Model Search and Robustness | 70 |
|
| 4.7 | Conclusion | 71 |
|
| 5. | Probabilistic Semi-Supervised Cluster with Constraints
Sugato Basu, Mikhail Bilenko, Arindam Banerjee and Raymond J. Mooney | 73 |
|
| 5.1 | Introduction | 74 |
|
| 5.2 | HMRF Model for Semi-Supervised Clustering | 75 |
|
| 5.3 | HMRF-KMeans Algorithm | 81 |
|
| 5.4 | Active Learning for Constraint Acquistion | 93 |
|
| 5.5 | Experimental Results | 96 |
|
| 5.6 | Related Work | 100 |
|
| 5.7 | Conclusions | 101 |
|
| II. | Low-Density Separation | 103 |
|
| 6. | Transductive Support Vector Machines
Thorsten Joachims | 105 |
|
| 6.1 | Introduction | 105 |
|
| 6.2 | Transductive Support Vector Machines | 108 |
|
| 6.3 | Why Use Margin on the Test Set? | 111 |
|
| 6.4 | Experiments and Applications of the TSVMs | 112 |
|
| 6.5 | Solving the TSVM Optimization Problem | 114 |
|
| 6.6 | Connection to Related Approaches | 116 |
|
| 6.7 | Summary and Conclusions | 116 |
|
| 7. | Semi-Supervised Learning Using Semi-Definite Programming
Tijl De Bie and Nello Cristianini | 119 |
|
| 7.1 | Relaxing SVM transduction | 119 |
|
| 7.2 | An Approximation for Speedup | 126 |
|
| 7.3 | General Semi-Supervised Learning Settings | 128 |
|
| 7.4 | Empirical Results | 129 |
|
| 7.5 | Summary and Outlook | 133 |
|
| Appendix:
The Extended Schur Complement Lemma | 134 |
|
| 8. | Gaussian Processes and the Null-Category Noise Model
Neil D. Lawrence and Michael I. Jordan | 137 |
|
| 8.1 | Introduction | 137 |
|
| 8.2 | The Noise Model | 141 |
|
| 8.3 | Process Model and the Effect of the Null-Category | 143 |
|
| 8.4 | Posterior Inference and Prediction | 145 |
|
| 8.5 | Results | 147 |
|
| 8.6 | Discussion | 149 |
|
| 9. | Entropy Regularization
Yves Grandvalet and Yoshua Bengio | 151 |
|
| 9.1 | Introduction | 151 |
|
| 9.2 | Derivation of the Criterion | 152 |
|
| 9.3 | Optimization Algorithms | 155 |
|
| 9.4 | Related Methods | 158 |
|
| 9.5 | Experiments | 160 |
|
| 9.6 | Conclusion | 166 |
|
| Appendix
Proof of Theorem 9.1 | 166 |
|
| 10. | Data-Dependent Regularization
Adrian Corduneanu and Tommi S. Jaakkola | 169 |
|
| 10.1 | Introduction | 169 |
|
| 10.2 | Information Regularization on Metric Spaces | 174 |
|
| 10.3 | Information Regularization and Relational Data | 182 |
|
| 10.4 | Discussion | 189 |
|
| III. | Graph-Based Models | 191 |
|
| 11. | Label Propogation and Quadratic Criterion
Yoshua Bengio, Olivier Delalleau and Nicolas Le Roux | 193 |
|
| 11.1 | Introduction | 193 |
|
| 11.2 | Label Propogation on a Similarity Graph | 194 |
|
| 11.3 | Quadratic Cost Criterion | 198 |
|
| 11.4 | From Transduction to Induction | 205 |
|
| 11.5 | Incorporating Class Prior Knowledge | 205 |
|
| 11.6 | Curse of Dimensionality for Semi-Supervised Learning | 206 |
|
| 11.7 | Discussion | 215 |
|
| 12. | The Geometric Basis of Semi-Supervised Learning
Vikas Sindhwani, Misha Belkin and Partha Niyogi | 217 |
|
| 12.1 | Introduction | 217 |
|
| 12.2 | Incorporating Geometry in Regularization | 220 |
|
| 12.3 | Algorithms | 224 |
|
| 12.4 | Data-Dependent Kernels for Semi-Supervised Learning | 229 |
|
| 12.5 | Linear Methods for Large-Scale Semi-Supervised Learning | 231 |
|
| 12.6 | Connections to Other Algorithms and Related Work | 232 |
|
| 12.7 | Future Directions | 234 |
|
| 13. | Discrete Regularization
Dengyong Zhou and Bernhard Schölkopf | 237 |
|
| 13.1 | Introduction | 237 |
|
| 13.2 | Discrete Analysis | 239 |
|
| 13.3 | Discrete Regularization | 245 |
|
| 13.4 | Conclusion | 249 |
|
| 14. | Semi-Supervised Learning with Conditional Harmonic Mixing
Christopher J. C. Burges and John C. Platt | 251 |
|
| 14.1 | Introduction | 251 |
|
| 14.2 | Conditional Harmonic Mixing | 255 |
|
| 14.3 | Learning in CHM Models | 256 |
|
| 14.4 | Incorporating Prior Knowledge | 261 |
|
| 14.5 | Learning the Conditionals | 261 |
|
| 14.6 | Model Averaging | 262 |
|
| 14.7 | Experiments | 263 |
|
| 14.8 | Conclusions | 273 |
|
| IV. | Change of Representation | 275 |
|
| 15. | Graph Kernels by Spectral Transforms
Xiaojin Zhu, Jaz Kandola, John Lafferty and Zoubin Ghahramani | 277 |
|
| 15.1 | The Graph Laplacian | 278 |
|
| 15.2 | Kernels by Spectral Transforms | 280 |
|
| 15.3 | Kernel Alignment | 281 |
|
| 15.4 | Optimizing Alignment Using QCQP for Semi-Supervised Learning | 282 |
|
| 15.5 | Semi-Supervised Kernels with Order Restraints | 283 |
|
| 15.6 | Experimental Results | 285 |
|
| 15.7 | Conclusion | 289 |
|
| 16. | Spectral Methods for Dimensionality Reduction
Lawrence K. Saul, Kilian Weinberger, Fei Sha and Jihun Ham | 293 |
|
| 16.1 | Introduction | 293 |
|
| 16.2 | Linear Methods | 295 |
|
| 16.3 | Graph-Based Methods | 297 |
|
| 16.4 | Kernel Methods | 303 |
|
| 16.5 | Discussion | 306 |
|
| 17. | Modifying Distances
Alon Orlitsky and Sajama | 309 |
|
| 17.1 | Introduction | 309 |
|
| 17.2 | Estimating DBD Metrics | 312 |
|
| 17.3 | Computing DBD Metrics | 321 |
|
| 17.4 | Semi-Supervised Learning Using Density-Based Metrics | 327 |
|
| 17.5 | Conclusions and Future Work | 329 |
|
| V. | Semi-Supervised Learning in Practice | 331 |
|
| 18. | Large-Scale Algorithms
Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux | 333 |
|
| 18.1 | Introduction | 333 |
|
| 18.2 | Cost Approximations | 334 |
|
| 18.3 | Subset Selection | 337 |
|
| 18.4 | Discussion | 340 |
|
| 19. | Semi-Supervised Protein Classification Using Cluster Kernels
Jason Weston, Christina Leslie, Eugene Ie and William Stafford Noble | 343 |
|
| 19.1 | Introduction | 343 |
|
| 19.2 | Representation and Kernels for Protein Sequences | 345 |
|
| 19.3 | Semi-Supervised Kernels for Protein Sequences | 348 |
|
| 19.4 | Experiments | 352 |
|
| 19.5 | Discussion | 358 |
|
| 20. | Prediction of Protein Function from Networks
Hyunjung Shin and Koji Tsuda | 361 |
|
| 20.1 | Introduction | 361 |
|
| 20.2 | Graph-Based Semi-Supervised Learning | 364 |
|
| 20.3 | Combining Multiple Graphs | 366 |
|
| 20.4 | Experiments on Function Prediction of Proteins | 369 |
|
| 20.5 | Conclusion and Outlook | 374 |
|
| 21. | Analysis of Benchmarks | 377 |
|
| 21.1 | The Benchmark | 377 |
|
| 21.2 | Application of SSL Methods | 383 |
|
| 21.3 | Results and Discussion | 390 |
|
| VI. | Perspectives | 395 |
|
| 22. | An Augmented PAC Model for Semi-Supervised Learning
Maria-Florina Balcan and Avrim Blum | 397 |
|
| 22.1 | Introduction | 398 |
|
| 22.2 | A Formal Framework | 400 |
|
| 22.3 | Sample Complexity Results | 403 |
|
| 22.4 | Algorithmic Results | 412 |
|
| 22.5 | Related Models and Discussion | 416 |
|
| 23. | Metric-Based Approaches for Semi-Supervised Regression and Classification
Dale Schuurmans, Finnegan Southey, Dana Wilkinson and Yuhong Guo | 421 |
|
| 23.1 | Introduction | 421 |
|
| 23.2 | Metric Structure of Supervised Learning | 423 |
|
| 23.3 | Model Selection | 426 |
|
| 23.4 | Regularization | 436 |
|
| 23.5 | Classification | 445 |
|
| 23.6 | Conclusion | 449 |
|
| 24. | Transductive Inference and Semi-Supervised Learning
Vladimir Vapnik | 453 |
|
| 24.1 | Problem Settings | 453 |
|
| 24.2 | Problem of Generalization in Inductive and Transductive Inference | 455 |
|
| 24.3 | Structure of the VC Bounds and Transductive Inference | 457 |
|
| 24.4 | The Symmetrization Lemma and Transductive Inference | 458 |
|
| 24.5 | Bounds for Transductive Inference | 459 |
|
| 24.6 | The Structural Risk Minimization Principle for Induction and Transduction | 460 |
|
| 24.7 | Combinatorics in Transductive Inference | 462 |
|
| 24.8 | Measures of Size of Equivalence Classes | 463 |
|
| 24.9 | Algorithms for Inductive and Transductive SVMs | 465 |
|
| 24.10 | Semi-Supervised Learning | 470 |
|
| 24.11 | Conclusion:
Transductive Inference and the New Problems of Inference | 470 |
|
| 24.12 | Beyond Transduction: Selective Inference | 471 |
|
| 25. | A Discussion of Semi-Supervised Learning and Transduction | 473 |
|
| References | 479 |
|
| Notation and Symbols | 499 |
|
| Contributors | 503 |
|
| Index
Online Index | 509 |
|