# Machine Learning and Adaptive Computation

The goal of structured prediction is to build machine learning models that predict relational information that itself has structure, such as being composed of multiple interrelated parts. These models, which reflect prior knowledge, task-specific relations, and constraints, are used in fields including computer vision, speech recognition, natural language processing, and computational biology. They can carry out such tasks as predicting a natural language sentence, or segmenting an image into meaningful components.

These models are expressive and powerful, but exact computation is often intractable. A broad research effort in recent years has aimed at designing structured prediction models and approximate inference and learning procedures that are computationally efficient. This volume offers an overview of this recent research in order to make the work accessible to a broader research community. The chapters, by leading researchers in the field, cover a range of topics, including research trends, the linear programming relaxation approach, innovations in probabilistic modeling, recent theoretical progress, and resource-aware learning.

Sebastian Nowozin is a Researcher in the Machine Learning and Perception group (MLP) at Microsoft Research, Cambridge, England. Peter V. Gehler is a Senior Researcher in the Perceiving Systems group at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. Jeremy Jancsary is a Senior Research Scientist at Nuance Communications, Vienna. Christoph H. Lampert is Assistant Professor at the Institute of Science and Technology Austria, where he heads a group for Computer Vision and Machine Learning.

**Contributors**

Jonas Behr, Yutian Chen, Fernando De La Torre, Justin Domke, Peter V. Gehler, Andrew E. Gelfand, Sébastien Giguère, Amir Globerson, Fred A. Hamprecht, Minh Hoai, Tommi Jaakkola, Jeremy Jancsary, Joseph Keshet, Marius Kloft, Vladimir Kolmogorov, Christoph H. Lampert, François Laviolette, Xinghua Lou, Mario Marchand, André F. T. Martins, Ofer Meshi, Sebastian Nowozin, George Papandreou, Daniel Průša, Gunnar Rätsch, Amélie Rolland, Bogdan Savchynskyy, Stefan Schmidt, Thomas Schoenemann, Gabriele Schweikert, Ben Taskar, Sinisa Todorovic, Max Welling, David Weiss, Thomáš Werner, Alan Yuille, Stanislav Živný

This book offers a concise and accessible introduction to the emerging field of artificial cognitive systems. Cognition, both natural and artificial, is about anticipating the need for action and developing the capacity to predict the outcome of those actions. Drawing on artificial intelligence, developmental psychology, and cognitive neuroscience, the field of artificial cognitive systems has as its ultimate goal the creation of computer-based systems that can interact with humans and serve society in a variety of ways. This primer brings together recent work in cognitive science and cognitive robotics to offer readers a solid grounding on key issues.

The book first develops a working definition of cognitive systems—broad enough to encompass multiple views of the subject and deep enough to help in the formulation of theories and models. It surveys the cognitivist, emergent, and hybrid paradigms of cognitive science and discusses cognitive architectures derived from them. It then turns to the key issues, with chapters devoted to autonomy, embodiment, learning and development, memory and prospection, knowledge and representation, and social cognition. Ideas are introduced in an intuitive, natural order, with an emphasis on the relationships among ideas and building to an overview of the field. The main text is straightforward and succinct; sidenotes drill deeper on specific topics and provide contextual links to further reading.

Sparse modeling is a rapidly developing area at the intersection of statistical learning and signal processing, motivated by the age-old statistical problem of selecting a small number of predictive variables in high-dimensional datasets. This collection describes key approaches in sparse modeling, focusing on its applications in fields including neuroscience, computational biology, and computer vision.

Sparse modeling methods can improve the interpretability of predictive models and aid efficient recovery of high-dimensional unobserved signals from a limited number of measurements. Yet despite significant advances in the field, a number of open issues remain when sparse modeling meets real-life applications. The book discusses a range of practical applications and state-of-the-art approaches for tackling the challenges presented by these applications. Topics considered include the choice of method in genomics applications; analysis of protein mass-spectrometry data; the stability of sparse models in brain imaging applications; sequential testing approaches; algorithmic aspects of sparse recovery; and learning sparse latent models.

**Contributors**

A. Vania Apkarian, Marwan Baliki, Melissa K. Carroll, Guillermo A. Cecchi, Volkan Cevher, Xi Chen, Nathan W. Churchill, Rémi Emonet, Rahul Garg, Zoubin Ghahramani, Lars Kai Hansen, Matthias Hein, Katherine Heller, Sina Jafarpour, Seyoung Kim, Mladen Kolar, Anastasios Kyrillidis, Aurelie Lozano, Matthew L. Malloy, Pablo Meyer, Shakir Mohamed, Alexandru Niculescu-Mizil, Robert D. Nowak, Jean-Marc Odobez, Peter M. Rasmussen, Irina Rish, Saharon Rosset, Martin Slawski, Stephen C. Strother, Jagannadan Varadarajan, Eric P. Xing

The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. *Introduction to Machine Learning* is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. Subjects include supervised learning; Bayesian decision theory; parametric, semi-parametric, and nonparametric methods; multivariate analysis; hidden Markov models; reinforcement learning; kernel machines; graphical models; Bayesian estimation; and statistical testing.

Machine learning is rapidly becoming a skill that computer science students must master before graduation. The third edition of *Introduction to Machine Learning* reflects this shift, with added support for beginners, including selected solutions for exercises and additional example data sets (with code available online). Other substantial changes include discussions of outlier detection; ranking algorithms for perceptrons and support vector machines; matrix decomposition and spectral methods; distance estimation; new kernel algorithms; deep learning in multilayered perceptrons; and the nonparametric approach to Bayesian methods. All learning algorithms are explained so that students can easily move from the equations in the book to a computer program. The book can be used by both advanced undergraduates and graduate students. It will also be of interest to professionals who are concerned with the application of machine learning methods.

**Downloadable instructor resources available for this title: solution manual, programs, lecture slides, and file of figures in the book**

Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.

This book, written by the inventors of the method, brings together, organizes, simplifies, and substantially extends two decades of research on boosting, presenting both theory and applications in a way that is accessible to readers from diverse backgrounds while also providing an authoritative reference for advanced researchers. With its introductory treatment of all material and its inclusion of exercises in every chapter, the book is appropriate for course use as well.

The book begins with a general introduction to machine learning algorithms and their analysis; then explores the core theory of boosting, especially its ability to generalize; examines some of the myriad other theoretical viewpoints that help to explain and understand boosting; provides practical extensions of boosting for more complex learning problems; and finally presents a number of advanced theoretical topics. Numerous applications and practical illustrations are offered throughout.
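As a rough illustration of the idea of combining weak "rules of thumb," the sketch below implements AdaBoost, one well-known boosting algorithm, using one-dimensional threshold rules ("stumps") as the weak learners. The blurb above does not name a specific algorithm, and the function names (`train_stump`, `adaboost`, `predict`) and the toy data are assumptions made purely for illustration.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule.

    A stump predicts +polarity when x >= threshold and -polarity otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Fit a weighted vote of stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x >= threshold else -polarity for x in xs]
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (pol if x >= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: the negative labels sit in the middle,
    # so no single threshold rule can separate the classes.
    xs = [0, 1, 2, 3, 4, 5]
    ys = [1, 1, -1, -1, 1, 1]
    ensemble = adaboost(xs, ys, rounds=3)
    print([predict(ensemble, x) for x in xs])
```

On this toy data, the best single stump misclassifies two of the six points, while the weighted vote of three stumps fits all six. The example only shows training-set behavior and says nothing about generalization.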

Complex adaptive systems (cas), including ecosystems, governments, biological cells, and markets, are characterized by intricate hierarchical arrangements of boundaries and signals. In ecosystems, for example, niches act as semi-permeable boundaries, and smells and visual patterns serve as signals; governments have departmental hierarchies with memoranda acting as signals; and so it is with other cas. Despite a wealth of data and descriptions concerning different cas, there remain many unanswered questions about "steering" these systems. In *Signals and Boundaries*, John Holland argues that understanding the origin of the intricate signal/border hierarchies of these systems is the key to answering such questions. He develops an overarching framework for comparing and steering cas through the mechanisms that generate their signal/boundary hierarchies.

Holland lays out a path for developing the framework that emphasizes agents, niches, theory, and mathematical models. He discusses, among other topics, theory construction; signal-processing agents; networks as representations of signal/boundary interaction; adaptation; recombination and reproduction; the use of tagged urn models (adapted from elementary probability theory) to represent boundary hierarchies; finitely generated systems as a way to tie the models examined into a single framework; the framework itself, illustrated by a simple finitely generated version of the development of a multi-celled organism; and Markov processes.

Today’s Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.

The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package, PMTK (probabilistic modeling toolkit), that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

**Downloadable instructor resources available for this title: instructor's manual and file of figures in the book**

This graduate-level textbook introduces fundamental concepts and methods in machine learning. It describes several important modern algorithms, provides the theoretical underpinnings of these algorithms, and illustrates key aspects for their application. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics.

*Foundations of Machine Learning* fills the need for a general textbook that also offers theoretical details and an emphasis on proofs. Certain topics that are often treated with insufficient attention are discussed in more detail here; for example, entire chapters are devoted to regression, multi-class classification, and ranking. The first three chapters lay the theoretical foundation for what follows, but each remaining chapter is mostly self-contained. The appendix offers a concise probability review, a short introduction to convex optimization, tools for concentration bounds, and several basic properties of matrices and norms used in the book.

The book is intended for graduate students and researchers in machine learning, statistics, and related areas; it can be used either as a textbook or as a reference text for a research seminar.

**Downloadable instructor resources available for this title: solution manual**

As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning’s greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity.

After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interface, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.
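The covariate shift setting can be written down compactly. The notation here is an assumption introduced for illustration and is not taken from the book: $p_{\mathrm{tr}}(x)$ and $p_{\mathrm{te}}(x)$ are the training and test input densities, $p(y \mid x)$ is the shared conditional distribution of outputs, $f$ is a predictor, and $\ell$ is a loss function. Provided that $p_{\mathrm{tr}}(x) > 0$ wherever $p_{\mathrm{te}}(x) > 0$, the expected test loss can be rewritten as a weighted expectation over training inputs, which is why estimating the importance $w(x)$ is a natural step:

$$
\begin{aligned}
\mathbb{E}_{x \sim p_{\mathrm{te}}}\,\mathbb{E}_{y \sim p(y \mid x)}\big[\ell(f(x), y)\big]
&= \int p_{\mathrm{te}}(x) \int p(y \mid x)\,\ell(f(x), y)\,dy\,dx \\
&= \int p_{\mathrm{tr}}(x)\,\frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)} \int p(y \mid x)\,\ell(f(x), y)\,dy\,dx \\
&= \mathbb{E}_{x \sim p_{\mathrm{tr}}}\,\mathbb{E}_{y \sim p(y \mid x)}\big[w(x)\,\ell(f(x), y)\big],
\qquad w(x) = \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}.
\end{aligned}
$$

The second step relies on $p(y \mid x)$ being the same in both environments, which is exactly the covariate shift assumption.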

The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields.

Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today’s machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.