# Machine Learning and Adaptive Computation

The Internet gives us access to a wealth of information in languages we don’t understand. The investigation of automated or semi-automated approaches to translation has become a thriving research field with enormous commercial potential. This volume investigates how Machine Learning techniques can improve Statistical Machine Translation, currently at the forefront of research in the field. The book looks first at enabling technologies--technologies that solve problems that are not Machine Translation proper but are linked closely to the development of a Machine Translation system. These include the acquisition of bilingual sentence-aligned data from comparable corpora, automatic construction of multilingual name dictionaries, and word alignment. The book then presents new or improved statistical Machine Translation techniques, including a discriminative training framework for leveraging syntactic information, the use of semi-supervised and kernel-based learning methods, and the combination of multiple Machine Translation outputs in order to improve overall translation quality.ContributorsSrinivas Bangalore, Nicola Cancedda, Josep M. Crego, Marc Dymetman, Jakob Elming, George Foster, Jesús Giménez, Cyril Goutte, Nizar Habash, Gholamreza Haffari, Patrick Haffner, Hitoshi Isahara, Stephan Kanthak, Alexandre Klementiev, Gregor Leusch, Pierre Mahé, Lluís Màrquez, Evgeny Matusov, I. Dan Melamed, Ion Muslea, Hermann Ney, Bruno Pouliquen, Dan Roth, Anoop Sarkar, John Shawe-Taylor, Ralf Steinberger, Joseph Turian, Nicola Ueffing, Masao Utiyama, Zhuoran Wang, Benjamin Wellington, Kenji Yamada

Learning to perform complex action strategies is an important problem in the fields of artificial intelligence, robotics, and machine learning. Filled with interesting new experimental results, *Learning in Embedded Systems *explores algorithms that learn efficiently from trial-and error experience with an external world. It is the first detailed exploration of the problem of learning action strategies in the context of designing embedded systems that adapt their behavior to a complex, changing environment; such systems include mobile robots, factory process controllers, and long-term software databases.

Kaelbling investigates a rapidly expanding branch of machine learning known as reinforcement learning, including the important problems of controlled exploration of the environment, learning in highly complex environments, and learning from delayed reward. She reviews past work in this area and presents a number of significant new results. These include the intervalestimation algorithm for exploration, the use of biases to make learning more efficient in complex environments, a generate-and-test algorithm that combines symbolic and statistical processing into a flexible learning method, and some of the first reinforcement-learning experiments with a real robot.

Handling inherent uncertainty and exploiting compositional structure are fundamental to understanding and designing large-scale systems. Statistical relational learning builds on ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases, and programming languages to represent structure. In *Introduction to Statistical Relational Learning,* leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data.

The early chapters provide tutorials for material used in later chapters, offering introductions to representation, inference and learning in graphical models, and logic. The book then describes object-oriented approaches, including probabilistic relational models, relational Markov networks, and probabilistic entity-relationship models as well as logic-based formalisms including Bayesian logic programs, Markov logic, and stochastic logic programs. Later chapters discuss such topics as probabilistic models with unknown objects, relational dependency networks, reinforcement learning in relational domains, and information extraction.

By presenting a variety of approaches, the book highlights commonalities and clarifies important differences among proposed approaches and, along the way, identifies important representational and algorithmic issues. Numerous applications are provided throughout.

Pervasive and networked computers have dramatically reduced the cost of collecting and distributing large datasets. In this context, machine learning algorithms that scale poorly could simply become irrelevant. We need learning algorithms that scale linearly with the volume of the data while maintaining enough statistical efficiency to outperform algorithms that simply process a random subset of the data. This volume offers researchers and engineers practical solutions for learning from large scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets. At the same time it offers researchers information that can address the relative lack of theoretical grounding for many useful algorithms. After a detailed description of state-of-the-art support vector machine technology, an introduction of the essential concepts discussed in the volume, and a comparison of primal and dual optimization techniques, the book progresses from well-understood techniques to more novel and controversial approaches. Many contributors have made their code and data available online for further experimentation. Topics covered include fast implementations of known algorithms, approximations that are amenable to theoretical guarantees, and algorithms that perform well in practice but are difficult to analyze theoretically.ContributorsLéon Bottou, Yoshua Bengio, Stéphane Canu, Eric Cosatto, Olivier Chapelle, Ronan Collobert, Dennis DeCoste, Ramani Duraiswami, Igor Durdanovic, Hans-Peter Graf, Arthur Gretton, Patrick Haffner, Stefanie Jegelka, Stephan Kanthak, S. Sathiya Keerthi, Yann LeCun, Chih-Jen Lin, Gaëlle Loosli, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, Gunnar Rätsch, Vikas Chandrakant Raykar, Konrad Rieck, Vikas Sindhwani, Fabian Sinz, Sören Sonnenburg, Jason Weston, Christopher K. I. Williams, Elad Yom-TovLéon Bottou is a Research Scientist at NEC Labs America. Olivier Chapelle is with Yahoo! Research. He is editor of Semi-Supervised Learning (MIT Press, 2006). Dennis DeCoste is with Microsoft Research. Jason Weston is a Research Scientist at NEC Labs America.

Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning’s greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and analyzes the state of the art in machine learning algorithms and theory in this novel field. The contributors discuss applications as diverse as machine translation, document markup, computational biology, and information extraction, among others, providing a timely overview of an exciting field. Contributors Yasemin Altun, Gökhan Bakir [no dot over i], Olivier Bousquet, Sumit Chopra, Corinna Cortes, Hal Daumé III, Ofer Dekel, Zoubin Ghahramani, Raia Hadsell, Thomas Hofmann, Fu Jie Huang, Yann LeCun, Tobias Mann, Daniel Marcu, David McAllester, Mehryar Mohri, William Stafford Noble, Fernando Pérez-Cruz, Massimiliano Pontil, Marc’Aurelio Ranzato, Juho Rousu, Craig Saunders, Bernhard Schölkopf, Matthias W. Seeger, Shai Shalev-Shwartz, John Shawe-Taylor, Yoram Singer, Alexander J. Smola, Sandor Szedmak, Ben Taskar, Ioannis Tsochantaridis, S.V.N Vishwanathan, Jason Weston Gökhan Bakir [no dot over i] is Research Scientist at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. Thomas Hofmann is a Director of Engineering at Google’s Engineering Center in Zurich and Adjunct Associate Professor of Computer Science at Brown University. Bernhard Schölkopf is Director of the Max Planck Institute for Biological Cybernetics and Professor at the Technical University Berlin. Alexander J. Smola is Senior Principal Researcher and Machine Learning Program Leader at National ICT Australia/Australian National University, Canberra. Ben Taskar is Assistant Professor in the Computer and Information Science Department at the University of Pennsylvania. S. V. N. Vishwanathan is Senior Researcher in the Statistical Machine Learning Program, National ICT Australia with an adjunct appointment at the Research School for Information Sciences and Engineering, Australian National University.

The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only twenty-five percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains the papers presented at the December 2005 meeting, held in Vancouver.

Regression and classification methods based on similarity of the input to stored examples have not been widely used in applications involving very large sets of high-dimensional data. Recent advances in computational geometry and machine learning, however, may alleviate the problems in using these methods on large data sets. This volume presents theoretical and practical discussions of nearest-neighbor (NN) methods in machine learning and examines computer vision as an application domain in which the benefit of these advanced methods is often dramatic. It brings together contributions from researchers in theory of computation, machine learning, and computer vision with the goals of bridging the gaps between disciplines and presenting state-of-the-art methods for emerging applications.The contributors focus on the importance of designing algorithms for NN search, and for the related classification, regression, and retrieval tasks, that remain efficient even as the number of points or the dimensionality of the data grows very large. The book begins with two theoretical chapters on computational geometry and then explores ways to make the NN approach practicable in machine learning applications where the dimensionality of the data and the size of the data sets make the naïve methods for NN search prohibitively expensive. The final chapters describe successful applications of an NN algorithm, locality-sensitive hashing (LSH), to vision tasks.

Evolutionary computation, the use of evolutionary systems as computational processes for solving complex problems, is a tool used by computer scientists and engineers who want to harness the power of evolution to build useful new artifacts, by biologists interested in developing and testing better models of natural evolutionary systems, and by artificial life scientists for designing and implementing new artificial evolutionary worlds. In this clear and comprehensive introduction to the field, Kenneth De Jong presents an integrated view of the state of the art in evolutionary computation. Although other books have described such particular areas of the field as genetic algorithms, genetic programming, evolution strategies, and evolutionary programming, *Evolutionary Computation* is noteworthy for considering these systems as specific instances of a more general class of evolutionary algorithms. This useful overview of a fragmented field is suitable for classroom use or as a reference for computer scientists and engineers.

Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics.The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

The process of inductive inference—to infer general laws and principles from particular instances—is the basis of statistical modeling, pattern recognition, and machine learning. The Minimum Descriptive Length (MDL) principle, a powerful method of inductive inference, holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data—that the more we are able to compress the data, the more we learn about the regularities underlying the data. *Advances in Minimum Description Length* is a sourcebook that will introduce the scientific community to the foundations of MDL, recent theoretical advances, and practical applications.

The book begins with an extensive tutorial on MDL, covering its theoretical underpinnings, practical implications as well as its various interpretations, and its underlying philosophy. The tutorial includes a brief history of MDL—from its roots in the notion of Kolmogorov complexity to the beginning of MDL proper. The book then presents recent theoretical advances, introducing modern MDL methods in a way that is accessible to readers from many different scientific fields. The book concludes with examples of how to apply MDL in research settings that range from bioinformatics and machine learning to psychology.