Computational Molecular Biology

Marshall Nirenberg and the Discovery of the Genetic Code

The genetic code is the Rosetta Stone by which we interpret the 3.3 billion letters of human DNA, the alphabet of life, and the discovery of the code has had an immeasurable impact on science and society. In 1968, Marshall Nirenberg, an unassuming government scientist working at the National Institutes of Health, shared the Nobel Prize for cracking the genetic code. He was the least likely man to make such an earth-shaking discovery, and yet he had gotten there before such members of the scientific elite as James Watson and Francis Crick. How did Nirenberg do it, and why is he so little known? In The Least Likely Man, Franklin Portugal tells the fascinating life story of a famous scientist that most of us have never heard of.

Nirenberg did not have a particularly brilliant undergraduate or graduate career. After being hired as a researcher at the NIH, he quietly explored how cells make proteins. Meanwhile, Watson, Crick, and eighteen other leading scientists had formed the “RNA Tie Club” (named after the distinctive ties they wore, each decorated with one of twenty amino acid designs), intending to claim credit for the discovery of the genetic code before they had even worked out the details. They were surprised, and displeased, when Nirenberg announced his preliminary findings of a genetic code at an international meeting in Moscow in 1961.

Drawing on Nirenberg’s “lab diaries,” Portugal offers an engaging and accessible account of Nirenberg’s experimental approach, describes counterclaims by Crick, Watson, and Sidney Brenner, and traces Nirenberg’s later switch to an entirely new, even more challenging field. Having won the Nobel for his work on the genetic code, Nirenberg moved on to the next frontier of biological research: how the brain works.

The goal of structured prediction is to build machine learning models that predict relational information that itself has structure, such as being composed of multiple interrelated parts. These models, which reflect prior knowledge, task-specific relations, and constraints, are used in fields including computer vision, speech recognition, natural language processing, and computational biology. They can carry out such tasks as predicting a natural language sentence, or segmenting an image into meaningful components.

These models are expressive and powerful, but exact computation is often intractable. A broad research effort in recent years has aimed at designing structured prediction models and approximate inference and learning procedures that are computationally efficient. This volume offers an overview of this recent research in order to make the work accessible to a broader research community. The chapters, by leading researchers in the field, cover a range of topics, including research trends, the linear programming relaxation approach, innovations in probabilistic modeling, recent theoretical progress, and resource-aware learning.

Sebastian Nowozin is a Researcher in the Machine Learning and Perception group (MLP) at Microsoft Research, Cambridge, England. Peter V. Gehler is a Senior Researcher in the Perceiving Systems group at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. Jeremy Jancsary is a Senior Research Scientist at Nuance Communications, Vienna. Christoph H. Lampert is Assistant Professor at the Institute of Science and Technology Austria, where he heads a group for Computer Vision and Machine Learning.

Jonas Behr, Yutian Chen, Fernando De La Torre, Justin Domke, Peter V. Gehler, Andrew E. Gelfand, Sébastien Giguère, Amir Globerson, Fred A. Hamprecht, Minh Hoai, Tommi Jaakkola, Jeremy Jancsary, Joseph Keshet, Marius Kloft, Vladimir Kolmogorov, Christoph H. Lampert, François Laviolette, Xinghua Lou, Mario Marchand, André F. T. Martins, Ofer Meshi, Sebastian Nowozin, George Papandreou, Daniel Průša, Gunnar Rätsch, Amélie Rolland, Bogdan Savchynskyy, Stefan Schmidt, Thomas Schoenemann, Gabriele Schweikert, Ben Taskar, Sinisa Todorovic, Max Welling, David Weiss, Thomáš Werner, Alan Yuille, Stanislav Živný

Sparse modeling is a rapidly developing area at the intersection of statistical learning and signal processing, motivated by the age-old statistical problem of selecting a small number of predictive variables in high-dimensional datasets. This collection describes key approaches in sparse modeling, focusing on its applications in fields including neuroscience, computational biology, and computer vision.

Sparse modeling methods can improve the interpretability of predictive models and aid efficient recovery of high-dimensional unobserved signals from a limited number of measurements. Yet despite significant advances in the field, a number of open issues remain when sparse modeling meets real-life applications. The book discusses a range of practical applications and state-of-the-art approaches for tackling the challenges presented by these applications. Topics considered include the choice of method in genomics applications; analysis of protein mass-spectrometry data; the stability of sparse models in brain imaging applications; sequential testing approaches; algorithmic aspects of sparse recovery; and learning sparse latent models.
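As a rough illustration of the variable-selection idea described above, the following sketch fits an L1-penalized least-squares model (the Lasso) by cyclic coordinate descent. The Lasso is one widely used sparse modeling technique, chosen here for illustration; it is not necessarily the method of any particular chapter. The function names, the synthetic data, and the penalty value are all assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what zeroes out weak variables."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Approximately minimize (1/(2n)) * ||y - Xw||^2 + penalty * ||w||_1.

    X is a list of n rows with p entries each; y is a list of n targets.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores j's own contribution.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_scale[j] * weights[j]
            updated = soft_threshold(rho, penalty) / col_scale[j]
            delta = updated - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                weights[j] = updated
    return weights


def make_sparse_problem(n=40, p=50, seed=0):
    """Hypothetical synthetic data: 50 candidate variables, only 3 of which matter."""
    rng = random.Random(seed)
    true_weights = [0.0] * p
    true_weights[3], true_weights[17], true_weights[42] = 2.0, -1.5, 1.0
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x_ij * w_j for x_ij, w_j in zip(row, true_weights)) for row in X]
    return X, y, true_weights


if __name__ == "__main__":
    X, y, true_weights = make_sparse_problem()
    estimate = lasso_coordinate_descent(X, y, penalty=0.05)
    kept = [j for j, w in enumerate(estimate) if abs(w) > 1e-6]
    print("variables kept by the sparse model:", kept)
```

The example has fewer measurements (40) than candidate variables (50), which is the setting the paragraph above refers to. The penalty controls how aggressively small coefficients are set to exactly zero; choosing it well on real data is one of the practical issues a sketch like this leaves out.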

A. Vania Apkarian, Marwan Baliki, Melissa K. Carroll, Guillermo A. Cecchi, Volkan Cevher, Xi Chen, Nathan W. Churchill, Rémi Emonet, Rahul Garg, Zoubin Ghahramani, Lars Kai Hansen, Matthias Hein, Katherine Heller, Sina Jafarpour, Seyoung Kim, Mladen Kolar, Anastasios Kyrillidis, Aurelie Lozano, Matthew L. Malloy, Pablo Meyer, Shakir Mohamed, Alexandru Niculescu-Mizil, Robert D. Nowak, Jean-Marc Odobez, Peter M. Rasmussen, Irina Rish, Saharon Rosset, Martin Slawski, Stephen C. Strother, Jagannadan Varadarajan, Eric P. Xing

The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks

In this book, Dan Gusfield examines combinatorial algorithms to construct genealogical and exact phylogenetic networks, particularly ancestral recombination graphs (ARGs). The algorithms produce networks (or information about networks) that serve as hypotheses about the true genealogical history of observed biological sequences and can be applied to practical biological problems.

Phylogenetic trees have been the traditional means to represent evolutionary history, but there is a growing realization that networks rather than trees are often needed, most notably for recent human history. This has led to the development of ARGs in population genetics and, more broadly, to phylogenetic networks. ReCombinatorics offers an in-depth, rigorous examination of current research on the combinatorial, graph-theoretic structure of ARGs and explicit phylogenetic networks, and algorithms to reconstruct or deduce information about those networks.

ReCombinatorics, a groundbreaking contribution to the emerging field of phylogenetic networks, connects and unifies topics in population genetics and phylogenetics that have traditionally been discussed separately and considered to be unrelated. It covers the necessary combinatorial and algorithmic background material; the various biological phenomena; the mathematical, population genetic, and phylogenetic models that capture the essential elements of these phenomena; the combinatorial and algorithmic problems that derive from these models; the theoretical results that have been obtained; related software that has been developed; and some empirical testing of the software on simulated and real biological data.

An Introduction

Systems techniques are integral to current research in molecular cell biology, and system-level investigations are often accompanied by mathematical models. These models serve as working hypotheses: they help us to understand and predict the behavior of complex systems. This book offers an introduction to mathematical concepts and techniques needed for the construction and interpretation of models in molecular systems biology. It is accessible to upper-level undergraduate or graduate students in life science or engineering who have some familiarity with calculus, and will be a useful reference for researchers at all levels.

The first four chapters cover the basics of mathematical modeling in molecular systems biology. The last four chapters address specific biological domains, treating modeling of metabolic networks, of signal transduction pathways, of gene regulatory networks, and of electrophysiology and neuronal action potentials. Chapters 3–8 end with optional sections that address more specialized modeling topics. Exercises, solvable with pen-and-paper calculations, appear throughout the text to encourage interaction with the mathematical techniques. More involved end-of-chapter problem sets require computational software. Appendixes provide a review of basic concepts of molecular biology, additional mathematical background material, and tutorials for two computational software packages (XPPAUT and MATLAB) that can be used for model simulation and analysis.

The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the creation and quick dissemination of new information. Much of this information and knowledge, however, is represented only in text form--in the biomedical literature, lab notebooks, Web pages, and other sources. Researchers’ need to find relevant information in the vast amounts of text has created a surge of interest in automated text-analysis.

In this book, Hagit Shatkay and Mark Craven offer a concise and accessible introduction to key ideas in biomedical text mining. The chapters cover such topics as the relevant sources of biomedical text; text-analysis methods in natural language processing; the tasks of information extraction, information retrieval, and text categorization; and methods for empirically assessing text-mining systems. Finally, the authors describe several applications that recognize entities in text and link them to other entities and data resources, support the curation of structured databases, and make use of text to enable further prediction and discovery.

An Introduction to Molecular Biology

Recent research in molecular biology has produced a remarkably detailed understanding of how living things operate. Becoming conversant with the intricacies of molecular biology and its extensive technical vocabulary can be a challenge, though, as introductory materials often seem more like a barrier than an invitation to the study of life. This text offers a concise and accessible introduction to molecular biology, requiring no previous background in science, aimed at students and professionals in fields ranging from engineering to journalism--anyone who wants to get a foothold in this rapidly expanding field. It will be particularly useful for computer scientists exploring computational biology. A reader who has mastered the information in The Processes of Life is ready to move on to more complex material in almost any area of contemporary biology.

Using the tools of information technology to understand the molecular machinery of the cell offers both challenges and opportunities to computational scientists. Over the past decade, novel algorithms have been developed both for analyzing biological data and for synthetic biology problems such as protein engineering. This book explains the algorithmic foundations and computational approaches underlying areas of structural biology including NMR (nuclear magnetic resonance); X-ray crystallography; and the design and analysis of proteins, peptides, and small molecules.

Each chapter offers a concise overview of important concepts, focusing on a key topic in the field. Four chapters offer a short course in algorithmic and computational issues related to NMR structural biology, giving the reader a useful toolkit with which to approach the fascinating yet thorny computational problems in this area. A recurrent theme is understanding the interplay between biophysical experiments and computational algorithms. The text emphasizes the mathematical foundations of structural biology while maintaining a balance between algorithms and a nuanced understanding of experimental data. Three emerging areas, particularly fertile ground for research students, are highlighted: NMR methodology, design of proteins and other molecules, and the modeling of protein flexibility.

The next generation of computational structural biologists will need training in geometric algorithms, provably good approximation algorithms, scientific computation, and an array of techniques for handling noise and uncertainty in combinatorial geometry and computational biophysics. This book is an essential guide for young scientists on their way to research success in this exciting field.

Contemporary Methods and Applications

Biomedical signal analysis has become one of the most important visualization and interpretation methods in biology and medicine. Many new and powerful instruments for detecting, storing, transmitting, analyzing, and displaying images have been developed in recent years, allowing scientists and physicians to obtain quantitative measurements to support scientific hypotheses and medical diagnoses. This book offers an overview of a range of proven and new methods, discussing both theoretical and practical aspects of biomedical signal analysis and interpretation.

After an introduction to the topic and a survey of several processing and imaging techniques, the book describes a broad range of methods, including continuous and discrete Fourier transforms, independent component analysis (ICA), dependent component analysis, neural networks, and fuzzy logic methods. The book then discusses applications of these theoretical tools to practical problems in everyday biosignal processing, considering such subjects as exploratory data analysis and low-frequency connectivity analysis in fMRI, MRI signal processing including lesion detection in breast MRI, dynamic cerebral contrast-enhanced perfusion MRI, skin lesion classification, and microscopic slice image processing and automatic labeling.

Biomedical Signal Analysis can be used as a text or professional reference. Part I, on methods, forms a self-contained text, with exercises and other learning aids, for upper-level undergraduate or graduate-level students. Researchers or graduate students in systems biology, genomic signal processing, and computer-assisted radiology will find both parts I and II (on applications) a valuable handbook.

In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research.

Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction.

Olivier Chapelle and Alexander Zien are Research Scientists and Bernhard Schölkopf is Professor and Director at the Max Planck Institute for Biological Cybernetics in Tübingen. Schölkopf is coauthor of Learning with Kernels (MIT Press, 2002) and is a coeditor of Advances in Kernel Methods: Support Vector Learning (1998), Advances in Large-Margin Classifiers (2000), and Kernel Methods in Computational Biology (2004), all published by The MIT Press.
