The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the creation and quick dissemination of new information. Much of this information and knowledge, however, is represented only in text form: in the biomedical literature, lab notebooks, Web pages, and other sources.
Using the tools of information technology to understand the molecular machinery of the cell offers both challenges and opportunities to computational scientists. Over the past decade, novel algorithms have been developed both for analyzing biological data and for synthetic biology problems such as protein engineering.
Computational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks.
From one cell to another, from one individual to another, and from one species to another, the content of DNA molecules is often similar. The organization of these molecules, however, differs dramatically, and the mutations that affect this organization are known as genome rearrangements. Combinatorial methods are used to reconstruct putative rearrangement scenarios in order to explain the evolutionary history of a set of species, often formalizing the evolutionary events that can explain the multiple combinations of observed genomes as combinatorial optimization problems.
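To make the combinatorial framing concrete, here is a minimal sketch (not taken from the book) of one standard quantity in rearrangement analysis: the number of breakpoints between a signed gene order and the identity order 1..n. It assumes genes are encoded as signed integers, with the sign giving orientation; the function names `count_breakpoints` and `reversal_lower_bound` are illustrative. Because a single reversal can change at most two adjacencies, half the breakpoint count gives a simple lower bound on the number of reversals needed.

```python
from math import ceil


def count_breakpoints(order):
    """Count breakpoints of a signed gene order relative to 1, 2, ..., n.

    Assumption: `order` is a signed permutation, e.g. [3, -2, -1, 4],
    where a negative sign means the gene is in reversed orientation.
    """
    n = len(order)
    if sorted(abs(g) for g in order) != list(range(1, n + 1)):
        raise ValueError("order must be a signed permutation of 1..n")
    # Frame the genome with 0 and n + 1 so the two ends are also compared.
    extended = [0] + list(order) + [n + 1]
    # A consecutive pair (a, b) is conserved only when b == a + 1;
    # every other pair is a breakpoint.
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def reversal_lower_bound(order):
    """Lower bound on reversal distance: each reversal fixes at most 2 breakpoints."""
    return ceil(count_breakpoints(order) / 2)
```

For example, the order `[-3, -2, -1, 4]` differs from the identity by one reversal of the first three genes, and the sketch reports two breakpoints and a lower bound of one reversal. Computing an exact rearrangement distance requires more machinery than this bound.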
There are many excellent computational biology resources now available for learning about methods that have been developed to address specific biological systems, but comparatively little attention has been paid to training aspiring computational biologists to handle new and unanticipated problems. This text is intended to fill that gap by teaching students how to reason about developing formal mathematical models of biological systems that are amenable to computational analysis.
Recent advances in biotechnology, spurred by the Human Genome Project, have resulted in the accumulation of vast amounts of new data. Ontologies (computer-readable, precise formulations of the concepts in a given field and the relationships among them) are a critical framework for coping with the exponential growth of valuable biological data generated by high-output technologies.
Although advanced bioinformatics methodologies have not been used as extensively in immunology as in other subdisciplines of biology, research in immunological bioinformatics has already developed models of components of the immune system that can be combined and that may help develop therapies, vaccines, and diagnostic tools for such diseases as AIDS, malaria, and cancer.
This introductory text offers a clear exposition of the algorithmic principles driving advances in bioinformatics. Accessible to students in both biology and computer science, it strikes a unique balance between rigorous mathematics and practical techniques, emphasizing the ideas underlying algorithms rather than offering a collection of apparently unrelated problems.
Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information.
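As an illustration of how a kernel can be defined directly on discrete sequence data, the following is a minimal sketch of a k-mer spectrum kernel, one commonly used string kernel. It is an example chosen here, not a method attributed to the book. Each sequence is represented by the counts of its length-k substrings, and the kernel value is the inner product of the two count vectors. The function names and the default `k=3` are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping length-k substring (k-mer) of `seq`."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # For sequences shorter than k the range is empty, giving no k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of sequences x and y."""
    counts_x = kmer_counts(x, k)
    counts_y = kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the sum;
    # a Counter returns 0 for k-mers it has not seen.
    return sum(count * counts_y[kmer] for kmer, count in counts_x.items())
```

Because the value is an inner product of explicit feature vectors, the function is symmetric and positive semidefinite, so it could in principle be supplied as a precomputed kernel matrix to a standard kernel classifier.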
The advent of ever more sophisticated molecular manipulation techniques has made it clear that cellular systems are far more complex and dynamic than previously thought. At the same time, experimental techniques are providing an almost overwhelming amount of new data. It is increasingly apparent that linking molecular and cellular structure to function will require the use of new computational tools.
Functional genomics—the deconstruction of the genome to determine the biological function of genes and gene interactions—is one of the most fruitful new areas of biology. The growing use of DNA microarrays allows researchers to assess the expression of tens of thousands of genes at a time. This quantitative change has led to qualitative progress in our ability to understand regulatory processes at the cellular level.
As exciting as the new field of genomics is, it has not yet produced a basic conceptual change in biology. The fundamental problems remain: the origin of life, cell organization, the pathways of differentiation, aging, and the molecular and cellular capabilities of the brain. What has occurred is an explosion of molecular information obtained from genomic sequences, which will soon be followed by exhaustive catalogs of protein interactions and protein function. This wealth of information can be analyzed and manipulated only with the help of computers.
Computational molecular biology, or bioinformatics, draws on the disciplines of biology, mathematics, statistics, physics, chemistry, computer science, and engineering. It provides the computational support for functional genomics, which links the behavior of cells, organisms, and populations to the information encoded in the genomes, as well as for structural genomics. At the heart of all large-scale and high-throughput biotechnologies, it has a growing impact on health and medicine.
In one of the first major texts in the emerging field of computational molecular biology, Pavel Pevzner covers a broad range of algorithmic and combinatorial topics and shows how they are connected to molecular biology and to biotechnology. The book has a substantial "computational biology without formulas" component that presents the biological and computational ideas in a relatively simple manner. This makes the material accessible to computer scientists without biological training, as well as to biologists with limited background in computer science.