The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the creation and quick dissemination of new information. Much of this information and knowledge, however, is represented only in text form--in the biomedical literature, lab notebooks, Web pages, and other sources. Researchers’ need to find relevant information in the vast amounts of text has created a surge of interest in automated text analysis.
In this book, Hagit Shatkay and Mark Craven offer a concise and accessible introduction to key ideas in biomedical text mining. The chapters cover such topics as the relevant sources of biomedical text; text-analysis methods in natural language processing; the tasks of information extraction, information retrieval, and text categorization; and methods for empirically assessing text-mining systems. Finally, the authors describe several applications that recognize entities in text and link them to other entities and data resources, support the curation of structured databases, and make use of text to enable further prediction and discovery.
About the Authors
Hagit Shatkay is Associate Professor in the Department of Computer and Information Sciences and Head of the Computational Biomedicine Lab at the University of Delaware.
Mark Craven is Professor in the Department of Biostatistics and Medical Informatics and in the Department of Computer Sciences at the University of Wisconsin.
"The explosive growth of the biomedical literature and the breakdown of disciplinary boundaries in biomedical research make text mining an indispensable part of modern molecular biomedicine. Mining the Biomedical Literature clearly introduces the key ideas and applications to computational biologists looking to get started in this exciting field."
--Lawrence Hunter, University of Colorado School of Medicine
“This book provides a lucid and accessible exposition of the most important topics in biomedical text mining and related areas of information retrieval, information extraction, and machine learning. Readers will enjoy well-chosen examples of biomedical applications and textual snippets, as well as a balanced treatment of diverse computational techniques used today. The book even provides the most important competition venues for text-mining systems.”
--Andrey Rzhetsky, Professor of Medicine and Human Genetics, Computation Institute and Institute for Genomics and Systems Biology, University of Chicago