Spotting and Discovering Terms through Natural Language Processing
357 pp., 7 x 9 in, 71 illus.
Published: April 27, 2001
In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The parsed terms it acquires can then be used for precise retrieval and assembly of information.
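To make the idea of term variation concrete, here is a minimal sketch in Python. It assumes a two-word term (modifier + head) and only three toy variant types: the base form, insertion of up to `max_insert` words between modifier and head, and permutation into a "head of (the) modifier" phrase. The names `normalize` and `find_variants`, the plural-stripping rule, and the example sentence are hypothetical illustrations. This is not FASTR's metagrammar, which the book describes as a corpus-based unification grammar; the sketch uses plain token matching instead of feature unification.

```python
import re


def normalize(word: str) -> str:
    """Hypothetical crude morphology: lowercase and strip a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def find_variants(term: tuple[str, str], text: str,
                  max_insert: int = 2) -> list[tuple[str, str]]:
    """Return (variant_type, matched_span) pairs for a (modifier, head) term.

    Toy transformations only (an assumption, not FASTR's rule set):
      base:        modifier head
      insertion:   modifier w1 .. wk head, with 1 <= k <= max_insert
      permutation: head "of" ["the"] modifier
    """
    mod, head = (normalize(w) for w in term)
    tokens = re.findall(r"[A-Za-z]+", text)
    norm = [normalize(t) for t in tokens]
    hits = []

    # Base form and insertion variants: scan forward from each modifier.
    for i, w in enumerate(norm):
        if w != mod:
            continue
        for gap in range(max_insert + 1):
            j = i + gap + 1
            if j < len(norm) and norm[j] == head:
                kind = "base" if gap == 0 else "insertion"
                hits.append((kind, " ".join(tokens[i:j + 1])))
                break

    # Permutation variants: head followed by "of" (optionally "the") modifier.
    for j, w in enumerate(norm):
        if w != head:
            continue
        rest = norm[j + 1:j + 4]
        if rest[:2] == ["of", mod]:
            hits.append(("permutation", " ".join(tokens[j:j + 3])))
        elif rest[:3] == ["of", "the", mod]:
            hits.append(("permutation", " ".join(tokens[j:j + 4])))

    return hits
```

For example, `find_variants(("blood", "cell"), "Blood cells and blood mononuclear cells differ from cells of the blood.")` returns `[("base", "Blood cells"), ("insertion", "blood mononuclear cells"), ("permutation", "cells of the blood")]`: all three surface forms are linked back to the single controlled term "blood cell", which is the kind of normalization that makes variant-aware indexing possible.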
The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or "linguistic data tuning" of, heterogeneous texts. FASTR can be used to perform automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.
This important and timely publication presents an efficient, accurate method for automated term variant extraction from technical and scientific texts. The book provides the first comprehensive account of work in this area.
Tomek Strzalkowski, Department of Computer Science, University at Albany, SUNY
People say the same things in different ways. This variation poses difficult problems for finding information in online text. This book answers many of these problems, providing a complete theoretical background and validated computational techniques.
Gregory Grefenstette, Principal Scientist, Xerox Research Centre Europe
Jacquemin has pursued the problem of lexical term variation for over a decade, and this book presents a detailed, no-nonsense description of his linguistically motivated but empirical approach. In part because of its excellent review of related work, this book is essential reading for those interested in computational approaches to lexical recognition and variation.
Marti Hearst, School of Information Management and Systems (SIMS), University of California, Berkeley