Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus—a multiuser open-source information retrieval system developed by one of the authors and available online—provides model implementations and a basis for student work.
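To give a flavor of the kind of implementation work the book emphasizes, here is a minimal sketch of an inverted index with boolean AND retrieval, the data structure at the heart of most search engines; it is purely illustrative and is not drawn from the book or from Wumpus.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Boolean AND retrieval: documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "information retrieval and search", 2: "evaluation of search engines"}
idx = build_index(docs)
print(search(idx, "search retrieval"))  # {1}
```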
Category theory was invented in the 1940s to unify and synthesize different areas in mathematics, and it has proven remarkably successful in enabling powerful communication between disparate fields and subfields within mathematics. This book shows that category theory can be useful outside of mathematics as a rigorous, flexible, and coherent modeling language throughout the sciences.
Complex communicating computer systems—computers connected by data networks and in constant communication with their environments—do not always behave as expected. This book introduces behavioral modeling, a rigorous approach to behavioral specification and verification of concurrent and distributed systems. It is among the very few techniques capable of modeling systems interaction at a level of abstraction sufficient for the interaction to be understood and analyzed.
As the World Wide Web continues to expand, it becomes increasingly difficult for users to obtain information efficiently. Because most search engines read format languages such as HTML or SGML, search results reflect formatting tags more than actual page content, which is expressed in natural language.
Lessons from database research have been applied in academic fields ranging from bioinformatics to next-generation Internet architecture and in industrial uses including Web-based e-commerce and search engines. The core ideas in the field have become increasingly influential. This text provides both students and professionals with a grounding in database research and a technical context for understanding recent innovations in the field.
In Ontological Semantics, Sergei Nirenburg and Victor Raskin introduce a comprehensive approach to the treatment of text meaning by computer. Arguing that being able to use meaning is crucial to the success of natural language processing (NLP) applications, they depart from the ad hoc approach to meaning taken by much of the NLP community and propose theory-based semantic methods.
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics.
The idea of knowledge bases lies at the heart of symbolic, or "traditional," artificial intelligence. A knowledge-based system decides how to act by running formal reasoning procedures over a body of explicitly represented knowledge—a knowledge base. The system is not programmed for specific tasks; rather, it is told what it needs to know and expected to infer the rest.
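The following toy sketch illustrates that idea (the facts and rules are invented for this description, not taken from any particular system): a forward-chaining loop derives new conclusions by applying explicitly represented rules to a knowledge base, rather than by running task-specific code.

```python
# Toy knowledge base: known facts plus if-then rules (all invented for illustration).
facts = {"bird(tweety)"}
rules = [
    ({"bird(tweety)"}, "has_wings(tweety)"),
    ({"has_wings(tweety)"}, "can_fly(tweety)"),
]

# Forward chaining: repeatedly apply any rule whose premises are all known,
# until no new facts can be inferred.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

# The system inferred can_fly(tweety) without being programmed for that task.
print(facts)
```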
Until recently, information systems have been designed around different business functions, such as accounts payable and inventory control. Object-oriented modeling, in contrast, structures systems around the data—the objects—that make up the various business functions. Because information about a particular function is limited to one place—to the object—the system is shielded from the effects of change.
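A minimal sketch of that principle (the class and field names are hypothetical): because everything the system knows about an inventory item is confined to one object, callers depend only on its interface, and changes to the internal representation do not ripple outward.

```python
class InventoryItem:
    """All knowledge about an item is confined to this one object."""

    def __init__(self, sku, quantity):
        self._sku = sku            # internal representation can change freely
        self._quantity = quantity

    def reserve(self, n):
        """Callers use this interface; they never touch the fields directly."""
        if n > self._quantity:
            raise ValueError("insufficient stock")
        self._quantity -= n

    def available(self):
        return self._quantity

item = InventoryItem("A-100", 5)
item.reserve(2)
print(item.available())  # 3
```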
When an application is built, an underlying data model is chosen to make that application effective. Frequently, other applications need the same data, only modeled differently. The naïve solution of copying both the underlying data and its model is costly in terms of storage and makes data maintenance and evolution impossible. View mechanisms are a technique for modeling data differently for various applications without affecting the underlying format and structure of the data.
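As a rough sketch of the idea (all names and data are invented for illustration): a view exposes the same underlying records in a different shape, computed on demand, so nothing is copied and updates to the base data remain visible through the view.

```python
# Underlying data, stored once in its original structure.
employees = [
    {"name": "Ada", "dept": "R&D", "salary": 120},
    {"name": "Lin", "dept": "R&D", "salary": 110},
    {"name": "Sam", "dept": "Ops", "salary": 90},
]

def payroll_by_dept_view(records):
    """A 'view': the same records remodeled as per-department totals,
    recomputed on demand rather than copied."""
    totals = {}
    for r in records:
        totals[r["dept"]] = totals.get(r["dept"], 0) + r["salary"]
    return totals

print(payroll_by_dept_view(employees))  # {'R&D': 230, 'Ops': 90}
employees[2]["salary"] = 95             # change the base data ...
print(payroll_by_dept_view(employees))  # ... and the view reflects it: Ops is 95
```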