
Scientific & Engineering Computation

Using OpenMP: Portable Shared Memory Parallel Programming

"I hope that readers will learn to use the full expressibility and power of OpenMP. This book should provide an excellent introduction to beginners, and the performance section should help those with some experience who want to push OpenMP to its limits."
—from the foreword by David J. Kuck, Intel Fellow, Software and Solutions Group, and Director, Parallel and Distributed Solutions, Intel Corporation

OpenMP, a portable programming interface for shared memory parallel computers, was adopted as an informal standard in 1997 by computer scientists who wanted a unified model on which to base programs for shared memory systems. OpenMP is now used by many software developers; it offers significant advantages over both hand-threading and MPI. Using OpenMP offers a comprehensive introduction to parallel programming concepts and a detailed overview of OpenMP.

Using OpenMP discusses hardware developments, describes where OpenMP is applicable, and compares OpenMP to other programming interfaces for shared and distributed memory parallel architectures. It introduces the individual features of OpenMP, provides many source code examples that demonstrate the use and functionality of the language constructs, and offers tips on writing an efficient OpenMP program. It describes how to use OpenMP in full-scale applications to achieve high performance on large-scale architectures, discussing several case studies in detail, and offers in-depth troubleshooting advice. It explains how OpenMP is translated into explicitly multithreaded code, providing a valuable behind-the-scenes account of OpenMP program performance. Finally, Using OpenMP considers trends likely to influence OpenMP development, offering a glimpse of the possibilities of a future OpenMP 3.0 from the vantage point of the current OpenMP 2.5.
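
The language constructs the book's examples revolve around can be illustrated with a minimal sketch (not taken from the book; the array size and scaling operation are arbitrary): a single directive is enough to split a loop's iterations across threads.

    // Minimal OpenMP sketch: parallelize a loop with one work-sharing directive.
    // Compile with an OpenMP-aware compiler, e.g.  g++ -fopenmp omp_scale.cpp
    #include <cstdio>
    #include <vector>
    #include <omp.h>

    int main() {
        const int n = 1000000;                    // arbitrary problem size
        std::vector<double> a(n, 1.0), b(n);

        // Iterations are divided among the threads of the team;
        // i is private to each thread, a and b are shared by default.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            b[i] = 2.0 * a[i];

        std::printf("threads available: %d, b[0] = %f\n",
                    omp_get_max_threads(), b[0]);
        return 0;
    }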

With multicore computer use increasing, the need for a comprehensive introduction and overview of the standard interface is clear. Using OpenMP provides an essential reference not only for students at both undergraduate and graduate levels but also for professionals who intend to parallelize existing codes or develop new parallel programs for shared memory computer architectures.

The minimum description length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern.
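
In its simplest, two-part form the principle can be written as a formula (standard textbook notation, not quoted from the book): among the candidate models, choose the one that minimizes the number of bits needed to describe the model plus the number of bits needed to describe the data with the model's help.

    \hat{M} \;=\; \arg\min_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr]

A complex model may fit the data closely (small L(D | M)) but costs many bits to state (large L(M)); the sum penalizes exactly the kind of overfitting mentioned above.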

This extensive, step-by-step introduction to the MDL principle provides a comprehensive reference (with an emphasis on conceptual issues) that is accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection, including biology, econometrics, and experimental psychology. Part I provides a basic introduction to MDL and an overview of the concepts in statistics and information theory needed to understand MDL. Part II treats universal coding, the information-theoretic notion on which MDL is built, and Part III gives a formal treatment of MDL theory as a theory of inductive inference based on universal coding. Part IV provides a comprehensive overview of the statistical theory of exponential families with an emphasis on their information-theoretic properties. The text includes a number of summaries, paragraphs offering the reader a "fast track" through the material, and boxes highlighting the most important concepts.

Use of Beowulf clusters (collections of off-the-shelf commodity computers programmed to act in concert, resulting in supercomputer performance at a fraction of the cost) has spread far and wide in the computational science community. Many application groups are assembling and operating their own "private supercomputers" rather than relying on centralized computing centers. Such clusters are used in climate modeling, computational biology, astrophysics, and materials science, as well as non-traditional areas such as financial modeling and entertainment. Much of this new popularity can be attributed to the growth of the open-source movement.

The second edition of Beowulf Cluster Computing with Linux has been completely updated; all three stand-alone sections have important new material. The introductory material in the first part now includes a new chapter giving an overview of the book and background on cluster-specific issues, including why and how to choose a cluster, as well as new chapters on cluster initialization systems (including ROCKS and OSCAR) and on network setup and tuning. The information on parallel programming in the second part now includes chapters on basic parallel programming and available libraries and programs for clusters. The third and largest part of the book, which describes software infrastructure and tools for managing cluster resources, has new material on cluster management and on the Scyld system.

Scalable Input/Output: Achieving System Balance
Edited by Daniel A. Reed

As we enter the "decade of data," the disparity between the vast amount of data storage capacity (measurable in terabytes and petabytes) and the bandwidth available for accessing it has created an input/output bottleneck that is proving to be a major constraint on the effective use of scientific data for research. Scalable Input/Output is a summary of the major research results of the Scalable I/O Initiative, launched by Paul Messina, then Director of the Center for Advanced Computing Research at the California Institute of Technology, to explore software and algorithmic solutions to the I/O imbalance. The contributors explore techniques for I/O optimization, including: I/O characterization to understand application and system I/O patterns; system checkpointing strategies; collective I/O and parallel database support for scientific applications; parallel I/O libraries and strategies for file striping, prefetching, and write behind; compilation strategies for out-of-core data access; scheduling and shared virtual memory alternatives; network support for low-latency data transfer; and parallel I/O application programming interfaces.
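
As a rough illustration of the collective I/O idea listed above (an assumption-laden sketch, not code from the book): in the MPI-IO interface every process writes its own contiguous slice of a shared file in one collective call, letting the library merge the requests into fewer, larger file-system operations. The file name and block size below are arbitrary.

    // Minimal collective-I/O sketch using the MPI-IO interface.
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1024;                       // doubles per process (arbitrary)
        std::vector<double> buf(n, static_cast<double>(rank));

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        // Collective write: all ranks participate, so the implementation can
        // coordinate and coalesce their requests.
        MPI_Offset offset = static_cast<MPI_Offset>(rank) * n * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf.data(), n, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }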

Randomization is an important tool in the design of algorithms, and the ability of randomization to provide enhanced power is a major research topic in complexity theory. Noam Nisan continues the investigation into the power of randomization and the relationships between randomized and deterministic complexity classes by pursuing the idea of emulating randomness, or pseudorandom generation.

Pseudorandom generators reduce the number of random bits required by randomized algorithms, enable the construction of certain cryptographic protocols, and shed light on the difficulty of simulating randomized algorithms by deterministic ones. The research described here deals with two methods of constructing pseudorandom generators from hard problems and demonstrates some surprising connections between pseudorandom generators and seemingly unrelated topics such as multiparty communication complexity and random oracles.
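
In standard notation (an illustration, not a quotation from the monograph), a pseudorandom generator stretches a short truly random seed into a much longer string that no test from a given class can distinguish from uniform:

    G : \{0,1\}^{s} \to \{0,1\}^{n} \quad (s \ll n), \qquad
    \Bigl|\, \Pr_{x \sim U_s}\bigl[T(G(x)) = 1\bigr] - \Pr_{y \sim U_n}\bigl[T(y) = 1\bigr] \,\Bigr| \;\le\; \varepsilon
    \quad \text{for every test } T \text{ in the class.}

Enumerating all 2^s seeds and taking a majority vote turns any randomized algorithm fooled by G into a deterministic one, which is why achieving a small seed length under weak assumptions is the quantity of interest.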

Nisan first establishes a precise connection between computational complexity and pseudorandom number generation, revealing that efficient deterministic simulation of randomized algorithms is possible under much weaker assumptions than was previously known, and bringing to light new consequences concerning the power of random oracles. Using a remarkable argument based on multiparty communication complexity, Nisan then constructs a generator that is good against all tests computable in logarithmic space. A consequence of this result is a new construction of universal traversal sequences.

Contents: Introduction. Hardness vs. Randomness. Pseudorandom Generators for Logspace and Multiparty Protocols.


There is increasing interest in genetic programming by both researchers and professional software developers. These twenty-two invited contributions show how a wide variety of problems across disciplines can be solved using this new paradigm.

Advances in Genetic Programming reports significant results in improving the power of genetic programming, presenting techniques that can be employed immediately in the solution of complex problems in many areas, including machine learning and the simulation of autonomous behavior. Popular languages such as C and C++ are used in many of the applications and experiments, illustrating how genetic programming is not restricted to symbolic computing languages such as LISP. Researchers interested in getting started in genetic programming will find information on how to begin, on what public domain code is available, and on how to become part of the active genetic programming community via electronic mail.

A major focus of the book is on improving the power of genetic programming. Experimental results are presented in a variety of areas, including adding memory to genetic programming, using locality and "demes" to maintain evolutionary diversity, avoiding the traps of local optima by using coevolution, using noise to increase generality, and limiting the size of evolved solutions to improve generality.

Significant theoretical results in the understanding of the processes underlying genetic programming are presented, as are several results in the area of automatic function definition. Performance increases are demonstrated by directly evolving machine code, and implementation and design issues for genetic programming in C++ are discussed.
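
To give a flavor of such implementations, the fragment below is a minimal, mutation-only sketch of the register-machine ("linear GP") style touched on above; the target function, instruction set, and hill-climbing loop are illustrative assumptions, not code from the book. Full GP systems add populations, crossover, and selection on top of this.

    // Tiny linear-GP sketch: a program is a fixed-length list of
    // register-machine instructions; we hill-climb by point mutation.
    #include <array>
    #include <cstdio>
    #include <random>
    #include <vector>

    enum Op { ADD, SUB, MUL };
    struct Instr { Op op; int dst, a, b; };       // r[dst] = r[a] op r[b]
    constexpr int kRegs = 4, kLen = 8;

    double run(const std::vector<Instr>& prog, double x) {
        std::array<double, kRegs> r{};
        r[0] = x; r[1] = 1.0;                     // r[0] = input, r[1] = constant
        for (const Instr& i : prog) {
            double a = r[i.a], b = r[i.b];
            r[i.dst] = (i.op == ADD) ? a + b : (i.op == SUB) ? a - b : a * b;
        }
        return r[0];                              // output register
    }

    double fitness(const std::vector<Instr>& prog) {   // lower is better
        double err = 0.0;
        for (double x = -2.0; x <= 2.0; x += 0.5) {
            double d = run(prog, x) - (x * x + x);      // target: x^2 + x
            err += d * d;
        }
        return err;
    }

    int main() {
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> op(0, 2), reg(0, kRegs - 1), pos(0, kLen - 1);
        auto randInstr = [&] { return Instr{static_cast<Op>(op(rng)), reg(rng), reg(rng), reg(rng)}; };

        std::vector<Instr> best(kLen);
        for (auto& i : best) i = randInstr();
        double bestFit = fitness(best);

        for (int gen = 0; gen < 20000 && bestFit > 1e-9; ++gen) {
            std::vector<Instr> cand = best;
            cand[pos(rng)] = randInstr();         // point mutation
            double f = fitness(cand);
            if (f <= bestFit) { best = cand; bestFit = f; }   // keep if no worse
        }
        std::printf("best error: %g\n", bestFit);
        return 0;
    }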


Learning Kernel Classifiers: Theory and Algorithms

Linear classifiers in kernel spaces have emerged as a major topic within the field of machine learning. The kernel technique takes the linear classifier, a limited but well-established and comprehensively studied model, and extends its applicability to a wide range of nonlinear pattern-recognition tasks such as natural language processing, machine vision, and biological sequence analysis. This book provides the first comprehensive overview of both the theory and algorithms of kernel classifiers, including the most recent developments. It begins by describing the major algorithmic advances: kernel perceptron learning, kernel Fisher discriminants, support vector machines, relevance vector machines, Gaussian processes, and Bayes point machines. Then follows a detailed introduction to learning theory, including VC and PAC-Bayesian theory, data-dependent structural risk minimization, and compression bounds. Throughout, the book emphasizes the interaction between theory and algorithms: how learning algorithms work and why. The book includes many examples, complete pseudocode of the algorithms presented, and an extensive source code library.
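
A rough sketch of the first of those algorithms, the kernel perceptron, is given below; the Gaussian kernel, toy XOR data, and parameter values are illustrative assumptions, not the book's pseudocode. The classifier remains linear in the feature space induced by the kernel, yet it separates data a plain perceptron cannot.

    // Kernel perceptron sketch: a dual-form perceptron where all access to the
    // data goes through a kernel function, here a Gaussian (RBF) kernel.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // k(x, x') = exp(-||x - x'||^2 / (2*sigma^2)), sigma = 0.5 chosen arbitrarily.
    double kernel(const std::vector<double>& x, const std::vector<double>& y) {
        double d2 = 0.0;
        for (size_t i = 0; i < x.size(); ++i) d2 += (x[i] - y[i]) * (x[i] - y[i]);
        return std::exp(-d2 / (2.0 * 0.5 * 0.5));
    }

    int main() {
        // Toy XOR problem: not linearly separable in the input space.
        std::vector<std::vector<double>> X = {{0,0}, {0,1}, {1,0}, {1,1}};
        std::vector<int> y = {-1, +1, +1, -1};
        std::vector<double> alpha(X.size(), 0.0);     // dual coefficients

        // f(x) = sum_i alpha_i * y_i * k(x_i, x)
        auto f = [&](const std::vector<double>& x) {
            double s = 0.0;
            for (size_t i = 0; i < X.size(); ++i) s += alpha[i] * y[i] * kernel(X[i], x);
            return s;
        };

        // Perceptron update in the dual: bump alpha_j whenever example j is
        // misclassified; repeat for a fixed number of epochs.
        for (int epoch = 0; epoch < 100; ++epoch)
            for (size_t j = 0; j < X.size(); ++j)
                if (y[j] * f(X[j]) <= 0.0) alpha[j] += 1.0;

        for (size_t j = 0; j < X.size(); ++j)
            std::printf("x = (%g, %g)  label %+d  predicted %+d\n",
                        X[j][0], X[j][1], y[j], f(X[j]) > 0 ? +1 : -1);
        return 0;
    }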

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
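
The modularity described above comes down to one identity: the learning algorithm touches the data only through inner products, so replacing those inner products with a kernel implicitly changes the feature space while leaving the algorithm itself unchanged. In standard notation (an illustration, not a quotation from the book):

    k(x, x') = \langle \Phi(x), \Phi(x') \rangle, \qquad
    f(x) = \operatorname{sgn}\Bigl( \sum_{i=1}^{m} \alpha_i\, y_i\, k(x_i, x) + b \Bigr),
    \qquad \text{e.g. } k(x, x') = \exp\bigl( -\|x - x'\|^{2} / (2\sigma^{2}) \bigr).

Swapping the kernel k changes the implicit feature map \Phi, while the base algorithm that produces the coefficients \alpha_i stays the same.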

Edited by Thomas Sterling

Beowulf clusters, which exploit mass-market PC hardware and software in conjunction with cost-effective commercial network technology, are becoming the platform for many scientific, engineering, and commercial applications. With growing popularity has come growing complexity. Addressing that complexity, Beowulf Cluster Computing with Linux and Beowulf Cluster Computing with Windows provide system users and administrators with the tools they need to run the most advanced Beowulf clusters. The book is appearing in both Linux and Windows versions in order to reach the entire PC cluster community, which is divided into two distinct camps according to the node operating system.

Each book consists of three stand-alone parts. The first provides an introduction to the underlying hardware technology, assembly, and configuration. The second part offers a detailed presentation of the major parallel programming libraries. The third, and largest, part describes software infrastructures and tools for managing cluster resources. This includes some of the most popular of the software packages available for distributed task scheduling, as well as tools for monitoring and administering system resources and user accounts. Approximately 75% of the material in the two books is shared, with the other 25% pertaining to the specific operating system. Most of the chapters include text specific to the operating system. The Linux volume includes a discussion of parallel file systems.
