Chapter 4

Toward a Zero-Defect Culture

The notions of fault tolerance (i.e., the detection of errors) and error recovery are central to the development of dependable computing. Reviewing the progress in computer reliability, we can consider whether it is possible, in theory, to make the hardware of a massive computer like HAL reliable, especially for a prolonged space mission.

Alas, the techniques for ensuring reliable hardware are less effective than those for software. Although progress in mechanizing fault-tolerant design and measuring fault behavior has been impressive, we do not yet know how to verify highly critical systems such as HAL.

Fault-tolerance technology is diffusing rapidly into continually expanding areas such as database and telecommunications systems, driven by applications in banking, medicine, mobile computing, and the ever-expanding Internet. It is clear that, in the future, most computing will be done not on isolated mechanisms at finite locations but as a highly diffused, dynamic activity in an all-pervasive information-processing web. Fault-tolerant systems will therefore need to be highly adaptive to changing environments. In principle, however, there seems to be nothing to prevent us from making a large computer that is fault-tolerant.





