

Institute for Computer Design and Fault Tolerance
User-Transparent Fault-Tolerance
Bibliography of Fault Tolerance related Neural Network literature
Toward a Zero-Defect Culture
Fault tolerance, which begins with the detection of errors, and error
recovery are both central to the development of dependable computing.
Reviewing the progress in computer reliability, we can consider
whether it is possible, in theory, to make the hardware of a
massive computer like HAL reliable, especially for a prolonged space
mission. Alas, the techniques for ensuring reliable hardware are less
effective than those for software. Progress in mechanizing
fault-tolerant design and measuring fault behavior has been
impressive, but we do not yet know how to verify highly critical
systems such as HAL.
Fault tolerance technology is diffusing rapidly into continually
expanding areas such as database and telecommunications systems,
driven by applications in banking, medical systems, mobile computing
and the ever-expanding Internet. It is clear that in the future most
computing will be done not on isolated mechanisms at fixed locations
but as a highly diffused, dynamic activity in an all-pervasive
information-processing web. Fault-tolerant systems will therefore
need to be highly adaptive to changing environments. In principle,
however, there seems to be nothing to prevent us from making a large
computer that is fault-tolerant.