
The test involves an acoustic matching process, but the hypothesis has
nothing to do with sound, or even with language; rather, it concerns
knowledge on many levels. As many of
the chapters in this book point out, knowledge goes far beyond mere
facts and data. For information to become knowledge, it must
incorporate the relationships between ideas. And for knowledge to be
useful, the links describing how concepts interact must be easily
accessed, updated, and manipulated. Human intelligence is remarkable
in its ability to perform all these tasks. Ironically, it is also
remarkably weak at reliably storing the information on which knowledge
is based. The natural strengths of today's computers are roughly the
opposite. Their ability to store vast quantities of information
reliably and retrieve it rapidly has made them powerful allies of the
human intellect. Yet they have been slow to master true knowledge.
Modeling the knowledge needed to understand the highly ambiguous and
variable phenomenon of human speech has been a key to progress in
automatic speech recognition (ASR).
Lesson 1: Knowledge Is a Many-Layered Thing
Thus lesson number one for constructing a computer system that can understand human speech is to build in knowledge at many levels: the structure of speech sounds, the way speech is produced by our vocal apparatus, the patterns of speech sounds that make up dialects and languages, the complex (and not fully understood) rules of word usage, and, the greatest difficulty of all, general knowledge of the subject matter being spoken about. Each level of analysis provides useful constraints that can limit our search for the right answer. For example, the basic building blocks of speech, called phonemes, cannot appear in just any order. Indeed, many sequences are impossible to articulate (try saying "ptkee"). More important, only certain phoneme sequences correspond to a word or word fragment in the language. Although the set of phonemes is similar (though not identical) from one language to another, contextual factors differ dramatically. English, for example, has over ten thousand possible syllables, whereas Japanese has only a hundred and twenty.
On a higher level, the syntax and semantics of the language constrain possible word orders, and resolving homonym ambiguities can require multiple levels of knowledge. One technology frequently used in speech recognition and understanding systems is the sentence parser, which builds sentence diagrams like those we learned in elementary school (see figure 7.1). One of the first such systems, developed in 1963 by Susumu Kuno of Harvard (around the time Kubrick and Clarke began work on 2001), revealed the depth of ambiguity in English. Kuno asked his computerized parser what the sentence "Time flies like an arrow" means. In what has become a famous response, the computer replied that it was not quite sure. It might mean:
1. That time passes as quickly as an arrow passes.
2. Or maybe it is a command telling us to time the flies the same way that an arrow times flies; that is, "Time flies like an arrow would."
3. Or it could be a command telling us to time only those flies that are similar to arrows; that is, "Time flies that are like an arrow."
4. Or perhaps it means that a type of flies known as time flies have a fondness for arrows: "Time-flies like (i.e., appreciate) an arrow."
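Kuno's parser is not described here in enough detail to reproduce, so what follows is only a minimal sketch of the same phenomenon under stated assumptions. It is a CKY chart parser in Python, not Kuno's method. It uses a hypothetical toy grammar and lexicon invented for this example; the names `LEXICON`, `RULES`, `ROOTS`, `parse`, and `bracket` are illustrative and come from no real system. If a bare verb phrase is accepted as a complete imperative sentence, the parser finds exactly four analyses of "time flies like an arrow," one for each reading above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy lexicon, invented for illustration (not Kuno's):
# each word maps to every category it may take.
LEXICON = {
    "time":  {"N", "V", "NP"},   # noun, verb, or a bare noun phrase
    "flies": {"N", "V", "NP"},
    "like":  {"P", "V"},         # preposition or verb
    "an":    {"Det"},
    "arrow": {"N"},
}

# Hypothetical binary rules (parent, left, right) in Chomsky normal form.
RULES = [
    ("S",  "NP",  "VP"),   # declarative: subject + predicate
    ("NP", "Det", "N"),    # an arrow
    ("NP", "N",   "N"),    # noun compound: time flies
    ("NP", "NP",  "PP"),   # flies [like an arrow]
    ("PP", "P",   "NP"),   # like [an arrow]
    ("VP", "V",   "NP"),   # time [flies]
    ("VP", "V",   "PP"),   # flies [like an arrow]
    ("VP", "VP",  "PP"),   # [time flies] [like an arrow]
]

# Assumption: a complete utterance is a declarative S or an imperative VP.
ROOTS = ("S", "VP")


def parse(words):
    """Return every parse tree of `words` (as nested tuples) using CKY."""
    n = len(words)
    # chart[i][j][cat] lists all trees of category `cat` spanning words[i:j].
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[i][i + 1][cat].append((cat, w))
    # Fill longer spans from every split point k and every binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    for lt, rt in product(chart[i][k][left], chart[k][j][right]):
                        chart[i][j][parent].append((parent, lt, rt))
    return [tree for root in ROOTS for tree in chart[0][n][root]]


def bracket(tree):
    """Render a tree as a labeled bracketing, e.g. [NP [Det an] [N arrow]]."""
    if isinstance(tree[1], str):  # leaf: (category, word)
        return f"[{tree[0]} {tree[1]}]"
    return f"[{tree[0]} " + " ".join(bracket(c) for c in tree[1:]) + "]"


if __name__ == "__main__":
    for t in parse("time flies like an arrow".split()):
        print(bracket(t))
```

Running it prints four bracketings. The two rooted in S are readings 1 and 4 (time as the subject, or time-flies as a compound noun). The two rooted in VP are the imperative readings 2 and 3, which differ only in whether "like an arrow" attaches to the verb or to "flies." Nothing in the grammar prefers one tree over another. Choosing among them takes knowledge that the grammar does not contain.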
It became clear from this and other syntactic ambiguities that
understanding language, spoken or written, requires knowledge both of
the relationships between words and of the concepts underlying
them. It is impossible to understand the sentence about time (or even
to understand that the sentence is indeed talking about time and not
flies) without mastery of the knowledge structures that represent what
we know about time, flies, arrows, and how these concepts relate to
one another.