Chapter 7




Playing HAL

To demonstrate today's state of the art in computer speech recognition, we fed some of the soundtrack of 2001 into Kurzweil VOICE for Windows version 2.0 (KV/Win 2.0). KV/Win 2.0 can understand the speech of a person it has not heard before and can recognize a vocabulary of up to sixty thousand words (forty thousand in its initial vocabulary, with the ability to add another twenty thousand). The primary limitation of today's technology is that it can handle only discrete speech, that is, words or brief phrases (such as "thank you") spoken with brief pauses in between. I played the following dialogue to KV/Win 2.0 to learn whether it could understand Dave as HAL does in the movie:

HAL: Good evening, Dave.

Dave: How you doing, HAL?

HAL: Everything is running smoothly; and you?

Dave: Oh, not too bad.

HAL: Have you been doing some more work?

Dave: Just a few sketches.

HAL: May I see them?

Dave: Sure.

HAL: That's a very nice rendering, Dave. I think you've improved a great deal. Can you hold it a bit closer?

Dave: Sure.

HAL: That's Dr. Hunter, isn't it?

Dave: Hm hmm.

HAL: By the way, do you mind if I ask you a personal question?

Dave: No, not at all.

I trained the system on the phrases "Oh, not too bad" and "No, not at all," but did not train it on Dave's voice. When I ran the experiment, then, KV/Win 2.0 had never heard Dave speak, and it had to pick out each word or phrase from among forty thousand possibilities. I had the system listen to Dave saying the following discrete words and phrases from the above dialogue:

Dave: Oh, not too bad.

Dave: Sure.

Dave: Sure.

Dave: No, not at all.

KV/Win 2.0 successfully recognized all of the above utterances even though it had not previously been exposed to Dave's voice (see figure 7.4). For good measure, I also had KV/Win 2.0 listen to Dave in the critical scene in which HAL is betraying him. In this scene, Dave says the word HAL five times in a row in an increasingly plaintive voice. KV/Win 2.0 successfully recognized the five utterances, despite their obvious differences in tone and enunciation (see figure 7.5). Looking at the spectrogram, we can see that these five utterances, although similar in some respects, are really quite different from one another and clearly demonstrate the variability of human speech (see figure 7.6). So, as far as speech recognition is concerned, and apart from KV/Win's restriction to discrete speech, we've already created HAL!