Chapter 6



In 2001, just before HAL is disconnected, he starts singing. Because the computer's voice is a human voice, HAL's singing doesn't seem extraordinary to us. The song he chooses, however, is rather curious. I doubt that many people would think of "Daisy, Daisy" as an appropriate song for such a scene. Yet, as Arthur C. Clarke knew, this song is historically important: it was the first song ever sung by a computer. This work was done by John Kelly at Bell Laboratories and employed his synthesis-by-rule algorithm. Whenever I lecture about computer singing, I start by playing the original computer song and reminding the audience that this was the song used in 2001.

My own work in computer singing stemmed from research in speech synthesis. To understand the effect of manipulating speech in the parameter domain, I constructed an interactive system to display and alter the synthesis parameters for a digital version of an electroacoustic synthesizer. Because the state of synthesis was not very advanced in 1974, I used analysis parameters from natural-speech segments. My system allowed me to adjust the timing of events by stretching and compressing the parameters and to change the pitch by simply drawing or typing a new pitch contour; for special effects, I could also change the spectral parameters. By adjusting the timing to fit the music and setting the pitch to the frequency of the desired musical notes, I was able to program a computer to sing. A singing formant developed by J. Sundberg added richness to the voice, and a vibrato contributed to its realistic sound.
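The two manipulations described above, stretching the parameter tracks to fit the music and replacing the pitch contour with the frequency of the desired note, can be sketched in a few lines of Python. This is only an illustrative sketch, not the original 1974 system: the frame rate, the track names ("formant1_hz", "gain"), the MIDI note numbering, and the vibrato rate and depth are all assumptions made for the example, and the singing formant is not modeled.

```python
import math

FRAME_RATE = 100  # parameter frames per second (assumed value)


def note_to_hz(midi_note):
    """Equal-tempered frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def stretch_track(track, new_len):
    """Stretch or compress one parameter track to new_len frames
    using linear interpolation between neighboring frames."""
    if new_len <= 0 or not track:
        return []
    if new_len == 1 or len(track) == 1:
        return [track[0]] * new_len
    scale = (len(track) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append(track[lo] * (1.0 - frac) + track[hi] * frac)
    return out


def pitch_contour(note_hz, n_frames, vibrato_hz=5.5, vibrato_depth=0.01):
    """Hold one note, with a small sinusoidal vibrato around it.
    The vibrato rate and depth are hypothetical defaults."""
    return [
        note_hz * (1.0 + vibrato_depth
                   * math.sin(2.0 * math.pi * vibrato_hz * i / FRAME_RATE))
        for i in range(n_frames)
    ]


def sing_segment(spectral_tracks, midi_note, duration_s):
    """Fit a spoken segment to a musical note: stretch every spectral
    track to the note's duration and replace the pitch with the note."""
    n_frames = max(1, round(duration_s * FRAME_RATE))
    sung = {name: stretch_track(track, n_frames)
            for name, track in spectral_tracks.items()}
    sung["pitch_hz"] = pitch_contour(note_to_hz(midi_note), n_frames)
    return sung


# Hypothetical analysis parameters for a very short spoken segment.
spoken = {"formant1_hz": [300.0, 500.0, 700.0], "gain": [0.2, 1.0, 0.4]}
sung = sing_segment(spoken, midi_note=69, duration_s=0.5)
print(len(sung["pitch_hz"]), round(sung["pitch_hz"][0], 1))
```

In this sketch the spectral parameters keep their shape and only their time axis changes, while the pitch track is discarded and rebuilt from the score. That mirrors the division of labor in the text: timing and pitch come from the music, and everything else comes from the natural-speech analysis.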

The opera, written for a human soprano and a computer with a male voice, was the story of a woman scientist who builds a computer and tries to teach it to talk. The computer, at first producing only a few unintelligible sounds, miraculously begins to talk by imitating its creator. The scientist is happy with this development but wants the machine to speak with more feeling. When the computer obliges, after some failed attempts, the scientist is satisfied and turns the machine off. The machine, however, turns itself back on and pleads with her to stay. Soon it breaks into song and she joins it in a love duet based on music from Verdi's La Traviata. Of course, operas don't always end happily. The scientist cannot cope with loving a machine and proceeds to disassemble it. The main theme of this opera is our desire for computers not just to speak, but to speak with feeling.


Today and Beyond 2001

As we near the year 2001, do we have a computer that sounds like the voice of HAL portrayed by actor Douglas Rain -- personable, warm, emotional, human-sounding? The answer is no, not yet.

At Bell Laboratories we have developed a text-to-speech synthesizer that is highly intelligible in several languages, including English, German, French, Spanish, Russian, Chinese, and Navajo. The finest module in the synthesizer is the pronunciation module, which enables it to pronounce words and names as well as any educated American would. Yet, although capable of reading or generating such complex text as e-mail or newspaper stories, the synthesizer does not replicate the human voice. It has a distinct "machine" sound. Which of the stages of the synthesis process we described account for this fault?

Not one but many of the stages require improvements before we succeed in producing humanlike speech. There are problems at both the text-analysis and the speech-synthesis stages. The greatest dilemma facing synthesis researchers, as well as those working on automatic speech recognition (see chapter 7), is the machine's inability to comprehend what it is saying or hearing. This, of course, is a part of the greater problem of artificial intelligence, which at present is very limited. Even so, a machine has been "taught" to play high-level chess (see chapter 5) and can defeat most human players. Compared to the problem of language understanding, however, chess is quite simple. Language acquisition is more analogous to the game of Go, as there is an enormous number of possible combinations of moves in the game and of sentences in the language. Go has approximately 10^768 sequences of moves, a number that is many orders of magnitude larger than the number of atoms in the universe. Due to this complexity, machines programmed for Go play at only an elementary or novice level. The same holds true for machine language understanding. A computer can only perform tasks requiring very limited understanding. It can maintain a dialogue about ordering a pizza but not about a subject matter that has not been previously defined.

