
Consequently, when the computer reads a text, it may err in its analysis of hierarchical segmentation and assignment of sentence stress. Since pitch is largely determined by segmentation and stress, incorrect information about these elements can result in unintelligible speech. To minimize the effects of such errors, we limit the range of pitch movement. Although the synthesizer sounds more realistic than people trying to impersonate computers, it still sounds very mechanical. When we can annotate text to specify phrase structure and focus, or generate text with a computer whose range of pitch can expand to match the range of human speakers, synthetic speech will sound better.
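To make the idea of limiting pitch movement concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in any particular synthesizer: the function name `compress_pitch_range`, the choice of a reference pitch, and the use of a semitone scale are all hypothetical. Each pitch value is pulled toward the reference by a constant factor, so a wrongly placed accent produces a smaller and less disruptive excursion.

```python
import math


def compress_pitch_range(contour_hz, reference_hz, scale):
    """Shrink pitch excursions around reference_hz by the factor `scale`.

    contour_hz   -- pitch values in Hz; 0 marks an unvoiced frame (assumption)
    reference_hz -- the speaker's assumed mid-range pitch in Hz
    scale        -- 1.0 keeps the full range, 0.0 gives a monotone
    """
    if reference_hz <= 0:
        raise ValueError("reference_hz must be positive")
    compressed = []
    for f0 in contour_hz:
        if f0 <= 0:
            # Unvoiced frame: there is no pitch to adjust.
            compressed.append(0.0)
            continue
        # Work in semitones so that rises and falls are treated symmetrically.
        semitones = 12.0 * math.log2(f0 / reference_hz)
        compressed.append(reference_hz * 2.0 ** (scale * semitones / 12.0))
    return compressed
```

With `scale` set to 0.5, a full octave rise above the reference becomes a rise of half an octave; the shape of the contour is kept, only its extent shrinks.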
Other causes of the poor quality of synthetic speech arise from our inability to model the duration of the phonemes and the movement of the pitch as accurately as we need in order to imitate human speech. More important, we still cannot analyze speech and use the resulting parameters in a way that accurately copies the human sound of the speech. At present, it is difficult to predict when we will solve these problems and build computers that sound like HAL.
Researchers in speech synthesis are now working in an area not
portrayed in 2001. In the film, HAL is portrayed as a large
machine whose connection to the world is a large red eye. At Bell
Labs, we have attached a talking face to our computer, which
simultaneously sends the same information to the synthesizer and the
talking head. Thus the talking head receives information about the
phonemes and their duration and uses the information to compute the
appropriate position of its lips, jaw, and tongue. It also moves its
eyebrows to enhance the stressed portions of the speech. Although the
talking head in the picture is a flat mask, it can be covered by a
textured face mask portraying any person you choose. The talking face
not only makes the speech synthesizer more attractive and personable,
it also enhances the intelligibility of the speech by letting the
listener lipread while listening to the computer (cf. chapter 11). If
HAL had had a real face, rather than one large eye, would it have been
so easy to kill him -- by turning him off? I wonder.
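
The way a talking head can be driven from the same phoneme and duration information as the synthesizer can be sketched as follows. This is only an illustration under assumptions: the `Keyframe` fields, the `VISEMES` table and its numeric values, and the function `face_track` are all hypothetical and do not describe the Bell Labs system. Each phoneme is looked up in a small table of mouth shapes, the durations are accumulated to place each shape in time, and stressed segments raise the eyebrows.

```python
from dataclasses import dataclass


@dataclass
class Keyframe:
    time_ms: int       # start time of the segment
    jaw_open: float    # 0.0 closed .. 1.0 fully open
    lip_round: float   # 0.0 spread .. 1.0 fully rounded
    brow_raise: float  # 0.0 neutral .. 1.0 raised


# Hypothetical mouth shapes (jaw_open, lip_round) for a few phonemes.
VISEMES = {
    "AA": (0.9, 0.1),  # open vowel, as in "father"
    "UW": (0.3, 0.9),  # rounded vowel, as in "boot"
    "M": (0.0, 0.2),   # lips closed
    "S": (0.1, 0.0),   # teeth nearly together
}
NEUTRAL = (0.2, 0.2)   # fallback for phonemes missing from the table


def face_track(segments):
    """Turn (phoneme, duration_ms, stressed) triples into face keyframes."""
    frames = []
    time_ms = 0
    for phoneme, duration_ms, stressed in segments:
        jaw_open, lip_round = VISEMES.get(phoneme, NEUTRAL)
        brow_raise = 1.0 if stressed else 0.0
        frames.append(Keyframe(time_ms, jaw_open, lip_round, brow_raise))
        # The next segment starts when this one ends.
        time_ms += duration_ms
    return frames
```

Because the keyframe times come from the same durations the synthesizer uses, the mouth and the audio stay aligned without any separate synchronization step. A real face would also interpolate between keyframes and model the tongue, which this sketch leaves out.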