
A more complex example is the phenomenon of nonlinear time
compression. When we speak, we change our speed according to context
and other factors. If we speak one word more quickly, we do not
increase the rate evenly throughout the entire word. The duration of
certain portions of the word, such as plosive consonants (e.g.,
/p/, /b/, /t/), remains fairly constant,
while other portions, such as vowels, undergo most of the change. In
matching a spoken word to a stored example (a template), we
need to align corresponding acoustic events or the match will never
succeed. A mathematical technique called dynamic programming
solves this temporal alignment problem (see figure 7.3).
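
The alignment idea can be made concrete with a small sketch. What follows is one common dynamic-programming formulation (often called dynamic time warping), not necessarily the exact method the figure depicts. The function name `dtw_distance` and the one-number-per-frame "features" are assumptions chosen for illustration; a real recognizer would compare spectral feature vectors.

```python
def dtw_distance(template, utterance):
    """Minimum cumulative cost of aligning two sequences of frame features.

    Assumption for illustration: each frame is a single number and the
    local cost is the absolute difference between two frames.
    """
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j] = best cost of aligning template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # utterance frame j is reused for another template frame
                cost[i][j - 1],      # template frame i is reused for another utterance frame
                cost[i - 1][j - 1],  # both sequences advance by one frame
            )
    return cost[n][m]


# Hypothetical word: a short consonant (1, 1), a long vowel (5, 5, 5, 5), an ending (2).
slow = [1, 1, 5, 5, 5, 5, 2]
# The same word spoken faster: only the vowel is shortened.
fast = [1, 1, 5, 5, 2]
# A different word of the same length as the fast version.
other = [1, 1, 3, 3, 2]

print(dtw_distance(slow, fast))   # 0.0
print(dtw_distance(slow, other))  # 8.0
```

Because the alignment may reuse frames, the shortened vowel still matches the template perfectly, while a genuinely different sound in the same position does not.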

Lesson 3: Speech Is Like a Song

A third lesson is also apparent from studying speech spectrograms: the perceptual cues needed to identify speech sounds and assemble words are found in the frequency domain, not in the original time-varying signal. To make sense of the signal, we need to convert the original waveform into its frequency components.

The human vocal tract is similar to a musical instrument (indeed, it is a musical instrument). The vocal cords vibrate, creating a characteristic pitched sound; the length and tautness of the cords determine pitch in the same way that the length and tautness of a violin or piano string do. We can control the tautness of our vocal cords -- as we do when singing -- and alter the overtones produced by our vibrating cords by moving our tongue, teeth, and lips, which changes the shape of the vocal tract. The vocal tract is a chamber that acts like a pipe in a pipe organ, the harmonic resonances of which emphasize certain overtones and diminish others. Finally, we control a small piece of tissue called the velum (or soft palate), which opens and closes the nasal cavity. When the velum is open, the nasal cavity adds another resonant chamber; it's a lot like opening another organ pipe. (Viewers of My Fair Lady will recall that the anatomy of speech production is also an important topic for specialists in phonetics.)

In addition to the pitched sound produced by the vocal cords, we can produce a noiselike sound by the rush of air through the speech cavity. This sound does not have specific overtones but is a complex spectrum of many frequencies mixed together. Like the musical tones produced by the vocal cords, these noise sounds have spectra that are shaped by the changing resonances of the moving vocal tract. This vocal apparatus allows us to create the varied sounds that make up human speech.
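
The conversion of a waveform into its frequency components, described at the start of this lesson, can be sketched with a plain discrete Fourier transform. The text does not name a particular method, so the transform, the 8,000 Hz sample rate, the 200 Hz fundamental, and the overtone strengths below are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return the magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        spectrum.append(abs(total) / n)
    return spectrum


SAMPLE_RATE = 8000  # samples per second (assumed)
N = 400             # 50 ms frame, so bins are 8000 / 400 = 20 Hz apart

# Synthetic "voiced" sound: a 200 Hz fundamental plus two weaker overtones.
frame = [
    math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * t / SAMPLE_RATE)
    for t in range(N)
]

spectrum = magnitude_spectrum(frame)
# Keep only the bins that carry noticeable energy.
peaks = [k * SAMPLE_RATE / N for k, mag in enumerate(spectrum) if mag > 0.05]
print(peaks)  # [200.0, 400.0, 600.0]
```

In the time domain the three tones are mixed into one wiggly curve; in the frequency domain they appear as three separate peaks, which is the kind of structure a spectrogram displays over time.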
Although many animals communicate with others of their species through sound, we humans are unique in our ability to shape that sound into language. We produce vowel sounds (e.g., /a/, /i/) by shaping the overtones from the vibrating vocal cords into distinct frequency bands, the formants. Sibilant sounds (/s/, /z/) result from the rush of air through particular configurations of tongue and teeth. Plosive consonants (/p/, /b/, /t/) are transitory sounds created by the percussive movement of lips, tongue, and mouth cavity. Nasal sounds (/n/, /m/) are created by invoking the resonances of the nasal cavity. The distribution of sounds varies from one language to another.
Each of the several dozen basic sounds, the phonemes, requires an
intricate movement involving precise coordination of the vocal cords,
soft palate, tongue, lips, and teeth. We typically speak about three
words per second. So with an average of six phonemes per word, we make
about eighteen intricate phonetic gestures per second, a task
comparable in complexity to a performance by a concert pianist. We do
this without thinking about it, of course. Our thoughts remain on the
conceptual (that is, the highest) level of the language and knowledge
hierarchy. In our first two years of life, however, we thought a lot
about how to make speech sounds -- and how to string them together
meaningfully. This process is an example of our sequential (i.e.,
logical, rational) conscious mind training our parallel preconscious
pattern-processing mental faculties.