
Great strides were made in normalizing the speech signal to filter out
variability (lesson 2). F. Itakura, a Japanese scientist, as well as
H. Sakoe and S. Chiba, introduced dynamic programming to compute
optimal nonlinear time alignments, a technique that quickly became the
standard. Jim Baker and IBM's Fred Jelinek introduced a statistical
method called Markov modeling, which provided a powerful mathematical
tool for finding the invariant information in the speech signal.
Lesson 3, about breaking the speech signal into its frequency
components, had already been established prior to the ARPA SUR
projects, some of which developed systems that could adapt to aspects
of the speaker's voice (lesson 4). Lesson 5 was anticipated by
allowing ARPA SUR researchers to use as much computer memory as they
could afford to buy and as much computer time as they had the patience
to wait for. An underlying, and accurate, expectation was that Moore's
law (see chapter 3 and below) would ultimately provide whatever
computing platform the algorithms required.
By the end of the 1970s, numerous commercial speech-recognition
products were available. They ranged from Heuristics' $259 H-2000
Speech Link to $100,000 speaker-independent systems from Verbex and
Nippon. Other companies, including Threshold, Scott, Centigram, and
Interstate, offered systems with sixteen-channel filter banks at
prices between $2,000 and $15,000. Such products could recognize small
vocabularies spoken with pauses between words.
Important work on large-vocabulary continuous speech (i.e., speech with no pauses between words) was also conducted at Carnegie Mellon University by Kai-Fu Lee, who subsequently left the university to head Apple's speech-recognition efforts.
By 1991, revenues for the ASR industry were in the low eight figures
and were increasing substantially every year. A buyer could choose any
one (but not two) of the characteristics on the following menu: large
vocabulary, speaker independence, or continuous speech. HAL, of
course, could do all three.