Decades of Text-to-Speech (TTS) Evolution: From Baby Talk to Siri

Text-to-speech technology entered consumers’ awareness with the advent of the digital era, but few people realize that speech synthesizer technology has been around since the time of UNIVAC, the first commercially available computer. Melanie Pinola of PCWorld has traced the origins of the technology from the “Audrey” system to Siri, today’s most recognizable digital voice. Her account begins in 1952: “The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) Bell Laboratories designed in 1952 the “Audrey” system, which recognized digits spoken by a single voice. Ten years later, IBM demonstrated at the 1962 World’s Fair its “Shoebox” machine, which could understand 16 words spoken in English” (Pinola).

Continued development of speech recognition hardware in the United States, Japan, England, and the Soviet Union during the 1960s led to the DARPA Speech Understanding Research program, which ran from 1971 to 1976. The program was responsible for the “Harpy” speech-understanding system, which could recognize 1,011 words, roughly the vocabulary of an average three-year-old.

“Harpy was significant because it introduced a more efficient search approach, called beam search, to “prove the finite-state network of possible sentences,” according to Readings in Speech Recognition by Alex Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to advances in search methodology and technology, as Google’s entrance into speech recognition on mobile devices proved just a few years ago.)” – Pinola

The decades following the introduction of “Harpy” can be considered the milestone era of speech recognition: vocabularies jumped from a few hundred words to several thousand, paving the way for the unlimited word recognition seen in today’s TTS technology. Instead of matching word templates, the hidden Markov model (HMM) approach considered the probability that unknown sounds were words. “Equipped with this expanded vocabulary, speech recognition started to work its way into commercial applications for business and specialized industry (for instance, medical use). It even entered the home, in the form of Worlds of Wonder’s Julie doll (1987), which children could train to respond to their voice. Finally, the doll that understands you” (Pinola). By the turn of the 21st century, commercial speech recognition software had become available to the masses, including Dragon NaturallySpeaking and VAL, introduced by BellSouth.

However, the biggest leap in the development of TTS was the arrival of the Google Voice Search app for iPhone, which had a tremendous impact on the evolution of TTS as a core component of today’s mobile devices. As TTS development broke new ground as an intuitive, assistive technology for mobile devices, Apple introduced Siri, which relies on cloud-based processing.

Siri draws on what it knows about you to generate a contextual reply, and it responds to your voice input with personality. – Pinola

Since then, speech recognition has evolved from utility to entertainment, pointing to a future in which TTS reaches not only mobile devices but every aspect of human existence. Who knows? In the near future, making a sandwich may be as easy as commanding an armada of kitchen gadgets to make it for you.

Enable your site or app to reach a wider audience. Try ResponsiveVoice TTS now!

Read More: “Speech Recognition Through the Decades: How We Ended Up With Siri” by Melanie Pinola, PCWorld