Speech recognition

1960

After John Pierce's influential 1969 letter defunded speech recognition research at Bell Labs, work there resumed only when Pierce retired and James L. Flanagan took over. Raj Reddy was the first person to take on continuous speech recognition as a graduate student at Stanford University in the late 1960s.

DARPA's Speech Understanding Research program revived speech recognition research after John Pierce's letter.

1972 – The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts.

1976 – The first ICASSP was held in Philadelphia, which has since been a major venue for the publication of research on speech recognition.

During the late 1960s Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analyses.
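Baum's mathematics underpins the hidden Markov models that later dominated speech recognition. As a rough illustration, the following Python sketch (with made-up toy probabilities, not parameters from any real recognizer) uses the forward algorithm to compute the likelihood of a short observation sequence under a two-state HMM.

```python
import numpy as np

# Toy two-state HMM with hypothetical parameters; each state
# emits one of two discrete acoustic symbols (0 or 1).
init = np.array([0.6, 0.4])       # initial state distribution
trans = np.array([[0.7, 0.3],     # state transition probabilities
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],      # per-state emission probabilities
                 [0.2, 0.8]])

def forward_likelihood(obs):
    """Forward algorithm: total probability of the observation
    sequence, summed over all hidden state paths."""
    alpha = init * emit[:, obs[0]]            # initialize with first symbol
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # propagate, then weight by emission
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 0]))  # likelihood of a toy symbol sequence
```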

1976

At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. It could take up to 100 minutes to decode just 30 seconds of speech.

1979

A Microsoft research executive later called deep learning "the most dramatic change in accuracy since 1979".

1980

However, the HMM proved to be a highly useful way of modeling speech and replaced dynamic time warping to become the dominant speech recognition algorithm in the 1980s.

1982 – Dragon Systems, founded by James and Janet M. Baker, was one of IBM's few competitors.

The 1980s also saw the introduction of the n-gram language model.

1987 – The back-off model allowed language models to use n-grams of multiple lengths, and CSELT used HMMs to recognize languages (both in software and in specialized hardware processors).
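To make the back-off idea concrete, here is a minimal Python sketch of an n-gram language model. It uses the simpler "stupid backoff" weighting rather than the original Katz back-off formulation, and the corpus and penalty constant are purely illustrative: when a longer n-gram is unseen, the model falls back to a shorter context with a fixed penalty.

```python
from collections import Counter

def train_ngrams(tokens, max_n=3):
    """Count all n-grams up to max_n in a token sequence."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(counts, context, word, alpha=0.4):
    """Score P(word | context), backing off to shorter contexts
    (with penalty alpha) whenever the longer n-gram is unseen."""
    ngram = tuple(context) + (word,)
    n = len(ngram)
    if n == 1:
        total = sum(counts[1].values())
        return counts[1][ngram] / total if total else 0.0
    if counts[n][ngram] > 0:
        return counts[n][ngram] / counts[n - 1][tuple(context)]
    return alpha * backoff_score(counts, context[1:], word, alpha)

tokens = "the cat sat on the mat the cat ran".split()
counts = train_ngrams(tokens)
print(backoff_score(counts, ("the",), "cat"))  # seen bigram: direct estimate
print(backoff_score(counts, ("mat",), "cat"))  # unseen bigram: backs off to unigram
```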

Both shallow and deep forms (e.g. recurrent nets) of artificial neural networks had been explored for many years during the 1980s, 1990s and a few years into the 2000s. But these methods never won out over the non-uniform internal-handcrafting Gaussian mixture model/hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively.

1987

A 1987 ad for a doll carried the tagline "Finally, the doll that understands you" – despite the fact that it was described as a doll "which children could train to respond to their voice".

1990

Two practical products were:

1987 – a recognizer from Kurzweil Applied Intelligence
1990 – Dragon Dictate, a consumer product

A number of key difficulties with these neural-network methods had been methodologically analyzed in the 1990s, including diminishing gradients and weak temporal correlation structure in the neural predictive models.

1992

AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without the use of a human operator.

The Sphinx-II system, developed by Xuedong Huang at Carnegie Mellon, was the first to do speaker-independent, large-vocabulary, continuous speech recognition, and it had the best performance in DARPA's 1992 evaluation.

Raj Reddy's student Kai-Fu Lee joined Apple where, in 1992, he helped develop a speech interface prototype for the Apple computer known as Casper.

1993

Huang went on to found the speech recognition group at Microsoft in 1993.

1997

Speech recognition research has also been funded by government programs such as DARPA's EARS program and IARPA's Babel program. In the early 2000s, speech recognition was still dominated by traditional approaches such as hidden Markov models combined with feedforward artificial neural networks. Today, however, many aspects of speech recognition have been taken over by a deep learning method called long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter and Jürgen Schmidhuber in 1997.

2000

Lernout & Hauspie, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000.

2001

L&H was an industry leader until an accounting scandal brought an end to the company in 2001.

2002

In the 2000s DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and Global Autonomous Language Exploitation (GALE). Apple originally licensed software from Nuance to provide speech recognition capability to its digital assistant Siri.

2005

The speech technology from L&H was bought by ScanSoft, which became Nuance in 2005.

2006

Google Voice Search is now supported in over 30 languages. In the United States, the National Security Agency has made use of a type of speech recognition for keyword spotting since at least 2006.

2007

Google's first effort at speech recognition came in 2007 after hiring some researchers from Nuance.

LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps ago, which is important for speech. Around 2007, LSTM trained by Connectionist Temporal Classification (CTC) started to outperform traditional speech recognition in certain applications.
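As a rough sketch of how LSTM training with CTC can be set up (a minimal PyTorch example with arbitrary dimensions and random data, not the configuration of the systems described above): an LSTM maps acoustic feature frames to per-frame label distributions, and the CTC loss aligns that frame sequence to a shorter transcript without frame-level labels.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 40-dim acoustic features, 28 labels (CTC blank + 27 characters).
lstm = nn.LSTM(input_size=40, hidden_size=128, batch_first=True)
proj = nn.Linear(128, 28)
ctc_loss = nn.CTCLoss(blank=0)

frames = torch.randn(2, 100, 40)              # batch of 2 utterances, 100 frames each
hidden, _ = lstm(frames)                      # per-frame LSTM states
log_probs = proj(hidden).log_softmax(dim=-1)  # per-frame label log-probabilities
log_probs = log_probs.transpose(0, 1)         # CTCLoss expects (time, batch, labels)

targets = torch.randint(1, 28, (2, 12))       # dummy transcripts, 12 labels each
frame_lens = torch.full((2,), 100)            # valid frames per utterance
target_lens = torch.full((2,), 12)            # labels per transcript

# CTC sums over all frame-to-label alignments, so no per-frame labels are needed.
loss = ctc_loss(log_probs, targets, frame_lens, target_lens)
loss.backward()                               # gradients flow back through the LSTM
```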

2009

Most speech recognition researchers who understood such barriers subsequently moved away from neural nets to pursue generative modeling approaches, until the recent resurgence of deep learning starting around 2009–2010 that overcame these difficulties.

2010

Several of the researchers involved have reviewed part of this recent history, describing how their collaboration with each other and then with colleagues across four groups (University of Toronto, Microsoft, Google, and IBM) ignited a renaissance of applications of deep feedforward neural networks to speech recognition.

By the early 2010s speech recognition, also called voice recognition, was clearly differentiated from speaker recognition, and speaker independence was considered a major breakthrough.

2017

In 2017, Microsoft researchers reached a historical human parity milestone of transcribing conversational telephony speech on the widely benchmarked Switchboard task.




All text is taken from Wikipedia. Text is available under the Creative Commons Attribution-ShareAlike License.
