Artificial Intelligence Does Better Than Humans in Recognizing Speech

Published on October 21, 2020

Following a conversation and transcribing it precisely is a huge challenge for artificial intelligence (AI) researchers. Now, for the first time, researchers at the Karlsruhe Institute of Technology (KIT) have developed a machine learning algorithm that outperforms humans at this task. It recognizes spontaneous, conversational speech with almost no delay.

When people talk to each other, there are stops, stutterings, hesitations, such as ‘er’ or ‘hmmm,’ laughs and coughs. Often, words are pronounced unclearly. And so far, this has been even more difficult for AI (Artificial Intelligence).

Alex Waibel – Professor for Informatics at KIT

Live Translation is Getting Better

Waibel developed an automatic live translator that translates university lectures from German into English, or into the many other languages spoken by foreign students. His ‘Lecture Translator’ has been in use at KIT since 2012.

Recognition of spontaneous speech is the most important component of this system, as errors and delays in recognition make the translation incomprehensible. On conversational speech, the human error rate amounts to about 5.5%. Our system now reaches 5.0%.

Alex Waibel – Professor for Informatics at KIT

Precision is not the only thing that matters. How quickly a system produces its output is just as important: a fast translation lets students follow the lecture fluidly, while delays make learning more difficult. Here, the researchers have reduced the delay to a single second. According to Waibel, this is the lowest latency reported so far for a speech recognition system of this quality.

The ‘switchboard-benchmark’ is used to measure a speech recognition system’s latency and error rate. It is a standardized test defined by NIST (the US National Institute of Standards and Technology), and it is widely accepted and well understood. Researchers around the world have used it in the effort to build an algorithm that comes close to a human’s ability to recognize speech. Now, KIT’s system outperforms humans.
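
The article does not spell out how the error rate is calculated. Speech recognition results are usually reported as word error rate: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that general idea only. It is not KIT's or NIST's scoring code, and the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real benchmark scoring also normalizes text
    (casing, punctuation, hesitations) before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")  # prints 16.7%
```

Read this way, an error rate of 5.0% means roughly one word in twenty differs from the reference transcript.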
