MATCH Technology Tutorial - Basic Principles of Speech Recognition

Basic Principles of Speech Recognition

To understand speech recognition software, we must first understand the basic units of spoken language and the methods used to interpret them.

Speech

The smallest unit of spoken language is known as a phoneme. The English language contains approximately 44 phonemes, covering all the vowel and consonant sounds that we use in speech. Take a typical word such as "moon", which can be broken down into three phonemes: m, ue, n.
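The breakdown of a word into phonemes can be pictured as a simple lookup table. The following is a minimal sketch, not part of any real recognition engine: the names LEXICON, phonemes_for and words_for are hypothetical, the symbols follow this tutorial's informal notation rather than a standard phonetic alphabet, and the entry for "noon" is an assumed example added for contrast.

```python
# Hypothetical pronunciation lexicon: word -> sequence of phonemes.
# Symbols use the tutorial's informal notation ("ue" for the vowel in "moon").
LEXICON = {
    "moon": ["m", "ue", "n"],
    "noon": ["n", "ue", "n"],  # assumed entry, added for illustration only
}


def phonemes_for(word):
    """Return the phoneme sequence for a word (KeyError if the word is unknown)."""
    return LEXICON[word.lower()]


def words_for(phonemes):
    """Return every word in the lexicon whose phoneme sequence matches exactly."""
    target = list(phonemes)
    return [word for word, seq in LEXICON.items() if seq == target]
```

For example, phonemes_for("moon") gives the three phonemes listed above, and words_for(["m", "ue", "n"]) maps them back to the word.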

Interpreting Speech

To interpret speech, we must have a way of identifying the components of spoken words. Phonemes act as identifying markers within speech: each one remains relatively constant, so spoken words can be broken down into them.

An algorithm is then needed to interpret the speech further. The University of Edinburgh is using mathematical models known as Hidden Markov Models to do this.

These models work on the basis of probability. To create a speech recognition engine, a large database of models is built, with models to match each phoneme. When a comparison is performed, the system determines the most likely match between the spoken phoneme and the stored models, and then performs further computations. This allows the system to work out the exact word that was uttered, to understand the context in which it was used, and to understand the grammar if the word is part of a sentence.
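The idea of finding the most likely match can be illustrated with the Viterbi algorithm, a standard way of finding the most probable state sequence in a Hidden Markov Model. The following is a toy sketch under strong assumptions, not the method of any particular engine: each phoneme is a single hidden state, the audio has already been reduced to the made-up labels "A", "B" and "C", and every probability below is invented for illustration. Real systems work with continuous acoustic features and far larger models.

```python
import math
from itertools import groupby

# Toy model: all names and numbers here are illustrative assumptions.
STATES = ["m", "ue", "n"]
START_P = {"m": 0.8, "ue": 0.1, "n": 0.1}
TRANS_P = {
    "m": {"m": 0.5, "ue": 0.4, "n": 0.1},
    "ue": {"m": 0.1, "ue": 0.5, "n": 0.4},
    "n": {"m": 0.1, "ue": 0.1, "n": 0.8},
}
EMIT_P = {
    "m": {"A": 0.7, "B": 0.1, "C": 0.2},
    "ue": {"A": 0.1, "B": 0.8, "C": 0.1},
    "n": {"A": 0.2, "B": 0.1, "C": 0.7},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log probability) for a discrete HMM.

    All probabilities must be non-zero because the scores are kept as logs.
    """
    first = observations[0]
    # Each column maps state -> (best log score ending in that state, previous state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            # Pick the predecessor that gives the best score for arriving in s.
            score, back = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), back)
        columns.append(column)

    # Trace back from the best final state to recover the whole path.
    last = max(states, key=lambda s: columns[-1][s][0])
    log_prob = columns[-1][last][0]
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, log_prob


def collapse(path):
    """Merge consecutive repeats so one phoneme spanning several frames counts once."""
    return [state for state, _ in groupby(path)]


frames = ["A", "A", "B", "B", "C"]  # invented sequence of acoustic labels
best_path, best_log_prob = viterbi(frames, STATES, START_P, TRANS_P, EMIT_P)
phonemes = collapse(best_path)
```

With these invented numbers, the five frames decode to the path m, m, ue, ue, n, which collapses to the phoneme sequence m, ue, n. A further step, not shown here, would look that sequence up in a pronunciation dictionary to arrive at a word.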