SPEECH PROCESSING


Speech processing is the study of speech signals and their analysis using time domain and frequency domain parameters.

TIME DOMAIN ANALYSIS:-

Time domain analysis often requires only simple calculation and interpretation. Among the relevant features readily found in temporal analysis are waveform statistics, power and fundamental frequency (F0).
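As an illustration of such temporal features, here is a minimal Python sketch that computes the power (short-time energy), the zero-crossing rate and an autocorrelation-based F0 estimate for one frame. The function names, the 50 to 400 Hz search range and the synthetic 100 Hz test tone are assumptions made for this example, not part of the handout; practical pitch trackers are considerably more robust.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame (a simple power estimate)."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def estimate_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate F0 as sample_rate / lag, where lag maximises the autocorrelation.

    The search range f_min..f_max is an assumed range for voiced speech.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100.0 * n / sample_rate) for n in range(800)]

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
f0 = estimate_f0(frame, sample_rate)
print(energy, zcr, f0)
```

All three features are computed directly from the samples, with no transform involved, which is why time domain analysis is usually cheap.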

NEED FOR TIME DOMAIN ANALYSIS:-

Time domain analysis is appropriate when the frequency content of a signal stays constant over the time span being analysed.

FREQUENCY DOMAIN ANALYSIS:-

In frequency domain analysis we find the spectral properties of a signal. Spectrum analysis gives us the most useful parameters of a speech signal, such as bandwidth, spectral energy and formant frequencies (formants are the distinguishing or meaningful frequency components of human speech and of singing).
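To make the idea of spectral properties concrete, the following sketch computes a magnitude spectrum with a naive discrete Fourier transform and picks its strongest peaks. The helper names and the two-tone test signal (500 Hz and 1500 Hz) are assumptions for illustration only. Locating peaks in a raw spectrum is a rough stand-in for formant estimation, which in practice usually works on a smoothed spectral envelope.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT; returns magnitudes for bins 0 .. N/2."""
    n_samples = len(frame)
    spectrum = []
    for k in range(n_samples // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        spectrum.append(abs(acc))
    return spectrum

def spectral_peaks(magnitudes, sample_rate, n_samples, count=2):
    """Frequencies (Hz) of the `count` strongest local maxima, ascending."""
    peaks = [k for k in range(1, len(magnitudes) - 1)
             if magnitudes[k] > magnitudes[k - 1]
             and magnitudes[k] >= magnitudes[k + 1]]
    strongest = sorted(peaks, key=lambda k: magnitudes[k], reverse=True)[:count]
    return sorted(k * sample_rate / n_samples for k in strongest)

# Assumed test signal: two tones standing in for two spectral resonances.
sample_rate = 8000
n_samples = 400  # bin spacing = 8000 / 400 = 20 Hz
frame = [math.sin(2 * math.pi * 500 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 1500 * n / sample_rate)
         for n in range(n_samples)]

magnitudes = magnitude_spectrum(frame)
spectral_energy = sum(m * m for m in magnitudes)
peak_freqs = spectral_peaks(magnitudes, sample_rate, n_samples)
print(peak_freqs)
```

A production system would use a fast Fourier transform and a window function; the naive DFT is used here only to keep the example free of external libraries.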

NEED FOR FREQUENCY DOMAIN ANALYSIS:-

Frequency domain analysis is necessary when we want to perform filtering operations on a speech signal. It is also crucial when a fixed frequency band must be allocated to a system.
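A filtering operation of this kind can be sketched by transforming the signal, zeroing every frequency bin outside the allocated band, and transforming back. The `band_limit` helper, the 200 to 1000 Hz band and the two-tone input below are hypothetical choices for this example. Zeroing bins outright is an idealised "brick-wall" filter; real systems typically use designed FIR or IIR filters instead.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT returning all N complex bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def band_limit(x, sample_rate, low_hz, high_hz):
    """Keep only components whose frequency lies in [low_hz, high_hz]."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)

# Assumed input: a 300 Hz tone (inside the band) plus a 3000 Hz tone (outside).
sample_rate = 8000
n_samples = 160  # bin spacing = 50 Hz
low_tone = [math.sin(2 * math.pi * 300 * t / sample_rate) for t in range(n_samples)]
high_tone = [math.sin(2 * math.pi * 3000 * t / sample_rate) for t in range(n_samples)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]

filtered = band_limit(mixed, sample_rate, 200, 1000)
max_error = max(abs(f - l) for f, l in zip(filtered, low_tone))
print(max_error)
```

In this example the 3000 Hz component is removed and the 300 Hz tone is recovered almost exactly, because both tones fall on exact DFT bins.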

AREAS OF STUDY IN SPEECH PROCESSING:

1.) Voice recognition:-

It has two parts: a) Speech recognition and b) Speaker recognition.

a.) Speech recognition:-

“Determine what is being said”. In computer science, speech recognition is the translation of spoken words into text.

b.) Speaker recognition:-

“Determine who is speaking”. Speaker recognition is the identification of the person who is speaking from the characteristics of their voice (voice biometrics).

2.) Speech coding:-

Speech coding is an application of data compression to digital audio signals containing speech. It uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, combined with generic data compression algorithms to represent the resulting model parameters in a compact bit stream.
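One widely used way to model a speech frame with a few parameters is linear prediction (LPC), where each sample is approximated as a weighted sum of the previous samples. The handout does not name a specific model, so the choice of LPC here is an assumption. The sketch below estimates the predictor coefficients with the Levinson-Durbin recursion; the function names and the synthetic damped resonance used as input are hypothetical.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0] .. r[max_lag] of one frame."""
    return [sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, error) where x[n] is approximated by
    sum(a[i - 1] * x[n - i] for i in 1..order) and `error` is the
    remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error

# Assumed input: a damped 500 Hz resonance, i.e. the impulse response of a
# two-pole filter, so an order-2 predictor should describe it well.
sample_rate = 8000
radius = 0.95
omega = 2 * math.pi * 500 / sample_rate
frame = [radius ** n * math.sin(omega * n) for n in range(400)]

coeffs, residual_energy = lpc_coefficients(frame, order=2)
print(coeffs)  # close to [2 * radius * cos(omega), -radius ** 2]
```

Here 400 samples are summarised by two coefficients plus an error term. A real coder would go on to quantise such parameters, together with gain and pitch information, and pack them into the bit stream; that stage is not shown.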

3.) Voice analysis:-

Voice analysis is the study of speech sounds for purposes other than extracting linguistic content (which is the goal of speech recognition). Such study includes medical analysis of the voice.

4.) Speech synthesis:-

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer.

5.) Speech enhancement:-

Speech enhancement aims to improve the quality of speech using various algorithms. The objective is to improve the intelligibility and overall perceptual quality of a degraded speech signal using audio signal processing techniques.

6.) Speaker diarization:-

Speaker diarization (also spelled diarisation) is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. It can make an automatic speech transcription more readable by structuring the audio stream into speaker turns and, when used together with a speaker recognition system, by providing each speaker’s true identity.

It is used to answer the question “who spoke when?” Speaker diarization is a combination of speaker segmentation and speaker clustering. Segmentation aims at finding speaker change points in an audio stream. Clustering aims at grouping together speech segments on the basis of speaker characteristics.
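The two stages can be sketched on toy data as follows. Everything in this example is an assumption for illustration: each frame is reduced to a single hypothetical feature value (real systems use multi-dimensional speaker features), change points are found with a simple jump threshold, and clustering is a greedy comparison against fixed centroids.

```python
def segment(features, threshold):
    """Segmentation: split where consecutive frames differ by more than threshold.

    Returns a list of (start, end) frame index pairs, end exclusive.
    """
    boundaries = [0]
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) > threshold:
            boundaries.append(i)
    boundaries.append(len(features))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

def cluster(features, segments, threshold):
    """Clustering: give segments with similar mean features the same label."""
    centroids = []
    labels = []
    for start, end in segments:
        mean = sum(features[start:end]) / (end - start)
        for index, centroid in enumerate(centroids):
            if abs(mean - centroid) <= threshold:
                labels.append(index)
                break
        else:
            # No existing speaker is close enough: start a new cluster.
            centroids.append(mean)
            labels.append(len(centroids) - 1)
    return labels

# Hypothetical per-frame feature values for a two-speaker recording.
features = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 1.0, 1.05]

segments = segment(features, threshold=2.0)
labels = cluster(features, segments, threshold=2.0)
turns = [(start, end, label) for (start, end), label in zip(segments, labels)]
print(turns)  # (start frame, end frame, speaker label) for each turn
```

The resulting list of turns answers "who spoke when" in terms of anonymous speaker labels; mapping a label to a real identity would require a separate speaker recognition step.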

CEV - Handout