Original article (peer-reviewed)

Journal IEEE Access
Page(s) 1
Title of proceedings IEEE Access
DOI 10.1109/access.2016.2604038


Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. Additionally, we analysed the effect of low-pass filtering speech with a set of different cut-off frequencies, either chosen as static values in the 0.5-5 kHz range or given dynamically by different upper limits from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, with respect to short-term speaker states, using affective vocalizations from the Geneva Multimodal Emotion Portrayals (GEMEP). Second, with respect to long-term speaker traits, we tested vocal recordings from clinical populations with speech impairments, as found in the Child Pathological Speech Database (CPSD). We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analysing the sheer effect of signal corruption, we analysed the potential of matched and multi-condition training as opposed to the mismatched condition. The results show, first, that multi-condition and matched-condition training significantly increase performance compared with the mismatched condition. Second, classification accuracy degrades, but only at comparably severe levels of low-pass filtering. The degradation appears especially for multi-categorical rather than binary decisions, and it can be dealt with reasonably by the aforementioned training strategies.
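
As an illustration of the static cut-off condition described above, the following is a minimal sketch of low-pass filtering with a windowed-sinc FIR filter, written in plain Python. It is not the filtering procedure used in the article: the filter design, the 16 kHz sampling rate, the tap count, and the function names (`lowpass_fir`, `apply_fir`, `tone`, `rms`) are all assumptions made for illustration, and the specific cut-off values listed are only examples from the stated 0.5-5 kHz range.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass coefficients, normalised to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz  # cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with the x == 0 limit handled.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Convolve signal with taps; output has the input length, edges zero-padded."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


def tone(freq_hz, sample_rate_hz, num_samples):
    """Unit-amplitude sine wave, used here as a stand-in for a speech signal."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(num_samples)]


def rms(values):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 16000  # assumed sampling rate, not taken from the article
    # Example static cut-offs inside the 0.5-5 kHz range (illustrative values).
    for cutoff_hz in (500, 1000, 2000, 5000):
        taps = lowpass_fir(cutoff_hz, SAMPLE_RATE_HZ)
        probe = tone(3000, SAMPLE_RATE_HZ, 800)
        filtered = apply_fir(probe, taps)
        # Skip the first and last 100 samples to avoid edge effects.
        gain = rms(filtered[100:-100]) / rms(probe[100:-100])
        print(f"cut-off {cutoff_hz} Hz: gain at 3 kHz = {gain:.3f}")
```

In a matched-condition setup, both training and test recordings would pass through the same filter; in a multi-condition setup, the training set would pool recordings filtered at several cut-offs. The formant-based dynamic cut-offs (F1-F5) would additionally require a formant tracker, which this sketch does not attempt.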