While global characteristics of the speaker’s source and spectral features have been successfully employed in pathological voice detection, the underlying text has largely been ignored. In this work, we focus on experiments that exploit the text stimulus that is read by the subject. Features derived from text include the mean cepstral distortion of the subject from an average intelligible speaker, and prosodic features include the speaking rate, statistics of phoneme durations, etc. The phonetic labeling information is also exploited to ignore all the unvoiced regions of the speech samples to improve the discriminability between intelligible and pathological voices. We also designed features that capture the speaker’s overall closeness to intelligible instances of the same text stimulus from other speakers. Our experiments show that the proposed text-derived features improve the detection of pathological voices by 20%.
Index Terms: Pathological voices, example based detection, text-driven features, fusion of classification methods.