Sitaram S., Parlikar A., Anumanchipalli G.K., Black A.W.

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

pp 3360



Grapheme-to-phoneme conversion follows the text processing step in speech synthesis. Typically, lexicons or Letter-to-Sound rules are used to map graphemes to phonemes. However, in some languages, such resources may not be readily available. In this paper, we describe a universal front end that supports using grapheme information alone to build usable speech synthesis systems. This work takes advantage of an explicit mapping of Unicode characters from a wide range of scripts to a single phoneset to create support for building speech synthesizers for most languages in the world. We compare the efficacy of this front end to the baseline approach of treating every single grapheme as a separate phoneme for synthesis by building voices for twelve languages across several language
families and to front ends with linguistic knowledge in languages with higher resources. In addition, we improve our models by using Random Forests as opposed to using single Classification and Regression Trees. We find that the common universal front end performs better than the raw graphemes in general. We also find that using Random Forests lead to a significant improvement in synthesis quality, which is better than the quality of the knowledge based front end in many cases.