A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

Makoto TACHIBANA, Junichi YAMAGISHI, Takashi MASUKO, Takao KOBAYASHI

  • Full Text Views

    0

  • Cite this

Summary :

This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM) is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.

Publication
IEICE TRANSACTIONS on Information Vol.E89-D No.3 pp.1092-1099
Publication Date
2006/03/01
Publicized
Online ISSN
1745-1361
DOI
10.1093/ietisy/e89-d.3.1092
Type of Manuscript
Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category
Speech Synthesis

Authors

Keyword

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.