This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM) is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Makoto TACHIBANA, Junichi YAMAGISHI, Takashi MASUKO, Takao KOBAYASHI, "A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features" in IEICE TRANSACTIONS on Information,
vol. E89-D, no. 3, pp. 1092-1099, March 2006, doi: 10.1093/ietisy/e89-d.3.1092.
Abstract: This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM) is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.
URL: https://globals.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.1092/_p
Copy
@ARTICLE{e89-d_3_1092,
author={Makoto TACHIBANA, Junichi YAMAGISHI, Takashi MASUKO, Takao KOBAYASHI, },
journal={IEICE TRANSACTIONS on Information},
title={A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features},
year={2006},
volume={E89-D},
number={3},
pages={1092-1099},
abstract={This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM) is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.},
keywords={},
doi={10.1093/ietisy/e89-d.3.1092},
ISSN={1745-1361},
month={March},}
Copy
TY - JOUR
TI - A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features
T2 - IEICE TRANSACTIONS on Information
SP - 1092
EP - 1099
AU - Makoto TACHIBANA
AU - Junichi YAMAGISHI
AU - Takashi MASUKO
AU - Takao KOBAYASHI
PY - 2006
DO - 10.1093/ietisy/e89-d.3.1092
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2006
AB - This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM) is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.
ER -