This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
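The core mechanism in the abstract can be sketched numerically. In an MRHSMM, the mean vector of each state's output (and duration) distribution is a multiple regression of the style vector: with the augmented style vector ξ = [1, vᵀ]ᵀ, the mean is μ = Hξ, where H is a per-state regression matrix estimated in training. The sketch below is illustrative only (shapes, the regression matrix, and the style-axis labels are invented for the example, not taken from the paper):

```python
import numpy as np

# Illustrative sketch of MRHSMM-style mean control (not the authors' code).
rng = np.random.default_rng(0)

M, L = 4, 2                            # M: mean dimension, L: style-space dimension
H = rng.standard_normal((M, L + 1))    # hypothetical regression matrix for one state

def style_mean(H, v):
    """Mean of a state's distribution for style vector v (multiple regression)."""
    xi = np.concatenate(([1.0], v))    # augment with a bias term: xi = [1, v^T]^T
    return H @ xi

neutral = style_mean(H, np.zeros(L))            # origin of the style space
full    = style_mean(H, np.array([1.0, 0.0]))   # one style axis at full intensity
half    = style_mean(H, np.array([0.5, 0.0]))   # same style at half intensity

# Because the mapping is linear, intensity scales the deviation from the
# neutral mean proportionally -- this is what lets a single scalar control
# the perceived strength of a style or emotion:
assert np.allclose(half - neutral, 0.5 * (full - neutral))
```

Moving the style vector continuously through the style space therefore interpolates (or extrapolates) the model means, which is how the synthesis stage varies style intensity without retraining.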
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takashi NOSE, Junichi YAMAGISHI, Takashi MASUKO, Takao KOBAYASHI, "A Style Control Technique for HMM-Based Expressive Speech Synthesis," in IEICE Transactions on Information and Systems,
vol. E90-D, no. 9, pp. 1406-1413, September 2007, doi: 10.1093/ietisy/e90-d.9.1406.
URL: https://globals.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.9.1406/_p
@ARTICLE{e90-d_9_1406,
author={Takashi NOSE and Junichi YAMAGISHI and Takashi MASUKO and Takao KOBAYASHI},
journal={IEICE Transactions on Information and Systems},
title={A Style Control Technique for HMM-Based Expressive Speech Synthesis},
year={2007},
volume={E90-D},
number={9},
pages={1406-1413},
abstract={This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.},
doi={10.1093/ietisy/e90-d.9.1406},
ISSN={1745-1361},
month={September},
}
TY - JOUR
TI - A Style Control Technique for HMM-Based Expressive Speech Synthesis
T2 - IEICE Transactions on Information and Systems
SP - 1406
EP - 1413
AU - Takashi NOSE
AU - Junichi YAMAGISHI
AU - Takashi MASUKO
AU - Takao KOBAYASHI
PY - 2007
DO - 10.1093/ietisy/e90-d.9.1406
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E90-D
IS - 9
JA - IEICE Transactions on Information and Systems
Y1 - September 2007
AB - This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
ER -