A Context Clustering Technique for Average Voice Models

Junichi YAMAGISHI, Masatsune TAMURA, Takashi MASUKO, Keiichi TOKUDA, Takao KOBAYASHI

Summary:

This paper describes a new context clustering technique for the average voice model, a set of speaker-independent speech synthesis units. In this technique, we first train speaker-dependent models using a multi-speaker speech database, and then construct a decision tree common to these speaker-dependent models for context clustering. When a node of the decision tree is split, only context-related questions that are applicable to all speaker-dependent models are adopted. As a result, every node of the decision tree always has training data from all speakers. After the decision tree is constructed, all speaker-dependent models are clustered using the common decision tree, and a speaker-independent model, i.e., an average voice model, is obtained by combining the speaker-dependent models. Subjective tests show that average voice models trained using the proposed technique generate more natural-sounding speech than conventional average voice models.
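The node-splitting rule in the summary can be illustrated with a small sketch. This is not the paper's implementation: the scalar features, the sum-of-squared-errors gain, the equal-weight combination of speakers, and all names (`Node`, `build`, `average_voice_mean`, the toy questions) are assumptions made for illustration. The part taken from the summary is the constraint that a question is adopted only when both child nodes keep data from every speaker.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# One training item: (speaker id, context label, scalar feature value).
# Scalar features are a simplification chosen for this sketch.
Item = Tuple[str, str, float]
Question = Tuple[str, Callable[[str], bool]]


@dataclass
class Node:
    items: List[Item]
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def sse(items: List[Item]) -> float:
    """Sum of squared errors around the mean (toy split criterion, an assumption)."""
    if not items:
        return 0.0
    mean = sum(v for _, _, v in items) / len(items)
    return sum((v - mean) ** 2 for _, _, v in items)


def speakers_of(items: List[Item]) -> Set[str]:
    return {s for s, _, _ in items}


def build(items: List[Item], questions: List[Question],
          all_speakers: Set[str], min_gain: float = 1e-9) -> Node:
    """Greedily grow one tree shared by all speakers."""
    node = Node(items)
    best = None
    for name, pred in questions:
        yes = [it for it in items if pred(it[1])]
        no = [it for it in items if not pred(it[1])]
        # Adopt a question only if both children keep data from every speaker.
        if speakers_of(yes) != all_speakers or speakers_of(no) != all_speakers:
            continue
        gain = sse(items) - sse(yes) - sse(no)
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, name, yes, no)
    if best is not None:
        _, node.question, yes, no = best
        node.yes = build(yes, questions, all_speakers, min_gain)
        node.no = build(no, questions, all_speakers, min_gain)
    return node


def leaves(node: Node) -> List[Node]:
    if node.question is None:
        return [node]
    return leaves(node.yes) + leaves(node.no)


def average_voice_mean(leaf: Node) -> float:
    """Combine per-speaker leaf means with equal weight per speaker (an assumption)."""
    per_speaker = {}
    for s, _, v in leaf.items:
        per_speaker.setdefault(s, []).append(v)
    means = [sum(vs) / len(vs) for vs in per_speaker.values()]
    return sum(means) / len(means)


# Toy data: speaker B has an "s" context that speaker A lacks.
items = [
    ("A", "a-vowel", 1.0), ("A", "k-cons", 5.0),
    ("B", "a-vowel", 2.0), ("B", "k-cons", 6.0), ("B", "s-cons", 6.0),
]
questions = [
    ("is_vowel", lambda c: c.endswith("vowel")),
    ("is_s", lambda c: c.startswith("s")),  # rejected: only speaker B has it
]
root = build(items, questions, speakers_of(items))
```

In this toy run the `is_s` question is never adopted, because its "yes" branch would contain data from speaker B only, so every leaf of the resulting tree still holds data from both speakers before the per-speaker statistics are combined.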

Publication
IEICE TRANSACTIONS on Information Vol.E86-D No.3 pp.534-542
Publication Date
2003/03/01
Type of Manuscript
Special Section PAPER (Special Issue on Speech Information Processing)
Category
Speech Synthesis and Prosody
