Author Search Result

[Author] Masatsune TAMURA (4 hits)

Results 1-4 of 4
  • Fast Concatenative Speech Synthesis Using Pre-Fused Speech Units Based on the Plural Unit Selection and Fusion Method

    Masatsune TAMURA  Tatsuya MIZUTANI  Takehiko KAGOSHIMA  

     
    PAPER-Speech and Hearing

    Vol: E90-D No.2, Page(s): 544-553

    We have previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable and human-like speech. In this method, plural speech units are selected for each speech segment using a cost function and are fused by averaging their pitch-cycle waveforms. The method has a large computational cost, whereas some platforms require a speech synthesis system that can work within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to build a pre-fused speech unit database. At synthesis time, a speech unit for each segment is selected from the pre-fused speech unit database, and the speech waveform is synthesized by applying prosodic modification and concatenation without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of synthetic speech from the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with a 93 MB JP database and slightly lower than that with the 390 MB US database, while the computation time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
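
    As a rough illustration of the offline fusion step, the Python sketch below (not the authors' implementation) builds a pre-fused unit inventory for one segment type by VQ clustering of unit waveforms followed by pitch-cycle averaging. The fixed-length pitch-cycle representation, plain Euclidean k-means, and all parameter values are illustrative assumptions; the frequency weighting and the cost function of the paper are omitted.

```python
# Minimal sketch of building a pre-fused speech unit database offline:
# cluster candidate units (VQ) and fuse each cluster by averaging its
# aligned pitch-cycle waveforms. Illustrative only, not the paper's method.
import numpy as np

def build_prefused_units(units, n_codewords=8, n_iter=20, seed=0):
    """units: array (n_units, n_cycles, cycle_len) of aligned pitch-cycle waveforms.
    Returns an array (n_codewords, n_cycles, cycle_len) of pre-fused units."""
    rng = np.random.default_rng(seed)
    n_units = units.shape[0]
    feats = units.reshape(n_units, -1)                  # flatten each unit to a vector
    centers = feats[rng.choice(n_units, n_codewords, replace=False)].copy()
    for _ in range(n_iter):                             # plain k-means (VQ) loop
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(n_codewords):
            members = feats[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(0)
    # Fuse: average the pitch-cycle waveforms of the units mapped to each codeword.
    fused = np.stack([units[assign == k].mean(0) if np.any(assign == k)
                      else np.zeros(units.shape[1:])
                      for k in range(n_codewords)])
    return fused

# Example: 200 candidate units of one segment type, 5 pitch cycles of 80 samples each.
units = np.random.randn(200, 5, 80)
prefused = build_prefused_units(units, n_codewords=8)
print(prefused.shape)   # (8, 5, 80): pre-fused entries stored in the offline database
```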

  • A Training Method of Average Voice Model for HMM-Based Speech Synthesis

    Junichi YAMAGISHI  Masatsune TAMURA  Takashi MASUKO  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER

    Vol: E86-A No.8, Page(s): 1956-1963

    This paper describes a new training method for an average voice model for speech synthesis, in which an arbitrary speaker's voice is generated based on speaker adaptation. When the amount of training data is limited, the distributions of the average voice model often have a bias depending on speaker and/or gender, and this degrades the quality of synthetic speech. In the proposed method, to reduce the influence of speaker dependence, we incorporate a context clustering technique called shared decision tree context clustering and speaker adaptive training into the training procedure of the average voice model. The results of subjective tests show that the average voice model trained using the proposed method generates more natural-sounding speech than the conventional average voice model. Moreover, the voice characteristics and prosodic features of synthetic speech generated from the adapted model using the proposed method are closer to those of the target speaker than with the conventional method.
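
    The following Python sketch (not the paper's HMM/HTS implementation) illustrates the speaker adaptive training idea in a drastically simplified, bias-only form: each speaker's data is modeled as a canonical mean plus a speaker-dependent shift, and the two are re-estimated alternately so that speaker-specific offsets do not leak into the canonical (average voice) statistics. Single-Gaussian observations, shift-only adaptation, and the zero-sum constraint on the biases are assumptions made purely for illustration.

```python
# Simplified speaker adaptive training (SAT) sketch: alternate between
# estimating per-speaker bias terms and the canonical mean, so the canonical
# model stays free of speaker-specific offsets. Illustrative assumption only.
import numpy as np

def sat_bias_training(data, n_iter=10):
    """data: dict speaker -> array (n_frames, dim) of frames assigned to one state."""
    dim = next(iter(data.values())).shape[1]
    mu = np.zeros(dim)                          # canonical (average voice) mean
    bias = {s: np.zeros(dim) for s in data}     # speaker-dependent shifts
    for _ in range(n_iter):
        # Re-estimate speaker biases given the canonical mean ...
        for s, x in data.items():
            bias[s] = x.mean(0) - mu
        # ... keep them identifiable (biases sum to zero across speakers) ...
        center = np.mean(list(bias.values()), axis=0)
        for s in bias:
            bias[s] = bias[s] - center
        # ... then re-estimate the canonical mean from bias-compensated data.
        mu = np.mean([(x - bias[s]).mean(0) for s, x in data.items()], axis=0)
    return mu, bias

# Toy example: three speakers sharing a common mean but carrying individual offsets.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
offsets = {"A": 0.8, "B": -0.5, "C": 0.1}
data = {s: true_mu + off + rng.normal(0.0, 0.3, (500, 2)) for s, off in offsets.items()}
mu, bias = sat_bias_training(data)
print(mu, bias["A"])   # per-speaker offsets end up in the bias terms, not in mu
```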

  • A Context Clustering Technique for Average Voice Models

    Junichi YAMAGISHI  Masatsune TAMURA  Takashi MASUKO  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E86-D No.3, Page(s): 534-542

    This paper describes a new context clustering technique for average voice models, where an average voice model is a set of speaker-independent speech synthesis units. In this technique, we first train speaker-dependent models using a multi-speaker speech database and then construct a decision tree common to these speaker-dependent models for context clustering. When a node of the decision tree is split, only the context-related questions that are applicable to all speaker-dependent models are adopted. As a result, every node of the decision tree always has training data from all speakers. After construction of the decision tree, all speaker-dependent models are clustered using the common decision tree, and a speaker-independent model, i.e., an average voice model, is obtained by combining the speaker-dependent models. The results of subjective tests show that average voice models trained using the proposed technique can generate more natural-sounding speech than conventional average voice models.
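
    A minimal Python sketch of the shared decision tree constraint is given below (again, not the actual implementation): a context question may split a node only when both children receive data from every speaker, and among admissible questions the one with the largest gain is chosen. The variance-reduction gain and the toy contexts are illustrative stand-ins for the likelihood-based criterion and the phonetic question set used in practice.

```python
# Sketch of the shared decision tree split rule: a question is admissible only
# if every speaker has data on both sides of the split, so all speakers remain
# represented in every node. Gain criterion and contexts are illustrative.
import numpy as np

def admissible(samples, question):
    """samples: list of (speaker, context, feature_vector) tuples."""
    speakers = {s for s, _, _ in samples}
    yes = {s for s, c, _ in samples if question(c)}
    no = {s for s, c, _ in samples if not question(c)}
    return yes == speakers and no == speakers    # every speaker on both sides

def gain(samples, question):
    sse = lambda a: ((a - a.mean(0)) ** 2).sum() # within-node sum of squared errors
    X = np.array([x for _, _, x in samples])
    yes = np.array([x for _, c, x in samples if question(c)])
    no = np.array([x for _, c, x in samples if not question(c)])
    return sse(X) - (sse(yes) + sse(no))         # reduction achieved by the split

def best_shared_split(samples, questions):
    ok = [q for q in questions if admissible(samples, q)]
    return max(ok, key=lambda q: gain(samples, q)) if ok else None

# Toy usage: contexts are dicts, questions test a single contextual attribute.
rng = np.random.default_rng(1)
samples = [(spk, {"is_vowel": i % 2 == 0}, rng.normal(i % 2, 1.0, 3))
           for spk in ("spk_A", "spk_B") for i in range(20)]
questions = [lambda c: c["is_vowel"]]
print(best_shared_split(samples, questions) is not None)  # True: split shared by all
```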

  • Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model

    Yamato OHTANI  Masatsune TAMURA  Masahiro MORITA  Masami AKAMINE  

     
    PAPER-Voice conversion

    Publicized: 2016/07/19, Vol: E99-D No.10, Page(s): 2481-2489

    This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE from speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM in advance with SBM parameters extracted from full-band spectra. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density between the low-band and high-band SBM components. The high-band SBM components are then estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method robustly extends the bandwidth of speech data for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal achieves the same performance as the restored and full-band aperiodic components in the proposed method.
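
    As an illustration of the estimation step, the Python sketch below (not the paper's implementation) performs a standard GMM-based MMSE mapping from low-band to high-band components given a joint GMM: posterior mixture responsibilities are computed from the low-band observation, and the conditional mean of the high-band block is accumulated over mixtures. The dimensions, full-covariance assumption, and toy parameters are placeholders; SBM extraction, GMM training, and the bandwidth-dependent GMM reconstruction are assumed to be done elsewhere.

```python
# Sketch of GMM-based MMSE estimation of high-band components from a low-band
# observation under a joint [low, high] GMM. Illustrative parameters only.
import numpy as np

def gaussian_pdf(x, mu, cov):
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def estimate_highband(x_low, weights, means, covs, d_low):
    """MMSE estimate E[x_high | x_low] under a joint GMM.
    weights: (M,); means: (M, d_low + d_high); covs: (M, d, d)."""
    M = len(weights)
    # Posterior responsibility of each mixture given the low-band observation.
    lik = np.array([weights[m] * gaussian_pdf(x_low, means[m][:d_low],
                                              covs[m][:d_low, :d_low])
                    for m in range(M)])
    post = lik / lik.sum()
    est = 0.0
    for m in range(M):
        mu_l, mu_h = means[m][:d_low], means[m][d_low:]
        S_ll = covs[m][:d_low, :d_low]
        S_hl = covs[m][d_low:, :d_low]
        # Conditional mean of the high-band block given the low-band block.
        est = est + post[m] * (mu_h + S_hl @ np.linalg.solve(S_ll, x_low - mu_l))
    return est

# Toy usage with a 2-mixture joint GMM over 3 low-band + 2 high-band dimensions.
rng = np.random.default_rng(0)
d_low, d_high, M = 3, 2, 2
weights = np.array([0.5, 0.5])
means = rng.normal(size=(M, d_low + d_high))
covs = np.stack([np.eye(d_low + d_high) * 0.5 for _ in range(M)])
x_high_hat = estimate_highband(rng.normal(size=d_low), weights, means, covs, d_low)
print(x_high_hat.shape)   # (2,): estimated high-band SBM components
```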
