In this paper, we propose a statistical approach to improve the performance of spectral quantization in speech coders. The proposed techniques compensate for the distortion in a decoded line spectrum pair (LSP) vector based on a statistical mapping function between the decoded LSP vector and its corresponding original LSP vector. We first develop two codebook-based probabilistic matching (CBPM) methods by investigating the distribution of LSP vectors, and further propose an iterative procedure for the two CBPMs. The proposed techniques are then applied to the predictive vector quantizer (PVQ) used in the IS-641 speech coder. Experimental results show that the proposed techniques reduce the average spectral distortion by around 0.064 dB and lower the percentage of outliers compared with the PVQ without compensation, achieving transparent quality of spectral quantization. Finally, a comparison of speech quality using the perceptual evaluation of speech quality (PESQ) measure shows that the IS-641 speech coder employing the proposed techniques yields better decoded speech quality than the standard IS-641 speech coder.
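As a rough illustration of the codebook-based compensation idea, the Python sketch below estimates a per-cluster correction from paired decoded/original LSP training vectors and applies it to a decoded vector; the function names, the simple additive correction, and the nearest-codeword assignment are illustrative assumptions rather than the exact CBPM formulation described in the paper.

```python
import numpy as np

# Minimal sketch: cluster decoded LSP vectors with a codebook and, for each
# cluster, learn a statistical mapping toward the original LSP vectors
# (here approximated by the mean offset between original and decoded pairs).

def train_cbpm(decoded_lsps, original_lsps, codebook):
    """Estimate a per-cluster additive correction from training pairs."""
    corrections = np.zeros_like(codebook)
    counts = np.zeros(len(codebook))
    for dec, orig in zip(decoded_lsps, original_lsps):
        k = np.argmin(np.linalg.norm(codebook - dec, axis=1))  # nearest codeword
        corrections[k] += orig - dec
        counts[k] += 1
    counts[counts == 0] = 1  # leave empty clusters uncorrected
    return corrections / counts[:, None]

def compensate(decoded_lsp, codebook, corrections):
    """Map a decoded LSP vector toward its expected original."""
    k = np.argmin(np.linalg.norm(codebook - decoded_lsp, axis=1))
    compensated = decoded_lsp + corrections[k]
    return np.sort(compensated)  # preserve the LSP ordering property
```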
Naoto IWAHASHI Nobuyoshi KAIKI Yoshinori SAGISAKA
This paper proposes a new scheme for concatenative speech synthesis that improves the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segments and the desired spectrum of the target, without the use of heuristics. Four types of distortion, a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between concatenated segments, are formulated as acoustic quantities and used as measures for minimization. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization using dynamic programming minimizes the four types of distortion. A perceptual test shows that the proposed segment selection method with minimum-distortion criteria produces high-quality synthesized speech, and that the contextual spectral difference and the acoustic discontinuity at segment boundaries are important measures for improving quality.
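As a rough illustration of distortion-minimizing segment selection, the sketch below runs a Viterbi-style dynamic-programming search over candidate segments; the single-pass search and the placeholder `target_cost`/`concat_cost` functions (standing in for the four distortion measures) are simplifying assumptions, not the paper's three-step optimization.

```python
import numpy as np

# Minimal sketch: choose one candidate segment per target unit so that the
# accumulated target-distortion plus concatenation-distortion is minimized.

def select_segments(candidates, target_cost, concat_cost):
    """candidates[t] is a list of candidate segments for target unit t."""
    T = len(candidates)
    # best[t][j] = (accumulated cost, backpointer) for candidate j at step t
    best = [[(target_cost(c, 0), -1) for c in candidates[0]]]
    for t in range(1, T):
        row = []
        for cand in candidates[t]:
            costs = [best[t - 1][i][0]
                     + concat_cost(prev, cand)
                     + target_cost(cand, t)
                     for i, prev in enumerate(candidates[t - 1])]
            i_min = int(np.argmin(costs))
            row.append((costs[i_min], i_min))
        best.append(row)
    # Backtrack the minimum-distortion path
    j = int(np.argmin([c for c, _ in best[-1]]))
    path = [j]
    for t in range(T - 1, 0, -1):
        j = best[t][j][1]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]
```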