1-3hit |
M. Shahidur RAHMAN Tetsuya SHIMAMURA
A new system identification based method has been proposed for accurate estimation of vocal tract parameters. An often encountered problem in using the conventional linear prediction analysis is due to the harmonic structure of the excitation source of voiced speech. This harmonic characteristic is coupled with the estimation of autoregressive (AR) coefficients that results in difficulties in estimating the vocal tract filter. This paper models the effective voice source from the residual obtained through the covariance analysis in the first-pass which is then used as input to the second-pass least-square analysis. A better source-filter separation is thus achieved. The formant frequencies and corresponding bandwidths obtained using the proposed method for synthetic vowels are found to be accurate up to a factor of more than three (in percent) compared to the conventional method. Since the source characteristic is taken into account, local variations due to the positioning of analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by high-pitched female speaker.
M. Shahidur RAHMAN Tetsuya SHIMAMURA
This paper explores the potential of pitch determination from bone conducted (BC) speech. Pitch determination from normal air conducted (AC) speech signal can not attain the expected level of accuracy for every voice and background conditions. In contrast, since BC speech is caused by the vibrations that have traveled through the vocal tract wall, it is robust against ambient conditions. Though an appropriate model of BC speech is not known, it has regular harmonic structure in the lower spectral region. Due to this lowpass nature, pitch determination from BC speech is not usually affected by the dominant first formant. Experiments conducted on simultaneously recorded AC and BC speech show that BC speech is more reliable for pitch estimation than AC speech. With little human work, pitch contour estimated from BC speech can also be used as pitch reference that can serve as an alternate to the pitch contour extracted from laryngograph output which is sometimes inconsistent with simultaneously recorded AC speech.
M. Shahidur RAHMAN Tetsuya SHIMAMURA
A two-stage least square identification method is proposed for estimating ARMA (autoregressive moving average) coefficients from speech signals. A pulse-train like input sequence is often employed to account for the source effects in estimating vocal tract parameters of voiced speech. Due to glottal and radiation effects, the pulse train, however, does not represent the effective voice source. The authors have already proposed a simple but effective model of voice source for estimating AR (autoregressive) coefficients. This letter extends our approach to ARMA analysis to wider varieties of speech sounds including nasal vowels and consonants. Analysis results on both synthetic and natural nasal speech are presented to demonstrate the analysis ability of the method.