Takahiro MURAKAMI Hiroyuki YAMAGISHI Yoshihisa ISHIDA
The theoretical minimum length of a signal for fundamental frequency estimation in a noisy environment is discussed. Assuming that the noise is additive white Gaussian, the Cramér-Rao lower bound (CRLB) is known to be determined by the length and other parameters of the signal. In this paper, we define the minimum length as the shortest length for which the CRLB is less than or equal to the specific variance for all parameter values of the signal. The specific variance is the allowable variance of the estimate in a given application of fundamental frequency estimation. By reformulating the CRLB with respect to the initial phase of the signal, we propose algorithms for determining the minimum length. In addition, we develop methods for deciding the specific variance for general fundamental frequency estimation and for pitch estimation. Simulation results for both fundamental frequency estimation and pitch estimation show the validity of our approach.
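As a rough illustration of how a minimum length follows from a variance budget, the sketch below uses the standard large-sample CRLB for the frequency of a single real sinusoid in white Gaussian noise, var(f) >= 12 fs^2 / ((2*pi)^2 * eta * N * (N^2 - 1)) with eta = A^2 / (2 sigma^2). This approximation does not depend on the initial phase, so it is not the reformulated bound proposed in the paper; the function names and the brute-force search are assumptions made for illustration only.

```python
import math

def crlb_freq_hz2(n, snr_linear, fs):
    """Approximate CRLB (in Hz^2) on the frequency of one real sinusoid.

    Textbook large-sample bound for unknown amplitude, frequency and phase;
    it ignores the dependence on initial phase that the paper exploits.
    """
    var_rad = 12.0 / (snr_linear * n * (n * n - 1))  # rad^2 per sample^2
    return var_rad * (fs / (2.0 * math.pi)) ** 2      # convert to Hz^2

def min_length(target_var_hz2, snr_db, fs, n_max=1_000_000):
    """Smallest N whose approximate CRLB does not exceed target_var_hz2."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    for n in range(2, n_max + 1):
        if crlb_freq_hz2(n, snr_linear, fs) <= target_var_hz2:
            return n
    return None  # no length up to n_max satisfies the variance budget

if __name__ == "__main__":
    # Hypothetical setting: 1 Hz^2 allowable variance, 10 dB SNR, 8 kHz sampling.
    print(min_length(1.0, 10.0, 8000.0))
```

Because the bound decreases monotonically in N, the first length that satisfies the budget is the minimum under this approximation; a bisection search would work equally well for large N.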
Dhany ARIFIANTO Tomohiro TANAKA Takashi MASUKO Takao KOBAYASHI
Borrowing the notion of instantaneous frequency developed in the context of time-frequency signal analysis, an instantaneous frequency amplitude spectrum (IFAS) is introduced for estimating the fundamental frequency of speech signals in both noiseless and adverse environments. We define a harmonicity measure as a quantity that indicates the degree of periodic regularity in the IFAS and that differs substantially between periodic signals and noise-like waveforms. The harmonicity measure is applied to determine whether a fundamental frequency is present. We provide experimental examples to demonstrate the general applicability of the harmonicity measure and apply the proposed procedure to Japanese continuous speech signals. The results show that the proposed method outperforms conventional methods both with and without noise.
There have been numerous studies on the enhancement of noisy speech signals. In this paper, we propose a new speech enhancement method, namely a dissonant frequency filtering (DFF) scheme combined with a noise reduction (NR) algorithm. The simulation results indicate that the proposed method provides a significant gain in perceptual quality compared with the conventional method. Therefore, if the proposed enhancement scheme is used as a pre-filter, the output speech quality would be enhanced perceptually.
Yuichi ISHIMOTO Kentaro ISHIZUKA Kiyoaki AIKAWA Masato AKAGI
This paper proposes a robust method for estimating the fundamental frequency (F0) in real environments. It is assumed that the spectral structure of real environmental noise varies from moment to moment and that its energy is not distributed evenly in the time-frequency domain. Therefore, segmenting a spectrogram of speech mixed with environmental noise into narrow time-frequency regions will produce low-noise regions in which the signal-to-noise ratio is high. The proposed method estimates F0 from the periodic and harmonic features that are clearly observed in the low-noise regions. It first uses two kinds of spectrogram, one with high frequency resolution and another with high temporal resolution, to represent the periodic and harmonic features corresponding to F0. Next, the method segments these two kinds of feature plane into narrow time-frequency regions and calculates the probability function of F0 for each region. It then uses the entropy of the probability function as a weight to emphasize the probability functions of the low-noise regions and to enhance noise robustness. Finally, the probability functions are grouped at each time, and F0 is obtained as the frequency with the highest probability. The experimental results showed that, in comparison with other approaches such as the cepstrum method and the autocorrelation method, the developed method can more robustly estimate F0 from speech in the presence of band-limited noise and car noise.
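The entropy-weighting step can be sketched as follows. The abstract does not give the weighting formula, so the weight used here, one minus the normalized Shannon entropy, is an assumption chosen only because it gives sharply peaked (low-entropy) regions more influence than flat ones; the function names and the list-based data layout are likewise hypothetical.

```python
import math

def entropy_weight(p):
    """Weight in [0, 1]: 1 for a one-hot distribution, 0 for a uniform one.

    Assumed form (1 - H / log K); the paper's actual weighting may differ.
    """
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return 1.0 - h / math.log(len(p))

def entropy_weighted_f0(region_probs, f0_candidates):
    """Combine per-region F0 probability functions for one time frame.

    region_probs: list of lists, one probability function per region,
                  each defined over the same f0_candidates.
    Returns the candidate with the highest weighted sum, and that sum.
    """
    k = len(f0_candidates)
    combined = [0.0] * k
    for probs in region_probs:
        total = sum(probs)
        p = [x / total for x in probs]      # normalize defensively
        w = entropy_weight(p)
        for i in range(k):
            combined[i] += w * p[i]
    best = max(range(k), key=lambda i: combined[i])
    return f0_candidates[best], combined

if __name__ == "__main__":
    candidates = [100.0, 150.0, 200.0]
    regions = [
        [0.05, 0.90, 0.05],  # low-noise region: clear peak at 150 Hz
        [0.30, 0.30, 0.40],  # noisy region: nearly flat
        [0.30, 0.30, 0.40],  # noisy region: nearly flat
    ]
    f0, _ = entropy_weighted_f0(regions, candidates)
    print(f0)
```

In this toy input, the two nearly flat regions slightly favor 200 Hz, but their weights are close to zero, so the single low-noise region decides the outcome.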
Bumki JEON Sangki KANG Seong-Joon BAEK Koeng-Mo SUNG
There have been numerous studies on the enhancement of noisy speech signals. In this paper, we propose a completely new speech enhancement method, namely the filtering of dissonant frequencies based on an improved fundamental frequency estimation developed in the frequency domain. The subjective test results indicate that the proposed method provides a significant audible improvement, especially for speech contaminated by colored noise and for husky voices. Therefore, if the filter is employed as a pre-filter for speech enhancement, the output speech quality and intelligibility should be greatly enhanced.
Hee-Suk PANG SeongJoon BAEK Koeng-Mo SUNG
A simple but effective fundamental frequency estimation method using parametric cubic convolution is proposed. The method is shown to perform well not only for stationary signals but also for signals whose fundamental frequency changes with time. Simulations also compare it with other high-accuracy methods. Owing to its accuracy and simplicity, the proposed method is practically useful.
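The abstract does not say where the cubic convolution is applied, so the following is only one plausible sketch: the parametric cubic convolution kernel (Keys' kernel with free parameter `a`) interpolates the autocorrelation function around its integer-lag peak, and the fractional peak lag gives the period. The use of autocorrelation, the grid search, and all function names are assumptions for illustration, not the authors' procedure.

```python
import math

def keys_kernel(x, a=-0.5):
    """Parametric cubic convolution kernel; a = -0.5 is a common choice."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def interpolate(samples, t, a=-0.5):
    """Value at fractional index t from the four nearest samples."""
    base = math.floor(t)
    total = 0.0
    for k in range(base - 1, base + 3):
        if 0 <= k < len(samples):
            total += samples[k] * keys_kernel(t - k, a)
    return total

def autocorrelation(x, max_lag):
    """Biased autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_f0(x, fs, f_min, f_max, a=-0.5, steps=200):
    """Coarse integer-lag peak, then refinement on an interpolated grid."""
    lag_lo = int(fs / f_max)
    lag_hi = int(fs / f_min) + 1
    r = autocorrelation(x, lag_hi + 2)
    peak = max(range(lag_lo, lag_hi + 1), key=lambda k: r[k])
    best_t, best_v = float(peak), r[peak]
    for j in range(-steps, steps + 1):
        t = peak + j / steps            # search within one sample of the peak
        v = interpolate(r, t, a)
        if v > best_v:
            best_t, best_v = t, v
    return fs / best_t

if __name__ == "__main__":
    fs = 8000.0
    tone = [math.sin(2.0 * math.pi * 217.0 * n / fs + 0.3) for n in range(1024)]
    print(estimate_f0(tone, fs, 150.0, 400.0))
```

The search range should cover less than one octave below the expected F0 if possible, since a second autocorrelation peak at twice the period would otherwise compete with the true one.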
Tamotsu SHIRADO Masuzo YANAGIDA
An algorithm for extracting fundamental frequencies from duet sounds is proposed. The algorithm is based on the acoustical property that the temporal fluctuation patterns in frequency and power are similar among the harmonic components of a sound for a single musical note played on a single instrument with a single active vibrating source. The algorithm is applied to the sounds of 153 combinations of pair-notes played by a flute duet and a violin duet. Experimental results show that the zone-wise correct identification rates by pitch name are 98% for the flute duet and 95% for the violin duet in the best cases.