Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features

Yuichi ISHIMOTO; Kentaro ISHIZUKA; Kiyoaki AIKAWA; Masato AKAGI

Summary:

This paper proposes a robust method for estimating the fundamental frequency (F0) of speech in real environments. It assumes that the spectral structure of real environmental noise varies from moment to moment and that its energy is not evenly distributed in the time-frequency domain. Segmenting a spectrogram of speech mixed with environmental noise into narrow time-frequency regions will therefore produce some low-noise regions in which the signal-to-noise ratio is high. The proposed method estimates F0 from the periodic and harmonic features that are clearly observed in these low-noise regions. It first uses two kinds of spectrogram, one with high frequency resolution and another with high temporal resolution, to represent the periodic and harmonic features corresponding to F0. Next, it segments these two feature planes into narrow time-frequency regions and calculates a probability function of F0 for each region. It then uses the entropy of each probability function as a weight to emphasize the probability functions from low-noise regions and thereby improve noise robustness. Finally, the probability functions are grouped at each time, and F0 is obtained as the frequency at which the grouped function has the highest probability. Experimental results showed that the proposed method estimates F0 more robustly than other approaches, such as the cepstrum method and the autocorrelation method, for speech in the presence of band-limited noise and car noise.
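The entropy-weighting step described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not give the weighting formula, so the weight used here (one minus the normalized Shannon entropy) is an assumption, as are the function names, the fixed grid of F0 candidates, and the toy input scores. The idea it shows is only that a sharply peaked probability function (low entropy, likely from a low-noise region) should count for more than a nearly flat one when the per-region functions are combined.

```python
import math


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (a probability function)."""
    total = sum(scores)
    if total <= 0:
        # No evidence at all: fall back to a uniform distribution.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability function."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_weight(p):
    """Assumed weight: 1 - H(p) / H_max.

    A flat (uninformative) function gets a weight near 0,
    a sharply peaked one gets a weight near 1.
    """
    h_max = math.log(len(p))
    if h_max == 0:
        return 1.0
    return 1.0 - entropy(p) / h_max


def estimate_f0(candidates_hz, region_scores, weighted=True):
    """Combine per-region F0 probability functions for one time instant.

    candidates_hz : F0 candidates shared by all regions (hypothetical grid)
    region_scores : one list of scores per time-frequency region
    weighted      : if False, all regions contribute equally
    """
    combined = [0.0] * len(candidates_hz)
    for scores in region_scores:
        p = normalize(scores)
        w = entropy_weight(p) if weighted else 1.0
        for i, x in enumerate(p):
            combined[i] += w * x
    best = max(range(len(combined)), key=combined.__getitem__)
    return candidates_hz[best]


if __name__ == "__main__":
    candidates = [100.0, 125.0, 150.0, 200.0]
    clean = [0.02, 0.94, 0.02, 0.02]  # peaked: low entropy, high weight
    noisy = [0.10, 0.10, 0.30, 0.50]  # spread out: high entropy, low weight
    regions = [clean, noisy, noisy, noisy]
    print(estimate_f0(candidates, regions))                  # 125.0
    print(estimate_f0(candidates, regions, weighted=False))  # 200.0
```

In this toy example the three noisy regions outvote the single clean region when all regions count equally, while the entropy weight lets the clean region decide the result.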

Publication: IEICE TRANSACTIONS on Information Vol.E87-D No.1 pp.205-214

Publication Date: 2004/01/01

Type of Manuscript: PAPER

Category: Speech and Hearing

Yuichi ISHIMOTO, Kentaro ISHIZUKA, Kiyoaki AIKAWA, Masato AKAGI, "Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features" in IEICE TRANSACTIONS on Information, vol. E87-D, no. 1, pp. 205-214, January 2004.
URL: https://globals.ieice.org/en_transactions/information/10.1587/e87-d_1_205/_p

@ARTICLE{e87-d_1_205,
author={Yuichi ISHIMOTO and Kentaro ISHIZUKA and Kiyoaki AIKAWA and Masato AKAGI},
journal={IEICE TRANSACTIONS on Information},
title={Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features},
year={2004},
volume={E87-D},
number={1},
pages={205--214},
abstract={This paper proposes a robust method for estimating the fundamental frequency (F0) of speech in real environments. It assumes that the spectral structure of real environmental noise varies from moment to moment and that its energy is not evenly distributed in the time-frequency domain. Segmenting a spectrogram of speech mixed with environmental noise into narrow time-frequency regions will therefore produce some low-noise regions in which the signal-to-noise ratio is high. The proposed method estimates F0 from the periodic and harmonic features that are clearly observed in these low-noise regions. It first uses two kinds of spectrogram, one with high frequency resolution and another with high temporal resolution, to represent the periodic and harmonic features corresponding to F0. Next, it segments these two feature planes into narrow time-frequency regions and calculates a probability function of F0 for each region. It then uses the entropy of each probability function as a weight to emphasize the probability functions from low-noise regions and thereby improve noise robustness. Finally, the probability functions are grouped at each time, and F0 is obtained as the frequency at which the grouped function has the highest probability. Experimental results showed that the proposed method estimates F0 more robustly than other approaches, such as the cepstrum method and the autocorrelation method, for speech in the presence of band-limited noise and car noise.},
month={January}
}

TY - JOUR
TI - Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features
T2 - IEICE TRANSACTIONS on Information
SP - 205
EP - 214
AU - Yuichi ISHIMOTO
AU - Kentaro ISHIZUKA
AU - Kiyoaki AIKAWA
AU - Masato AKAGI
PY - 2004
JO - IEICE TRANSACTIONS on Information
VL - E87-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2004
AB - This paper proposes a robust method for estimating the fundamental frequency (F0) of speech in real environments. It assumes that the spectral structure of real environmental noise varies from moment to moment and that its energy is not evenly distributed in the time-frequency domain. Segmenting a spectrogram of speech mixed with environmental noise into narrow time-frequency regions will therefore produce some low-noise regions in which the signal-to-noise ratio is high. The proposed method estimates F0 from the periodic and harmonic features that are clearly observed in these low-noise regions. It first uses two kinds of spectrogram, one with high frequency resolution and another with high temporal resolution, to represent the periodic and harmonic features corresponding to F0. Next, it segments these two feature planes into narrow time-frequency regions and calculates a probability function of F0 for each region. It then uses the entropy of each probability function as a weight to emphasize the probability functions from low-noise regions and thereby improve noise robustness. Finally, the probability functions are grouped at each time, and F0 is obtained as the frequency at which the grouped function has the highest probability. Experimental results showed that the proposed method estimates F0 more robustly than other approaches, such as the cepstrum method and the autocorrelation method, for speech in the presence of band-limited noise and car noise.
ER -