This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve performance in non-stationary environments. Owing to the efficient time-frequency localization and multi-resolution characteristics of wavelet representations, wavelet transforms are well suited to processing non-stationary signals such as speech. To exploit the fact that the wavelet packet is a very efficient approximation of the discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, then use spectral subtraction and employ matched filtering to enhance the SNR. Since conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. Furthermore, noting that a fixed threshold cannot follow fluctuations of time-varying noise power, and that this inability to adapt to a time-varying environment severely limits VAD performance, we propose a statistical model-based VAD algorithm in the wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare against conventional algorithms to demonstrate the performance improvement of the proposed algorithm under various noise environments.
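The pipeline outlined in the abstract (wavelet packet decomposition, per-band spectral subtraction, and a threshold that follows the noise power) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Haar wavelet packet, plain power subtraction with a zero floor, and a simple energy comparison in place of the paper's statistical model and matched filtering. All function names and parameters (`levels`, `init_frames`, `alpha`, `over_sub`, `margin`) are hypothetical choices made for illustration.

```python
import math
import random


def haar_wavelet_packet(frame, levels):
    """Full Haar wavelet packet decomposition of one frame.

    Returns 2**levels subbands in natural (not frequency) order.
    The frame length is assumed to be divisible by 2**levels.
    """
    bands = [list(frame)]
    root2 = math.sqrt(2.0)
    for _ in range(levels):
        next_bands = []
        for band in bands:
            pairs = range(0, len(band) - 1, 2)
            approx = [(band[i] + band[i + 1]) / root2 for i in pairs]
            detail = [(band[i] - band[i + 1]) / root2 for i in pairs]
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


def band_energies(frame, levels=3):
    """Energy of each wavelet packet subband of the frame."""
    return [sum(c * c for c in band)
            for band in haar_wavelet_packet(frame, levels)]


def detect_voice(frames, levels=3, init_frames=5, alpha=0.9,
                 over_sub=1.0, margin=3.0):
    """Return one boolean per frame (True means speech detected).

    Assumption: the first `init_frames` frames contain noise only.
    """
    noise = [0.0] * (2 ** levels)
    for frame in frames[:init_frames]:
        energies = band_energies(frame, levels)
        noise = [n + e / init_frames for n, e in zip(noise, energies)]

    decisions = []
    for frame in frames:
        energies = band_energies(frame, levels)
        # Per-band power subtraction, floored at zero.
        cleaned = [max(e - over_sub * n, 0.0)
                   for e, n in zip(energies, noise)]
        # The threshold is tied to the tracked noise power, so it
        # moves whenever the noise estimate moves.
        threshold = margin * sum(noise)
        is_speech = sum(cleaned) > threshold
        decisions.append(is_speech)
        if not is_speech:
            # Recursive noise update on frames judged speech-free.
            noise = [alpha * n + (1.0 - alpha) * e
                     for n, e in zip(noise, energies)]
    return decisions


if __name__ == "__main__":
    random.seed(1)
    quiet = [[random.gauss(0.0, 0.1) for _ in range(64)] for _ in range(8)]
    loud = [[math.sin(2 * math.pi * 0.05 * i) + random.gauss(0.0, 0.1)
             for i in range(64)] for _ in range(4)]
    print(detect_voice(quiet + loud))
```

Because the Haar packet transform used here is orthonormal, the subband energies sum to the frame energy, which keeps the threshold comparable across decomposition depths. A frequency-ordered decomposition or a different wavelet would require a dedicated wavelet library.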
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yoon-Chang LEE, Sang-Sik AHN, "Statistical Model-Based VAD Algorithm with Wavelet Transform" in IEICE TRANSACTIONS on Fundamentals,
vol. E89-A, no. 6, pp. 1594-1600, June 2006, doi: 10.1093/ietfec/e89-a.6.1594.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e89-a.6.1594/_p
@ARTICLE{e89-a_6_1594,
author={Yoon-Chang LEE and Sang-Sik AHN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Statistical Model-Based VAD Algorithm with Wavelet Transform},
year={2006},
volume={E89-A},
number={6},
pages={1594--1600},
abstract={This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve performance in non-stationary environments. Owing to the efficient time-frequency localization and multi-resolution characteristics of wavelet representations, wavelet transforms are well suited to processing non-stationary signals such as speech. To exploit the fact that the wavelet packet is a very efficient approximation of the discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, then use spectral subtraction and employ matched filtering to enhance the SNR. Since conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. Furthermore, noting that a fixed threshold cannot follow fluctuations of time-varying noise power, and that this inability to adapt to a time-varying environment severely limits VAD performance, we propose a statistical model-based VAD algorithm in the wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare against conventional algorithms to demonstrate the performance improvement of the proposed algorithm under various noise environments.},
keywords={},
doi={10.1093/ietfec/e89-a.6.1594},
ISSN={1745-1337},
month={June},}
TY - JOUR
TI - Statistical Model-Based VAD Algorithm with Wavelet Transform
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1594
EP - 1600
AU - Yoon-Chang LEE
AU - Sang-Sik AHN
PY - 2006
DO - 10.1093/ietfec/e89-a.6.1594
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E89-A
IS - 6
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - June 2006
AB - This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve performance in non-stationary environments. Owing to the efficient time-frequency localization and multi-resolution characteristics of wavelet representations, wavelet transforms are well suited to processing non-stationary signals such as speech. To exploit the fact that the wavelet packet is a very efficient approximation of the discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, then use spectral subtraction and employ matched filtering to enhance the SNR. Since conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. Furthermore, noting that a fixed threshold cannot follow fluctuations of time-varying noise power, and that this inability to adapt to a time-varying environment severely limits VAD performance, we propose a statistical model-based VAD algorithm in the wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare against conventional algorithms to demonstrate the performance improvement of the proposed algorithm under various noise environments.
ER -