Statistical Model-Based VAD Algorithm with Wavelet Transform

Yoon-Chang LEE; Sang-Sik AHN

doi:10.1093/ietfec/e89-a.6.1594

Statistical Model-Based VAD Algorithm with Wavelet Transform

Yoon-Chang LEE, Sang-Sik AHN

Full Text Views

0

Share
Cite this

Summary :

This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve the performance in non-stationary environments. Due to the efficient time-frequency localization and the multi-resolution characteristics of the wavelet representations, the wavelet transforms are quite suitable for processing non-stationary signals such as speech. To utilize the fact that the wavelet packet is very efficient approximation of discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, use spectral subtraction, and employ matched filtering to enhance the SNR. Since the conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. On the other hand, noticing that fixed threshold cannot follow fluctuations of time varying noise power and the inability to adapt to a time-varying environment severely limits the VAD performance, we propose a statistical model-based VAD algorithm in wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare with the conventional algorithms to demonstrate performance improvement of the proposed algorithm under various noise environments.

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E89-A No.6 pp.1594-1600

Publication Date: 2006/06/01

Publicized

Online ISSN: 1745-1337

DOI: 10.1093/ietfec/e89-a.6.1594

Type of Manuscript: Special Section PAPER (Special Section on Papers Selected from 2005 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2005))

Category

Cite this

Copy

Yoon-Chang LEE, Sang-Sik AHN, "Statistical Model-Based VAD Algorithm with Wavelet Transform" in IEICE TRANSACTIONS on Fundamentals, vol. E89-A, no. 6, pp. 1594-1600, June 2006, doi: 10.1093/ietfec/e89-a.6.1594.
Abstract: This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve the performance in non-stationary environments. Due to the efficient time-frequency localization and the multi-resolution characteristics of the wavelet representations, the wavelet transforms are quite suitable for processing non-stationary signals such as speech. To utilize the fact that the wavelet packet is very efficient approximation of discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, use spectral subtraction, and employ matched filtering to enhance the SNR. Since the conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. On the other hand, noticing that fixed threshold cannot follow fluctuations of time varying noise power and the inability to adapt to a time-varying environment severely limits the VAD performance, we propose a statistical model-based VAD algorithm in wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare with the conventional algorithms to demonstrate performance improvement of the proposed algorithm under various noise environments.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e89-a.6.1594/_p

Copy

@ARTICLE{e89-a_6_1594,
author={Yoon-Chang LEE, Sang-Sik AHN, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Statistical Model-Based VAD Algorithm with Wavelet Transform},
year={2006},
volume={E89-A},
number={6},
pages={1594-1600},
abstract={This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve the performance in non-stationary environments. Due to the efficient time-frequency localization and the multi-resolution characteristics of the wavelet representations, the wavelet transforms are quite suitable for processing non-stationary signals such as speech. To utilize the fact that the wavelet packet is very efficient approximation of discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, use spectral subtraction, and employ matched filtering to enhance the SNR. Since the conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. On the other hand, noticing that fixed threshold cannot follow fluctuations of time varying noise power and the inability to adapt to a time-varying environment severely limits the VAD performance, we propose a statistical model-based VAD algorithm in wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare with the conventional algorithms to demonstrate performance improvement of the proposed algorithm under various noise environments.},
keywords={},
doi={10.1093/ietfec/e89-a.6.1594},
ISSN={1745-1337},
month={June},}

Copy

TY - JOUR
TI - Statistical Model-Based VAD Algorithm with Wavelet Transform
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1594
EP - 1600
AU - Yoon-Chang LEE
AU - Sang-Sik AHN
PY - 2006
DO - 10.1093/ietfec/e89-a.6.1594
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E89-A
IS - 6
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - June 2006
AB - This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain to improve the performance in non-stationary environments. Due to the efficient time-frequency localization and the multi-resolution characteristics of the wavelet representations, the wavelet transforms are quite suitable for processing non-stationary signals such as speech. To utilize the fact that the wavelet packet is very efficient approximation of discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to effectively localize the energy in frequency space, use spectral subtraction, and employ matched filtering to enhance the SNR. Since the conventional wavelet-based spectral subtraction eliminates the low-power speech signal in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. On the other hand, noticing that fixed threshold cannot follow fluctuations of time varying noise power and the inability to adapt to a time-varying environment severely limits the VAD performance, we propose a statistical model-based VAD algorithm in wavelet domain with an adaptive threshold. We perform extensive computer simulations and compare with the conventional algorithms to demonstrate performance improvement of the proposed algorithm under various noise environments.
ER -