In this letter, we explore the joint optimization of a perceptual gain function and deep neural networks (DNNs) for single-channel speech enhancement. We propose a DNN architecture that incorporates the masking properties of the human auditory system so that residual noise becomes inaudible. The architecture directly trains a perceptual gain function, which is then used to estimate the magnitude spectrum of clean speech from noisy speech features. Experiments on TIMIT sentences corrupted by various types of noise show that the proposed approach achieves significant improvements over the baselines, regardless of whether the noise conditions were included in the training set.
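As a rough illustration of the final step described in the abstract, the sketch below applies a per-bin gain to a noisy magnitude spectrum to obtain an estimate of the clean magnitude. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives neither the network nor the perceptual gain formula, so the gain values here are made up, the function name `apply_gain` is hypothetical, and clipping the gain to [0, 1] is an assumption.

```python
def apply_gain(noisy_mag, gain):
    """Scale each time-frequency bin of a noisy magnitude spectrum by a gain.

    noisy_mag and gain are lists of frames, each frame a list of per-bin values.
    Clipping the gain to [0, 1] is an assumption made for this sketch.
    """
    return [
        [min(max(g, 0.0), 1.0) * y for g, y in zip(gain_frame, mag_frame)]
        for gain_frame, mag_frame in zip(gain, noisy_mag)
    ]


# Toy data: 2 frames x 2 frequency bins (values are illustrative only).
noisy_mag = [[1.0, 2.0], [0.5, 4.0]]
# In the described system a DNN would produce the gain; here it is hard-coded.
gain = [[0.5, 1.2], [-0.1, 0.25]]

clean_est = apply_gain(noisy_mag, gain)
```

In a full system the estimated magnitude would typically be combined with the noisy phase and inverted back to a waveform; that step is omitted here because the abstract does not describe it.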
Wei HAN
PLA University of Science and Technology
Xiongwei ZHANG
PLA University of Science and Technology
Gang MIN
PLA University of Science and Technology
Xingyu ZHOU
PLA University of Science and Technology
Meng SUN
PLA University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Wei HAN, Xiongwei ZHANG, Gang MIN, Xingyu ZHOU, Meng SUN, "Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement" in IEICE TRANSACTIONS on Fundamentals,
vol. E100-A, no. 2, pp. 714-717, February 2017, doi: 10.1587/transfun.E100.A.714.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E100.A.714/_p
@ARTICLE{e100-a_2_714,
author={Wei HAN and Xiongwei ZHANG and Gang MIN and Xingyu ZHOU and Meng SUN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement},
year={2017},
volume={E100-A},
number={2},
pages={714-717},
keywords={},
doi={10.1587/transfun.E100.A.714},
ISSN={1745-1337},
month={February},}
TY - JOUR
TI - Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 714
EP - 717
AU - Wei HAN
AU - Xiongwei ZHANG
AU - Gang MIN
AU - Xingyu ZHOU
AU - Meng SUN
PY - 2017
DO - 10.1587/transfun.E100.A.714
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E100-A
IS - 2
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - February 2017
ER -