In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) that incorporates the masking properties of the human auditory system. In the supervised training stage, we first use the clean label speech of two different speakers to calculate two perceptual weighting matrices. These weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals, so that the two signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested under different mixing conditions.
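As a hedged illustration of the training objective described above, the following is a minimal sketch of a perceptually weighted mean squared error for two sources. The function name and the treatment of the weighting matrices as precomputed inputs are assumptions for illustration only; the abstract does not specify how the matrices are constructed from the clean label speech.

import numpy as np

def perceptual_weighted_mse(y1, y2, s1, s2, w1, w2):
    # y1, y2: network output features for the two speakers, shape (T, F).
    # s1, s2: reference features of the clean speech, shape (T, F).
    # w1, w2: perceptual weighting matrices computed from the clean label
    #         speech, shape (T, F); treated here as given inputs, since the
    #         abstract does not detail their construction.
    # Each per-bin squared error is scaled by its perceptual weight before
    # averaging, so errors in strongly masked regions count for less.
    e1 = w1 * (y1 - s1) ** 2
    e2 = w2 * (y2 - s2) ** 2
    return e1.mean() + e2.mean()

# Toy usage with random features (T = 5 frames, F = 4 frequency bins).
rng = np.random.default_rng(0)
T, F = 5, 4
loss = perceptual_weighted_mse(
    rng.random((T, F)), rng.random((T, F)),  # network outputs
    rng.random((T, F)), rng.random((T, F)),  # clean references
    rng.random((T, F)), rng.random((T, F)),  # perceptual weights
)
print(f"weighted MSE: {loss:.4f}")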
Wei HAN
PLA University of Science and Technology
Xiongwei ZHANG
PLA University of Science and Technology
Meng SUN
PLA University of Science and Technology
Li LI
PLA University of Science and Technology
Wenhua SHI
PLA University of Science and Technology
Wei HAN, Xiongwei ZHANG, Meng SUN, Li LI, Wenhua SHI, "An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks" in IEICE TRANSACTIONS on Fundamentals,
vol. E100-A, no. 2, pp. 718-721, February 2017, doi: 10.1587/transfun.E100.A.718.
Abstract: In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) that incorporates the masking properties of the human auditory system. In the supervised training stage, we first use the clean label speech of two different speakers to calculate two perceptual weighting matrices. These weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals, so that the two signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested under different mixing conditions.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E100.A.718/_p
@ARTICLE{e100-a_2_718,
author={Wei HAN and Xiongwei ZHANG and Meng SUN and Li LI and Wenhua SHI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks},
year={2017},
volume={E100-A},
number={2},
pages={718-721},
abstract={In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) that incorporates the masking properties of the human auditory system. In the supervised training stage, we first use the clean label speech of two different speakers to calculate two perceptual weighting matrices. These weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals, so that the two signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested under different mixing conditions.},
keywords={},
doi={10.1587/transfun.E100.A.718},
ISSN={1745-1337},
month={February},
}
TY - JOUR
TI - An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 718
EP - 721
AU - Wei HAN
AU - Xiongwei ZHANG
AU - Meng SUN
AU - Li LI
AU - Wenhua SHI
PY - 2017
DO - 10.1587/transfun.E100.A.718
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E100-A
IS - 2
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - February 2017
AB - In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) that incorporates the masking properties of the human auditory system. In the supervised training stage, we first use the clean label speech of two different speakers to calculate two perceptual weighting matrices. These weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals, so that the two signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested under different mixing conditions.
ER -