Recently, notable improvements in voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN) which learns the mapping between the noisy speech features and the corresponding voice activity status with its deep hidden structure has been one of the most popular techniques. In this letter, we propose a novel approach which enhances the robustness of DNN in mismatched noise conditions with multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is jointly trained with the conventional VAD task. The experimental results show that the DNN with the proposed framework outperforms the conventional DNN-based VAD algorithm.
Tae Gyoon KANG
Seoul National University
Nam Soo KIM
Seoul National University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Tae Gyoon KANG, Nam Soo KIM, "DNN-Based Voice Activity Detection with Multi-Task Learning" in IEICE TRANSACTIONS on Information,
vol. E99-D, no. 2, pp. 550-553, February 2016, doi: 10.1587/transinf.2015EDL8168.
Abstract: Recently, notable improvements in voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN) which learns the mapping between the noisy speech features and the corresponding voice activity status with its deep hidden structure has been one of the most popular techniques. In this letter, we propose a novel approach which enhances the robustness of DNN in mismatched noise conditions with multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is jointly trained with the conventional VAD task. The experimental results show that the DNN with the proposed framework outperforms the conventional DNN-based VAD algorithm.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015EDL8168/_p
Copy
@ARTICLE{e99-d_2_550,
author={Tae Gyoon KANG, Nam Soo KIM, },
journal={IEICE TRANSACTIONS on Information},
title={DNN-Based Voice Activity Detection with Multi-Task Learning},
year={2016},
volume={E99-D},
number={2},
pages={550-553},
abstract={Recently, notable improvements in voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN) which learns the mapping between the noisy speech features and the corresponding voice activity status with its deep hidden structure has been one of the most popular techniques. In this letter, we propose a novel approach which enhances the robustness of DNN in mismatched noise conditions with multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is jointly trained with the conventional VAD task. The experimental results show that the DNN with the proposed framework outperforms the conventional DNN-based VAD algorithm.},
keywords={},
doi={10.1587/transinf.2015EDL8168},
ISSN={1745-1361},
month={February},}
Copy
TY - JOUR
TI - DNN-Based Voice Activity Detection with Multi-Task Learning
T2 - IEICE TRANSACTIONS on Information
SP - 550
EP - 553
AU - Tae Gyoon KANG
AU - Nam Soo KIM
PY - 2016
DO - 10.1587/transinf.2015EDL8168
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2016
AB - Recently, notable improvements in voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN) which learns the mapping between the noisy speech features and the corresponding voice activity status with its deep hidden structure has been one of the most popular techniques. In this letter, we propose a novel approach which enhances the robustness of DNN in mismatched noise conditions with multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is jointly trained with the conventional VAD task. The experimental results show that the DNN with the proposed framework outperforms the conventional DNN-based VAD algorithm.
ER -