In this letter, we propose a novel supervised pre-training technique for deep neural network (DNN)-hidden Markov model (HMM) systems to achieve robust speech recognition in adverse environments. The aim of the proposed approach is to initialize the DNN parameters so that they yield abstract features that are robust to variations in the acoustic environment. To achieve this, we first derive the abstract features from an early fine-tuned DNN model trained on a clean speech database. Using these abstract features as target values, we then apply standard error back-propagation with stochastic gradient descent to estimate the initial DNN parameters. The performance of the proposed algorithm was evaluated on the Aurora-4 database, where it outperformed a number of conventional pre-training methods.
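The two-step procedure in the abstract (derive clean-speech hidden activations, then regress a new network onto them to obtain an initialization) can be sketched for a single hidden layer. This is a minimal illustrative sketch, not the paper's implementation: the toy data, all dimensions, the additive-noise model, and the randomly initialized "teacher" weights are assumptions made for brevity, and full-batch gradient descent stands in for minibatch SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins for the data: "clean" feature vectors and noisy versions
# of them. Dimensions and the additive-noise model are illustrative
# assumptions, not the Aurora-4 setup.
n, d_in, d_hid = 256, 20, 32
clean = rng.standard_normal((n, d_in))
noisy = clean + 0.3 * rng.standard_normal((n, d_in))

# Step 1: hidden layer of a hypothetical "early fine-tuned" clean-speech
# DNN. In the paper these weights come from supervised training on clean
# data; here they are random for brevity. Its activations on clean input
# play the role of the abstract features used as regression targets.
W_t = 0.1 * rng.standard_normal((d_in, d_hid))
b_t = np.zeros(d_hid)
targets = sigmoid(clean @ W_t + b_t)

# Step 2: pre-train a student layer on the noisy input so its hidden
# activations match the clean-derived targets, via back-propagation of
# the MSE loss with (full-batch) gradient descent.
W_s = 0.1 * rng.standard_normal((d_in, d_hid))
b_s = np.zeros(d_hid)
lr = 0.5

mse_init = float(np.mean((sigmoid(noisy @ W_s + b_s) - targets) ** 2))
for _ in range(200):
    h = sigmoid(noisy @ W_s + b_s)
    delta = (h - targets) * h * (1.0 - h)  # error signal through the sigmoid
    W_s -= lr * (noisy.T @ delta) / n
    b_s -= lr * delta.mean(axis=0)
mse_final = float(np.mean((sigmoid(noisy @ W_s + b_s) - targets) ** 2))

# W_s, b_s would then serve as the initialization for the subsequent
# supervised DNN-HMM fine-tuning stage.
```

After this pre-training step, the regression loss on the clean-derived targets should have dropped (`mse_final < mse_init`), meaning the layer already maps noisy input toward the clean network's representation before fine-tuning begins.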
Shin Jae KANG
Seoul National University
Kang Hyun LEE
Seoul National University
Nam Soo KIM
Seoul National University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shin Jae KANG, Kang Hyun LEE, Nam Soo KIM, "Supervised Denoising Pre-Training for Robust ASR with DNN-HMM," IEICE Transactions on Information and Systems, vol. E98-D, no. 12, pp. 2345-2348, December 2015, doi: 10.1587/transinf.2015EDL8118.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015EDL8118/_p
@ARTICLE{e98-d_12_2345,
author={Shin Jae KANG and Kang Hyun LEE and Nam Soo KIM},
journal={IEICE Transactions on Information and Systems},
title={Supervised Denoising Pre-Training for Robust ASR with DNN-HMM},
year={2015},
volume={E98-D},
number={12},
pages={2345-2348},
doi={10.1587/transinf.2015EDL8118},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Supervised Denoising Pre-Training for Robust ASR with DNN-HMM
T2 - IEICE Transactions on Information and Systems
SP - 2345
EP - 2348
AU - Shin Jae KANG
AU - Kang Hyun LEE
AU - Nam Soo KIM
PY - 2015
DO - 10.1587/transinf.2015EDL8118
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E98-D
IS - 12
JA - IEICE Transactions on Information and Systems
Y1 - December 2015
ER -