This paper proposes a deep learning-based method for non-intrusive objective speech intelligibility estimation using a recurrent neural network (RNN) with a long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods, such as the ITU-T P.563 standard, show poor estimation performance and lack consistency, especially across various noise and reverberation environments. The proposed method trains the LSTM RNN model parameters using the short-time objective intelligibility (STOI) measure, a standard intrusive intelligibility estimation method that requires a clean reference speech signal. The inputs and outputs of the LSTM RNN are mel-frequency cepstral coefficient (MFCC) vectors and frame-wise STOI values, respectively. Experimental results show that the proposed objective intelligibility estimation method outperforms the conventional P.563 standard in various noisy and reverberant environments.
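The abstract specifies only the model's input/output mapping: a sequence of MFCC vectors in, one STOI-like value per frame out. The following is a minimal, hypothetical sketch of that mapping in plain Python. It assumes a single LSTM layer, a sigmoid read-out per frame, and mean pooling of frame-wise scores into one utterance score; none of these choices, nor the names `LSTMFrameScorer`, `frame_scores`, or `utterance_score`, come from the paper. The weights are random, and training against frame-wise STOI targets is not shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

GATES = "ifgo"  # input, forget, candidate (cell input), output


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMFrameScorer:
    """Hypothetical single-layer LSTM with a per-frame sigmoid read-out."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Input weights W, recurrent weights U and bias b for each gate.
        self.w = {k: random_matrix(n_hidden, n_in, rng) for k in GATES}
        self.u = {k: random_matrix(n_hidden, n_hidden, rng) for k in GATES}
        self.b = {k: [0.0] * n_hidden for k in GATES}
        self.b["f"] = [1.0] * n_hidden  # common forget-gate bias initialisation
        # Read-out weights mapping the hidden state to one scalar per frame.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _preact(self, k: str, x: Vector, h: Vector) -> Vector:
        """Computes W_k x + U_k h + b_k for gate k."""
        wx = matvec(self.w[k], x)
        uh = matvec(self.u[k], h)
        return [a + r + bias for a, r, bias in zip(wx, uh, self.b[k])]

    def frame_scores(self, frames: List[Vector]) -> List[float]:
        """One score in (0, 1) per input frame (stand-in for frame-wise STOI)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        scores = []
        for x in frames:
            # All four gates read the previous hidden state h.
            i = [sigmoid(v) for v in self._preact("i", x, h)]
            f = [sigmoid(v) for v in self._preact("f", x, h)]
            g = [math.tanh(v) for v in self._preact("g", x, h)]
            o = [sigmoid(v) for v in self._preact("o", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            y = sum(wj * hj for wj, hj in zip(self.w_out, h)) + self.b_out
            scores.append(sigmoid(y))
        return scores

    def utterance_score(self, frames: List[Vector]) -> float:
        """Mean of the frame-wise scores (assumed pooling rule)."""
        if not frames:
            raise ValueError("need at least one frame")
        scores = self.frame_scores(frames)
        return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(1)
    # 50 frames of 13 synthetic "MFCC" values; real MFCCs would come from audio.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
    model = LSTMFrameScorer(n_in=13, n_hidden=16)
    per_frame = model.frame_scores(frames)
    print(len(per_frame), round(model.utterance_score(frames), 3))
```

In this sketch, the intrusive STOI computation would enter only at training time. For example, the weights could be fitted by minimizing the squared error between `frame_scores` and frame-wise STOI targets computed with the clean reference. At test time only the degraded signal's features are needed, which is what makes the estimator non-intrusive.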
Deokgyu YUN
Seoul National University of Science and Technology
Hannah LEE
Seoul National University of Science and Technology
Seung Ho CHOI
Seoul National University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Deokgyu YUN, Hannah LEE, Seung Ho CHOI, "A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation," in IEICE TRANSACTIONS on Information, vol. E101-D, no. 4, pp. 1207-1208, April 2018, doi: 10.1587/transinf.2017EDL8225.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2017EDL8225/_p
@ARTICLE{e101-d_4_1207,
author={YUN, Deokgyu and LEE, Hannah and CHOI, Seung Ho},
journal={IEICE TRANSACTIONS on Information},
title={A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation},
year={2018},
volume={E101-D},
number={4},
pages={1207-1208},
abstract={This paper proposes a deep learning-based method for non-intrusive objective speech intelligibility estimation using a recurrent neural network (RNN) with a long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods, such as the ITU-T P.563 standard, show poor estimation performance and lack consistency, especially across various noise and reverberation environments. The proposed method trains the LSTM RNN model parameters using the short-time objective intelligibility (STOI) measure, a standard intrusive intelligibility estimation method that requires a clean reference speech signal. The inputs and outputs of the LSTM RNN are mel-frequency cepstral coefficient (MFCC) vectors and frame-wise STOI values, respectively. Experimental results show that the proposed objective intelligibility estimation method outperforms the conventional P.563 standard in various noisy and reverberant environments.},
keywords={},
doi={10.1587/transinf.2017EDL8225},
ISSN={1745-1361},
month={April},}
TY  - JOUR
TI  - A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation
T2  - IEICE TRANSACTIONS on Information
SP  - 1207
EP  - 1208
AU  - YUN, Deokgyu
AU  - LEE, Hannah
AU  - CHOI, Seung Ho
PY  - 2018
DO  - 10.1587/transinf.2017EDL8225
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E101-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information
Y1  - 2018/04//
AB  - This paper proposes a deep learning-based method for non-intrusive objective speech intelligibility estimation using a recurrent neural network (RNN) with a long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods, such as the ITU-T P.563 standard, show poor estimation performance and lack consistency, especially across various noise and reverberation environments. The proposed method trains the LSTM RNN model parameters using the short-time objective intelligibility (STOI) measure, a standard intrusive intelligibility estimation method that requires a clean reference speech signal. The inputs and outputs of the LSTM RNN are mel-frequency cepstral coefficient (MFCC) vectors and frame-wise STOI values, respectively. Experimental results show that the proposed objective intelligibility estimation method outperforms the conventional P.563 standard in various noisy and reverberant environments.
ER  - 