We propose an active learning framework for speech recognition that reduces the amount of data required for acoustic modeling. This framework consists of two steps. We first obtain a phone-error distribution using an acoustic model estimated from transcribed speech data. Then, from a text corpus we select a sentence whose phone-occurrence distribution is close to the phone-error distribution and collect its speech data. We repeat this process to increase the amount of transcribed speech data. We applied this framework to speaker adaptation and acoustic model training. Our evaluation results showed that it significantly reduced the amount of transcribed data while maintaining the same level of accuracy.
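The sentence-selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how closeness between the two distributions is measured, so KL divergence with a small smoothing constant is assumed here, and the names `normalize`, `kl_divergence`, and `select_sentence`, as well as the toy phone data, are hypothetical.

```python
import math
from collections import Counter


def normalize(counts, phones, eps=1e-6):
    """Turn raw counts into a smoothed probability distribution over `phones`."""
    total = sum(counts.get(p, 0) for p in phones) + eps * len(phones)
    return {p: (counts.get(p, 0) + eps) / total for p in phones}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same phone set."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def select_sentence(error_counts, candidates):
    """Return the index of the candidate sentence whose phone-occurrence
    distribution is closest to the phone-error distribution.

    error_counts: dict mapping phone -> number of recognition errors
    candidates:   list of sentences, each given as a list of phone symbols
    Closeness is measured with KL divergence (an assumption of this sketch).
    """
    phones = sorted(set(error_counts) | {ph for s in candidates for ph in s})
    target = normalize(error_counts, phones)
    best_idx, best_div = None, float("inf")
    for i, sentence in enumerate(candidates):
        occurrence = normalize(Counter(sentence), phones)
        div = kl_divergence(target, occurrence)
        if div < best_div:
            best_idx, best_div = i, div
    return best_idx


# Toy data (invented for illustration): the current model mostly misrecognizes "k".
error_counts = {"a": 1, "k": 8, "s": 1}
candidates = [
    ["a", "a", "s", "a"],
    ["k", "a", "k", "k"],
    ["s", "s", "a", "s"],
]
chosen = select_sentence(error_counts, candidates)
```

In the iterative procedure the abstract describes, the speech for the chosen sentence would then be collected and transcribed, the acoustic model re-estimated, and the phone-error distribution recomputed before the next selection.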
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroko MURAKAMI, Koichi SHINODA, Sadaoki FURUI, "Active Learning Using Phone-Error Distribution for Speech Modeling" in IEICE TRANSACTIONS on Information and Systems, vol. E95-D, no. 10, pp. 2486-2494, October 2012, doi: 10.1587/transinf.E95.D.2486.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E95.D.2486/_p
@ARTICLE{e95-d_10_2486,
author={Hiroko MURAKAMI and Koichi SHINODA and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Active Learning Using Phone-Error Distribution for Speech Modeling},
year={2012},
volume={E95-D},
number={10},
pages={2486--2494},
abstract={We propose an active learning framework for speech recognition that reduces the amount of data required for acoustic modeling. This framework consists of two steps. We first obtain a phone-error distribution using an acoustic model estimated from transcribed speech data. Then, from a text corpus we select a sentence whose phone-occurrence distribution is close to the phone-error distribution and collect its speech data. We repeat this process to increase the amount of transcribed speech data. We applied this framework to speaker adaptation and acoustic model training. Our evaluation results showed that it significantly reduced the amount of transcribed data while maintaining the same level of accuracy.},
keywords={},
doi={10.1587/transinf.E95.D.2486},
ISSN={1745-1361},
month={October}
}
TY  - JOUR
TI  - Active Learning Using Phone-Error Distribution for Speech Modeling
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 2486
EP  - 2494
AU  - Hiroko MURAKAMI
AU  - Koichi SHINODA
AU  - Sadaoki FURUI
PY  - 2012
DO  - 10.1587/transinf.E95.D.2486
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E95-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - October 2012
AB  - We propose an active learning framework for speech recognition that reduces the amount of data required for acoustic modeling. This framework consists of two steps. We first obtain a phone-error distribution using an acoustic model estimated from transcribed speech data. Then, from a text corpus we select a sentence whose phone-occurrence distribution is close to the phone-error distribution and collect its speech data. We repeat this process to increase the amount of transcribed speech data. We applied this framework to speaker adaptation and acoustic model training. Our evaluation results showed that it significantly reduced the amount of transcribed data while maintaining the same level of accuracy.
ER  - 