The paper addresses a scheme of lightly supervised training of an acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one among them or reject both. It is demonstrated that the classifiers can effectively filter the usable data for acoustic model training. The scheme realizes automatic training of the acoustic model with an increased amount of data. A significant improvement in the ASR accuracy is achieved from the baseline system and also in comparison with the conventional method of lightly supervised training based on simple matching.
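The alignment step, and the conventional simple-matching baseline the abstract compares against, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not say how the alignment is computed, so Python's standard `difflib.SequenceMatcher` stands in for it, and the function names `align` and `select_by_simple_matching` and the example sentences are hypothetical. The dedicated classifiers are not sketched, since the abstract gives no details of their features or training.

```python
from difflib import SequenceMatcher


def align(caption, hypothesis):
    """Align two word lists and yield (tag, caption_words, hypothesis_words).

    Assumption: SequenceMatcher is a stand-in for whatever alignment
    the paper actually uses.
    """
    matcher = SequenceMatcher(a=caption, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        yield tag, caption[i1:i2], hypothesis[j1:j2]


def select_by_simple_matching(caption, hypothesis):
    """Baseline selection: keep only words on which both sequences agree."""
    kept = []
    for tag, cap_words, _ in align(caption, hypothesis):
        if tag == "equal":
            kept.extend(cap_words)
    return kept


# Hypothetical example: the caption drops a filler and differs in one word.
caption = "we train the acoustic model".split()
hypothesis = "we uh train an acoustic model".split()
selected = select_by_simple_matching(caption, hypothesis)
print(selected)
```

Simple matching discards every region where the two sequences disagree. The proposed scheme instead decides, per aligned unit, whether to keep the caption, keep the ASR hypothesis, or reject both, which is how it recovers more usable training data.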
Sheng LI
Kyoto University
Yuya AKITA
Kyoto University
Tatsuya KAWAHARA
Kyoto University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Sheng LI, Yuya AKITA, Tatsuya KAWAHARA, "Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 8, pp. 1545-1552, August 2015, doi: 10.1587/transinf.2015EDP7047.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7047/_p
@ARTICLE{e98-d_8_1545,
author={Sheng LI and Yuya AKITA and Tatsuya KAWAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training},
year={2015},
volume={E98-D},
number={8},
pages={1545-1552},
abstract={The paper addresses a scheme of lightly supervised training of an acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one among them or reject both. It is demonstrated that the classifiers can effectively filter the usable data for acoustic model training. The scheme realizes automatic training of the acoustic model with an increased amount of data. A significant improvement in the ASR accuracy is achieved from the baseline system and also in comparison with the conventional method of lightly supervised training based on simple matching.},
keywords={},
doi={10.1587/transinf.2015EDP7047},
ISSN={1745-1361},
month={August},
}
TY - JOUR
TI - Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training
T2 - IEICE TRANSACTIONS on Information
SP - 1545
EP - 1552
AU - Sheng LI
AU - Yuya AKITA
AU - Tatsuya KAWAHARA
PY - 2015
DO - 10.1587/transinf.2015EDP7047
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2015
AB - The paper addresses a scheme of lightly supervised training of an acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one among them or reject both. It is demonstrated that the classifiers can effectively filter the usable data for acoustic model training. The scheme realizes automatic training of the acoustic model with an increased amount of data. A significant improvement in the ASR accuracy is achieved from the baseline system and also in comparison with the conventional method of lightly supervised training based on simple matching.
ER -