Morphemes, obtained from morphological parsing, and statistical sub-words, derived from data-driven splitting, are commonly used as recognition units for speech recognition of agglutinative languages. In this letter, we propose a discriminative approach that selects, for each distinct word type, the splitting result that is more likely to improve the recognizer's performance. We define and minimize an objective function that involves the unigram language model (LM) probability and the count of misrecognized phones on the acoustic training data. After determining the splitting result for each word in the text corpus, we select the frequent units to build a hybrid vocabulary that includes both morphemes and statistical sub-words. Compared to a statistical sub-word based system, the hybrid system achieves a 0.8% reduction in letter error rate (LER) on the test set.
Xin LI
Chinese Academy of Sciences
Jielin PAN
Chinese Academy of Sciences
Qingwei ZHAO
Chinese Academy of Sciences
Yonghong YAN
Chinese Academy of Sciences
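The abstract does not give the exact form of the objective function, so the following is only a minimal sketch of the selection idea under stated assumptions. It assumes the cost of a candidate split is the negative unigram log-probability of its units plus a weighted count of misrecognized phones, and that the lower-cost candidate is kept for each word type. The function names, the `weight` parameter, the `min_count` threshold, and all toy numbers are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def split_cost(units, unigram_prob, phone_errors, weight=1.0):
    """Assumed cost: negative unigram log-probability of the units
    plus a weighted count of misrecognized phones."""
    neg_log_prob = -sum(math.log(unigram_prob[u]) for u in units)
    return neg_log_prob + weight * phone_errors


def choose_splits(candidates, unigram_prob, weight=1.0):
    """For each word type, keep the candidate split with the lowest cost.

    candidates maps word -> list of (units, phone_errors) pairs.
    """
    chosen = {}
    for word, options in candidates.items():
        best_units, _ = min(
            options,
            key=lambda opt: split_cost(opt[0], unigram_prob, opt[1], weight),
        )
        chosen[word] = best_units
    return chosen


def build_vocabulary(chosen, word_counts, min_count):
    """Count units over the corpus and keep those seen at least min_count times."""
    unit_counts = Counter()
    for word, units in chosen.items():
        for unit in units:
            unit_counts[unit] += word_counts[word]
    return {unit for unit, count in unit_counts.items() if count >= min_count}


# Toy data (hypothetical): each word has a morpheme split and a statistical split,
# paired with an invented count of misrecognized phones.
unigram_prob = {"ev": 0.02, "ler": 0.05, "de": 0.04, "evler": 0.01, "evde": 0.001}
candidates = {
    "evlerde": [(("ev", "ler", "de"), 2), (("evler", "de"), 3)],
    "evde": [(("ev", "de"), 0), (("evde",), 2)],
}
word_counts = {"evlerde": 5, "evde": 2}

chosen = choose_splits(candidates, unigram_prob, weight=1.0)
vocabulary = build_vocabulary(chosen, word_counts, min_count=2)
```

In this toy setup the chosen splits mix both sources: one word keeps its statistical split and the other keeps its morpheme split, so the resulting vocabulary is hybrid. Raising `weight` makes the phone-error term dominate and can flip the choice for a given word.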
Xin LI, Jielin PAN, Qingwei ZHAO, Yonghong YAN, "Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages" in IEICE TRANSACTIONS on Information,
vol. E96-D, no. 11, pp. 2478-2482, November 2013, doi: 10.1587/transinf.E96.D.2478.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.2478/_p
@ARTICLE{e96-d_11_2478,
author={Xin LI and Jielin PAN and Qingwei ZHAO and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages},
year={2013},
volume={E96-D},
number={11},
pages={2478-2482},
keywords={},
doi={10.1587/transinf.E96.D.2478},
ISSN={1745-1361},
month={November},}
TY - JOUR
TI - Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages
T2 - IEICE TRANSACTIONS on Information
SP - 2478
EP - 2482
AU - Xin LI
AU - Jielin PAN
AU - Qingwei ZHAO
AU - Yonghong YAN
PY - 2013
DO - 10.1587/transinf.E96.D.2478
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2013
ER -