The semantic concepts in an utterance are obtained by fuzzy matching in order to handle problems in spoken language understanding (SLU) such as word variations introduced by automatic speech recognition (ASR) and key information fields omitted by users. A two-stage method is proposed. First, conditional random fields (CRFs) are used to build probabilistic models that segment and label entity names in an input sentence. Second, fuzzy matching based on a similarity function is performed between the named entities labeled by the CRF model and the reference characters of a dictionary. The experiments compare performance in terms of accuracy and processing speed. Among the four similarity measures, Dice similarity and cosine similarity based on term-frequency (TF) scores achieve the best accuracy, with F1-measures of 93% or higher. In particular, cosine similarity improves on q-gram and improved edit distance, two conventional methods for fuzzy string matching, by 8.8% and 9%, respectively.
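The second stage described above can be illustrated with a minimal sketch of the two best-performing measures. This is not the authors' implementation: the abstract does not specify the matching unit, the threshold, or the dictionary, so the character bigrams, the `threshold` value, the helper names (`char_ngrams`, `best_match`) and the sample dictionary below are all assumptions made for illustration. The entity string stands in for the output of the CRF labeling stage.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (assumed matching unit)."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of character n-grams of a and b."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def cosine_tf_similarity(a, b, n=2):
    """Cosine similarity between term-frequency vectors of character n-grams."""
    tf_a, tf_b = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(count * tf_b[gram] for gram, count in tf_a.items())
    norm_a = sqrt(sum(c * c for c in tf_a.values()))
    norm_b = sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(entity, dictionary, similarity, threshold=0.5):
    """Return (entry, score) for the most similar dictionary entry, or None.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    scored = [(entry, similarity(entity, entry)) for entry in dictionary]
    entry, score = max(scored, key=lambda pair: pair[1], default=(None, 0.0))
    return (entry, score) if entry is not None and score >= threshold else None


# Hypothetical reference dictionary and an entity with a dropped character,
# standing in for an ASR error or a user omission.
REFERENCE_DICTIONARY = ["北京大学", "北京师范大学", "清华大学"]
labeled_entity = "北京师大学"

print(best_match(labeled_entity, REFERENCE_DICTIONARY, dice_similarity))
print(best_match(labeled_entity, REFERENCE_DICTIONARY, cosine_tf_similarity))
```

Both measures score shared n-grams rather than aligned positions, so an entity with one missing character can still be mapped to its dictionary entry; an entity that shares no n-grams with any entry falls below the threshold and is left unmatched.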
Yanling LI
Chinese Academy of Sciences, Inner Mongolia Normal University
Qingwei ZHAO
Chinese Academy of Sciences
Yonghong YAN
Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yanling LI, Qingwei ZHAO, Yonghong YAN, "Fuzzy Matching of Semantic Class in Chinese Spoken Language Understanding," in IEICE TRANSACTIONS on Information, vol. E96-D, no. 8, pp. 1845-1852, August 2013, doi: 10.1587/transinf.E96.D.1845.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.1845/_p
@ARTICLE{e96-d_8_1845,
author={Yanling LI and Qingwei ZHAO and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={Fuzzy Matching of Semantic Class in Chinese Spoken Language Understanding},
year={2013},
volume={E96-D},
number={8},
pages={1845-1852},
abstract={The semantic concepts in an utterance are obtained by fuzzy matching in order to handle problems in spoken language understanding (SLU) such as word variations introduced by automatic speech recognition (ASR) and key information fields omitted by users. A two-stage method is proposed. First, conditional random fields (CRFs) are used to build probabilistic models that segment and label entity names in an input sentence. Second, fuzzy matching based on a similarity function is performed between the named entities labeled by the CRF model and the reference characters of a dictionary. The experiments compare performance in terms of accuracy and processing speed. Among the four similarity measures, Dice similarity and cosine similarity based on term-frequency (TF) scores achieve the best accuracy, with F1-measures of 93% or higher. In particular, cosine similarity improves on q-gram and improved edit distance, two conventional methods for fuzzy string matching, by 8.8% and 9%, respectively.},
keywords={},
doi={10.1587/transinf.E96.D.1845},
ISSN={1745-1361},
month={August},}
TY  - JOUR
TI  - Fuzzy Matching of Semantic Class in Chinese Spoken Language Understanding
T2  - IEICE TRANSACTIONS on Information
SP  - 1845
EP  - 1852
AU  - Yanling LI
AU  - Qingwei ZHAO
AU  - Yonghong YAN
PY  - 2013
DO  - 10.1587/transinf.E96.D.1845
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E96-D
IS  - 8
JA  - IEICE TRANSACTIONS on Information
Y1  - August 2013
AB  - The semantic concepts in an utterance are obtained by fuzzy matching in order to handle problems in spoken language understanding (SLU) such as word variations introduced by automatic speech recognition (ASR) and key information fields omitted by users. A two-stage method is proposed. First, conditional random fields (CRFs) are used to build probabilistic models that segment and label entity names in an input sentence. Second, fuzzy matching based on a similarity function is performed between the named entities labeled by the CRF model and the reference characters of a dictionary. The experiments compare performance in terms of accuracy and processing speed. Among the four similarity measures, Dice similarity and cosine similarity based on term-frequency (TF) scores achieve the best accuracy, with F1-measures of 93% or higher. In particular, cosine similarity improves on q-gram and improved edit distance, two conventional methods for fuzzy string matching, by 8.8% and 9%, respectively.
ER  - 