Error Correction Using Long Context Match for Smartphone Speech Recognition

Yuan LIANG; Koji IWANO; Koichi SHINODA

doi:10.1587/transinf.2015EDP7179

Summary:

Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to obtain candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also use Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). The proposed “LCM + WCN” method improved 1-best accuracy by 23% and Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
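
The context-matching idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy English n-gram table, and the restriction to a single-word error region are all assumptions made for illustration, and the way LCM candidates are merged with WCN candidates is not described here, so it is left out. The second function computes MRR in its usual form, the mean over test items of 1/rank of the correct word (0 when the word is absent from the list).

```python
from collections import Counter


def context_match_candidates(ngram_counts, left, right, top_k=5):
    """Rank single-word fillers for an error region (hypothetical helper).

    A filler scores the summed count of every n-gram of the form
    left + (filler,) + right found in ngram_counts.
    """
    left, right = tuple(left), tuple(right)
    scores = Counter()
    for ngram, count in ngram_counts.items():
        if len(ngram) != len(left) + 1 + len(right):
            continue  # wrong order n-gram for this context window
        if ngram[:len(left)] == left and ngram[len(left) + 1:] == right:
            scores[ngram[len(left)]] += count
    return [word for word, _ in scores.most_common(top_k)]


def mean_reciprocal_rank(candidate_lists, references):
    """Mean of 1/rank of each reference word; 0 if it is not listed."""
    total = 0.0
    for candidates, reference in zip(candidate_lists, references):
        if reference in candidates:
            total += 1.0 / (candidates.index(reference) + 1)
    return total / len(references)


# Toy 5-gram counts (made-up numbers, English for readability).
toy_counts = {
    ("i", "want", "to", "recognize", "speech"): 12,
    ("i", "want", "to", "wreck", "speech"): 1,
    ("i", "want", "to", "recognize", "text"): 7,
}

# Three validated preceding words and one validated succeeding word.
candidates = context_match_candidates(
    toy_counts, ["i", "want", "to"], ["speech"]
)
```

In this toy table, the third entry is ignored because its succeeding word does not match the validated right context, which is the point of using both sides of the error region.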

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.11 pp.1932-1942

Publication Date: 2015/11/01

Publicized: 2015/07/31

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7179

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Yuan LIANG
  Tokyo Institute of Technology
Koji IWANO
  Tokyo City University
Koichi SHINODA
  Tokyo Institute of Technology

Keywords

speech recognition, error correction, multimodal interface, word confusion network, context match

Cite this

Yuan LIANG, Koji IWANO, Koichi SHINODA, "Error Correction Using Long Context Match for Smartphone Speech Recognition" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 11, pp. 1932-1942, November 2015, doi: 10.1587/transinf.2015EDP7179.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7179/_p

@ARTICLE{e98-d_11_1932,
author={Yuan LIANG and Koji IWANO and Koichi SHINODA},
journal={IEICE TRANSACTIONS on Information},
title={Error Correction Using Long Context Match for Smartphone Speech Recognition},
year={2015},
volume={E98-D},
number={11},
pages={1932--1942},
abstract={Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to obtain candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also use Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). The proposed “LCM + WCN” method improved 1-best accuracy by 23% and Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.},
keywords={speech recognition, error correction, multimodal interface, word confusion network, context match},
doi={10.1587/transinf.2015EDP7179},
ISSN={1745-1361},
month={November}
}

TY  - JOUR
TI  - Error Correction Using Long Context Match for Smartphone Speech Recognition
T2  - IEICE TRANSACTIONS on Information
SP  - 1932
EP  - 1942
AU  - Yuan LIANG
AU  - Koji IWANO
AU  - Koichi SHINODA
PY  - 2015
DO  - 10.1587/transinf.2015EDP7179
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E98-D
IS  - 11
JA  - IEICE TRANSACTIONS on Information
Y1  - November 2015
AB  - Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to obtain candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also use Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). The proposed “LCM + WCN” method improved 1-best accuracy by 23% and Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
ER  - 