This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using information theoretic measure showed that expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Ryosuke ISOTANI, Shoichi MATSUNAGA, Shigeki SAGAYAMA, "Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams" in IEICE TRANSACTIONS on Information,
vol. E78-D, no. 6, pp. 692-697, June 1995, doi: .
Abstract: This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using information theoretic measure showed that expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.
URL: https://globals.ieice.org/en_transactions/information/10.1587/e78-d_6_692/_p
Copy
@ARTICLE{e78-d_6_692,
author={Ryosuke ISOTANI, Shoichi MATSUNAGA, Shigeki SAGAYAMA, },
journal={IEICE TRANSACTIONS on Information},
title={Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams},
year={1995},
volume={E78-D},
number={6},
pages={692-697},
abstract={This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using information theoretic measure showed that expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.},
keywords={},
doi={},
ISSN={},
month={June},}
Copy
TY - JOUR
TI - Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams
T2 - IEICE TRANSACTIONS on Information
SP - 692
EP - 697
AU - Ryosuke ISOTANI
AU - Shoichi MATSUNAGA
AU - Shigeki SAGAYAMA
PY - 1995
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E78-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 1995
AB - This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using information theoretic measure showed that expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.
ER -