A Novel Discriminative Method for Pronunciation Quality Assessment

Junbo ZHANG; Fuping PAN; Bin DONG; Qingwei ZHAO; Yonghong YAN

doi:10.1587/transinf.E96.D.1145

Summary:

In this paper, we present a novel method for automatic pronunciation quality assessment. Unlike the popular “Goodness of Pronunciation” (GOP) method, this method does not map decoding confidence to a pronunciation quality score; instead, it directly discriminates between utterances of different pronunciation quality. The student's utterance is decoded twice. The first pass obtains the time boundaries of each phone in the utterance by forced alignment with a conventionally trained acoustic model (AM). The second pass determines the pronunciation quality of each triphone using a specially trained AM, in which triphones of different pronunciation quality are trained as different units, and the model is trained discriminatively so that it discriminates as well as possible among triphones that share the same name but have different pronunciation quality scores. The decoding network in the second pass includes triphones of different pronunciation quality, so phone-level scores can be obtained directly from the decoding result. The phone-level scores are combined into sentence-level scores using the maximum entropy criterion. The experimental results show that scoring performance improved significantly compared with the GOP method, especially at the sentence level.
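The last two steps of the summary (reading phone-level scores off the second-pass decoding result and combining them into a sentence-level score) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `#q` suffix used to tag a triphone with its quality level, the histogram of quality levels used as the sentence feature vector, and the hand-picked weights. The softmax over linear scores is the general form of a maximum entropy classifier; in practice its weights would be estimated from human-rated data.

```python
import math
from collections import Counter

def phone_scores(decoded_units):
    """Read the quality level off each decoded unit label.

    Assumes a hypothetical label format 'left-phone+right#q', where q is
    the pronunciation quality level of that triphone.
    """
    return [int(label.rsplit("#", 1)[1]) for label in decoded_units]

def sentence_features(scores, levels):
    """Fraction of phones decoded at each quality level (assumed feature set)."""
    counts = Counter(scores)
    return [counts[q] / len(scores) for q in levels]

def maxent_posterior(features, weights, biases):
    """Softmax over linear scores, the form of a maximum entropy classifier."""
    logits = [b + sum(w * f for w, f in zip(ws, features))
              for ws, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical second-pass output for a three-phone utterance.
decoded = ["sil-k+ae#2", "k-ae+t#2", "ae-t+sil#0"]
levels = [0, 1, 2]

scores = phone_scores(decoded)
features = sentence_features(scores, levels)

# Illustrative weights only: one row per sentence-level score class.
weights = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
biases = [0.0, 0.0, 0.0]

posterior = maxent_posterior(features, weights, biases)
sentence_score = levels[posterior.index(max(posterior))]
```

The sketch takes the most probable class as the sentence-level score; the first-pass forced alignment and the discriminative training of the acoustic model are outside its scope.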

Publication: IEICE TRANSACTIONS on Information Vol.E96-D No.5 pp.1145-1151

Publication Date: 2013/05/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E96.D.1145

Type of Manuscript: PAPER

Category: Speech and Hearing

Junbo ZHANG, Fuping PAN, Bin DONG, Qingwei ZHAO, Yonghong YAN, "A Novel Discriminative Method for Pronunciation Quality Assessment" in IEICE TRANSACTIONS on Information, vol. E96-D, no. 5, pp. 1145-1151, May 2013, doi: 10.1587/transinf.E96.D.1145.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.1145/_p

@ARTICLE{e96-d_5_1145,
author={Junbo ZHANG and Fuping PAN and Bin DONG and Qingwei ZHAO and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={A Novel Discriminative Method for Pronunciation Quality Assessment},
year={2013},
volume={E96-D},
number={5},
pages={1145--1151},
abstract={In this paper, we present a novel method for automatic pronunciation quality assessment. Unlike the popular “Goodness of Pronunciation” (GOP) method, this method does not map decoding confidence to a pronunciation quality score; instead, it directly discriminates between utterances of different pronunciation quality. The student's utterance is decoded twice. The first pass obtains the time boundaries of each phone in the utterance by forced alignment with a conventionally trained acoustic model (AM). The second pass determines the pronunciation quality of each triphone using a specially trained AM, in which triphones of different pronunciation quality are trained as different units, and the model is trained discriminatively so that it discriminates as well as possible among triphones that share the same name but have different pronunciation quality scores. The decoding network in the second pass includes triphones of different pronunciation quality, so phone-level scores can be obtained directly from the decoding result. The phone-level scores are combined into sentence-level scores using the maximum entropy criterion. The experimental results show that scoring performance improved significantly compared with the GOP method, especially at the sentence level.},
keywords={},
doi={10.1587/transinf.E96.D.1145},
ISSN={1745-1361},
month={May},}

TY  - JOUR
TI  - A Novel Discriminative Method for Pronunciation Quality Assessment
T2  - IEICE TRANSACTIONS on Information
SP  - 1145
EP  - 1151
AU  - Junbo ZHANG
AU  - Fuping PAN
AU  - Bin DONG
AU  - Qingwei ZHAO
AU  - Yonghong YAN
PY  - 2013
DO  - 10.1587/transinf.E96.D.1145
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E96-D
IS  - 5
JA  - IEICE TRANSACTIONS on Information
Y1  - May 2013
AB  - In this paper, we present a novel method for automatic pronunciation quality assessment. Unlike the popular “Goodness of Pronunciation” (GOP) method, this method does not map decoding confidence to a pronunciation quality score; instead, it directly discriminates between utterances of different pronunciation quality. The student's utterance is decoded twice. The first pass obtains the time boundaries of each phone in the utterance by forced alignment with a conventionally trained acoustic model (AM). The second pass determines the pronunciation quality of each triphone using a specially trained AM, in which triphones of different pronunciation quality are trained as different units, and the model is trained discriminatively so that it discriminates as well as possible among triphones that share the same name but have different pronunciation quality scores. The decoding network in the second pass includes triphones of different pronunciation quality, so phone-level scores can be obtained directly from the decoding result. The phone-level scores are combined into sentence-level scores using the maximum entropy criterion. The experimental results show that scoring performance improved significantly compared with the GOP method, especially at the sentence level.
ER  - 