This paper introduces AURORA-2J, an evaluation framework for Japanese noisy speech recognition. Speech recognition systems still need to become more robust to noisy environments, but this improvement requires the development of a standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had a significant impact on noisy speech recognition research. AURORA-2J is a Japanese connected digits corpus, and its evaluation scripts are designed in the same way as those of Aurora 2, with the help of the European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, the baseline scripts, and the baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker and reveals how much recognition performance varies between individuals. Finally, we propose a categorization of the modifications applied to the original HTK baseline system, which helps in comparing systems and in identifying the technologies that improve performance most within each category.
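The per-speaker analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the usual HTK-style word accuracy, (N - D - S - I) / N x 100, where N is the number of reference words and D, S, I are deletions, substitutions and insertions. The function names, the input tuple layout and the spread statistics are hypothetical choices for illustration, not the authors' scripts.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_accuracy(n_ref, deletions, substitutions, insertions):
    """Word accuracy in percent, assuming the HTK-style definition."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


def per_speaker_accuracy(utterances):
    """Pool error counts per speaker, then compute one accuracy per speaker.

    utterances: iterable of (speaker_id, n_ref, D, S, I) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for speaker, n_ref, dels, subs, ins in utterances:
        counts = totals[speaker]
        counts[0] += n_ref
        counts[1] += dels
        counts[2] += subs
        counts[3] += ins
    return {spk: word_accuracy(*counts) for spk, counts in totals.items()}


def speaker_spread(acc_by_speaker):
    """Summarize how much accuracy differs between speakers."""
    values = list(acc_by_speaker.values())
    return {
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Made-up counts, purely to show the call pattern.
    demo = [
        ("spk_a", 10, 0, 1, 0),
        ("spk_a", 10, 1, 0, 0),
        ("spk_b", 20, 2, 2, 1),
    ]
    acc = per_speaker_accuracy(demo)
    print(acc)
    print(speaker_spread(acc))
```

Pooling counts per speaker before dividing keeps long and short utterances weighted by their word counts; averaging per-utterance accuracies instead would be a different, equally defensible choice.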
Satoshi NAKAMURA
Kazuya TAKEDA
Kazumasa YAMAMOTO
Takeshi YAMADA
Shingo KUROIWA
Norihide KITAOKA
Takanobu NISHIURA
Akira SASOU
Mitsunori MIZUMACHI
Chiyomi MIYAJIMA
Masakiyo FUJIMOTO
Toshiki ENDO
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Satoshi NAKAMURA, Kazuya TAKEDA, Kazumasa YAMAMOTO, Takeshi YAMADA, Shingo KUROIWA, Norihide KITAOKA, Takanobu NISHIURA, Akira SASOU, Mitsunori MIZUMACHI, Chiyomi MIYAJIMA, Masakiyo FUJIMOTO, Toshiki ENDO, "AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E88-D, no. 3, pp. 535-544, March 2005, doi: 10.1093/ietisy/e88-d.3.535.
URL: https://globals.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.3.535/_p
@ARTICLE{e88-d_3_535,
author={Satoshi NAKAMURA and Kazuya TAKEDA and Kazumasa YAMAMOTO and Takeshi YAMADA and Shingo KUROIWA and Norihide KITAOKA and Takanobu NISHIURA and Akira SASOU and Mitsunori MIZUMACHI and Chiyomi MIYAJIMA and Masakiyo FUJIMOTO and Toshiki ENDO},
journal={IEICE TRANSACTIONS on Information},
title={AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition},
year={2005},
volume={E88-D},
number={3},
pages={535--544},
abstract={This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.},
keywords={},
doi={10.1093/ietisy/e88-d.3.535},
ISSN={},
month={March},
}
TY - JOUR
TI - AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 535
EP - 544
AU - Satoshi NAKAMURA
AU - Kazuya TAKEDA
AU - Kazumasa YAMAMOTO
AU - Takeshi YAMADA
AU - Shingo KUROIWA
AU - Norihide KITAOKA
AU - Takanobu NISHIURA
AU - Akira SASOU
AU - Mitsunori MIZUMACHI
AU - Chiyomi MIYAJIMA
AU - Masakiyo FUJIMOTO
AU - Toshiki ENDO
PY - 2005
DO - 10.1093/ietisy/e88-d.3.535
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2005
AB - This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.
ER -