Satoshi NAKAMURA Kazuya TAKEDA Kazumasa YAMAMOTO Takeshi YAMADA Shingo KUROIWA Norihide KITAOKA Takanobu NISHIURA Akira SASOU Mitsunori MIZUMACHI Chiyomi MIYAJIMA Masakiyo FUJIMOTO Toshiki ENDO
This paper introduces AURORA-2J, an evaluation framework for Japanese noisy speech recognition. Speech recognition systems still need to become more robust to noisy environments, but this improvement requires a standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had a significant impact on noisy speech recognition research. AURORA-2J is a Japanese connected-digits corpus, and its evaluation scripts are designed in the same way as those of Aurora 2, with the help of the European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, the baseline scripts, and the baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker and reveals the degree of individual difference in recognition performance. We also propose a categorization of the modifications applied to the original HTK baseline system, which helps in comparing systems and in identifying the technologies that improve performance most within the same category.
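As a rough illustration of a per-speaker analysis, the sketch below computes word accuracy separately for each speaker and then summarizes how much it varies. It uses the usual HTK-style definition of word accuracy, (N - S - D - I) / N. The speaker labels, the error counts, and the choice of standard deviation as the spread measure are assumptions made for this example; the abstract does not specify them.

```python
from statistics import mean, pstdev


def word_accuracy(n_words, subs, dels, ins):
    """HTK-style word accuracy in percent: (N - S - D - I) / N * 100."""
    return 100.0 * (n_words - subs - dels - ins) / n_words


def per_speaker_accuracy(counts):
    """Map each speaker to a word accuracy.

    counts: dict of speaker -> (n_words, substitutions, deletions, insertions)
    """
    return {spk: word_accuracy(*c) for spk, c in counts.items()}


# Hypothetical error counts, invented purely for illustration.
counts = {
    "spk_a": (100, 2, 1, 1),
    "spk_b": (100, 10, 5, 5),
    "spk_c": (200, 10, 6, 4),
}

acc = per_speaker_accuracy(counts)
values = list(acc.values())

# Assumed summary statistics: the mean over speakers and the spread around it.
avg = mean(values)
spread = pstdev(values)

for spk, a in sorted(acc.items()):
    print(f"{spk}: {a:.1f}%")
print(f"mean over speakers: {avg:.2f}%, std dev: {spread:.2f}")
```

A pooled accuracy over all utterances would hide the gap between `spk_a` and `spk_b`; reporting the per-speaker values makes that individual difference visible.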
Toshiharu HORIUCHI Mitsunori MIZUMACHI Satoshi NAKAMURA
This paper proposes a simple method for estimating and compensating signal direction, to deal with relative changes in sound source location caused by movements of a microphone array and a sound source. The method introduces a delay filter built from shifted and sampled sinc functions. This paper presents a concept for the joint optimization of arrival time differences and the coordinate system of a mobile microphone array. We derive the method with the LMS algorithm, maintaining a certain relationship between the directions of the microphone array and the sound source directions. The method directly estimates the directions of the microphone array relative to the sound source directions by minimizing the relative differences in arrival time among the observed signals, rather than by estimating the time difference of arrival (TDOA) between two observed signals. The method also compensates the time delays of the observed signals simultaneously, keeping the output signals in phase. Simulation results support the effectiveness of the method.
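To make the delay filter idea concrete, the following minimal sketch builds a filter from a shifted, sampled sinc function and applies it to a signal. It illustrates only the general fractional-delay technique, not the paper's joint LMS optimization. The function names, the truncation length `half_len`, and the absence of a window function are assumptions made for this example.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_delay_filter(delay, half_len=16):
    """Taps h[n] = sinc(n - delay) for n = -half_len .. half_len.

    The sinc is shifted by `delay` samples (which may be fractional) and
    sampled at integer positions. Truncating to a finite length is an
    approximation of the ideal delay filter.
    """
    return [sinc(n - delay) for n in range(-half_len, half_len + 1)]


def apply_delay(x, delay, half_len=16):
    """Delay signal x by `delay` samples: y[k] = sum_n h[n] * x[k - n]."""
    h = sinc_delay_filter(delay, half_len)
    y = []
    for k in range(len(x)):
        total = 0.0
        for j, n in enumerate(range(-half_len, half_len + 1)):
            if 0 <= k - n < len(x):  # samples outside the signal count as zero
                total += h[j] * x[k - n]
        y.append(total)
    return y


# An impulse at sample 20, delayed by 3 samples, should peak at sample 23.
x = [0.0] * 64
x[20] = 1.0
y = apply_delay(x, 3.0)
peak = max(range(len(y)), key=lambda k: y[k])

# For a half-sample delay, the two taps nearest the shift are equal.
h_half = sinc_delay_filter(0.5)
```

In an adaptive setting, `delay` would be the quantity updated at each step so that the filtered channels stay in phase; that update rule is outside the scope of this sketch.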