1-2hit |
Osamu YOSHIOKA Yasuhiro MINAMI Kiyohiro SHIKANO
This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. The collecting of dialogue data between computers and users is also carried out. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.
Shoko YAMAHATA Yoshikazu YAMAGUCHI Atsunori OGAWA Hirokazu MASATAKI Osamu YOSHIOKA Satoshi TAKAHASHI
Recognition errors caused by out-of-vocabulary (OOV) words lead critical problems when developing spoken language understanding systems based on automatic speech recognition technology. And automatic vocabulary adaptation is an essential technique to solve these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Using this combined score that reflects both semantic and acoustic aspects, only necessary OOV words can be selected without registering redundant words. In addition, our method estimates probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities will be appropriate since they are estimated by considering both frequencies of OOV words in target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and recognition accuracy of newly registered words in comparison with conventional methods.