Keyword Search Result

[Keyword] multimodal interface (5 hits)

Results 1-5 of 5
  • Error Correction Using Long Context Match for Smartphone Speech Recognition

    Yuan LIANG  Koji IWANO  Koichi SHINODA  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/07/31
      Vol:
    E98-D No:11
      Page(s):
    1932-1942

    Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface that makes this process more efficient. We develop Long Context Match (LCM) to obtain candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region have been validated by the user, we use these contexts to search higher-order n-gram corpora for matching word sequences; for this purpose, we also utilize Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) that provides users with candidate lists more relevant than those yielded by WCN alone. We compare our interface with a WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). The proposed “LCM + WCN” method improved the 1-best accuracy by 23% and the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
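
    As an illustration of the idea, the sketch below (in Python, which the paper does not specify) builds a context index, retrieves LCM candidates, merges them with WCN scores, and computes MRR. It is a minimal sketch: it uses only one validated word on each side of the error region, whereas the paper searches higher-order n-grams over Web text, and every function name, the data layout, and the interpolation weight are our own assumptions.

      from collections import defaultdict

      def build_context_index(corpus_sentences):
          """Index each middle word by its (left, right) neighbor words."""
          index = defaultdict(lambda: defaultdict(int))
          for sentence in corpus_sentences:
              words = sentence.split()
              for i in range(1, len(words) - 1):
                  index[(words[i - 1], words[i + 1])][words[i]] += 1
          return index

      def lcm_candidates(index, left_word, right_word):
          """Corpus words seen between the validated contexts, as relative frequencies."""
          counts = index[(left_word, right_word)]
          total = sum(counts.values()) or 1
          return {w: c / total for w, c in counts.items()}

      def merge_with_wcn(lcm_scores, wcn_scores, weight=0.5):
          """Interpolate LCM and WCN scores into one candidate list, best first."""
          words = set(lcm_scores) | set(wcn_scores)
          merged = {w: weight * lcm_scores.get(w, 0.0)
                       + (1 - weight) * wcn_scores.get(w, 0.0) for w in words}
          return sorted(merged, key=merged.get, reverse=True)

      def mean_reciprocal_rank(ranked_lists, references):
          """MRR over error regions: 1/rank of the correct word, 0 if absent."""
          rr = [1.0 / (lst.index(ref) + 1) if ref in lst else 0.0
                for lst, ref in zip(ranked_lists, references)]
          return sum(rr) / len(rr)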

  • Interactive Object Recognition through Hypothesis Generation and Confirmation

    Md. Altab HOSSAIN  Rahmadi KURNIA  Akio NAKAMURA  Yoshinori KUNO  

     
    PAPER-Interactive Systems

      Vol:
    E89-D No:7
      Page(s):
    2197-2206

    An effective human-robot interaction is essential for the wide penetration of service robots into the market. Such a robot needs a vision system to recognize objects. It is, however, difficult to realize vision systems that work in a wide range of conditions, so more robust techniques of object recognition and image segmentation are essential. We have therefore proposed using the human user's assistance, given through speech, for object recognition. This paper presents a system that recognizes objects in occluded and/or multicolor cases using geometric and photometric analysis of images. Based on the analysis results, the system forms a hypothesis about the scene. It then asks the user for confirmation by describing the hypothesis. If the hypothesis is not correct, the system generates another hypothesis, repeating until it correctly understands the scene. Through experiments on a real mobile robot, we have confirmed the usefulness of the system.
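
    The hypothesis-generation-and-confirmation loop the abstract describes can be sketched as follows (Python is our own choice; the analysis and dialogue functions are hypothetical placeholders, and the paper's geometric/photometric analysis is not shown):

      def recognize_interactively(image, analyse, describe, ask_user):
          """Cycle through scene hypotheses until the user confirms one.

          analyse(image)       -> iterable of scene hypotheses, most likely first
          describe(hypothesis) -> natural-language description spoken to the user
          ask_user(text)       -> True if the user confirms the description
          """
          for hypothesis in analyse(image):
              if ask_user(describe(hypothesis)):
                  return hypothesis   # the user confirmed this interpretation
          return None                 # no hypothesis accepted; fall back to asking for help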

  • Interactive Object Recognition System for a Helper Robot Using Photometric Invariance

    Md. Altab HOSSAIN  Rahmadi KURNIA  Akio NAKAMURA  Yoshinori KUNO  

     
    PAPER

      Vol:
    E88-D No:11
      Page(s):
    2500-2508

    We are developing a helper robot that carries out tasks ordered by the user through speech. The robot needs a vision system to recognize the objects referred to in these orders. It is, however, difficult to realize vision systems that work in various conditions. Thus, we have proposed using the human user's assistance through speech. When the vision system cannot achieve a task, the robot speaks to the user so that the user's natural response can give helpful information to its vision system. Our previous system assumed that it could segment images without failure. However, if there are occluded objects and/or objects composed of multicolor parts, segmentation failures cannot be avoided. This paper presents an extended system that tries to recover from segmentation failures using photometric invariance. If the system is not sure about its segmentation results, it asks the user, choosing appropriate expressions depending on the invariant values. Experimental results show the usefulness of the system.
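
    One common photometric invariant is normalized rgb chromaticity, which is insensitive to shading and illumination intensity. The sketch below is our own illustration, not the paper's exact rule, and the tolerance value is an assumption; it uses the invariant to judge whether the boundary between two segments is likely shading on one surface rather than a material edge:

      def normalized_rgb(pixel):
          """Chromaticity (r, g, b) / (r + g + b), invariant to intensity changes."""
          r, g, b = pixel
          s = (r + g + b) or 1
          return (r / s, g / s, b / s)

      def likely_same_surface(mean_pixel_a, mean_pixel_b, tol=0.05):
          """True if two segments have near-identical chromaticity, suggesting the
          boundary between them is a shading edge rather than a material edge."""
          ca, cb = normalized_rgb(mean_pixel_a), normalized_rgb(mean_pixel_b)
          return all(abs(x - y) < tol for x, y in zip(ca, cb))

    When the comparison falls near the tolerance, a system like the one described above would ask the user rather than decide on its own.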

  • The Efficiency of Various Multimodal Input Interfaces Evaluated in Two Empirical Studies

    Xiangshi REN  Gao ZHANG  Guozhong DAI  

     
    PAPER-Welfare Engineering

      Vol:
    E84-D No:10
      Page(s):
    1421-1426

    Although research into multimodal interfaces has been conducted for a long time, we believe that some basic issues have not yet been studied; e.g., the choice of modalities and their combinations is usually made without any quantitative evaluation. This study seeks to identify the best combinations of modalities through usability testing: how do users choose among interaction modes when they work on a particular application? Two experimental evaluations were conducted to compare interaction modes on a CAD system and a map system, respectively. For the CAD system, the results show that, in terms of total manipulation time (drawing and modification time) and subjective preference, the "pen + speech + mouse" combination was the best of the seven interaction modes tested. For the map system, the results show that the "pen + speech" combination was the best of the fourteen interaction modes tested. The experiments also provide information on how users adapt to each interaction mode and the ease with which they are able to use these modes.
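
    The study's central comparison, mean total manipulation time per interaction mode, reduces to a simple ranking. The sketch below is our own illustration with a hypothetical log layout, not the paper's analysis code:

      from statistics import mean

      def rank_modes(times_by_mode):
          """times_by_mode: {'pen + speech + mouse': [seconds, ...], ...}
          Returns mode names ordered by mean manipulation time, fastest first."""
          return sorted(times_by_mode, key=lambda m: mean(times_by_mode[m]))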

  • A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance

    Osamu YOSHIOKA  Yasuhiro MINAMI  Kiyohiro SHIKANO  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    616-621

    This paper describes a multimodal dialogue system employing speech input. The system uses three input methods (a speech recognizer, a mouse, and a keyboard) and two output methods (a display and sound). The speech recognizer employs an algorithm for large-vocabulary, speaker-independent, continuous speech recognition based on the HMM-LR technique. The system is implemented for telephone directory assistance in order to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users produce when speaking to computers. Speech input is used in a multimodal environment, and dialogue data between users and the computer are also collected. Twenty telephone-number retrieval tasks are used to evaluate the system. In the experiments, all users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop their own forms of expression. The task completion rate is 99.0%, and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the use of words other than names and addresses, for users who receive more utterance practice.
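
    The two quantitative measures quoted above are straightforward to compute from collected dialogue logs. The sketch below is our own illustration with hypothetical data structures, not the paper's evaluation code:

      def task_completion_rate(task_outcomes):
          """task_outcomes: list of booleans, True when a retrieval task succeeded."""
          return 100.0 * sum(task_outcomes) / len(task_outcomes)

      def nonkeyword_rate(utterance_words, keywords):
          """Fraction of uttered words that are neither names nor addresses."""
          outside = [w for w in utterance_words if w not in keywords]
          return len(outside) / len(utterance_words)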
