IEICE TRANSACTIONS on Information

Volume E88-D No.11  (Publication Date:2005/11/01)

    Special Section on Life-like Agent and its Communication
  • FOREWORD

    Mitsuru ISHIZUKA  Shigeo MORISHIMA  

     
    FOREWORD

      Page(s):
    2443-2444
  • From Chatterbots to Natural Interaction--Face to Face Communication with Embodied Conversational Agents

    Matthias REHM  Elisabeth ANDRE  

     
    INVITED PAPER

      Page(s):
    2445-2452

    In this paper, we present a game of dice that combines multi-party communication with a tangible interface. The game has been used as a testbed to study typical conversational behavior patterns in interactions between human users and synthetic agents. In particular, we were interested in the extent to which interaction with the agent can be considered natural. As an evaluation criterion, we propose investigating whether the communicative behaviors of humans differ when conversing with an agent as opposed to conversing with other humans.

  • Human Physiology as a Basis for Designing and Evaluating Affective Communication with Life-Like Characters

    Helmut PRENDINGER  Mitsuru ISHIZUKA  

     
    INVITED PAPER

      Page(s):
    2453-2460

    This paper highlights some of our recent research efforts in designing and evaluating life-like characters that are capable of entertaining affective and social communication with human users. The key novelty of our approach is the use of human physiological information: first, as a method to evaluate the effect of life-like character behavior on a moment-to-moment basis, and second, as an input modality for a new generation of interface agents that we call 'physiologically perceptive' life-like characters. By exploiting the stream of primarily involuntary human responses, such as autonomic nervous system activity or eye movements, those characters are expected to respond to users' affective and social needs in a truly sensitive, and hence effective, friendly, and beneficial way.

  • Interaction Builder: A Rapid Prototyping Tool for Developing Web-Based MMI Applications

    Kouichi KATSURADA  Hiroaki ADACHI  Kunitoshi SATO  Hirobumi YAMADA  Tsuneo NITTA  

     
    PAPER

      Page(s):
    2461-2468

    We have developed Interaction Builder (IB), a rapid prototyping tool for constructing web-based Multi-Modal Interaction (MMI) applications. The goal of IB is to make it easy to develop MMI applications with speech recognition, life-like agents, speech synthesis, web browsing, etc. For this purpose, IB provides the following interface and functions: (1) a GUI for implementing MMI systems without requiring detailed knowledge of MMI or of an MMI description language, (2) functions for handling synchronized multimodal inputs/outputs, and (3) a test run mode for run-time testing. Evaluation tests showed that the application development cycle using IB was significantly shorter than with a text editor, both for MMI description language experts and for beginners.

  • Proposal of a Multimodal Interaction Description Language for Various Interactive Agents

    Masahiro ARAKI  Akiko KOUZAWA  Kenji TACHIBANA  

     
    PAPER

      Page(s):
    2469-2476

    In this paper, we propose a new multimodal interaction description language, MIML (Multimodal Interaction Markup Language), which defines dialogue patterns between humans and various types of interactive agents. The distinguishing feature of this language is its three-layered description of agent-based interactive systems. The high-level description is a task definition from which typical agent-based interactive task control information can easily be constructed. The middle-level description is an interaction description that defines the agent's behavior and the user's input at the granularity of a dialogue segment. The low-level description is a platform-dependent description that can override the pre-defined functions in the interaction description. The connection between the task level and the interaction level is realized by generating interaction description templates from the task-level description. The connection between the interaction level and the platform level is realized by a binding mechanism of XML. A comparison with other languages shows that MIML has advantages in high-level interaction description, modality extensibility, and compatibility with standardized technologies.

  • Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation

    Tatsuo YOTSUKURA  Shigeo MORISHIMA  Satoshi NAKAMURA  

     
    PAPER

      Page(s):
    2477-2483

    An accurate audio-visual speech corpus is indispensable for talking-heads research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The audio-visual corpus contains speech data, movie data on faces, and positions and movements of facial organs. The corpus consists of Japanese phoneme-balanced sentences uttered by a female native speaker. Accurate facial capture is realized by using an optical motion-capture system. We captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring the facial movements and removing head movements by using an affine transformation to compute the displacements of the facial organs alone. Finally, in order to easily create facial animation from this motion data, we propose a technique for assigning the captured data to the facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and errors.

  • Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing

    Makoto TACHIBANA  Junichi YAMAGISHI  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER

      Page(s):
    2484-2491

    This paper describes an approach to generating speech with emotional expressivity and speaking style variability. The approach is based on a speaking style and emotional expression modeling technique for HMM-based speech synthesis. We first model several representative styles, each of which is a speaking style and/or an emotional expression, in an HMM-based speech synthesis framework. Then, to generate synthetic speech with an intermediate style from representative ones, we synthesize speech from a model obtained by interpolating representative style models using a model interpolation technique. We assess the style interpolation technique with subjective evaluation tests using four representative styles, i.e., neutral, joyful, sad, and rough in read speech and synthesized speech from models obtained by interpolating models for all combinations of two styles. The results show that speech synthesized from the interpolated model has a style in between the two representative ones. Moreover, we can control the degree of expressivity for speaking styles or emotions in synthesized speech by changing the interpolation ratio in interpolation between neutral and other representative styles. We also show that we can achieve style morphing in speech synthesis, namely, changing style smoothly from one representative style to another by gradually changing the interpolation ratio.
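
    The interpolation-ratio idea in the abstract above can be illustrated with a minimal sketch. It assumes each style is reduced to a single scalar Gaussian and that interpolation combines means linearly and variances with squared weights; the abstract does not say which interpolation rule the authors use, so both the rule and the function names (interpolate_styles, morph) are illustrative assumptions rather than the paper's method.

```python
import math
from typing import List, Sequence, Tuple


def interpolate_styles(
    means: Sequence[float],
    variances: Sequence[float],
    ratios: Sequence[float],
) -> Tuple[float, float]:
    """Interpolate scalar Gaussians, one per representative style.

    Assumed rule (not taken from the abstract):
    mean = sum(a_i * mean_i), variance = sum(a_i**2 * variance_i).
    """
    if not (len(means) == len(variances) == len(ratios)):
        raise ValueError("means, variances and ratios must have equal length")
    if not math.isclose(sum(ratios), 1.0):
        raise ValueError("interpolation ratios must sum to 1")
    mean = sum(a * m for a, m in zip(ratios, means))
    variance = sum(a * a * v for a, v in zip(ratios, variances))
    return mean, variance


def morph(
    start: Tuple[float, float], end: Tuple[float, float], steps: int
) -> List[Tuple[float, float]]:
    """Move from one style to another by gradually changing the ratio."""
    out = []
    for i in range(steps + 1):
        a = i / steps  # weight of the end style, from 0.0 to 1.0
        out.append(
            interpolate_styles(
                [start[0], end[0]], [start[1], end[1]], [1.0 - a, a]
            )
        )
    return out


# Hypothetical (mean, variance) values for two styles.
neutral = (100.0, 4.0)
joyful = (140.0, 16.0)
halfway = interpolate_styles(
    [neutral[0], joyful[0]], [neutral[1], joyful[1]], [0.5, 0.5]
)
```

    With a ratio of 0.5 the interpolated mean lies midway between the two styles; stepping the ratio from 0 to 1, as morph does, corresponds to the gradual style change the abstract calls morphing.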

  • Human Walking Motion Synthesis with Desired Pace and Stride Length Based on HSMM

    Naotake NIWASE  Junichi YAMAGISHI  Takao KOBAYASHI  

     
    PAPER

      Page(s):
    2492-2499

    This paper presents a new technique for automatically synthesizing human walking motion. In the technique, a set of fundamental motion units called motion primitives is defined and each primitive is modeled statistically from motion capture data using a hidden semi-Markov model (HSMM), which is a hidden Markov model (HMM) with explicit state duration probability distributions. The mean parameter for the probability distribution function of the HSMM is assumed to be given by a function of factors that control the walking pace and stride length, and a training algorithm, called factor adaptive training, is derived based on the EM algorithm. A parameter generation algorithm from motion primitive HSMMs with given control factors is also described. Experimental results for generating walking motion are presented for varying walking pace and stride length. The results show that the proposed technique can generate smooth and realistic motion that is not included in the motion capture data, without the need for smoothing or interpolation.

  • Interactive Object Recognition System for a Helper Robot Using Photometric Invariance

    Md. Altab HOSSAIN  Rahmadi KURNIA  Akio NAKAMURA  Yoshinori KUNO  

     
    PAPER

      Page(s):
    2500-2508

    We are developing a helper robot that carries out tasks ordered by the user through speech. The robot needs a vision system to recognize the objects appearing in the orders. It is, however, difficult to realize vision systems that can work in various conditions. Thus, we have proposed using the human user's assistance through speech. When the vision system cannot achieve a task, the robot speaks to the user so that the user's natural response can give helpful information to its vision system. Our previous system assumes that it can segment images without failure. However, if there are occluded objects and/or objects composed of multicolor parts, segmentation failures cannot be avoided. This paper presents an extended system that tries to recover from segmentation failures using photometric invariance. If the system is not sure about the segmentation results, it asks the user using appropriate expressions that depend on the invariant values. Experimental results show the usefulness of the system.

  • Bidirectional Eye Contact for Human-Robot Communication

    Dai MIYAUCHI  Akio NAKAMURA  Yoshinori KUNO  

     
    PAPER

      Page(s):
    2509-2516

    Eye contact is an effective means of controlling human communication, such as in starting communication. It seems that we can make eye contact if we simply look at each other. However, this alone does not establish eye contact. Both parties also need to be aware of being watched by the other. We propose a method of bidirectional eye contact satisfying these conditions for human-robot communication. When a human wants to start communication with a robot, he/she watches the robot. If it finds a human looking at it, the robot turns to him/her, changing its facial expressions to let him/her know its awareness of his/her gaze. When the robot wants to initiate communication with a particular person, it moves its body and face toward him/her and changes its facial expressions to make the person notice its gaze. We show several experimental results to prove the effectiveness of this method. Moreover, we present a robot that can recognize hand gestures after making eye contact with the human to show the usefulness of eye contact as a means of controlling communication.

  • Social Identification of Embodied Interactive Agent

    Yugo TAKEUCHI  Keiko WATANABE  

     
    PAPER

      Page(s):
    2517-2522

    An embodied interactive agent has a virtual body that is generally drawn by CG animation. We intuitively assume that the agent's body primarily expresses non-verbal messages, or symbolizes its social characteristics through its appearance. However, we have not objectively elucidated the expressive competence of an agent's body beyond the conclusions of our empirical and subjective intuition. Therefore, it is necessary to explore scientifically how users regard the functional competence of an agent's embodiment. Do users attribute the intelligence of an agent to its virtual body? We investigated how users physically interact with an agent which is merely a virtual entity drawn on the display by CG, through "showing" something to the eyes of the agent, "listening" to something from the mouth of the agent, and "speaking" something into the ears of the agent. However, such interaction does not necessarily attribute the intellectual processing function to the agent, and this issue is explored through two psychological experiments.

  • Producing Effective Shot Transitions in CG Contents Based on a Cognitive Model of User Involvement

    Masashi OKAMOTO  Yukiko I. NAKANO  Kazunori OKAMOTO  Ken'ichi MATSUMURA  Toyoaki NISHIDA  

     
    PAPER

      Page(s):
    2523-2532

    Owing to great progress in computer graphics technologies, CG movies have been gaining popularity. However, cinematography techniques, which contribute to improving the contents' comprehensibility, must be learned through professional experience and are not easily acquired by non-professionals. This paper focuses on film cutting as one of the most important cinematography techniques in conversational scenes, and presents a system that automatically generates shot transitions to improve comprehensibility of CG contents. First, we propose a cognitive model of User Involvement serving as constraints on selecting shot transitions. Then, to examine the validity of the model, we analyze shot transitions in TV programs, and based on the analysis, we implement a CG contents creation system. Results of our preliminary evaluation experiment show the effectiveness of the proposed method, specifically in enhancing contents' comprehensibility.

  • Regular Section
  • Minimizing the Directory Size for Large-Scale Shared-Memory Multiprocessors

    Jinseok KONG  Pen-Chung YEW  Gyungho LEE  

     
    PAPER-Computer Systems

      Page(s):
    2533-2543

    Directory-based cache coherence schemes are commonly used in large-scale shared-memory multiprocessors, but most of them rely on heuristics to avoid large hardware requirements. We proposed using physical address mapping on directories to significantly reduce the directory size needed. This approach allows the size of the directory to grow as O(cn log_2 n), as in optimal pointer-based directory schemes [11], where n is the number of nodes in the system and c is the number of cache lines in each cache memory. Performance aspects of the proposed scheme are studied in detail using simulation.
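
    To give a feel for the growth rate quoted above, the following sketch counts directory bits under simplifying assumptions that are not taken from the paper: one node pointer of ceil(log2 n) bits per cached line (c lines in each of n caches), with state bits and other constant factors ignored. The full-map comparison (one presence bit per node for every memory block) is included only for contrast, and the memory size used is a hypothetical figure.

```python
def pointer_directory_bits(n_nodes: int, lines_per_cache: int) -> int:
    """Bits for one node pointer per cached line: c * n * ceil(log2 n)."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    pointer_bits = (n_nodes - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    return lines_per_cache * n_nodes * pointer_bits


def full_map_directory_bits(n_nodes: int, blocks_per_node: int) -> int:
    """Bits for a full bit-vector directory: n presence bits per memory block."""
    total_blocks = n_nodes * blocks_per_node
    return total_blocks * n_nodes


# Hypothetical configuration: 64 nodes, 4096 lines per cache,
# and 2**20 memory blocks per node.
small = pointer_directory_bits(64, 4096)
doubled = pointer_directory_bits(128, 4096)
full_map = full_map_directory_bits(64, 2**20)
```

    Doubling the node count from 64 to 128 slightly more than doubles the pointer-based estimate (the pointer grows from 6 to 7 bits), whereas the full-map size depends on the amount of memory rather than on the amount of cache.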

  • Efficient Execution of Range Top-k Queries in Aggregate R-Trees

    Seokjin HONG  Bongki MOON  Sukho LEE  

     
    PAPER-Database

      Page(s):
    2544-2554

    A range top-k query returns the topmost k records in the order set by a measure attribute within a specified region of multi-dimensional data. The range top-k query is a powerful tool for analysis in spatial databases and data warehouse environments. In this paper, we propose an algorithm to answer the query by selectively traversing an aggregate R-tree having MAX as the aggregate values. The algorithm can execute the query by accessing only a small part of the leaf nodes within a query region. Therefore, it shows good query performance regardless of the size of the query region. We suggest an efficient pruning technique for the priority queue, which reduces the cost of handling the priority queue, and also propose an efficient technique for leaf node organization to reduce the number of node accesses to execute the range top-k queries.
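
    The selective traversal described above can be sketched as a best-first search in which each node's stored MAX serves as an upper bound on the records beneath it. This is a minimal illustration only: the Node layout, the two-dimensional rectangles, and the function names are assumptions, and the paper's priority-queue pruning and leaf-node organization techniques are not reproduced.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Record = Tuple[float, float, float]       # (x, y, measure value)


@dataclass
class Node:
    box: Rect
    max_value: float  # MAX aggregate over all records below this node
    children: List["Node"] = field(default_factory=list)
    records: List[Record] = field(default_factory=list)  # leaves only


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def inside(region: Rect, x: float, y: float) -> bool:
    return region[0] <= x <= region[2] and region[1] <= y <= region[3]


def range_top_k(root: Node, region: Rect, k: int) -> List[Record]:
    """Return up to k records inside region, in descending measure order."""
    tie = count()  # tie-breaker so heap entries never compare Node objects
    heap = []
    if intersects(root.box, region):
        heapq.heappush(heap, (-root.max_value, next(tie), root))
    result: List[Record] = []
    while heap and len(result) < k:
        _, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand only the children that overlap the query region.
            for child in item.children:
                if intersects(child.box, region):
                    heapq.heappush(heap, (-child.max_value, next(tie), child))
            for rec in item.records:
                if inside(region, rec[0], rec[1]):
                    heapq.heappush(heap, (-rec[2], next(tie), rec))
        else:
            # A record at the head of the queue beats every remaining bound.
            result.append(item)
    return result


leaf_a = Node((0, 0, 4, 4), 50, records=[(1, 1, 10), (3, 3, 50)])
leaf_b = Node((5, 5, 9, 9), 90, records=[(6, 6, 40), (8, 8, 90)])
tree = Node((0, 0, 9, 9), 90, children=[leaf_a, leaf_b])
top2 = range_top_k(tree, (2, 2, 7, 7), 2)
```

    In this toy tree the record with value 90 lies outside the query region, so the search returns the records with values 50 and 40; nodes that do not overlap the region are never pushed onto the queue.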

  • Failure Trace Analysis of Timed Circuits for Automatic Timing Constraints Derivation

    Tomoya KITAI  Tomohiro YONEDA  Chris MYERS  

     
    PAPER-Dependable Computing

      Page(s):
    2555-2564

    This work proposes a technique to automatically obtain the timing constraints under which a given timed circuit operates correctly. A designated set of delay parameters of a circuit is first set to sufficiently large bounds, and verification runs followed by failure analysis are repeated. Each verification run performs timed state space enumeration under the given delay bounds, and produces a failure trace if one exists. The failure trace is analyzed, and sufficient timing constraints to prevent the failure are obtained. Then, the delay bounds are tightened according to the timing constraints by using an ILP (Integer Linear Programming) solver. This process terminates either when delay bounds under which no failure is detected are found or when no new delay bounds that prevent the failures can be obtained. The experimental results using a naive implementation show that the proposed method can efficiently handle asynchronous benchmark circuits and nontrivial GasP circuits.

  • Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method

    Tatsuya MIZUTANI  Takehiko KAGOSHIMA  

     
    PAPER-Speech and Hearing

      Page(s):
    2565-2572

    This paper proposes a novel speech synthesis method to generate human-like natural speech. The conventional unit-selection-based synthesis method selects speech units from a large database, and concatenates them with or without modifying the prosody to generate synthetic speech. This method features highly human-like voice quality. The method, however, has the problem that a suitable speech unit is not necessarily selected. Since unsuitable speech unit selection causes discontinuity between consecutive speech units, the synthesized speech quality deteriorates. It might be thought that the conventional method could attain higher speech quality if the database size were increased. However, preparation of a larger database requires a longer recording time, and the narrator's voice quality does not remain constant throughout the recording period. This degrades the database quality, and still leaves the problem of unsuitable selection. We propose the plural unit selection and fusion method, which avoids this problem. This method integrates the unit fusion used in the unit-training-based method with the conventional unit-selection-based method. The proposed method selects plural speech units for each segment, fuses the selected speech units for each segment, modifies the prosody of the fused speech units, and concatenates them to generate synthetic speech. This unit fusion creates speech units which are connected to one another with much less voice discontinuity, and realizes high quality speech. A subjective evaluation test showed that the proposed method greatly improves the speech quality compared with the conventional method. It also showed that the speech quality of the proposed method is kept high regardless of the database size, from small (10 minutes) to large (40 minutes). The proposed method is a new framework in the sense that it is a hybrid of the unit-selection-based method and the unit-training-based method. In the framework, the algorithms for unit selection and unit fusion can be exchanged for more efficient techniques. Thus, the framework is expected to lead to new synthesis methods.

  • A New Iris Recognition Method Using Independent Component Analysis

    Seung-In NOH  Kwanghyuk BAE  Kang Ryoung PARK  Jaihie KIM  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    2573-2581

    In a conventional method based on quadrature 2D Gabor wavelets to extract iris features, iris recognition is performed using a 256-byte iris code, which is computed by applying the Gabor wavelets to a given area of the iris. However, there is code redundancy because the iris code is generated by basis functions without considering the characteristics of the iris texture. Therefore, the size of the iris code is increased unnecessarily. In this paper we propose a new feature extraction algorithm based on independent component analysis (ICA) for a compact iris code. We implemented the ICA to generate optimal basis functions which could represent iris signals efficiently. In practice the coefficients of the ICA expansions are used as feature vectors. Then iris feature vectors are encoded into the iris code for storing and comparing individuals' iris patterns. Additionally, we introduce a method to refine the ICA basis functions for improving the recognition performance. Experimental results show that our proposed method achieves an equal error rate similar to that of a conventional method based on the Gabor wavelets, and that the iris code size of our proposed method is five times smaller than that of the Gabor wavelets.

  • Hybrid Image Composition Mechanism for Enhancing Volume Graphics Clusters

    Jorji NONAKA  Nobuyuki KUKIMOTO  Yasuo EBARA  Masato OGATA  Takeshi IWASHITA  Masanori KANAZAWA  Koji KOYAMADA  

     
    PAPER-Computer Graphics

      Page(s):
    2582-2590

    Volume Graphics Clusters (VG Clusters) have proven to be efficient in a wide range of visualization applications and have also shown promise in some other applications where the image composition device could be fully utilized. The main differentiating feature from other graphics clusters is a specialized image composition device, commercially available as the MPC Image Compositor, which enables the building of do-it-yourself VG Clusters. Although this device is highly scalable, the unidirectional composition flow limits the data subdivision to the number of physically available rendering nodes. In addition, the limited buffer memory restricts the maximum image composition size, thereby limiting its use in large-scale data visualization and high-resolution visualization. To overcome these limitations, we propose and evaluate an image composition mechanism in which additional hardware is used for assisting the image composition process. Because of the synergistic use of two distinct image composition hardware devices we named it "Hybrid Image Composition". Some encouraging results were obtained showing the effectiveness of this solution in improving the VG Cluster's potential. A low-cost parallel port based hardware barrier is also presented as an efficient method for further enhancing this kind of small-scale VG Cluster. Moreover, this solution has proven to be especially useful in clusters built using low-speed networks, such as Fast Ethernet, which are still in common use.

  • Detection System of Clustered Microcalcifications on CR Mammogram

    Hideya TAKEO  Kazuo SHIMURA  Takashi IMAMURA  Akinobu SHIMIZU  Hidefumi KOBATAKE  

     
    PAPER-Biological Engineering

      Page(s):
    2591-2602

    CR (Computed Radiography) is characterized by high sensitivity and wide dynamic range. Moreover, it has the advantage of being able to transfer exposed images directly to a computer-aided detection (CAD) system, which is not possible using conventional film digitizer systems. This paper proposes a high-performance clustered microcalcification detection system for CR mammography. Before detecting and classifying candidate regions, the system preprocesses images with a normalization step to take into account various imaging conditions and to enhance microcalcifications with weak contrast. Large-scale experiments using images taken under various imaging conditions at seven hospitals were performed. According to analysis of the experimental results, the proposed system displays high performance. In particular, at a true positive detection rate of 97.1%, the average number of false positive clusters is only 0.4 per image. The introduction of geometrical features of each microcalcification for identifying true microcalcifications contributed to the performance improvement. One of the aims of this study was to develop a system for practical use. The results indicate that the proposed system is promising.

  • FTOG-Based Management and Recovery Services

    Myungseok KANG  Jaeyun JUNG  Younghoon WHANG  Youngyong KIM  Hagbae KIM  

     
    LETTER-Dependable Computing

      Page(s):
    2603-2605

    This paper presents a Fault-Tolerant Object Group (FTOG) model that provides the group management service and the fault-tolerance service for consistency maintenance and state transparency. Using the Intelligent Home Network Simulator, we verify that the FTOG model supports both the reliability and the stability of the distributed system.

  • Multiband Vector Quantization Based on Inner Product for Wideband Speech Coding

    Joon-Hyuk CHANG  Sanjit K. MITRA  

     
    LETTER-Speech and Hearing

      Page(s):
    2606-2608

    This paper describes a multiband vector quantization (VQ) technique based on the inner product for wideband speech coding at 16 kb/s. Our approach consists of splitting the input speech into two separate bands and then applying an independent coding scheme to each band. A code excited linear prediction (CELP) coder is used in the lower band while a transform-based coding strategy is applied in the higher band. The spectral components in the higher frequency band are represented by a set of modulated lapped transform (MLT) coefficients. The higher frequency band is divided into three subbands, and the MLT coefficients construct a vector for each subband. Specifically, for the VQ of these vectors, an inner product-based distance measure is proposed as a new strategy. The proposed 16 kb/s coder with the inner product-based distortion measure achieves better performance than the 48 kb/s ITU-T G.722 in subjective quality tests.

  • Robust Multi-Body Motion Segmentation Based on Fuzzy k-Subspace Clustering

    Xi LI  Zhengnan NING  Liuwei XIANG  

     
    LETTER-Image Recognition, Computer Vision

      Page(s):
    2609-2614

    The problem of multi-body motion segmentation is important in many computer vision applications. In this paper, we propose a novel algorithm called fuzzy k-subspace clustering for robust segmentation. The proposed method exploits the property that, under an orthographic camera model, the tracked feature points of moving objects reside in multiple subspaces. We compute a partition of feature points into corresponding subspace clusters. First, we find a "soft partition" of feature points based on the fuzzy k-subspace algorithm. The proposed fuzzy k-subspace algorithm iteratively minimizes the objective function using Weighted Singular Value Decomposition. Then the points with high partition confidence are gathered to form the subspace bases and the remaining points are classified using their distance to the bases. The proposed method can handle the case of missing data naturally, meaning that the feature points do not have to be visible throughout the sequence. The method is robust to noise and insensitive to initialization. Extensive experiments on synthetic and real data show the effectiveness of the proposed fuzzy k-subspace clustering algorithm.
