IEICE TRANSACTIONS on Information

  • Impact Factor

    0.59

  • Eigenfactor

    0.002

  • Article Influence

    0.1

  • CiteScore

    1.4


Volume E88-D No.10  (Publication Date:2005/10/01)

    Special Section on Image Recognition and Understanding
  • FOREWORD

    Noboru BABAGUCHI  

     
    FOREWORD

      Page(s):
    2241-2241
  • A Nonlinear Principal Component Analysis of Image Data

    Ryo SAEGUSA  Hitoshi SAKANO  Shuji HASHIMOTO  

     
    PAPER

      Page(s):
    2242-2248

    Principal Component Analysis (PCA) has been applied in various areas such as pattern recognition and data compression. In some cases, however, PCA does not extract the characteristics of the data distribution efficiently. To overcome this problem, we have proposed a novel nonlinear PCA method that preserves the order of the principal components. In this paper, we reduce the dimensionality of image data using the proposed method and examine its effectiveness in the compression and recognition of images.
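
    The paper's nonlinear extension is not spelled out in the abstract, but the linear PCA baseline it builds on can be sketched as follows (a minimal sketch using numpy and synthetic data, not the authors' method):

```python
import numpy as np

def pca_compress(X, k):
    """Project rows of X onto the top-k principal components.

    Returns (mean, components, codes) so that the data can be
    approximately reconstructed as mean + codes @ components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD gives the principal axes in the rows of Vt, ordered by variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    codes = Xc @ components.T
    return mean, components, codes

def pca_reconstruct(mean, components, codes):
    return mean + codes @ components

# Toy "images": 100 samples of 16-dimensional data lying near a 3-D subspace.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 16))
X = rng.standard_normal((100, 3)) @ basis + 0.01 * rng.standard_normal((100, 16))

mean, comps, codes = pca_compress(X, k=3)
X_hat = pca_reconstruct(mean, comps, codes)
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

    Because the synthetic data are nearly 3-dimensional, three components reconstruct them almost exactly; real image data typically need far more.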

  • Image Segmentation with Fast Wavelet-Based Color Segmenting and Directional Region Growing

    Din-Yuen CHAN  Chih-Hsueh LIN  Wen-Shyong HSIEH  

     
    PAPER

      Page(s):
    2249-2259

    This investigation proposes a fast wavelet-based color segmentation (FWCS) technique and a modified directional region-growing (DRG) technique for semantic image segmentation. FWCS sequentially combines progressive color truncation and histogram-based color extraction to segment color regions in images. Using specialized centroids of the segmented fragments as initial growing seeds, the proposed DRG performs directional 1-D region growing on pairs of color-segmented regions based on those centroids. When two examined regions are positively confirmed by DRG, the framework then computes texture features extracted from the two regions and further checks their relation with a texture similarity test (TST). Any pair of regions that passes both DRG and TST is identified as a pair of associated regions. If two associated regions are connected, they are unified into a single area enclosed by one contour; otherwise, the framework merely records a linking relation between the associated regions, highlighted with a linking mark. In particular, through the systematic integration of all the proposed processes, the critical issue of deciding the final level of wavelet decomposition for various images is efficiently solved in FWCS by a newly proposed quasi-linear high-frequency analysis model. The simulations conducted here demonstrate that the proposed segmentation framework can achieve quasi-semantic segmentation without a priori high-level knowledge.

  • Statistical Optimization for 3-D Reconstruction from a Single View

    Kenichi KANATANI  Yasuyuki SUGAYA  

     
    PAPER

      Page(s):
    2260-2268

    We analyze the noise sensitivity of the focal length computation, the principal point estimation, and the orthogonality enforcement for single-view 3-D reconstruction based on vanishing points and orthogonality. We point out that, due to the nonlinearity of the problem, standard statistical optimization is not very effective. We present a practical compromise that avoids computational failure while preserving high accuracy, yielding a consistent 3-D shape even in the presence of large noise.

  • Optimizing a Triangular Mesh for Shape Reconstruction from Images

    Atsutada NAKATSUJI  Yasuyuki SUGAYA  Kenichi KANATANI  

     
    PAPER

      Page(s):
    2269-2276

    In reconstructing 3-D shape from images based on feature points, one usually defines a triangular mesh that has these feature points as vertices and displays the scene as a polyhedron. If the scene itself is a polyhedron, however, some of the displayed edges may be inconsistent with the true shape. This paper presents a new technique for automatically eliminating such inconsistencies by using a special template. We also present a technique for removing spurious occluding edges. None of the procedures requires any thresholds to be adjusted. Using real images, we demonstrate that our method has a high capability to correct inconsistencies.

  • Visual Direction Estimation from a Monocular Image

    Haiyuan WU  Qian CHEN  Toshikazu WADA  

     
    PAPER

      Page(s):
    2277-2285

    This paper describes a sophisticated method for estimating the visual direction from iris contours. The method requires only one monocular image taken by a camera with unknown focal length. To estimate the visual direction, we assume that the visual directions of both eyes are parallel and that the iris boundaries are circles in 3-D space. In this case, the two planes in which the iris boundaries reside are also parallel. We estimate the normal vector of the two planes from the iris contours extracted from an input image, using an extended "two-circle" algorithm. Unlike most existing gaze estimation algorithms, which require information about the eye corners and heuristic knowledge about the 3-D structure of the eye in addition to the iris contours, our method uses the two iris contours only. Another contribution of our method is its ability to estimate the focal length of the camera, which allows images to be taken with a zoom lens whose focal length can be adjusted at any time. Extensive experiments on simulated and real images demonstrate the robustness and effectiveness of our method.

  • An Efficient Search Method Based on Dynamic Attention Map by Ising Model

    Kazuhiro HOTTA  Masaru TANAKA  Takio KURITA  Taketoshi MISHIMA  

     
    PAPER

      Page(s):
    2286-2295

    This paper presents a Dynamic Attention Map based on the Ising model for face detection. In general, a face detector cannot know in advance where faces are or how many faces an image contains. The detector must therefore search the whole image, which requires much computational time. To speed up the search, the information obtained at previous search points should be used effectively. In order to exploit the face likelihood obtained at previous search points, the Ising model is adopted for face detection. The Ising model has two-state spins, "up" and "down", and the state of a spin is updated depending on the neighboring spins and an external magnetic field. The spins are assigned to the "face" and "non-face" states of face detection, and the measured face likelihood is integrated into the energy function of the Ising model as the external magnetic field. We confirm that spin-flip dynamics effectively reduces the number of face candidates. To improve the search performance further, the single-level Ising search is extended to a multilevel Ising search, in which the interaction between two layers, characterized by the renormalization group method, is used to reduce the face candidates. The effectiveness of the multilevel Ising search is also confirmed by comparison with the single-level Ising search.
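
    The spin-update idea can be illustrated with a minimal Metropolis sweep over a 2-D attention map, where the measured face likelihood enters as the external field (an illustrative sketch with made-up coupling J and temperature T, not the authors' implementation):

```python
import math
import random

def ising_attention_step(spins, field, J=1.0, T=0.5):
    """One Metropolis sweep over a 2-D attention map.

    spins[i][j] is +1 ("face candidate") or -1 ("non-face");
    field[i][j] is an external field derived from the measured
    face likelihood at that search point.
    """
    h, w = len(spins), len(spins[0])
    for i in range(h):
        for j in range(w):
            s = spins[i][j]
            # Sum over the 4-neighbourhood (free boundary).
            nb = 0.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    nb += spins[ni][nj]
            # Energy change of flipping s -> -s under E = -J*sum(s_i*s_j) - sum(h_i*s_i).
            dE = 2.0 * s * (J * nb + field[i][j])
            if dE <= 0 or random.random() < math.exp(-dE / T):
                spins[i][j] = -s
    return spins

random.seed(1)
h, w = 8, 8
spins = [[random.choice([-1, 1]) for _ in range(w)] for _ in range(h)]
# Strong positive likelihood in the top-left quadrant (a "face"), negative elsewhere.
field = [[3.0 if (i < 4 and j < 4) else -3.0 for j in range(w)] for i in range(h)]
for _ in range(20):
    ising_attention_step(spins, field)
candidates = sum(spins[i][j] == 1 for i in range(h) for j in range(w))
```

    After a few sweeps the "up" spins concentrate where the likelihood field is positive, so later search effort can be restricted to those cells.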

  • Multiple Description Pattern Analysis: Robustness to Misclassification Using Local Discriminant Frame Expansions

    Widhyakorn ASDORNWISED  Somchai JITAPUNKUL  

     
    PAPER

      Page(s):
    2296-2307

    In this paper, a source coding model for learning multiple concept descriptions of data is proposed. Our source coding model is based on the concept of transmitting data over multiple channels, called multiple description (MD) coding. In particular, frame expansions are used in our MD coding models for pattern classification. Under this model, several multiple-classifier algorithms share interesting properties with our proposed scheme. Generalizing the MD view through an extension of local discriminant bases toward the theory of frames allows the formulation of a generalized class of low-complexity learning algorithms applicable to high-dimensional pattern classification. To evaluate this approach, performance results for automatic target recognition (ATR) are presented for synthetic aperture radar (SAR) images from the MSTAR public release data set. In the experiments, our approach outperforms state-of-the-art methods such as the conditional Gaussian signal model, AdaBoost, and ECOC-SVM.

  • A New Detection Approach for the Fingerprint Core Location Using Extended Relation Graph

    Tomohiko OHTSUKA  Takeshi TAKAHASHI  

     
    LETTER

      Page(s):
    2308-2312

    This paper describes a new approach to detecting the fingerprint core location using an extended relational graph, which is generated by segmenting the ridge directional image. The extended relational graph represents both the adjacency between segments of the directional image and the boundary information between them. The boundary curves derived from the boundary information in the extended relational graph are approximated by straight lines, and the fingerprint core location is computed as the center of gravity of the intersection points of these lines. Experimental results show that the core location is successfully detected for 90.8% of 130 fingerprint samples.
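
    The final step, intersecting the approximated boundary lines and taking the center of gravity of the intersection points, can be sketched as follows (lines in a·x + b·y = c form; an illustrative sketch, not the authors' code):

```python
def line_intersection(l1, l2):
    """Intersection of two lines given as (a, b, c) with a*x + b*y = c."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel lines have no (unique) intersection
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

def core_location(lines):
    """Center of gravity of all pairwise intersections of the boundary lines."""
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = line_intersection(lines[i], lines[j])
            if p is not None:
                pts.append(p)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

# Three hypothetical boundary lines, all passing through (2, 3):
# x = 2, y = 3, and x + y = 5.
core = core_location([(1, 0, 2), (0, 1, 3), (1, 1, 5)])
```

    With noisy real lines the intersections scatter, and the centroid averages that scatter into a single core estimate.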

  • Query Learning Method for Character Recognition Methods Using Genetic Algorithm

    Hitoshi SAKANO  

     
    LETTER

      Page(s):
    2313-2316

    We propose a learning method combining query learning with a "genetic translator" we previously developed. Query learning is a useful technique for high-accuracy, high-speed learning and for reducing the training sample size. However, it has not been applied to practical optical character readers (OCRs), because human beings cannot recognize queries as character images in the feature space used in practical OCR devices. We previously proposed a character image reconstruction method using a genetic algorithm, and here apply it as a "translator" from the feature space for query learning in character recognition. The results of an experiment on hand-written numeral recognition demonstrate the possibility of reducing the training sample size.

  • Regular Section
  • Optimal Tracking Design for Hybrid Uncertain Input-Delay Systems under State and Control Constraints via Evolutionary Programming Approach

    Yu-Pin CHANG  

     
    PAPER-Algorithm Theory

      Page(s):
    2317-2328

    A novel digital redesign methodology based on evolutionary programming (EP) is introduced to find the best digital controller for the optimal tracking design of hybrid uncertain multi-input/multi-output (MIMO) input-delay systems with constraints on states and controls. To handle these concurrent multivariable specifications and system restrictions, the proposed global optimization scheme, unlike conventional interval methods, can practically implement an optimal digital controller for constrained uncertain hybrid systems with input time delay. An illustrative example demonstrates the efficiency of the proposed method.

  • Enumeration Methods for Repeatedly Solving Multidimensional Knapsack Sub-Problems

    Ross J.W. JAMES  Yuji NAKAGAWA  

     
    PAPER-Algorithm Theory

      Page(s):
    2329-2340

    In order to solve large Multidimensional Knapsack problems, we examine a technique that decomposes a problem instance into two parts. The first part is solved using a traditional technique, such as Dynamic Programming, to reduce the number of variables in the problem by creating a single variable with many non-dominated states. In the second part, the remaining variables are determined by an algorithm that repeatedly enumerates them under different constraint and objective requirements, which are imposed by the various non-dominated states of the variable created in the first part. The main advantage of this approach is that when memory requirements prevent traditional techniques from solving a problem instance, the enumeration provides a much less memory-intensive method, enabling a solution to be found. Two approaches are proposed for repeatedly enumerating a 0/1 Multidimensional Knapsack problem. We demonstrate how these enumeration methods, in conjunction with the Modular Approach, were used to find the optimal solutions to a number of 500-variable, 5-constraint Multidimensional Knapsack problem instances proposed in the literature, whose exact solutions were previously unknown.
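
    The decomposition can be illustrated in miniature: enumerate one half of the variables into candidate states, then enumerate the other half under each state's residual capacity. This is a toy sketch of the idea only; the paper builds non-dominated states with the Modular Approach rather than brute-force enumeration.

```python
from itertools import product

def solve_mkp(profits, weights, caps):
    """Exact 0/1 multidimensional knapsack by two-part decomposition:
    enumerate the first half of the variables into candidate states,
    then, for each feasible state, enumerate the remaining variables
    under the residual multidimensional capacity."""
    n = len(profits)
    half = n // 2

    def enumerate_part(idx):
        # All (profit, used-capacity vector) assignments for the items in idx.
        out = []
        for choice in product((0, 1), repeat=len(idx)):
            p = sum(profits[i] * c for i, c in zip(idx, choice))
            u = tuple(sum(weights[i][d] * c for i, c in zip(idx, choice))
                      for d in range(len(caps)))
            out.append((p, u))
        return out

    first = enumerate_part(list(range(half)))
    second = enumerate_part(list(range(half, n)))
    best = 0
    for p1, u1 in first:
        if any(u > c for u, c in zip(u1, caps)):
            continue  # first-part state already infeasible
        for p2, u2 in second:
            if all(a + b <= c for a, b, c in zip(u1, u2, caps)):
                best = max(best, p1 + p2)
    return best

# Five hypothetical items under two knapsack constraints.
profits = [10, 7, 5, 8, 6]
weights = [(3, 2), (2, 3), (1, 1), (4, 1), (2, 2)]
caps = (7, 5)
best = solve_mkp(profits, weights, caps)
```

    Here the optimum (items 0, 2, and 4) uses capacity (6, 5) for a profit of 21; the memory saving of the real method comes from keeping only non-dominated first-part states.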

  • A Polynomial-Time Algorithm for Merging Structured Documents

    Nobutaka SUZUKI  

     
    PAPER-Contents Technology and Web Information Systems

      Page(s):
    2341-2353

    Document merging is essential for synchronizing several versions of a document concurrently edited by two or more users. A few methods for merging structured documents have been proposed so far, yet they may not always merge given documents appropriately. As an aid for finding an appropriate merging, we take a different approach and propose a polynomial-time algorithm for merging structured documents. In this approach, we merge two given documents (treated as ordered trees) by optimally transforming them into isomorphic ones, using operations such as add (add a new node), del (delete an existing node), and upd (make two nodes have the same label).
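
    A drastically simplified sketch of merging two ordered trees with add/upd-style operations is shown below. The paper computes an optimal transformation; this toy version merely aligns children by position, so it illustrates the operations, not the algorithm.

```python
def merge_trees(t1, t2):
    """Merge two ordered trees given as (label, children) tuples.

    Children are aligned positionally: equal labels are kept,
    differing labels are resolved with an 'upd' that prefers t2's
    label, and surplus children on either side are 'add'-ed.
    """
    label = t1[0] if t1[0] == t2[0] else t2[0]  # upd: adopt t2's label
    c1, c2 = t1[1], t2[1]
    children = [merge_trees(a, b) for a, b in zip(c1, c2)]
    children.extend(c1[len(c2):])  # add: nodes present only in t1
    children.extend(c2[len(c1):])  # add: nodes present only in t2
    return (label, children)

# Two hypothetical versions of the same document.
doc1 = ("doc", [("sec", [("p", [])]), ("sec", [])])
doc2 = ("doc", [("sec", [("p", []), ("fig", [])])])
merged = merge_trees(doc1, doc2)
```

    The merged tree keeps the figure added in doc2 and the extra section kept in doc1; an optimal merge would instead search over alignments to minimize the total operation cost.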

  • Rules and Algorithms for Phonetic Transcription of Standard Malay

    Yousif A. EL-IMAM  Zuraidah Mohd DON  

     
    PAPER-Speech and Hearing

      Page(s):
    2354-2372

    Phonetic transcription of text is an indispensable component of text-to-speech (TTS) systems and is also used in acoustic modeling for speech recognition and other natural language processing applications. One approach to transcribing written text into phonetic entities or sounds is to use a set of well-defined, context- and language-dependent rules. The process starts by preprocessing the text and representing it by lexical items to which the rules are applicable. The rules can be segregated into phonemic and phonetic rules: phonemic rules operate on graphemes to convert them into phonemes, while phonetic rules operate on phonemes to convert them into context-dependent phonetic entities with actual sounds. Converting written text into actual sounds, developing a comprehensive set of rules, and transforming the rules into implementable algorithms cause several problems for any language, problems that originate in the relative lack of correspondence between the spelling of lexical items and their sound content. For Standard Malay (SM), these problems are not as severe as for languages with complex spelling systems, such as English and French, but they do exist. In this paper, we discuss the development of a comprehensive computerized system for processing SM text and transcribing it into phonetic entities, and evaluate its performance irrespective of the application.
    In particular, the following issues are dealt with: (1) the spelling and other problems of SM writing and their impact on converting graphemes into phonemes, (2) the development of a comprehensive set of grapheme-to-phoneme rules for SM, (3) a description of the phonetic variations of SM, i.e., how the phonemes of SM vary in context, and the development of a set of phoneme-to-phonetic transcription rules, (4) the formulation of the phonemic and phonetic rules into algorithms applicable to computer-based processing of input SM text, and (5) the evaluation of the conversion of SM text into actual sounds by the above-mentioned methods.
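
    A rule-based grapheme-to-phoneme pass of the kind described can be sketched with ordered, context-dependent rules. The rules below are invented for illustration and are not the paper's SM rule set; ASCII placeholders stand in for the IPA symbols.

```python
def g2p(word, rules):
    """Apply ordered context-dependent grapheme-to-phoneme rules.

    Each rule is (grapheme, left_context, right_context, phoneme);
    '#' in a context denotes the word boundary, and '' matches
    anywhere. Rules are tried in order, and the first match wins,
    so specific rules must precede general ones.
    """
    w = "#" + word + "#"
    phonemes = []
    i = 1
    while i < len(w) - 1:
        for g, left, right, ph in rules:
            if (w.startswith(g, i)
                    and w[:i].endswith(left)
                    and w.startswith(right, i + len(g))):
                if ph:  # an empty phoneme would delete the grapheme
                    phonemes.append(ph)
                i += len(g)
                break
        else:
            phonemes.append(w[i])  # pass unknown graphemes through
            i += 1
    return phonemes

# Illustrative rules only (NOT the paper's SM rule set): the digraph
# 'ng' maps to a velar nasal ('N'), 'sy' to a postalveolar fricative
# ('S'), and word-final 'k' to a glottal stop ('?').
rules = [
    ("ng", "", "", "N"),
    ("sy", "", "", "S"),
    ("k", "", "#", "?"),
]
pron = g2p("orang", rules)
```

    Ordering matters: if the single-letter fallback preceded the digraph rules, "ng" would wrongly surface as two separate phonemes.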

  • Composite Support Vector Machines with Extended Discriminative Features for Accurate Face Detection

    Tae-Kyun KIM  Josef KITTLER  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    2373-2379

    This paper describes a pattern classifier that detects frontal-view faces by learning a decision boundary. The proposed classifier consists of two major parts: implicit modeling of both the face and the near-face classes, which yields an extended discriminative feature set for improved classification accuracy, and subsequent composite Support Vector Machines (SVMs) for speeding up the classification. For the extended discriminative feature set, Principal Component Analysis (PCA) or Independent Component Analysis (ICA) is performed for the face and near-face classes separately. The projections and distances to the two different subspaces are complementary, which significantly enhances the classification accuracy of the SVM. Multiple nonlinear SVMs are trained for local facial feature spaces, reflecting the generally multi-modal character of the face space; each component SVM has a simpler boundary than a single SVM for the whole face space would. The most appropriate component SVM is selected by a gating mechanism based on clustering. Classifying with one of the multiple SVMs guarantees good generalization performance and speeds up face detection. The proposed classifier is finally made to work in real time by cascading it with a boosting-based face detector.

  • Texture Classification Using Hierarchical Linear Discriminant Space

    Yousun KANG  Ken'ichi MOROOKA  Hiroshi NAGAHASHI  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    2380-2388

    As a representative linear discriminant analysis technique, the Fisher method is the most widely used in practice and is very effective for two-class classification. However, when it is extended to multi-class classification, its discrimination precision may degrade. A main reason is the occurrence of overlapping distributions in the discriminant space built by the Fisher criterion. To take such overlaps among classes into consideration, our approach builds a new discriminant space by hierarchically classifying the overlapped classes. In this paper, we propose a new hierarchical discriminant analysis for texture classification in which the discriminant space is divided into subspaces by recursively grouping the overlapped classes. In the experiments, texture images from many classes are classified with the proposed method, and the results show a clear improvement over the conventional Fisher method.
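
    The two-class Fisher method that the paper generalizes can be sketched as follows (numpy, synthetic Gaussian data; an illustrative baseline, not the paper's hierarchical method):

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher discriminant: the projection direction
    w = Sw^{-1} (m1 - m0) that maximizes between-class scatter
    relative to within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter = sum of the two class scatter matrices.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    return np.linalg.solve(Sw, m1 - m0)

rng = np.random.default_rng(0)
X0 = rng.standard_normal((200, 2))                      # class 0 at the origin
X1 = rng.standard_normal((200, 2)) + np.array([3.0, 0.0])  # class 1 shifted in x
w = fisher_direction(X0, X1)

# Classify by thresholding the projection at the midpoint of the class means.
thr = 0.5 * (X0.mean(axis=0) + X1.mean(axis=0)) @ w
acc = np.concatenate([X0 @ w < thr, X1 @ w >= thr]).mean()
```

    When class distributions overlap in this projected space, as in the multi-class case the paper addresses, a single direction is no longer enough, which motivates the hierarchical subspaces.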

  • A Computational Model for Taxonomy-Based Word Learning Inspired by Infant Developmental Word Acquisition

    Akira TOYOMURA  Takashi OMORI  

     
    PAPER-Biocybernetics, Neurocomputing

      Page(s):
    2389-2398

    To develop human interfaces such as home information equipment, a highly capable word-learning ability is required. In particular, to realize user-customized and situation-dependent interaction using language, an advanced human interface needs a function that can build new categories online in response to presented objects. At present, however, there are few basic studies focusing on language acquisition with category formation. In this study, taking hints from an analogy between machine learning and infant developmental word acquisition, we propose a taxonomy-based word-learning model using a neural network. Through computer simulations, we show that our model can build categories and find the name of an object based on categorization.

  • Neural Network Training Algorithm with Positive Correlation

    Md. SHAHJAHAN  Kazuyuki MURASE  

     
    PAPER-Biocybernetics, Neurocomputing

      Page(s):
    2399-2409

    In this paper, we present a learning approach, positive correlation learning (PCL), that creates a multilayer neural network with good generalization ability. A correlation function is added to the standard error function of back-propagation learning, and the combined error function is minimized by a steepest-descent method. During training, all the unnecessary units in the hidden layer become correlated with necessary ones in a positive sense, so PCL creates positively correlated activities of hidden units in response to input patterns. We show that PCL can reduce the information on the input patterns and decay the weights, both of which lead to improved generalization ability. Here, the information is defined with respect to hidden-unit activity, since the hidden units play a crucial role in storing the information on the input patterns: as previously proposed, the information is defined as the difference between the uncertainty of a hidden unit at the initial stage of learning and its uncertainty at the final stage. After deriving new weight-update rules for PCL, we applied the method to several standard benchmark classification problems, such as the breast cancer, diabetes, and glass identification problems. Experimental results confirmed that PCL produces positively correlated hidden units and significantly reduces the amount of information, resulting in improved generalization ability.
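
    The abstract's modified error function, squared error plus a correlation term over hidden-unit activities, might look like the sketch below. The weighting lam and the exact form of the correlation term are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def pcl_loss(y_pred, y_true, hidden, lam=0.1):
    """Sketch of a PCL-style objective: standard squared error plus
    a correlation term over hidden-unit activations. Minimizing the
    combined loss rewards positively correlated hidden units
    (lam is a hypothetical weighting, not from the paper)."""
    mse = np.mean((y_pred - y_true) ** 2)
    # Pairwise correlation matrix of hidden activations over the batch.
    C = np.corrcoef(hidden, rowvar=False)
    h = C.shape[0]
    off_diagonal = C[~np.eye(h, dtype=bool)]
    # Subtracting the mean correlation penalizes negative correlations.
    return mse - lam * off_diagonal.mean()

rng = np.random.default_rng(0)
y = rng.standard_normal(50)
# Four hidden units that are perfectly positively correlated...
hidden_pos = np.repeat(rng.standard_normal((50, 1)), 4, axis=1)
# ...versus four uncorrelated hidden units.
hidden_neg = rng.standard_normal((50, 4))
loss_pos = pcl_loss(y, y, hidden_pos)
loss_neg = pcl_loss(y, y, hidden_neg)
```

    With identical predictions, the positively correlated hidden layer attains the lower loss, which is the direction in which gradient descent would push the network.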

  • A High Speed Fuzzy Inference Processor with Dynamic Analysis and Scheduling Capabilities

    Shih-Hsu HUANG  Jian-Yuan LAI  

     
    LETTER-Computer Components

      Page(s):
    2410-2416

    The most obvious architectural solution for high-speed fuzzy inference is to exploit the temporal and spatial parallelism inherent in fuzzy inference execution. In practice, however, the active rules in each fuzzy inference execution are often only a small part of the total rules. In this paper, we present a new architecture that uses fewer hardware resources by discarding non-active rules in an early pipeline stage. Compared with previous work, implementation data show that the proposed architecture achieves very good results in terms of both inference speed and chip area.

  • Properties of Role-Based Access Control in a Teaching Management System

    Kazushi TANIHIRA  Hiromi KOBAYASHI  

     
    LETTER-Educational Technology

      Page(s):
    2417-2421

    This paper presents properties of role-based access control obtained through the development of a prototype teaching management system. These properties concern the assignment of temporal constraints and the access control procedure in terms of the corresponding flow of the user's view, and are considered applicable to other information systems.

  • Hybrid Image Compression Scheme Based on PVQ and DCTVQ

    Zhe-Ming LU  Hui PEI  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    2422-2426

    An efficient hybrid image vector quantization (VQ) technique based on classification in the DCT domain is presented in this letter. The algorithm combines two kinds of VQ, predictive VQ (PVQ) and discrete cosine transform domain VQ (DCTVQ), and adopts a simple classifier that employs only three DCT coefficients of each 8×8 block. For each image block, the classifier switches to the PVQ coder if the block is relatively complex, and to the DCTVQ coder otherwise. Experimental results show that the proposed algorithm achieves higher PSNR values than ordinary VQ, PVQ, JPEG, and JPEG2000 at the same bit rate.
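
    The classifier's switch on a few low-order DCT coefficients can be sketched as below; the threshold and the choice of the three coefficients are illustrative guesses, not the letter's actual parameters.

```python
import math

def dct_block(block):
    """2-D DCT-II of an n x n block (direct O(n^4) formula, fine for 8x8)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            out[u][v] = cu * cv * s
    return out

def is_complex_block(block, threshold=20.0):
    """Route a block to PVQ ('complex') when the energy of the three
    lowest AC coefficients exceeds a threshold; otherwise route it
    to DCTVQ. Threshold and coefficient choice are illustrative."""
    d = dct_block(block)
    energy = abs(d[0][1]) + abs(d[1][0]) + abs(d[1][1])
    return energy > threshold

flat = [[128.0] * 8 for _ in range(8)]                 # uniform block
edge = [[0.0] * 4 + [255.0] * 4 for _ in range(8)]     # sharp vertical edge
```

    The flat block has zero AC energy and goes to the DCTVQ coder, while the edge block has a large low-frequency AC coefficient and is routed to PVQ.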

  • A Spatiotemporal Neuronal Filter for Channel Equalization and Video Restoration

    Elhassane IBNELHAJ  Driss ABOUTAJDINE  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    2427-2431

    In this paper we present a 3-D adaptive nonlinear filter, the 3-D adaptive CPWLN, based on the canonical piecewise-linear network with an LMS L-filter type of adaptation. The filter is used to equalize nonlinear channel effects and to remove impulsive, or mixed impulsive and additive white Gaussian, noise from video sequences. First, motion compensation is performed by a robust estimator; then a 3-D CPWLN LMS L-filter is applied. The overall combination adequately removes the undesired effects of the communication channel and noise. Computer simulations on real-world image sequences are included, and the algorithm yields promising results in terms of both the objective and subjective quality of the restored sequences.

  • Image Collector II: A System to Gather a Large Number of Images from the Web

    Keiji YANAI  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    2432-2436

    We propose a system, called Image Collector II, that gathers hundreds of images related to a set of keywords provided by a user from the World Wide Web. The Image Collector, which we proposed previously, can gather only one or two hundred images. We propose the following two improvements over our previous system in terms of the number of gathered images and their precision: (1) we extract words that appear with high frequency in all the HTML files in which the images from an initial gathering are embedded, and, using these words as keywords, carry out a second image gathering, which yields hundreds of images for one set of keywords; (2) since the precision of the gathered images decreases as more images are gathered, we introduce word vectors of the HTML files embedding the images into the image selection process, in addition to image feature vectors.
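
    Improvement (1), extracting frequent co-occurring words from the embedding HTML files to seed a second gathering, can be sketched as follows (a toy sketch, not the Image Collector II implementation):

```python
import re
from collections import Counter

def expansion_keywords(html_pages, seed_keywords, k=3):
    """Pick the k words that occur most frequently across the HTML
    pages embedding the gathered images, excluding the seed keywords,
    to drive a second round of image gathering."""
    counts = Counter()
    for page in html_pages:
        text = re.sub(r"<[^>]+>", " ", page).lower()  # strip HTML tags
        counts.update(re.findall(r"[a-z]+", text))
    for word in seed_keywords:
        counts.pop(word.lower(), None)  # never re-use a seed keyword
    return [word for word, _ in counts.most_common(k)]

# Hypothetical pages that embedded the initially gathered images.
pages = [
    "<html><body>sakura cherry blossom in tokyo</body></html>",
    "<p>cherry blossom festival tokyo</p>",
    "<div>cherry trees</div>",
]
kw = expansion_keywords(pages, ["cherry"], k=2)
```

    A real system would also filter HTML boilerplate and stop words before counting, otherwise words like "home" or "page" dominate the expansion set.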
