Author Search Result

[Author] Kiyoharu AIZAWA (36 hits)

Showing 1-20 of 36 hits

  • Noisy Localization Annotation Refinement for Object Detection

    Jiafeng MAO  Qing YU  Kiyoharu AIZAWA  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2021/05/25
      Vol:
    E104-D No:9
      Page(s):
    1478-1485

    Well-annotated datasets are crucial to the training of object detectors. However, producing finely annotated datasets for object detection is extremely labor-intensive, so crowdsourcing is often used to create them, and the resulting datasets tend to contain incorrect annotations such as inaccurate localization bounding boxes. In this study, we highlight the problem of object detection with noisy bounding box annotations and show that such noisy annotations are harmful to the performance of deep neural networks. To address this problem, we further propose a framework that allows the network to correct the noisy annotations through alternating refinement. The experimental results demonstrate that the proposed framework significantly alleviates the influence of noise on model performance.
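
    As an illustration of the kind of localization noise discussed above, the following minimal Python sketch jitters a ground-truth box and measures how far it drifts from the clean annotation using IoU. The function names, box values, and noise level are our own assumptions for illustration, not code from the paper.

```python
import random

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def jitter(box, noise=0.2):
    """Simulate an inaccurate crowdsourced box by perturbing each corner."""
    w, h = box[2] - box[0], box[3] - box[1]
    return tuple(c + random.uniform(-noise, noise) * s
                 for c, s in zip(box, (w, h, w, h)))

clean = (50, 40, 150, 120)            # hypothetical ground-truth box
noisy = jitter(clean, noise=0.2)      # what a noisy annotation might look like
print(f"IoU between clean and noisy annotation: {iou(clean, noisy):.3f}")
```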

  • Cumulative Angular Distance Measure for Color Indexing

    Nagul COOHAROJANANONE  Kiyoharu AIZAWA  

     
    LETTER-Databases

      Vol:
    E84-D No:4
      Page(s):
    537-540

    In this paper we present a new color distance measure, namely the angular distance of cumulative histograms. The proposed measure is robust to variations in lighting. We also apply weight values to DR, DG, and DB according to a hue histogram of the query image. Moreover, we compare the measure against the popular cumulative L1 distance measure and show that our method produces more accurate and perceptually relevant results.
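
    A minimal sketch of the idea, assuming per-channel cumulative histograms and a cosine-based angular distance; the hue-dependent weighting is only hinted at here with fixed example weights, and all names are our own.

```python
import numpy as np

def cumulative_hist(channel, bins=64):
    """Normalized cumulative histogram of one color channel."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    c = np.cumsum(hist).astype(float)
    return c / c[-1]

def angular_distance(u, v):
    """Angle between two cumulative-histogram vectors (radians)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def color_distance(img_a, img_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-channel angular distances (DR, DG, DB)."""
    d = 0.0
    for ch, w in zip(range(3), weights):
        ca = cumulative_hist(img_a[..., ch])
        cb = cumulative_hist(img_b[..., ch])
        d += w * angular_distance(ca, cb)
    return d

# Toy usage with random "images" (H x W x 3, uint8); the darkened copy mimics a lighting change.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
b = (a * 0.8).astype(np.uint8)
print(color_distance(a, b))
```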

  • Evaluating the Stability of Deep Image Quality Assessment with Respect to Image Scaling

    Koki TSUBOTA  Hiroaki AKUTSU  Kiyoharu AIZAWA  

     
    LETTER-Image Processing and Video Processing

  Publicized:
    2022/07/25
      Vol:
    E105-D No:10
      Page(s):
    1829-1833

    Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). Among full-reference IQAs, traditional metrics such as PSNR and SSIM have long been used, and more recently IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been adopted. Image scaling is handled inconsistently among deep IQAs: some perform down-scaling as pre-processing, whereas others use the original image size. In this paper, we show that image scale is an influential factor affecting deep IQA performance. We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences IQA performance. We find that the most appropriate image scale is often neither the default nor the original size, and that the choice differs depending on the method and dataset. By visualizing the stability, we find that PieAPP is the most stable of the four deep IQAs.
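
    A sketch of how one might probe scale sensitivity for a full-reference metric; `metric_fn` stands in for any deep IQA model and is an assumption, not the paper's evaluation code. PyTorch is used only for resizing, and a plain MSE is substituted in the usage example.

```python
import torch
import torch.nn.functional as F

def evaluate_at_scales(metric_fn, ref, dist, scales=(0.25, 0.5, 1.0)):
    """Compute a full-reference metric after resizing both images.

    metric_fn: callable taking two NCHW tensors and returning a score.
    ref, dist: NCHW float tensors (reference and distorted image).
    """
    scores = {}
    for s in scales:
        r = F.interpolate(ref, scale_factor=s, mode='bilinear', align_corners=False)
        d = F.interpolate(dist, scale_factor=s, mode='bilinear', align_corners=False)
        scores[s] = float(metric_fn(r, d))
    return scores

# Toy usage with mean squared error standing in for a deep IQA.
mse = lambda a, b: torch.mean((a - b) ** 2)
ref = torch.rand(1, 3, 256, 256)
dist = ref + 0.05 * torch.randn_like(ref)
print(evaluate_at_scales(mse, ref, dist))
```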

  • Retrieval of Images Captured by Car Cameras Using Its Front and Side Views and GPS Data

    Toshihiko YAMASAKI  Takayuki ISHIKAWA  Kiyoharu AIZAWA  

     
    PAPER

      Vol:
    E90-D No:1
      Page(s):
    217-223

    Cars are now equipped with many sensors for safe driving. We have been working on storing driving-scene video together with such sensor data and detecting changes in street scenery. The detection results can be used for building a historical database of town scenery, automatically updating landmarks on maps, and so forth. To compare images for change detection, the first step is to retrieve images captured at nearly identical locations. Because Global Positioning System (GPS) data inherently contain noise, we cannot rely on GPS data alone for this retrieval. We therefore developed an image retrieval algorithm that employs edge-histogram-based image features in conjunction with a hierarchical search. By using edge histograms projected onto the vertical and horizontal axes, the retrieval is made robust to image variation caused by weather changes, clouds, obstacles, and so on. In addition, the matching cost is kept small by limiting the matching candidates through the hierarchical search. Experimental results demonstrate that the mean retrieval accuracy improved from 65% to 76% for front-view images and from 34% to 53% for side-view images.
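
    A minimal sketch of edge histograms projected onto the vertical and horizontal axes, compared with an L1 distance. OpenCV's Canny detector, the thresholds, and the normalization are our own choices for illustration; the paper's exact feature definition may differ.

```python
import cv2
import numpy as np

def edge_projection_features(gray):
    """Project a binary edge map onto the horizontal and vertical axes."""
    edges = cv2.Canny(gray, 100, 200)            # thresholds are illustrative
    col_hist = edges.sum(axis=0).astype(float)   # projection onto the x-axis
    row_hist = edges.sum(axis=1).astype(float)   # projection onto the y-axis
    # Normalize so the distance is insensitive to overall edge density.
    col_hist /= col_hist.sum() + 1e-9
    row_hist /= row_hist.sum() + 1e-9
    return col_hist, row_hist

def feature_distance(f_a, f_b):
    """L1 distance between the projection histograms of two images."""
    return sum(np.abs(a - b).sum() for a, b in zip(f_a, f_b))

# Toy usage with two random gray-level images of the same size.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, (120, 160), dtype=np.uint8)
img_b = rng.integers(0, 256, (120, 160), dtype=np.uint8)
print(feature_distance(edge_projection_features(img_a),
                       edge_projection_features(img_b)))
```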

  • FOREWORD

    Masayuki TANIMOTO  Kohichi SAKANIWA  Kiyoharu AIZAWA  Kazuyoshi OSHIMA  Kiyomi KUMOZAKI  Shuji TASAKA  Yoichi MAEDA  Takeshi MIZUIKE  Mikio YAMASHITA  Hideaki YAMANAKA  Koichiro WAKASUGI  Masaaki KATAYAMA  

     
    FOREWORD

      Vol:
    E81-B No:12
      Page(s):
    2253-2256
  • FOREWORD

    Yoshinori HATORI  Shuichi MATSUMOTO  Hiroshi KOTERA  Kiyoharu AIZAWA  Fumitaka ONO  Hideo KITAJIMA  Taizo KINOSHITA  Shigeru KUROE  Yutaka TANAKA  Hideo HASHIMOTO  Mitsuharu YANO  Toshiaki WATANABE  

     
    FOREWORD

      Vol:
    E79-B No:10
      Page(s):
    1413-1414
  • Capturing Wide-View Images with Uncalibrated Cameras

    Vincent van de LAAR  Kiyoharu AIZAWA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E83-D No:4
      Page(s):
    895-903

    This paper describes a scheme to capture a wide-view image using a camera setup with uncalibrated cameras. The setup is such that the optical axes are pointed in divergent directions. The direction of view of the resulting image can be chosen freely in any direction between these two optical axes. The scheme uses eight-parameter perspective transformations to warp the images, the parameters of which are obtained by using a relative orientation algorithm. The focal length and scale factor of the two images are estimated by using Powell's multi-dimensional optimization technique. Experiments on real images show the accuracy of the scheme.
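
    The eight-parameter perspective transformation mentioned above corresponds to a 3x3 homography whose last element is fixed to 1. Below is a minimal OpenCV sketch of warping one view by such a transformation onto a wider canvas; the matrix values and image sizes are made up for illustration, not estimated by the paper's relative orientation algorithm.

```python
import cv2
import numpy as np

# Hypothetical eight-parameter perspective transform (homography with H[2, 2] = 1).
H = np.array([[ 0.98, -0.05, 12.0],
              [ 0.04,  1.01, -8.0],
              [ 1e-4,  2e-5,  1.0]], dtype=np.float64)

img = np.full((240, 320, 3), 128, dtype=np.uint8)     # placeholder input view
warped = cv2.warpPerspective(img, H, (480, 240))      # map into the wide-view canvas
print(warped.shape)
```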

  • A New Image Sensor with Space Variant Sampling Control on a Focal Plane

    Yasuhiro OHTSUKA  Takayuki HAMAMOTO  Kiyoharu AIZAWA  

     
    PAPER

      Vol:
    E83-D No:7
      Page(s):
    1331-1337

    We propose a new sampling control system for an image sensor array. Unlike random-access pixels, the proposed sensor can read out spatially variant sampled pixels at high speed without supplying a pixel address for each access. The sensor has a memory array that stores the sampling positions, and the sampling positions can be changed dynamically by rewriting this memory, so any spatially varying sampling pattern can be achieved. A prototype with 64×64 pixels was fabricated in a 0.7 µm CMOS process.
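
    A software analogue of the idea, assuming the sampling-position memory is represented as an index array: the readout loop never receives per-pixel addresses, it simply walks the stored positions, and the pattern can be changed by rewriting the array. This is only a conceptual sketch, not a description of the chip's circuitry.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # full-resolution sensor frame

# "Sampling position memory": a list of (row, col) positions, rewritable at any time.
positions = np.array([(r, c) for r in range(0, 64, 8) for c in range(0, 64, 8)])

def read_out(frame, positions):
    """Read only the stored positions, in memory order, without per-access addressing."""
    return frame[positions[:, 0], positions[:, 1]]

coarse = read_out(frame, positions)            # uniform coarse sampling
# Rewrite the memory to sample a region of interest more densely.
positions = np.array([(r, c) for r in range(16, 32) for c in range(16, 32)])
dense_roi = read_out(frame, positions)
print(coarse.shape, dense_roi.shape)
```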

  • Model-Based Analysis Synthesis Coding of Videotelephone Images--Conception and Basic Study of Intelligent Image Coding--

    Hiroshi HARASHIMA  Kiyoharu AIZAWA  Takahiro SAITO  

     
    INVITED PAPER

      Vol:
    E72-E No:5
      Page(s):
    452-459

    This paper reviews recent trends in research on intelligent image coding technology, focusing on model-based analysis-synthesis coding. Intelligent image coding schemes will make it possible to realize epoch-making ultra-low-rate image transmission and so-called value-added visual telecommunications. To categorize the various image coding systems and examine their potential future applications, an approach to defining generations of image coding technologies is presented. Future-generation coding systems include model-based analysis-synthesis coding and knowledge-based intelligent coding. The latter half of the paper is devoted to the authors' recent work on a model-based analysis-synthesis coding system for facial images.

  • Recognition of Multiple Food Items in A Single Photo for Use in A Buffet-Style Restaurant Open Access

    Masashi ANZAWA  Sosuke AMANO  Yoko YAMAKATA  Keiko MOTONAGA  Akiko KAMEI  Kiyoharu AIZAWA  

     
    LETTER-Image Recognition, Computer Vision

  Publicized:
    2018/11/19
      Vol:
    E102-D No:2
      Page(s):
    410-414

    We investigate image recognition of multiple food items in a single photo, focusing on a buffet-restaurant application in which the menu changes at every meal and only a few images per class are available. After detecting the food areas, we perform hierarchical recognition. We evaluate our results against two baseline methods.

  • Summarization of 3D Video by Rate-Distortion Trade-off

    Jianfeng XU  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E90-D No:9
      Page(s):
    1430-1438

    3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, a key frame extraction method is developed using a rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed on the feature vectors as preprocessing, followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Based on an assumption of linearity, an R-D curve is generated in each shot, where the locations of the key frames are optimized. Finally, the R-D trade-off is achieved by optimizing a cost function with a Lagrange multiplier, where the number of key frames is optimized in each shot. Our system therefore automatically determines the best locations and number of key frames in the sense of the R-D trade-off. Experimental results show that the extracted key frames are compact and faithful to the original 3D video.
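
    A toy sketch of the Lagrangian trade-off described above: within one shot, the cost J(k) = D(k) + λ·R(k) is minimized over the number of key frames k. The distortion and rate functions below are illustrative stand-ins, not the paper's definitions.

```python
# Toy Lagrangian rate-distortion trade-off for choosing the number of key frames
# in one shot. distortion() and rate() are placeholders for illustration only.

SHOT_LENGTH = 120          # frames in the shot (hypothetical)
LAMBDA = 2.0               # Lagrange multiplier controlling the trade-off

def distortion(k):
    """Placeholder: distortion shrinks as more key frames are kept."""
    return SHOT_LENGTH / float(k)

def rate(k):
    """Placeholder: rate grows linearly with the number of key frames."""
    return float(k)

def best_num_keyframes(max_k=SHOT_LENGTH):
    """Minimize J(k) = D(k) + lambda * R(k) over k."""
    costs = {k: distortion(k) + LAMBDA * rate(k) for k in range(1, max_k + 1)}
    return min(costs, key=costs.get)

k_star = best_num_keyframes()
print(f"optimal number of key frames for lambda={LAMBDA}: {k_star}")
```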

  • Computational Sensors -- Vision VLSI

    Kiyoharu AIZAWA  

     
    INVITED SURVEY PAPER

      Vol:
    E82-D No:3
      Page(s):
    580-588

    A computational sensor (also called a smart sensor or vision chip) is a very small integrated system in which processing and sensing are unified on a single VLSI chip, designed for a specific target application. This paper surveys research activities on computational sensors, of which there have been quite a few proposals and implementations. First, the approaches are summarized from several points of view: advantages vs. disadvantages, neural vs. functional, architecture, analog vs. digital, local vs. global processing, imaging vs. processing, and new processing paradigms. Then several examples are introduced, covering spatial processing, temporal processing, A/D conversion, and programmable computational sensors. The paper ends with concluding remarks.

  • Watermarking Using Inter-Block Correlation: Extension to JPEG Coded Domain

    Yoonki CHOI  Kiyoharu AIZAWA  

     
    LETTER-Information Security

      Vol:
    E84-A No:3
      Page(s):
    893-897

    Digital watermarking schemes have been studied to address the problem of copyright enforcement. Previously, we proposed a method that uses the inter-block correlation of DCT coefficients. Its features are that the embedded watermark can be extracted without the original image or the parameters used in the embedding process, and that the amount of modification, i.e., the strength of the embedded watermark, depends on the local features of the image. This makes it difficult for a pirate to predict the positions in which the watermark signal is embedded. In this paper, we propose a method that can embed and extract the watermark at high speed by applying this watermarking method to the JPEG coded domain.
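
    A rough illustration of the general inter-block correlation idea only; the paper's exact embedding rule is not given here, and the coefficient position, block layout, and bit-reading rule below are assumptions. One bit is read per block pair by comparing a mid-frequency DCT coefficient of a block with the same coefficient of its right-hand neighbour.

```python
import cv2
import numpy as np

COEFF = (2, 1)   # assumed mid-frequency DCT position used for the comparison

def block_dct(gray, r, c, size=8):
    """DCT of one 8x8 block with its top-left corner at (r, c)."""
    block = gray[r:r + size, c:c + size].astype(np.float32)
    return cv2.dct(block)

def extract_bits(gray, size=8):
    """Read one bit per block pair by comparing a coefficient across neighbours."""
    bits = []
    h, w = gray.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - 2 * size + 1, 2 * size):
            left = block_dct(gray, r, c)[COEFF]
            right = block_dct(gray, r, c + size)[COEFF]
            bits.append(1 if left > right else 0)
    return bits

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
print(extract_bits(img)[:8])
```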

  • Robust Object-Based Watermarking Using Feature Matching

    Viet-Quoc PHAM  Takashi MIYAKI  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Application Information Security

      Vol:
    E91-D No:7
      Page(s):
    2027-2034

    We present a robust object-based watermarking algorithm using the scale-invariant feature transform (SIFT) in conjunction with a data embedding method based on Discrete Cosine Transform (DCT). The message is embedded in the DCT domain of randomly generated blocks in the selected object region. To recognize the object region after being distorted, its SIFT features are registered in advance. In the detection scheme, we extract SIFT features from the distorted image and match them with the registered ones. Then we recover the distorted object region based on the transformation parameters obtained from the matching result using SIFT, and the watermarked message can be detected. Experimental results demonstrated that our proposed algorithm is very robust to distortions such as JPEG compression, scaling, rotation, shearing, aspect ratio change, and image filtering.
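
    A minimal OpenCV sketch of the detection side, assuming a SIFT-match-then-estimate pipeline: match SIFT features between the registered object image and the distorted image, then estimate the transformation that recovers the object region. The ratio-test threshold and RANSAC parameters are assumptions, and the helper name is our own.

```python
import cv2
import numpy as np

def recover_object_region(registered, distorted):
    """Match SIFT features and estimate the geometric transform between two images."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(registered, None)
    kp2, des2 = sift.detectAndCompute(distorted, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]  # ratio test
    if len(good) < 4:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Transform mapping the registered object region into the distorted image;
    # its inverse can undo the distortion before reading the embedded message.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

# Usage (with real grayscale images):
# H = recover_object_region(cv2.imread('object.png', 0), cv2.imread('attacked.png', 0))
```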

  • Detection and Tracking of Facial Features by Using Edge Pixel Counting and Deformable Circular Template Matching

    Liyanage C. DE SILVA  Kiyoharu AIZAWA  Mitsutoshi HATORI  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E78-D No:9
      Page(s):
    1195-1207

    In this paper, face feature detection and tracking are discussed using methods called edge pixel counting and deformable circular template matching. Instead of utilizing color or gray-scale information of the facial image, the proposed edge pixel counting method uses edge information to estimate face feature positions such as the eyes, nose, and mouth, using a variable-size face feature template whose initial size is predetermined from a facial image database. The method is robust in the sense that detection is possible for facial images with different skin colors and facial orientations. Subsequently, deformable circular template matching determines the two iris positions of the face, which are then used in edge pixel counting to track the features in the next frame. Although feature tracking using gray-scale template matching often fails when the inter-frame correlation around the feature areas is very low due to facial expression changes (such as talking, smiling, or eye blinking), feature tracking using edge pixel counting can track facial features reliably. Experimental results demonstrate the effectiveness of the proposed method.

  • Subband Image Coding with Biorthogonal Wavelets

    Cha Keon CHEONG  Kiyoharu AIZAWA  Takahiro SAITO  Mitsutoshi HATORI  

     
    PAPER-Image Coding and Compression

      Vol:
    E75-A No:7
      Page(s):
    871-881

    In this paper, subband image coding with symmetric biorthogonal wavelet filters is studied. To implement the symmetric biorthogonal wavelet bases, we use the Laplacian Pyramid Model (LPM) and the trigonometric polynomial solution method. These symmetric biorthogonal wavelet bases are used to form the filters of each subband, and the filter coefficients are optimized with respect to coding efficiency. From this optimization, we show that values of a in the LPM generating kernel in the range of 0.7 to 0.75 give the best coding efficiency. We also present an optimal bit allocation method based on the characteristics of the reconstruction filters; the step size of each subband's uniform quantizer is determined by this bit allocation method. The coding efficiency of the symmetric biorthogonal wavelet filter is compared with that of other filters: QMF, SSKF, and the orthonormal wavelet filter. Simulation results demonstrate that the symmetric biorthogonal wavelet filter is useful as a basis for image analysis/synthesis filters and can give better coding efficiency than the other filters.
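
    For reference, a Laplacian-pyramid generating kernel parameterized by a is commonly written in the five-tap Burt-Adelson form; assuming that parameterization (the paper may use a different one), the sketch below builds the kernel and the corresponding separable 2-D filter for a value in the reported 0.7-0.75 range.

```python
import numpy as np

def lpm_kernel(a):
    """Five-tap generating kernel, Burt-Adelson parameterization:
    w = [1/4 - a/2, 1/4, a, 1/4, 1/4 - a/2], which sums to 1 for any a."""
    c = 0.25 - a / 2.0
    return np.array([c, 0.25, a, 0.25, c])

a = 0.7                       # within the 0.7-0.75 range reported in the paper
w = lpm_kernel(a)
w2d = np.outer(w, w)          # separable 2-D analysis filter
print(w, w.sum(), w2d.shape)
```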

  • 3-D Modeling of Real World by Fusing Multi-View Range Data and Texture Images

    Conny GUNADI  Hiroyuki SHIMIZU  Kazuya KODAMA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:5
      Page(s):
    947-955

    The construction of large-scale virtual environments is gaining attention for applications such as virtual malls, virtual sightseeing, and tele-presence. This paper presents a framework for building a realistic virtual environment with a geometry-based approach. We propose an algorithm to construct a realistic 3-D model from multi-view range data and multi-view texture images. The proposed method adopts the results of region segmentation of range images in several phases of the modeling process. It is shown that the relations obtained from region segmentation are quite effective in improving the results of registration as well as mesh merging.

  • SIFT-Based Non-blind Watermarking Robust to Non-linear Geometrical Distortions

    Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E96-D No:6
      Page(s):
    1368-1375

    This paper presents a non-blind watermarking technique that is robust to non-linear geometric distortion attacks. This is one of the most challenging problems for copyright protection of digital content because it is difficult to estimate the distortion parameters for the embedded blocks. In our proposed scheme, the locations of the blocks are recorded as translation parameters from multiple Scale-Invariant Feature Transform (SIFT) feature points. This method is based on two assumptions: SIFT features are robust to non-linear geometric distortion, and even such non-linear distortion can be regarded as “linear” distortion in local regions. We conducted experiments using 149,800 images (7 standard images and 100 images downloaded from Flickr, 10 different messages, 10 different embedding block patterns, and 14 attacks). The results show that the watermark detection performance is drastically improved, while the baseline method achieves only chance-level accuracy.

  • Estimation of Semantic Impressions from Portraits

    Mari MIYATA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

  Publicized:
    2021/03/18
      Vol:
    E104-D No:6
      Page(s):
    863-872

    In this paper, we present a novel portrait impression estimation method using nine pairs of semantic impression words: bitter-majestic, clear-pure, elegant-mysterious, gorgeous-mature, modern-intellectual, natural-mild, sporty-agile, sweet-sunny, and vivid-dynamic. In the first part of the study, we analyzed the relationship between the facial features in deformed portraits and the nine semantic impression word pairs over a large dataset, which we collected by a crowdsourcing process. In the second part, we leveraged the knowledge from the results of the analysis to develop a ranking network trained on the collected data and designed to estimate the semantic impression associated with a portrait. Our network demonstrated superior performance in impression estimation compared with current state-of-the-art methods.

  • Vision Chip for Very Fast Detection of Motion Vectors: Design and Implementation

    Zheng LI  Kiyoharu AIZAWA  

     
    PAPER-Imaging Circuits and Algorithms

      Vol:
    E82-C No:9
      Page(s):
    1739-1748

    This paper gives a detailed presentation of a "vision chip" for very fast detection of motion vectors. The chip consists of a parallel pixel array and column-parallel block-matching processors. Each pixel of the array contains a photo detector, an edge detector, and 4 bits of memory. To detect motion vectors, the gray-level image is first binarized by the edge detectors, and the binary edge data are then used by the block-matching processors. Block matching takes place locally within each pixel and globally across columns. The chip can create a dense motion field in which a vector is assigned to each pixel by overlapping 2×2 target blocks. A prototype with 16×16 pixels and four block-matching processors has been designed and implemented, and preliminary results obtained with the prototype are shown.
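
    A software analogue of the matching step: after binarizing with an edge detector, block matching on binary data reduces to counting mismatching pixels (an XOR/Hamming cost). The chip overlaps 2×2 target blocks, but this toy sketch uses a larger block and its own search-window size so the minimum is unambiguous; all parameters are assumptions.

```python
import numpy as np

def binary_block_match(prev_edges, curr_edges, top, left, block=8, search=4):
    """Find the displacement of a binary block by minimizing the Hamming cost."""
    target = curr_edges[top:top + block, left:left + block]
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > prev_edges.shape[0] \
                    or c + block > prev_edges.shape[1]:
                continue
            cand = prev_edges[r:r + block, c:c + block]
            cost = np.count_nonzero(cand ^ target)   # XOR mismatch count
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv

rng = np.random.default_rng(0)
prev = rng.integers(0, 2, (16, 16), dtype=np.uint8)    # binary edge map, frame t-1
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))        # frame t, shifted by (1, 2)
print(binary_block_match(prev, curr, top=4, left=4))   # expected displacement (-1, -2)
```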

