Keyword Search Result

[Keyword] image (1441 hits)

Results 161-180 of 1441

  • Evaluating Deep Learning for Image Classification in Adversarial Environment

    Ye PENG  Wentao ZHAO  Wei CAI  Jinshu SU  Biao HAN  Qiang LIU  

     
    PAPER-Artificial Intelligence, Data Mining
    Publicized: 2019/12/23    Vol: E103-D No:4    Page(s): 825-837

    Due to its superior performance, deep learning has been widely applied to various applications, including image classification, bioinformatics, and cybersecurity. Nevertheless, research on deep learning in adversarial environments is still at a preliminary stage. The emerging adversarial learning methods, e.g., generative adversarial networks, raise two vital questions: to what degree deep learning is secure in the presence of adversarial examples, and how to evaluate the performance of deep learning models in an adversarial environment so as to provide security advice ensuring that the selected deep-learning-based application system is resistant to adversarial examples. To answer these questions, we take image classification as an example application scenario and propose a framework for Evaluating Deep Learning for Image Classification (EDLIC) that enables comprehensive quantitative analysis. Moreover, we introduce a set of evaluation metrics to measure the performance of different attack and defense techniques. We then conduct extensive experiments on the performance of deep learning for image classification under different adversarial environments to validate the scalability of EDLIC. Finally, we give some advice on the selection of deep learning models for image classification based on these comparative results.
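
    As a rough illustration of the kind of quantitative evaluation such a framework performs, the Python sketch below measures clean and adversarial accuracy for a classifier under a chosen attack; the model, attack, and dataset interfaces are hypothetical placeholders, not part of EDLIC.

      def robust_accuracy(model, attack, dataset):
          """Accuracy on clean inputs and on their adversarial counterparts.

          Assumed interfaces (illustration only): model.predict(x) -> label,
          attack(model, x, y) -> adversarial version of x.
          """
          clean_ok = adv_ok = total = 0
          for x, y in dataset:
              total += 1
              clean_ok += int(model.predict(x) == y)                  # clean accuracy
              adv_ok += int(model.predict(attack(model, x, y)) == y)  # robust accuracy
          return clean_ok / total, adv_ok / total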

  • Posture Recognition Technology Based on Kinect

    Yan LI  Zhijie CHU  Yizhong XIN  

     
    PAPER-Human-computer Interaction
    Publicized: 2019/12/12    Vol: E103-D No:3    Page(s): 621-630

    To address the complexity of posture recognition with Kinect, a method of posture recognition using distance characteristics is proposed. Firstly, depth image data are collected by Kinect, and the three-dimensional coordinates of 20 skeleton joints are obtained. Secondly, according to the contribution of each joint to posture expression, the 60-dimensional Kinect skeleton joint data are transformed into a 24-dimensional vector of distance characteristics, which is normalized according to the human body structure. Thirdly, a static posture recognition method based on the shortest distance and a dynamic posture recognition method based on the minimum accumulative distance with dynamic time warping (DTW) are proposed. The experimental results show that the recognition rates of static postures, non-cross-subject dynamic postures, and cross-subject dynamic postures are 95.9%, 93.6%, and 89.8%, respectively. Finally, posture selection, Kinect placement, and comparisons with the literature are discussed, which provides a reference for Kinect-based posture recognition technology and interaction design.
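
    For illustration, a minimal Python sketch of the minimum-accumulative-distance matching with DTW described above, assuming each posture is a sequence of 24-dimensional distance-feature vectors; the function names and the nearest-template decision rule are assumptions, not the authors' implementation.

      import numpy as np

      def dtw_distance(query, template):
          """Minimum accumulative distance between two sequences of feature vectors."""
          n, m = len(query), len(template)
          acc = np.full((n + 1, m + 1), np.inf)
          acc[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(query[i - 1] - template[j - 1])
                  acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
          return acc[n, m]

      def classify(query, templates):
          """Return the template posture (dict key) with the smallest accumulative distance."""
          return min(templates, key=lambda name: dtw_distance(query, templates[name]))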

  • An Efficient Image to Sound Mapping Method Preserving Speech Spectral Envelope

    Yuya HOSODA  Arata KAWAMURA  Youji IIGUNI  

     
    LETTER-Digital Signal Processing
    Vol: E103-A No:3    Page(s): 629-630

    In this paper, we propose an image-to-sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking the inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded into the spectrogram to give speech intelligibility to the mapped sound. Specifically, we keep the amplitude spectra of the speech signal where its power is strong and embed the image brightness in the other frequency bands. Keeping the strong-power amplitude spectra preserves the speech spectral envelope and improves the speech quality of the mapped sound. The weak-power amplitude spectra of the mapped sound represent the image brightness, so the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
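
    A minimal Python sketch of the mapping idea, assuming the image and the speech STFT frames are given as arrays of the same (frequency x time) shape; the power threshold and the reuse of the speech phase are assumptions for illustration only.

      import numpy as np

      def image_to_sound(image, speech_stft, power_thresh=0.1):
          """Keep speech amplitude spectra where the speech power is strong
          (preserving the spectral envelope) and embed image brightness in the
          remaining bins, then map back to a waveform via inverse FFT per frame."""
          mag = np.abs(speech_stft)
          strong = mag > power_thresh * mag.max()
          amplitude = np.where(strong, mag, image)      # weak bins carry the image
          spectrum = amplitude * np.exp(1j * np.angle(speech_stft))
          frames = np.fft.ifft(spectrum, axis=0).real   # inverse FFT of each frame
          return frames.T.reshape(-1)                   # concatenate frames (no overlap)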

  • Nonparametric Distribution Prior Model for Image Segmentation

    Ming DAI  Zhiheng ZHOU  Tianlei WANG  Yongfan GUO  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2019/10/21    Vol: E103-D No:2    Page(s): 416-423

    In many real application scenarios of image segmentation involving limited and low-quality data, employing prior information can significantly improve the segmentation result. For example, the shape of the object is a common kind of prior information. In this paper, we introduce a new kind of prior information, which we call the prior distribution. On the basis of the nonparametric statistical active contour model, we propose a novel distribution prior model. Unlike traditional shape prior models, our model is not sensitive to the shape of the object boundary. Using the intensity distributions of objects and backgrounds as prior information simplifies the process of establishing and solving the model. The idea of constructing our energy function is as follows: during the contour curve convergence, the distribution difference between the inside and outside of the active contour is maximized, while the distribution difference between the inside/outside of the contour and the prior object/background is minimized. We present experimental results on a variety of synthetic and natural images. The experimental results demonstrate that, with the prior distribution information, both segmentation quality and speed can be effectively improved.
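
    As a toy illustration of the two competing distribution terms described above (not the paper's actual energy functional or optimization), the following Python sketch scores a candidate segmentation mask with Bhattacharyya similarities; the histogram binning and the specific combination are assumptions.

      import numpy as np

      def histogram(values, bins=64):
          h, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
          return h / (h.sum() + 1e-12)

      def bhattacharyya(p, q):
          """Similarity between two normalized distributions (1 means identical)."""
          return float(np.sum(np.sqrt(p * q)))

      def distribution_energy(image, inside, prior_obj, prior_bkg):
          """Penalize inside/outside similarity, and penalize dissimilarity of the
          inside/outside distributions from the prior object/background ones."""
          p_in, p_out = histogram(image[inside]), histogram(image[~inside])
          return (bhattacharyya(p_in, p_out)
                  + (1.0 - bhattacharyya(p_in, prior_obj))
                  + (1.0 - bhattacharyya(p_out, prior_bkg)))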

  • Virtual Address Remapping with Configurable Tiles in Image Processing Applications

    Jae Young HUR  

     
    PAPER-Computer System
    Publicized: 2019/10/17    Vol: E103-D No:2    Page(s): 309-320

    Conventional linear or tiled address maps can degrade performance and memory utilization when traffic patterns do not match the underlying address map. The address map is usually fixed at design time; accordingly, it is difficult to adapt it to a given application. Modern embedded systems usually accommodate memory management units (MMUs). As a result, depending on virtual address patterns, the system can suffer from performance overheads due to page table walks. To alleviate this overhead, we propose to cluster and rearrange tiles to construct an MMU-aware configurable address map. To construct the clustered tiled map, a generic tile number remapping algorithm is presented. In the presented scheme, the address map is configured based on an adaptive dimensioning algorithm. Considering image processing applications, a design, an analysis, an implementation, and simulations are presented. The results indicate that the proposed method can improve performance and memory utilization with moderate hardware costs.
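
    For background, a minimal Python sketch of a conventional fixed-size tiled address map (the baseline the paper improves on, not its clustered, MMU-aware remapping); the raster layout of tiles and of pixels within a tile is an assumption.

      def tiled_address(x, y, width, tile_w, tile_h):
          """Map pixel (x, y) of a width-pixel-wide image to a tiled address:
          tiles in raster order, and pixels inside each tile in raster order."""
          tiles_per_row = width // tile_w
          tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
          offset_in_tile = (y % tile_h) * tile_w + (x % tile_w)
          return tile_index * (tile_w * tile_h) + offset_in_tile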

  • Real-Time Image Processing Based on Service Function Chaining Using CPU-FPGA Architecture

    Yuta UKON  Koji YAMAZAKI  Koyo NITTA  

     
    PAPER-Network System
    Publicized: 2019/08/05    Vol: E103-B No:1    Page(s): 11-19

    Advanced information-processing services based on cloud computing are in great demand. However, users want to be able to customize cloud services for their own purposes. To provide image-processing services that can be optimized for each user's purpose, we propose a technique for chaining image-processing functions in a CPU-field programmable gate array (FPGA) coupled server architecture. One of the most important requirements for combining multiple image-processing functions on a network is low latency in server nodes. However, a large delay occurs in the conventional CPU-FPGA architecture due to the overheads of packet reordering, needed to ensure the correctness of image processing, and of data transfer between the CPU and FPGA at the application level. This paper presents a CPU-FPGA server architecture with a real-time packet reordering circuit for low-latency image processing. To confirm the efficiency of our idea, we evaluated the latency of histogram of oriented gradients (HOG) feature calculation as an offloaded image-processing function. The results show that the latency is about 26 times lower than that of the conventional CPU-FPGA architecture. Moreover, the throughput decreased by less than 3.7% under the worst-case condition, in which 90 percent of the packets are randomly swapped at a 40-Gbps input rate. Finally, we demonstrated that a real-time video monitoring service can be provided by combining image-processing functions using our architecture.
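
    A software stand-in (Python) for what the real-time reordering circuit does logically: buffer out-of-order packets and release them in sequence before a frame is handed to the image-processing function. The sequence-number interface is an assumption for illustration only.

      def reorder(packets):
          """Yield payloads in sequence-number order.
          `packets` is an iterable of (seq, payload) pairs, possibly swapped."""
          expected = 0
          buffer = {}
          for seq, payload in packets:
              buffer[seq] = payload
              while expected in buffer:       # release every in-order packet we hold
                  yield buffer.pop(expected)
                  expected += 1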

  • Image Identification of Encrypted JPEG Images for Privacy-Preserving Photo Sharing Services

    Kenta IIDA  Hitoshi KIYA  

     
    PAPER
    Publicized: 2019/10/25    Vol: E103-D No:1    Page(s): 25-32

    We propose an image identification scheme for double-compressed encrypted JPEG images that aims to identify the encrypted JPEG images generated from a given original JPEG image. To store images without any visually sensitive information on photo sharing services, encrypted JPEG images are generated by using a block-scrambling-based encryption method that has been proposed for Encryption-then-Compression systems with JPEG compression. In addition, feature vectors robust against JPEG compression are extracted from the encrypted JPEG images. The use of the image encryption and the feature vectors allows us to identify encrypted images recompressed multiple times. Moreover, the proposed scheme is designed to identify images re-encrypted with different keys. Simulation results show that the identification performance of the scheme is high even when images are recompressed and re-encrypted.

  • Good Group Sparsity Prior for Light Field Interpolation Open Access

    Shu FUJITA  Keita TAKAHASHI  Toshiaki FUJII  

     
    PAPER-Image
    Vol: E103-A No:1    Page(s): 346-355

    A light field, which is equivalent to a dense set of multi-view images, has various applications such as depth estimation and 3D display. One of the essential problems in light field applications is light field interpolation, i.e., view interpolation. The interpolation accuracy is enhanced by exploiting an inherent property of a light field. One example is that an epipolar plane image (EPI), which is a 2D subset of the 4D light field, consists of many lines, and these lines have almost the same slope in a local region. This structure induces a sparse representation in the frequency domain, where most of the energy resides on a line passing through the origin. On the basis of this observation, we propose a group sparsity prior suitable for light fields to fully exploit their line structure for interpolation. Specifically, we design the directional groups in the discrete Fourier transform (DFT) domain so that the groups can represent the concentration of the energy, and we thereby formulate the LF interpolation problem as an overlapping group lasso. We also introduce several techniques to improve the interpolation accuracy, such as applying a window function, determining group weights, expanding processing blocks, and merging blocks. Our experimental results show that the proposed method can achieve quality better than or comparable to state-of-the-art LF interpolation methods such as convolutional neural network (CNN)-based methods.
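
    To make the directional group idea concrete, here is a simplified Python sketch of a group-shrinkage step on DFT coefficients, using non-overlapping groups rather than the overlapping group lasso actually formulated in the paper; the group definition and threshold are assumptions.

      import numpy as np

      def group_soft_threshold(coeffs, groups, lam):
          """Shrink each directional group of DFT coefficients as a whole, so
          energy concentrated along a line through the origin survives while
          scattered energy is suppressed. `groups` holds flat index arrays."""
          out = coeffs.copy()
          for idx in groups:
              g = coeffs.flat[idx]
              norm = np.linalg.norm(g)
              scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
              out.flat[idx] = g * scale
          return out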

  • Improvement of the Quality of Visual Secret Sharing Schemes with Constraints on the Usage of Shares

    Mariko FUJII  Tomoharu SHIBUYA  

     
    PAPER
    Publicized: 2019/10/07    Vol: E103-D No:1    Page(s): 11-24

    A (k,n)-visual secret sharing scheme ((k,n)-VSSS) is a method of dividing a secret image into n images called shares, such that the original image can be restored simply by stacking at least k of them, without any complicated computation. In this paper, we consider a (2,2)-VSSS that shares two secret images at the same time using only two shares, and we investigate methods to improve the quality of the decoded images. More precisely, we consider a (2,2)-VSSS in which the first secret image is decoded by stacking the two shares in the usual way, while the second is decoded by stacking the two shares with one of them reversed. Since the shares must contain some subpixels that correspond inconsistently to pixels of the secret images, the decoded pixels do not agree with the corresponding pixels of the secret images, which causes serious degradation of the quality of the decoded images. To reduce this degradation, we propose several methods to construct shares that utilize an 8-neighbor Laplacian filter and halftoning. We then show that the proposed methods effectively improve the quality of the decoded images. Moreover, we demonstrate that the proposed methods can be naturally extended to a (2,2)-VSSS for RGB images.
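
    For readers unfamiliar with visual secret sharing, a basic (2,2)-VSSS with 1x2 subpixel expansion is sketched below in Python; this is the standard construction that the paper's two-image, reversible-stacking variant builds on, not the proposed method itself.

      import numpy as np

      def make_shares(secret, seed=None):
          """secret: binary array (1 = black). Stacking (OR-ing) the two shares
          reveals it: white pixels yield one black subpixel, black pixels two."""
          rng = np.random.default_rng(seed)
          h, w = secret.shape
          share1 = np.zeros((h, 2 * w), dtype=np.uint8)
          share2 = np.zeros((h, 2 * w), dtype=np.uint8)
          patterns = np.array([[1, 0], [0, 1]], dtype=np.uint8)
          for y in range(h):
              for x in range(w):
                  p = patterns[rng.integers(2)]
                  share1[y, 2 * x:2 * x + 2] = p
                  share2[y, 2 * x:2 * x + 2] = p if secret[y, x] == 0 else 1 - p
          return share1, share2, share1 | share2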

  • A Weighted Viewport Quality Metric for Omnidirectional Images

    Huyen T. T. TRAN  Trang H. HOANG  Phu N. MINH  Nam PHAM NGOC  Truong CONG THANG  

     
    LETTER
    Publicized: 2019/10/10    Vol: E103-D No:1    Page(s): 67-70

    Thanks to their ability to bring immersive experiences to users, Virtual Reality (VR) technologies have been gaining popularity in recent years. A key component in VR systems is omnidirectional content, which can provide 360-degree views of scenes. However, at a given time, only a portion of the full omnidirectional content, called the viewport, is displayed, corresponding to the user's current viewing direction. In this work, we first develop Weighted-Viewport PSNR (W-VPSNR), an objective quality metric for omnidirectional content. The proposed metric takes into account the foveation feature of the human visual system. Then, we build a subjective database consisting of 72 stimuli with spatially varying viewport quality. Using this database, we evaluate the proposed metric and four conventional metrics. Experimental results show that the W-VPSNR metric correlates well with the mean opinion scores (MOS) and outperforms the conventional metrics. We also find that the conventional metrics do not perform well for omnidirectional content.
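
    A minimal Python sketch of a weighted-viewport PSNR of the kind described above, with a hypothetical Gaussian foveation weighting around an assumed gaze point; the actual W-VPSNR weighting in the paper may differ.

      import numpy as np

      def foveation_weights(shape, gaze, sigma):
          """Gaussian fall-off around the gaze point (y, x); an assumption."""
          y, x = np.indices(shape)
          return np.exp(-((y - gaze[0]) ** 2 + (x - gaze[1]) ** 2) / (2.0 * sigma ** 2))

      def weighted_psnr(ref, test, weights, peak=255.0):
          """PSNR with per-pixel weights emphasizing the foveated region."""
          w = weights / weights.sum()
          wmse = np.sum(w * (ref.astype(float) - test.astype(float)) ** 2)
          return 10.0 * np.log10(peak ** 2 / wmse)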

  • Adaptive-Partial Template Update with Center-Shifting Recovery for High Frame Rate and Ultra-Low Delay Deformation Matching

    Songlin DU  Yuhao XU  Tingting HU  Takeshi IKENAGA  

     
    PAPER-Image
    Vol: E102-A No:12    Page(s): 1872-1881

    High frame rate and ultra-low delay matching systems play an important role in various human-machine interactive applications, which demand good performance in matching deformable and out-of-plane rotating objects. Although many algorithms have been proposed for deformation tracking and matching, few of them are suitable for hardware implementation due to complicated operations and large time consumption. This paper proposes a hardware-oriented template update and recovery method for a high frame rate and ultra-low delay deformation matching system. In the proposed method, a new template is generated in real time by partially updating the template descriptor and adding new keypoints simultaneously with the matching process at the pixel level (proposal #1), which avoids a large inter-frame delay. The size and shape of the region of interest (ROI) are made flexible, and the Hamming threshold used for brute-force matching is adjusted according to the pixel position and the flexible ROI (proposal #2), which solves the problem of template drift. The template is recovered from the previous one with a relative center-shifting vector when it is judged as lost via a region-wise difference check (proposal #3). Evaluation results indicate that the proposed method achieves real-time processing of 784 fps at a resolution of 640×480 on a field-programmable gate array (FPGA), with a delay of 0.808 ms/frame, and achieves satisfactory deformation matching results in comparison with other general methods.
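
    To illustrate the matching step that proposal #2 adjusts, here is a small Python sketch of brute-force matching of binary descriptors with a position-dependent Hamming threshold; the descriptor format and threshold handling are assumptions, not the FPGA implementation.

      import numpy as np

      def hamming(a, b):
          """Hamming distance between two binary descriptors stored as uint8 arrays."""
          return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

      def match(query_desc, template_desc, thresholds):
          """Accept the nearest template keypoint only if it is within the
          per-keypoint threshold (which proposal #2 varies with pixel position/ROI)."""
          matches = []
          for i, q in enumerate(query_desc):
              dists = [hamming(q, t) for t in template_desc]
              j = int(np.argmin(dists))
              if dists[j] <= thresholds[i]:
                  matches.append((i, j))
          return matches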

  • Image Regularization with Total Variation and Optimized Morphological Gradient Priors

    Shoya OOHARA  Mitsuji MUNEYASU  Soh YOSHIDA  Makoto NAKASHIZUKA  

     
    LETTER-Image
    Vol: E102-A No:12    Page(s): 1920-1924

    For image restoration, an image prior obtained from the morphological gradient has been proposed. In the field of mathematical morphology, optimizing the structuring element (SE) used for this morphological gradient with a genetic algorithm (GA) has also been proposed. In this paper, we introduce a new image prior, the sum of the morphological gradient and the total variation, into an image restoration problem to improve the restoration accuracy. The proposed image prior makes it possible to closely match the fitness to a quantitative evaluation such as the mean squared error. It also resolves artifacts caused by an SE that is unsuited to the image. An experiment shows the effectiveness of the proposed image restoration method.
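
    A small Python sketch of how the combined prior could be evaluated for a given image and structuring element (the GA-based SE optimization and the restoration solver themselves are not shown); the anisotropic TV definition is an assumption.

      import numpy as np
      from scipy import ndimage

      def prior_value(image, se):
          """Sum of total variation and the morphological gradient
          (grey dilation minus grey erosion with structuring element `se`)."""
          tv = np.sum(np.abs(np.diff(image, axis=0))) + np.sum(np.abs(np.diff(image, axis=1)))
          grad = ndimage.grey_dilation(image, footprint=se) - ndimage.grey_erosion(image, footprint=se)
          return tv + np.sum(grad)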

  • An Image Fusion Scheme for Single-Shot High Dynamic Range Imaging with Spatially Varying Exposures

    Chihiro GO  Yuma KINOSHITA  Sayaka SHIOTA  Hitoshi KIYA  

     
    PAPER-Image
    Vol: E102-A No:12    Page(s): 1856-1864

    This paper proposes a novel multi-exposure image fusion (MEF) scheme for single-shot high dynamic range imaging with spatially varying exposures (SVE). Single-shot imaging with SVE enables us not only to produce images without color-saturated regions from a single shot but also to avoid ghost artifacts in the produced images. However, the number of exposures is generally limited to two, and it is moreover difficult to determine the optimum exposure values before shooting. In the proposed scheme, a scene segmentation method is applied to the input multi-exposure images, and then the luminance of the input images is adjusted according to both the number of scenes and the relationship between exposure values and pixel values. The proposed luminance adjustment allows us to address these two issues. In this paper, we focus on dual-ISO imaging as one form of single-shot imaging. In an experiment, the proposed scheme is demonstrated to be effective for single-shot high dynamic range imaging with SVE compared with conventional MEF schemes with exposure compensation.

  • An Integrated Method to Remove Color Cast and Contrast Enhancement for Underwater Image Open Access

    Siaw-Lang WONG  Raveendran PARAMESRAN  Ibuki YOSHIDA  Akira TAGUCHI  

     
    PAPER-Image
    Vol: E102-A No:11    Page(s): 1524-1532

    Light scattering and absorption in water cause underwater images to be poorly contrasted, hazy, and dominated by a single color cast. A solution is to find methods that improve the quality of the image and eventually lead to better visualization. We propose an integrated approach using Adaptive Gray World (AGW) and Differential Gray-Levels Histogram Equalization for Color Images (DHECI) to remove the color cast as well as to improve the contrast and colorfulness of underwater images. AGW is an adaptive version of the GW method in which, apart from the global mean, the local mean of each channel of the image is also taken into consideration, and both are weighted before being combined. It is applied to remove the color cast; thereafter, DHECI is used to improve the contrast and colorfulness of the underwater image. The results of the proposed method are compared with seven state-of-the-art methods using qualitative and quantitative measures. The experimental results show that in most cases the proposed method produces better quantitative scores than the compared methods.
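
    A minimal Python sketch of the gray-world-with-local-mean idea described above (a plausible reading of AGW, not the authors' exact formulation); the weighting factor, window size, and gray target are assumptions.

      import numpy as np
      from scipy import ndimage

      def adaptive_gray_world(img, alpha=0.5, window=31):
          """Scale each channel by a weighted mix of its global mean and a
          local (box-filtered) mean so the single dominant cast is removed."""
          img = img.astype(np.float64)
          out = np.empty_like(img)
          gray_target = img.mean()
          for c in range(3):
              mean = alpha * img[..., c].mean() + (1 - alpha) * ndimage.uniform_filter(img[..., c], size=window)
              out[..., c] = img[..., c] * (gray_target / (mean + 1e-6))
          return np.clip(out, 0, 255).astype(np.uint8)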

  • Discriminative Convolutional Neural Network for Image Quality Assessment with Fixed Convolution Filters

    Motohiro TAKAGI  Akito SAKURAI  Masafumi HAGIWARA  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2019/08/09    Vol: E102-D No:11    Page(s): 2265-2266

    Current image quality assessment (IQA) methods require the original images for evaluation. Recently, however, IQA methods that use machine learning have been proposed; these methods automatically learn the relationship between the distorted image and its quality. In this paper, we propose a deep-learning-based IQA method that does not require a reference image. We show that a convolutional neural network with distortion prediction and fixed filters improves IQA accuracy.

  • Personalized Food Image Classifier Considering Time-Dependent and Item-Dependent Food Distribution Open Access

    Qing YU  Masashi ANZAWA  Sosuke AMANO  Kiyoharu AIZAWA  

     
    PAPER
    Publicized: 2019/06/21    Vol: E102-D No:11    Page(s): 2120-2126

    Since keeping food diaries can help people develop healthy eating habits, food image recognition is in high demand to reduce the effort of food recording. Previous studies have tackled this challenging domain with datasets having fixed numbers of samples and classes. However, in a real-world setting, it is impossible to include all foods in the database because the number of food classes is large and increases continually. In addition, inter-class similarity and intra-class diversity make recognition harder. In this paper, we address these problems by using deep convolutional neural network features to build a personalized classifier that incrementally learns the user's data and adapts to the user's eating habits. As a result, we achieved state-of-the-art accuracy in food image recognition through personalization with 300 food records per user.
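
    As a simplified stand-in for the personalization idea (not the authors' model), the Python sketch below incrementally maintains per-class means of deep CNN features and classifies a new record by the nearest class mean.

      import numpy as np

      class PersonalizedClassifier:
          """Nearest-class-mean classifier over CNN feature vectors that is
          updated incrementally with each of the user's food records."""
          def __init__(self):
              self.sums, self.counts = {}, {}

          def update(self, feature, label):        # learn one new food record
              self.sums[label] = self.sums.get(label, 0) + feature
              self.counts[label] = self.counts.get(label, 0) + 1

          def predict(self, feature):
              means = {k: self.sums[k] / self.counts[k] for k in self.sums}
              return min(means, key=lambda k: np.linalg.norm(feature - means[k]))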

  • Fast and Robust Disparity Estimation from Noisy Light Fields Using 1-D Slanted Filters

    Gou HOUBEN  Shu FUJITA  Keita TAKAHASHI  Toshiaki FUJII  

     
    PAPER
    Publicized: 2019/07/03    Vol: E102-D No:11    Page(s): 2101-2109

    Depth (disparity) estimation from a light field (a set of dense multi-view images) is currently attracting much research interest. This paper focuses on how to handle a noisy light field for disparity estimation, because, if left untreated, the noise deteriorates the accuracy of the estimated disparity maps. Several researchers have worked on this problem, e.g., by introducing disparity cues that are robust to noise. However, it is not easy to break the trade-off between accuracy and computational speed. To tackle this trade-off, we integrate a fast denoising scheme into a fast disparity estimation framework that works in the epipolar plane image (EPI) domain. Specifically, we found that a simple 1-D slanted filter is very effective for reducing noise while preserving the underlying structure of an EPI. Moreover, this simple filtering does not require elaborate parameter configurations for the target noise level. Experimental results, including real-world inputs, show that our method achieves good accuracy with much less computational time than some state-of-the-art methods.
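
    A minimal Python sketch of a 1-D slanted filter on an EPI, assuming a (view x position) layout and a known slope; the averaging window, border handling, and slope search are assumptions, not the paper's implementation.

      import numpy as np

      def slanted_filter(epi, slope, radius=2):
          """Average each EPI pixel along a line of the given slope (pixel shift
          per view), denoising while preserving the slanted line structure."""
          views, width = epi.shape
          out = np.zeros_like(epi, dtype=np.float64)
          for s in range(views):
              for u in range(width):
                  samples = [epi[s + t, int(round(u + t * slope))]
                             for t in range(-radius, radius + 1)
                             if 0 <= s + t < views and 0 <= round(u + t * slope) < width]
                  out[s, u] = np.mean(samples)
          return out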

  • Depth from Defocus Technique Based on Cross Reblurring

    Kazumi TAKEMURA  Toshiyuki YOSHIDA  

     
    PAPER
    Publicized: 2019/07/11    Vol: E102-D No:11    Page(s): 2083-2092

    This paper proposes a novel Depth From Defocus (DFD) technique based on the property that two images with different focus settings coincide if each is reblurred with the opposite focus setting, which is referred to as the “cross reblurring” property in this paper. Based on this property, the proposed technique estimates the block-wise depth profile of a target object by minimizing the mean squared error between the cross-reblurred images. Unlike existing DFD techniques, the proposed technique is free of lens parameters and independent of point spread function models. A compensation technique for possible pixel misalignment between the images is also proposed to improve the depth estimation accuracy. Experimental results and comparisons with other DFD techniques show the advantages of our technique.
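
    A tiny Python sketch of the cross-reblurring check for one block and one depth hypothesis. Note that the paper itself is free of PSF models; the Gaussian blur and the per-depth sigmas here are purely illustrative assumptions.

      import numpy as np
      from scipy import ndimage

      def cross_reblur_error(img_focus1, img_focus2, sigma1, sigma2):
          """Blur each image with the blur of the *opposite* focus setting for the
          hypothesized depth and return the MSE; at the true depth it is minimal."""
          a = ndimage.gaussian_filter(img_focus1.astype(float), sigma2)
          b = ndimage.gaussian_filter(img_focus2.astype(float), sigma1)
          return float(np.mean((a - b) ** 2))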

  • Cauchy Aperture and Perfect Reconstruction Filters for Extending Depth-of-Field from Focal Stack Open Access

    Akira KUBOTA  Kazuya KODAMA  Asami ITO  

     
    PAPER
    Publicized: 2019/08/16    Vol: E102-D No:11    Page(s): 2093-2100

    A pupil function of the aperture in an image capturing system is theoretically derived such that one can perfectly reconstruct the all-in-focus image through linear filtering of the focal stack. The perfect reconstruction filters are also designed based on the derived pupil function. The designed filters are space-invariant; hence the presented method does not require region segmentation. Simulation results using synthetic scenes show the effectiveness of the derived pupil function and the filters.

  • Phase-Based Periocular Recognition with Texture Enhancement Open Access

    Luis Rafael MARVAL-PÉREZ  Koichi ITO  Takafumi AOKI  

     
    PAPER-Image
    Vol: E102-A No:10    Page(s): 1351-1363

    Access control and surveillance applications such as walk-through security gates and immigration control points have a great demand for convenient and accurate biometric recognition in unconstrained scenarios with low user cooperation. The periocular region, which is a relatively new biometric trait, has been attracting much attention for recognizing individuals in such scenarios. This paper proposes a periocular recognition method that combines Phase-Based Correspondence Matching (PB-CM) with a texture enhancement technique. PB-CM has demonstrated high recognition performance for other biometric traits, e.g., face, palmprint, and finger-knuckle-print. However, a major limitation for the periocular region is that the performance of PB-CM degrades when the periocular skin has poor texture. We address this problem by applying texture enhancement and find that variance normalization of texture significantly improves the performance of periocular recognition using PB-CM. Experimental evaluation using three public databases demonstrates the advantage of the proposed method over conventional methods.
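
    A minimal Python sketch of local variance normalization as a texture enhancement step of the kind described above; the window size and epsilon are assumptions, and this is not necessarily the authors' exact enhancement.

      import numpy as np
      from scipy import ndimage

      def variance_normalize(img, window=15, eps=1e-6):
          """Subtract the local mean and divide by the local standard deviation so
          poorly textured periocular skin regions get comparable contrast."""
          img = img.astype(np.float64)
          mean = ndimage.uniform_filter(img, size=window)
          sq_mean = ndimage.uniform_filter(img ** 2, size=window)
          std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
          return (img - mean) / (std + eps)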

