Keyword Search Result

[Keyword] image (1441 hits)

Showing results 121-140 of 1441

  • Maritime Target Detection Based on Electronic Image Stabilization Technology of Shipborne Camera

    Xiongfei SHAN  Mingyang PAN  Depeng ZHAO  Deqiang WANG  Feng-Jang HWANG  Chi-Hua CHEN  

     
    PAPER-Artificial Intelligence, Data Mining
    Publicized: 2021/04/02  Vol: E104-D No:7  Page(s): 948-960

    During the detection of maritime targets, jitter of the shipborne camera usually causes video instability and the false or missed detection of targets. To tackle this problem, a novel algorithm for maritime target detection based on electronic image stabilization technology is proposed in this study. The algorithm mainly comprises three models, namely the points line model (PLM), the points classification model (PCM), and the image classification model (ICM). The feature points (FPs) are first classified by the PLM, and stable videos as well as target contours are obtained by the PCM. The smallest bounding rectangles of the target contours are then generated as candidate bounding boxes (bboxes) and sent to the ICM for classification. In the experiments, the ICM, which is constructed based on a convolutional neural network (CNN), is trained and its effectiveness is verified. Our experimental results demonstrate that the proposed algorithm outperforms the benchmark models in all the common metrics, reducing the mean square error (MSE) by at least 47.87% and improving the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and mean average precision (mAP) by at least 8.66%, 6.94%, and 5.75%, respectively. The proposed algorithm is superior to state-of-the-art techniques in both image stabilization and target ship detection, and provides reliable technical support for the visual development of unmanned ships.
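
    As a rough illustration of two of the metrics cited above (a sketch, not the authors' code), MSE and PSNR between a reference frame and a stabilized frame can be computed as follows; SSIM and mAP need fuller implementations such as those in scikit-image and standard detection toolkits.

      import numpy as np

      def mse(ref, test):
          # mean squared error between two same-sized 8-bit frames
          ref = ref.astype(np.float64)
          test = test.astype(np.float64)
          return np.mean((ref - test) ** 2)

      def psnr(ref, test, peak=255.0):
          # peak signal-to-noise ratio in dB; higher means less distortion
          e = mse(ref, test)
          return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)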

  • Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM

    Shan HE  Yuanyao LU  Shengnan CHEN  

     
    PAPER-Artificial Intelligence, Data Mining
    Publicized: 2021/04/19  Vol: E104-D No:7  Page(s): 941-947

    The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods from both fields: by building an end-to-end encoder-decoder model, description performance can be greatly improved. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on the evaluation metrics, the model proposed in this paper can effectively generate image captions, and its performance is better than that of classic image captioning models such as neural image annotation models.
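
    A minimal encoder-decoder sketch of this kind of architecture is given below; the branch definitions, dimensions, and the use of a plain unidirectional LSTM (rather than the paper's Bi-LSTM) are illustrative assumptions, not the authors' model.

      import torch
      import torch.nn as nn

      class MultiBranchEncoder(nn.Module):
          # toy multi-branch CNN: two parallel conv branches with different
          # kernel sizes, concatenated and pooled to one feature vector
          def __init__(self, feat_dim=256):
              super().__init__()
              self.branch3 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
              self.branch5 = nn.Sequential(nn.Conv2d(3, 32, 5, padding=2), nn.ReLU())
              self.pool = nn.AdaptiveAvgPool2d(1)
              self.fc = nn.Linear(64, feat_dim)

          def forward(self, img):                      # img: (B, 3, H, W)
              f = torch.cat([self.branch3(img), self.branch5(img)], dim=1)
              return self.fc(self.pool(f).flatten(1))  # (B, feat_dim)

      class CaptionDecoder(nn.Module):
          # RNN decoder: the image feature is fed as the first "word" embedding
          def __init__(self, vocab, feat_dim=256, hid=256):
              super().__init__()
              self.embed = nn.Embedding(vocab, feat_dim)
              self.rnn = nn.LSTM(feat_dim, hid, batch_first=True)
              self.out = nn.Linear(hid, vocab)

          def forward(self, feat, captions):           # captions: (B, T) token ids
              x = torch.cat([feat.unsqueeze(1), self.embed(captions)], dim=1)
              h, _ = self.rnn(x)
              return self.out(h)                       # (B, T+1, vocab) logits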

  • Scene Adaptive Exposure Time Control for Imaging and Apparent Motion Sensor Open Access

    Misaki SHIKAKURA  Yusuke KAMEDA  Takayuki HAMAMOTO  

     
    LETTER
    Publicized: 2021/01/07  Vol: E104-A No:6  Page(s): 907-911

    This paper reports the evolution and application potential of image sensors equipped with high-speed brightness gradient sensors. We propose an adaptive exposure time control method that uses the apparent motion estimated by this sensor, and evaluate the results for changes in illuminance and for global/local motion.

  • Image Enhancement in 26GHz-Band 1-Bit Direct Digital RF Transmitter Using Manchester Coding

    Junhao ZHANG  Masafumi KAZUNO  Mizuki MOTOYOSHI  Suguru KAMEDA  Noriharu SUEMATSU  

     
    PAPER-Wireless Communication Technologies
    Publicized: 2020/12/03  Vol: E104-B No:6  Page(s): 654-663

    In this paper, we propose a direct digital RF transmitter with a 1-bit band-pass delta-sigma modulator (BP-DSM) that uses high-order image components in the 7th Nyquist zone with Manchester coding for microwave and millimeter-wave applications. Compared with conventional non-return-to-zero (NRZ) coding, in which the high-order image components of the 1-bit BP-DSM are severely attenuated by the sinc envelope, the proposed 1-bit direct digital RF transmitter with Manchester coding improves the output power and signal-to-noise ratio (SNR) of the image components in specific Nyquist zones, namely the (4n-1)th and (4n-2)th, which is confirmed by calculation of the power spectral density. Measurements are made to compare the output power and SNR of three types of 1-bit digital-to-analog converter (DAC) signals: NRZ, 50%-duty return-to-zero (RZ), and Manchester coding. Using a 1 Vpp/8 Gbps DAC output, the 1-bit signal with Manchester coding shows the highest output power of -20.3 dBm and an SNR of 40.3 dB in the 7th Nyquist zone (26 GHz) under CW conditions. As a result, compared with NRZ and RZ coding, the output power in the 7th Nyquist zone is improved by 8.1 dB and 6 dB, respectively, while the SNR is improved by 7.6 dB and 4.9 dB, respectively. Under the 5 Mbps-QPSK condition, the 1-bit signal with Manchester coding shows the lowest error vector magnitude (EVM) of 2.4% and the highest adjacent channel leakage ratio (ACLR) of 38.2 dB with the highest output power of -18.5 dBm in the 7th Nyquist zone (26 GHz), compared with NRZ and 50%-duty RZ coding. The measurement and simulation results for the image component of the 1-bit signal in the 7th Nyquist zone (26 GHz) are consistent with the calculation results.
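
    A small worked example of the Nyquist-zone arithmetic behind the 26 GHz figure, assuming for illustration that the 8 Gbps bit rate corresponds to an 8 GHz sampling rate and that the baseband signal is centred at 2 GHz (both assumptions, not values taken from the paper):

      fs = 8.0   # assumed sampling rate in GHz (from the 8 Gbps 1-bit stream)
      f0 = 2.0   # assumed baseband centre frequency in GHz

      def nyquist_zone_edges(n, fs):
          # zone n spans [(n-1)*fs/2, n*fs/2]
          return (n - 1) * fs / 2, n * fs / 2

      def image_freqs(f0, fs, k_max=4):
          # sampling replicas of a tone at f0 appear at k*fs - f0 and k*fs + f0
          return sorted(abs(k * fs + s * f0) for k in range(1, k_max + 1) for s in (-1, 1))

      print(nyquist_zone_edges(7, fs))                               # (24.0, 28.0) GHz
      print([f for f in image_freqs(f0, fs) if 24.0 <= f <= 28.0])   # [26.0] GHz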

  • Hyperspectral Image Denoising Using Tensor Decomposition under Multiple Constraints

    Zhen LI  Baojun ZHAO  Wenzheng WANG  Baoxian WANG  

     
    LETTER-Image
    Publicized: 2020/12/01  Vol: E104-A No:6  Page(s): 949-953

    Hyperspectral images (HSIs) are generally susceptible to various types of noise, such as Gaussian and stripe noise. Recently, numerous denoising algorithms have been proposed to recover HSIs. However, those approaches cannot use spectral information efficiently and are weak at stripe noise removal. Here, we propose a tensor decomposition method with two different constraints to remove mixed noise from HSIs. For an HSI cube, we first employ the tensor singular value decomposition (t-SVD) to effectively preserve its low-rank information. Considering the continuity of HSI spectra, we design a simple smoothness constraint, using Tikhonov regularization, for the tensor decomposition to enhance the denoising performance. Moreover, we design a new unidirectional total variation (TV) constraint to filter the stripe noise from HSIs. This strategy achieves better performance in preserving image details than the original TV models. The developed method is evaluated on both synthetic and real noisy HSIs, and shows favorable results.
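
    A minimal sketch of a unidirectional TV penalty of the kind mentioned above, written under the assumption that differences are taken along a single spatial axis chosen to match the stripe orientation; this is an illustration, not the paper's exact formulation.

      import numpy as np

      def unidirectional_tv(cube, axis=0):
          # anisotropic total variation along one spatial axis of an HSI cube
          # (rows x cols x bands); which axis to penalise depends on the
          # stripe orientation
          return np.abs(np.diff(cube, axis=axis)).sum()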

  • Domain Adaptive Cross-Modal Image Retrieval via Modality and Domain Translations

    Rintaro YANAGI  Ren TOGO  Takahiro OGAWA  Miki HASEYAMA  

     
    PAPER
    Publicized: 2020/11/30  Vol: E104-A No:6  Page(s): 866-875

    Various cross-modal retrieval methods that can retrieve images related to a query sentence without text annotations have been proposed. Although these methods achieve a high level of retrieval performance, they have been developed for a single-domain retrieval setting. When the retrieval candidate images come from various domains, their retrieval performance may degrade. To deal with this problem, we propose a new domain adaptive cross-modal retrieval method. By translating the modality and domains of a query and candidate images, our method can accurately retrieve the desired images in a different-domain retrieval setting. Experimental results on clipart and painting datasets showed that the proposed method has better retrieval performance than other conventional and state-of-the-art methods.

  • Multiclass Dictionary-Based Statistical Iterative Reconstruction for Low-Dose CT

    Hiryu KAMOSHITA  Daichi KITAHARA  Ken'ichi FUJIMOTO  Laurent CONDAT  Akira HIRABAYASHI  

     
    PAPER-Numerical Analysis and Optimization
    Publicized: 2020/10/06  Vol: E104-A No:4  Page(s): 702-713

    This paper proposes a high-quality computed tomography (CT) image reconstruction method from low-dose X-ray projection data. A state-of-the-art method, proposed by Xu et al., exploits dictionary learning for image patches. This method generates an overcomplete dictionary from patches of standard-dose CT images and reconstructs low-dose CT images by minimizing the sum of a data fidelity term and a regularization term based on sparse representations with the dictionary. However, it does not take the characteristics of each patch, such as textures or edges, into account. In this paper, we propose to classify all patches into several classes and to use an individual dictionary with an individual regularization parameter for each class. Furthermore, for fast computation, we impose orthogonality on the column vectors of each dictionary. Since similar patches are collected in the same cluster, accuracy degradation caused by the orthogonality constraint hardly occurs. Our simulations show that the proposed method outperforms the state-of-the-art method in terms of both accuracy and speed.
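
    The computational benefit of orthogonal dictionary columns can be sketched as follows: when D has orthonormal columns, the l0-regularized sparse code of a patch is obtained by hard-thresholding D^T x, with no iterative pursuit. This is a generic illustration under that assumption, not the authors' implementation.

      import numpy as np

      def sparse_code_orthogonal(patch_vec, D, thresh):
          # D: (n, k) with orthonormal columns (D.T @ D = I_k); the
          # l0-penalised code is then the hard-thresholded analysis
          # coefficients, with no OMP-style pursuit needed
          coeffs = D.T @ patch_vec
          coeffs[np.abs(coeffs) < thresh] = 0.0
          return coeffs

      def reconstruct(coeffs, D):
          return D @ coeffs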

  • Backbone Alignment and Cascade Tiny Object Detecting Techniques for Dolphin Detection and Classification

    Yih-Cherng LEE  Hung-Wei HSU  Jian-Jiun DING  Wen HOU  Lien-Shiang CHOU  Ronald Y. CHANG  

     
    PAPER-Image
    Publicized: 2020/09/29  Vol: E104-A No:4  Page(s): 734-743

    Automatic tracking and classification are essential for studying the behaviors of wild animals. Dynamic far-shot photos, occlusion, protective coloration, and irregular background noise all interfere with designing a computerized algorithm that reduces the human labeling effort. Moreover, wild dolphin images are hard to acquire through on-the-spot investigations, which involve long waiting times, and fixed cameras can hardly be set up to monitor dolphins on the ocean automatically over several days. Detecting and classifying a dolphin accurately in polluted photos with a single off-the-shelf deep learning method and a small dataset is therefore a challenging task. In this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection to handle small-object problems, and develop visualization to backbone based classification (V2BC) for removing noise, highlighting dolphin features, and classifying the dolphin's identity. The architecture of CSOD consists of a P-net and an F-net. The P-net uses a crude Yolov3 detector as its core network to predict all regions of interest (ROIs) on lower-resolution images. The more robust F-net is then applied to the ROIs captured from the high-resolution photos to overcome the limitations of a single detector. Moreover, the V2BC method focuses on extracting significant regions of an occluded dolphin and performs post-processing that references the dolphin's backbone to facilitate classification. The proposed algorithm is compared with state-of-the-art methods, including Faster R-CNN and Yolov3 for detection and AlexNet, VGG, and ResNet for classification. All experiments show that the proposed algorithm based on CSOD and V2BC achieves excellent performance in dolphin detection and classification. Compared with related classification works, the accuracy of the proposed design is over 14% higher, and the proposed CSOD detection system performs 42% better than the original Yolov3 architecture.
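
    A framework-agnostic sketch of the coarse-to-fine cascade idea described above; coarse_detector and fine_detector are hypothetical callables standing in for the P-net and F-net, and the box format, downscale factor, and margins are assumptions for illustration.

      def cascade_detect(image, coarse_detector, fine_detector, downscale=4, pad=16):
          # detectors are assumed to return [(x1, y1, x2, y2, score), ...]
          small = image[::downscale, ::downscale]            # cheap low-resolution pass
          results = []
          for (x1, y1, x2, y2, _) in coarse_detector(small):
              # map the coarse ROI back to full resolution, with a safety margin
              X1 = max(0, x1 * downscale - pad)
              Y1 = max(0, y1 * downscale - pad)
              X2 = min(image.shape[1], x2 * downscale + pad)
              Y2 = min(image.shape[0], y2 * downscale + pad)
              crop = image[Y1:Y2, X1:X2]
              for (u1, v1, u2, v2, s) in fine_detector(crop):  # refined pass
                  results.append((X1 + u1, Y1 + v1, X1 + u2, Y1 + v2, s))
          return results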

  • Robust Blind Watermarking Algorithm Based on Contourlet Transform with Singular Value Decomposition

    Lei SONG  Xue-Cheng SUN  Zhe-Ming LU  

     
    LETTER-Cryptography and Information Security
    Publicized: 2020/09/11  Vol: E104-A No:3  Page(s): 640-643

    In this letter, we propose a blind and robust multiple watermarking scheme using the Contourlet transform and singular value decomposition (SVD). The host image is first decomposed by the Contourlet transform. The singular values of the Contourlet coefficient blocks are adopted to embed the watermark information, and a fast calculation method is proposed to avoid the heavy computation of the SVD. The watermark is embedded in both low- and high-frequency Contourlet coefficients to increase the robustness against various attacks. Moreover, the proposed scheme intrinsically exploits the characteristics of the human visual system and thus can ensure the invisibility of the watermark. Simulation results show that the proposed scheme outperforms other related methods in terms of both robustness and execution time.
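
    For illustration, a generic way to embed one watermark bit into the largest singular value of a coefficient block is sketched below (quantization-index modulation with a full SVD); the paper's scheme instead uses a fast calculation that avoids the full decomposition, so this is not the authors' method.

      import numpy as np

      def embed_bit(block, bit, step=8.0):
          # quantise the largest singular value into an even or odd cell
          # depending on the bit value; extraction checks which cell it is in
          U, s, Vt = np.linalg.svd(block, full_matrices=False)
          q = np.floor(s[0] / step)
          s[0] = (q + (0.25 if bit == 0 else 0.75)) * step
          return U @ np.diag(s) @ Vt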

  • GAN-Based Image Compression Using Mutual Information for Optimizing Subjective Image Similarity

    Shinobu KUDO  Shota ORIHASHI  Ryuichi TANIDA  Seishi TAKAMURA  Hideaki KIMATA  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2020/12/02  Vol: E104-D No:3  Page(s): 450-460

    Recently, image compression systems based on convolutional neural networks, which use flexible nonlinear analysis and synthesis transformations, have been developed to improve the restoration accuracy of decoded images. Although methods optimized with objective metrics such as peak signal-to-noise ratio and multi-scale structural similarity attain high objective scores, such metrics may not reflect human visual characteristics and can thus degrade subjective image quality. A method using a framework called a generative adversarial network (GAN) has been reported as one approach to improving subjective image quality. It optimizes the distribution of restored images to be close to that of natural images and thus suppresses visual artifacts such as blurring, ringing, and blocking. However, since methods of this type are optimized to focus on whether the restored image looks subjectively natural, components that are not correlated with the original image are mixed into the restored image during decoding. Thus, even though the appearance looks natural, subjective similarity may be degraded. In this paper, we investigate why conventional GAN-based compression techniques degrade subjective similarity and tackle this problem by rethinking how image generation in the GAN framework should be handled between image sources with different probability distributions. The paper describes a method to maximize the mutual information between the coding features and the restored images. Experimental results show that the proposed mutual information amount is clearly correlated with subjective similarity and that the method makes it possible to develop image compression systems with high subjective similarity.

  • Flexoelectric Effect on Image Sticking Caused by Residual Direct Current Voltage and Flicker Phenomenon in Fringe-Field Switching Mode Liquid Crystal Display Open Access

    Daisuke INOUE  Tomomi MIYAKE  Mitsuhiro SUGIMOTO  

     
    INVITED PAPER-Electronic Displays
    Publicized: 2020/07/21  Vol: E104-C No:2  Page(s): 45-51

    Although the transmittance changes as a quadratic function of the DC offset voltage in an FFS-mode LCD, the bottom position of this curve and the flicker-minimum DC offset voltage vary with the gray level because of the flexoelectric effect. We demonstrated how the influence of the flexoelectric effect changes depending on the electrode width and the black matrix position.

  • SEM Image Quality Assessment Based on Texture Inpainting

    Zhaolin LU  Ziyan ZHANG  Yi WANG  Liang DONG  Song LIANG  

     
    LETTER-Image Processing and Video Processing
    Publicized: 2020/10/30  Vol: E104-D No:2  Page(s): 341-345

    This letter presents an image quality assessment (IQA) metric for scanning electron microscopy (SEM) images based on texture inpainting. Inspired by the observation that the texture information of SEM images is quite sensitive to distortions, a texture inpainting network is first trained to extract texture features. Then the weights of the trained texture inpainting network are transferred to the IQA network to help it learn an effective texture representation of the distorted image. Finally, supervised fine-tuning is conducted on the IQA network to predict the image quality score. Experimental results on the SEM image quality dataset demonstrate the advantages of the presented method.
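
    A minimal PyTorch-style sketch of the weight-transfer step described above, with a toy encoder standing in for the texture-inpainting backbone; the module shapes and names are assumptions, not the authors' network.

      import torch.nn as nn

      def make_texture_encoder():
          # toy stand-in for the texture-feature backbone shared by both tasks
          return nn.Sequential(
              nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
              nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
              nn.AdaptiveAvgPool2d(1), nn.Flatten())

      inpaint_encoder = make_texture_encoder()       # assume this was trained on inpainting
      iqa_encoder = make_texture_encoder()
      iqa_encoder.load_state_dict(inpaint_encoder.state_dict())  # transfer the weights
      iqa_head = nn.Linear(64, 1)                    # fine-tuned to regress a quality score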

  • Identification of Multiple Image Steganographic Methods Using Hierarchical ResNets

    Sanghoon KANG  Hanhoon PARK  Jong-Il PARK  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2020/11/19  Vol: E104-D No:2  Page(s): 350-353

    Image deformations caused by different steganographic methods are typically extremely small and highly similar, which makes their detection and identification difficult. Although recent steganalytic methods using deep learning achieve high accuracy, they are designed to detect stego images produced by specific steganographic methods. In this letter, a steganalytic method is proposed that uses hierarchical residual neural networks (ResNets), allowing both detection (i.e., classification between stego and cover images) and identification of four spatial steganographic methods (i.e., LSB, PVD, WOW and S-UNIWARD). Experimental results show that the hierarchical ResNets achieve a classification rate of 79.71% in quinary classification, which is approximately 23% higher than that of a plain convolutional neural network (CNN).
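
    The hierarchical decision described above can be sketched as two stages; detector_net and identifier_net are hypothetical callables standing in for the trained ResNets.

      def classify_hierarchical(image, detector_net, identifier_net):
          # Stage 1: binary cover-vs-stego decision.
          # Stage 2: identify which of the four embedding schemes was used.
          p_stego = detector_net(image)              # assumed scalar in [0, 1]
          if p_stego < 0.5:
              return "cover"
          methods = ["LSB", "PVD", "WOW", "S-UNIWARD"]
          probs = identifier_net(image)              # assumed length-4 probability vector
          return methods[max(range(4), key=lambda i: probs[i])]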

  • Rethinking the Rotation Invariance of Local Convolutional Features for Content-Based Image Retrieval

    Longjiao ZHAO  Yu WANG  Jien KATO  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2020/10/14  Vol: E104-D No:1  Page(s): 174-182

    Recently, local features computed using convolutional neural networks (CNNs) have shown good performance in image retrieval. The local convolutional features obtained from CNNs (LC features) are designed to be translation invariant; however, they are inherently sensitive to rotation perturbations, which leads to misjudgments in retrieval tasks. In this work, our objective is to enhance the robustness of LC features against image rotation. To do this, we conduct a thorough experimental evaluation of three candidate anti-rotation strategies (in-model data augmentation, in-model feature augmentation, and post-model feature augmentation) over two kinds of rotation attack (dataset attack and query attack). In the training procedure, we implement a data augmentation protocol and a network augmentation method. In the test procedure, we develop a local transformed convolutional (LTC) feature extraction method and evaluate it over different network configurations. We end up with a series of good practices, with steady quantitative support, that lead to the best strategy for computing LC features with high rotation invariance for image retrieval.
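
    One simple form of post-model feature augmentation, sketched under the assumption of a generic feature extractor: descriptors of several rotated copies of the image are pooled element-wise. This illustrates the strategy family only, not the paper's LTC method.

      import numpy as np
      from scipy.ndimage import rotate

      def rotation_pooled_feature(image, extract_features, angles=(0, 90, 180, 270)):
          # extract_features is a hypothetical callable (e.g. a CNN forward
          # pass) returning a 1-D descriptor for an image
          feats = [extract_features(rotate(image, a, reshape=False)) for a in angles]
          return np.max(np.stack(feats), axis=0)     # element-wise max pooling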

  • Target-Oriented Deformation of Visual-Semantic Embedding Space

    Takashi MATSUBARA  

     
    PAPER
    Publicized: 2020/09/24  Vol: E104-D No:1  Page(s): 24-33

    Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and modality-specific information, it is insufficient merely to embed related entities close to each other. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby providing conditional similarities between entities. Unlike methods based on cross-modal attention applied to words and cropped images, TOD-Net is a post-process applied to the embedding space learned by an existing embedding system and improves its retrieval performance. In particular, when combined with cutting-edge models, TOD-Net achieves state-of-the-art image-caption retrieval performance on the MS COCO and Flickr30k datasets. Qualitative analysis reveals that TOD-Net successfully emphasizes entity-specific concepts and retrieves diverse targets by handling higher levels of diversity than existing models.

  • Generation and Detection of Media Clones Open Access

    Isao ECHIZEN  Noboru BABAGUCHI  Junichi YAMAGISHI  Naoko NITTA  Yuta NAKASHIMA  Kazuaki NAKAMURA  Kazuhiro KONO  Fuming FANG  Seiko MYOJIN  Zhenzhong KUANG  Huy H. NGUYEN  Ngoc-Dung T. TIEU  

     
    INVITED PAPER
    Publicized: 2020/10/19  Vol: E104-D No:1  Page(s): 12-23

    With the spread of high-performance sensors and social network services (SNS) and the remarkable advances in machine learning technologies, fake media, such as fake videos, spoofed voices, and fake reviews, that are generated using high-quality learning data and are very close to the real thing are causing serious social problems. We launched a research project, the Media Clone (MC) project, to protect receivers from replicas of real media, called media clones (MCs), skillfully fabricated by means of media processing technologies. Our aim is to achieve a communication system that can defend against MC attacks and help ensure safe and reliable communication. This paper describes the results of research on two of the five themes in the MC project: 1) verification of the capability of generating various types of media clones, such as audio, visual, and text clones derived from fake information, and 2) realization of a protection shield against media clone attacks by recognizing them.

  • AdaLSH: Adaptive LSH for Solving c-Approximate Maximum Inner Product Search Problem

    Kejing LU  Mineichi KUDO  

     
    PAPER-Data Engineering, Web Information Systems
    Publicized: 2020/10/13  Vol: E104-D No:1  Page(s): 138-145

    The maximum inner product search (MIPS) problem has gained much attention in a wide range of applications. To overcome the curse of dimensionality in high-dimensional spaces, most existing methods first transform the MIPS problem into an approximate nearest neighbor search (ANNS) problem and then solve it by locality-sensitive hashing (LSH). However, due to the error incurred by the transformation and incomprehensive search strategies, these methods suffer from low precision and have loose probability guarantees. In this paper, we propose a novel search method named Adaptive-LSH (AdaLSH) to solve the MIPS problem more efficiently and more precisely. AdaLSH examines objects in descending order of both norms and (probably correctly estimated) cosine angles with a query object, in support of LSH with extendable windows. Such extendable windows bring not only efficiency in searching but also a probability guarantee of finding exact or approximate MIP objects. AdaLSH gives a better probability guarantee of success than conventional algorithms and shorter running times on various datasets. In addition, AdaLSH can even support exact MIPS with a probability guarantee.
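
    The MIPS-to-ANNS reduction referred to above is commonly realized with an asymmetric augmentation of the data and query vectors; a sketch of that generic transform (not AdaLSH itself) is shown below.

      import numpy as np

      def mips_to_anns(X, queries):
          # Classic asymmetric transform: append one extra coordinate so that
          # maximising the inner product x.q becomes minimising the Euclidean
          # distance, since ||x_aug - q_aug||^2 = M^2 + ||q||^2 - 2 x.q.
          norms = np.linalg.norm(X, axis=1)
          M = norms.max()
          X_aug = np.hstack([X, np.sqrt(M**2 - norms**2)[:, None]])    # data side
          Q_aug = np.hstack([queries, np.zeros((len(queries), 1))])    # query side
          return X_aug, Q_aug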

  • SCUT-AutoALP: A Diverse Benchmark Dataset for Automatic Architectural Layout Parsing

    Yubo LIU  Yangting LAI  Jianyong CHEN  Lingyu LIANG  Qiaoming DENG  

     
    LETTER-Computer Graphics
    Publicized: 2020/09/03  Vol: E103-D No:12  Page(s): 2725-2729

    Computer-aided design (CAD) technology is widely used for architectural design, but current CAD tools still require high-level design specifications from humans. It would be significant to construct an intelligent CAD system allowing automatic architectural layout parsing (AutoALP), which generates candidate designs or predicts architectural attributes without much user intervention. To tackle these problems, many learning-based methods have been proposed, and benchmark datasets have become one of the essential elements of data-driven AutoALP. This paper proposes a new dataset called SCUT-AutoALP for multi-paradigm applications. It contains two subsets: 1) Subset-I, for floor plan design, containing 300 residential floor plan images with layout, boundary and attribute labels; 2) Subset-II, for urban plan design, containing 302 campus plan images with layout, boundary and attribute labels. We analyzed the samples and labels statistically and evaluated SCUT-AutoALP on different layout parsing tasks for floor plans/urban plans based on conditional generative adversarial network (cGAN) models. The results verify the effectiveness and indicate the potential applications of SCUT-AutoALP. The dataset is available at https://github.com/designfuturelab702/SCUT-AutoALP-Database-Release.

  • A Simple Depth-Key-Based Image Composition Considering Object Movement in Depth Direction

    Mami NAGOYA  Tomoaki KIMURA  Hiroyuki TSUJI  

     
    LETTER-Computer Graphics
    Vol: E103-A No:12  Page(s): 1603-1608

    A simple depth-key-based image composition method is proposed that uses two still images with depth information: a background and a foreground object. The proposed method can place the object at various locations in the background while accounting for depth in the 3D world coordinate system. Its main feature is a simple algorithm that enables depthward movement within the camera plane without requiring awareness of the 3D world coordinate system. Two algorithms, P-OMDD and O-OMDD, are proposed, both based on the pinhole camera model. As an advantage, camera calibration is not required before applying these algorithms. Since a single image is used to represent the object, each of the proposed methods has limitations in terms of the fidelity of the composite image: P-OMDD faithfully reproduces the angle at which the object is seen, but the pixels of the hidden surface are missing; conversely, O-OMDD avoids the hidden-surface problem, but the angle of the object is fixed wherever it moves. Several experiments verify that, with O-OMDD, subjectively natural composite images can be obtained under any object movement, in terms of size and position in the camera plane. Future tasks include handling the change in illumination due to positional changes and the partial loss of objects due to noise in depth images.
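
    The pinhole-model relation underlying the depthward movement can be illustrated as follows; the focal length, object size, and depth values are arbitrary example numbers, not values from the paper.

      def image_size(f, S, Z):
          # pinhole model: an object of physical size S at depth Z projects
          # to an image size of f * S / Z (f: focal length in pixels)
          return f * S / Z

      # pushing an object from a depth of 2 m to 4 m halves its on-screen
      # size and its offset from the principal point
      print(image_size(f=800, S=1.5, Z=4.0) / image_size(f=800, S=1.5, Z=2.0))   # 0.5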

  • Hue-Correction Scheme Considering Non-Linear Camera Response for Multi-Exposure Image Fusion

    Kouki SEO  Chihiro GO  Yuma KINOSHITA  Hitoshi KIYA  

     
    PAPER-Image
    Vol: E103-A No:12  Page(s): 1562-1570

    We propose a novel hue-correction scheme for multi-exposure image fusion (MEF). Various MEF methods have been studied to generate higher-quality images. However, unlike in other fields of image processing, few MEF methods consider hue distortion, owing to the lack of a reference image with correct hue. In the proposed scheme, we generate an HDR image from the input multi-exposure images as a reference for hue correction. After that, the hue distortion in images fused by an MEF method is removed by using the hue information of the HDR image, on the basis of the constant-hue plane in the RGB color space. In simulations, the proposed scheme is demonstrated to be effective in correcting the hue distortion caused by conventional MEF methods. Experimental results also show that the proposed scheme can generate high-quality images regardless of the exposure conditions of the input multi-exposure images.
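
    A simplified stand-in for the hue-correction step, assuming scikit-image and a plain HSV hue swap in place of the paper's constant-hue-plane operation in RGB space; this only illustrates the idea of borrowing hue from the reference image.

      import numpy as np
      from skimage.color import rgb2hsv, hsv2rgb

      def correct_hue(fused_rgb, reference_rgb):
          # keep the fused image's saturation and value, but take the hue
          # channel from the reference image generated from the exposure stack
          fused_hsv = rgb2hsv(fused_rgb)
          ref_hsv = rgb2hsv(reference_rgb)
          fused_hsv[..., 0] = ref_hsv[..., 0]
          return hsv2rgb(fused_hsv)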
