Keyword Search Result

[Keyword] image (1441 hits)

Showing results 41-60 of 1441 hits

  • A Lightweight and Efficient Infrared Pedestrian Semantic Segmentation Method

    Shangdong LIU  Chaojun MEI  Shuai YOU  Xiaoliang YAO  Fei WU  Yimu JI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2023/06/13   Vol: E106-D No:9   Page(s): 1564-1571

    Thermal imaging pedestrian segmentation systems perform well under varied illumination conditions but suffer from weak pedestrian texture information and blurred object boundaries. Meanwhile, high-performance large models incur high latency on edge devices with limited computing power. To address these problems, we propose a real-time thermal infrared pedestrian segmentation method whose feature extraction layers consist of two paths. On the spatial path, we use lossless spatial downsampling to preserve boundary texture details. On the context path, we use atrous convolutions to enlarge the receptive field and capture more contextual semantic information. A parameter-free attention mechanism is then applied at the end of each path for effective feature selection, and a Feature Fusion Module (FFM) fuses the selected features of the two paths. Finally, we accelerate inference on the edge computing device through multi-threading. We also construct a high-quality infrared pedestrian segmentation dataset to facilitate research. Comparative experiments with other methods on the self-built dataset and two public datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/mcjcs001/LEIPNet.
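
    The abstract does not name the parameter-free attention it uses; a well-known choice is SimAM, which reweights activations with a closed-form energy function and no learnable parameters. A minimal PyTorch sketch, assuming a SimAM-style module:

```python
import torch
import torch.nn as nn

class SimAMAttention(nn.Module):
    """Parameter-free attention (SimAM-style, an assumption): each activation
    is weighted by a closed-form energy term computed per channel, with no
    learned parameters."""
    def __init__(self, eps: float = 1e-4):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        _, _, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n            # per-channel variance
        e_inv = d / (4 * (v + self.eps)) + 0.5             # inverse energy
        return x * torch.sigmoid(e_inv)                    # feature selection
```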

  • Multiple Layout Design Generation via a GAN-Based Method with Conditional Convolution and Attention

    Xing ZHU  Yuxuan LIU  Lingyu LIANG  Tao WANG  Zuoyong LI  Qiaoming DENG  Yubo LIU  

     
    LETTER-Computer Graphics

    Publicized: 2023/06/12   Vol: E106-D No:9   Page(s): 1615-1619

    Recently, many AI-aided layout design systems have been developed to reduce tedious manual intervention using deep learning. However, most methods focus on a specific generation task. This paper explores the challenging problem of multiple layout design generation (LDG), which generates floor plans or urban plans from a boundary input under a unified framework. One of the main challenges of multiple LDG is obtaining reasonable topological structures of the generated layouts given irregular boundaries and layout elements across different types of design. This paper formulates the multiple LDG task as an image-to-image translation problem and proposes a conditional generative adversarial network (GAN), called LDGAN, with adaptive modules. The framework of LDGAN is based on a generator-discriminator architecture, where the generator integrates conditional convolution constrained by the boundary input and an attention module with channel and spatial features. Qualitative and quantitative experiments were conducted on the SCUT-AutoALP and RPLAN datasets, and the comparison with state-of-the-art methods illustrates the effectiveness and superiority of the proposed LDGAN.
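
    The abstract does not detail the conditional convolution; one common formulation, assumed here for illustration (CondConv-style), mixes a bank of expert kernels with per-sample routing weights predicted from a condition vector such as an embedding of the boundary input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv2d(nn.Module):
    """CondConv-style conditional convolution (an assumption, not the paper's
    exact module): per-sample kernels are convex mixtures of expert kernels,
    routed by a condition vector."""
    def __init__(self, in_ch, out_ch, k=3, num_experts=4, cond_dim=64):
        super().__init__()
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, k, k) * 0.02)
        self.route = nn.Linear(cond_dim, num_experts)
        self.k = k

    def forward(self, x, cond):
        # x: (B, C_in, H, W); cond: (B, cond_dim), e.g. a boundary embedding
        b = x.size(0)
        r = torch.softmax(self.route(cond), dim=1)            # (B, E)
        w = torch.einsum('be,eoihw->boihw', r, self.experts)  # per-sample kernels
        w = w.reshape(-1, *w.shape[2:])                       # (B*O, I, k, k)
        x = x.reshape(1, -1, *x.shape[2:])                    # (1, B*C_in, H, W)
        y = F.conv2d(x, w, padding=self.k // 2, groups=b)     # batched conv trick
        return y.reshape(b, -1, *y.shape[2:])
```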

  • Quality Enhancement of Conventional Compression with a Learned Side Bitstream

    Takahiro NARUKO  Hiroaki AKUTSU  Koki TSUBOTA  Kiyoharu AIZAWA  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2023/04/25   Vol: E106-D No:8   Page(s): 1296-1299

    We propose Quality Enhancement via a Side bitstream Network (QESN), a technique for lossy image compression. QESN utilizes the network architecture of deep image compression to produce a side bitstream that enhances the quality of conventional compression. We also present a loss function that directly optimizes the Bjontegaard delta bit rate (BD-BR) by using a differentiable model of the rate-distortion curve. Experimental results show that QESN improves the bit rate by 16.7% in terms of BD-BR compared to Better Portable Graphics (BPG).
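
    The loss directly targets BD-BR, whose conventional (non-differentiable) computation fits cubic polynomials to (PSNR, log-rate) points and integrates the rate difference over the overlapping quality range; the paper replaces this with a differentiable rate-distortion model whose exact form the abstract does not give. A NumPy sketch of the standard computation, assuming at least four rate-distortion points per codec:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Classical Bjontegaard delta bit rate (BD-BR): fit cubic polynomials to
    log-rate as a function of PSNR, average the difference over the
    overlapping PSNR interval, and convert back to a percentage."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    pa = np.polyfit(psnr_anchor, lr_a, 3)           # log-rate = f(PSNR), anchor
    pt = np.polyfit(psnr_test, lr_t, 3)             # log-rate = f(PSNR), test
    lo = max(min(psnr_anchor), min(psnr_test))      # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_t = (np.polyval(it, hi) - np.polyval(it, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100        # negative = bit savings
```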

  • Simultaneous Visible Light Communication and Ranging Using High-Speed Stereo Cameras Based on Bicubic Interpolation Considering Multi-Level Pulse-Width Modulation

    Ruiyi HUANG  Masayuki KINOSHITA  Takaya YAMAZATO  Hiraku OKADA  Koji KAMAKURA  Shintaro ARAI  Tomohiro YENDO  Toshiaki FUJII  

     
    PAPER-Communication Theory and Signals

    Publicized: 2022/12/26   Vol: E106-A No:7   Page(s): 990-997

    Visible light communication (VLC) and visible light ranging are promising techniques for intelligent transportation systems (ITS); both use the light-emitting diodes (LEDs) ubiquitous on roads for data transmission and range estimation. Performing VLC and ranging simultaneously can improve the performance of both, but it requires a high data rate and high-accuracy ranging at the same time. We use pulse-width modulation (PWM) to increase the data rate. However, under PWM, images of the LED transmitters are captured at different luminance levels and are easily saturated, and LED saturation leads to inaccurate range estimation. In this paper, we establish a novel simultaneous visible light communication and ranging system for ITS using PWM. We analyze the LED saturation problem and apply bicubic interpolation to solve it, thereby improving both communication and ranging performance. Simultaneous communication and ranging are enabled by a stereo camera: communication is realized using maximal-ratio combining (MRC), while ranging is achieved using phase-only correlation (POC) with sinc-function approximation. Furthermore, we measured the performance of the proposed system in a field trial. The results show that error-free communication can be achieved up to a distance of 55 m, and range estimation errors remain below 0.5 m within 60 m.
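
    Ranging rests on phase-only correlation (POC) between the stereo images: the inverse FFT of the normalized cross-power spectrum peaks at the translation between two image patches, and the paper refines the peak to sub-pixel accuracy with a sinc-function approximation (not shown here). A minimal NumPy sketch of the integer-pixel POC step:

```python
import numpy as np

def phase_only_correlation(f, g):
    """Phase-only correlation: normalize the cross-power spectrum to unit
    magnitude so only phase remains; its inverse FFT has a sharp peak at
    the shift between the two images."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G)
    r = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    peak = np.unravel_index(np.argmax(r), r.shape)  # integer-pixel shift
    return r, peak
```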

  • Segmentation of Optic Disc and Optic Cup Based on Two-Layer Level Set with Sparse Shape Prior Constraint in Fundus Images

    Siqi WANG  Ming XU  Xiaosheng YU  Chengdong WU  

     
    LETTER-Computer Graphics

    Publicized: 2023/01/16   Vol: E106-A No:7   Page(s): 1020-1024

    Glaucoma is a common eye disease with high incidence. Detecting the optic cup and optic disc in fundus images is one of the important steps in the clinical diagnosis of glaucoma. However, fundus images generally exhibit intensity inhomogeneity and complex anatomical structure, and are disturbed by blood vessels and lesions. To extract the optic disc and optic cup regions more accurately, we propose a segmentation method based on a distance-regularized two-layer level set with a sparse shape prior constraint. The experimental results show that our method segments the optic disc and optic cup regions more accurately and obtains satisfactory results.

  • Single Image Dehazing Based on Sky Area Segmentation and Image Fusion

    Xiangyang CHEN  Haiyue LI  Chuan LI  Weiwei JIANG  Hao ZHOU  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2023/04/24   Vol: E106-D No:7   Page(s): 1249-1253

    Since dark channel prior (DCP)-based dehazing is ineffective in sky areas, producing overly dark results and color distortion, we propose a novel dehazing method based on sky area segmentation and image fusion. We first segment the image into sky and non-sky areas according to their characteristics, then estimate the atmospheric light and transmission map using the DCP and correct them, and finally fuse the result with the original image after contrast-adaptive histogram equalization to enhance image detail. Experiments illustrate that our method performs well in dehazing and reduces image distortion.
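
    The DCP step that the method builds on follows the classical He et al. formulation; a minimal NumPy/SciPy sketch of the dark channel and transmission-map estimation (the paper's sky segmentation and correction steps are not reproduced):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel: per-pixel minimum over RGB, then a local minimum filter."""
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_transmission(img, patch=15, omega=0.95):
    """Standard DCP pipeline: atmospheric light A comes from the brightest
    0.1% of dark-channel pixels (per-channel max is a common simplification);
    transmission follows as t = 1 - omega * dark_channel(I / A)."""
    dc = dark_channel(img, patch)                      # img: HxWx3 in [0, 1]
    n = max(int(dc.size * 0.001), 1)
    idx = np.unravel_index(np.argsort(dc, axis=None)[-n:], dc.shape)
    A = img[idx].max(axis=0)                           # per-channel airlight
    t = 1.0 - omega * dark_channel(img / A, patch)
    return np.clip(t, 0.1, 1.0), A
```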

  • A Fusion Deraining Network Based on Swin Transformer and Convolutional Neural Network

    Junhao TANG  Guorui FENG  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2023/04/24   Vol: E106-D No:7   Page(s): 1254-1257

    Single image deraining is an ill-posed, long-standing problem. In the past few years, convolutional neural network (CNN) methods have dominated computer vision and achieved considerable success in image deraining. Recently, Swin Transformer-based models have also shown impressive performance, even surpassing CNN-based methods to become the state of the art on high-level vision tasks. We therefore introduce the Swin Transformer to deraining tasks. In this paper, we propose a deraining model with two sub-networks. The first sub-network has two branches: the Rain Recognition Network, a U-Net with Swin Transformer layers, preliminarily restores the background, especially where rain streaks appear, while the Detail Complement Network extracts the background detail beneath the rain streaks. The second sub-network, called Refine-Unet, uses the output of the first to further restore the image. Experiments show that our network improves single image deraining over previous Transformer-based work.

  • A Novel Discriminative Dictionary Learning Method for Image Classification

    Wentao LYU  Di ZHOU  Chengqun WANG  Lu ZHANG  

     
    PAPER-Image

    Publicized: 2022/12/14   Vol: E106-A No:6   Page(s): 932-937

    In this paper, we present a novel discriminative dictionary learning (DDL) method for image classification. The local structural relationships between samples are first built by Laplacian eigenmaps (LE) and then integrated into the basic DDL framework to suppress inter-class ambiguity in the feature space. Moreover, to improve the discriminative ability of the dictionary, the category labels of the training samples are incorporated into the dictionary learning objective through a discriminative promotion term. The data points of the original samples are thus transformed into a new feature space, in which points from different categories are expected to be far apart. Test results on a real dataset indicate the effectiveness of this method.
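
    Laplacian eigenmaps start from a neighborhood graph over the training samples; the graph Laplacian below is the object such a local-structure term is built on. A minimal NumPy sketch (how the paper couples it to the DDL objective is not specified in the abstract):

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Graph Laplacian used by Laplacian eigenmaps: kNN affinities with
    heat-kernel weights, symmetrized, then L = D - W. Embeddings from the
    smallest nonzero eigenvectors of L preserve local sample structure."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip self
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize
    return np.diag(W.sum(1)) - W                         # L = D - W
```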

  • Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

    Van-Cam NGUYEN  Yasuhiko NAKASHIMA  

     
    PAPER-Computer System

    Publicized: 2023/03/17   Vol: E106-D No:6   Page(s): 1117-1129

    Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use high bandwidth memory (HBM) to expand the bandwidth of data transmission between off-chip memory and the accelerator. Second, a fully-pipelined design, consisting of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared dual buffers is designed to reduce off-chip memory access and maximize throughput. We implemented the proposed accelerator on the Xilinx Alveo U280 platform entirely in Verilog HDL, rather than the high-level synthesis used in previous works, and verified the system with the VGG-16 model in our experiments. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than DDR4. In terms of throughput, our accelerator is 1.9× better than an FPGA+HBM2-based accelerator, 1.65× better than a low-batch-size (4) GPGPU, and 11.9× better than a low-batch-size (4) CPU. In terms of power efficiency on the large-scale CNN model, our system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement over previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators.

  • Image Segmentation-Based Bicycle Riding Side Identification Method

    Jeyoen KIM  Takumi SOMA  Tetsuya MANABE  Aya KOJIMA  

     
    PAPER

    Publicized: 2022/11/02   Vol: E106-A No:5   Page(s): 775-783

    This paper attempts to identify which side of the road a bicycle is riding on using a common camera, toward an advanced bicycle navigation system and a riding-safety support system. To identify the roadway area, the proposed method performs semantic segmentation on a front camera image captured by a bicycle drive recorder or smartphone. If the roadway area extends from the center of the image to the right, the bicyclist is riding on the left side of the roadway (the correct riding position in Japan). Conversely, if the roadway area extends to the left, the bicyclist is on the right side of the roadway (the incorrect riding position in Japan). We evaluated the accuracy of the proposed method on roads of various widths and traffic volumes using video captured while riding bicycles in Tsuruoka City, Yamagata Prefecture, and Saitama City, Saitama Prefecture, Japan. High accuracy (>80%) was achieved for every combination of segmentation model, riding-side identification method, and experimental condition. Given these results, we believe we have realized an effective image segmentation-based method for identifying which side of the roadway a bicycle is riding on.
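
    The abstract gives the decision rule only in words; a minimal sketch of the idea, assuming a hypothetical segmentation mask in which road_label marks roadway pixels:

```python
import numpy as np

def riding_side(seg_mask, road_label=0):
    """Toy decision rule in the spirit of the paper: if roadway pixels
    extend mostly to the right of the image center, the bicycle is on the
    left side of the road (correct in Japan), and vice versa."""
    h, w = seg_mask.shape
    road = seg_mask == road_label
    left = road[:, : w // 2].sum()
    right = road[:, w // 2 :].sum()
    return "left (correct in Japan)" if right > left else "right (incorrect)"
```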

  • Pixel Variation Characteristics of a Global Shutter THz Imager and its Calibration Technique

    Yuri KANAZAWA  Prasoon AMBALATHANKANDY  Masayuki IKEBE  

     
    PAPER

    Publicized: 2022/11/25   Vol: E106-A No:5   Page(s): 832-839

    We have developed a Si-CMOS terahertz image sensor to address the paucity of low-cost terahertz detectors. Each imaging pixel connects directly to a VCO-based ADC, realizing a pixel-parallel ADC architecture for high-speed global-shutter THz imaging. In this paper, we propose a digital calibration technique for the offset and gain variation of each pixel that exploits the global-shutter operation. The technique applies a reference signal to all pixels simultaneously and captures reference frames as part of the high-speed image acquisition. Using this technique, we suppress offset and nonlinear gain variation by 85.7% compared with the uncorrected output.
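
    The abstract does not give the correction formula; below is a minimal sketch of generic per-pixel two-point (offset/gain) calibration, the family this technique belongs to. The paper's method additionally acquires its reference frames through the global-shutter readout itself and handles nonlinear gain, which this sketch omits:

```python
import numpy as np

def two_point_calibration(raw, dark_frame, ref_frame, ref_level):
    """Generic per-pixel offset/gain correction (illustrative, not the
    paper's exact procedure): the dark frame gives each pixel's offset,
    the reference frame its gain relative to a known reference level."""
    offset = dark_frame.astype(np.float64)
    gain = ref_level / np.maximum(ref_frame - offset, 1e-9)  # per-pixel gain
    return (raw - offset) * gain
```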

  • Effectively Utilizing the Category Labels for Image Captioning

    Junlong FENG  Jianping ZHAO  

     
    PAPER-Core Methods

    Publicized: 2021/12/13   Vol: E106-D No:5   Page(s): 617-624

    As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such datasets is usually composed of obviously sentiment-bearing words. In some special cases, however, the captions are classified by image category. This creates a latent problem: the generated sentences can be close in semantic meaning yet belong to different or even opposite categories. Exploring an effective way to utilize the image category label to increase caption difference is therefore a worthy issue. We propose an image captioning network with a label control mechanism (LCNET) in this paper. First, to further improve caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET dynamically modulates caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with the global semantic vectors to output the caption. Our model outperforms the compared models on all standard evaluation metrics, and caption analysis demonstrates that our approach improves semantic representation. Compared with other label control mechanisms, our model boosts the caption difference according to the labels while remaining more consistent with the image content.

  • Image-to-Image Translation for Data Augmentation on Multimodal Medical Images

    Yue PENG  Zuqiang MENG  Lina YANG  

     
    PAPER-Smart Healthcare

    Publicized: 2022/03/01   Vol: E106-D No:5   Page(s): 686-696

    Medical images play an important role in medical diagnosis, yet acquiring large annotated datasets remains difficult in the medical field. For this reason, research on image-to-image translation has been combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks have been applied to medical images. In this paper, we perform data augmentation starting from unimodal data. The designed StarGAN V2-based network augments the dataset effectively from a small number of original images, expanding unimodal data into multimodal medical images; this multimodal data can then be applied to the segmentation task, improving the segmentation results. Our experiments demonstrate that the generated multimodal medical image data improve the performance of glioma segmentation.

  • Detection Method of Fat Content in Pig B-Ultrasound Based on Deep Learning

    Wenxin DONG  Jianxun ZHANG  Shuqiu TAN  Xinyue ZHANG  

     
    PAPER-Smart Agriculture

    Publicized: 2022/02/07   Vol: E106-D No:5   Page(s): 726-734

    In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements, and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network learns the salient features of fat images through a hybrid attention mechanism. Then, information describing pork fat is extracted at multiple scales, and the detail information expressed in the shallow layers is fused with the semantic information expressed in the deep layers. Finally, a deep convolutional network predicts the fat content against the real label. The experimental results show that the coefficient of determination is greater than 0.95 on the dataset of 130 groups of pork B-ultrasound images, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify pig B-ultrasound images and predict the fat content with high accuracy.
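
    The coefficient of determination reported here is the standard R² of predicted versus measured fat content; for reference:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot,
    where 1.0 means a perfect regression fit."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```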

  • Multi-Scale Correspondence Learning for Person Image Generation

    Shi-Long SHEN  Ai-Guo WU  Yong XU  

     
    PAPER-Person Image Generation

    Publicized: 2022/04/15   Vol: E106-D No:5   Page(s): 804-812

    A generative model is presented for two types of person image generation in this paper. First, the model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of the source image. Second, the model is used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to a desired clothing texture. The core idea of the proposed model is to establish multi-scale correspondence, which effectively addresses the misalignment introduced by transferring pose and thereby preserves richer appearance information. Specifically, the proposed model consists of two stages: 1) it first generates the target semantic map imposed on the target pose to provide more accurate guidance during generation; 2) after obtaining multi-scale feature maps from the encoder, it establishes the multi-scale correspondence, which is useful for fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and demonstrate its effectiveness in clothing-guided person image generation.

  • 3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation

    He LI  Yutaro IWAMOTO  Xianhua HAN  Lanfen LIN  Akira FURUKAWA  Shuzo KANASAKI  Yen-Wei CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2023/02/21   Vol: E106-D No:5   Page(s): 1027-1037

    Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data and are generally called 2D networks. However, 2D networks are inefficient at extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. We then leveraged an ROI-estimation strategy to crop the ROI bounding box and used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address class imbalance and blurry contours in medical images. By incorporating these strategies, we realize practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of the proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over state-of-the-art methods.

  • Learning Local Similarity with Spatial Interrelations on Content-Based Image Retrieval

    Longjiao ZHAO  Yu WANG  Jien KATO  Yoshiharu ISHIKAWA  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2023/02/14   Vol: E106-D No:5   Page(s): 1069-1080

    Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional discriminative capability. Recent research in this field has concentrated on pooling methods that aggregate local features into global features and assess the global similarity of two images. However, pooling sacrifices the image's local region information and spatial relationships, which are precisely the keys to robustness against occlusion and viewpoint changes. In this paper, instead of pooling, we propose an alternative method based on local similarity, computed directly from local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as the spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. Experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms pooling methods thanks to enhanced robustness. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.
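
    The abstract does not define the three LST forms; as one plausible reading, the basic content term of such a tensor is the pairwise cosine similarity between local convolutional descriptors of two images. A minimal PyTorch sketch of that term only (the spatial-relationship encodings are not reproduced):

```python
import torch
import torch.nn.functional as F

def local_similarity(fq, fg):
    """Pairwise cosine similarity between all local descriptors of a query
    and a gallery feature map; one plausible building block of an LST,
    assumed here for illustration."""
    # fq, fg: (C, H, W) feature maps from a CNN backbone
    q = F.normalize(fq.flatten(1), dim=0)   # (C, Hq*Wq), unit-norm columns
    g = F.normalize(fg.flatten(1), dim=0)   # (C, Hg*Wg)
    return q.t() @ g                        # (Hq*Wq, Hg*Wg) similarity matrix
```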

  • Fish Detecting Using YOLOv4 and CVAE in Aquaculture Ponds with a Non-Uniform Strong Reflection Background

    Meng ZHAO  Junfeng WU  Hong YU  Haiqing LI  Jingwen XU  Siqi CHENG  Lishuai GU  Juan MENG  

     
    PAPER-Smart Agriculture

    Publicized: 2022/11/07   Vol: E106-D No:5   Page(s): 715-725

    Accurate fish detection is of great significance in aquaculture, but the non-uniform strong reflection in aquaculture ponds degrades detection precision. This paper combines YOLOv4 and a CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection is first removed, and the reflection-removed image is then used for fish detection. First, the improved YOLOv4 detects and masks the strongly reflective regions, locating and labeling them for subsequent reflection removal. Then, the CVAE is combined with the improved YOLOv4 to infer the prior distribution of each reflection region and restore it from that distribution, removing the reflection. To further improve the quality of the reflection-removed images, adversarial learning is added to the CVAE. Finally, YOLOv4 detects fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, comprising 1000 images with manually annotated fish; a synthetic dataset of 2000 images with strong reflection is also created and merged with it for training and for verifying the robustness of the proposed method. Comprehensive experiments compare the proposed method with state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the proposed method improves fish detection precision and recall by 2.7% and 2.4%, respectively.

  • ConvNeXt-Haze: A Fog Image Classification Algorithm for Small and Imbalanced Sample Dataset Based on Convolutional Neural Network

    Fuxiang LIU  Chen ZANG  Lei LI  Chunfeng XU  Jingmin LUO  

     
    PAPER

    Publicized: 2022/11/22   Vol: E106-D No:4   Page(s): 488-494

    Since defogging algorithms perform differently at different fog concentrations, this paper proposes a fog image classification algorithm for small and imbalanced sample datasets based on a convolutional neural network; classifying fog images in advance improves the effectiveness and adaptability of image defogging in fog and haze weather. To address environmental interference, camera depth-of-field interference, and uneven feature distribution in fog images, the CutBlur-Gauss data augmentation method is used together with focal loss and label smoothing to improve classification accuracy. Compared with an SVM and the classical convolutional neural network classifiers AlexNet, ResNet34, ResNet50 and ResNet101, the proposed algorithm achieves the best result on our dataset, 94.5% classification accuracy, demonstrating its superior classification performance.
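
    The abstract names focal loss and label smoothing but not how they are combined; a minimal PyTorch sketch of one common combination:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, gamma=2.0, smoothing=0.1):
    """Focal loss with label smoothing, two of the strategies named above
    for small, imbalanced datasets: smoothing softens the one-hot targets,
    while the (1 - p)^gamma factor down-weights easy examples."""
    n_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    with torch.no_grad():                     # smoothed one-hot targets
        t = torch.full_like(log_p, smoothing / (n_classes - 1))
        t.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    loss = -(t * (1 - p).pow(gamma) * log_p).sum(dim=-1)
    return loss.mean()
```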

  • A Night Image Enhancement Algorithm Based on MDIFE-Net Curve Estimation

    Jing ZHANG  Dan LI  Hong-an LI  Xuewen LI  Lizhi ZHANG  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2022/11/04   Vol: E106-D No:2   Page(s): 229-239

    To address low-quality problems in night images such as low brightness, poor contrast, noise interference and color imbalance, a night image enhancement algorithm based on MDIFE-Net curve estimation is presented. The algorithm consists of three parts. First, we design an illumination estimation curve (IEC) that adjusts pixel levels in the low-illumination image domain through a non-linear fitting function and maps them to the enhanced image domain, effectively eliminating the effect of illumination loss. Second, DCE-Net is improved by replacing the original ReLU activation function with the smoother Mish activation function, so that the parameters update better. Finally, an illumination estimation loss function combining image attributes with fidelity is designed to drive the no-reference enhancement, preserving more image detail while enhancing the night image. Experimental results show that our method effectively improves image contrast, makes target details more prominent, and improves the visual quality of the image. Compared with four existing low-illumination image enhancement algorithms, our method achieves better NIQE and STD evaluation index values, verifying its feasibility and validity; ablation experiments further verify the rationality and necessity of each component design.
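
    MDIFE-Net builds on DCE-Net, whose published formulation applies a quadratic illumination curve iteratively with per-pixel parameters predicted by the network; the paper's IEC may differ in its fitting function. A minimal PyTorch sketch of the Mish activation and a DCE-style curve step:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish activation, the smoother ReLU replacement mentioned above:
    x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

def apply_illumination_curve(img, alpha, n_iters=8):
    """DCE-style iterative curve adjustment (an assumption based on the
    original DCE-Net, not necessarily the paper's exact IEC):
    LE(x) = x + alpha * x * (1 - x), applied n times with per-pixel
    curve parameters alpha predicted by the network."""
    x = img
    for _ in range(n_iters):
        x = x + alpha * x * (1 - x)
    return x
```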

