Keyword Search Result

[Keyword] 3D (243 hits)

Showing 1-20 of 243 hits

  • Design of IoT Systems Using Three-Dimensional Spatial Information and Technologies for Improving Usability in Actual Fields Open Access

    Hiroshi YAMAMOTO  Keigo SHIMATANI  Keigo UCHIYAMA  

     
    INVITED PAPER

      Vol:
    E107-B No:12
      Page(s):
    907-917

    To support people's various activities (e.g., primary industry, elderly care), IoT (Internet of Things) systems equipped with sensing technologies that observe the status of people, things, and places in the real world are attracting attention. Existing studies have used cameras as the sensing technology for observing the condition of the real world, but cameras suffer from a limited observation period and privacy concerns. Therefore, new IoT systems using three-dimensional LiDAR (Light Detection And Ranging) have been proposed, because LiDAR can obtain three-dimensional spatial information about the real world while avoiding these problems. However, several problems remain when deploying 3D LiDAR in actual fields. A 3D LiDAR requires considerable electric power to observe three-dimensional spatial information. In addition, annotating the large volume of point-cloud data needed to construct a machine-learning model requires significant time and effort. Therefore, in this study, we propose IoT systems that use 3D LiDAR to observe the status of targets and equip them with new technologies that make 3D LiDAR more practical. First, a linkage function is designed to reduce the power consumption of the entire system: during normal operation, only a low-power sensing device runs, and the power-hungry 3D LiDAR is activated only when an observation target is estimated to be approaching. Second, a self-learning function analyzes the data collected by both the 3D LiDAR and a camera, automatically generating a large amount of training data whose correct labels are estimated from the camera images. Experimental evaluations with a prototype system confirm that the sensing devices can be correctly interconnected to reduce total power consumption, and that the machine-learning model constructed by the self-learning function can accurately estimate the status of the targets.
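
As an illustration of the linkage function described above, here is a minimal Python sketch of the power-saving control loop. It is not the authors' implementation; the sensor classes and their methods (LowPowerSensor, Lidar3D, target_approaching, capture_point_cloud) are hypothetical placeholders.

```python
import time

class LowPowerSensor:
    """Hypothetical low-power presence sensor (e.g., a PIR or range sensor)."""
    def target_approaching(self) -> bool:
        raise NotImplementedError  # replace with a real sensor driver

class Lidar3D:
    """Hypothetical 3D LiDAR wrapper with explicit power control."""
    def power_on(self): ...
    def power_off(self): ...
    def capture_point_cloud(self): ...

def linkage_loop(sensor: LowPowerSensor, lidar: Lidar3D, poll_s: float = 0.5):
    """Run only the low-power sensor normally; wake the LiDAR on demand."""
    while True:
        if sensor.target_approaching():
            lidar.power_on()
            cloud = lidar.capture_point_cloud()  # observe while the target is present
            # ... process / store the point cloud here ...
            lidar.power_off()
        time.sleep(poll_s)
```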

  • Reinforced Voxel-RCNN: An Efficient 3D Object Detection Method Based on Feature Aggregation Open Access

    Jia-ji JIANG  Hai-bin WAN  Hong-min SUN  Tuan-fa QIN  Zheng-qiang WANG  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2024/04/24
      Vol:
    E107-D No:9
      Page(s):
    1228-1238

    In this paper, Voxel-RCNN (Towards High Performance Voxel-based 3D Object Detection), a three-dimensional (3D) point cloud object detection model, is used as the benchmark network. To address problems in current mainstream voxel-based 3D point cloud methods, such as weaknesses in the backbone and a lack of feature expression ability under the bird's-eye view (BEV), a high-performance voxel-based 3D object detection network (Reinforced Voxel-RCNN) is proposed. First, a 3D feature extraction module that integrates an inverted residual convolutional network with weight normalization is designed for the 3D backbone. This module retains more point-cloud feature information, enhances information interaction between convolutional layers, and improves the feature extraction ability of the backbone network. Second, a spatial feature-semantic fusion module based on spatial and channel attention is proposed from the BEV perspective. Combining channel features and semantic features further improves the network's ability to express point-cloud features. In experiments on the public KITTI dataset, the proposed method outperforms many voxel-based methods. Compared with the baseline network, both 3D average accuracy and BEV average accuracy improve for the Car, Cyclist, and Pedestrian categories: 3D average accuracy improves by 0.23% for Car, 0.78% for Cyclist, and 2.08% for Pedestrian, while BEV average accuracy improves by 0.32%, 0.99%, and 2.38%, respectively. These findings demonstrate that the proposed enhancements effectively improve detection accuracy.
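
The 3D feature extraction module is described only at a high level, so the following PyTorch sketch shows one plausible reading: an inverted residual bottleneck built from weight-normalized 3D convolutions with a skip connection. Channel counts and the expansion factor are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class InvertedResidual3D(nn.Module):
    """Inverted residual block with weight-normalized 3D convolutions."""
    def __init__(self, channels: int, expand: int = 4):
        super().__init__()
        hidden = channels * expand
        self.block = nn.Sequential(
            weight_norm(nn.Conv3d(channels, hidden, kernel_size=1)),  # expand
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv3d(hidden, hidden, kernel_size=3,
                                  padding=1, groups=hidden)),         # depthwise
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv3d(hidden, channels, kernel_size=1)),  # project
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection preserves feature information

# x = torch.randn(1, 16, 8, 32, 32)  # (N, C, D, H, W) voxel features
# y = InvertedResidual3D(16)(x)
```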

  • Joint 2D and 3D Semantic Segmentation with Consistent Instance Semantic Open Access

    Yingcai WAN  Lijin FANG  

     
    PAPER-Image

  Publicized:
    2023/12/15
      Vol:
    E107-A No:8
      Page(s):
    1309-1318

    2D and 3D semantic segmentation play important roles in robotic scene understanding. However, current 3D semantic segmentation relies heavily on 3D point clouds, which are susceptible to factors such as point-cloud noise, sparsity, estimation and reconstruction errors, and data imbalance. In this paper, a novel approach is proposed to enhance 3D semantic segmentation by incorporating 2D semantic segmentation from RGB-D sequences. First, the RGB-D pairs are consistently segmented into 2D semantic maps using the tracking pipeline of Simultaneous Localization and Mapping (SLAM). This process effectively propagates object labels from full scans to the corresponding partial views with high probability. Next, a novel Semantic Projection (SP) block is introduced, which integrates features extracted from localized 2D fragments across different camera viewpoints into their corresponding 3D semantic features. Finally, the 3D semantic segmentation network uses the fused 2D-3D features to perform merged semantic segmentation for both 2D and 3D. Extensive experiments on public datasets demonstrate the effective performance of the proposed 2D-assisted 3D semantic segmentation method.
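
The Semantic Projection (SP) block is only outlined in the abstract. A minimal sketch of the underlying operation, sampling 2D features at the image projections of 3D points under an assumed pinhole camera model, could look as follows; the function name and tensor layout are our own.

```python
import torch
import torch.nn.functional as F

def project_2d_features_to_points(feat2d, points_cam, K):
    """Gather 2D features for 3D points.

    feat2d:     (C, H, W) feature map from one camera view
    points_cam: (N, 3) points in that camera's coordinate frame
    K:          (3, 3) pinhole intrinsic matrix
    returns:    (N, C) per-point features (bilinearly sampled)
    """
    C, H, W = feat2d.shape
    z = points_cam[:, 2].clamp(min=1e-6)
    u = K[0, 0] * points_cam[:, 0] / z + K[0, 2]  # pixel x
    v = K[1, 1] * points_cam[:, 1] / z + K[1, 2]  # pixel y
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([2 * u / (W - 1) - 1, 2 * v / (H - 1) - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                       # (1, 1, N, 2)
    sampled = F.grid_sample(feat2d[None], grid, align_corners=True)
    return sampled.view(C, -1).t()                      # (N, C)
```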

  • Video Reflection Removal by Modified EDVR and 3D Convolution Open Access

    Sota MORIYAMA  Koichi ICHIGE  Yuichi HORI  Masayuki TACHI  

     
    LETTER-Image

  Publicized:
    2023/12/11
      Vol:
    E107-A No:8
      Page(s):
    1430-1434

    In this paper, we propose a method for video reflection removal using the video restoration framework with enhanced deformable networks (EDVR). We examine the effect of each EDVR module on video reflection removal and modify the models using 3D convolutions. The performance of each modified model is evaluated in terms of the RMSE between the structural similarity (SSIM) and the smoothed SSIM, the latter representing temporal consistency.
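
The evaluation metric, as described, is the RMSE between the per-frame SSIM curve and its temporally smoothed version. A small sketch of one plausible formulation follows; moving-average smoothing and the window size are our assumptions, not necessarily the authors' choice.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def temporal_consistency_rmse(outputs, targets, win: int = 5) -> float:
    """RMSE between the per-frame SSIM series and its smoothed version.

    outputs, targets: lists of grayscale frames (H, W) as float arrays in [0, 1]
    """
    s = np.array([ssim(o, t, data_range=1.0) for o, t in zip(outputs, targets)])
    kernel = np.ones(win) / win
    smoothed = np.convolve(s, kernel, mode="same")  # moving-average smoothing
    return float(np.sqrt(np.mean((s - smoothed) ** 2)))
```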

  • Uniaxially Symmetrical T-Junction OMT with 45°-Tilted Branch Waveguide Ports

    Hidenori YUKAWA  Yu USHIJIMA  Toru TAKAHASHI  Toru FUKASAWA  Yoshio INASAWA  Naofumi YONEDA  Moriyasu MIYAZAKI  

     
    PAPER-Electromagnetic Theory

  Publicized:
    2023/10/13
      Vol:
    E107-C No:3
      Page(s):
    57-65

    A T-junction orthomode transducer (OMT) is a waveguide component that separates two orthogonal linear polarizations in the same frequency band. It has a common circular waveguide short-circuited at one end and two branch rectangular waveguides arranged in opposite directions near the short circuit. One advantage of a T-junction OMT is its short axial length. However, the two rectangular ports, which need to be orthogonal, exhibit different levels of performance because of the asymmetry of the structure. We therefore propose a uniaxially symmetrical T-junction OMT in which the two branch waveguides are tilted 45° with respect to the short circuit. The uniaxially symmetrical configuration provides the same level of performance for both ports, and its impedance matching is easier than that of the conventional configuration. The polarization separation principle can be explained using the principles of the orthomode junction (OMJ) and the turnstile OMT. In calculations, the proposed configuration demonstrated a return loss of 25 dB, cross-polarization discrimination (XPD) of 30 dB, isolation of 21 dB between the two branch ports, and a loss of 0.25 dB, with a bandwidth of 15% in the K band. The OMT was then fabricated as a single piece via 3D printing and evaluated against the calculated performance indices.

  • Universal Angle Visibility Realized by a Volumetric 3D Display Using a Rotating Mirror-Image Helix Screen Open Access

    Karin WAKATSUKI  Chiemi FUJIKAWA  Makoto OMODANI  

     
    INVITED PAPER

  Publicized:
    2023/08/03
      Vol:
    E107-C No:2
      Page(s):
    23-28

    Herein, we propose a volumetric 3D display in which cross-sectional images are projected onto a rotating helix screen. This method enables image observation from all directions. A major challenge with this method is the presence of invisible regions that occur depending on the observation angle. In this study, we fabricated a mirror-image helix screen with two helical surfaces coaxially arranged in a plane-symmetrical configuration. Measurements confirmed that the visible region is larger than that of a conventional helix screen. We also confirmed that the improved visible region is almost independent of the observation angle and is almost equally wide on the left and right sides of the rotation axis.

  • An Evaluation of the Impact of Distance on Perceptual Quality of Textured 3D Meshes

    Duc NGUYEN  Tran THUY HIEN  Huyen T. T. TRAN  Truong THU HUONG  Pham NGOC NAM  

     
    LETTER

  Publicized:
    2023/09/25
      Vol:
    E107-D No:1
      Page(s):
    39-43

    Distance-aware quality adaptation is a potential approach to reducing the resource requirements for the transmission and rendering of textured 3D meshes. In this paper, we carry out a subjective experiment to investigate the effects of the distance from the camera on the perceptual quality of textured 3D meshes. In addition, we evaluate the effectiveness of eight image-based objective quality metrics in representing the user's perceptual quality. Our study found that perceptual quality in terms of mean opinion score increases as the distance from the camera increases. It is also shown that normalized mutual information (NMI), a full-reference objective quality metric, is highly correlated with the subjective scores.
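
Normalized mutual information (NMI) between a reference and a distorted rendering can be computed from their joint gray-level histogram. The sketch below shows one common formulation; the bin count and the symmetric normalization are our assumptions, and several NMI variants exist.

```python
import numpy as np

def nmi(ref: np.ndarray, dist: np.ndarray, bins: int = 64) -> float:
    """Normalized mutual information between two grayscale images in [0, 1]."""
    joint, _, _ = np.histogram2d(ref.ravel(), dist.ravel(),
                                 bins=bins, range=[[0, 1], [0, 1]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    hx, hy = entropy(px), entropy(py)
    hxy = entropy(pxy.ravel())
    mi = hx + hy - hxy                 # mutual information
    return 2.0 * mi / (hx + hy)        # symmetric normalization in [0, 1]
```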

  • Neural Network-Based Post-Processing Filter on V-PCC Attribute Frames

    Keiichiro TAKADA  Yasuaki TOKUMO  Tomohiro IKAI  Takeshi CHUJOH  

     
    LETTER

  Publicized:
    2023/07/13
      Vol:
    E106-D No:10
      Page(s):
    1673-1676

    Video-based point cloud compression (V-PCC) uses video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computational burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. On the other hand, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate this degradation. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3%, and 22.2% for Y, U, and V, respectively.
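
The abstract does not specify the filter architecture or the training method, so the sketch below only illustrates the general idea: a small residual CNN post-filter on decoded attribute frames, with an occupancy-masked loss as one plausible way to use occupancy frames during training (our assumption, not the authors' exact method).

```python
import torch
import torch.nn as nn

class AttributePostFilter(nn.Module):
    """Small residual CNN that restores compressed attribute frames."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)  # predict the compression residual

def occupancy_masked_loss(pred, target, occupancy):
    """L1 loss restricted to occupied pixels (occupancy map in {0, 1})."""
    mask = occupancy.expand_as(pred)
    return (mask * (pred - target).abs()).sum() / mask.sum().clamp(min=1)
```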

  • Crosstalk Analysis and Countermeasures of High-Bandwidth 3D-Stacked Memory Using Multi-Hop Inductive Coupling Interface Open Access

    Kota SHIBA  Atsutake KOSUGE  Mototsugu HAMADA  Tadahiro KURODA  

     
    BRIEF PAPER

  Publicized:
    2022/09/30
      Vol:
    E106-C No:7
      Page(s):
    391-394

    This paper presents an in-depth analysis of crosstalk in a high-bandwidth 3D-stacked memory using a multi-hop inductive coupling interface and proposes two countermeasures. This work analyzes the crosstalk among seven stacked chips using a 3D electromagnetic (EM) simulator. The detailed analysis reveals two main crosstalk sources: concentric coils and adjacent coils. To suppress this crosstalk, this paper proposes two corresponding countermeasures: shorted coils and 8-shaped coils. The combination of these coils improves area efficiency by a factor of 4 in simulation. The proposed methods enable an area-efficient inductive coupling interface for high-bandwidth stacked memory.

  • 3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation

    He LI  Yutaro IWAMOTO  Xianhua HAN  Lanfen LIN  Akira FURUKAWA  Shuzo KANASAKI  Yen-Wei CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

  Publicized:
    2023/02/21
      Vol:
    E106-D No:5
      Page(s):
    1027-1037

    Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features from two-dimensional (2D) data and are generally called 2D networks. However, 2D networks are inefficient at extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. We then leveraged an ROI-estimation strategy to crop the ROI bounding box and used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address class imbalance and blurry contours in medical images. By incorporating the above strategies, we realized practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of the proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results compared with state-of-the-art methods.
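
A hybrid Dice loss is typically a weighted combination of soft Dice loss and cross-entropy; the paper's exact formulation is not given in the abstract, so the following is a standard sketch of that combination for binary volumetric segmentation.

```python
import torch
import torch.nn.functional as F

def hybrid_dice_loss(logits, target, w_dice=0.5, w_ce=0.5, eps=1e-6):
    """Weighted sum of soft Dice loss and binary cross-entropy.

    logits: (N, 1, D, H, W) raw predictions; target: same shape, values in {0, 1}
    """
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    ce = F.binary_cross_entropy_with_logits(logits, target.float())
    return w_dice * (1 - dice) + w_ce * ce  # Dice counters class imbalance, CE sharpens contours
```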

  • Learning Multi-Level Features for Improved 3D Reconstruction

    Fairuz SAFWAN MAHAD  Masakazu IWAMURA  Koichi KISE  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2022/12/08
      Vol:
    E106-D No:3
      Page(s):
    381-390

    3D reconstruction methods using neural networks are popular and have been studied extensively. However, the resulting models typically lack detail, reducing the quality of the 3D reconstruction. This is because the network is not designed to capture the fine details of the object. Therefore, in this paper, we propose two networks designed to capture both the coarse and fine details of the object in order to improve the reconstruction of its detailed parts. The first network uses a multi-scale architecture with skip connections to associate and merge features from different levels. The second is a multi-branch deep generative network that separately learns local features, generic features, and intermediate features through three tailored components. In both architectures, the principle is to let the network learn features at different levels so that it can reconstruct both the fine parts and the overall shape of the 3D model. We show that both of our methods outperform state-of-the-art approaches.
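
The first network is described as a multi-scale architecture with skip connections that merges features from different levels. The sketch below shows that general pattern; the layer sizes and the merge-at-the-middle-scale rule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Encoder that exposes features at several scales and merges them."""
    def __init__(self):
        super().__init__()
        self.s1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(True))
        self.s2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(True))
        self.s3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(True))
        # 1x1 convs project the outer levels to a common width before merging
        self.p1 = nn.Conv2d(32, 64, 1)
        self.p3 = nn.Conv2d(128, 64, 1)

    def forward(self, x):
        f1 = self.s1(x)
        f2 = self.s2(f1)
        f3 = self.s3(f2)
        up = nn.functional.interpolate
        size = f2.shape[-2:]
        # skip connections: resample fine and coarse levels to the middle scale
        return f2 + up(self.p1(f1), size=size) + up(self.p3(f3), size=size)
```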

  • Development of Electronic Tile for Decorating Walls and 3D Surfaces Open Access

    Makoto OMODANI  Hiroyuki YAGUCHI  Fusako KUSUNOKI  

     
    INVITED PAPER

  Publicized:
    2022/09/30
      Vol:
    E106-C No:2
      Page(s):
    21-25

    We have proposed and developed the e-Tile for decorating walls and for interior and exterior ornaments. A prototype 2 m × 2 m energy-saving reflective panel was realized by arraying 400 e-Tiles on a flat plane. Prototypes of cubic displays were also realized by assembling e-Tiles into cubic shapes. These cubic prototypes exhibited artistic display effects and a 3D impression. We expect the e-Tile to be a promising way to extend the application field of e-Paper to decorative uses, including architectural applications.

  • Unrolled Network for Light Field Display

    Kotaro MATSUURA  Chihiro TSUTAKE  Keita TAKAHASHI  Toshiaki FUJII  

     
    LETTER

  Publicized:
    2022/05/06
      Vol:
    E105-D No:10
      Page(s):
    1721-1725

    Inspired by the framework of algorithm unrolling, we propose a scalable network architecture that computes layer patterns for light field displays, enabling control of the trade-off between display quality and computational cost with a single pre-trained network.
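
Algorithm unrolling turns a fixed number of iterations of an optimization algorithm into network layers with learned parameters, and truncating the stages trades quality for speed. The sketch below shows this generic pattern, not the paper's actual layer-pattern solver for light field displays; grad_fn stands in for a data-fidelity gradient that the application would supply.

```python
import torch
import torch.nn as nn

class UnrolledStage(nn.Module):
    """One unrolled iteration: a gradient-like update with a learned step size."""
    def __init__(self, ch: int):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))
        self.refine = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, grad_fn):
        x = x - self.step * grad_fn(x)   # data-fidelity descent step
        return x + self.refine(x)        # learned correction

class UnrolledNet(nn.Module):
    """Stack of T stages; running fewer stages lowers the computational cost."""
    def __init__(self, ch: int, T: int = 8):
        super().__init__()
        self.stages = nn.ModuleList(UnrolledStage(ch) for _ in range(T))

    def forward(self, x, grad_fn, T=None):
        for stage in self.stages[:T]:    # truncate here to scale quality vs. cost
            x = stage(x, grad_fn)
        return x
```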

  • Learning Pyramidal Feature Hierarchy for 3D Reconstruction

    Fairuz Safwan MAHAD  Masakazu IWAMURA  Koichi KISE  

     
    LETTER-Image Recognition, Computer Vision

  Publicized:
    2021/11/16
      Vol:
    E105-D No:2
      Page(s):
    446-449

    Neural network-based three-dimensional (3D) reconstruction methods have produced promising results. However, they do not pay particular attention to reconstructing the detailed parts of objects, because the network is not designed to capture fine details. In this paper, we propose a network designed to capture both the coarse and fine details of objects in order to improve the reconstruction of their fine parts.

  • Feature Description with Feature Point Registration Error Using Local and Global Point Cloud Encoders

    Kenshiro TAMATA  Tomohiro MASHITA  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2021/10/11
      Vol:
    E105-D No:1
      Page(s):
    134-140

    A typical approach to reconstructing a 3D environment model is to scan the environment with a depth sensor and fit the accumulated point cloud to 3D models. In this scenario, a general 3D environment reconstruction application assumes temporally continuous scanning. However, in some practical uses, this assumption is unacceptable, so a point cloud matching method for stitching several non-continuous 3D scans is required. Point cloud matching often includes errors in feature point detection, because a point cloud is essentially a sparse sampling of the real environment and may include non-negligible quantization errors. Moreover, depth sensors tend to have errors due to the reflective properties of the observed surface. We therefore assume that feature point pairs between two point clouds will include errors. In this work, we propose a feature description method that is robust to the feature point registration errors described above. To achieve this goal, we designed a deep-learning-based feature description model that consists of a local feature description around the feature points and a global feature description of the entire point cloud. To obtain a feature description robust to feature point registration error, we input feature point pairs with errors and train the models with metric learning. Experimental results show that our feature description model can correctly estimate whether a feature point pair is close enough to be considered a match, even when the feature point registration errors are large, and that it achieves higher accuracy than methods such as FPFH or 3DMatch. In addition, we conducted experiments on combinations of input point clouds (local, global, or both) and encoders.
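
The model combines a local descriptor around each feature point with a global descriptor of the whole point cloud and is trained by metric learning on point pairs that include registration error. The sketch below illustrates that structure with a contrastive margin loss; the encoder modules and all dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairDescriptor(nn.Module):
    """Concatenate local (around the point) and global (whole cloud) codes."""
    def __init__(self, local_enc: nn.Module, global_enc: nn.Module, dim: int = 128):
        super().__init__()
        self.local_enc, self.global_enc = local_enc, global_enc
        self.head = nn.Linear(2 * dim, dim)

    def forward(self, local_patch, full_cloud):
        z = torch.cat([self.local_enc(local_patch),
                       self.global_enc(full_cloud)], dim=-1)
        return F.normalize(self.head(z), dim=-1)

def contrastive_loss(d_a, d_b, is_match, margin: float = 0.5):
    """Pull matching descriptors together, push non-matches past a margin."""
    dist = (d_a - d_b).norm(dim=-1)
    pos = is_match * dist.pow(2)
    neg = (1 - is_match) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()
```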

  • GECNN for Weakly Supervised Semantic Segmentation of 3D Point Clouds

    Zifen HE  Shouye ZHU  Ying HUANG  Yinhui ZHANG  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2021/09/24
      Vol:
    E104-D No:12
      Page(s):
    2237-2243

    This paper presents a novel method for weakly supervised semantic segmentation of 3D point clouds using a novel graph and edge convolutional neural network (GECNN), given only 1% and 10% of points with labels. Our general framework facilitates semantic segmentation by encoding both global- and local-scale features via a parallel graph and edge aggregation scheme. More specifically, global-scale graph structure cues of point clouds are captured by a graph convolutional neural network, propagated from a pairwise affinity representation over the whole graph established in a d-dimensional feature embedding space. We integrate local-scale features derived from a dynamic edge feature aggregation convolutional neural network, which allows us to fuse both global and local cues of 3D point clouds. The proposed GECNN model is trained with a comprehensive objective consisting of incomplete, inexact, self-supervision, and smoothness constraints based on partially labeled points. The proposed approach enforces global and local consistency constraints directly in the objective losses and inherently handles the challenges of segmenting sparse 3D point clouds with limited annotations in a large-scale point cloud space. Our experiments on the ShapeNet and S3DIS benchmarks demonstrate the effectiveness of the proposed approach for efficient (within 20 epochs) learning of large-scale point cloud semantics despite very limited labels.
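
A compact sketch of the parallel graph/edge aggregation idea: a graph convolution over a row-normalized affinity matrix captures global structure, an EdgeConv-style aggregation over k-nearest neighbors captures local structure, and the two branches are fused. All sizes and the fusion rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphEdgeFusion(nn.Module):
    """Parallel global graph conv + local edge aggregation, then fusion."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w_graph = nn.Linear(d_in, d_out)       # global branch
        self.mlp_edge = nn.Linear(2 * d_in, d_out)  # local branch
        self.fuse = nn.Linear(2 * d_out, d_out)

    def forward(self, x, adj, knn_idx):
        # x: (N, d_in); adj: (N, N) row-normalized affinity; knn_idx: (N, k)
        g = torch.relu(self.w_graph(adj @ x))              # graph convolution
        nbrs = x[knn_idx]                                  # (N, k, d_in)
        edge = torch.cat([x.unsqueeze(1).expand_as(nbrs),
                          nbrs - x.unsqueeze(1)], dim=-1)  # EdgeConv features
        e = torch.relu(self.mlp_edge(edge)).max(dim=1).values
        return self.fuse(torch.cat([g, e], dim=-1))        # fuse global + local
```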

  • Effects of Initial Configuration on Attentive Tracking of Moving Objects Whose Depth in 3D Changes

    Anis Ur REHMAN  Ken KIHARA  Sakuichi OHTSUKA  

     
    PAPER-Vision

  Publicized:
    2021/02/25
      Vol:
    E104-A No:9
      Page(s):
    1339-1344

    In daily life, people often pay attention to several objects that change positions while being observed. In the laboratory, this process is investigated using multiple object tracking (MOT), a task that evaluates attentive tracking performance. Recent findings suggest that the attentional set for multiple moving objects whose depth changes in three dimensions from one plane to another is influenced by the initial configuration of the objects. When tracking objects, it is difficult for people to expand their attentional set to multiple depth planes once attention has been focused on a single plane. However, less is known about people contracting their attentional set from multiple depth planes to a single depth plane. In two experiments, we examined tracking accuracy when four targets or four distractors, initially distributed on two planes, come together on one of the planes during an MOT task. The results suggest that people have difficulty changing the depth range of their attention during attentive tracking, and that attentive tracking performance depends on the initial attentional set based on the configuration prior to tracking.

  • Recent Advances in Video Action Recognition with 3D Convolutions Open Access

    Kensho HARA  

     
    INVITED PAPER

  Publicized:
    2020/12/07
      Vol:
    E104-A No:6
      Page(s):
    846-856

    The performance of video action recognition has improved significantly in recent decades. Current recognition approaches mainly utilize convolutional neural networks to acquire video feature representations. In addition to the spatial information of video frames, temporal information such as motions and changes is important for recognizing videos. Therefore, the use of convolutions in a spatiotemporal three-dimensional (3D) space for representing spatiotemporal features has garnered significant attention. Herein, we introduce recent advances in 3D convolutions for video action recognition.
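
For reference, a spatiotemporal 3D convolution simply slides a kernel over time as well as space, so motion and appearance are captured in a single operation. A minimal example with assumed tensor sizes:

```python
import torch
import torch.nn as nn

# A clip of 16 RGB frames at 112x112: (batch, channels, time, height, width)
clip = torch.randn(1, 3, 16, 112, 112)

# A 3x3x3 kernel convolves jointly over time and space,
# so temporal changes and spatial patterns are learned together.
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=3, stride=1, padding=1)

features = conv3d(clip)
print(features.shape)  # torch.Size([1, 64, 16, 112, 112])
```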

  • Efficient Patch Merging for Atlas Construction in 3DoF+ Video Coding

    Hyun-Ho KIM  Sung-Gyun LIM  Gwangsoon LEE  Jun Young JEONG  Jae-Gon KIM  

     
    LETTER-Image Processing and Video Processing

  Publicized:
    2020/12/14
      Vol:
    E104-D No:3
      Page(s):
    477-480

    The emerging three degrees of freedom plus (3DoF+) video provides a more interactive and deeply immersive visual experience. 3DoF+ video adds motion parallax to 360° video, providing an omnidirectional view with limited changes of the viewing position. A large set of views is required to support such a 3DoF+ visual experience; hence, it is essential to compress the tremendous amount of 3DoF+ video data. MPEG has recently been developing a standard for the efficient coding of 3DoF+ video, which consists of multiple videos, along with its test model, the Test Model for Immersive Video (TMIV). In the TMIV, the redundancy between the input source views is removed as much as possible by selecting one or several basic views and predicting the remaining views from them. Each unpredicted region is cropped to a bounding box called a patch, and a large number of patches are then packed into atlases together with the selected basic views. As a result, the multiple source views are converted into one or more atlas sequences to be compressed. In this letter, we present an improved clustering method using patch merging in the atlas construction of the TMIV. The proposed method achieves significant BD-rate reductions in terms of various end-to-end evaluation metrics in experiments and was adopted in TMIV 6.0.
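
TMIV's actual clustering algorithm is specified in the test model documentation; purely as an illustration of bounding-box patch merging, the greedy sketch below repeatedly merges patches whose boxes overlap, reducing the patch count before atlas packing.

```python
def boxes_overlap(a, b):
    """Axis-aligned boxes given as (x0, y0, x1, y1)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge(a, b):
    """Smallest box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_patches(boxes):
    """Greedily merge overlapping patch bounding boxes until stable."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j]):
                    boxes[i] = merge(boxes[i], boxes[j])
                    boxes.pop(j)
                    merged = True
                    break
            if merged:
                break
    return boxes
```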

  • Depth Range Control in Visually Equivalent Light Field 3D Open Access

    Munekazu DATE  Shinya SHIMIZU  Hideaki KIMATA  Dan MIKAMI  Yoshinori KUSACHI  

     
    INVITED PAPER-Electronic Displays

  Publicized:
    2020/08/13
      Vol:
    E104-C No:2
      Page(s):
    52-58

    3D video content depends on the shooting conditions, namely camera positioning. Depth range control in the post-processing stage is not easy, but it is essential, as video from arbitrary camera positions must be generated. If light field information can be obtained, video from any viewpoint can be generated exactly, and such post-processing is possible. However, a light field contains a huge amount of data, and capturing one is not easy. To compress the data quantity, we proposed the visually equivalent light field (VELF), which exploits the characteristics of human vision. Although a number of cameras are needed, a VELF can be captured by a camera array. Since camera interpolation is performed by linear blending, the calculation is simple enough that a ray distribution field of the VELF can be constructed by optical interpolation in the VELF3D display. The display produces high image quality due to its high pixel usage efficiency. In this paper, we summarize the relationship between the characteristics of human vision, VELF, and the VELF3D display. We then propose a method to control the depth range of the observed image on the VELF3D display and discuss the effectiveness and limitations of displaying the processed image on it. Our method can be applied to other 3D displays. Since the calculation is just weighted averaging, it is suitable for real-time applications.
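
The linear blending used for camera interpolation amounts to a position-dependent weighted average of the images from the two nearest cameras. A minimal sketch (a one-dimensional camera arrangement is assumed for simplicity):

```python
import numpy as np

def blend_views(img_left, img_right, x, x_left, x_right):
    """Linearly interpolate a virtual view at position x between two cameras.

    img_left, img_right: (H, W, 3) float images from cameras at x_left < x_right
    """
    w = (x - x_left) / (x_right - x_left)  # 0 at the left camera, 1 at the right
    return (1.0 - w) * img_left + w * img_right

# A virtual viewpoint one third of the way from the left camera:
# view = blend_views(imgL, imgR, x=0.33, x_left=0.0, x_right=1.0)
```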
