Shuichi MAEDA Akihiro FUKAMI Kaiki YAMAZAKI
Information that is invisible to the human eye offers several benefits. “Invisible” here means that the information can be visualized or quantified when using instruments. For example, it can improve security without compromising product design. We have succeeded in forming an invisible digital image on a metal substrate using the periodic repeatability of thin-film interference in niobium oxides. This digital information is invisible in the visible wavelength range of 400-800 nm but detectable in the infrared range of 800-1150 nm. This technology has the potential to be applied to anti-counterfeiting and traceability.
Karin WAKATSUKI Chiemi FUJIKAWA Makoto OMODANI
Herein, we propose a volumetric 3D display in which cross-sectional images are projected onto a rotating helix screen. The method employed by this display enables image observation from all directions. A major challenge associated with this method is the presence of invisible regions that occur depending on the observation angle. This study aimed to fabricate a mirror-image helix screen with two helical surfaces coaxially arranged in a plane-symmetrical configuration. The measured visible region was larger than that of the conventional helix screen. We confirmed that the improved visible region was almost independent of the observation angle and that the visible region was almost equally wide on both the left and right sides of the rotation axis.
Taisei URAKAMI Tamami MARUYAMA Shimpei NISHIYAMA Manato KUSAMIZU Akira ONO Takahiro SHIOZAWA
Novel patch element shapes with interdigital and multi-via structures are proposed for mushroom-type metasurface reflectors to control the reflection phase. The interdigital structure provides a wide reflection phase range by changing the depth of the interdigital fingers. In addition, the multi-via structure provides higher positive reflection phases, such as values near +180°. A sufficient reflection phase range of 360° and low polarization dependence were confirmed by electromagnetic field simulation. A metasurface reflector for a normally incident plane wave was designed, and the desired reflection angles and sharp far-field patterns of the reflected beams were confirmed in the simulation results. The prototype reflectors for the experiments should be designed in the same way as the primary reflector of a reflector antenna. Specifically, a reflector design method based on ray tracing using the incident wave phase was proposed for the prototype. The experimental radiation pattern of the reflector antenna composed of the transmitting antenna (TX) and the prototype metasurface reflector was similar to the simulated radiation pattern. These simulation and experimental results confirm the effectiveness of the proposed structures and their design methods.
Peng GAO Xin-Yue ZHANG Xiao-Li YANG Jian-Cheng NI Fei WANG
Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have ignored the background appearance, which makes the trackers ill-suited to recognizing arbitrary target objects with various variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted-window multi-head self-attention is used to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
Images captured in low-light environments have low visibility and high noise, which will seriously affect subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance in obtaining high-quality images and is a challenging problem in computer vision tasks. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce the complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experiment results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098dB and 0.8575 respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060dB and 0.9490. The evaluation results on the two datasets are better than the current mainstream low-light enhancement algorithms.
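To make the idea of replacing attention with spatial pooling concrete, the following is a minimal NumPy sketch of a pooling-based token mixer. It assumes the commonly described PoolFormer form (a 3x3 average pool with stride 1 whose zero padding is excluded from the mean, followed by subtraction of the input); the function name `pool_token_mixer` and the window size are illustrative assumptions, not details taken from the abstract above.

```python
import numpy as np

def pool_token_mixer(x: np.ndarray, pool_size: int = 3) -> np.ndarray:
    """Mix spatial tokens by average pooling, with no learnable parameters.

    x has shape (channels, height, width). Each channel is average-pooled
    over a pool_size x pool_size window (stride 1); padded positions are
    excluded from the mean. The input is then subtracted, which is the
    assumed PoolFormer convention.
    """
    c, h, w = x.shape
    pad = pool_size // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # 1 where a real pixel exists, 0 in the padding: used to count valid pixels.
    valid = np.pad(np.ones((h, w)), pad)
    summed = np.zeros((c, h, w), dtype=float)
    n_valid = np.zeros((h, w), dtype=float)
    for dy in range(pool_size):
        for dx in range(pool_size):
            summed += padded[:, dy:dy + h, dx:dx + w]
            n_valid += valid[dy:dy + h, dx:dx + w]
    return summed / n_valid - x
```

Because the mixer only averages neighbours, it adds no weights, which is consistent with the parameter reduction the abstract reports; a constant feature map is mapped to zero.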
Kazuki EGASHIRA Atsuyuki MIYAI Qing YU Go IRIE Kiyoharu AIZAWA
We propose a novel classification problem setting where Undesirable Classes (UCs) are defined for each class. A UC is a class into which a sample should specifically not be misclassified. To address this setting, we propose a framework that reduces the probabilities of the UCs while increasing the probability of the correct class.
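One way to picture such an objective is a cross-entropy term plus a penalty on the probability mass assigned to the UCs. The sketch below is only an assumed illustration: the penalty form `-log(1 - p_UC)`, the `weight` parameter, and the function names are hypothetical and are not stated in the abstract.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uc_aware_loss(logits, label, uc_mask, weight=1.0, eps=1e-12):
    """Cross-entropy on the correct class plus a penalty on UC probability.

    logits : (num_classes,) raw scores for one sample
    label  : index of the correct class
    uc_mask: boolean (num_classes,), True for the UCs of this label
    weight : hypothetical trade-off between the two terms
    """
    p = softmax(np.asarray(logits, dtype=float))
    cross_entropy = -np.log(p[label] + eps)
    uc_probability = p[np.asarray(uc_mask, dtype=bool)].sum()
    uc_penalty = -np.log(1.0 - uc_probability + eps)
    return cross_entropy + weight * uc_penalty
```

With this form, two predictions that give the correct class the same probability are no longer equivalent: the one that places more of the remaining mass on a UC receives the larger loss.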
Xiaotian WANG Tingxuan LI Takuya TAMURA Shunsuke NISHIDA Takehito UTSURO
In the research of machine reading comprehension of Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty in dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning of the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model of Japanese how-to tip by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two methods for multi-task learning to fine-tune the generative model. The first method is the multi-task learning with a generative and extractive hybrid training dataset, where both generative and extractive datasets are simultaneously trained on a single model. The second method is the multi-task learning with the inter-sentence semantic similarity and answer generation, where, drawing upon the answer generation task, the model additionally learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both of the multi-task learning methods significantly outperformed the single-task learning method in generative question-and-answer examples. Between the two methods for multi-task learning, that with the inter-sentence semantic similarity and answer generation performed the best in terms of the manual evaluation result. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.
Kenichi FUJITA Atsushi ANDO Yusuke IJIMA
This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the essential factors among speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. A novel feature of the proposed method is the rhythm-based embeddings extracted from phonemes and their durations, which are known to be related to speaking rhythm. They are extracted with a speaker identification model similar to the conventional spectral feature-based one. We conducted three experiments, speaker embeddings generation, speech synthesis with generated embeddings, and embedding space analysis, to evaluate the performance. The proposed method demonstrated a moderate speaker identification performance (15.2% EER), even with only phonemes and their duration information. The objective and subjective evaluation results demonstrated that the proposed method can synthesize speech with speech rhythm closer to the target speaker than the conventional method. We also visualized the embeddings to evaluate the relationship between the distance of the embeddings and the perceptual similarity. The visualization of the embedding space and the relation analysis between the closeness indicated that the distribution of embeddings reflects the subjective and objective similarity.
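For readers unfamiliar with the EER figure quoted above, the following sketch shows one common way to estimate an equal error rate from similarity scores. It assumes a verification-style setup with separate genuine and impostor score lists and a simple threshold sweep; how the 15.2% figure was actually computed is not described in the abstract.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores) -> float:
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance rate (FAR) and the
    false rejection rate (FRR) are closest, as the mean of the two.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for threshold in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= threshold)  # impostors wrongly accepted
        frr = np.mean(genuine < threshold)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

Perfectly separated scores yield an EER of 0, while heavily overlapping scores push it towards 0.5.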
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
Rikuya SASAKI Hiroyuki ICHIDA Htoo Htoo Sandi KYAW Keiichi KANEKO
The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating with one another. By regarding the processing elements as nodes and the links between processing elements as edges, we can discuss various problems of interconnection networks in the framework of graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is one such topology, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n^3 log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n^2) based on a computer experiment.
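The graph view of an interconnection network can be illustrated with the plain n-dimensional hypercube, where nodes are n-bit addresses and two nodes are adjacent when their addresses differ in exactly one bit. The sketch below covers only this base topology; the extra cross-cube edge is not defined in the abstract and is deliberately left out, and the function names are illustrative.

```python
from collections import deque

def hypercube_neighbors(node: int, n: int) -> list[int]:
    """Neighbours of a node in the n-cube: flip each of the n address bits."""
    return [node ^ (1 << i) for i in range(n)]

def bfs_distances(source: int, n: int) -> dict[int, int]:
    """Shortest-path distance from source to every node of the n-cube."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in hypercube_neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(n: int) -> int:
    """Largest distance from node 0, which equals the diameter by symmetry."""
    return max(bfs_distances(0, n).values())
```

In the plain hypercube the distance between two nodes is the Hamming distance of their addresses, so the diameter is n; this is the quantity that the added cross-cube edges are said to reduce.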
Jinsoo SEO Junghyun KIM Hyemi KIM
Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summary for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention models to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.
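As a small worked example of why depth-wise separable convolution is attractive, the sketch below counts weights for a standard 2D convolution and for its depth-wise separable counterpart. Biases are ignored and the layer sizes used afterwards are arbitrary illustrative values, not CQTXNet's actual configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depth-wise convolution followed by a 1 x 1
    point-wise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

For a hypothetical layer with 64 input channels, 128 output channels, and a 3x3 kernel, the standard convolution needs 73,728 weights and the separable version 8,768, roughly an 8.4-fold reduction.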
Tomoki MINAMATA Hiroki HAMASAKI Hiroshi KAWASAKI Hajime NAGAHARA Satoshi ONO
This paper proposes a novel application of coded apertures (CAs) for visual information hiding. The CA is one of the representative computational photography techniques, in which a patterned mask is attached to a camera as an alternative to a conventional circular aperture. With image processing in the post-processing phase, various functions such as omnifocal image capturing and depth estimation can be performed. In general, a watermark embedded as high-frequency components is difficult to extract if it is captured outside the focal length and defocus blur occurs. Installing a CA in the camera is a simple way to mitigate this difficulty, and several attempts have been made to find a better design for stable extraction. In contrast, our motivation is to design a specific CA together with an information hiding scheme such that the secret information can be decoded only if an image with hidden information is captured with the key aperture at a certain distance outside the focus range. The proposed technique designs the key aperture patterns and information hiding scheme through evolutionary multi-objective optimization so as to minimize the decryption error of a hidden image when using the key aperture while minimizing the accuracy when using other apertures. During the optimization process, solution candidates, i.e., key aperture patterns and information hiding schemes, are evaluated on actual devices to account for disturbances that cannot be considered in optical simulations. Experimental results have shown that decoding can be performed with the designed key aperture and similar ones, that decrypted image quality deteriorates as the similarity between the key and the aperture used for decryption decreases, and that the proposed information hiding technique works on actual devices.
Le Trieu PHONG Tran Thi PHUONG Lihua WANG Seiichi OZAWA
In this paper, we explore privacy-preserving techniques in federated learning, including those that can be used with both neural networks and decision trees. We begin by identifying how information can be leaked in federated learning, after which we present methods to address this issue by introducing two privacy-preserving frameworks that encompass many existing privacy-preserving federated learning (PPFL) systems. Through experiments with publicly available financial, medical, and Internet of Things datasets, we demonstrate the effectiveness of privacy-preserving federated learning and its potential to develop highly accurate, secure, and privacy-preserving machine learning systems in real-world scenarios. The findings highlight the importance of considering privacy in the design and implementation of federated learning systems and suggest that privacy-preserving techniques are essential in enabling the development of effective and practical machine learning systems.
Zhaohu PAN Hang LI Xiaojing HUANG
In this paper, we investigate the optimal design of millimeter-wave (mmWave) multiuser line-of-sight multiple-input-multiple-output (LOS MIMO) systems using hybrid arrays of subarrays based on a hybrid block diagonalization (BD) precoding and combining scheme. By introducing a general 3D geometric channel model, the optimal subarray separation products of the transmitter and receiver for maximizing the sum-rate are designed for two regular configurations, adjacent subarrays and interleaved subarrays for different users. We analyze the sensitivity of performance to the optimal design parameters in terms of a deviation factor, and derive expressions for the eigenvalues of the multiuser equivalent LOS MIMO channel matrix, which are also valid for non-optimal designs. Simulation results show that the interleaved subarrays can support longer-distance communication than the adjacent subarrays given an appropriate fixed subarray deployment.
Binu SHRESTHA Yuyuan CHANG Kazuhiko FUKAWA
Device-to-device (D2D) communication allows user terminals to communicate directly with each other without the need for any base stations (BSs). Since D2D communication underlaying a cellular system shares frequency channels with BSs, co-channel interference may occur. Successive interference cancellation (SIC), also called the serial interference canceler, detects and subtracts user signals from received signals in descending order of received power. It can cope with the above interference and has already been applied to fog nodes that manage communications among machine-to-machine (M2M) devices in addition to direct communications with BSs. When differences among the received power levels of user signals are negligible, however, SIC cannot work well, which degrades bit error rate (BER) performance. To solve this problem, this paper proposes to apply parallel interference cancellation (PIC), which can simultaneously detect both desired and interfering signals under the maximum likelihood criterion and can maintain good BER performance even when power level differences among users are small. When channel coding is employed, however, SIC can be superior to PIC in terms of BER under some channel conditions. Considering this, the paper also proposes to select the cancellation scheme and the modulation and coding scheme (MCS) that maximize the D2D throughput under a BER constraint; this canceler selection is referred to as adaptive interference cancellation. Computer simulations show that PIC outperforms SIC under almost all channel conditions, so the adaptive selection between PIC and SIC achieves a marginal gain over PIC, while PIC achieves 10% higher average system throughput than SIC. As for transmission delay, it is demonstrated that the adaptive selection and PIC shorten the delay more than any other scheme, although the fog node introduces a delay of at least 1 ms.
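The SIC principle described above, detecting and subtracting user signals in descending order of received power, can be sketched for a toy case. The example assumes BPSK symbols, known positive real channel gains, and symbol-synchronous users; these simplifications and the function name `sic_detect` are illustrative assumptions rather than the system model of the paper.

```python
import numpy as np

def sic_detect(received, gains) -> np.ndarray:
    """Successive interference cancellation for superimposed BPSK users.

    received: (num_samples,) sum of gains[u] * symbols[u] over users (plus noise)
    gains   : (num_users,) known positive real amplitudes
    Returns an array of shape (num_users, num_samples) with detected +/-1 symbols.
    """
    residual = np.asarray(received, dtype=float).copy()
    gains = np.asarray(gains, dtype=float)
    detected = np.zeros((gains.size, residual.size))
    # Process users from the strongest to the weakest received power.
    for user in np.argsort(-gains ** 2):
        symbols = np.where(residual >= 0.0, 1.0, -1.0)  # hard BPSK decision
        detected[user] = symbols
        residual -= gains[user] * symbols                # cancel this user's signal
    return detected
```

When the gains are well separated, the strongest user dominates the sign of the received sample and every later decision sees a cleaned residual. When two gains are nearly equal, the first decision can be wrong whenever the symbols oppose each other, which matches the degradation the abstract attributes to negligible power differences.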
Chang SUN Xiaoyu SUN Jiamin LI Pengcheng ZHU Dongming WANG Xiaohu YOU
The application of millimeter wave (mmWave) directional transmission technology in high-speed railway (HSR) scenarios helps to achieve the goal of multi-gigabit data rates with low latency. However, due to the high mobility of trains, the traditional initial access (IA) scheme, which is time-consuming, has difficulty guaranteeing the effectiveness of beam alignment. In addition, the high path loss at the coverage edge of the millimeter wave remote radio unit (mmW-RRU) also poses great challenges to the stability of IA performance. Fortunately, the train trajectory in HSR scenarios is periodic and regular. Moreover, the cell-free network helps to improve the system coverage performance. Based on these observations, this paper proposes an efficient IA scheme based on location and history information in cell-free networks, where the train can flexibly select a set of mmW-RRUs according to the received signal quality. We specifically analyze the collaborative IA process based on the exhaustive search and on location and history information, derive expressions for the IA success probability and delay, and perform a numerical analysis. The results show that the proposed scheme can significantly reduce the IA delay and effectively improve the stability of the IA success probability.
Shota AKIYOSHI Yuzo TAENAKA Kazuya TSUKAMOTO Myung LEE
Cross-domain data fusion is becoming a key driver in the growth of numerous and diverse applications in the Internet of Things (IoT) era. We have proposed the concept of a new information platform, Geo-Centric Information Platform (GCIP), that enables IoT data fusion based on geolocation, i.e., produces spatio-temporal content (STC), and then provides the STC to users. In this environment, users cannot know in advance “when,” “where,” or “what type” of STC is being generated because the type and timing of STC generation vary dynamically with the diversity of IoT data generated in each geographical area. This makes it difficult to directly search for a specific STC requested by the user using the content identifier (domain name of URI or content name). To solve this problem, a new content discovery method that does not directly specify content identifiers is needed while taking into account (1) spatial and (2) temporal constraints. In our previous study, we proposed a content discovery method that considers only spatial constraints and did not consider temporal constraints. This paper proposes a new content discovery method that matches user requests with content metadata (topic) characteristics while taking into account spatial and temporal constraints. Simulation results show that the proposed method successfully discovers appropriate STC in response to a user request.
Compressed sensing is a rapidly growing research field in signal and image processing, machine learning, statistics, and systems control. In this survey paper, we provide a review of the theoretical foundations of compressed sensing and present state-of-the-art algorithms for solving the corresponding optimization problems. Additionally, we discuss several practical applications of compressed sensing, such as group testing, sparse system identification, and sparse feedback gain design, and demonstrate their effectiveness through Python programs. This survey paper aims to contribute to the advancement of compressed sensing research and its practical applications in various scientific disciplines.
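As an indication of what such a Python program might look like, the sketch below implements the iterative shrinkage-thresholding algorithm (ISTA) for the l1-regularized least-squares problem, minimizing 0.5*||Ax - y||^2 + lam*||x||_1. ISTA is a standard textbook solver chosen here for brevity; whether the survey uses this particular algorithm, and the default parameter values shown, are assumptions.

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Shrink every entry of v towards zero by t (proximal map of t*||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A: np.ndarray, y: np.ndarray, lam: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    # Step size 1/L, where L is the squared spectral norm of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = A.T @ (A @ x - y)  # gradient of the quadratic data term
        x = soft_threshold(x - step * gradient, step * lam)
    return x
```

Each iteration is a gradient step on the data-fit term followed by soft thresholding, which sets small coefficients exactly to zero and thereby promotes sparse solutions.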
Chikako TAKASAKI Tomohiro KORIKAWA Kyota HATTORI Hidenari OHWADA
In beyond-5G and 6G networks, the number and types of connected devices will greatly increase, including not only user devices such as smartphones but also Internet of Things (IoT) devices. Moreover, non-terrestrial networks (NTN) introduce dynamic changes in the types of connected devices because base stations or access points are moving objects. Therefore, continuous network capacity design is required to fulfill the network requirements of each device. However, continuous optimization of network capacity design for each device within a short time span becomes difficult because of the heavy computational load. We introduce device types as groups of devices whose traffic characteristics resemble one another and optimize network capacity per device type for efficient network capacity design. This paper proposes a method to classify device types by analyzing only encrypted traffic behavior, without using payloads or packets of specific protocols. In the first stage, general device types, such as IoT and non-IoT, are classified by analyzing packet header statistics using machine learning. Then, in the second stage, connected devices classified as IoT in the first stage are classified into IoT device types by analyzing a time series of traffic behavior using deep learning. We demonstrate that the proposed method classifies device types by analyzing traffic datasets and outperforms existing IoT-only device classification methods in terms of the number of types and the accuracy. In addition, the proposed model performs comparably to a state-of-the-art traffic classification model, the ResNet 1D model. The proposed method is suitable for grasping device types in terms of traffic characteristics toward efficient network capacity design in networks where massive numbers of devices for various services are connected and the connected devices continuously change.
Takanori HARA Masahiro SASABE Kento SUGIHARA Shoji KASAHARA
To establish a network service in network functions virtualization (NFV) networks, the orchestrator addresses the challenge of service chaining and virtual network function placement (SC-VNFP) by mapping virtual network functions (VNFs) and virtual links onto physical nodes and links. Unlike traditional networks, network operators in NFV networks must contend with both hardware and software failures in order to ensure resilient network services, as NFV networks consist of physical nodes and software-based VNFs. To guarantee network service quality in NFV networks, the existing work has proposed an approach for the SC-VNFP problem that considers VNF diversity and redundancy. VNF diversity splits a single VNF into multiple lightweight replica instances that possess the same functionality as the original VNF, which are then executed in a distributed manner. VNF redundancy, on the other hand, deploys backup instances with standby mode on physical nodes to prepare for potential VNF failures. However, the existing approach does not adequately consider the tradeoff between resource efficiency and service availability in the context of VNF diversity and redundancy. In this paper, we formulate the SC-VNFP problem with VNF diversity and redundancy as a two-step integer linear program (ILP) that adjusts the balance between service availability and resource efficiency. Through numerical experiments, we demonstrate the fundamental characteristics of the proposed ILP, including the tradeoff between resource efficiency and service availability.
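The availability side of the tradeoff can be illustrated with a simple reliability calculation. The sketch assumes independent failures, that a function is up as long as at least one of its instances (active or backup) is up, and that the chain is up only when every function is up; these modeling assumptions and the function names are illustrative and may differ from the ILP formulated in the paper.

```python
from math import prod

def redundant_availability(instance_availabilities) -> float:
    """Availability of one function served by independent redundant instances:
    it fails only if every instance fails."""
    return 1.0 - prod(1.0 - a for a in instance_availabilities)

def chain_availability(per_function_instances) -> float:
    """Availability of a service chain that needs every function to be up.

    per_function_instances: one list of instance availabilities per function.
    """
    return prod(redundant_availability(inst) for inst in per_function_instances)
```

Under these assumptions, a two-function chain with one instance of availability 0.9 per function is up with probability 0.81, and adding one backup per function raises this to about 0.98 while doubling the number of deployed instances, which is the kind of resource-versus-availability balance the formulation adjusts.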