Aorui GOU Jingjing LIU Xiaoxiang CHEN Xiaoyang ZENG Yibo FAN
Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable performance in detection and classification tasks. Nevertheless, their feature extraction does not capture local and global information simultaneously, which leaves room for improving detection and classification performance. In addition, deep learning networks are becoming increasingly complex, which significantly increases the computation and storage they require. This paper proposes a combination of CNN and Transformer and designs a local feature enhancement module and a global context modeling module to enhance the cascade network. The local feature enhancement module enlarges the range of feature extraction, while the global context modeling module captures the global information of the feature maps. To reduce model complexity, a shared sublayer is designed to share weight parameters between adjacent or cross convolutional layers, thereby reducing the number of convolutional weight parameters. Moreover, to improve the detection performance of the network without increasing its parameters, an optimal transport assignment approach is proposed to solve the label assignment problem, in which the cost between each demander and supplier is the sum of the classification loss and the regression loss. Experimental results demonstrate that the proposed Combination of CNN and Transformer with Shared Sublayer (CCTSS) outperforms state-of-the-art methods on various datasets and applications.
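Optimal transport label assignment is commonly solved with entropic regularization and Sinkhorn iterations. The sketch below illustrates that general technique only, not the CCTSS implementation. Under our assumptions, suppliers are ground-truth boxes plus one background supplier, demanders are anchors, and the cost matrix is a precomputed sum of classification and regression losses. The function names, the regularization `eps`, and the example numbers are all hypothetical.

```python
import math

def sinkhorn_assign(cost, supply, demand, eps=0.1, iters=200):
    """Entropic optimal transport via Sinkhorn scaling.

    cost[i][j]: hypothetical cost (classification loss + regression loss) of
    sending one label unit from supplier i (ground truth or background) to
    demander j (an anchor). Requires sum(supply) == sum(demand).
    Returns the transport plan pi[i][j].
    """
    m, n = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]  # Gibbs kernel
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        # Rescale rows to match supplies, then columns to match demands
        u = [supply[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [demand[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]

def assign_labels(plan):
    """Each anchor takes the label of the supplier that sends it the most mass."""
    m, n = len(plan), len(plan[0])
    return [max(range(m), key=lambda i: plan[i][j]) for j in range(n)]

# Hypothetical example: rows = [gt0, gt1, background], columns = 4 anchors
cost = [
    [0.1, 1.5, 3.0, 3.0],
    [3.0, 3.0, 0.1, 3.0],
    [1.0, 1.0, 1.0, 0.5],
]
supply = [1.0, 1.0, 2.0]   # each ground truth supplies 1 positive; background takes the rest
demand = [1.0] * 4         # each anchor needs exactly one label
labels = assign_labels(sinkhorn_assign(cost, supply, demand))
print(labels)  # expected: [0, 2, 1, 2]  (row 2 = background)
```

Because the whole plan is solved jointly, an anchor is given to background when another anchor is a cheaper match for the same ground truth. A per-anchor threshold rule cannot make that trade-off.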
Aditya RAKHMADI Kazuyuki SAITO
Transcatheter renal denervation (RDN) is a novel treatment that reduces blood pressure in patients with resistant hypertension by ablating the renal sympathetic nerves with an energy-based catheter, most commonly using radio-frequency (RF) current. However, inconsistent RDN outcomes have been reported, mainly because of the narrow heating area of RF current and the inability to confirm successful nerve ablation in deep regions. We have proposed microwave energy as an alternative for creating a wider ablation area; however, confirming a successful ablation remains a problem. In this paper, we design a method for predicting the temperature at deep renal nerve ablation sites using hybrid numerical-calculation-driven machine learning (ML) combined with a microwave catheter. This work is a first-step investigation of the prediction capability of the hybrid ML in a real-world situation. We propose a catheter consisting of a single-slot coaxial antenna operating at 2.45 GHz inside a balloon catheter, with a thin thermometer probe on the balloon surface. The lumen temperature measured by the probe is used as the ML input to predict the temperature rise at the ablation site. To check the feasibility and accuracy of the ML algorithm, heating experiments were performed eight times each with phantoms having 6 mm and 8 mm holes at an excitation power of 41.3 W, and with the 8 mm phantom at 36.4 W. The temperature at the ablation site was also measured as a reference. The ML predictions agree well with the reference, with maximum differences of 6°C for the 6 mm phantom and 3°C for the 8 mm phantom (at both power levels). Overall, the proposed ML algorithm can predict the temperature rise at the ablation site with high accuracy.
Yanming CHEN Bin LYU Zhen YANG Fei LI
In this paper, we investigate a batteryless IoT network assisted by wireless-powered relays under a non-linear energy harvesting model. The network comprises an energy service provider, consisting of a hybrid access point (HAP), and an IoT service provider, consisting of multiple clusters. The HAP transmits energy signals that the batteryless devices use for information backscattering and that the wireless-powered relays use for energy harvesting. The relays use the harvested energy to help the batteryless devices transmit information to the HAP. To model the energy interactions between the energy service provider and the IoT service provider, we propose a Stackelberg game-based framework that aims to maximize the utility of each provider. Since the utility maximization problem of the IoT service provider is non-convex, we employ fractional programming theory and propose a block coordinate descent (BCD)-based algorithm with successive convex approximation (SCA) and semi-definite relaxation (SDR) techniques to solve it. Numerical simulation results confirm that, compared with the benchmark schemes, the proposed scheme achieves larger utility values for both the energy service provider and the IoT service provider.
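The abstract does not state which non-linear energy harvesting model is used. A common choice in this literature is the logistic (sigmoid) model, and the sketch below assumes that form. The saturation power `M` and the shape parameters `a` and `b` are illustrative values, not parameters taken from the paper.

```python
import math

def harvested_power(p_in, M=0.024, a=150.0, b=0.014):
    """Logistic non-linear energy harvesting model (assumed form).

    p_in: received RF power in watts.
    M: maximum harvested power in watts (saturation level).
    a, b: circuit-dependent steepness and turn-on parameters.
    All default values are illustrative.
    """
    omega = 1.0 / (1.0 + math.exp(a * b))         # offset that makes 0 W in give 0 W out
    phi = M / (1.0 + math.exp(-a * (p_in - b)))   # raw logistic response
    return (phi - M * omega) / (1.0 - omega)      # rescaled so the output saturates at M
```

A linear model, P_out = η·P_in, grows without bound. Here the output saturates at `M` for large input power, and this saturation is why non-linear models change the optimal energy pricing and allocation.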
Tomoya OTA Alexander N. LOZHKIN Ken TAMANOI Hiroyoshi ISHIKAWA Takurou NISHIKAWA
This paper proposes a multibeam digital predistorter (DPD) that suppresses intercarrier interference caused by nonlinear distortions of power amplifiers (PAs) while reducing the power consumption of a multibeam array antenna transmitter. The proposed DPD reduces power consumption by allowing the final PAs of the array antenna transmitter to operate in a highly efficient nonlinear mode and by compensating for their nonlinear distortions with one unified DPD dedicated to each subarray. Additionally, it provides the high-quality signal transmission required for high throughput, for example enabling 256-quadrature amplitude modulation (QAM) transmission instead of 64-QAM transmission. Specifically, it adds an inverse-component signal to cancel the interference from the adjacent carrier of another beam. Consequently, it can suppress intercarrier interference in the beam direction and improve the error vector magnitude (EVM) during multibeam transmission in which the frequency bands of the beams are adjacent. Experimental results for two beams at 28.0 and 28.4 GHz demonstrate that the proposed multibeam DPD improves the EVM compared with the previous single-beam DPD. They also demonstrate that the proposed DPD achieves an EVM below 3%, which satisfies the 3GPP requirement for 256-QAM transmission.
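A toy memoryless example shows how predistortion and EVM relate. The sketch below is a generic single-carrier illustration, not the proposed multibeam DPD, and it does not model intercarrier interference. It assumes a hypothetical third-order PA model y = x − c|x|²x, a first-order inverse predistorter x + c|x|²x, and unit-power QPSK test symbols. EVM is computed as the RMS error vector relative to the RMS reference.

```python
import math

def pa(x, c=0.1):
    # Hypothetical memoryless PA with third-order gain compression: y = x - c|x|^2 x
    return x - c * abs(x) ** 2 * x

def predistort(x, c=0.1):
    # First-order inverse of the cubic PA model: x + c|x|^2 x
    return x + c * abs(x) ** 2 * x

def evm_percent(received, reference):
    """RMS EVM in percent: sqrt(sum|r - s|^2 / sum|s|^2) * 100."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(err / ref)

# Unit-power QPSK constellation as a hypothetical test signal
amp = 1 / math.sqrt(2)
symbols = [complex(i * amp, q * amp) for i in (-1, 1) for q in (-1, 1)]

evm_without = evm_percent([pa(s) for s in symbols], symbols)            # about 10 %
evm_with = evm_percent([pa(predistort(s)) for s in symbols], symbols)   # about 3.31 %
```

A first-order inverse only partly cancels the compression. Practical DPDs fit higher-order and memory terms to measured PA output.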
Human motion prediction has long been an active research topic in computer vision and robotics. It refers to forecasting future human movements conditioned on historical 3-dimensional human skeleton sequences. Existing prediction algorithms usually rely on extensive annotated or unannotated motion capture data and are non-adaptive. This paper addresses the problem of few-frame human motion prediction, in the spirit of recent progress in manifold learning. More precisely, our approach is based on the insight that accurate prediction relies on a sufficiently linear expression in the latent space learned from a few training data in the observation space. To accomplish this, we propose the Regressive Gaussian Process Latent Variable Model (RGPLVM), which introduces a novel regressive kernel function for model training. By doing so, our model produces a linear mapping from the training data space to the latent space, effectively transforming the prediction of human motion in physical space into an equivalent linear regression analysis in the latent space. Comparisons with two learning-based motion prediction approaches (the state-of-the-art meta-learning approach and the classical LSTM-3LR) demonstrate that our RGPLVM significantly improves prediction performance on various actions in the small-sample regime.
Various haze removal methods based on the atmospheric scattering model have been presented in recent years. Most methods target strong haze images, in which light is scattered equally in all color channels. This paper presents a haze removal method that uses near-infrared (NIR) images for relatively weak haze images. To recover lost edges, the presented method first extracts edges from an appropriately weighted NIR image and fuses them with the color image. By introducing a wavelength-dependent scattering model, our method then estimates the transmission map for each color channel and recovers the color more naturally from the edge-recovered image. Finally, the edge-recovered and color-recovered images are blended. In this blending process, high-lightness regions such as sky and clouds, where unnatural color shifts are likely to occur, are effectively estimated, and the optimal weighting map is obtained. Our qualitative and quantitative evaluations using 59 pairs of color and NIR images demonstrate that our method recovers edges and colors in weak haze images more naturally than conventional methods.
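Per-channel dehazing under a wavelength-dependent scattering model can be sketched as follows. We assume the power law β(λ) = β_ref·(λ_ref/λ)^γ for the scattering coefficient. We also assume hypothetical representative wavelengths for the R, G, and B channels, together with a known depth value and airlight. The paper's estimation of the transmission map from the NIR image is not reproduced here.

```python
import math

# Hypothetical representative wavelengths (nm) for each color channel
WAVELENGTHS = {"R": 620.0, "G": 540.0, "B": 460.0}

def channel_transmission(beta_ref, depth, lam, lam_ref=540.0, gamma=1.0):
    """t_c = exp(-beta(lam) * depth), with beta(lam) = beta_ref * (lam_ref / lam) ** gamma."""
    beta = beta_ref * (lam_ref / lam) ** gamma
    return math.exp(-beta * depth)

def haze(J, A, t):
    # Atmospheric scattering model: I = J * t + A * (1 - t)
    return J * t + A * (1.0 - t)

def dehaze(I, A, t, t_min=0.1):
    # Invert the model per channel; clamping t avoids amplifying noise where t is tiny
    return (I - A) / max(t, t_min) + A

J = {"R": 0.6, "G": 0.4, "B": 0.3}    # hypothetical haze-free pixel
A, beta_ref, depth = 0.9, 0.5, 1.0    # hypothetical airlight, scattering, and depth
t = {c: channel_transmission(beta_ref, depth, lam) for c, lam in WAVELENGTHS.items()}
I = {c: haze(J[c], A, t[c]) for c in J}
J_hat = {c: dehaze(I[c], A, t[c]) for c in J}
```

Shorter wavelengths scatter more, so the blue channel gets the lowest transmission. This is why a single shared transmission map tends to shift colors in weak haze.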
Mitsuru UESUGI Yoshiaki SHINAGAWA Kazuhiro KOSAKA Toru OKADA Takeo UETA Kosuke ONO
With the rapid increase in data traffic in 5G networks, there is strong demand to reduce the power consumption of the entire network, so highly power-efficient millimeter-wave (mm-wave) networks are being considered. However, although mm-wave communication is highly power efficient, its strong rectilinear propagation makes it difficult to maintain stable communication in environments with blocking. In use cases such as autonomous driving, continuous communication is required to transmit streaming data such as video captured by vehicles, so the blocking problem must be compensated for. For this reason, the authors examined an optimum radio access technology (RAT) selection scheme that selects mm-wave communication when mm-wave is available and selects wide-area macro communication when mm-wave may be blocked. In addition, the authors implemented the scheme on a prototype device, conducted field tests, and confirmed that mm-wave communication and macro communication were switched at appropriate timings.
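The abstract does not specify the selection rule. One simple way to choose between mm-wave and macro links without rapid back-and-forth switching is a hysteresis threshold on a link-quality metric. The sketch below assumes a hypothetical per-step quality value in dB and hypothetical thresholds `low_db` and `high_db`. It is not the authors' scheme.

```python
def select_rat(quality_db, low_db=5.0, high_db=10.0, start="macro"):
    """Return the RAT chosen at each time step using a hysteresis rule.

    Switch to mm-wave only when quality reaches high_db, and fall back to the
    macro link when it drops below low_db; in between, keep the current RAT.
    """
    current = start
    choices = []
    for q in quality_db:
        if current == "macro" and q >= high_db:
            current = "mmwave"
        elif current == "mmwave" and q < low_db:
            current = "macro"
        choices.append(current)
    return choices

print(select_rat([3, 12, 8, 4, 7, 11]))
# ['macro', 'mmwave', 'mmwave', 'macro', 'macro', 'mmwave']
```

The gap between the two thresholds keeps a marginal link from toggling every step. A predictive scheme, such as one that anticipates blocking, would replace the quality input with a forecast.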
Shangdong LIU Chaojun MEI Shuai YOU Xiaoliang YAO Fei WU Yimu JI
Thermal imaging pedestrian segmentation systems perform well under different illumination conditions, but they have some drawbacks (e.g., weak pedestrian texture information and blurred object boundaries). Meanwhile, high-performance large models have high latency on edge devices with limited computing performance. To solve these problems, in this paper, we propose a real-time thermal infrared pedestrian segmentation method. The feature extraction layers of our method consist of two paths. First, on the spatial path, we use lossless spatial downsampling to obtain boundary texture details. On the context path, we use atrous convolutions to enlarge the receptive field and obtain more contextual semantic information. Then, a parameter-free attention mechanism is introduced at the end of each path for effective feature selection. A Feature Fusion Module (FFM) is added to fuse the selected semantic information of the two paths. Finally, we accelerate inference through multi-threading techniques on the edge computing device. Besides, we create a high-quality infrared pedestrian segmentation dataset to facilitate research. Comparative experiments with other methods on the self-built dataset and two public datasets show the effectiveness of our method. Our code is available at https://github.com/mcjcs001/LEIPNet.
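One common form of lossless spatial downsampling is space-to-depth (also called pixel unshuffle). It halves height and width while moving every pixel into extra channels, so no information is discarded. We assume this form here; the paper may use a different operator. The plain-Python sketch below works on a `[C][H][W]` nested list and includes the inverse to show that the operation loses nothing.

```python
def space_to_depth(x, r=2):
    """Rearrange [C][H][W] into [C*r*r][H//r][W//r] without discarding pixels.

    Output channel c*r*r + dy*r + dx holds input pixels x[c][h*r + dy][w*r + dx].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    if H % r or W % r:
        raise ValueError("H and W must be divisible by r")
    out = []
    for c in range(C):
        for dy in range(r):
            for dx in range(r):
                out.append([[x[c][h * r + dy][w * r + dx] for w in range(W // r)]
                            for h in range(H // r)])
    return out

def depth_to_space(y, r=2):
    """Inverse of space_to_depth: reassemble the original [C][H][W] tensor."""
    C = len(y) // (r * r)
    Hr, Wr = len(y[0]), len(y[0][0])
    x = [[[0] * (Wr * r) for _ in range(Hr * r)] for _ in range(C)]
    for c in range(C):
        for dy in range(r):
            for dx in range(r):
                plane = y[c * r * r + dy * r + dx]
                for h in range(Hr):
                    for w in range(Wr):
                        x[c][h * r + dy][w * r + dx] = plane[h][w]
    return x

img = [[[4 * h + w for w in range(4)] for h in range(4)]]  # one 4x4 channel: 0..15
down = space_to_depth(img)  # four 2x2 channels, e.g. down[0] == [[0, 2], [8, 10]]
```

Unlike strided convolution or max pooling, this keeps every pixel, so fine boundary detail stays available to the layers that follow.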
Fazhan YANG Xingge GUO Song LIANG Peipei ZHAO Shanhua LI
Visual saliency prediction has improved dramatically since the advent of convolutional neural networks (CNNs). Although CNNs achieve excellent performance, they still cannot learn global and long-range contextual information well and lack interpretability because of the locality of convolution operations. We propose a saliency prediction model based on multi-prior enhancement and cross-modal attention collaboration (ME-CAS). Concretely, we design a transformer-based Siamese network architecture as the backbone for feature extraction. One transformer branch captures the context information of the image with the self-attention mechanism to obtain a global saliency map. At the same time, we build a prior learning module to learn the human visual center-bias prior, contrast prior, and frequency prior. These priors are input to the other Siamese branch, which learns detailed low-level visual features and obtains a saliency map of local information. Finally, we use an attention calibration module to guide the cross-modal collaborative learning of global and local information and generate the final saliency map. Extensive experimental results demonstrate that ME-CAS achieves superior results on public benchmarks compared with competing saliency prediction models. Moreover, the multi-prior learning modules enhance the expression of salient details in images and improve model interpretability.
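A center-bias prior is often modeled as a 2-D Gaussian centered on the image. The sketch below assumes that form, with a hypothetical width `sigma_frac` given as a fraction of the image size. It shows only what such a prior map looks like; how ME-CAS learns its priors is not reproduced.

```python
import math

def center_bias_map(height, width, sigma_frac=0.25):
    """Gaussian center-bias prior, normalized so the image center equals 1."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_frac * height, sigma_frac * width
    return [[math.exp(-(((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2) / 2.0)
             for x in range(width)] for y in range(height)]

prior = center_bias_map(5, 5)  # peaks at prior[2][2] and decays toward the corners
```

Scaling the width with the image dimensions keeps the prior's shape consistent across aspect ratios.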
Time-series prediction is widely applied and is an important problem in many fields, such as stock, sales, and loan prediction, where it is of great value in production and daily life. It requires the model to effectively capture long-term dependencies between the output and the input. Recent studies show that the Transformer can improve time-series prediction. However, several problems prevent the Transformer from being applied directly to time-series prediction: (1) Locality-agnosticism: self-attention in the Transformer is insensitive to short-term dependencies, which can lead to anomalies in time-series modeling; (2) Memory bottleneck: the space complexity of the canonical Transformer grows quadratically with the sequence length, which makes direct modeling of long time series infeasible. To solve these problems, this paper designs an efficient model for long time-series prediction: a double-pyramid bidirectional feature fusion network with a parallel Temporal Convolutional Network (TCN) and FastFormer. This structure combines the fine-grained time-series information captured by the TCN with the global interaction information captured by FastFormer, enabling it to handle time-series prediction problems well.
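A TCN is built from causal, dilated 1-D convolutions. The output at time t depends only on inputs at t, t−d, t−2d, and so on. Stacking layers with dilations 1, 2, 4, … therefore grows the receptive field exponentially while memory stays linear in the sequence length. The sketch below shows one such layer in plain Python, with zero left-padding and hypothetical kernel weights. It illustrates the TCN building block only, not the proposed double-pyramid network or FastFormer.

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k*dilation], treating x[negative index] as 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only current and past samples contribute
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    # Each layer extends the view into the past by (kernel_size - 1) * dilation samples
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(causal_dilated_conv([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(2, [1, 2, 4, 8]))                          # 16
```

With kernel size 2 and four layers, each output sees 16 past samples, whereas undilated layers of the same size would see only 5.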
Yuxiang ZHANG Dehua LIU Chuanpeng SU Juncheng LIU
Uncovered muck truck detection aims to detect muck trucks and determine whether they are covered by a dust-proof net, in order to trace the source of pollution. Unlike the traditional detection problem, recalling all uncovered trucks is more important for pollution traceability than accurate localization. When two objects are very close in an image, the occluded object may not be recalled because the non-maximum suppression (NMS) algorithm can remove the overlapping proposal. To address this issue, we propose a Location First NMS method that, in the training stage, matches ground-truth boxes and predicted boxes by position rather than by class identifier (ID). First, a box matching method is introduced to reassign each predicted box the ID of the closest ground-truth box, which avoids missing objects when the IoU of two proposals is greater than the threshold. Second, we design a loss function adapted to the proposed algorithm. Third, an uncovered muck truck detection system using the method is designed for a real scene. Experimental results show the effectiveness of the proposed method.
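The box-matching step can be sketched as reassigning each predicted box the ID of the ground-truth box it is closest to. In the sketch below, "closest" means highest IoU, with ties (including zero overlap) broken by the nearest box center. This definition and the function names are assumptions, not the authors' exact rule. Boxes are `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def center_dist2(a, b):
    # Squared distance between the centers of boxes a and b
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return (ax - bx) ** 2 + (ay - by) ** 2

def match_by_location(preds, gts):
    """For each predicted box, return the index (ID) of the closest ground truth."""
    return [max(range(len(gts)), key=lambda g: (iou(p, gts[g]), -center_dist2(p, gts[g])))
            for p in preds]

# Hypothetical example: two overlapping trucks and three predictions
gts = [(0, 0, 10, 10), (6, 0, 16, 10)]
preds = [(1, 0, 11, 10), (5, 0, 15, 10), (30, 30, 40, 40)]
print(match_by_location(preds, gts))  # [0, 1, 1]
```

Because each prediction is tied to a specific ground-truth box by position, the two overlapping trucks keep distinct IDs, so suppression between them can be avoided.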
Shohei SAKURAI Mayu IIDA Kosei OKUNUKI Masahito KUSHIDA
In this study, vertically aligned carbon nanotubes (VA-CNTs) were grown from filler-added LB films of accumulated AlFe2O4 nanoparticles (NPs), with palmitic acid (C16) as the filler molecule, after hydrogen reduction at two different temperatures, 500°C and 750°C, and the grown VA-CNTs were compared and evaluated. When AlFe2O4 NPs were used, VA-CNTs grown after 500°C hydrogen reduction were approximately twice as long as those grown after 750°C hydrogen reduction. On the other hand, when the catalyst area ratio was decreased by adding palmitic acid, i.e., when the distance between CNTs was increased, the VA-CNTs shortened rapidly after 500°C hydrogen reduction, and no VA-CNTs were obtained even in the range where VA-CNTs were obtained after 750°C hydrogen reduction. The inner and outer diameters of the VA-CNTs decreased with decreasing catalyst area ratio after 750°C hydrogen reduction and tended to increase after 500°C hydrogen reduction. The morphology of the catalyst nanoparticles after CVD changed significantly depending on the hydrogen reduction temperature and the catalyst area ratio. These observations indicate that the state of the catalyst nanoparticles immediately before the CNT growth process greatly affects the physical properties of the CNTs.
Lead bromide-based perovskite organic-inorganic quantum-well films incorporating polycyclic aromatic chromophores into the organic layer (in other words, hybrid quantum wells combining a lead bromide semiconductor with organic semiconductors) were prepared by spin-coating from DMF solutions of PbBr2 and alkyl ammonium bromides linked to the polycyclic aromatics pyrene, phenanthrene, and anthracene. When pyrene-linked methyl ammonium bromide, whose molecular cross-section with respect to the inorganic semiconductor plane is relatively small, was employed, a lead bromide-based perovskite structure was successfully formed in the spin-coated films. When the phenanthrene-linked and anthracene-linked ammonium bromides, whose chromophores have large molecular cross-sections, were employed, lead bromide-based perovskite structures were not formed. However, introducing longer alkyl chains into the aromatic-linked ammonium bromides made it possible to form the perovskite structure.
Fuma MOTOYAMA Koichi KOBAYASHI Yuh YAMASHITA
A Boolean network (BN) is well known as a discrete model for the analysis and control of complex networks such as gene regulatory networks. Since complex networks are generally large-scale, model reduction is important. In this paper, we consider model reduction that preserves the information on fixed points (singleton attractors). The model reduction studied here uses the interaction graph obtained from a given BN. The existing method focuses on the minimum feedback vertex set (FVS) of the interaction graph and reduces the dimension of the state to the number of elements in the minimum FVS. The proposed method focuses on the complement and absorption laws of Boolean functions when one Boolean function is substituted into another. By simplifying Boolean functions in this way, the dimension of the state may be reduced further. Through a numerical example, we show that the proposed method can reduce the state dimension of BNs whose state dimension cannot be reduced by the existing method.
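Because the reduction must preserve fixed points, brute-force enumeration is a useful sanity check on small examples. The sketch below uses a hypothetical three-node BN of our own, written for illustration only. It substitutes x3 into the other update functions and simplifies the results with the complement law (x ∧ ¬x = 0) and the absorption law (x ∨ (x ∧ y) = x). It then confirms that the two-node result has the same fixed points, projected onto (x1, x2).

```python
from itertools import product

def fixed_points(update_fns, n):
    """All states x in {0,1}^n with f(x) == x (singleton attractors), by enumeration."""
    return {x for x in product((0, 1), repeat=n)
            if tuple(f(x) for f in update_fns) == x}

# Hypothetical 3-node BN over 0/1 integers ("NOT v" is written as 1 - v)
f_full = [
    lambda x: x[1] | (x[2] & (1 - x[1])),  # x1' = x2 OR (x3 AND NOT x2)
    lambda x: x[0] | (x[0] & x[2]),        # x2' = x1 OR (x1 AND x3)
    lambda x: x[0] & x[1],                 # x3' = x1 AND x2
]

# Substitute x3 = x1 AND x2 into f1 and f2, then simplify:
#   f1: x2 OR (x1 AND x2 AND NOT x2) = x2 OR 0 = x2   (complement law)
#   f2: x1 OR (x1 AND x1 AND x2)     = x1             (absorption law)
f_reduced = [
    lambda x: x[1],  # x1' = x2
    lambda x: x[0],  # x2' = x1
]

full = fixed_points(f_full, 3)       # {(0, 0, 0), (1, 1, 1)}
reduced = fixed_points(f_reduced, 2) # {(0, 0), (1, 1)}
assert {x[:2] for x in full} == reduced  # fixed points are preserved
```

Enumeration is exponential in n, so it serves only to check reductions on toy networks; the point of model reduction is to avoid it on large ones.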
Daichi WATARI Ittetsu TANIGUCHI Francky CATTHOOR Charalampos MARANTOS Kostas SIOZIOS Elham SHIRAZI Dimitrios SOUDRIS Takao ONOYE
Energy management in buildings is vital for reducing electricity costs and maximizing occupant comfort. Excess solar generation can be used by combining a battery storage system with a heating, ventilation, and air-conditioning (HVAC) system to keep occupants comfortable. Despite several studies on the scheduling of appliances, batteries, and HVAC, comprehensive and time-scalable approaches are still required that integrate predictive information such as renewable generation and thermal comfort. In this paper, we propose a thermal-comfort-aware online co-scheduling framework that incorporates optimal energy scheduling and prediction models of photovoltaic (PV) generation and thermal comfort with a model predictive control (MPC) approach. We introduce a PV energy nowcasting and thermal-comfort-estimation model that provides useful information for the optimization. The energy management problem is formulated as three coordinated optimization problems that cover fast and slow time scales by considering predicted information. This approach reduces time complexity without a significant negative impact on the global nature and quality of the result. Experimental results show that our framework achieves optimal energy management that accounts for the trade-off between electricity expenses and thermal comfort. Our sensitivity analysis indicates that introducing a battery significantly improves this trade-off.
Toshiyuki MIYAMOTO Marika IZAWA
Event structures are a well-known modeling formalism for concurrent systems with causality and conflict relations. The flow event structure (FES) is a variant of event structures that generalizes the prime event structure. In an FES, two events may be in conflict even though they are not syntactically in conflict; this is called a semantic conflict. The existence of semantic conflicts in an FES motivates reducing conflict relations (i.e., conflict reduction) to obtain a simpler structure. In this paper, we study conflict reduction in acyclic FESs. We give a necessary and sufficient condition for conflict reduction and propose algorithms to compute semantic conflicts, local configurations, and conflict reductions. Computational experiments show a large reduction in computation time for the proposed method compared with the naive method.
Gensai TEI Long LIU Masahiro WATANABE
We have designed a near-infrared Si/CaF2 DFB quantum cascade laser and investigated the possibility of single-mode laser oscillation by analyzing the propagation mode, gain, scattering time of the Si quantum well, and threshold current density. For the waveguide and resonator, we assumed a slab-type waveguide structure with a Si/CaF2 active layer sandwiched by SiO2 on a Si (111) substrate, and a grating structure in an n-Si conducting layer. The optical propagation mode analysis shows that, assuming a λ/4-shifted Bragg waveguide structure, single vertical and horizontal TM mode propagation is possible at the designed wavelength of 1.70 µm. In addition, we propose an active layer design whose current injection capability is roughly estimated at 25.1 kA/cm2. This is larger than the required threshold current density of 1.4 kA/cm2, which was calculated by combining the analysis results for the scattering time, population inversion, and gain of quantum cascade lasers with the coupling theory of a Bragg waveguide. The results strongly indicate the possibility of single-mode laser oscillation.
Runze WANG Zehua ZHANG Yueqin ZHANG Zhongyuan JIANG Shilin SUN Guixiang MA
Recent advances in protein structure prediction, such as AlphaFold, have drawn great attention to deep learning for the Drug-Target Affinity (DTA) task. Most works focus on embedding a single molecular property and homogeneous information, ignoring the diverse gains from the heterogeneous information contained in molecules and their interactions. Motivated by this, we propose an end-to-end deep learning framework that performs Molecular Heterogeneous features Fusion (MolHF) for DTA prediction. To address the challenge that biochemical attributes lie in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. In particular, a Molecular Heterogeneous Attention Fusion module is presented to obtain the gains of molecular heterogeneous features. With these modules, the diverse structural information of drug molecules can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines on all four metrics. Ablation studies validate the effect of attentive fusion and of multiple groups of drug heterogeneous features. Visualizations demonstrate the impact of the protein embedding level and the model's ability to fit the data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.
Wenrong XIAO Yong CHEN Suqin GUO Kun CHEN
An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series within the residual connection of a residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better attend to weak changes in the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the network can better capture the trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified on a set of experimental data. The results show that the method is simple and effective, achieves high prediction accuracy, and reduces manual intervention in RUL prediction.
Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, as it can give epilepsy patients and medical staff the opportunity to take precautionary measures. EEG is a commonly used tool for studying brain activity that records the electrical activity of the brain. Many studies based on machine learning algorithms have been proposed to solve this task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform (STFT) is used to convert raw EEG signals into STFT spectra, which serve as the input to the models. Fusion features are obtained through side-output constructions and used to train and test our models. The test results show that our models achieve comparable results in both sensitivity and false positive rate (FPR) with the fusion features. The proposed patient-specific model can be used in a seizure prediction system for EEG classification.
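The STFT step slides a window over the signal and takes the magnitude of the DFT of each frame. The plain-Python sketch below uses a periodic Hann window, a naive per-frame DFT, and hypothetical frame and hop lengths. The paper's window, segment length, and channel handling are not specified here, and a real pipeline would use an FFT library.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=8, hop=4):
    """Return |STFT| as a list of frames, each holding frame_len // 2 + 1 frequency bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # one-sided spectrum for a real-valued signal
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            bins.append(abs(s))
        frames.append(bins)
    return frames

# Hypothetical test tone: a cosine at frequency bin 2 of an 8-sample frame
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft_magnitude(tone)  # 7 frames x 5 bins; energy concentrates in bin 2
```

Stacking the frames over time produces the time-frequency image that a CNN can take as input, with one such image per EEG channel.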