Cong PANG Ye NI Jiaming CHENG Lin ZHOU Li ZHAO
In this work, we propose a lightweight two-stage convolutional recurrent network (BP-CRN) for multichannel speech enhancement (MCSE) that consists of beamforming and post-filtering. Drawing inspiration from traditional methods, we design two core modules, named BM and PF, for spatial filtering and for post-filtering with compensation, respectively. Both modules employ a convolutional encoder-decoder structure with complex frequency-time long short-term memory (CFT-LSTM) blocks in between. Furthermore, an inter-module mask module is introduced to estimate and convey implicit spatial information, helping the post-filtering module refine the spatial filtering output and suppress residual noise. In our experiments, the proposed method, with only 1.27M parameters, outperforms three other MCSE methods in terms of PESQ and STOI.
Guangjin OUYANG Yong GUO Yu LU Fang HE
With the rapid development of Internet technology, the types and volume of network traffic have grown accordingly, making network traffic classification an important research task. Previous research includes methods based on traditional machine learning and on deep learning; compared with traditional machine learning, deep learning achieves good results by converting network traffic into two-dimensional images and applying image classification models. However, these methods share two limitations: the trained models cannot learn continually, and their generalization ability is limited. To address these problems, we propose a network traffic classification method based on incremental learning and Mixup, built on generative adversarial networks. First, network traffic is converted into 2D images, and the original dataset is augmented by linear interpolation with Mixup to reduce the model's tendency to overfit and to improve its generalization ability; the traffic is then classified by exploiting the strength of deep learning on images. Second, we improve the traditional incremental learning algorithm to effectively address the imbalance between old and new categories. Experimental results show that the model performs well in classification, reaching 92.26% and 93.86% accuracy on the ISCXVPN2016 and USTC datasets, respectively, and maintains high accuracy with limited storage space as new categories are added.
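The image conversion and Mixup steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume each flow becomes a 28×28 grayscale image built from its first 784 bytes (zero-padded), that labels are one-hot vectors, and that the mixing weight is drawn from Beta(α, α) with a hypothetical α = 0.2. The GAN and incremental-learning components are not shown.

```python
import numpy as np

def traffic_to_image(payload: bytes, side: int = 28) -> np.ndarray:
    """Map the first side*side bytes of a flow to a grayscale image in [0, 1].

    Assumption for illustration: short flows are zero-padded, long ones truncated.
    """
    n = side * side
    buf = np.frombuffer(payload[:n].ljust(n, b"\x00"), dtype=np.uint8)
    return buf.reshape(side, side).astype(np.float32) / 255.0

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Linearly interpolate two images and their one-hot labels (Mixup)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Usage: blend a (hypothetical) flow from class 0 with one from class 1.
img_a = traffic_to_image(b"\xff" * 10)
img_b = traffic_to_image(b"")
x_mix, y_mix, lam = mixup(img_a, np.array([1.0, 0.0]),
                          img_b, np.array([0.0, 1.0]),
                          rng=np.random.default_rng(0))
```

A small α keeps most mixed samples close to one of the two originals, which is a common default choice rather than a value taken from the paper.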
A backdoor attack causes a deep neural network to misrecognize data containing a specific trigger, because the model has been trained on malicious data into which the trigger was inserted. The backdoored network correctly recognizes data without the trigger but misrecognizes data with it. Backdoor attacks have mainly been studied in the image domain, and defense research in the text domain remains insufficient. In this study, we propose a method that defends against textual backdoor samples using a detection model. The proposed method detects a textual backdoor sample by comparing the output of the target model with that of a detection model trained on the original training data. This method can defend against attacks without access to the entire training data. The experiments were implemented with the TensorFlow library, using the MR and IMDB datasets. When a partial training set of 1000 samples was used to train the detection model, the proposed method achieved detection rates of 79.6% and 83.2% on the MR and IMDB datasets, respectively.
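The comparison between the target model and the detection model can be illustrated with a small, framework-agnostic sketch that works on the two models' class-probability outputs. The abstract does not state the exact comparison rule, so the criterion here (flag a sample when the predicted labels differ, or when the total-variation distance exceeds a hypothetical threshold `tv_threshold`) is an assumption, and the probabilities in the usage lines are made up.

```python
import numpy as np

def flag_backdoor(p_target, p_detector, tv_threshold=0.5):
    """Flag inputs on which the target model and the detection model disagree.

    p_target, p_detector: arrays of shape (n_samples, n_classes) holding class
    probabilities. A sample is flagged when the predicted labels differ or the
    total-variation distance between the two distributions exceeds tv_threshold
    (a hypothetical rule, not necessarily the paper's).
    """
    p_target = np.asarray(p_target, dtype=float)
    p_detector = np.asarray(p_detector, dtype=float)
    label_mismatch = p_target.argmax(axis=1) != p_detector.argmax(axis=1)
    tv_distance = 0.5 * np.abs(p_target - p_detector).sum(axis=1)
    return label_mismatch | (tv_distance > tv_threshold)

def detection_rate(flags, is_backdoor):
    """Fraction of true backdoor samples that were flagged."""
    flags = np.asarray(flags, dtype=bool)
    is_backdoor = np.asarray(is_backdoor, dtype=bool)
    return flags[is_backdoor].mean()

# Usage with made-up probabilities: only the second input flips the label.
p_t = [[0.90, 0.10], [0.20, 0.80], [0.95, 0.05]]
p_d = [[0.85, 0.15], [0.70, 0.30], [0.90, 0.10]]
flags = flag_backdoor(p_t, p_d)
```

Because the detection model never saw trigger-bearing data, a disagreement with the possibly backdoored target model is treated as evidence of a trigger; the detection rate is then the fraction of known backdoor samples that get flagged.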
Hiroki HOSHINO Kentaro KUSAMA Takayuki ARAI
We have developed novel coating materials capable of absorbing fingerprint oils over time. When touch screens are operated with fingers, these oils adhere to the surface and make it look dirty. When finger oils adhere to anti-reflective coatings and structures such as moth-eye films, their anti-reflective performance is substantially degraded. In moth-eye films in particular, the oils penetrate the grooves of the array of bell-shaped protrusions and are difficult to remove. In this paper, we describe our investigation of a technique for imparting anti-fingerprint properties using these novel coating materials.
Kengo NAKATA Daisuke MIYASHITA Jun DEGUCHI Ryuichi FUJIMOTO
Quantization is commonly used to reduce the inference time of convolutional neural networks (CNNs). To reduce the inference time without drastically reducing accuracy, optimal bit widths need to be allocated for each layer or filter of the CNN. In conventional methods, the optimal bit allocation is obtained by using the gradient descent algorithm while minimizing the model size. However, the model size has little to no correlation with the inference time. In this paper, we present a computational-complexity metric called MAC×bit that is strongly correlated with the inference time of quantized CNNs. We propose a gradient descent-based regularization method that uses this metric for optimal bit allocation of a quantized CNN to improve the recognition accuracy and reduce the inference time. In experiments, the proposed method reduced the inference time of a quantized ResNet-18 model by 21.0% compared with the conventional regularization method based on model size while maintaining comparable recognition accuracy.
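To see why a compute-weighted metric can rank bit allocations differently from model size, consider the sketch below. The abstract does not give the exact formula, so we assume here that a layer's cost is its MAC count multiplied by its weight and activation bit widths, summed over layers; the layer shapes (a ResNet-like 3×3 convolution and a 512→1000 fully connected layer) are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Layer:
    name: str
    params: int  # number of weights
    macs: int    # multiply-accumulate operations per inference
    w_bits: int  # weight bit width
    a_bits: int  # input-activation bit width

def conv_layer(name, c_in, c_out, k, h_out, w_out, w_bits, a_bits):
    params = c_in * c_out * k * k
    # Each weight is used once per output pixel.
    return Layer(name, params, params * h_out * w_out, w_bits, a_bits)

def fc_layer(name, n_in, n_out, w_bits, a_bits):
    params = n_in * n_out
    # Each weight is used exactly once per inference.
    return Layer(name, params, params, w_bits, a_bits)

def model_size_bits(layers):
    return sum(layer.params * layer.w_bits for layer in layers)

def mac_x_bit(layers):
    # Assumed definition: MACs weighted by weight and activation precision.
    return sum(layer.macs * layer.w_bits * layer.a_bits for layer in layers)

conv = conv_layer("conv3x3", 64, 64, 3, 56, 56, w_bits=8, a_bits=8)
fc = fc_layer("fc", 512, 1000, w_bits=8, a_bits=8)
cut_conv = [replace(conv, w_bits=4), fc]  # lower precision where compute is heavy
cut_fc = [conv, replace(fc, w_bits=4)]    # lower precision where weights are many
```

Under these assumptions, `cut_fc` yields the smaller model, whereas `cut_conv` yields the lower MAC×bit cost, matching the abstract's observation that model size is a poor proxy for inference time.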
Guanqun SHEN Kaikai CHI Osama ALFARRAJ Amr TOLBA
IoT devices, with their limited battery capacity and computing capability, cannot meet the demands of many applications. The integration of wireless power transfer and edge computing has emerged as a promising solution to this problem. Nevertheless, making offloading decisions and allocating resources efficiently poses significant challenges, particularly in scenarios with multiple access points (APs). This paper focuses on maximizing the sum computation rate (SCR) in a wireless powered network with multiple APs. The devices perform binary offloading, and we consider two multiple access schemes: frequency-division multiple access (FDMA) and time-division multiple access (TDMA). To efficiently solve the two resulting mixed-integer nonlinear programming problems, a deep reinforcement learning based algorithm is employed to determine near-optimal offloading decisions. Then, given the offloading decisions, we present an algorithm based on golden section search to obtain the optimal time allocation for FDMA, and apply a convex optimization algorithm to obtain the optimal time allocation for TDMA. Our algorithms achieve over 95 percent of the maximum SCR with low complexity. Compared with the baseline algorithms, the proposed algorithms show advantages in both convergence speed and attained SCR.
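Golden section search maximizes a unimodal scalar function by shrinking a bracketing interval by the factor 1/φ ≈ 0.618 per iteration while reusing one of the two interior evaluations. The routine below is generic; the objective `toy_rate` is a hypothetical single-device rate as a function of the energy-harvesting time fraction τ, chosen only because it is concave on [0, 1). It is not the paper's multi-device SCR expression.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-9, max_iter=200):
    """Return an approximate maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc > fd:
            # The maximizer lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximizer lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def toy_rate(tau, k=10.0):
    """Hypothetical rate vs. energy-harvesting fraction tau (concave on [0, 1))."""
    if tau <= 0.0 or tau >= 1.0:
        return 0.0
    return (1.0 - tau) * math.log2(1.0 + k * tau / (1.0 - tau))

tau_star = golden_section_max(toy_rate, 0.0, 1.0)
```

Only one new function evaluation is needed per iteration, which is why the method suits a one-dimensional time-allocation subproblem solved repeatedly inside a larger search.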
This letter presents a solution for large classroom interactions using cloud computing and mobile devices. A lecturer can collect student photos or texts and give real-time feedback. Students confirmed in anonymous surveys that this solution enabled them to actively participate in classes and enhanced their learning even in large classrooms.
Federated learning (FL) enables deep learning models to be trained across distributed networks while preserving data privacy. When FL is deployed on edge devices, network pruning becomes essential because of limited computational resources. However, traditional FL pruning methods suffer from bias arising from the heterogeneous distributions of local data, which poses a significant challenge. To address this, we propose DDPruneFL, a novel FL pruning framework that utilizes Discriminative Data (DD). Specifically, we use minimally pre-trained local models so that each client can extract semantic concepts as DD, which then guide an iterative pruning process. DDPruneFL significantly outperforms existing methods on four benchmark datasets, handling IID and non-IID distributions as well as client selection scenarios, and achieves state-of-the-art (SOTA) performance in this field. Moreover, our studies comprehensively validate the effectiveness of DD. We also conduct a detailed computational complexity analysis based on floating-point operations (FLOPs). The FLOPs analysis reveals that DDPruneFL significantly improves performance during inference while only marginally increasing training costs. It also has an inference cost advantage over other FL pruning methods of the same type, further underscoring its cost-effectiveness and practicality.
Daniel Akira ANDO Toshihiko NISHIMURA Takanori SATO Takeo OHGANE Yasutaka OGAWA Junichiro HAGIWARA
Direction of arrival (DOA) estimation, an array signal processing technique, enables several wireless applications such as radar systems and source localization. In previous work, we proposed a DOA estimation method using deep neural networks (DNNs), which performed very well compared with the traditional root multiple signal classification (root-MUSIC) algorithm when there are two radio wave sources. However, when three radio wave sources are considered, the performance of that DNN degrades, especially at low and high signal-to-noise ratios (SNRs). In this paper, focusing mainly on the case of three sources, we present two additional strategies that build on our previous method, each designed for one SNR region. The first, which supports DOA estimation at low SNRs, uses principal component analysis (PCA). Representing the DNN input data in a lower dimension with PCA is expected to greatly reduce the noise corrupting the data, leading to improved performance at such SNRs. The second, which supports DOA estimation at high SNRs, selects among several DNNs, each specialized for radio waves with closely spaced DOAs, to produce a more reliable angular spectrum grid in such circumstances. Finally, to merge both ideas, we use our previously proposed SNR estimation technique to select the appropriate scheme. Computer simulations with three sources verify the superiority of our methods over root-MUSIC and our previous technique. In addition, we briefly discuss the performance of the proposed methods for larger numbers of sources.
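The PCA-based noise reduction can be illustrated with a minimal NumPy sketch. We assume the DNN input is a real-valued feature vector (for example, real and imaginary parts of covariance entries stacked together) and use toy data whose clean part lies in a 2-D subspace of a 20-D space; the dimensions, noise level, and k = 2 are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions (rows) of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def to_low_dim(X, mean, components):
    """Project samples onto the principal directions (n_samples x k)."""
    return (X - mean) @ components.T

def from_low_dim(Z, mean, components):
    """Map low-dimensional codes back to the original feature space."""
    return Z @ components + mean

# Toy data: clean 20-D inputs that lie in a 2-D subspace, plus white noise.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(20, 2)))[0].T  # 2 orthonormal rows
clean = rng.normal(scale=5.0, size=(500, 2)) @ basis
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

mean, comps = fit_pca(noisy, k=2)
denoised = from_low_dim(to_low_dim(noisy, mean, comps), mean, comps)
err_noisy = np.mean((noisy - clean) ** 2)
err_pca = np.mean((denoised - clean) ** 2)
```

Keeping only the dominant directions discards the noise energy that falls in the remaining 18 dimensions, so `err_pca` is much smaller than `err_noisy`; in a DOA pipeline, `to_low_dim` output would be what the DNN receives.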
Qi QI Zi TENG Hongmei HUO Ming XU Bing BAI
To super-resolve low-resolution (LR) face images suffering from strong noise and blur, we present a novel approach for noisy face super-resolution (SR) based on three-level information representation constraints. First, we develop a feature distillation network that focuses on extracting relevant face information and incorporates both statistical anti-interference models and latent contrast algorithms. Next, we incorporate a face identity embedding model and a discrete wavelet transform model as additional supervision for the reconstruction process. The face identity embedding model ensures that identity information is reconstructed in a hyperspherical identity metric space, while the discrete wavelet transform model operates in the wavelet domain to supervise the restoration of spatial structures. The experimental results demonstrate the efficacy of the proposed method, as evidenced by lower Learned Perceptual Image Patch Similarity (LPIPS) and Fréchet Inception Distance (FID) scores and the overall practicability of the reconstructed images.
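Wavelet-domain supervision can be sketched as a loss computed on subbands. The abstract does not specify the wavelet or the distance, so this sketch assumes a single-level orthonormal 2-D Haar transform and a weighted L1 distance between the subbands of the super-resolved output and the high-resolution target; the weights are hypothetical, and subband names follow one common convention.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2-D Haar DWT; img must have even height and width.

    Returns (LL, LH, HL, HH): the approximation band and three detail bands.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0  # horizontal differences (vertical edges)
    hl = (a + b - c - d) / 2.0  # vertical differences (horizontal edges)
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh

def wavelet_l1_loss(sr, hr, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted L1 distance between Haar subbands of an SR output and its HR target."""
    return sum(w * np.mean(np.abs(s - h))
               for w, s, h in zip(weights, haar_dwt2(sr), haar_dwt2(hr)))

rng = np.random.default_rng(0)
hr = rng.random((8, 8))
sr = hr + 0.1                   # a uniformly brightened "reconstruction"
loss = wavelet_l1_loss(sr, hr)  # only the LL band differs here
```

Separating the subbands lets the detail weights penalize lost edges and textures independently of overall intensity errors, which is the kind of spatial-structure supervision the abstract describes.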
Ji XI Pengxu JIANG Yue XIE Wei JIANG Hao DING
Models based on convolutional neural networks (CNNs) have proven effective for speech enhancement. However, more research is needed on CNNs for microphone arrays, especially on exploiting the correlations between the networks associated with different microphones. In this paper, we propose a CNN-based feature integration network for speech enhancement with microphone arrays. The CNN input consists of the short-time Fourier transforms (STFTs) of the signals from the different microphones. The CNN comprises an encoding layer, a decoding layer, and skip connections. In addition, a feature integration layer enables information exchange between different microphones, and a feature fusion layer integrates additional information. Experiments demonstrated the superiority of the designed structure.
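One way to build a CNN input from several microphones' STFTs is sketched below. The layout is an assumption: each microphone's STFT is split into real and imaginary parts, which are stacked along a channel axis; the Hann window, 512-point frames, hop of 256, and the 4-microphone, 16 kHz recording are hypothetical choices.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal; returns (n_frames, n_fft // 2 + 1)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def multichannel_input(signals, n_fft=512, hop=256):
    """Stack the real and imaginary STFT parts of every microphone as CNN channels.

    signals: array of shape (n_mics, n_samples).
    Returns: float32 array of shape (2 * n_mics, n_frames, n_fft // 2 + 1).
    """
    channels = []
    for mic in signals:
        spec = stft(np.asarray(mic, dtype=float), n_fft, hop)
        channels.extend([spec.real, spec.imag])
    return np.stack(channels).astype(np.float32)

rng = np.random.default_rng(0)
mics = rng.normal(size=(4, 16000))  # hypothetical 4-mic, 1 s recording at 16 kHz
mics[1] = mics[0]                   # duplicate mic 0 to expose the channel ordering
features = multichannel_input(mics)
```

With this layout, channels 2m and 2m + 1 belong to microphone m, so a layer that mixes information across the channel axis is one place where inter-microphone exchange could happen.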
Accurate water level prediction systems improve safety and quality of life. This study introduces a method that applies clustering and deep learning to multisite data to improve water level prediction for the Three Gorges Dam. The results show that the Cluster-GRU-based model can provide accurate forecasts up to seven days ahead.
Yoshiaki TAKATA Akira ONISHI Ryoma SENDA Hiroyuki SEKI
A register automaton (RA) is an extension of a finite automaton for handling data values from an infinite domain. In previous work, we proposed disjunctive μ↓-calculus (μ↓d-calculus), a subclass of the modal μ-calculus with the freeze quantifier, and showed that it has the same expressive power as RA. However, μ↓d-calculus is defined as a logic on finite words, whereas temporal specifications in model checking are usually given in terms of infinite words. In this paper, we redefine the syntax and semantics of μ↓d-calculus for infinite words and prove that the resulting temporal logic, called μ↓dω-calculus, has the same expressive power as Büchi RA.
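To make "data values from an infinite domain" concrete, here is a toy deterministic one-register automaton on finite words that accepts exactly the words in which the first data value occurs again later. A finite automaton with finitely many states cannot remember an arbitrary value from an infinite domain, whereas a register can store it and later test it for equality. This is only an illustration of the RA idea, not the paper's μ↓d-calculus construction, and the Büchi (infinite-word) acceptance condition is not modelled.

```python
def repeat_of_first(word):
    """Toy one-register automaton over data words (finite, deterministic).

    States: 'init' -> 'wait' -> 'acc'. Data values may come from any
    infinite domain (here: any Python values comparable with ==).
    """
    state, register = "init", None
    for value in word:
        if state == "init":
            register = value   # store the first data value
            state = "wait"
        elif state == "wait" and value == register:
            state = "acc"      # equality test against the register
        # 'acc' is a sink: the automaton stays accepting.
    return state == "acc"
```

For example, `repeat_of_first([3, 5, 7, 3])` accepts, while `repeat_of_first([3, 5, 7])` rejects, no matter how large the stored value is.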
Jingjing LIU Chuanyang LIU Yiquan WU Zuo SUN
As one of the electrical components of transmission lines, the vibration damper helps prevent power lines from oscillating, and recognizing it is an important task in intelligent inspection. However, because of complex background interference in aerial images, current deep learning algorithms for vibration damper detection often lack accuracy and robustness. To detect vibration dampers more accurately, this study proposes an improved You Only Look Once (YOLO) model. First, a damper dataset containing 1900 samples from different scenarios was created. Second, the backbone network of YOLOv4 was improved by combining the Res2Net module and Dense blocks, reducing computational cost and improving training speed. Then, an improved path aggregation network (PANet) structure that combines top-down and bottom-up feature fusion strategies was introduced into YOLOv4 to enhance features. Finally, the proposed YOLO model and the comparison models were trained and tested on the damper dataset. The experimental results and analysis indicate that the proposed model is more effective and robust than the comparison models. More importantly, the average precision (AP) of this model reaches 98.8%, which is 6.2% higher than that of the original YOLOv4 model, and its prediction speed is 62 frames per second (FPS), 5 FPS faster than that of YOLOv4.
In a 100VDC/5A resistive circuit, silver electrical contacts with an airflow ejection structure are separated at a constant speed. Break arcs generated between the contacts are blown by the airflow through the contact gap. The airflow rate is varied by changing the shape of the contacts. The break arcs are observed with two high-speed cameras. The following results are obtained. The airflow shortens the arc duration. As the airflow rate increases, the arc duration becomes shorter, and the break arcs are driven farther outward from the center axis of the contacts and are extinguished at a shorter arc length.
Electrical contacts are separated at a constant opening speed in a 48VDC/50A-600A resistive circuit. Break arcs are observed with two high-speed cameras from the top and side directions. The lengths of the break arcs are analyzed from the camera images. Arc voltages and currents corresponding to the analyzed arc lengths are investigated to obtain the voltage-current characteristics of the break arcs. The relationships between arc length and gap voltage and between arc length and circuit current are obtained. Because these results are slightly scattered, approximate curves are determined for them to obtain one-to-one relationships between the arc length and the gap voltage. Finally, using these approximate curves, the voltage-current characteristics are presented for each arc length.
Toshiyuki WATANABE Fujio KUROKAWA
Current-resonant LLC converters are widely used owing to their low switching losses; however, they suffer from large transformer losses. As a way to reduce the copper loss, we examine reducing the AC resistance of the transformer windings and achieving tight coupling between the primary and secondary windings. In this case, the effects of the resulting increase in stray capacitance between the primary and secondary windings must be considered. This paper describes the loss caused by the capacitance between the transformer windings when a noise filter is connected to the LLC converter. Furthermore, we propose a new method for reducing this loss by connecting a bridge capacitor between the primary and secondary sides of the transformer. Results of the new method are presented and compared with simulations to demonstrate its effectiveness.
Graphs are highly flexible data structures that can model diverse data and relationships, allowing us to abstract and represent many things in the real world. Techniques for artificially generating graphs are therefore important in the many engineering fields where graphs are applied, including communication networks and social networks. In this paper, we organize and introduce graph generation techniques, from early random-based methods to the latest deep graph generators, focusing on feature reproduction and specification. Techniques for reproducing and specifying graph features in graph generation may provide new research methods for classical graph theory and for optimization problems on graphs. This paper also presents recent achievements that may lead to further exploration in these fields and discusses future prospects for graph generation.
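As an example of the early random-based methods, the Erdős–Rényi G(n, p) model connects each pair of n nodes independently with probability p. Choosing p as the observed edge density reproduces one graph feature (the expected number of edges) while ignoring others such as the degree distribution or clustering, which is the gap later generators aim to close. This minimal sketch uses only the standard library; the "observed" graph is itself sampled, as a stand-in for real data.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as an edge list."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def fit_edge_density(n, edges):
    """Pick p so that G(n, p) reproduces the observed edge count in expectation."""
    return len(edges) / (n * (n - 1) / 2)

observed = erdos_renyi(200, 0.1, seed=1)  # stand-in for a real target graph
p_hat = fit_edge_density(200, observed)
synthetic = erdos_renyi(200, p_hat, seed=2)
```

Each pair is tested independently, so generation costs O(n²) time; that is acceptable for illustration but not for very large graphs.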
Takahito YOSHIDA Takaharu YAGUCHI Takashi MATSUBARA
Accurately simulating physical systems is essential in various fields. In recent years, deep learning has been used to build models of such systems automatically by learning from data. One such method is the neural ordinary differential equation (neural ODE), which treats the output of a neural network as the time derivative of the system states. However, although this and related methods have shown promise, their training strategies still require further development. Inspired by error analysis techniques in numerical analysis, we propose an error-analytic strategy that replaces numerical errors with modeling errors. This strategy can capture long-term errors and thus improve the accuracy of long-term predictions.
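To illustrate why long-term errors matter when a learned vector field is integrated, the sketch below follows the neural ODE view (a function supplies the time derivative) and integrates it with the classical RK4 method. A stand-in "model" with a hypothetical 1% error in one coefficient replaces a trained network; it matches the true harmonic oscillator almost exactly over one step, yet its trajectory drifts over many steps. This shows the problem the strategy targets, not the error-analytic strategy itself.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical Runge-Kutta (RK4) step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def rollout(f, x0, dt, n_steps):
    """Integrate the vector field f from x0; returns an (n_steps + 1, dim) array."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(rk4_step(f, xs[-1], dt))
    return np.stack(xs)

def true_field(x):
    """Harmonic oscillator: dq/dt = p, dp/dt = -q."""
    return np.array([x[1], -x[0]])

def model_field(x):
    """Stand-in for a trained network: the same system with a 1% coefficient error."""
    return np.array([1.01 * x[1], -x[0]])

dt, n_steps = 0.01, 1000
traj_true = rollout(true_field, [1.0, 0.0], dt, n_steps)
traj_model = rollout(model_field, [1.0, 0.0], dt, n_steps)
error = np.linalg.norm(traj_model - traj_true, axis=1)  # error at every time step
```

The one-step error is on the order of 1e-7, while the error after 1000 steps is several orders of magnitude larger, so a training signal based only on one-step derivatives can miss what matters for long-term prediction.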
Keitaro NAKASAI Shin KOMEDA Masateru TSUNODA Masayuki KASHIMA
To measure developers' mental workload automatically, existing studies have used biometric measures such as brain waves and heart rate. However, developers often have to wear measurement devices for these measures, which can be physically burdensome. In this study, we evaluated the feasibility of non-contact biometric measures based on nasal skin temperature (NST). In the experiment, the proposed biometric measures estimated mental workload more accurately than non-biometric measures.