IEICE globals.ieice.org Site

Author Search Result

[Author] Long WU(8hit)


On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework
Xingyu ZHANG Xia ZOU Meng SUN Penglong WU Yimin WANG Jun HE

LETTER-Speech and Hearing

Vol: E103-A No:1
Page(s): 356-360
To improve the noise robustness of automatic speaker recognition, many speech and feature enhancement techniques based on deep neural networks (DNNs) have been explored. In this work, a DNN multi-level enhancement (DNN-ME), which consists of signal enhancement, cepstrum enhancement and i-vector enhancement stages, is proposed for text-independent speaker recognition. Because these enhancement methods are applied at different stages of the speaker recognition pipeline, it is worth exploring their complementary roles, which helps clarify the pros and cons of enhancement at each stage. To exploit the capabilities of DNN-ME as fully as possible, two methods, cascaded DNN-ME and joint input of DNNs, are studied. Weighted Gaussian mixture models (WGMMs), proposed in our previous work, are also applied to further improve the model's performance. Experiments on the Speakers in the Wild (SITW) database show that DNN-ME is significantly superior to systems with only a single enhancement for noise-robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
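The equal error rate quoted above is the operating point at which the false-acceptance rate and the false-rejection rate coincide. As a rough illustration only (this is not the authors' evaluation code; the function name `equal_error_rate` and the toy scores are assumptions made for this sketch), it can be estimated from trial scores by sweeping a decision threshold:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The function
    returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical toy scores, for illustration only.
eer = equal_error_rate([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
print(f"EER = {eer:.3f}")
```

Evaluation toolkits usually interpolate between thresholds; the sweep here keeps the definition visible at the cost of some precision on small trial lists.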
Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network
Wenkai LIU Cuizhu QIN Menglong WU Wenle BAI Hongxia DONG

LETTER-Human-computer Interaction

Publicized: 2023/02/15
Vol: E106-D No:5
Page(s): 1081-1084
Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
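The Squeeze-Excitation step mentioned above can be pictured as three operations: pool each channel to a single number, pass those numbers through a small gating network, and rescale the channels by the resulting gates. The sketch below is a generic SE block in plain Python under assumed shapes and hand-picked weights (`se_block`, `w1` and `w2` are hypothetical names, and biases are omitted); it does not reproduce how MSCNet wires SE into its MSRF units.

```python
import math


def se_block(feature_maps, w1, w2):
    """Generic Squeeze-Excitation over a list of C channels (each an H x W list).

    w1: reduction weights, shape (C/r) x C.
    w2: expansion weights, shape C x (C/r).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
        for row in w2
    ]
    # Scale: each channel is multiplied by its gate (enhanced or suppressed).
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Hypothetical 2-channel, 2x2 input with hand-picked weights.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = se_block(maps, w1=[[1.0, 1.0]], w2=[[0.0], [10.0]])
print(gates)
```

With these weights the first channel is halved while the second passes through almost unchanged, which is the selective enhance-or-suppress behaviour the abstract refers to.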
Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG

PAPER-Artificial Intelligence, Data Mining

Publicized: 2023/10/23
Vol: E107-D No:1
Page(s): 83-92
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real life and the guiding behavior of normal people, and construct a methodology for building high-precision lightweight models called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module, based on the different mechanisms of information perception in drunkards and ordinary people, to simulate the process of gradually sobering up and the accompanying changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the perception process of the normal model, which uses a multi-scale class residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module from the conventional model into the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and a loss of 2.284 with 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy improves to 45.2% and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MAC.
An Adaptively Biased OFDM Based on Hartley Transform for Visible Light Communication Systems Open Access
Menglong WU Yongfa XIE Yongchao SHI Jianwen ZHANG Tianao YAO Wenkai LIU

LETTER-Communication Theory and Signals

Publicized: 2023/09/20
Vol: E107-A No:6
Page(s): 928-931
Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) converts bipolar OFDM signals into unipolar non-negative signals by introducing a high DC bias, which satisfies the requirement that the signal transmitted by intensity modulated/direct detection (IM/DD) must be positive. However, the high DC bias results in low power efficiency of DCO-OFDM. In this letter, an adaptively biased optical OFDM is proposed, in which the bias is set according to the signal amplitude to improve power efficiency. The adaptive bias does not need to be removed explicitly at the receiver, and the interference it causes falls only on the reserved subcarriers, so it does not affect the effective information. Moreover, the proposed OFDM uses the Hartley transform instead of the Fourier transform used in conventional optical OFDM, which gives it low computational complexity and high spectral efficiency. The simulation results show that the normalized optical bit energy to noise power ratio (Eb(opt)/N0) required by the proposed OFDM at a bit error rate (BER) of 10⁻³ is, on average, 7.5 dB and 3.4 dB lower than that of DCO-OFDM and superimposed asymmetrically clipped optical OFDM (ACO-OFDM), respectively.
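A general property of the discrete Hartley transform (DHT) that is relevant here is that it maps a real sequence to a real sequence and, up to a factor of 1/N, is its own inverse. The sketch below is a naive O(N²) DHT written only to show that property; the function names `dht` and `idht` and the sample values are assumptions for this example, and it does not model the adaptive bias or any other part of the proposed transceiver.

```python
import math


def dht(x):
    """Naive O(N^2) discrete Hartley transform of a real sequence.

    H[k] = sum_n x[n] * cas(2*pi*n*k/N), where cas(a) = cos(a) + sin(a).
    """
    n = len(x)
    return [
        sum(
            x[t] * (math.cos(2 * math.pi * k * t / n) + math.sin(2 * math.pi * k * t / n))
            for t in range(n)
        )
        for k in range(n)
    ]


def idht(h):
    """Inverse DHT: the same transform followed by division by N."""
    n = len(h)
    return [v / n for v in dht(h)]


# Hypothetical real-valued symbols placed on 8 subcarriers.
symbols = [1.0, -2.0, 3.5, 0.0, -1.25, 4.0, 2.0, -3.0]
time_signal = dht(symbols)       # real-valued time-domain samples
recovered = idht(time_signal)    # the same routine recovers the symbols
print(max(abs(a - b) for a, b in zip(symbols, recovered)))
```

A practical system would use a fast Hartley algorithm rather than this quadratic loop.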
Peak-to-Average Power Ratio Reduction Scheme in DCO-OFDM with a Combined Index Modulation and Convex Optimization Open Access
Menglong WU Jianwen ZHANG Yongfa XIE Yongchao SHI Tianao YAO

LETTER-Communication Theory and Signals

Publicized: 2024/03/22
Vol: E107-A No:8
Page(s): 1425-1429
Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) exhibits a high peak-to-average power ratio (PAPR), which leads to nonlinear distortion in the system. To address this, the study proposes a scheme that combines direct-current biased optical orthogonal frequency division multiplexing with index modulation (DCO-OFDM-IM) and convex optimization algorithms. The proposed scheme uses a subset of activated subcarriers to transmit constellation-modulated symbols, and conveys additional information through the combination of activated subcarrier indices. Additionally, a dither signal is added to the system’s idle subcarriers, and the convex optimization algorithm is applied to solve for the optimal values of this dither signal. By keeping the system’s peak power unchanged while raising its average transmission power, the scheme reduces the PAPR. Experimental results indicate that at a complementary cumulative distribution function (CCDF) of 10⁻⁴, the proposed scheme reduces the PAPR by approximately 3.5 dB compared to the conventional DCO-OFDM system. Moreover, at a bit error rate (BER) of 10⁻³, the proposed scheme lowers the required signal-to-noise ratio (SNR) by about 1 dB relative to the conventional DCO-OFDM system. The proposed scheme therefore achieves both a lower PAPR and better BER performance than the conventional DCO-OFDM approach.
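The argument above rests on the definition of PAPR as peak power divided by average power: if the peak is held fixed while the average rises, the ratio falls. The following sketch illustrates that arithmetic and an empirical CCDF. The names `papr_db` and `ccdf` and the sample vectors are assumptions for this example; the dither optimization itself is not reproduced.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a real sample block, in dB."""
    powers = [s * s for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def ccdf(papr_values_db, threshold_db):
    """Empirical CCDF: fraction of blocks whose PAPR exceeds the threshold."""
    return sum(p > threshold_db for p in papr_values_db) / len(papr_values_db)


# Hypothetical blocks with the same peak (1.0) but different average power.
original = [1.0, 0.2, -0.1, 0.3]
with_dither = [1.0, 0.6, -0.5, 0.7]
print(papr_db(original), papr_db(with_dither))
```

Here the second block keeps the same peak but carries more average power, so its PAPR is lower. In a full simulation, `ccdf` would be evaluated over many OFDM symbols to read off the PAPR at a target probability such as 10⁻⁴.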
Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network
Ying KANG Cong LIU Ning WANG Dianxi SHI Ning ZHOU Mengmeng LI Yunlong WU

PAPER-Image Recognition, Computer Vision

Publicized: 2021/07/09
Vol: E104-D No:10
Page(s): 1702-1711
Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has attracted increasing attention in computer vision. However, current Siamese trackers struggle to balance accuracy in real-time tracking against robustness in long-term tracking. This work proposes a new Siamese-based tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which uses an initial template for robust correlation and a transient template, capable of adaptive optimal feature selection, for accurate correlation. A subsequent learnable correlation-response fusion network is then used to pursue an overall improvement in tracking performance. To compare ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks including OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT and TrackingNet. The experimental results demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.
Detection and Tracking Method for Dynamic Barcodes Based on a Siamese Network
Menglong WU Cuizhu QIN Hongxia DONG Wenkai LIU Xiaodong NIE Xichang CAI Yundong LI

PAPER-Wireless Communication Technologies

Publicized: 2022/01/13
Vol: E105-B No:7
Page(s): 866-875
In many screen-to-camera (S2C) communication systems, barcode preprocessing is an important prerequisite because barcodes may be deformed by various environmental factors. However, previous studies have focused on barcode detection under static conditions; to date, few studies have addressed dynamic conditions (for example, when the barcode is a video stream or the transmitter and receiver are moving). Therefore, we present a detection and tracking method for dynamic barcodes based on a Siamese network. The backbone of the CNN in the Siamese network is improved by SE-ResNet. The detection accuracy reaches 89.5%, which is higher than that of other classical detection networks. The EAO reaches 0.384, which is better than previous tracking methods. The method is also superior to other methods in terms of accuracy and robustness. The SE-ResNet in this paper improves the EAO by 1.3% compared with the ResNet in SiamMask. In addition, our method is applicable not only to static barcodes but also to real-time tracking and segmentation of barcodes captured in dynamic situations.
Compressed Sensing for Range-Resolved Signal of Ballistic Target with Low Computational Complexity
Wentao LV Jiliang LIU Xiaomin BAO Xiaocheng YANG Long WU

LETTER-Digital Signal Processing

Vol: E99-A No:6
Page(s): 1238-1242
The classification of warheads and decoys is a core technology in ballistic missile defense. A high range resolution usually benefits the classification algorithm, but it requires a high sampling rate in fast time and thus leads to a heavy computational burden for data processing. In this paper, a novel method based on compressed sensing (CS) is presented to improve the range resolution of the target with low computational complexity. First, a tool for electromagnetic calculation, such as CST Microwave Studio, is used to simulate the frequency response of the electromagnetic scattering of the target. Second, the range-resolved signal of the target is acquired by further processing. Third, a greedy algorithm is applied to this signal. By iteratively searching for the maximum value in the signal rather than calculating the inner product for the raw echo, the scattering coefficients of the target can be reconstructed efficiently. A series of experimental results demonstrates the effectiveness of our method.
