Author Search Result

[Author] Long WU(8hit)

1-8hit
  • On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework

    Xingyu ZHANG  Xia ZOU  Meng SUN  Penglong WU  Yimin WANG  Jun HE  

     
    LETTER-Speech and Hearing

      Vol:
    E103-A No:1
      Page(s):
    356-360

    In order to improve the noise robustness of automatic speaker recognition, many techniques on speech/feature enhancement have been explored by using deep neural networks (DNN). In this work, a DNN multi-level enhancement (DNN-ME), which consists of the stages of signal enhancement, cepstrum enhancement and i-vector enhancement, is proposed for text-independent speaker recognition. Given the fact that these enhancement methods are applied in different stages of the speaker recognition pipeline, it is worth exploring the complementary role of these methods, which benefits the understanding of the pros and cons of the enhancements of different stages. In order to use the capabilities of DNN-ME as much as possible, two kinds of methods called Cascaded DNN-ME and joint input of DNNs are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work is also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database have shown that DNN-ME demonstrated significant superiority over the systems with only a single enhancement for noise robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.

  • Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

    Wenkai LIU  Cuizhu QIN  Menglong WU  Wenle BAI  Hongxia DONG  

     
    LETTER-Human-computer Interaction

      Pubricized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1081-1084

    Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.

  • An Adaptively Biased OFDM Based on Hartley Transform for Visible Light Communication Systems Open Access

    Menglong WU  Yongfa XIE  Yongchao SHI  Jianwen ZHANG  Tianao YAO  Wenkai LIU  

     
    LETTER-Communication Theory and Signals

      Pubricized:
    2023/09/20
      Vol:
    E107-A No:6
      Page(s):
    928-931

    Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) converts bipolar OFDM signals into unipolar non-negative signals by introducing a high DC bias, which satisfies the requirement that the signal transmitted by intensity modulated/direct detection (IM/DD) must be positive. However, the high DC bias results in low power efficiency of DCO-OFDM. An adaptively biased optical OFDM was proposed, which could be designed with different biases according to the signal amplitude to improve power efficiency in this letter. The adaptive bias does not need to be taken off deliberately at the receiver, and the interference caused by the adaptive bias will only be placed on the reserved subcarriers, which will not affect the effective information. Moreover, the proposed OFDM uses Hartley transform instead of Fourier transform used in conventional optical OFDM, which makes this OFDM have low computational complexity and high spectral efficiency. The simulation results show that the normalized optical bit energy to noise power ratio (Eb(opt)/N0) required by the proposed OFDM at the bit error rate (BER) of 10-3 is, on average, 7.5 dB and 3.4 dB lower than that of DCO-OFDM and superimposed asymmetrically clipped optical OFDM (ACO-OFDM), respectively.

  • Peak-to-Average Power Ratio Reduction Scheme in DCO-OFDM with a Combined Index Modulation and Convex Optimization Open Access

    Menglong WU  Jianwen ZHANG  Yongfa XIE  Yongchao SHI  Tianao YAO  

     
    LETTER-Communication Theory and Signals

      Pubricized:
    2024/03/22
      Vol:
    E107-A No:8
      Page(s):
    1425-1429

    Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) exhibits a high peak-to-average power ratio (PAPR), which leads to nonlinear distortion in the system. In response to the above, the study proposes a scheme that combines direct-current biased optical orthogonal frequency division multiplexing with index modulation (DCO-OFDM-IM) and convex optimization algorithms. The proposed scheme utilizes partially activated subcarriers of the system to transmit constellation modulated symbol information, and transmits additional symbol information of the system through the combination of activated carrier index. Additionally, a dither signal is added to the system’s idle subcarriers, and the convex optimization algorithm is applied to solve for the optimal values of this dither signal. Therefore, by ensuring the system’s peak power remains unchanged, the scheme enhances the system’s average transmission power and thus achieves a reduction in the PAPR. Experimental results indicate that at a system’s complementary cumulative distribution function (CCDF) of 10-4, the proposed scheme reduces the PAPR by approximately 3.5 dB compared to the conventional DCO-OFDM system. Moreover, at a bit error rate (BER) of 10-3, the proposed scheme can lower the signal-to-noise ratio (SNR) by about 1 dB relative to the traditional DCO-OFDM system. Therefore, the proposed scheme enables a more substantial reduction in PAPR and improvement in BER performance compared to the conventional DCO-OFDM approach.

  • Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network

    Ying KANG  Cong LIU  Ning WANG  Dianxi SHI  Ning ZHOU  Mengmeng LI  Yunlong WU  

     
    PAPER-Image Recognition, Computer Vision

      Pubricized:
    2021/07/09
      Vol:
    E104-D No:10
      Page(s):
    1702-1711

    Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has absorbed increasing attention in computer vision. However, it is a challenge for current Siamese trackers that the demands of balance between accuracy in real-time tracking and robustness in long-time tracking are hard to meet. This work proposes a new Siamese based tracker with a dual-pipeline correlated fusion network (named as ADF-SiamRPN), which consists of one initial template for robust correlation, and the other transient template with the ability of adaptive feature optimal selection for accurate correlation. By the promotion from the learnable correlation-response fusion network afterwards, we are in pursuit of the synthetical improvement of tracking performance. To compare the performance of ADF-SiamRPN with state-of-the-art trackers, we conduct lots of experiments on benchmarks like OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT and TrackingNet. The experimental results of tracking demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.

  • Detection and Tracking Method for Dynamic Barcodes Based on a Siamese Network

    Menglong WU  Cuizhu QIN  Hongxia DONG  Wenkai LIU  Xiaodong NIE  Xichang CAI  Yundong LI  

     
    PAPER-Wireless Communication Technologies

      Pubricized:
    2022/01/13
      Vol:
    E105-B No:7
      Page(s):
    866-875

    In many screen to camera communication (S2C) systems, the barcode preprocessing method is a significant prerequisite because barcodes may be deformed due to various environmental factors. However, previous studies have focused on barcode detection under static conditions; to date, few studies have been carried out on dynamic conditions (for example, the barcode video stream or the transmitter and receiver are moving). Therefore, we present a detection and tracking method for dynamic barcodes based on a Siamese network. The backbone of the CNN in the Siamese network is improved by SE-ResNet. The detection accuracy achieved 89.5%, which stands out from other classical detection networks. The EAO reaches 0.384, which is better than previous tracking methods. It is also superior to other methods in terms of accuracy and robustness. The SE-ResNet in this paper improved the EAO by 1.3% compared with ResNet in SiamMask. Also, our method is not only applicable to static barcodes but also allows real-time tracking and segmentation of barcodes captured in dynamic situations.

  • Compressed Sensing for Range-Resolved Signal of Ballistic Target with Low Computational Complexity

    Wentao LV  Jiliang LIU  Xiaomin BAO  Xiaocheng YANG  Long WU  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:6
      Page(s):
    1238-1242

    The classification of warheads and decoys is a core technology in the defense of the ballistic missile. Usually, a high range resolution is favorable for the development of the classification algorithm, which requires a high sampling rate in fast time, and thus leads to a heavy computation burden for data processing. In this paper, a novel method based on compressed sensing (CS) is presented to improve the range resolution of the target with low computational complexity. First, a tool for electromagnetic calculation, such as CST Microwave Studio, is used to simulate the frequency response of the electromagnetic scattering of the target. Second, the range-resolved signal of the target is acquired by further processing. Third, a greedy algorithm is applied to this signal. By the iterative search of the maximum value from the signal rather than the calculation of the inner product for raw echo, the scattering coefficients of the target can be reconstructed efficiently. A series of experimental results demonstrates the effectiveness of our method.

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.