Author Search Result

[Author] Jie ZHU (16 hits)

  • Mode Normalization Enhanced Recurrent Model for Multi-Modal Semantic Trajectory Prediction

    Shaojie ZHU  Lei ZHANG  Bailong LIU  Shumin CUI  Changxing SHAO  Yun LI  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2019/10/04
    Vol: E103-D No:1
    Page(s): 174-176

    Multi-modal semantic trajectory prediction has become a new challenge due to the rapid growth of multi-modal semantic trajectories with text messages. Traditional RNN trajectory prediction methods have the following problems when processing multi-modal semantic trajectories. The distribution of multi-modal trajectory samples shifts gradually during training, which makes convergence difficult and training slow. Moreover, each modal feature shifts in a different direction, which produces multiple distributions within the dataset. To solve these problems, MNERM (Mode Normalization Enhanced Recurrent Model) for multi-modal semantic trajectories is proposed. MNERM embeds multiple modal features together and combines them with an LSTM network to capture the long-term dependencies of a trajectory. In addition, it introduces a Mode Normalization mechanism that normalizes samples with multiple means and variances, so that each normalized distribution falls into the active region of the activation function, improving prediction efficiency while greatly increasing training speed. Experiments on a real dataset show that, compared with SERM, MNERM reduces the sensitivity to the learning rate, improves the training speed by 9.120 times, increases HR@1 by 0.03, and reduces the ADE by 120 meters.

  • A Two-Stage Approach for Fine-Grained Visual Recognition via Confidence Ranking and Fusion

    Kangbo SUN  Jie ZHU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2020/09/11
    Vol: E103-D No:12
    Page(s): 2693-2700

    Location and feature representation of object's parts play key roles in fine-grained visual recognition. To promote the final recognition accuracy without any bounding boxes/part annotations, many studies adopt object location networks to propose bounding boxes/part annotations with only category labels, and then crop the images into partial images to help the classification network make the final decision. In our work, to propose more informative partial images and effectively extract discriminative features from the original and partial images, we propose a two-stage approach that can fuse the original features and partial features by evaluating and ranking the information of partial images. Experimental results show that our proposed approach achieves excellent performance on two benchmark datasets, which demonstrates its effectiveness.

  • A Weighted Overlapped Block-Based Compressive Sensing in SAR Imaging

    Hanxu YOU  Lianqiang LI  Jie ZHU  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2016/12/15
    Vol: E100-D No:3
    Page(s): 590-593

    Compressive sensing (CS) theory has been widely used in synthetic aperture radar (SAR) imaging for its ability to reconstruct an image from far fewer measurements than is generally considered necessary. However, block-based CS approaches in SAR imaging always produce visible boundaries between adjacent blocks, known as block artefacts. In this paper, we propose a weighted overlapped block-based compressive sensing (WOBCS) method to reduce the block artefacts and accomplish SAR imaging. It has two main characteristics: 1) the strategy of sensing small and recovering big, and 2) an adaptive weighting technique among overlapped blocks. The proposed method is implemented with well-known CS recovery schemes such as orthogonal matching pursuit (OMP) and BCS-SPL. Promising results are demonstrated through several experiments.
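The idea of weighting among overlapped blocks can be illustrated with a small sketch. This is not the WOBCS method itself: the abstract does not specify the adaptive weights, so the fixed triangular weights, the 1-D signal, and the function names below are all assumptions chosen for illustration. The sketch only shows how overlapping reconstructed blocks might be blended so that no hard boundary appears where they meet.

```python
def triangular_weights(n):
    """Hypothetical fixed weights: largest at the block centre, smallest at the edges."""
    return [min(i + 1, n - i) for i in range(n)]


def blend_overlapped_blocks(blocks, starts, length):
    """Weighted average of overlapping 1-D blocks placed at the given start offsets."""
    acc = [0.0] * length    # weighted sum of block values per position
    wsum = [0.0] * length   # sum of weights per position
    for block, start in zip(blocks, starts):
        weights = triangular_weights(len(block))
        for i, value in enumerate(block):
            acc[start + i] += weights[i] * value
            wsum[start + i] += weights[i]
    # Positions covered by no block are left at 0.0.
    return [a / s if s > 0 else 0.0 for a, s in zip(acc, wsum)]


if __name__ == "__main__":
    # Two recovered blocks of length 4 that overlap on positions 2 and 3.
    blended = blend_overlapped_blocks([[1, 1, 1, 1], [3, 3, 3, 3]], [0, 2], 6)
    print(blended)
```

In the overlap, each block contributes more where the position is closer to its own centre, so the transition between the two blocks is gradual rather than abrupt.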

  • Lexicon-Based Local Representation for Text-Dependent Speaker Verification

    Hanxu YOU  Wei LI  Lianqiang LI  Jie ZHU  

     
    LETTER-Speech and Hearing

    Publicized: 2016/12/05
    Vol: E100-D No:3
    Page(s): 587-589

    A text-dependent i-vector extraction scheme and a lexicon-based binary vector (L-vector) representation are proposed to improve the performance of text-dependent speaker verification. The i-vector and L-vector are used to represent the utterances for enrollment and test. An improved cosine distance kernel is constructed by combining the i-vector and L-vector, and is used with a back-end support vector machine (SVM) to distinguish both speaker identity and lexical (or text) diversity. Experiments are conducted on parts 1 and 2 of the RSR 2015 Corpus; the results indicate that an improvement of up to 30% can be obtained compared with the traditional i-vector baseline.

  • A Novel Speech Enhancement System Based on the Coherence-Based Algorithm and the Differential Beamforming

    Lei WANG  Jie ZHU  

     
    LETTER-Speech and Hearing

    Publicized: 2018/08/31
    Vol: E101-D No:12
    Page(s): 3253-3257

    This letter proposes a novel speech enhancement system based on an ‘L’-shaped triple-microphone array. A modified coherence-based algorithm and first-order differential beamforming are combined to filter spatially distributed noise. The experimental results reveal that the proposed algorithm achieves significant performance in spatial filtering under different noise scenarios.

  • A Novel Multi-Knowledge Distillation Approach

    Lianqiang LI  Kangbo SUN  Jie ZHU  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2020/10/19
    Vol: E104-D No:1
    Page(s): 216-219

    Knowledge distillation approaches can transfer information from a large network (teacher network) to a small network (student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network; these representations can be treated as the essence of the FM, i.e., EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of an individual sample's EFM and the similarity relationships among several samples' EFM, to enhance the generalization ability of the student network. Compared with previous approaches that employ FM or handcrafted features from FM, the EFM learned from autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge helps the student network mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to state-of-the-art approaches.

  • A Novel Frame Aggregation Scheduler to Solve the Head-of-Line Blocking Problem for Real-Time UDP Traffic in Aggregation-Enabled WLANs

    Linjie ZHU  Bin WU  Zhiwei WEI  Yu TANG  

     
    LETTER-Information Network

    Publicized: 2019/03/29
    Vol: E102-D No:7
    Page(s): 1408-1411

    In this letter, a novel frame aggregation scheduler is proposed to solve the head-of-line blocking problem for real-time user datagram protocol (UDP) traffic in error-prone and aggregation-enabled wireless local area networks (WLANs). The key to the proposed scheduler is to break the restriction of in-order delivery over the WLAN. The simulation results show that the proposed scheduler can achieve high UDP goodput and low delay compared to the conventional scheduler.

  • Joint Patch Weighting and Moment Matching for Unsupervised Domain Adaptation in Micro-Expression Recognition

    Jie ZHU  Yuan ZONG  Hongli CHANG  Li ZHAO  Chuangao TANG  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2021/11/17
    Vol: E105-D No:2
    Page(s): 441-445

    Unsupervised domain adaptation (DA) is a challenging machine learning problem because the labeled training (source) and unlabeled testing (target) sets belong to different domains and therefore have different feature distributions. It has recently attracted wide attention in micro-expression recognition (MER). Although some well-performing unsupervised DA methods have been proposed, these methods cannot adequately solve the problem of unsupervised DA in MER, a.k.a. cross-domain MER. To deal with this challenging problem, in this letter we propose a novel unsupervised DA method called Joint Patch weighting and Moment Matching (JPMM). JPMM bridges the source and target micro-expression feature sets by minimizing their probability distribution divergence with a multi-order moment matching operation. Meanwhile, it takes advantage of the contributive facial patches through weight learning, such that a domain-invariant feature representation containing micro-expression-discriminative information can be learned. Finally, we carry out extensive experiments showing that the proposed JPMM method is superior to recent state-of-the-art unsupervised DA methods in dealing with cross-domain MER.
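A multi-order moment matching term can be sketched as follows. This is a simplified illustration under stated assumptions, not the JPMM objective: it compares per-dimension raw moments of the source and target feature sets up to a chosen order and sums the squared differences. The function names, the use of raw (non-central) moments, and the squared-difference penalty are assumptions, and the patch weighting part of the method is omitted.

```python
def raw_moments(samples, order):
    """Per-dimension raw moment E[x^order] of a list of feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(x[d] ** order for x in samples) / n for d in range(dim)]


def moment_matching_loss(source, target, max_order=2):
    """Sum of squared differences between source and target moments, orders 1..max_order."""
    loss = 0.0
    for order in range(1, max_order + 1):
        m_src = raw_moments(source, order)
        m_tgt = raw_moments(target, order)
        loss += sum((a - b) ** 2 for a, b in zip(m_src, m_tgt))
    return loss


if __name__ == "__main__":
    source = [[0.0], [2.0]]   # mean 1, second moment 2
    target = [[1.0], [1.0]]   # mean 1, second moment 1
    print(moment_matching_loss(source, target, max_order=1))
    print(moment_matching_loss(source, target, max_order=2))
```

The toy data shows why higher orders matter: the two sets have the same mean, so a first-order term alone cannot tell them apart, while the second-order term does.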

  • A Spectral Clustering Based Filter-Level Pruning Method for Convolutional Neural Networks

    Lianqiang LI  Jie ZHU  Ming-Ting SUN  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2019/09/17
    Vol: E102-D No:12
    Page(s): 2624-2627

    Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into subgraphs, which is equivalent to clustering the similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we finally keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify redundant filters heuristically, the proposed method can select the pruning candidates more reasonably and precisely. Experimental results also show that our proposed pruning method achieves significant improvements over state-of-the-art methods.

  • Study of Performance of OFDM-CDMA System with Various Processing Gains

    Junjie ZHU  Wai Choong WONG  Liyanage C. DESILVA  

     
    LETTER-Terrestrial Radio Communications

    Vol: E84-B No:3
    Page(s): 678-681

    In this paper, the performance of OFDM-CDMA system in mobile communication channels with different detection algorithms is evaluated in terms of the probability of bit error as a function of processing gain. Both the frequency diversity and multi-user interference (MUI) are taken into account. The simulation results show that MUI is the dominant factor that affects system performance, and MMSE outperforms the other algorithms. Based on the simulation results, a modified system scheme is proposed which performs better than the conventional method.

  • Searching and Learning Discriminative Regions for Fine-Grained Image Retrieval and Classification

    Kangbo SUN  Jie ZHU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/10/18
    Vol: E105-D No:1
    Page(s): 141-149

    Local discriminative regions play important roles in fine-grained image analysis tasks. How to locate local discriminative regions with only category labels, and how to learn discriminative representations from these regions, have been active research topics. In our work, we propose the Searching Discriminative Regions (SDR) and Learning Discriminative Regions (LDR) methods to search for and learn local discriminative regions in images. The SDR method adopts an attention mechanism to iteratively search for high-response regions in images, and uses them as clues to locate local discriminative regions. Moreover, the LDR method is proposed to learn, from the raw image and local images, representations that are compact within a category and sparse between categories. Experimental results show that our proposed approach achieves excellent performance in both fine-grained image retrieval and classification tasks, which demonstrates its effectiveness.

  • An Approach to Detect Cavities in X-Ray Astronomical Images Using Granular Convolutional Neural Networks

    Zhixian MA  Jie ZHU  Weitian LI  Haiguang XU  

     
    PAPER-Pattern Recognition

    Publicized: 2017/07/18
    Vol: E100-D No:10
    Page(s): 2578-2586

    Detection of cavities in X-ray astronomical images has become a field of interest with the flourishing of studies on black holes and Active Galactic Nuclei (AGN). In this paper, an approach is proposed to detect cavities in X-ray astronomical images using our newly designed Granular Convolutional Neural Network (GCNN) based classifiers. The raw data are first preprocessed to obtain images of the observed objects, i.e., galaxies or galaxy clusters. In each image, pixels are classified into three categories: (1) the faint backgrounds (BKG), (2) the cavity regions (CAV), and (3) the bright central gas regions (CNT). The sample sets are then generated by dividing large images into subimages with a window size chosen according to the cavities' scale. Since BKG samples far outnumber the other types, samples from the majority class are split into subsets, i.e., granules, to achieve balanced training sets. Then a group of three-convolutional-layer granular CNNs without subsampling layers is designed as the classifiers and trained with the labeled granular sample sets. Finally, the trained GCNN classifiers are applied to new observations, so as to estimate the cavity regions with a voting strategy and locate them with elliptical profiles on the raw observation images. Experiments and applications of our approach are demonstrated on 40 X-ray astronomical observations retrieved from the Chandra Data Archive (CDA). Comparisons among our approach, β-model fitting, and the Unsharp Masking (UM) method were also performed, which show that our approach was more accurate and robust.

  • Filter Level Pruning Based on Similar Feature Extraction for Convolutional Neural Networks

    Lianqiang LI  Yuhui XU  Jie ZHU  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2018/01/18
    Vol: E101-D No:4
    Page(s): 1203-1206

    This paper introduces a filter-level pruning method based on similar feature extraction, which uses the k-means++ algorithm to compress and accelerate convolutional neural networks. In contrast to other pruning methods, the proposed method analyzes the similarities in recognizing features among filters rather than evaluating the importance of filters to prune the redundant ones. This strategy is more reasonable and effective. Furthermore, our method does not result in an unstructured network. As a result, it needs no extra sparse representation and can be efficiently supported by any off-the-shelf deep learning library. Experimental results show that our filter pruning method can reduce the number of parameters and the computational cost of LeNet-5 by a factor of 17.9× with only 0.3% accuracy loss.
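Clustering similar filters and keeping one representative per cluster can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's procedure: it treats each filter as a flattened weight vector, clusters the vectors with k-means using k-means++ seeding, and keeps the filter closest to each centroid. The choice of representative, the function names, and the fixed random seed are all hypothetical, and retraining after pruning is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: new centres are sampled proportionally to squared distance."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        threshold = rng.random() * sum(d2)
        cumulative, chosen = 0.0, points[-1]
        for p, d in zip(points, d2):
            cumulative += d
            if d > 0 and cumulative >= threshold:
                chosen = p
                break
        centers.append(list(chosen))
    return centers


def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (centers, labels)."""
    centers = kmeans_pp_init(points, k, random.Random(seed))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centre where it is
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)


def select_filters(filters, k, seed=0):
    """Indices of the filters kept: the one nearest to each cluster centroid."""
    centers, labels = kmeans(filters, k, seed=seed)
    kept = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            kept.append(min(members, key=lambda i: sq_dist(filters[i], centers[j])))
    return sorted(kept)


if __name__ == "__main__":
    # Four toy "filters": two near the origin, two near (10, 10).
    toy_filters = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]]
    print(select_filters(toy_filters, k=2))
```

Because whole filters are removed rather than individual weights, the pruned layer stays dense, which is consistent with the abstract's point that no extra sparse representation is needed.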

  • A Retransmission Scheme in IEEE 802.11be Synchronized Multi-Link WLANs

    Linjie ZHU  Liang GU  Rongliang CHEN  

     
    LETTER-Mobile Information Network and Personal Communications

    Publicized: 2022/11/02
    Vol: E106-A No:5
    Page(s): 871-875

    A novel retransmission scheme, considering both transmission rate and frame error rate, is proposed to alleviate the inefficiencies caused by head-of-line blocking and null padding problems during retransmission in IEEE 802.11be synchronous multi-link wireless local area networks. Simulation results show that the proposed scheme improves throughput by up to 200% over the legacy scheme by reallocating lost subframes and adding effective duplicate subframes to multiple links.

  • A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target

    Lei WANG  Jie ZHU  Kangbo SUN  

    This paper has been cancelled due to violation of duplicate submission policy on IEICE Transactions on Information and Systems.
     
    PAPER-Speech and Hearing

    Publicized: 2021/08/05
    Vol: E104-D No:11
    Page(s): 1963-1970

    To cope with complicated interference scenarios in realistic acoustic environment, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, the mask such as the Wiener gain can be estimated directly or derived by the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate the multi-task learning in DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. The domain-specific information is shared between two tasks to learn a more generalizable representation. Since the performance of multi-task network is sensitive to the weight parameters of loss function, the homoscedastic uncertainty is introduced to adaptively learn the weights, which is proven to outperform the fixed weighting method. Simulation results show the proposed multi-task scheme improves the speech enhancement performance overall compared to the conventional single-task methods. And the joint direct mask and SPP estimation yields the best performance among all the considered techniques.

  • A Speech Enhancement Method Based on Multi-Task Bayesian Compressive Sensing

    Hanxu YOU  Zhixian MA  Wei LI  Jie ZHU  

     
    PAPER-Speech and Hearing

    Publicized: 2016/11/30
    Vol: E100-D No:3
    Page(s): 556-563

    Traditional speech enhancement (SE) algorithms usually show fluctuating performance when they deal with different types of noisy speech signals. In this paper, we propose a multi-task Bayesian compressive sensing based speech enhancement (MT-BCS-SE) algorithm that achieves performance comparable to, and more stable than, that of traditional SE algorithms. The MT-BCS-SE algorithm utilizes the dependence information among compressive sensing (CS) measurements and the sparsity of speech signals to perform SE. To obtain sufficient sparsity of speech signals, we adopt an overcomplete dictionary to transform speech signals into sparse representations. The K-SVD algorithm is employed to learn various overcomplete dictionaries. The influence of the overcomplete dictionary on the MT-BCS-SE algorithm is evaluated through a large number of experiments, so that the most suitable dictionary can be adopted to obtain the best performance. Experiments were conducted on the well-known NOIZEUS corpus to evaluate the performance of the proposed algorithm. On the NOIZEUS corpus, MT-BCS-SE is shown to be competitive with or even superior to traditional SE algorithms, such as optimally-modified log-spectral amplitude (OMLSA), multi-band spectral subtraction (SSMul), and minimum mean square error (MMSE), in terms of signal-to-noise ratio (SNR), speech enhancement gain (SEG), and perceptual evaluation of speech quality (PESQ), and to have better stability than traditional SE algorithms.
