Shaojie ZHU Lei ZHANG Bailong LIU Shumin CUI Changxing SHAO Yun LI
Multi-modal semantic trajectory prediction has become a new challenge due to the rapid growth of multi-modal semantic trajectories with text messages. Traditional RNN-based trajectory prediction methods suffer from the following problems when processing multi-modal semantic trajectories. The distribution of multi-modal trajectory samples shifts gradually during training, which leads to slow convergence and long training time. Moreover, each modal feature shifts in a different direction, which produces multiple distributions within the dataset. To solve the above problems, MNERM (Mode Normalization Enhanced Recurrent Model) for multi-modal semantic trajectories is proposed. MNERM embeds multiple modal features together and combines them with an LSTM network to capture the long-term dependencies of the trajectory. In addition, it designs a Mode Normalization mechanism that normalizes samples with multiple means and variances, so that each normalized distribution falls within the effective range of the activation function, improving prediction efficiency while greatly improving the training speed. Experiments on a real dataset show that, compared with SERM, MNERM reduces the sensitivity to the learning rate, improves the training speed by 9.120 times, increases HR@1 by 0.03, and reduces the ADE by 120 meters.
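(Illustrative sketch, not from the paper: the following NumPy snippet shows one plausible form of the Mode Normalization step described above, normalizing a batch with K mode-specific means and variances under soft mode assignments; the gating that produces the assignments is assumed to be learned elsewhere.)

    import numpy as np

    def mode_normalize(x, gates, eps=1e-5):
        # x: (N, D) embedded multi-modal trajectory features for a batch
        # gates: (N, K) soft assignment of each sample to K modes (rows sum to 1)
        n_k = gates.sum(axis=0) + eps                        # effective samples per mode
        mu = (gates.T @ x) / n_k[:, None]                    # per-mode means, (K, D)
        var = (gates.T @ x ** 2) / n_k[:, None] - mu ** 2    # per-mode variances, (K, D)
        x_hat = np.zeros_like(x)
        for k in range(gates.shape[1]):
            # each sample is normalized by a gate-weighted mixture of mode statistics
            x_hat += gates[:, [k]] * (x - mu[k]) / np.sqrt(var[k] + eps)
        return x_hat

    # toy usage with random gates (in MNERM the gates would be learned)
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 16))
    logits = rng.normal(size=(32, 4))
    gates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(mode_normalize(x, gates).shape)    # (32, 16)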
The localization and feature representation of an object's parts play key roles in fine-grained visual recognition. To improve the final recognition accuracy without any bounding box/part annotations, many studies adopt object localization networks that propose bounding boxes/parts using only category labels, and then crop the images into partial images to help the classification network make the final decision. In our work, to obtain more informative partial images and to effectively extract discriminative features from the original and partial images, we propose a two-stage approach that fuses the original features and partial features by evaluating and ranking the information content of the partial images. Experimental results show that the proposed approach achieves excellent performance on two benchmark datasets, which demonstrates its effectiveness.
Hanxu YOU Lianqiang LI Jie ZHU
The compressive sensing (CS) theory has been widely used in synthetic aperture radar (SAR) imaging for its ability to reconstruct an image from far fewer measurements than is generally considered necessary. However, block-based CS approaches in SAR imaging introduce discontinuities at the boundaries between adjacent blocks, known as block artefacts. In this paper, we propose a weighted overlapped block-based compressive sensing (WOBCS) method to reduce the block artefacts and accomplish SAR imaging. It has two main characteristics: 1) a 'sensing small, recovering big' strategy and 2) an adaptive weighting technique among overlapped blocks. The proposed method is implemented with well-known CS recovery schemes such as orthogonal matching pursuit (OMP) and BCS-SPL. Promising results are demonstrated through several experiments.
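(Illustrative sketch, not the authors' implementation: a 1-D toy example of the overlapped-block idea, sensing each overlapping block with a small number of measurements, recovering it with OMP in a DCT basis, and blending the overlaps with weights; the block size, overlap, weighting window, and test signal are all assumptions.)

    import numpy as np
    from scipy.fft import idct
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(1)
    N, B, step, M = 256, 64, 32, 40                  # signal length, block size, hop, measurements per block
    t = np.arange(N) / N
    x = np.cos(2 * np.pi * 4 * t) + 0.6 * np.cos(2 * np.pi * 9 * t)   # smooth, compressible test signal

    Psi = idct(np.eye(B), norm='ortho', axis=0)      # DCT synthesis basis for one block
    Phi = rng.normal(size=(M, B)) / np.sqrt(M)       # random sensing matrix ("sensing small")

    recon, weight = np.zeros(N), np.zeros(N)
    taper = np.hanning(B) + 1e-3                     # weights for the overlapped blocks (illustrative)

    for start in range(0, N - B + 1, step):
        yb = Phi @ x[start:start + B]                # compressive measurements of this block
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=12, fit_intercept=False)
        omp.fit(Phi @ Psi, yb)                       # recover sparse DCT coefficients
        xb_hat = Psi @ omp.coef_                     # recovered block ("recovering big")
        recon[start:start + B] += taper * xb_hat     # weighted accumulation over overlaps
        weight[start:start + B] += taper

    recon /= weight
    print("relative reconstruction error:", np.linalg.norm(recon - x) / np.linalg.norm(x))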
Hanxu YOU Wei LI Lianqiang LI Jie ZHU
A text-dependent i-vector extraction scheme and a lexicon-based binary vector (L-vector) representation are proposed to improve the performance of text-dependent speaker verification. The i-vector and the L-vector are used to represent the utterances for enrollment and test. An improved cosine distance kernel is constructed by combining the i-vector and the L-vector, and is used to distinguish both speaker identity and lexical (or text) diversity with a back-end support vector machine (SVM). Experiments are conducted on the RSR2015 corpus part 1 and part 2; the results indicate that up to 30% improvement can be obtained compared with the traditional i-vector baseline.
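(A minimal sketch of one way such combined scoring could look, assuming the improved cosine distance kernel is a weighted sum of the i-vector and L-vector cosine similarities; the dimensions, the weight alpha, and the random stand-in data are illustrative only.)

    import numpy as np
    from sklearn.svm import SVC

    def cosine(a, b):
        # pairwise cosine similarity between row-vector sets a (n, d) and b (m, d)
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T

    def combined_kernel(iv_a, lv_a, iv_b, lv_b, alpha=0.5):
        # combine the i-vector and L-vector similarities into a single SVM kernel
        return alpha * cosine(iv_a, iv_b) + (1.0 - alpha) * cosine(lv_a, lv_b)

    rng = np.random.default_rng(0)
    iv_tr, lv_tr = rng.normal(size=(100, 400)), rng.random(size=(100, 50))   # training i-/L-vectors
    y_tr = rng.integers(0, 2, size=100)                                      # target vs. impostor labels
    iv_te, lv_te = rng.normal(size=(20, 400)), rng.random(size=(20, 50))     # test trials

    svm = SVC(kernel='precomputed')
    svm.fit(combined_kernel(iv_tr, lv_tr, iv_tr, lv_tr), y_tr)
    scores = svm.decision_function(combined_kernel(iv_te, lv_te, iv_tr, lv_tr))
    print(scores.shape)    # one verification score per test trial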
This letter proposes a novel speech enhancement system based on an 'L'-shaped three-microphone array. A modified coherence-based algorithm and first-order differential beamforming are combined to suppress spatially distributed noise. The experimental results reveal that the proposed algorithm achieves strong spatial filtering performance under different noise scenarios.
Lianqiang LI Kangbo SUN Jie ZHU
Knowledge distillation approaches can transfer information from a large network (the teacher network) to a small network (the student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network; these representations can be treated as the essence of the FM, i.e., EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of each individual sample's EFM and the similarity relationships among several samples' EFMs, to enhance the generalization ability of the student network. Compared with previous approaches that employ the FM or handcrafted features derived from the FM, the EFM learned by the autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge enables the student network to mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to state-of-the-art approaches.
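(Illustrative sketch, not the paper's exact losses: one way the two kinds of knowledge could be compared once the teacher and student EFMs are available, using an MSE term for the per-sample EFM and a Gram-matrix term for the similarity relationships within a batch; the equal weighting is an assumption.)

    import numpy as np

    def magnitude_loss(efm_s, efm_t):
        # knowledge 1: match each individual sample's EFM
        return np.mean((efm_s - efm_t) ** 2)

    def similarity_loss(efm_s, efm_t):
        # knowledge 2: match the pairwise similarity relationships among samples
        def gram(e):
            e = e / (np.linalg.norm(e, axis=1, keepdims=True) + 1e-8)
            return e @ e.T
        return np.mean((gram(efm_s) - gram(efm_t)) ** 2)

    # EFMs would come from the two autoencoders' bottlenecks; random stand-ins here
    rng = np.random.default_rng(0)
    efm_t = rng.normal(size=(16, 64))                   # teacher EFMs for one batch
    efm_s = efm_t + 0.1 * rng.normal(size=(16, 64))     # imperfect student EFMs
    print(magnitude_loss(efm_s, efm_t) + similarity_loss(efm_s, efm_t))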
Linjie ZHU Bin WU Zhiwei WEI Yu TANG
In this letter, a novel frame aggregation scheduler is proposed to solve the head-of-line blocking problem for real-time user datagram protocol (UDP) traffic in error-prone, aggregation-enabled wireless local area networks (WLANs). The key idea of the proposed scheduler is to break the restriction of in-order delivery over the WLAN. The simulation results show that the proposed scheduler achieves higher UDP goodput and lower delay than the conventional scheduler.
Jie ZHU Yuan ZONG Hongli CHANG Li ZHAO Chuangao TANG
Unsupervised domain adaptation (DA) is a challenging machine learning problem, since the labeled training (source) and unlabeled testing (target) sets belong to different domains and hence have different feature distributions; it has recently attracted wide attention in micro-expression recognition (MER). Although some well-performing unsupervised DA methods have been proposed, they cannot adequately solve the problem of unsupervised DA in MER, a.k.a. cross-domain MER. To deal with this challenging problem, in this letter we propose a novel unsupervised DA method called Joint Patch weighting and Moment Matching (JPMM). JPMM bridges the source and target micro-expression feature sets by minimizing their probability distribution divergence with a multi-order moment matching operation. Meanwhile, it exploits the contributive facial patches through weight learning, so that a domain-invariant feature representation that preserves micro-expression-discriminative information can be learned. Finally, extensive experiments show that the proposed JPMM method is superior to recent state-of-the-art unsupervised DA methods in dealing with cross-domain MER.
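(A minimal sketch under stated assumptions: the patch weighting is taken as a per-patch scaling of the features, and the multi-order moment matching is written as the squared difference of the first few raw moments of the weighted source and target features; JPMM's actual objective and optimization are more involved.)

    import numpy as np

    def weighted_moment_matching(src, tgt, w, orders=(1, 2, 3)):
        # src: (Ns, P, d) source features over P facial patches; tgt: (Nt, P, d) target features
        # w:   (P,) non-negative patch weights (larger weight = more contributive patch)
        s = (src * w[None, :, None]).reshape(src.shape[0], -1)
        t = (tgt * w[None, :, None]).reshape(tgt.shape[0], -1)
        loss = 0.0
        for k in orders:                                 # match moments of several orders
            loss += np.sum(((s ** k).mean(axis=0) - (t ** k).mean(axis=0)) ** 2)
        return loss

    rng = np.random.default_rng(0)
    src = rng.normal(loc=0.5, size=(60, 8, 16))          # labeled source micro-expression features
    tgt = rng.normal(loc=0.0, size=(40, 8, 16))          # unlabeled target features
    w = np.full(8, 1 / 8)                                # uniform patch weights as an initialization
    print(weighted_moment_matching(src, tgt, w))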
Lianqiang LI Jie ZHU Ming-Ting SUN
Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into subgraphs, which is equivalent to clustering the similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify redundant filters heuristically, the proposed method selects the pruning candidates more reasonably and precisely. Experimental results also show that the proposed pruning method achieves significant improvements over the state-of-the-art methods.
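(Illustrative sketch of the grouping step only, under the assumption that filter similarity is measured by the cosine similarity of the flattened filter weights; the paper's graph construction and the retraining step are not reproduced.)

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def prune_filters_spectral(filters, n_keep):
        # filters: (F, C, k, k) weight tensor of one conv layer; n_keep: groups (= filters kept)
        flat = filters.reshape(filters.shape[0], -1)
        flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
        affinity = (flat @ flat.T + 1.0) / 2.0           # cosine similarity mapped to [0, 1]
        labels = SpectralClustering(n_clusters=n_keep, affinity='precomputed',
                                    random_state=0).fit_predict(affinity)
        keep = []
        for c in range(n_keep):
            members = np.where(labels == c)[0]
            center = flat[members].mean(axis=0)
            # keep the member closest to the group centre as the group's representative
            keep.append(members[np.argmin(np.linalg.norm(flat[members] - center, axis=1))])
        return sorted(keep)

    rng = np.random.default_rng(0)
    layer = rng.normal(size=(64, 32, 3, 3))              # stand-in pre-trained conv layer
    print(prune_filters_spectral(layer, n_keep=16))      # indices of the retained filters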
Junjie ZHU Wai Choong WONG Liyanage C. DESILVA
In this paper, the performance of an OFDM-CDMA system in mobile communication channels with different detection algorithms is evaluated in terms of the bit error probability as a function of the processing gain. Both frequency diversity and multi-user interference (MUI) are taken into account. The simulation results show that MUI is the dominant factor affecting system performance, and that the minimum mean square error (MMSE) detector outperforms the other algorithms. Based on the simulation results, a modified system scheme is proposed which performs better than the conventional method.
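(A toy sketch of the linear MMSE detector against a matched-filter baseline for a synchronous model r = Hs + n; the actual OFDM-CDMA channel, spreading, and diversity combining studied in the paper are not modeled here.)

    import numpy as np

    def mmse_detect(r, H, noise_var):
        # linear MMSE multi-user detection: s_hat = (H^H H + sigma^2 I)^(-1) H^H r
        K = H.shape[1]
        W = np.linalg.inv(H.conj().T @ H + noise_var * np.eye(K)) @ H.conj().T
        return W @ r

    rng = np.random.default_rng(0)
    N, K, noise_var = 32, 8, 0.1
    H = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)     # unit-energy spreading signatures
    s = rng.choice([-1.0, 1.0], size=K)                       # BPSK symbols of K users
    r = H @ s + np.sqrt(noise_var) * rng.normal(size=N)       # received signal

    s_mmse = np.sign(mmse_detect(r, H, noise_var))
    s_mf = np.sign(H.conj().T @ r)                            # matched-filter baseline
    print("MMSE errors:", np.sum(s_mmse != s), " MF errors:", np.sum(s_mf != s))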
Local discriminative regions play important roles in fine-grained image analysis tasks. How to locate local discriminative regions with only category labels, and how to learn discriminative representations from these regions, have been active research topics. In our work, we propose the Searching Discriminative Regions (SDR) and Learning Discriminative Regions (LDR) methods to search for and learn from local discriminative regions in images. The SDR method adopts an attention mechanism to iteratively search for high-response regions in images and uses them as clues to locate local discriminative regions. Moreover, the LDR method is proposed to learn representations that are compact within categories and sparse between categories from the raw image and the local images. Experimental results show that the proposed approach achieves excellent performance in both fine-grained image retrieval and classification tasks, which demonstrates its effectiveness.
Zhixian MA Jie ZHU Weitian LI Haiguang XU
Detection of cavities in X-ray astronomical images has become a field of interest owing to the flourishing studies of black holes and active galactic nuclei (AGN). In this paper, an approach is proposed to detect cavities in X-ray astronomical images using our newly designed Granular Convolutional Neural Network (GCNN) based classifiers. The raw data are first preprocessed to obtain images of the observed objects, i.e., galaxies or galaxy clusters. In each image, pixels are classified into three categories: (1) the faint background (BKG), (2) the cavity regions (CAV), and (3) the bright central gas regions (CNT). Sample sets are then generated by dividing large images into subimages with a window size matched to the scale of the cavities. Since the BKG samples far outnumber the other types, samples from the majority class are split into subsets, i.e., granules, to obtain balanced training sets. Then a group of three-convolutional-layer granular CNN networks without subsampling layers is designed as the classifiers and trained with the labeled granular sample sets. Finally, the trained GCNN classifiers are applied to new observations to estimate the cavity regions with a voting strategy and to locate them with elliptical profiles on the raw observation images. Experiments and applications of our approach are demonstrated on 40 X-ray astronomical observations retrieved from the Chandra Data Archive (CDA). Comparisons among our approach, β-model fitting, and the Unsharp Masking (UM) method were also performed, which show that our approach is more accurate and robust.
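(Illustrative sketch of the granule construction and the voting step only; the GCNN classifiers themselves, the preprocessing, and the elliptical-profile fitting are not shown, and the class counts are toy values.)

    import numpy as np

    def make_granules(labels, majority_label, n_granules, rng):
        # split the majority (BKG) class into granules and pair each granule with
        # all minority-class samples, yielding roughly balanced training subsets
        maj = rng.permutation(np.where(labels == majority_label)[0])
        minority = np.where(labels != majority_label)[0]
        return [np.concatenate([g, minority]) for g in np.array_split(maj, n_granules)]

    def vote(predictions):
        # majority vote over the per-granule classifiers (0 = BKG, 1 = CAV, 2 = CNT)
        predictions = np.asarray(predictions)            # (n_classifiers, n_pixels)
        return np.apply_along_axis(lambda col: np.bincount(col, minlength=3).argmax(),
                                   0, predictions)

    rng = np.random.default_rng(0)
    labels = np.array([0] * 900 + [1] * 60 + [2] * 40)   # heavily imbalanced toy label set
    subsets = make_granules(labels, majority_label=0, n_granules=9, rng=rng)
    print([len(idx) for idx in subsets])                 # each granule-based subset is balanced
    print(vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]]))       # -> [0 1 2]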
This paper introduces a filter-level pruning method based on similar-feature extraction for compressing and accelerating convolutional neural networks using the k-means++ algorithm. In contrast to other pruning methods, the proposed method analyzes the similarities in the features recognized by different filters, rather than evaluating the importance of filters, to prune the redundant ones; this strategy is more reasonable and effective. Furthermore, our method does not produce an unstructured network; consequently, it needs no extra sparse representation and can be efficiently supported by off-the-shelf deep learning libraries. Experimental results show that our filter pruning method reduces the number of parameters and the computational cost of LeNet-5 by a factor of 17.9× with only 0.3% accuracy loss.
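(Illustrative sketch, not the paper's full pipeline: filters are described by the feature maps they produce on a probe batch, clustered with k-means++, and one representative per cluster is kept; the descriptor choice and the stand-in activations are assumptions.)

    import numpy as np
    from sklearn.cluster import KMeans

    def prune_by_feature_similarity(feature_maps, n_keep):
        # feature_maps: (N, F, H, W) activations of one conv layer on N probe images
        # n_keep:       number of clusters (= number of filters kept)
        N, F, H, W = feature_maps.shape
        # describe each filter by its response pattern over the probe batch
        descriptors = feature_maps.transpose(1, 0, 2, 3).reshape(F, -1)
        km = KMeans(n_clusters=n_keep, init='k-means++', n_init=10, random_state=0)
        labels = km.fit_predict(descriptors)
        keep = []
        for c in range(n_keep):
            members = np.where(labels == c)[0]
            d = np.linalg.norm(descriptors[members] - km.cluster_centers_[c], axis=1)
            keep.append(members[np.argmin(d)])           # cluster representative
        return sorted(keep)

    rng = np.random.default_rng(0)
    fmaps = rng.normal(size=(8, 16, 12, 12))             # stand-in activations of a small conv layer
    print(prune_by_feature_similarity(fmaps, n_keep=6))  # indices of the retained filters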
Linjie ZHU Liang GU Rongliang CHEN
A novel retransmission scheme, considering both transmission rate and frame error rate, is proposed to alleviate the inefficiencies caused by head-of-line blocking and null padding problems during retransmission in IEEE 802.11be synchronous multi-link wireless local area networks. Simulation results show that the proposed scheme improves throughput by up to 200% over the legacy scheme by reallocating lost subframes and adding effective duplicate subframes to multiple links.
To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Furthermore, a mask such as the Wiener gain can be estimated directly, or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning into DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation of the main task. Domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weighting of the loss terms, homoscedastic uncertainty is introduced to learn the weights adaptively, which is shown to outperform fixed weighting. Simulation results show that the proposed multi-task scheme improves the overall speech enhancement performance compared to conventional single-task methods, and that joint direct mask and SPP estimation yields the best performance among all the considered techniques.
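(A short PyTorch sketch of the adaptive weighting idea, using a commonly used simplified form of the homoscedastic-uncertainty loss, L = sum_i exp(-s_i) * L_i + s_i with learnable s_i = log sigma_i^2; the network, the exact loss terms, and the feature dimensions are placeholders, not the paper's setup.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UncertaintyWeightedLoss(nn.Module):
        # learn one log-variance per task and weight each task loss by exp(-log_var)
        def __init__(self, n_tasks=2):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_tasks))

        def forward(self, task_losses):
            total = 0.0
            for i, loss in enumerate(task_losses):
                total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
            return total

    # toy usage: mask estimation (main task) and SPP estimation (auxiliary task)
    mask_loss = F.mse_loss(torch.rand(4, 257), torch.rand(4, 257))
    spp_loss = F.binary_cross_entropy(torch.rand(4, 257), torch.randint(0, 2, (4, 257)).float())
    criterion = UncertaintyWeightedLoss(n_tasks=2)
    total_loss = criterion([mask_loss, spp_loss])
    total_loss.backward()          # gradients also flow into the learnable weights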
Hanxu YOU Zhixian MA Wei LI Jie ZHU
Traditional speech enhancement (SE) algorithms usually exhibit fluctuating performance when dealing with different types of noisy speech signals. In this paper, we propose a multi-task Bayesian compressive sensing based speech enhancement (MT-BCS-SE) algorithm that achieves not only performance comparable to traditional SE algorithms but also greater stability. The MT-BCS-SE algorithm exploits the dependence among compressive sensing (CS) measurements and the sparsity of speech signals to perform SE. To obtain sufficiently sparse representations of speech signals, we adopt an overcomplete dictionary, and the K-SVD algorithm is employed to learn various overcomplete dictionaries. The influence of the overcomplete dictionary on the MT-BCS-SE algorithm is evaluated through extensive experiments, so that the most suitable dictionary can be adopted for the best performance. Experiments were conducted on the well-known NOIZEUS corpus to evaluate the performance of the proposed algorithm. On the NOIZEUS corpus, MT-BCS-SE is shown to be competitive with, or even superior to, traditional SE algorithms such as optimally-modified log-spectral amplitude (OMLSA), multi-band spectral subtraction (SSMul), and minimum mean square error (MMSE) estimation in terms of signal-to-noise ratio (SNR), speech enhancement gain (SEG), and perceptual evaluation of speech quality (PESQ), and to have better stability than traditional SE algorithms.
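(Illustrative sketch of the overcomplete-dictionary / sparse-representation step only, using scikit-learn's DictionaryLearning as a stand-in for K-SVD and synthetic sinusoidal frames as a stand-in for speech; the Bayesian multi-task recovery itself is not shown.)

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 64)
    freqs = rng.uniform(2, 8, size=(500, 1))
    phases = rng.uniform(0, 2 * np.pi, size=(500, 1))
    frames = np.sin(2 * np.pi * freqs * t + phases) + 0.05 * rng.normal(size=(500, 64))

    # learn an overcomplete dictionary (128 atoms for 64-sample frames)
    dico = DictionaryLearning(n_components=128, transform_algorithm='omp',
                              transform_n_nonzero_coefs=8, max_iter=20, random_state=0)
    dico.fit(frames)

    # sparse-code a new frame with OMP and reconstruct it from a few atoms
    test_frame = np.sin(2 * np.pi * 5.3 * t + 0.7)[None, :]
    code = dico.transform(test_frame)
    approx = code @ dico.components_
    print("non-zeros:", np.count_nonzero(code),
          "relative error:", np.linalg.norm(approx - test_frame) / np.linalg.norm(test_frame))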