Hsiu-Chih LEE Shyh-Cheng LEE Yi-Pin LIN Cheng-Kuang LIU
A low-operating-voltage, low-power light-emitting device based on the Si CMOS process is presented. Its power transfer efficiency is one to two orders of magnitude higher than previously reported values, and it can also be used as a high-efficiency photodiode. Configurations that use the same structure as both the light emitter and the optical receiver, together with a simple modulation instrument, are then proposed for chip-to-chip optical alignment and signal transmission. The emitter-receiver circuits require only a single power supply and are compatible with other integrated circuits made by the CMOS process.
Yang LI Zhuang MIAO Ming HE Yafei ZHANG Hang LI
How to represent images as highly compact binary codes is a critical issue in many computer vision tasks. Existing deep hashing methods typically focus on designing loss functions that use pairwise or triplet labels. However, these methods ignore the attention mechanism of the human visual system. In this letter, we propose a novel Deep Attention Residual Hashing (DARH) method, which directly learns hash codes based on a simple pointwise classification loss function. Compared to previous methods, our method does not need to generate all possible pairwise or triplet labels from the training dataset. Specifically, we develop a new type of attention layer that can learn human eye fixation and significantly improves the representation ability of hash codes. In addition, we embed the attention layer into the residual network to simultaneously learn discriminative image features and hash codes in an end-to-end manner. Extensive experiments on standard benchmarks demonstrate that our method preserves instance-level similarity and outperforms state-of-the-art deep hashing methods in image retrieval.
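As a rough illustration of why compact binary codes are convenient for retrieval, the sketch below ranks database images by Hamming distance to a query code. It is not the DARH network: the function names `binarize` and `hamming_rank`, the zero threshold, and the 4-bit example values are all assumptions made for this example.

```python
import numpy as np

def binarize(features):
    """Turn real-valued outputs into 0/1 hash codes by thresholding at zero (assumed rule)."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances

# Hypothetical 4-bit outputs for three database images and one query image.
database = binarize([[0.9, -0.2, 0.4, -0.7],
                     [-0.5, 0.3, 0.8, -0.1],
                     [0.7, -0.6, 0.2, 0.5]])
query = binarize([0.8, -0.1, 0.3, -0.9])

order, distances = hamming_rank(query, database)
print(order, distances)
```

Because each comparison is a bitwise mismatch count, ranking stays cheap even for large databases, which is the practical motivation for learning short codes.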
Lianqiang LI Kangbo SUN Jie ZHU
Knowledge distillation approaches can transfer information from a large network (teacher network) to a small network (student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network. These representations can be treated as the essence of the FM, i.e., EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of an individual sample's EFM and the similarity relationships among several samples' EFM, to enhance the generalization ability of the student network. Compared with previous approaches that employ FM or handcrafted features from FM, the EFM learned from autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge allows the student network to mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to state-of-the-art approaches.
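One possible reading of the two kinds of knowledge named above can be sketched as a loss with two terms: one comparing each sample's EFM directly, and one comparing the pairwise similarity structure within a batch. The abstract does not give the actual loss, so the squared-error form, the cosine similarity, the `weight` parameter, and the function names are assumptions for illustration only.

```python
import numpy as np

def pairwise_cosine(efm):
    """Cosine similarity between every pair of rows (one row per sample)."""
    normed = efm / np.linalg.norm(efm, axis=1, keepdims=True)
    return normed @ normed.T

def multi_knowledge_loss(teacher_efm, student_efm, weight=1.0):
    """Assumed loss: per-sample EFM mismatch plus mismatch of batch similarity matrices."""
    magnitude_term = np.mean((teacher_efm - student_efm) ** 2)
    similarity_term = np.mean(
        (pairwise_cosine(teacher_efm) - pairwise_cosine(student_efm)) ** 2
    )
    return float(magnitude_term + weight * similarity_term)

# Hypothetical EFMs for a batch of 4 samples with 8 features each.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
student = teacher + 0.1 * rng.normal(size=(4, 8))

loss_same = multi_knowledge_loss(teacher, teacher)
loss_diff = multi_knowledge_loss(teacher, student)
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's EFM exactly and grows as either the individual representations or their mutual similarities drift apart.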
Yi ZHOU Yusheng JI Weidong XIANG Sateesh ADDEPALLI Aihuang GUO Fuqiang LIU
To accurately evaluate and manage future distributed wireless networks, it is indispensable to fully understand cooperative propagation channels. In this contribution, we propose cascaded multi-keyhole channel models for analyzing cooperative diversity wireless communications. The cascaded Wishart distribution is adopted to investigate the eigenvalue distribution of the multi-keyhole MIMO (multiple input multiple output) channel matrix, and the capacity performance is also presented for the wireless systems over such channels. A diversity order approximation method is proposed for better evaluating the eigenvalue and capacity distributions. The good match of analytical derivations and numerical simulations validates the proposed models and analysis methods. The proposed models can provide an important reference for the optimization and management of cooperative diversity wireless networks.
Meiming FU Qingyang LIU Jiayi LIU Xiang WANG Hongyan YANG
Network virtualization has become a promising paradigm for supporting diverse vertical services in Software Defined Networks (SDNs). Each vertical service is carried by a virtual network (VN), which normally has a chaining structure. In this way, a Service Function Chain (SFC) is composed of an ordered set of virtual network functions (VNFs) to provide tailored network services. Such new programmable flexibility in future networks also brings new network management challenges: how to collect and analyze network measurement data, and further predict and diagnose the performance of SFCs? This is a fundamental problem for the management of SFCs, because VNFs could be migrated when SFC performance degrades in order to avoid Service Level Agreement (SLA) violations. Despite the importance of the problem, SFC performance analysis has not attracted much research attention in the literature. In this paper, enabled by a novel fine-grained network debugging technology, In-band Network Telemetry (INT), we propose a learning-based framework for early SFC fault prediction and diagnosis. Based on the SFC traffic flow measurement data provided by INT, the framework first extracts SFC performance features. Then, Long Short-Term Memory (LSTM) networks are used to predict the values of these features in the next time slot. Finally, a Support Vector Machine (SVM) is used as the network fault classifier to predict possible SFC faults. We also discuss the practical relevance of the proposed framework and conduct a set of network emulations to validate its performance.
In this letter, we propose a practical sensing-based opportunistic spectrum sharing scheme for cognitive radio (CR) downlink MIMO systems. Multiple antennas are exploited at the secondary transmitter to opportunistically access the primary spectrum and to balance secondary throughput maximization against mitigation of the interference that may be caused to the primary radio link. We first introduce a brief secondary frame structure, in which a sensing phase is exploited to estimate the effective interference channel. According to the sensing result, and taking the interference caused by the primary link into account, we propose an enhanced signal-to-leakage-and-noise ratio (SLNR)-based precoding scheme for the secondary transmitter. Compared to conventional schemes that assume perfect knowledge of the channels over which the CR transmitter interferes with the primary receiver (PR), the proposed scheme performs better, as simulation results confirm.
Ying YANG Wenxiang DONG Weiqiang LIU Weidong WANG
Mobility load balancing (MLB) is a key technology for self-organizing networks (SONs). In this paper, we explore the mobility load balancing problem and propose a unified cell specific offset adjusting algorithm (UCSOA), which more accurately adjusts a largely uneven load between neighboring cells and is easily implemented in practice with low computational complexity and signaling overhead. Moreover, we evaluate the UCSOA algorithm under two different traffic conditions and show that it achieves lower call blocking rates and handover failure rates. Furthermore, the interdependency between the performance of the proposed UCSOA algorithm and that of the inter-cell interference coordination (ICIC) algorithm is explored, and a self-organizing soft frequency reuse scheme is proposed. The results demonstrate that the UCSOA and ICIC algorithms reinforce each other and improve network performance in LTE systems.
Jian LU Norihiro UEMI Gang LI Tohru IFUKUBE
In this paper, a digital processing method is described for modifying tone contrast that is defined as the greatest difference in frequencies between peaks and valleys of pitch curves in monosyllable utterances. Under quiet and noisy backgrounds, modified Mandarin tone words were presented to hearing-impaired Chinese listeners with moderate to severe sensorineural hearing loss. The listeners were asked to identify four alternative monosyllable words which were distinguishable by tones 1, 2, 3 and 4 respectively. Employing this method, it was found that modified speech with enhanced tone contrast yielded moderate gains in the percentage of correct identification of the tones when compared to unmodified speech tones with only compression amplification. It was likewise found that reducing tone contrast generally reduced the degree of correct tone identification. These findings therefore offer support to the assertion that a hearing aid with tone modifications is indeed effective for hearing-impaired Chinese.
Zhigang LIU Qi WANG Yongdong TAN
The control and diagnosis networks are the most important parts of a Maglev train. In this paper, the structures of the existing control and diagnosis networks are discussed, and their disadvantages are described and analyzed. Some basic ideas of the role automation decentralized system (RoADS) are then applied to a new network. The structure, components, and application of the new network are proposed, designed, and discussed in detail. The comparison results show that the new network not only embodies some RoADS ideas but also better meets the demands of control and diagnosis networks in Maglev trains.
Wenhao JIANG Wenjiang FENG Shaoxiang GU Yuxiang LIU Zhiming WANG
In this paper, we study the power allocation problem in a relay-assisted multi-band underlay cognitive radio network. Such a network allows unlicensed users (secondary users) to access the spectrum bands under a transmission power constraint. Because the logarithm function is concave and increasing, it is not always wise for secondary users to expend all of their transmission power in one band if their aim is to maximize the achievable data rate. In particular, we study a scenario where two secondary users and a half-duplex relay exist with two available bands. The two users choose different bands for direct data transmission and use the other band for relay transmission. By properly allocating power across the two bands, each user may be able to increase its total achievable data rate while satisfying the power constraint. We formulate the power allocation problem as a non-cooperative game and investigate its Nash equilibria. We prove that the power allocation game is a supermodular game and that Nash equilibria exist. We further find the best response function of the users and propose a best response update algorithm to solve the corresponding dynamic game. Numerical results show that the overall performance in terms of achievable rates is improved by our proposed transmission scheme and power allocation algorithm. The proposed algorithm also shows satisfactory convergence speed.
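The concavity argument in the abstract above can be checked with a toy calculation. The sketch below is not the paper's game or its best response algorithm: it ignores the relay and the interference between users, and the rate model, the gain-to-noise ratios `g1` and `g2`, and the grid search are assumptions made only to show that splitting a fixed power budget across two bands can beat placing it all in one band.

```python
import math

def sum_rate(p1, p2, g1=1.0, g2=1.0):
    """Rate (bits/s/Hz) over two bands; g1, g2 are hypothetical gain-to-noise ratios."""
    return math.log2(1.0 + g1 * p1) + math.log2(1.0 + g2 * p2)

def best_split(total_power, g1=1.0, g2=1.0, steps=1000):
    """Grid-search the power placed on band 1 that maximizes sum_rate."""
    def rate_at(k):
        p1 = total_power * k / steps
        return sum_rate(p1, total_power - p1, g1, g2)
    k_best = max(range(steps + 1), key=rate_at)
    p1 = total_power * k_best / steps
    return p1, total_power - p1

single_band = sum_rate(10.0, 0.0)   # all power in one band
even_split = sum_rate(5.0, 5.0)     # same budget shared equally
p1, p2 = best_split(10.0)           # equal gains: the best split is even
print(single_band, even_split, (p1, p2))
```

With unequal gains the best split shifts toward the stronger band without abandoning the weaker one, for example `best_split(10.0, g1=4.0, g2=1.0)`.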
Wei CHEN Gang LIU Jun GUO Shinichiro OMACHI Masako OMACHI Yujing GUO
In speech recognition, confidence annotation adopts a single confidence feature or a combination of different features for classification. These confidence features are always extracted from decoding information. However, it has been shown that about 30% of the knowledge used in human speech understanding is derived mainly from high-level information. Thus, how to extract a high-level confidence feature that is statistically independent of decoding information is worth researching in speech recognition. In this paper, a novel confidence feature extraction algorithm based on latent topic similarity is proposed. The topic distribution of each word and the topic distribution of its context in a recognition result are first obtained using the latent Dirichlet allocation (LDA) topic model, and the proposed word confidence feature is then extracted by determining the similarity between these two topic distributions. The experiments show that the proposed feature increases the number of information sources for confidence features, provides good complementary information, and can effectively improve the performance of confidence annotation when combined with confidence features from decoding information.
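The idea of scoring a recognized word by how well its topic distribution agrees with that of its context can be sketched as follows. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, as are the function name `topic_similarity` and the three-topic example distributions; in practice the distributions would come from a trained LDA model.

```python
import numpy as np

def topic_similarity(word_topics, context_topics):
    """Cosine similarity between two topic distributions (assumed measure)."""
    w = np.asarray(word_topics, dtype=float)
    c = np.asarray(context_topics, dtype=float)
    return float(w @ c / (np.linalg.norm(w) * np.linalg.norm(c)))

# Hypothetical distributions over three latent topics.
word = [0.7, 0.2, 0.1]
context_match = [0.6, 0.3, 0.1]      # context dominated by the same topic
context_mismatch = [0.1, 0.1, 0.8]   # context dominated by a different topic

score_match = topic_similarity(word, context_match)
score_mismatch = topic_similarity(word, context_mismatch)
print(score_match, score_mismatch)
```

A word whose topics disagree with its context receives a low score, which is the kind of high-level evidence that decoding scores alone do not provide.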
Thinning and line extraction of binary images not only reduce the amount of data to be stored and automatically create the adjacency and relationships between lines and points, but also support applications such as automatic inspection systems, pattern recognition systems, and vectorization. Based on the features of construction drawings, new thinning and line extraction algorithms are proposed in this study. The experimental results show that the proposed method has higher reliability and produces better quality than various existing methods.
Qingqing ZHANG Jielin PAN Yang LIN Jian SHAO Yonghong YAN
In recent decades, there has been a great deal of research into bilingual speech recognition, that is, developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. Two of the main difficulties in building bilingual speech recognition systems for real-world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to deal effectively with matrix-language accents in the embedded language. To process intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a single compact set of bilingual acoustic models derived by phone set merging and clustering is developed instead of two separate monolingual models, one for each language. In our study, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments show that TCM achieves better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciation of English as the embedded language usually contains Mandarin accents. To deal with matrix-language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP).
With the effective incorporation of the phone clustering and non-native adaptation approaches, the Phrase Error Rate (PER) of MESRS for English utterances was reduced by a relative 24.47% compared to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. For bilingual utterances, a relative PER reduction of 22.37% was achieved.
Shinfeng D. LIN Chien-Chuang LIN Shih-Chieh SHIE
MPEG-4 emphasizes coding efficiency and allows for content-based access and transmission of arbitrarily shaped objects. It addresses the encoding of video objects using shape coding, motion estimation, and texture coding for interactivity, high compression ratio, and scalability. In this letter, an advanced object-adaptive vertex-based shape coding method is proposed for encoding the shape of video objects. This method exploits an octant-based representation of the relation between adjacent vertices, and that relation can be used to improve coding efficiency. Simulation results demonstrate that the proposed method can save more bits for closely spaced vertices.
Jun WANG Desheng WANG Yingzhuang LIU
In this paper, we investigate the problem of maximizing the weighted sum outage rate in multiuser multiple-input single-output (MISO) interference channels, where the transmitters know only the statistical information of the channel coefficients, not their exact values. Unfortunately, this problem is nonconvex and very difficult to deal with. We propose a new, provably convergent iterative algorithm in which, in each iteration, the original problem is approximated as a second-order cone program (SOCP) by introducing slack variables and using convex approximation. Simulation results show that the proposed SOCP algorithm converges in a few steps and yields a better performance gain with lower computational complexity than existing algorithms.
Yulong XU Zhuang MIAO Jiabao WANG Yang LI Hang LI Yafei ZHANG Weiguang XU Zhisong PAN
Correlation filter-based approaches achieve competitive results in visual tracking, but traditional correlation tracking methods fail to exploit the color information in videos. To address this issue, we propose a novel tracker that incorporates color features into a correlation filter framework, extracting not only gray but also color information as feature maps to compute the maximum response location via multi-channel correlation filters. In particular, we modify the label function of the conventional classifier to improve positioning accuracy and employ a discriminative correlation filter to handle scale variations. Experiments are performed on 35 challenging benchmark color sequences, and the results clearly show that our method outperforms state-of-the-art tracking approaches while operating in real time.
Stance prediction on social media aims to infer the stances of users towards a specific topic or event when those stances are not expressed explicitly. Extracting and determining users' stances from user-generated content on social media is of great significance for public opinion analysis. Existing research makes use of various signals, ranging from text content to the online network connections of users on these platforms. However, joint modeling of this heterogeneous information for stance prediction is lacking. In this paper, we propose a self-supervised heterogeneous graph contrastive learning framework for stance prediction in online debate forums. First, we perform data augmentation on the original heterogeneous information network to generate an augmented view. The original view and the augmented view are each learned by a meta-path based graph encoder. Then, contrastive learning between the two views is conducted to obtain high-quality representations of users and issues. Finally, stance prediction is accomplished by matrix factorization between users and issues. The experimental results on an online debate forum dataset show that our model significantly outperforms other competitive baseline methods.
Bei HE Guijin WANG Xinggang LIN Chenbo SHI Chunxiao LIU
This paper proposes a high-accuracy sub-pixel registration framework based on phase correlation for noisy images. First, we introduce a denoising module in which an edge-preserving filter is adopted. This strategy not only filters out the noise but also preserves most of the original image signal. A confidence-weighted optimization module is then proposed to fit the linear phase plane discriminately and to estimate sub-pixel shifts. Experiments demonstrate the effectiveness of the combination of our modules and the improvements in accuracy and robustness against noise compared to other sub-pixel phase correlation methods in the Fourier domain.
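For readers unfamiliar with phase correlation, the basic integer-pixel version can be sketched with NumPy's FFT routines. This is only the classical building block, not the framework described above: it includes no denoising and no confidence-weighted phase-plane fit, so it cannot return sub-pixel shifts. The function name `phase_correlation_shift` and the random test image are assumptions for this example.

```python
import numpy as np

def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) circular shift that maps reference onto moved."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalized cross-power spectrum keeps only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    # Its inverse transform peaks at the displacement.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, correlation.shape)
    )

# Hypothetical test image, shifted circularly by 3 rows down and 5 columns left.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
shifted = np.roll(image, shift=(3, -5), axis=(0, 1))

estimated = phase_correlation_shift(image, shifted)
print(estimated)
```

Sub-pixel methods refine this estimate, for instance by fitting a plane to the phase of the cross-power spectrum, which is where noise handling becomes important.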
Ning LI Yan GUO Qi-Hui WU Jin-Long WANG Xue-Liang LIU
A method based on covariance differencing for a uniform linear array is proposed to address the problem of direction finding of narrowband signals in a colored noise environment. By assuming a Hermitian symmetric Toeplitz matrix for the unknown noise, the array covariance matrix is transformed into a centrohermitian matrix in an appropriate way, allowing the noise component to be eliminated. The modified covariance differencing algorithm provides accurate direction of arrival (DOA) estimation when the incident signals are uncorrelated or when just two of the signals are coherent. If there are more than two coherent signals, the presented method can be combined with a spatial smoothing (SS) scheme. Unlike the original method, the new approach dispenses with the need to distinguish the true angles from the phantom angles. Simulation results demonstrate the effectiveness of the presented algorithm.
Zhong ZHANG Hong WANG Shuang LIU Liang ZHENG
Feature representation, as a key component of scene character recognition, has been widely studied, and a number of effective methods have been proposed. In this letter, we propose a novel method named coupled spatial learning (CSL) for scene character representation. Unlike existing methods, the proposed CSL method simultaneously discovers the spatial context in both the dictionary learning and coding stages. Concretely, we propose to build the spatial dictionary by preserving the corresponding positions of the codewords. Correspondingly, we introduce a spatial coding strategy that utilizes spatiality regularization to consider the relationship among features in Euclidean space. Based on the spatial dictionary and spatial coding, the spatial context can be effectively integrated into the visual representations. We verify our method on two widely used databases (ICDAR2003 and Chars74k), and the experimental results demonstrate that our method achieves competitive results compared with state-of-the-art methods. In addition, we further validate the proposed CSL method on the Caltech-101 database for the image classification task, and the experimental results show the good generalization ability of the proposed CSL.