Xu CHENG Nijun LI Tongchi ZHOU Zhenyang WU Lin ZHOU
In this paper, we propose an efficient tracking method that is formulated as a multi-task reverse sparse representation problem. The proposed method learns the representations of all tasks jointly using a customized accelerated proximal gradient (APG) method within several iterations. To reduce the computational complexity, the proposed tracking algorithm starts with a feature selection scheme that chooses a suitable number of features from the object and background in the dynamic environment. Based on the selected features, multiple templates are constructed with a few candidates. The candidate with the highest similarity to the object templates is taken as the final tracking result. In addition, we present a template update scheme to capture the appearance changes of the object. At the same time, we keep several earlier templates in the positive template set unchanged to alleviate the drifting problem. Both qualitative and quantitative evaluations demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art methods.
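The APG step mentioned in this abstract can be illustrated with a minimal single-task sketch. It is not the authors' customized multi-task solver: it assumes a plain L1-regularized least-squares objective, and the names `apg_lasso`, `soft_threshold`, `lam`, and `n_iter` are hypothetical. In a reverse formulation, the columns of `D` would presumably hold candidates and `y` a template, which is also an assumption here.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def apg_lasso(D, y, lam=0.1, n_iter=200):
    """Minimize 0.5 * ||y - D c||^2 + lam * ||c||_1 by accelerated
    proximal gradient (FISTA-style momentum). Illustrative only."""
    # Lipschitz constant of the gradient of the smooth term.
    L = np.linalg.norm(D, 2) ** 2
    c = np.zeros(D.shape[1])
    z = c.copy()          # extrapolated point
    t = 1.0               # momentum parameter
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)
        c_next = soft_threshold(z - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = c_next + ((t - 1.0) / t_next) * (c_next - c)
        c, t = c_next, t_next
    return c
```

With an identity dictionary the minimizer is simply the soft-thresholded input, which gives a quick sanity check of the routine.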
Gee-Sern HSU Hsiao-Chia PENG Ding-Yu LIN Chyi-Yeu LIN
Face recognition across pose is generally tackled by either 2D-based or 3D-based approaches. The 2D-based approaches often require a training set from which the cross-pose multi-view relationship can be learned and applied for recognition. The 3D-based approaches mostly consist of 3D surface reconstruction of each gallery face, synthesis of 2D images of novel views using the reconstructed model, and matching of the synthesized images to the probes. Depth provides crucial information for arbitrary poses, but more methods are yet to be developed. Extending a recent face reconstruction method that uses a single 3D reference model and a frontal registered face, this study focuses on using the reconstructed 3D face for recognition. The recognition performance varies with pose: the closer to frontal, the better. Several ways to improve the performance are attempted, including different numbers of fiducial points for alignment, multiple reference models considered in the reconstruction phase, and both frontal and profile poses available in the gallery. These attempts make this approach competitive with state-of-the-art methods.
Aram KIM Junhee PARK Byung-Uk LEE
In a patch-based super-resolution algorithm, a low-resolution patch is influenced by surrounding patches due to blurring. We propose to remove this boundary effect by subtracting the blur from the surrounding high-resolution patches, which enables more accurate sparse representation. We demonstrate improved performance through experimentation. The proposed algorithm can be applied to most patch-based super-resolution algorithms to achieve additional improvement.
Huaxin XIAO Yu LIU Wei WANG Maojun ZHANG
To cope with the noise in images captured by photoelectric cameras at nighttime, a robust motion detection algorithm based on sparse representation is proposed in this study. A universal dictionary for arbitrary scenes is presented. Experiments on real and synthetic data demonstrate the robustness of the proposed approach.
Ryo AIHARA Ryoichi TAKASHIMA Tetsuya TAKIGUCHI Yasuo ARIKI
This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, in which the same texts are uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires a long computation time to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they have a common weight matrix. By using the basis matrices instead of the exemplars, VC is performed with less computation time than with the exemplar-based method. The effectiveness of this method was confirmed by comparing it, in speaker conversion experiments using noise-added speech data, with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.
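The conversion step described in this abstract (estimate weights against the source basis, then reuse them with the target basis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the KL-divergence multiplicative update, the optional `sparsity` weight, and the names `nmf_activations` and `convert` are assumptions, and the training of the paired basis matrices is not shown.

```python
import numpy as np

def nmf_activations(A, X, n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate non-negative weights H with X ~ A @ H for a fixed basis A,
    using KL-divergence multiplicative updates (assumed cost function)."""
    H = np.ones((A.shape[1], X.shape[1]))
    col_sums = A.sum(axis=0)[:, None]   # A^T 1, one value per basis vector
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)
        H *= (A.T @ ratio) / (col_sums + sparsity + eps)
    return H

def convert(A_src, A_tgt, X_src):
    """Represent source spectra with the source basis, then rebuild the
    spectra from the paired target basis using the same weights."""
    H = nmf_activations(A_src, X_src)
    return A_tgt @ H
```

The columns of `A_src` and `A_tgt` are assumed to be paired, so that column k of each describes the same content spoken by the two speakers.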
Lijian ZHOU Wanquan LIU Zhe-Ming LU Tingyuan NIE
In this Letter, a new face recognition approach based on curvelets and local ternary patterns (LTP) is proposed. First, we observe that the curvelet transform is a new anisotropic multi-resolution transform that can efficiently represent edge discontinuities in face images, and that the LTP operator is one of the best texture descriptors for characterizing face image details. This motivated us to decompose the image using the curvelet transform and extract features in different frequency bands. As revealed by the properties of the curvelet transform, the highest frequency band mainly represents noise, so we exclude it from feature selection. The lowest frequency band mainly contains coarse image information, and thus we process it more precisely, using LTP to extract features as the face's details. The remaining frequency bands mainly represent edge information, and we normalize them to obtain explicit structure information. Then, all the extracted features are put together as the elementary feature set. We reduce the dimension of these features using PCA, and then use the sparse sensing technique for face recognition. Experiments on the Yale database, the extended Yale B database, and the CMU PIE database show the effectiveness of the proposed method.
Regularized forward selection is viewed as a method for obtaining a sparse representation in a nonparametric regression problem. In regularized forward selection, the regression output is represented by a weighted sum of several significant basis functions that are selected from among a large number of candidates by using a greedy training procedure in terms of a regularized cost function and applying an appropriate model selection method. In this paper, we propose a model selection method for regularized forward selection. For this purpose, we focus on the reduction of the cost function that is brought about by appending a new basis function in the greedy training procedure. We first clarify a bias-variance decomposition of the cost reduction and then derive a probabilistic upper bound for the variance of the cost reduction under some conditions. The derived upper bound reflects an essential feature of the greedy training procedure; i.e., it selects the basis function that maximally reduces the cost function. We then propose a thresholding method for determining significant basis functions by applying the derived upper bound as a threshold level and effectively combining it with the leave-one-out cross validation method. Several numerical experiments show that the generalization performance of the proposed method is comparable to that of the other methods, while the number of basis functions selected by the proposed method is much smaller than that selected by the other methods. We can therefore say that the proposed method is able to yield a sparse representation while keeping a relatively good generalization performance. Moreover, our method has the advantage that it does not require selecting a regularization parameter.
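The greedy procedure in this abstract can be sketched as below. This is only an illustration under stated assumptions: the cost is taken to be a ridge-regularized squared error, and the stopping rule uses a fixed, hypothetical `threshold` on the cost reduction in place of the probabilistic upper bound and leave-one-out step that the paper derives. The names `regularized_cost` and `forward_select` are likewise hypothetical.

```python
import numpy as np

def regularized_cost(Phi, y, lam):
    """Minimum over w of ||y - Phi w||^2 + lam * ||w||^2 (assumed cost)."""
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
    r = y - Phi @ w
    return r @ r + lam * (w @ w)

def forward_select(Phi_all, y, lam=1e-3, threshold=1e-6):
    """Greedily append the candidate basis function (column of Phi_all)
    that reduces the regularized cost the most; stop when the reduction
    falls below the threshold."""
    selected = []
    cost = y @ y                      # cost of the empty model
    while len(selected) < Phi_all.shape[1]:
        best_j, best_cost = None, cost
        for j in range(Phi_all.shape[1]):
            if j in selected:
                continue
            c = regularized_cost(Phi_all[:, selected + [j]], y, lam)
            if c < best_cost:
                best_j, best_cost = j, c
        if best_j is None or cost - best_cost < threshold:
            break
        selected.append(best_j)
        cost = best_cost
    return selected
```

Each pass refits the model for every remaining candidate, which is simple but costly; practical implementations usually update the fit incrementally.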
Koji INOUE Kohei ISECHI Hironobu SAITO Yoshimitsu KUROKI
This paper proposes an inter-prediction method for the upcoming video coding standard named HEVC (High Efficiency Video Coding). HEVC offers an inter-prediction framework called local intensity compensation, which represents a current block by a linear combination of some reference blocks. The proposed method calculates the weight coefficients of the linear combination by using sparse representation. Experimental results show that the proposed method increases prediction accuracy in comparison with other methods.
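As a rough illustration of computing sparse weights for a linear combination of reference blocks, the sketch below uses orthogonal matching pursuit (OMP). The abstract does not say which sparse solver is used, so OMP is a swapped-in assumption, as are the function name `omp` and the layout in which each column of `D` is one flattened reference block.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y by a combination of at
    most n_nonzero columns of D (each column = one flattened reference
    block in this hypothetical setup)."""
    support = []
    coef = np.zeros(0)
    residual = y.astype(float)
    for _ in range(n_nonzero):
        # Pick the reference block most correlated with the residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit the weights on all selected blocks by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    w = np.zeros(D.shape[1])
    if support:
        w[support] = coef
    return w
```

The predicted block is then `D @ w`, with only a few reference blocks receiving non-zero weight.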
For face recognition with a single training image per person, Collaborative Representation based Classification (CRC) has significantly less complexity than Extended Sparse Representation based Classification (ESRC). However, CRC gets lower recognition rates than ESRC. In order to combine the advantages of CRC and ESRC, we propose Extended Collaborative Representation based Classification (ECRC) for face recognition with a single training image per person. ECRC constructs an auxiliary intraclass variant dictionary to represent the possible variation between the testing and training images. Experimental results show that ECRC outperforms the compared methods in terms of both high recognition rates and low computation complexity.
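A collaborative representation classifier with an auxiliary variation dictionary can be sketched as follows. This is a minimal reading of the abstract, not the authors' code: the ridge-regularized coding, the plain residual rule, and the names `ecrc_classify`, `gallery`, `variants`, and `lam` are assumptions.

```python
import numpy as np

def ecrc_classify(gallery, labels, variants, y, lam=0.01):
    """Code y over [gallery, variants] with an L2 (ridge) penalty, then
    pick the class whose gallery columns, together with the shared
    variation term, reconstruct y with the smallest residual."""
    labels = np.asarray(labels)
    B = np.hstack([gallery, variants])
    # Closed-form ridge solution: (B^T B + lam I)^-1 B^T y
    coef = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
    n_gallery = gallery.shape[1]
    a, b = coef[:n_gallery], coef[n_gallery:]
    shared = variants @ b             # intraclass variation, shared by all classes
    best_label, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        res = np.linalg.norm(y - gallery[:, mask] @ a[mask] - shared)
        if res < best_res:
            best_label, best_res = c, res
    return best_label
```

Because the coding step has a closed form, its cost is one linear solve, which is consistent with the low complexity attributed to collaborative representation in the abstract.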
Hai YANG Yunfei XU Qinwei ZHAO Ruohua ZHOU Yonghong YAN
Sparse representation has been studied within the field of signal processing as a means of providing a compact form of signal representation. This paper introduces a sparse representation based framework named Sparse Probabilistic Linear Discriminant Analysis for speaker recognition. In this latent variable model, probabilistic linear discriminant analysis is modified to obtain an algorithm for learning overcomplete sparse representations by replacing the Gaussian prior on the factors with a Laplace prior that encourages sparseness. For a given speaker signal, the dictionary obtained from this model has good representational power while supporting optimal discrimination of the classes. An expectation-maximization algorithm is derived to train the model with a variational approximation to a range of heavy-tailed distributions whose limit is the Laplace distribution. The variational approximation is also used to compute the likelihood ratio score of all speaker trials. This approach performed well on the core-extended conditions of the NIST 2010 Speaker Recognition Evaluation, and is competitive with Gaussian Probabilistic Linear Discriminant Analysis in terms of normalized Decision Cost Function and Equal Error Rate.
In image classification applications, a test sample with multiple handcrafted descriptions can be sparsely represented by a few training subjects. Our paper is motivated by the success of multi-task joint sparse representation (MTJSR), and considers that the different feature modalities are subject not only to the constraint of joint sparsity across different tasks, but also to the constraint of local manifold structure across different features. We introduce the constraint of local manifold structure into the MTJSR framework, and propose the locality-constrained multi-task joint sparse representation method (LC-MTJSR). During the optimization of the formulated objective, the stochastic gradient descent method is used to guarantee a fast convergence rate, which is essential for large-scale image categorization. Experiments on several challenging object classification datasets show that our proposed algorithm is better than MTJSR, and is competitive with state-of-the-art multiple kernel learning methods.
Hyunduk KIM Sang-Heon LEE Myoung-Kyu SOHN Dong-Ju KIM Byungmin KIM
Super resolution (SR) reconstruction is the process of fusing a sequence of low-resolution images into one high-resolution image. Many researchers have introduced various SR reconstruction methods. However, these traditional methods are limited in the extent to which they allow recovery of high-frequency information. Moreover, due to the self-similarity of face images, most facial SR algorithms are machine learning based. In this paper, we introduce a facial SR algorithm that combines learning-based and regularized SR image reconstruction algorithms. Our approach involves two main ideas. First, we employ separated frequency components to reconstruct high-resolution images. In addition, we separate the region of the training face image. These approaches can help to recover high-frequency information. In our experiments, we demonstrate the effectiveness of these ideas.
This paper presents a method for learning an overcomplete, nonnegative dictionary and for obtaining the corresponding coefficients so that a group of nonnegative signals can be sparsely represented by them. This is accomplished by posing the learning as a problem of nonnegative matrix factorization (NMF) with maximization of the incoherence of the dictionary and of the sparsity of the coefficients. By incorporating a dictionary-incoherence penalty and a sparsity penalty in the NMF formulation and then adopting a hierarchically alternating optimization strategy, we show that the problem can be cast as two sequential optimization problems of quadratic functions. Each optimization problem can be solved explicitly so that the whole problem can be efficiently solved, which leads to the proposed algorithm, i.e., sparse hierarchical alternating least squares (SHALS). The SHALS algorithm is structured by iteratively solving the two optimization problems, corresponding to the learning process of the dictionary and to the estimating process of the coefficients for reconstructing the signals. Numerical experiments demonstrate that the new algorithm performs better than the nonnegative K-SVD (NN-KSVD) algorithm and several other well-known algorithms, and its computational cost is remarkably lower than that of the compared algorithms.
This paper presents a novel scale-rotation invariant generative model (SRIGM) and a kernel sparse representation classification (KSRC) method for scene categorization. Recently, sparse representation classification (SRC) methods have been highly successful in a number of image processing tasks. Despite its popularity, the SRC framework lacks the ability to handle multi-class data with high inter-class similarity or high intra-class variation. The kernel random coordinate descent (KRCD) algorithm is proposed for
Ruicong ZHI Qiuqi RUAN Zhifei WANG
A facial-component-based facial expression recognition algorithm with a sparse representation classifier is proposed. The sparse representation classifier is based on sparse representation and is computed by solving an L1-norm minimization problem on facial components. The features of “important” training samples are selected to represent the test sample. Furthermore, a fuzzy integral is utilized to fuse the individual classifiers for the facial components. Experiments on frontal views and partially occluded facial images show that this method is efficient and robust to partial occlusion of facial images.
Adel ZAHEDI Mohammad-Hossein KAHAEI
A flexible and computationally efficient method for spectral analysis of sinusoidal signals using Basis Pursuit De-Noising (BPDN) is proposed. This method estimates a slotted Auto-Correlation Function (ACF) and computes the spectrum as the sparse representation of the ACF in a dictionary of cosine functions. Simulation results illustrate the flexibility and effectiveness of the proposed method.
Sparse representation based classification (SRC) has emerged as a new paradigm for solving face recognition problems. Further research found that the main limitation of SRC is the assumption of pixel-accurate alignment between the test image and the training set. A. Wagner used a series of linear programs that iteratively minimize the sparsity of the registration error. In this paper, we propose another face registration method called the three-point positioning method. Experiments show that our proposed method achieves better performance.
Makoto NAKASHIZUKA Hiroyuki OKUMURA Youji IIGUNI
In this paper, we propose a method for supervised single-channel speech separation through sparse decomposition using periodic signal models. The proposed separation method employs sparse decomposition, which decomposes a signal into a set of periodic signals under a sparsity penalty. In order to achieve separation through sparse decomposition, the decomposed periodic signals have to be assigned to the corresponding sources. For this assignment, we introduce clustering using a K-means algorithm to group the decomposed periodic signals into as many clusters as the number of speakers. After the clustering, each cluster is assigned to its corresponding speaker using preliminarily learnt codebooks. Through separation experiments, we compare our method with MaxVQ, which performs separation in the frequency spectrum domain. The experimental results in terms of signal-to-distortion ratio show that the proposed sparse decomposition method is comparable to the frequency domain approach and has a lower computational cost for the assignment of speech components.
Masaaki NAGAHARA Takahiro MATSUDA Kazunori HAYASHI
In remote control, efficient compression or representation of control signals is essential to send them through rate-limited channels. For this purpose, we propose an approach of sparse control signal representation using the compressive sampling technique. The problem of obtaining sparse representation is formulated by cardinality-constrained
This study proposes a method to decompose a signal into a set of periodic signals. The proposed decomposition method imposes a penalty on the resultant periodic subsignals in order to improve the sparsity of decomposition and avoid the overestimation of periods. This penalty is defined as the weighted sum of the l2 norms of the resultant periodic subsignals. This decomposition is approximated by an unconstrained minimization problem. In order to solve this problem, a relaxation algorithm is applied. In the experiments, decomposition results are presented to demonstrate the simultaneous detection of periods and waveforms hidden in signal mixtures.
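The penalty described in this abstract, a weighted sum of the l2 norms of the periodic subsignals, can be written in a few lines. The sketch also shows block shrinkage, the proximal operator of a single l2-norm term, to illustrate why such a penalty drives whole subsignals to zero. The block shrinkage is an illustrative stand-in and not the relaxation algorithm used in the study; the names `period_penalty` and `block_shrink` are hypothetical.

```python
import numpy as np

def period_penalty(subsignals, weights):
    """Weighted sum of the l2 norms of the periodic subsignals."""
    return sum(w * np.linalg.norm(f) for f, w in zip(subsignals, weights))

def block_shrink(f, t):
    """Proximal operator of t * ||f||_2: scale f toward zero, and set the
    whole subsignal to zero when its norm does not exceed t."""
    norm = np.linalg.norm(f)
    if norm <= t:
        return np.zeros_like(f)
    return f * (1.0 - t / norm)
```

A subsignal with small energy is removed entirely rather than merely attenuated, so larger weights on some periods make those periods less likely to appear in the decomposition.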