1-5hit |
Topic modeling as a well-known method is widely applied for not only text data mining but also multimedia data analysis such as video data analysis. However, existing models cannot adequately handle time dependency and multimodal data modeling for video data that generally contain image information and speech information. In this paper, we therefore propose a novel topic model, sequential symmetric correspondence hierarchical Dirichlet processes (Seq-Sym-cHDP) extended from sequential conditionally independent hierarchical Dirichlet processes (Seq-CI-HDP) and sequential correspondence hierarchical Dirichlet processes (Seq-cHDP), to improve the multimodal data modeling mechanism via controlling the pivot assignments with a latent variable. An inference scheme for Seq-Sym-cHDP based on a posterior representation sampler is also developed in this work. We finally demonstrate that our model outperforms other baseline models via experiments.
Fei XU Pinxin LIU Jing XU Jianfeng YANG S.M. YIU
Bloom Filter is a bit array (a one-dimensional storage structure) that provides a compact representation for a set of data, which can be used to answer the membership query in an efficient manner with a small number of false positives. It has a lot of applications in many areas. In this paper, we extend the design of Bloom Filter by using a multi-dimensional matrix to replace the one-dimensional structure with three different implementations, namely OFFF, WOFF, FFF. We refer the extended Bloom Filter as Feng Filter. We show the false positive rates of our method. We compare the false positive rate of OFFF with that of the traditional one-dimensional Bloom Filter and show that under certain condition, OFFF has a lower false positive rate. Traditional Bloom Filter can be regarded as a special case of our Feng Filter.
Due to universal frequency reuse, cell edge users in HSDPA suffer from serious inter-cell interference (ICI). In this letter we present a coordinated scheme for HSDPA which can mitigate ICI by interference avoidance in spatial domain. A system level simulation shows that our scheme can effectively improve the performance of the cell edge users.
Video data mining based on topic models as an emerging technique recently has become a very popular research topic. In this paper, we present a novel topic model named sequential correspondence hierarchical Dirichlet processes (Seq-cHDP) to learn the hidden structure within video data. The Seq-cHDP model can be deemed as an extended hierarchical Dirichlet processes (HDP) model containing two important features: one is the time-dependency mechanism that connects neighboring video frames on the basis of a time dependent Markovian assumption, and the other is the correspondence mechanism that provides a solution for dealing with the multimodal data such as the mixture of visual words and speech words extracted from video files. A cascaded Gibbs sampling method is applied for implementing the inference task of Seq-cHDP. We present a comprehensive evaluation for Seq-cHDP through experimentation and finally demonstrate that Seq-cHDP outperforms other baseline models.
Hai YANG Yunfei XU Qinwei ZHAO Ruohua ZHOU Yonghong YAN
Sparse representation has been studied within the field of signal processing as a means of providing a compact form of signal representation. This paper introduces a sparse representation based framework named Sparse Probabilistic Linear Discriminant Analysis in speaker recognition. In this latent variable model, probabilistic linear discriminant analysis is modified to obtain an algorithm for learning overcomplete sparse representations by replacing the Gaussian prior on the factors with Laplace prior that encourages sparseness. For a given speaker signal, the dictionary obtained from this model has good representational power while supporting optimal discrimination of the classes. An expectation-maximization algorithm is derived to train the model with a variational approximation to a range of heavy-tailed distributions whose limit is the Laplace. The variational approximation is also used to compute the likelihood ratio score of all trials of speakers. This approach performed well on the core-extended conditions of the NIST 2010 Speaker Recognition Evaluation, and is competitive compared to the Gaussian Probabilistic Linear Discriminant Analysis, in terms of normalized Decision Cost Function and Equal Error Rate.