1-2hit |
Xueqing ZHANG Xiaoxia LIU Jun GUO Wenlei BAI Daguang GAN
As scientific and technological resources are experiencing information overload, it is quite expensive to find resources that users are interested in exactly. The personalized recommendation system is a good candidate to solve this problem, but data sparseness and the cold starting problem still prevent the application of the recommendation system. Sparse data affects the quality of the similarity measurement and consequently the quality of the recommender system. In this paper, we propose a matrix factorization recommendation algorithm based on similarity calculation(SCMF), which introduces potential similarity relationships to solve the problem of data sparseness. A penalty factor is adopted in the latent item similarity matrix calculation to capture more real relationships furthermore. We compared our approach with other 6 recommendation algorithms and conducted experiments on 5 public data sets. According to the experimental results, the recommendation precision can improve by 2% to 9% versus the traditional best algorithm. As for sparse data sets, the prediction accuracy can also improve by 0.17% to 18%. Besides, our approach was applied to patent resource exploitation provided by the wanfang patents retrieval system. Experimental results show that our method performs better than commonly used algorithms, especially under the cold starting condition.
Xiaoxia LIU Degen HUANG Zhangzhi YIN Fuji REN
Collocation is a ubiquitous phenomenon in languages and accurate collocation recognition and extraction is of great significance to many natural language processing tasks. Collocations can be differentiated from simple bigram collocations to collocation frames (referring to distant multi-gram collocations). So far little focus is put on collocation frames. Oriented to translation and parsing, this study aims to recognize and extract the longest possible collocation frames from given sentences. We first extract bigram collocations with distributional semantics based method by introducing collocation patterns and integrating some state-of-the-art association measures. Based on bigram collocations extracted by the proposed method, we get the longest collocation frames according to recursive nature and linguistic rules of collocations. Compared with the baseline systems, the proposed method performs significantly better in bigram collocation extraction both in precision and recall. And in extracting collocation frames, the proposed method performs even better with the precision similar to its bigram collocation extraction results.