A Ranking-Based Text Matching Approach for Plagiarism Detection

Leilei KONG, Zhongyuan HAN, Haoliang QI, Zhimao LU

Summary:

This paper addresses text matching for plagiarism detection. The task is to identify the matching plagiarized segments in a pair consisting of a suspicious document and its plagiarism source document. To date, heuristic-based methods have mainly been used to solve this problem. However, heuristics rely on expert experience and cannot easily integrate additional features, so they struggle to detect highly obfuscated plagiarism matches. In this paper, a statistical machine learning approach, named the Ranking-based Text Matching Approach for Plagiarism Detection, is proposed to handle high-obfuscation plagiarism detection. Plagiarism text matching is formalized as a ranking problem, and a pairwise learning-to-rank algorithm is used to identify the most probable plagiarism matches for a given suspicious segment. In particular, the Meteor evaluation metrics from machine translation are incorporated into the proposed method to capture lexical and semantic text similarity. The proposed method is evaluated on the PAN12 and PAN13 text alignment corpora for plagiarism detection and compared with the methods that achieved the best performance in PAN12, PAN13 and PAN14. Experimental results demonstrate that the proposed method achieves statistically significantly better performance than the baseline methods on all twelve document collections, which belong to five different plagiarism categories. On the PAN12 Artificial-high Obfuscation sub-corpus and the PAN13 Summary Obfuscation sub-corpus in particular, the proposed method improves the main evaluation metric, PlagDet, by 22% and 43% relative to the baselines, respectively. Moreover, the proposed method is also more efficient than the baseline methods.
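The ranking formulation in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the hinge-style update on feature differences is a generic stand-in for a pairwise learning-to-rank algorithm, and the function names, the two-dimensional feature vectors, and the hyperparameters are all hypothetical. In practice the features might be similarity scores (for example Meteor-style scores) between a suspicious segment and each candidate source segment.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def train_pairwise_ranker(
    pairs: List[Tuple[Features, Features]],
    n_features: int,
    epochs: int = 20,
    lr: float = 0.1,
) -> List[float]:
    """Learn linear weights from (better, worse) candidate pairs.

    Each pair says: for the same suspicious segment, the first candidate
    should be ranked above the second. Hypothetical helper for illustration.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # Pairwise methods learn from the difference of the two feature vectors.
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Hinge-style update: adjust weights only when the margin is too small.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank_candidates(w: Sequence[float], candidates: List[Features]) -> List[int]:
    """Return candidate indices sorted from most to least probable match."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy training data: (features of true match, features of non-match).
training_pairs = [
    ([0.9, 0.8], [0.2, 0.1]),
    ([0.7, 0.9], [0.3, 0.2]),
]
weights = train_pairwise_ranker(training_pairs, n_features=2)
ranking = rank_candidates(weights, [[0.1, 0.2], [0.8, 0.9], [0.5, 0.4]])
```

Under these assumptions, the top-ranked index in `ranking` would be taken as the most probable plagiarism match for the suspicious segment.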

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E101-A No.5 pp.799-810
Publication Date
2018/05/01
Online ISSN
1745-1337
DOI
10.1587/transfun.E101.A.799
Type of Manuscript
PAPER
Category
Information Theory

Authors

Leilei KONG
  the Heilongjiang Institute of Technology
Zhongyuan HAN
  the Heilongjiang Institute of Technology
Haoliang QI
  the Heilongjiang Institute of Technology, the State Key Laboratory of Digital Publishing Technology of China
Zhimao LU
  the Dalian University of Technology
