Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix – Pursuit of Enhanced Informational Search on the Web –

Etsuro FUJITA, Keizo OYAMA

Summary:

With the successful adoption of link analysis techniques such as PageRank and of web spam filtering, current web search engines support “navigational search” well. However, because they rely on a simple conjunctive Boolean filter and because user queries are often inappropriate, such engines do not necessarily support “informational search” well. Informational search would be better handled by a web search engine that uses an information retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Realizing such an engine, however, requires a method to process the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it a technique that uses a simple data structure called a “term-document binary matrix,” which makes the evaluation of top-k queries more efficient even when the queries have been expanded. In an experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to it, we show that the proposed method speeds up evaluation considerably compared with existing techniques, especially as the number of query terms grows.
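
The summary does not describe how the term-document binary matrix is laid out or how it is combined with the existing top-k technique, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one bitset per term (a Python integer whose bit d is set when document d contains the term) and a toy additive score; the names `build_binary_matrix`, `contains`, `top_k`, and the `weights` table are hypothetical and introduced purely for illustration.

```python
import heapq
from collections import defaultdict


def build_binary_matrix(docs):
    """Return {term: bitset}; bit d of the bitset is 1 iff document d contains the term.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    matrix = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            matrix[term] |= 1 << doc_id
    return dict(matrix)


def contains(matrix, term, doc_id):
    """Constant-time membership test: does document doc_id contain term?"""
    return (matrix.get(term, 0) >> doc_id) & 1 == 1


def top_k(matrix, weights, query_terms, num_docs, k):
    """Return the k best (score, doc_id) pairs, highest score first.

    Toy scoring (an assumption): a document's score is the sum of the
    weights of the query terms it contains. A bounded min-heap keeps
    only the current top k candidates.
    """
    heap = []  # min-heap of (score, doc_id)
    for doc_id in range(num_docs):
        score = sum(
            weights.get(t, 0.0) for t in query_terms if contains(matrix, t, doc_id)
        )
        if score <= 0:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "web search engine",
    "query expansion for web search",
    "relevance feedback",
]
weights = {"web": 1.0, "search": 1.0, "query": 2.0, "expansion": 2.0}
matrix = build_binary_matrix(docs)
result = top_k(matrix, weights, ["web", "search", "query", "expansion"], len(docs), k=2)
```

In this sketch the cost of checking one query term against one document does not depend on the length of that term's posting list, which is one plausible reason such a structure could help when expanded queries contain many terms. A real system would also need pruning and a proper ranking function, both of which are outside what the summary specifies.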

Publication
IEICE TRANSACTIONS on Information Vol.E96-D No.5 pp.1016-1028
Publication Date
2013/05/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E96.D.1016
Type of Manuscript
Special Section PAPER (Special Section on Data Engineering and Information Management)
Category
Advanced Search
