With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines well support “navigational search”. However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily well support “informational search”. Informational search would be better handled by a web search engine using an informational retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to prosess the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it the technique utilizing a simple data structure called a “term-document binary matrix,” resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set that the proposed method can speed up evaluation considerably compared with existing techniques especially when the number of query terms gets larger.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Etsuro FUJITA, Keizo OYAMA, "Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix – Pursuit of Enhanced Informational Search on the Web –" in IEICE TRANSACTIONS on Information,
vol. E96-D, no. 5, pp. 1016-1028, May 2013, doi: 10.1587/transinf.E96.D.1016.
Abstract: With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines well support “navigational search”. However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily well support “informational search”. Informational search would be better handled by a web search engine using an informational retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to prosess the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it the technique utilizing a simple data structure called a “term-document binary matrix,” resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set that the proposed method can speed up evaluation considerably compared with existing techniques especially when the number of query terms gets larger.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.1016/_p
Copy
@ARTICLE{e96-d_5_1016,
author={Etsuro FUJITA, Keizo OYAMA, },
journal={IEICE TRANSACTIONS on Information},
title={Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix – Pursuit of Enhanced Informational Search on the Web –},
year={2013},
volume={E96-D},
number={5},
pages={1016-1028},
abstract={With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines well support “navigational search”. However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily well support “informational search”. Informational search would be better handled by a web search engine using an informational retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to prosess the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it the technique utilizing a simple data structure called a “term-document binary matrix,” resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set that the proposed method can speed up evaluation considerably compared with existing techniques especially when the number of query terms gets larger.},
keywords={},
doi={10.1587/transinf.E96.D.1016},
ISSN={1745-1361},
month={May},}
Copy
TY - JOUR
TI - Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix – Pursuit of Enhanced Informational Search on the Web –
T2 - IEICE TRANSACTIONS on Information
SP - 1016
EP - 1028
AU - Etsuro FUJITA
AU - Keizo OYAMA
PY - 2013
DO - 10.1587/transinf.E96.D.1016
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2013
AB - With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines well support “navigational search”. However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily well support “informational search”. Informational search would be better handled by a web search engine using an informational retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to prosess the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it the technique utilizing a simple data structure called a “term-document binary matrix,” resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set that the proposed method can speed up evaluation considerably compared with existing techniques especially when the number of query terms gets larger.
ER -