IEICE globals.ieice.org Site

Author Search Result

[Author] Jun ADACHI(7hit)

Probabilistic Automaton-Based Fuzzy English-Text Retrieval
Manabu OHTA Atsuhiro TAKASU Jun ADACHI

PAPER-Software Systems

Vol: E86-D  No: 9  Page(s): 1835-1844
Incorrect recognition by optical character readers (OCR) is a serious problem when searching OCR-scanned documents in databases such as digital libraries. To reduce costs, this paper proposes fuzzy retrieval methods for English text that tolerate errors in the recognized text without requiring the errors to be corrected manually. The proposed methods generate multiple search terms for each input query term based on probabilistic automata, which reflect both error-occurrence probabilities and character-connection probabilities. Experimental results of test-set retrieval indicate that one of the proposed methods improves the recall rate from 95.96% to 98.15% at the cost of a decrease in precision from 100.00% to 96.01% with 20 expanded search terms.
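The idea of expanding one query term into several likely OCR renderings can be illustrated with a minimal sketch. It is not the paper's probabilistic automaton: it models only per-character error probabilities (no character-connection probabilities), and the `CONFUSION` table, its values, and the function name `expand_term` are assumptions made up for illustration.

```python
import heapq
from itertools import product

# Hypothetical OCR confusion table: for each true character, the strings an
# OCR engine might output and their probabilities. The values are invented.
CONFUSION = {
    "l": {"l": 0.90, "1": 0.07, "i": 0.03},
    "o": {"o": 0.95, "0": 0.05},
    "m": {"m": 0.92, "rn": 0.08},
}

def expand_term(term, k):
    """Return the k most probable renderings of `term` as (probability, text)."""
    # Characters missing from the table are assumed to be recognized correctly.
    options = [list(CONFUSION.get(ch, {ch: 1.0}).items()) for ch in term]
    candidates = []
    # Exhaustive enumeration; acceptable for short query terms only.
    for combo in product(*options):
        text = "".join(rendering for rendering, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        candidates.append((prob, text))
    return heapq.nlargest(k, candidates)

if __name__ == "__main__":
    for prob, text in expand_term("model", 3):
        print(f"{text}\t{prob:.4f}")
```

Each expanded term would then be issued as an additional search term, which is where the recall gain and the precision loss reported in the abstract come from.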
Load Balancing Scheme on the Basis of Huffman Coding for P2P Information Retrieval
Hisashi KURASAWA Atsuhiro TAKASU Jun ADACHI

PAPER-Contents Technology and Web Information Systems

Vol: E92-D  No: 10  Page(s): 2064-2072
Although a distributed index on a distributed hash table (DHT) enables efficient document query processing in Peer-to-Peer information retrieval (P2P IR), the index is costly to construct, and its management load tends to be distributed unfairly among peers because of the unbalanced term frequency distribution. We devised a new distributed index, named Huffman-DHT, for P2P IR. The new index uses an algorithm similar to Huffman coding with a modification to the DHT structure based on the term distribution. In a Huffman-DHT, a frequent term is assigned a short ID and allocated a large space in the node ID space of the DHT. Through ID management, the Huffman-DHT balances the index registration accesses among peers and reduces load concentrations. Huffman-DHT is the first approach to adapt concepts of coding theory and term frequency distribution to load balancing. We evaluated this approach in experiments using a document collection and assessed its load balancing capabilities in P2P IR. The experimental results indicated that it is most effective when the P2P system consists of about 30,000 nodes and contains many documents. Moreover, we proved that we can construct a Huffman-DHT easily by estimating the probability distribution of the term occurrence from a small number of sample documents.
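The relation between term frequency, ID length, and share of the ID space can be sketched with ordinary Huffman coding. This is a simplified illustration, not the Huffman-DHT algorithm itself: the abstract describes a modified variant tied to the DHT structure, which is not reproduced here. The function names and the sample frequencies are assumptions.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Map each term to the length in bits of its Huffman code."""
    tiebreak = count()  # unique counter so tuples never compare the term lists
    heap = [(f, next(tiebreak), [t]) for t, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {t: 0 for t in freqs}
    while len(heap) > 1:
        f1, _, terms1 = heapq.heappop(heap)
        f2, _, terms2 = heapq.heappop(heap)
        # Every term under the merged node moves one level deeper.
        for t in terms1 + terms2:
            lengths[t] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), terms1 + terms2))
    return lengths

def id_space_share(lengths):
    """A code of n bits is a prefix covering 2**-n of a binary ID space."""
    return {t: 2.0 ** -n for t, n in lengths.items()}

if __name__ == "__main__":
    # Invented term frequencies, for illustration only.
    freqs = {"the": 8, "index": 4, "peer": 2, "huffman": 1, "dht": 1}
    lengths = huffman_code_lengths(freqs)
    for term, share in id_space_share(lengths).items():
        print(term, lengths[term], share)
```

In this sketch the most frequent term receives the shortest ID and therefore the largest prefix region, so more nodes share responsibility for it.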
Decomposing the Web Graph into Parameterized Connected Components
Tomonari MASADA Atsuhiro TAKASU Jun ADACHI

PAPER

Vol: E87-D  No: 2  Page(s): 380-388
We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.
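As a point of reference, the ordinary decomposition into strongly connected components that the method generalizes can be sketched as follows. The threshold-based refinement into PCCs and the role of Dijkstra's algorithm are not specified in enough detail in this abstract to reproduce, so they are deliberately left out. The sketch uses Kosaraju's algorithm, and the graph representation (a dict of successor lists) is an assumption.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps each vertex to a list of successors."""
    # Pass 1: iterative DFS on the graph, recording vertices by finish time.
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All successors explored: the vertex is finished.
                order.append(node)
                stack.pop()

    # Build the reversed graph (every arc u -> v becomes v -> u).
    reverse = {v: [] for v in seen}
    for u in graph:
        for v in graph[u]:
            reverse[v].append(u)

    # Pass 2: DFS on the reversed graph in decreasing finish time.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components

if __name__ == "__main__":
    pages = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
    print(strongly_connected_components(pages))
```

According to the abstract, each PCC is a subset of one such component, with the threshold parameter controlling how finely a component is split.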
Optimal Pivot Selection Method Based on the Partition and the Pruning Effect for Metric Space Indexes
Hisashi KURASAWA Daiji FUKAGAWA Atsuhiro TAKASU Jun ADACHI

PAPER

Vol: E94-D  No: 3  Page(s): 504-514
This paper proposes a new method to reduce the cost of nearest neighbor searches in metric spaces. Many similarity search indexes recursively divide a region into subregions by using pivots, and construct a tree-structured index. Most recently developed indexes focus on pruning objects and do not pay much attention to tree balancing. As a result, indexes with imbalanced tree structures may be constructed, which increases the search cost. We propose a similarity search index called the Partitioning Capacity (PC) Tree. It selects the optimal pivot in terms of the PC, which quantifies the balance of the regions partitioned by a pivot as well as the estimated effectiveness of the search pruning by the pivot. As a result, PCTree reduces the search cost for various data distributions. We experimentally compared PCTree with four indexes using synthetic data and five real datasets. The experimental results show that the PCTree successfully reduces the search cost.
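A minimal sketch of the kind of pivot-based tree index the abstract refers to may help. It is not the PCTree: the pivot here is simply the first object in each region, whereas the paper selects the pivot by its PC measure, which this abstract does not define. The split at the median distance and the dictionary node layout are assumptions chosen for brevity.

```python
import statistics

def build(points, dist):
    """Recursively split `points` around a pivot at the median pivot distance."""
    if not points:
        return None
    pivot, rest = points[0], points[1:]  # arbitrary pivot, NOT the PC criterion
    if not rest:
        return {"pivot": pivot, "radius": 0.0, "inner": None, "outer": None}
    radius = statistics.median(dist(pivot, p) for p in rest)
    inner = [p for p in rest if dist(pivot, p) <= radius]
    outer = [p for p in rest if dist(pivot, p) > radius]
    return {"pivot": pivot, "radius": radius,
            "inner": build(inner, dist), "outer": build(outer, dist)}

def range_search(node, query, r, dist, out):
    """Append to `out` every indexed object within distance r of `query`."""
    if node is None:
        return out
    d = dist(query, node["pivot"])
    if d <= r:
        out.append(node["pivot"])
    # Triangle inequality: an inner object o has dist(pivot, o) <= radius,
    # and a match needs dist(pivot, o) >= d - r, so skip if d - r > radius.
    if d - r <= node["radius"]:
        range_search(node["inner"], query, r, dist, out)
    # An outer object has dist(pivot, o) > radius, and a match needs
    # dist(pivot, o) <= d + r, so skip if d + r <= radius.
    if d + r > node["radius"]:
        range_search(node["outer"], query, r, dist, out)
    return out

if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 2, 8]
    tree = build(data, lambda a, b: abs(a - b))
    print(sorted(range_search(tree, 4, 1.5, lambda a, b: abs(a - b), [])))
```

A poor pivot choice leaves one subregion much larger than the other or prunes little at query time, which is the trade-off the PC measure is described as quantifying.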
FOREWORD
Jun ADACHI

FOREWORD

Vol: E87-D  No: 2  Page(s): 360-360
A Relevance-Based Superimposition Model for Effective Information Retrieval
Teruhito KANAZAWA Atsuhiro TAKASU Jun ADACHI

PAPER-Natural Language Processing

Vol: E83-D  No: 12  Page(s): 2152-2160
Semantic ambiguity is a serious problem in information retrieval. Query expansion has been proposed as one method of solving this problem. However, queries tend not to contain enough information for fitting query vectors to the latent semantics, which are difficult to express in the few query terms given by users. We propose a document vector modification method that modifies document vectors based on the relevance of documents. This method is expected to show better retrieval effectiveness than conventional methods. In this paper, we evaluate our method through retrieval experiments in which the relevance of documents extracted from scientific papers is assessed, and a comparison with tf-idf is described.
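For orientation, the tf-idf baseline mentioned in the comparison can be sketched as below. This covers only the baseline, not the proposed document vector modification, which the abstract does not specify. The weighting used (raw term frequency times log(N/df)) is one common variant among several, and the function names and sample documents are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    docs = [
        ["query", "expansion", "retrieval"],
        ["document", "vector", "retrieval"],
        ["document", "relevance"],
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share no terms score zero under this baseline, which is the kind of vocabulary mismatch that both query expansion and document vector modification aim to reduce.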
Margin-Based Pivot Selection for Similarity Search Indexes
Hisashi KURASAWA Daiji FUKAGAWA Atsuhiro TAKASU Jun ADACHI

PAPER-Multimedia Databases

Vol: E93-D  No: 6  Page(s): 1422-1432
When developing an index for a similarity search in metric spaces, how to divide the space for effective search pruning is a fundamental issue. We present Maximal Metric Margin Partitioning (MMMP), a partitioning scheme for similarity search indexes. MMMP divides the data based on its distribution pattern, paying particular attention to the boundaries of clusters. A partitioning boundary created by MMMP is likely to be located in a sparse area between clusters. Moreover, the partitioning boundary is at the maximum distance from the two cluster edges. We also present an indexing scheme, named the MMMP-Index, which uses MMMP and pivot filtering. The MMMP-Index can prune many objects that are not relevant to a query, and it reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered data in a vector space, the MMMP-Index reduces the computational cost to less than two-thirds that of comparable schemes.
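The pivot filtering that the MMMP-Index is said to use can be sketched in its generic form. The MMMP partitioning itself is not shown, since the abstract does not give its construction. The sketch assumes at least one pivot, and the function names, the flat distance table, and the sample points are illustrative assumptions.

```python
import math

def build_table(objects, pivots, dist):
    """Precompute the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in objects]

def range_query(objects, pivots, table, query, r, dist):
    """Return (matches, number of object distances actually computed)."""
    q_to_pivots = [dist(query, p) for p in pivots]
    matches, computed = [], 0
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for any pivot p,
        # so the largest such difference is a lower bound on d(q, o).
        lower_bound = max(abs(qd - od) for qd, od in zip(q_to_pivots, row))
        if lower_bound > r:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(query, obj) <= r:
            matches.append(obj)
    return matches, computed

if __name__ == "__main__":
    objects = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 0)]
    pivots = [(0, 0), (10, 0)]
    table = build_table(objects, pivots, math.dist)
    print(range_query(objects, pivots, table, (0.5, 0), 1.0, math.dist))
```

In the example, three of the five objects are discarded from the precomputed table alone, so only two exact distances are evaluated.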
