1-6hit |
Jung-Tae LEE Young-In SONG Hae-Chang RIM
Previous approaches to question retrieval in community-based question answering rely on statistical translation techniques to match users' questions (queries) against collections of previously asked questions. This paper presents a simple but effective method for computing word relatedness to improve question retrieval based on word co-occurrence information directly extracted from question and answer archives. Experimental results show that the proposed approach significantly outperforms translation-based approaches.
Young-In SONG Kyoung-Soo HAN So-Young PARK Sang-Bum KIM Hae-Chang RIM
In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.
Kyoung-Soo HAN Young-In SONG Sang-Bum KIM Hae-Chang RIM
We propose a definitional question answering system that extracts phrases using syntactic patterns which are easily constructed manually and can reduce the coverage problem. Experimental results show that our phrase extraction system outperforms a sentence extraction system, especially for selecting concise answers, in terms of recall and precision, and indicate that the proper text unit of answer candidates and the final answer has a significant effect on the system performance.
Gumwon HONG Jeong-Hoon LEE Young-In SONG Do-Gil LEE Hae-Chang RIM
This paper presents a new approach to word spacing problems by mining reliable words from the Web and use them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train parameters for word spacing models. However, the insufficiency and irrelevancy of training examples is always the main bottleneck associated with automatic word spacing. To mitigate the data-sparseness problem, this paper proposes an algorithm to discover reliable words on the Web to expand the vocabularies and a model to utilize the words as additional resources. The proposed approach is very simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance compared to the conventional word spacing approaches.
Joo-Young LEE Young-In SONG Hae-Chang RIM Kyoung-Soo HAN
In this paper, we suggest a new probabilistic model of semantic role labeling, which uses the frameset of the predicate as explicit linguistic knowledge for providing global information on the predicate-argument structure that local classifier is unable to catch. The proposed model consists of three sub-models: role sequence generation model, frameset generation model, and matching model. The role sequence generation model generates the semantic role sequence candidates of a given predicate by using the local classification approach, which is a widely used approach in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model is designed to measure the degree of the matching between the generated role sequence and the frameset by using several features. These features are developed to represent the predicate-argument structure information described in the frameset. In the experiments, our model shows that the use of knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.
Yeo-Chan YOON So-Young PARK Young-In SONG Hae-Chang RIM Dae-Woong RHEE
In this paper, we propose a new model of automatically constructing an acronym dictionary. The proposed model generates possible acronym candidates from a definition, and then verifies each acronym-definition pair with a Naive Bayes classifier based on web documents. In order to achieve high dictionary quality, the proposed model utilizes the characteristics of acronym generation types: a syllable-based generation type, a word-based generation type, and a mixed generation type. Compared with a previous model recognizing an acronym-definition pair in a document, the proposed model verifying a pair in web documents improves approximately 50% recall on obtaining acronym-definition pairs from 314 Korean definitions. Also, the proposed model improves 7.25% F-measure on verifying acronym-definition candidate pairs by utilizing specialized classifiers with the characteristics of acronym generation types.