Kritsada SRIPHAEW Thanaruk THEERAMUNKONG
Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery systems. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a method to mine a smaller set of generalized closed frequent itemsets instead of mining a large set of conventional generalized frequent itemsets. To this end, we develop two algorithms, called SET and cSET, for mining generalized frequent itemsets and generalized closed frequent itemsets, respectively. In a number of experiments, the proposed algorithms outperform previous well-known algorithms in both computational time and memory utilization. Furthermore, experiments with real datasets indicate that mining generalized closed frequent itemsets is more advantageous in computational cost, since the number of generalized closed frequent itemsets is much smaller than the number of generalized frequent itemsets.
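To make the terms in this abstract concrete, the following is a minimal brute-force sketch of generalized frequent itemset mining over a taxonomy, followed by a filter that keeps only closed itemsets. It is not the SET or cSET algorithm, whose details the abstract does not give. The taxonomy `PARENT`, the sample transactions, and all function names are hypothetical. The closedness test here uses only the subset-superset relation, which is an assumption; the paper's definition may also involve the ancestor-descendant relation.

```python
from itertools import combinations

# Hypothetical taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, parent=PARENT):
    """Return every ancestor of `item` in the taxonomy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result


def extend(transaction, parent=PARENT):
    """Add the ancestors of each item so generalized items can be counted."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return frozenset(extended)


def has_ancestor_pair(itemset, parent=PARENT):
    """True if the itemset holds an item together with one of its ancestors."""
    members = set(itemset)
    return any(ancestors(item, parent) & members for item in members)


def generalized_frequent(transactions, min_count, parent=PARENT):
    """Enumerate candidates level by level and keep those meeting min_count."""
    extended = [extend(t, parent) for t in transactions]
    items = sorted(set().union(*extended))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            # Ancestor-descendant constraint: such itemsets are redundant.
            if has_ancestor_pair(candidate, parent):
                continue
            count = sum(1 for t in extended if t.issuperset(candidate))
            if count >= min_count:
                frequent[frozenset(candidate)] = count
                found = True
        if not found:
            break  # support is anti-monotone, so larger itemsets cannot qualify
    return frequent


def closed_only(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == frequent[other] for other in frequent)
    }


transactions = [{"jacket", "shirt"}, {"ski pants", "shirt"}, {"jacket"}, {"jacket"}]
frequent = generalized_frequent(transactions, min_count=2)
closed = closed_only(frequent)
print(len(frequent), len(closed))
```

On this toy data the closed set is smaller than the full frequent set because `{shirt}` always occurs together with `outerwear`, so it carries no support information beyond `{shirt, outerwear}`. The exhaustive enumeration is exponential in the number of items and is meant only to illustrate the definitions.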
Kritsada SRIPHAEW Thanaruk THEERAMUNKONG
Assessment of discovered patterns is an important issue in the field of knowledge discovery. This paper presents an evaluation method that utilizes citation (reference) information to assess the quality of discovered document relations. With the concept of transitivity as direct/indirect citations, a series of evaluation criteria is introduced to define the validity of discovered relations. Two kinds of validity, called soft validity and hard validity, are proposed to express the quality of the discovered relations. For the purpose of impartial comparison, the expected validity is statistically estimated based on the generative probability of each relation pattern. The proposed evaluation is investigated using more than 10,000 documents obtained from a research publication database. With frequent itemset mining as a process to discover document relations, the proposed method was shown to be a powerful way to evaluate the relations in four aspects: soft/hard scoring, direct/indirect citation, relative quality over the expected value, and comparison to human judgment.