1-14hit |
Satoshi MATSUMOTO Takayoshi SHOUDAI Tomoyuki UCHIDA Tetsuhiro MIYAHARA Yusuke SUZUKI
A linear term tree is defined as an edge-labeled rooted tree pattern with ordered children and internal structured variables whose labels are mutually distinct. A variable can be replaced with arbitrary edge-labeled rooted ordered trees. We consider the polynomial time learnability of finite unions of linear term trees in the exact learning model formalized by Angluin. The language L(t) of a linear term tree t is the set of all trees obtained from t by substituting arbitrary edge-labeled rooted ordered trees for all variables in t. Moreover, for a finite set S of linear term trees, we define L(S)=∪t∈S L(t). A target of learning, denoted by T*, is a finite set of linear term trees, where the number of edge labels is infinite. In this paper, for any set T* of m linear term trees (m ≥ 0), we present a query learning algorithm which exactly identifies T* in polynomial time using at most 2mn2 Restricted Subset queries and at most m+1 Equivalence queries, where n is the maximum size of counterexamples. Finally, we note that finite sets of linear term trees are not learnable in polynomial time using Restricted Equivalence, Membership and Subset queries.
Takayoshi SHOUDAI Takashi YAMADA
This paper deals with a problem to decide whether a given graph structure appears as a pattern in the structure of a given graph. A graph pattern is a triple p=(V,E,H), where (V,E) is a graph and H is a set of variables, which are ordered lists of vertices in V. A variable can be replaced with an arbitrary connected graph by a kind of hyperedge replacements. A substitution is a collection of such replacements. The graph pattern matching problem (GPMP) is the computational problem to decide whether or not a given graph G is obtained from a given graph pattern p by a substitution. In this paper, we show that GPMP for a graph pattern p and a graph G is solvable in polynomial time if the length of every variable in p is 2, p is of bounded treewidth, and G is connected.
Tetsuhiro MIYAHARA Tomoyuki UCHIDA Takayoshi SHOUDAI Tetsuji KUBOYAMA Kenichi TAKAHASHI Hiroaki UEDA
We present a new method for discovering knowledge from structured data which are represented by graphs in the framework of Inductive Logic Programming. A graph, or network, is widely used for representing relations between various data and expressing a small and easily understandable hypothesis. The analyzing system directly manipulating graphs is useful for knowledge discovery. Our method uses Formal Graph System (FGS) as a knowledge representation language for graph structured data. FGS is a kind of logic programming system which directly deals with graphs just like first order terms. And our method employs a refutably inductive inference algorithm as a learning algorithm. A refutably inductive inference algorithm is a special type of inductive inference algorithm with refutability of hypothesis spaces, and is suitable for knowledge discovery. We give a sufficiently large hypothesis space, the set of weakly reducing FGS programs. And we show that this hypothesis space is refutably inferable from complete data. We have designed and implemented a prototype of a knowledge discovery system KD-FGS, which is based on our method and acquires knowledge directly from graph structured data. Finally we discuss the applicability of our method for graph structured data with experimental results on some graph theoretical notions.
Takayoshi SHOUDAI Tetsuhiro MIYAHARA Tomoyuki UCHIDA Satoshi MATSUMOTO Yusuke SUZUKI
A term is a connected acyclic graph (unrooted unordered tree) pattern with structured variables, which are ordered lists of one or more distinct vertices. A variable of a term has a variable label and can be replaced with an arbitrary tree by hyperedge replacement according to the variable label. The dimension of a term is the maximum number of vertices in the variables of it. A term is said to be linear if each variable label in it occurs exactly once. Let T be a tree and t a linear term. In this paper, we study the graph pattern matching problem (GPMP) for T and t, which decides whether or not T is obtained from t by replacing variables in t with some trees. First we show that GPMP for T and t is NP-complete if the dimension of t is greater than or equal to 4. Next we give a polynomial time algorithm for solving GPMP for a tree of bounded degree and a linear term of bounded dimension. Finally we show that GPMP for a tree of arbitrary degree and a linear term of dimension 2 is solvable in polynomial time.
Takayoshi SHOUDAI Satoshi MATSUMOTO Yusuke SUZUKI Tomoyuki UCHIDA Tetsuhiro MIYAHARA
A formal graph system (FGS for short) is a logic program consisting of definite clauses whose arguments are graph patterns instead of first-order terms. The definite clauses are referred to as graph rewriting rules. An FGS is shown to be a useful unifying framework for learning graph languages. In this paper, we show the polynomial-time PAC learnability of a subclass of FGS languages defined by parameterized hereditary FGSs with bounded degree, from the viewpoint of computational learning theory. That is, we consider VH-FGSLk,Δ(m, s, t, r, w, d) as the class of FGS languages consisting of graphs of treewidth at most k and of maximum degree at most Δ which is defined by variable-hereditary FGSs consisting of m graph rewriting rules having TGP patterns as arguments. The parameters s, t, and r denote the maximum numbers of variables, atoms in the body, and arguments of each predicate symbol of each graph rewriting rule in an FGS, respectively. The parameters w and d denote the maximum number of vertices of each hyperedge and the maximum degree of each vertex of TGP patterns in each graph rewriting rule in an FGS, respectively. VH-FGSLk,Δ(m, s, t, r, w, d) has infinitely many languages even if all the parameters are bounded by constants. Then we prove that the class VH-FGSLk,Δ(m, s, t, r, w, d) is polynomial-time PAC learnable if all m, s, t, r, w, d, Δ are constants except for k.
Takayoshi SHOUDAI Kazuhide AIKOH Yusuke SUZUKI Satoshi MATSUMOTO Tetsuhiro MIYAHARA Tomoyuki UCHIDA
An efficient means of learning tree-structural features from tree-structured data would enable us to construct effective mining methods for tree-structured data. Here, a pattern representing rich tree-structural features common to tree-structured data and a polynomial time algorithm for learning important tree patterns are necessary for mining knowledge from tree-structured data. As such a tree pattern, we introduce a term tree pattern t such that any edge label of t belongs to a finite alphabet Λ, any internal vertex of t has ordered children and t has a new kind of structured variable, called a height-constrained variable. A height-constrained variable has a pair of integers (i, j) as constraints, and it can be replaced with a tree whose trunk length is at least i and whose height is at most j. This replacement is called height-constrained replacement. A sequence of consecutive height-constrained variables is called a variable-chain. In this paper, we present polynomial time algorithms for solving the membership problem and the minimal language (MINL) problem for term tree patternshaving no variable-chain. The membership problem for term tree patternsis to decide whether or not a given tree can be obtained from a given term tree pattern by applying height-constrained replacements to all height-constrained variables in the term tree pattern. The MINL problem for term tree patternsis to find a term tree pattern t such that the language generated by t is minimal among languages, generated by term tree patterns, which contain all given tree-structured data. Finally, we show that the class, i.e., the set of all term tree patternshaving no variable-chain, is polynomial time inductively inferable from positive data if |Λ| ≥ 2.
Tomoyuki UCHIDA Takayoshi SHOUDAI Satoru MIYANO
We define a new framework for rewriting graphs, called a formal graph system (FGS), which is a logic program having hypergraphs instead of terms in first-order logic. We first prove that a class of graphs is generated by a hyperedge replacement grammar if and only if it is defined by an FGS of a special form called a regular FGS. In the same way as logic programs, we can define a refutation tree for an FGS. The classes of TTSP graphs and outerplanar graphs are definable by regular FGSs. Then, we consider the problem of constructing a refutation tree of a graph for these FGSs. For the FGS defining TTSP graphs, we present a refutation tree algorithm of O(log2nlogm) time with O(nm) processors on an EREW PRAM. For the FGS defining outerplanar graphs, we show that the refutation tree problem can be solved in O(log2n) time with O(nm) processors on an EREW PRAM. Here, n and m are the numbers of vertices and edges of an input graph, respectively.
Hitoshi YAMASAKI Takayoshi SHOUDAI
A graph is an interval graph if and only if each vertex in the graph can be associated with an interval on the real line such that any two vertices are adjacent in the graph exactly when the corresponding intervals have a nonempty intersection. A number of interesting applications for interval graphs have been found in the literature. In order to find structural features common to structural data which can be represented by intervals, this paper proposes new interval graph structured patterns, called linear interval graph patterns, and a polynomial time algorithm for finding a minimally generalized linear interval graph pattern explaining a given finite set of interval graphs.
Tomoyuki UCHIDA Satoshi MATSUMOTO Takayoshi SHOUDAI Yusuke SUZUKI Tetsuhiro MIYAHARA
A formal graph system (FGS) is a logic programming system that directly manipulates graphs by dealing with graph patterns instead of terms of first-order predicate logic. In this paper, based on an FGS, we introduce a primitive formal ordered tree system (pFOTS) as a formal system defining labeled ordered tree languages. A pFOTS program is a finite set of graph rewriting rules. A logic program is well-known to be suitable to represent background knowledge. The query learning model is an established mathematical model of learning via queries in computational learning theory. In this learning model, we show the exact learnability of a pFOTS program consisting of one graph rewriting rule and background knowledge defined by a pFOTS program using a polynomial number of queries.
Yusuke SUZUKI Takayoshi SHOUDAI Tomoyuki UCHIDA Tetsuhiro MIYAHARA
A term tree pattern is a rooted ordered tree pattern which consists of ordered tree structures with edge labels and structured variables with labels. All variables with the same label in a term tree pattern can be simultaneously replaced with ordered trees isomorphic to the same rooted ordered tree. Then, a term tree pattern is suitable for representing structural features common to tree structured data such as XML documents on the web, the secondary structures of RNA in biology and parse trees describing grammatical structures of natural languages. Let $ott$ be the set of all term tree patterns which have one or more variables with the same label. Let $lott$ be the set of all term tree patterns t such that all variables in t have distinct labels. We remark that $lottsubsetneq ott$ holds. In this paper, we consider a problem, called Matching problem for term tree patterns, of deciding whether or not a given rooted ordered tree T is obtained from a given term tree pattern t by replacing variables in t with rooted ordered trees. We show that Matching problem for term tree patterns in $ott$ is NP-complete, by giving a reduction from the string pattern matching problem, which is NP-complete. Next, by giving operations on an interval, which is a set containing all integers between two given integers representing vertex identifiers, we propose an efficient algorithm for solving Matching problem for term tree patterns in $lottsubsetneq ott$. Then, we show that, when an ordered tree having N vertices and a term tree pattern $t in lott$ having n vertices are given, the proposed matching algorithm correctly solves this problem in O(nN) time.
Ryoji TAKAMI Yusuke SUZUKI Tomoyuki UCHIDA Takayoshi SHOUDAI
Two-Terminal Series Parallel (TTSP, for short) graphs are used as data models in applications for electric networks and scheduling problems. We propose a TTSP term graph which is a TTSP graph having structured variables, that is, a graph pattern over a TTSP graph. Let TGTTSP be the set of all TTSP term graphs whose variable labels are mutually distinct. For a TTSP term graph g in TGTTSP, the TTSP graph language of g, denoted by L(g), is the set of all TTSP graphs obtained from g by substituting arbitrary TTSP graphs for all variables in g. Firstly, when a TTSP graph G and a TTSP term graph g are given as inputs, we present a polynomial time matching algorithm which decides whether or not L(g) contains G. The minimal language problem for the class LTTSP={L(g) | g ∈ TGTTSP} is, given a set S of TTSP graphs, to find a TTSP term graph g in TGTTSP such that L(g) is minimal among all TTSP graph languages which contain all TTSP graphs in S. Secondly, we give a polynomial time algorithm for solving the minimal language problem for LTTSP. Finally, we show that LTTSP is polynomial time inductively inferable from positive data.
Takayoshi SHOUDAI Satoru MIYANO
Let C{c1, , cm} be a family of subsets of a finite set S{1, , n}, a subset S of S is a co-hitting set if S contains no element of C as a subset. By using an O((log n)2) time EREW PRAM algorithm for a maximal independent set problem (MIS), we show that a maximal co-hitting set for S can be computed on an EREW PRAN in time O(αβ(log(nm))2) using O(n2 m) processors, where αmax{|cii1, , n} and βmax{|djj1, , n} with dj{ci|jci}. This implies that if αβO((log(nm))k) then the problem is solvable in NC.
Takayoshi SHOUDAI Yuta YOSHIMURA Yusuke SUZUKI Tomoyuki UCHIDA Tetsuhiro MIYAHARA
A cograph (complement reducible graph) is a graph which can be generated by disjoint union and complement operations on graphs, starting with a single vertex graph. Cographs arise in many areas of computer science and are studied extensively. With the goal of developing an effective data mining method for graph structured data, in this paper we introduce a graph pattern expression, called a cograph pattern, which is a special type of cograph having structured variables. Firstly, we show that a problem whether or not a given cograph pattern g matches a given cograph G is NP-complete. From this result, we consider the polynomial time learnability of cograph pattern languages defined by cograph patterns having variables labeled with mutually different labels, called linear cograph patterns. Secondly, we present a polynomial time matching algorithm for linear cograph patterns. Next, we give a polynomial time algorithm for obtaining a minimally generalized linear cograph pattern which explains given positive data. Finally, we show that the class of linear cograph pattern languages is polynomial time inductively inferable from positive data.
Satoshi MATSUMOTO Tomoyuki UCHIDA Takayoshi SHOUDAI Yusuke SUZUKI Tetsuhiro MIYAHARA
A regular pattern is a string consisting of constant symbols and distinct variable symbols. The language of a regular pattern is the set of all constant strings obtained by replacing all variable symbols in the regular pattern with non-empty strings. The present paper deals with the learning problem of languages of regular patterns within Angluin's query learning model, which is an established mathematical model of learning via queries in computational learning theory. The class of languages of regular patterns was known to be identifiable from one positive example using a polynomial number of membership queries, in the query learning model. In present paper, we show that the class of languages of regular patterns is identifiable from one positive example using a linear number of membership queries, with respect to the length of the positive example.