Author Search Result

[Author] Shirou MARUYAMA(3hit)

1-3hit
  • Scalable Detection of Frequent Substrings by Grammar-Based Compression

    Masaya NAKAHARA  Shirou MARUYAMA  Tetsuji KUBOYAMA  Hiroshi SAKAMOTO  

     
    PAPER

      Vol:
    E96-D No:3
      Page(s):
    457-464

    A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments in biological sequences and other compressible real world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.

  • Context-Sensitive Grammar Transform: Compression and Pattern Matching

    Shirou MARUYAMA  Youhei TANAKA  Hiroshi SAKAMOTO  Masayuki TAKEDA  

     
    PAPER

      Vol:
    E93-D No:2
      Page(s):
    219-226

    A framework of context-sensitive grammar transform for speeding-up compressed pattern matching (CPM) is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is a match for gzip and Re-Pair, and the search speed of our CPM algorithm is almost twice faster than the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al., and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al., which is regarded as one of the best combinations that allows a practically fast search.

  • A Space-Saving Approximation Algorithm for Grammar-Based Compression

    Hiroshi SAKAMOTO  Shirou MARUYAMA  Takuya KIDA  Shinichi SHIMOZONO  

     
    PAPER

      Vol:
    E92-D No:2
      Page(s):
    158-165

    A space-efficient approximation algorithm for the grammar-based compression problem, which requests for a given string to find a smallest context-free grammar deriving the string, is presented. For the input length n and an optimum CFG size g, the algorithm consumes only O(g log g) space and O(n log*n) time to achieve O((log*n)log n) approximation ratio to the optimum compression, where log*n is the maximum number of logarithms satisfying log log log n > 1. This ratio is thus regarded to almost O(log n), which is the currently best approximation ratio. While g depends on the string, it is known that g=Ω(log n) and for strings from k-letter alphabet [12].

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.