Author Search Result

[Author] Hiroshi G. OKUNO(3hit)

1-3hit
  • Automatic Allocation of Training Data for Speech Understanding Based on Multiple Model Combinations

    Kazunori KOMATANI  Mikio NAKANO  Masaki KATSUMARU  Kotaro FUNAKOSHI  Tetsuya OGATA  Hiroshi G. OKUNO  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:9
      Page(s):
    2298-2307

    The optimal way to build speech understanding modules depends on the amount of training data available. When only a small amount of training data is available, effective allocation of the data is crucial to preventing overfitting of statistical methods. We have developed a method for allocating a limited amount of training data in accordance with the amount available. Our method exploits rule-based methods for when the amount of data is small, which are included in our speech understanding framework based on multiple model combinations, i.e., multiple automatic speech recognition (ASR) modules and multiple language understanding (LU) modules, and then allocates training data preferentially to the modules that dominate the overall performance of speech understanding. Experimental evaluation showed that our allocation method consistently outperforms baseline methods that use a single ASR module and a single LU module while the amount of training data increases.

  • Selecting Help Messages by Using Robust Grammar Verification for Handling Out-of-Grammar Utterances in Spoken Dialogue Systems

    Kazunori KOMATANI  Yuichiro FUKUBAYASHI  Satoshi IKEDA  Tetsuya OGATA  Hiroshi G. OKUNO  

     
    PAPER-Speech and Hearing

      Vol:
    E93-D No:12
      Page(s):
    3359-3367

    We address the issue of out-of-grammar (OOG) utterances in spoken dialogue systems by generating help messages. Help message generation for OOG utterances is a challenge because language understanding based on automatic speech recognition (ASR) of OOG utterances is usually erroneous; important words are often misrecognized or missing from such utterances. Our grammar verification method uses a weighted finite-state transducer, to accurately identify the grammar rule that the user intended to use for the utterance, even if important words are missing from the ASR results. We then use a ranking algorithm, RankBoost, to rank help message candidates in order of likely usefulness. Its features include the grammar verification results and the utterance history representing the user's experience.

  • Common Acoustical Pole Estimation from Multi-Channel Musical Audio Signals

    Takuya YOSHIOKA  Takafumi HIKICHI  Masato MIYOSHI  Hiroshi G. OKUNO  

     
    PAPER-Engineering Acoustics

      Vol:
    E89-A No:1
      Page(s):
    240-247

    This paper describes a method for estimating the amplitude characteristics of poles common to multiple room transfer functions from musical audio signals received by multiple microphones. Knowledge of these pole characteristics would make it easier to manipulate audio equalizers, since they correspond to the room resonance. It has been proven that an estimate of the poles can be calculated precisely when a source signal is white. However, if a source signal is colored as in the case of a musical audio signal, the estimate is degraded by the frequency characteristics originally contained in the source signal. In this paper, we consider that an amplitude spectrum of a musical audio signal consists of its envelope and fine structure. We assume that musical pieces can be classified into several categories according to their average amplitude spectral envelopes. Based on this assumption, the amplitude spectral envelope of the musical audio signal can be obtained from prior knowledge of the average amplitude spectral envelope of a musical piece category into which the target piece is classified. On the other hand, the fine structure is identified based on its time variance. By removing both the spectral envelope and the fine structure from the amplitude spectrum estimated with the conventional method, the amplitude characteristics of the acoustical poles can be extracted. Simulation results for 20 popular songs revealed that our method was capable of estimating the amplitude characteristics of the acoustical poles with a spectral distortion of 3.11 dB. In particular, most of the spectral peaks, corresponding to the room resonance modes, were successfully detected.

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.