Author Search Result

[Author] Hiroyuki SEGI (3 hits)

Hits 1-3
  • Simultaneous Subtitling System for Broadcast News Programs with a Speech Recognizer

    Akio ANDO  Toru IMAI  Akio KOBAYASHI  Shinichi HOMMA  Jun GOTO  Nobumasa SEIYAMA  Takeshi MISHIMA  Takeshi KOBAYAKAWA  Shoei SATO  Kazuo ONOE  Hiroyuki SEGI  Atsushi IMAI  Atsushi MATSUI  Akira NAKAMURA  Hideki TANAKA  Tohru TAKAGI  Eiichi MIYASAKA  Haruo ISONO

     
    INVITED PAPER

    Vol: E86-D  No: 1  Page(s): 15-25

    There is a strong demand to expand captioned broadcasting for TV news programs in Japan. However, keyboard entry of captioned manuscripts for news programs cannot keep pace with the speed of speech, because in the case of Japanese it takes time to select the correct characters from among homonyms. In order to implement simultaneous subtitled broadcasting for Japanese news programs, a simultaneous subtitling system based on speech recognition has been developed. This system consists of a real-time speech recognition system to handle broadcast news transcription and a recognition-error correction system in which mistakes in the recognition result are corrected manually with a short delay. NHK started simultaneous subtitled broadcasting for the news program "News 7" on the evening of March 27, 2000.

  • Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech

    Reiko TAKOU  Hiroyuki SEGI  Tohru TAKAGI  Nobumasa SEIYAMA  

     
    PAPER-Speech and Hearing

    Vol: E95-A  No: 4  Page(s): 751-759

    The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
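The spectral center of gravity mentioned in this abstract is commonly defined as the magnitude-weighted mean frequency of a spectrum within a band. The following is a minimal sketch of that general definition, not the authors' implementation; the function name `spectral_cog`, the band edges, and the toy spectrum are all assumptions made for illustration.

```python
def spectral_cog(freqs_hz, magnitudes, lo_hz, hi_hz):
    """Magnitude-weighted mean frequency over bins with lo_hz <= f < hi_hz.

    Returns None when the band contains no energy.
    """
    weighted_sum = 0.0
    total = 0.0
    for f, m in zip(freqs_hz, magnitudes):
        if lo_hz <= f < hi_hz:
            weighted_sum += f * m
            total += m
    return weighted_sum / total if total > 0 else None


# Toy magnitude spectrum with one bin every 500 Hz (illustrative values only).
freqs = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
mags = [0.0, 1.0, 3.0, 0.0, 2.0, 2.0, 0.0, 1.0]

# Hypothetical band edges; the abstract does not state the regions used.
low_cog = spectral_cog(freqs, mags, 0, 1500)
mid_cog = spectral_cog(freqs, mags, 1500, 3000)
```

A distance between two speakers' COG values in the same band could then serve as one candidate similarity measure of the kind the abstract describes.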

  • Filter Bank Subtraction for Robust Speech Recognition

    Kazuo ONOE  Hiroyuki SEGI  Takeshi KOBAYAKAWA  Shoei SATO  Shinichi HOMMA  Toru IMAI  Akio ANDO  

     
    PAPER-Robust Speech Recognition and Enhancement

    Vol: E86-D  No: 3  Page(s): 483-488

    In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
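The core idea in this abstract, estimating and subtracting noise per filter bank channel rather than per frequency point, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's method: it omits the phase-difference handling and the theoretically controlled subtraction coefficient, and the names `filter_bank_energies` and `subtract_noise`, the rectangular bands, and the `alpha` and `floor` values are all hypothetical.

```python
def filter_bank_energies(power_spectrum, bands):
    """Sum power-spectrum bins inside each (start, end) band, end exclusive."""
    return [sum(power_spectrum[start:end]) for start, end in bands]


def subtract_noise(noisy_bank, noise_bank, alpha=1.0, floor=0.01):
    """Subtract a scaled noise estimate per channel.

    Each output is floored at a small fraction of the noisy input so that
    no channel becomes zero or negative (a common spectral-subtraction
    safeguard; the values here are assumptions).
    """
    return [max(x - alpha * n, floor * x)
            for x, n in zip(noisy_bank, noise_bank)]


# Hypothetical rectangular bands over an 8-bin power spectrum.
bands = [(0, 2), (2, 4), (4, 8)]
noisy_spectrum = [4.0, 6.0, 3.0, 5.0, 1.0, 1.0, 1.0, 1.0]
noise_spectrum = [1.0] * 8  # e.g. averaged over non-speech frames

noisy_bank = filter_bank_energies(noisy_spectrum, bands)
noise_bank = filter_bank_energies(noise_spectrum, bands)
cleaned_bank = subtract_noise(noisy_bank, noise_bank)
```

Because each channel sums several frequency bins, an error in any single bin's noise estimate has less influence on the result, which is consistent with the abstract's remark that the method reduces the need for precise noise estimation.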
