Author Search Result

[Author] Jotaro IKEDO (3 hits)

1-3 hits
  • Voice Activity Detection Using Neural Network

    Jotaro IKEDO  

     
    LETTER

    Vol: E81-B, No: 12, Page(s): 2509-2513

    Voice activity detection (VAD) determines whether a short speech frame contains voice or silence. VAD is useful for reducing the mean speech coding rate by suppressing transmission during silence periods, and it is effective for transmitting speech and other data simultaneously. This letter describes a VAD system that uses a neural network. The neural network takes several parameters obtained by analyzing slices of the speech waveform and outputs a single scalar value related to voice activity. This output is compared with a threshold to determine whether the slice is voice or silence. The proposed VAD system can reduce the mean code transfer rate to less than 50%.
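The decision path described above (frame analysis, a network that emits one scalar, then a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the letter's system: the two features (log energy and zero-crossing rate), the one-hidden-layer network, the hand-set weights, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) and the 0.5 threshold are all hypothetical choices made for the example.

```python
import numpy as np

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Hypothetical features: log10 energy and zero-crossing rate of one frame."""
    energy = np.log10(np.mean(frame ** 2) + 1e-10)
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    return np.array([energy, zcr])

class TinyVadNet:
    """One-hidden-layer network: feature vector -> single scalar in (0, 1).
    Weights are supplied by the caller (hypothetical, not trained here)."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1 = np.asarray(w1, float), np.asarray(b1, float)
        self.w2, self.b2 = np.asarray(w2, float), float(b2)

    def score(self, x: np.ndarray) -> float:
        h = np.tanh(self.w1 @ x + self.b1)   # hidden layer
        z = float(self.w2 @ h) + self.b2     # single output unit
        return 1.0 / (1.0 + np.exp(-z))      # logistic squashing to (0, 1)

def detect(signal, net, frame_len=160, threshold=0.5):
    """Return one boolean per non-overlapping frame: True = voice, False = silence."""
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        decisions.append(net.score(frame_features(frame)) > threshold)
    return np.array(decisions)
```

With weights such as `TinyVadNet([[1.0, -1.0]], [3.0], [5.0], 0.0)`, low-level noise frames score near 0 and a 440 Hz tone at amplitude 0.5 scores near 1. The fraction of frames flagged as voice approximates the fraction of frames that would still be transmitted, which is how VAD lowers the mean coding rate. In practice the weights would be learned from labeled speech rather than set by hand.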

  • Measuring the Perceived Importance of Speech Segments for Transmission over IP Networks Open Access

    Yusuke HIWASAKI  Toru MORINAGA  Jotaro IKEDO  Akitoshi KATAOKA  

     
    PAPER

    Vol: E89-B, No: 2, Page(s): 326-333

    This paper presents a way of using a linear regression model to produce a single-valued criterion that indicates the perceived importance of each block in a stream of speech blocks. This method is superior to the conventional approach, voice activity detection (VAD), in that it provides a dynamically changing priority value for speech segments with finer granularity. The approach can be used in conjunction with scalable speech coding techniques in the context of IP QoS services to achieve a flexible form of quality control for speech transmission. A simple linear regression model is used to estimate the mean opinion score (MOS) for various cases of missing speech segments. The estimated MOS is a continuous value that can be mapped to priority levels with arbitrary granularity. Through subjective evaluation, we show the validity of the calculated priority values.
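The two steps in this abstract, regressing MOS on per-block features and then quantizing the continuous estimate into priority levels, can be sketched with ordinary least squares. This is a hedged illustration only: the abstract does not name the regression inputs, so the feature columns are hypothetical, the training values in the test are made-up toy numbers, and the rule "lower estimated MOS when the block is lost means higher priority" with uniform level boundaries on the 1 to 5 MOS scale is an assumed mapping.

```python
import numpy as np

def _design(features):
    """Prepend an intercept column to the per-block feature matrix."""
    features = np.asarray(features, float)
    return np.column_stack([np.ones(len(features)), features])

def fit_mos_model(features, mos):
    """Ordinary least squares: MOS ~ b0 + b1*f1 + ... + bk*fk."""
    coef, *_ = np.linalg.lstsq(_design(features), np.asarray(mos, float), rcond=None)
    return coef

def predict_mos(coef, features):
    """Estimated MOS if each block (one feature row) were lost."""
    return _design(features) @ coef

def to_priority(est_mos, n_levels, mos_min=1.0, mos_max=5.0):
    """Map estimated MOS to n_levels priorities (assumed uniform bins).
    Lower MOS when lost -> more important block -> higher level.
    Level 0 is least important, n_levels - 1 is most important."""
    importance = (mos_max - np.clip(est_mos, mos_min, mos_max)) / (mos_max - mos_min)
    return np.minimum((importance * n_levels).astype(int), n_levels - 1)
```

Because the regression output is continuous, `n_levels` can be chosen freely (2 levels reproduces a VAD-like on/off split; more levels give the finer granularity the abstract describes) without refitting the model.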

  • A Low Complexity Speech Codec and Its Error Protection

    Jotaro IKEDO  Akitoshi KATAOKA  

     
    PAPER-Source Encoding

    Vol: E80-B, No: 11, Page(s): 1688-1695

    This paper proposes a new CELP-based speech codec for PHS multimedia communication. PHS portable terminals should consume as little power as possible, and the codec used in them must be robust against channel errors. The proposed codec therefore operates with low computational complexity while limiting the deterioration in speech quality caused by channel errors. The codec uses two new schemes to reduce computational complexity. One is moving average scalar quantization of the synthesis filter coefficients, which reduces the complexity of quantizing these coefficients by 90% compared with the widely used vector quantization. The other is pre-selection in the search of the algebraic codebook used as the random excitation source; an orthogonalization scheme is used to make this pre-selection stable. Deterioration of speech quality is suppressed by using a CRC and parameter estimation for error protection. Two codec types are proposed: a 10-ms frame type that transmits 160 bits every 10 ms and a 15-ms frame type that transmits 160 bits every 15 ms. The computational complexity of both codecs is less than 5 MOPS. In an error-free channel, the speech quality equals that of ITU-T G.726 at 32 kbit/s. At a 0.3% channel error rate, both codecs offer more comfortable conversation than G.726, and at 1.0%, the 10-ms frame type still provides comfortable conversation.
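One plausible reading of "moving average scalar quantization" is moving-average (MA) predictive scalar quantization: each coefficient is predicted from a fixed mean plus a weighted sum of past quantized residuals, and only the prediction residual is scalar-quantized. The sketch below illustrates that general technique for a single coefficient track; the mean, MA weights, step size and bit count are hypothetical values, and the paper's actual predictor order and quantizer design are not given in the abstract.

```python
import numpy as np

class MAScalarQuantizer:
    """MA-predictive scalar quantizer for one coefficient track (e.g. one LSP).
    Encoder and decoder each keep their own instance with identical parameters.
    Prediction uses only past quantized residuals, so a corrupted index
    affects at most len(ma_coefs) + 1 decoded frames."""

    def __init__(self, mean, ma_coefs, step, n_bits):
        self.mean = float(mean)
        self.ma = np.asarray(ma_coefs, dtype=float)   # needs at least one weight
        self.step = float(step)
        self.max_index = 2 ** (n_bits - 1) - 1        # symmetric index range
        self.history = np.zeros(len(self.ma))         # most recent residual first

    def _predict(self) -> float:
        return self.mean + float(self.ma @ self.history)

    def _update(self, e_hat: float) -> None:
        self.history = np.roll(self.history, 1)
        self.history[0] = e_hat

    def encode(self, x: float) -> int:
        """Quantize the prediction residual of x; return the transmitted index."""
        pred = self._predict()
        idx = int(np.clip(np.round((x - pred) / self.step),
                          -self.max_index, self.max_index))
        self._update(idx * self.step)
        return idx

    def decode(self, idx: int) -> float:
        """Rebuild the coefficient from a received index."""
        pred = self._predict()
        e_hat = idx * self.step
        self._update(e_hat)
        return pred + e_hat
```

Quantizing one value per coefficient with a rounding step is far cheaper than searching a vector codebook, which is consistent with the complexity argument in the abstract. The finite memory of MA prediction also means a bit error stops affecting the output after a few frames. For scale, 160 bits every 10 ms is 16 kbit/s, and 160 bits every 15 ms is about 10.7 kbit/s.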
