IEICE globals.ieice.org Site

Author Search Result

[Author] Jotaro IKEDO (3 hits)

Voice Activity Detection Using Neural Network
Jotaro IKEDO

LETTER

Vol: E81-B  No: 12  Page(s): 2509-2513
Voice activity detection (VAD) determines whether a short-time speech frame is voice or silence. VAD is useful for reducing the mean speech coding rate by suppressing transmission during silent periods, and is effective when transmitting speech and other data simultaneously. This letter describes a VAD system that uses a neural network. The neural network takes several parameters obtained by analyzing slices of the speech waveform and outputs a single scalar value related to voice activity. This output is compared with a threshold to determine whether the slice is voice or silence. The proposed VAD system can reduce the mean code transfer rate to less than 50%.
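The decision rule in this abstract (several frame parameters in, one scalar out, then a threshold) can be illustrated with a minimal sketch. The abstract does not list the letter's actual parameters, network topology, or threshold, so everything below is an assumption made for illustration: the two features (log energy and zero-crossing rate), the single sigmoid unit standing in for the network, and the hand-picked, untrained weights.

```python
import math

def frame_features(frame):
    """Return (log10 energy, zero-crossing rate) for one frame of samples.

    These two features are illustrative assumptions, not the letter's parameters.
    """
    energy = sum(s * s for s in frame) / len(frame)
    log_energy = math.log10(energy + 1e-12)  # small offset avoids log10(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    return log_energy, zcr

def vad_score(features, weights, bias):
    """Single sigmoid unit: maps the feature vector to one scalar in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def is_voice(frame, weights=(2.0, -1.0), bias=4.0, threshold=0.5):
    """Compare the scalar output with a threshold.

    The weights, bias, and threshold are hypothetical and untrained.
    """
    return vad_score(frame_features(frame), weights, bias) > threshold

# Usage: one 10 ms frame at an assumed 8 kHz sampling rate (80 samples).
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
quiet = [0.0] * 80
print(is_voice(tone), is_voice(quiet))
```

In a real system the weights would come from training on labeled voice and silence frames, and the threshold would trade off clipped speech against wasted transmission.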
Measuring the Perceived Importance of Speech Segments for Transmission over IP Networks (Open Access)
Yusuke HIWASAKI, Toru MORINAGA, Jotaro IKEDO, Akitoshi KATAOKA

PAPER

Vol: E89-B  No: 2  Page(s): 326-333
This paper presents a way of using a linear regression model to produce a single-valued criterion that indicates the perceived importance of each block in a stream of speech blocks. This method is superior to the conventional approach, voice activity detection (VAD), in that it provides a dynamically changing priority value for speech segments with finer granularity. The approach can be used in conjunction with scalable speech coding techniques in the context of IP QoS services to achieve a flexible form of quality control for speech transmission. A simple linear regression model is used to estimate a mean opinion score (MOS) of the various cases of missing speech segments. The estimated MOS is a continuous value that can be mapped to priority levels with arbitrary granularity. Through subjective evaluation, we show the validity of the calculated priority values.
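The idea of mapping a continuous estimated MOS to priority levels of arbitrary granularity can be sketched as follows. The abstract does not give the regression inputs or coefficients, so the two features (normalized block energy and a voiced flag), the coefficients, and the rule "lower estimated MOS when the block is lost means higher priority" are all assumptions made for illustration.

```python
def estimate_mos(features, coeffs, intercept):
    """Linear regression prediction, clamped to the MOS range 1.0 to 5.0.

    Interpreted here as the MOS expected if this block were lost (an assumption).
    """
    mos = intercept + sum(c * x for c, x in zip(coeffs, features))
    return min(5.0, max(1.0, mos))

def mos_to_priority(mos, levels):
    """Map an estimated MOS to an integer priority in 0 .. levels-1.

    A lower MOS (more audible damage when the block is lost) yields a
    higher priority. `levels` sets the granularity.
    """
    damage = (5.0 - mos) / 4.0  # 0.0 = harmless loss, 1.0 = worst case
    return min(levels - 1, int(damage * levels))

# Hypothetical coefficients: (normalized block energy, voiced flag).
COEFFS = (-1.5, -1.0)
INTERCEPT = 4.5

quiet_block = (0.1, 0)   # low energy, unvoiced
loud_block = (0.9, 1)    # high energy, voiced

for levels in (4, 8):
    print(levels,
          mos_to_priority(estimate_mos(quiet_block, COEFFS, INTERCEPT), levels),
          mos_to_priority(estimate_mos(loud_block, COEFFS, INTERCEPT), levels))
```

Because the estimate is continuous, the same score can be requantized to 4, 8, or any other number of levels without refitting the model, which is the flexibility a binary VAD decision lacks.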
A Low Complexity Speech Codec and Its Error Protection
Jotaro IKEDO, Akitoshi KATAOKA

PAPER-Source Encoding

Vol: E80-B  No: 11  Page(s): 1688-1695
This paper proposes a new speech codec based on CELP for PHS multimedia communication. PHS portable terminals should consume as little power as possible, and the codec used in them has to be robust against channel errors. Therefore, the proposed codec operates with low computational complexity while reducing the deterioration in speech quality due to channel errors. This codec uses two new schemes to reduce computational complexity. One is moving average scalar quantization for the filter coefficients of the synthesis filter. This scheme requires 90% less complexity to quantize the synthesis filter coefficients than the widely used vector quantization. The other is pre-selection for selecting an algebraic codebook used as the random excitation source. An orthogonalization scheme is used for stable pre-selection. Deterioration of speech quality is suppressed by using CRC and parameter estimation for error protection. Two types of codec are proposed: a 10-ms frame type that transmits 160 bits every 10 ms and a 15-ms frame type that transmits 160 bits every 15 ms. The computational complexity of these codecs is less than 5 MOPS. In an environment with no channel errors, the speech quality is equal to that of ITU-T G.726 at 32.0 kbit/s. With 0.3% channel error, both codecs offer more comfortable conversation than G.726. Moreover, at 1.0% channel error, the 10-ms frame type still provides comfortable conversation.
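The frame sizes quoted above imply the gross transmission rate of each codec type, which helps when comparing them with G.726 at 32.0 kbit/s. The short calculation below uses only the figures stated in the abstract; the function name is hypothetical, and the abstract does not say how the 160 bits are split between speech parameters and error protection.

```python
def bit_rate_kbps(bits_per_frame, frame_ms):
    """Gross bit rate in kbit/s (bits per millisecond equals kbit/s)."""
    return bits_per_frame / frame_ms

G726_KBPS = 32.0  # reference rate quoted in the abstract

for frame_ms in (10, 15):
    rate = bit_rate_kbps(160, frame_ms)
    print(f"{frame_ms}-ms frame type: {rate:.2f} kbit/s "
          f"({rate / G726_KBPS:.0%} of the G.726 rate)")
```

Under this reading, the 10-ms type sends 16 kbit/s and the 15-ms type about 10.67 kbit/s.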
