Keyword Search Result

[Keyword] speech coding(44hit)

1-20hit(44hit)

  • Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech

    Akira NISHIMURA  

     
    PAPER

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    83-91

    Reversible data hiding is a technique in which hidden data are embedded in host data such that the consistency of the host is perfectly preserved and its data are restored during extraction of the hidden data. In this paper, a linear prediction technique for reversible data hiding of audio waveforms is improved. The proposed variable expansion method is able to control the payload size through varying the expansion factor. The proposed technique is combined with the prediction error expansion method. Reversible embedding, perfect payload detection, and perfect recovery of the host signal are achieved for a framed audio signal. A smaller expansion factor results in a smaller payload size and less degradation in the stego audio quality. Computer simulations reveal that embedding a random-bit payload of less than 0.4 bits per sample into CD-format music signals provides stego audio with acceptable objective quality. The method is also applied to G.711 µ-law-coded speech signals. Computer simulations reveal that embedding a random-bit payload of less than 0.1 bits per sample into speech signals provides stego speech with good objective quality.
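
As a rough illustration of the prediction error expansion idea mentioned in this abstract, the sketch below uses the textbook form of the technique, not the paper's variable-expansion method. It assumes a first-order predictor (the previous sample), a fixed expansion factor of 2 (one bit per sample), and integer samples with no overflow handling; the function names are hypothetical.

```python
def embed(samples, bits):
    """Hide one bit in each sample after the first by expanding the prediction error."""
    stego = [samples[0]]              # the first sample carries no payload
    for n in range(1, len(samples)):
        pred = samples[n - 1]         # assumed predictor: previous original sample
        err = samples[n] - pred
        stego.append(pred + 2 * err + bits[n - 1])  # expanded error holds the bit in its LSB
    return stego


def extract(stego):
    """Recover both the original samples and the hidden bits."""
    host = [stego[0]]
    bits = []
    for n in range(1, len(stego)):
        pred = host[n - 1]            # previous sample has already been restored
        expanded = stego[n] - pred
        bits.append(expanded & 1)     # payload bit is the LSB of the expanded error
        host.append(pred + (expanded >> 1))  # floor division undoes the expansion
    return host, bits


host = [10, 12, 9, 9, -3]
payload = [1, 0, 1, 1]
restored, recovered = extract(embed(host, payload))
```

Because the decoder rebuilds each prediction from samples it has already restored, both the host signal and the payload come back exactly, which is the reversibility property the abstract describes.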

  • Designing Algebraic Trellis Vector Code as an Efficient Excitation Codebook for ACELP Coder

    Sungjin KIM  Sangwon KANG  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E95-B No:11
      Page(s):
    3642-3645

    In this paper, a block-constrained trellis coded vector quantization (BC-TCVQ) algorithm is combined with an algebraic codebook to produce an algebraic trellis vector code (ATVC) to be used in ACELP coding. ATVC expands the set of allowed algebraic codebook pulse positions, and the trellis branches are labeled with these subsets. The Viterbi algorithm is used to select the excitation codevector. A fast codebook search method using an efficient non-exhaustive search technique is also proposed to reduce the complexity of the ATVC search procedure while maintaining the quality of the reconstructed speech. The ATVC block code is used as the fixed codebook of AMR-NB (12.2 kbps), which reduces the computational complexity compared to the conventional algebraic codebook.

  • Information Hiding for G.711 Speech Based on Substitution of Least Significant Bits and Estimation of Tolerable Distortion

    Akinori ITO  Shun'ichiro ABE  Yoiti SUZUKI  

     
    PAPER-Speech and Hearing

      Vol:
    E93-A No:7
      Page(s):
    1279-1286

    In this paper, we propose a novel data hiding technique for G.711-coded speech based on the LSB substitution method. The novel feature of the proposed method is that a low-bitrate encoder, G.726 ADPCM, is used as a reference for deciding how many bits can be embedded in a sample. Experiments showed that the method outperformed the simple LSB substitution method and the selective embedding method proposed by Aoki. We achieved 4-kbit/s embedding with almost no subjective degradation of speech quality, and 10 kbit/s while maintaining good quality.
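
For orientation, the simple LSB substitution baseline that this abstract compares against can be sketched as below. This is not the proposed method: it embeds a fixed one bit per sample and makes no use of the G.726 reference to estimate tolerable distortion. The function names are hypothetical, and the input is assumed to be a sequence of 8-bit G.711 codewords.

```python
def embed_lsb(codewords, bits):
    """Overwrite the least significant bit of each 8-bit codeword with a payload bit."""
    if len(bits) > len(codewords):
        raise ValueError("payload is longer than the cover signal")
    out = bytearray(codewords)
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (b & 1)   # clear the LSB, then set it to the payload bit
    return bytes(out)


def extract_lsb(codewords, count):
    """Read back the first `count` payload bits from the codeword LSBs."""
    return [c & 1 for c in codewords[:count]]


cover = bytes([0xFF, 0x80, 0x7E])
stego = embed_lsb(cover, [0, 1, 1])
hidden = extract_lsb(stego, 3)
```

At the G.711 sampling rate of 8000 samples per second, one bit per sample corresponds to a raw capacity of 8 kbit/s; an adaptive scheme would vary the number of bits per sample instead of fixing it.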

  • Complexity Scalability Design in the Internet Low Bit Rate Codec (iLBC) for Speech Coding

    Fu-Kun CHEN  Kuo-Bao KUO  

     
    PAPER-Speech and Hearing

      Vol:
    E93-D No:5
      Page(s):
    1238-1243

    Unlike modern speech codecs, which use long-term prediction, the internet low bit rate codec (iLBC) standard encodes the residual of the linear predictive coding (LPC) independently, frame by frame. In this paper, a complexity scalability design is proposed for the coding of the dynamic codebook search in the iLBC speech codec. In addition, a trade-off between the computational complexity and the speech quality can be achieved by dynamically setting the parameter of the proposed approach. Simulation results show that the computational complexity can be effectively reduced with imperceptible degradation of the speech quality.

  • Low-Complexity Wideband LSF Quantization Using Algebraic Trellis VQ

    Abdellah KADDAI  Mohammed HALIMI  

     
    PAPER-Speech and Hearing

      Vol:
    E92-D No:12
      Page(s):
    2478-2486

    In this paper, an algebraic trellis vector quantization (ATVQ) scheme that introduces algebraic codebooks into the trellis coded vector quantization (TCVQ) structure is presented. Low encoding complexity and minimum memory storage requirements are achieved using the proposed approach. It exploits the advantages of both TCVQ and algebraic codebooks, namely delayed decision, codebook widening, low computational complexity, and no codebook storage. This novel vector quantization scheme is used to encode the wideband speech line spectral frequencies (LSF) parameters. Experimental results on wideband speech have shown that ATVQ yields the same performance as the traditional split vector quantization (SVQ) and the TCVQ in terms of spectral distortion (SD). It can achieve a transparent quality at 47 bits/frame with a considerable reduction of memory storage and computation complexity when compared to SVQ and TCVQ.

  • An Iterative Joint Source-Channel (De-)Coding and (De-)Modulation Algorithm for G.729EV in Ultrashort Wave Communication

    Tan PENG  Xiangming XU  Huijuan CUI  Kun TANG  Wei MIAO  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:12
      Page(s):
    2897-2901

    Improving the overall performance of reliable speech communication in ultrashort wave radios over very noisy channels is of great importance and practical use. An iterative joint source-channel (de-)coding and (de-)modulation (JSCCM) algorithm is proposed for ITU-T Rec. G.729EV by both exploiting the residual redundancy and passing soft information throughout the receiver while introducing a systematic global iteration process. Being fully compatible with the existing transmitter structure, the proposed algorithm does not introduce additional bandwidth expansion or transmission delay. Simulations show that the JSCCM algorithm substantially improves error correcting performance and synthesized speech quality over conventional separately designed systems in delay- and bandwidth-constrained channels.

  • Realtime Joint Speech Coding and Transmission Algorithm for High Packet Loss Rate Wireless Channels

    Tan PENG  Huijuan CUI  Kun TANG  Wei MIAO  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:12
      Page(s):
    2892-2896

    In digital speech communication over noisy high packet loss rate wireless channels, improving the overall performance of the realtime speech coding and transmission system is of great importance. A novel joint speech coding and transmission algorithm is proposed by fully exploiting the correlation between speech coding, channel coding and the transmission process. The proposed algorithm requires no algorithmic delay and less bandwidth expansion while greatly enhancing the error correcting performance and the reconstructed speech quality compared with conventional algorithms. Simulations show that the residual error rate is reduced by 84.36% and the MOS (Mean Opinion Score) is improved by over 38.86%.

  • Switching Search Method for Pulse Assignment in ITU-T G.729D

    Fu-Kun CHEN  Yu-Ruei TSAI  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:10
      Page(s):
    2532-2535

    In this paper, simplified search designs for the stochastic codebook of algebraic code excited linear prediction (ACELP) for the ITU-T G.729D speech coder are proposed. By using two search rounds and limiting the search range, the computational complexity of the proposed approach is only 6.25% of the full search method recommended by G.729D. In addition, the computational complexity of the proposed approach is only 59% of the global pulse replacement search method recommended by G.729.1. Simulation results show that the coded speech quality, evaluated using standard subjective and objective quality measurements, suffers perceptually negligible degradation.

  • Designing Algebraic Trellis Code as a New Fixed Codebook Module for ACELP Coder

    Jakyong JUN  Sangwon KANG  Thomas R. FISCHER  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E91-B No:3
      Page(s):
    972-974

    In this paper, a block-constrained trellis coded quantization (BC-TCQ) algorithm is combined with an algebraic codebook to produce an algebraic trellis code (ATC) to be used in ACELP coding. In ATC, the set of allowed algebraic codebook pulse positions is expanded, and the expanded set is partitioned into subsets of pulse positions; the trellis branches are labeled with these subsets. The list Viterbi algorithm (LVA) is used to select the excitation codevector. The combination of an ATC codebook and LVA trellis search algorithm is denoted as an ATC-LVA block code. The ATC-LVA block code is used as the fixed codebook of the AMR-WB 8.85 kbps mode, reducing complexity compared to the conventional algebraic codebook.

  • A Statistical Approach to Error Compensation in Spectral Quantization

    Seung Ho CHOI  Hong Kook KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:9
      Page(s):
    1460-1464

    In this paper, we propose a statistical approach to improve the performance of spectral quantization of speech coders. The proposed techniques compensate for the distortion in a decoded line spectrum pair (LSP) vector based on a statistical mapping function between a decoded LSP vector and its corresponding original LSP vector. We first develop two codebook-based probabilistic matching (CBPM) methods by investigating the distribution of LSP vectors. In addition, we propose an iterative procedure for the two CBPMs. Next, the proposed techniques are applied to the predictive vector quantizer (PVQ) used for the IS-641 speech coder. The experimental results show that the proposed techniques reduce average spectral distortion by around 0.064 dB and the percentage of outliers compared with the PVQ without any compensation, resulting in transparent quality of spectral quantization. Finally, the comparison of speech quality using the perceptual evaluation of speech quality (PESQ) measure is performed and it is shown that the IS-641 speech coder employing the proposed techniques has better decoded speech quality than the standard IS-641 speech coder.

  • A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments

    Jae Sam YOON  Gil Ho LEE  Hong Kook KIM  

     
    PAPER-Speech/Audio Processing

      Vol:
    E90-A No:3
      Page(s):
    626-632

    Existing standard speech coders can provide high quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that use the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), which are typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for the performance of speech recognition. In this paper, we propose a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To develop the proposed speech coder with a low bit rate, we first explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. It is shown that the proposed speech coder has speech quality comparable to 8 kbps G.729, and that the ASR system using the proposed speech coder yields a relative word error rate reduction of 6.8% compared to the ASR system using G.729 on a large vocabulary task (AURORA4).

  • Measuring the Perceived Importance of Speech Segments for Transmission over IP Networks Open Access

    Yusuke HIWASAKI  Toru MORINAGA  Jotaro IKEDO  Akitoshi KATAOKA  

     
    PAPER

      Vol:
    E89-B No:2
      Page(s):
    326-333

    This paper presents a way of using a linear regression model to produce a single-valued criterion that indicates the perceived importance of each block in a stream of speech blocks. This method is superior to the conventional approach, voice activity detection (VAD), in that it provides a dynamically changing priority value for speech segments with finer granularity. The approach can be used in conjunction with scalable speech coding techniques in the context of IP QoS services to achieve a flexible form of quality control for speech transmission. A simple linear regression model is used to estimate a mean opinion score (MOS) of the various cases of missing speech segments. The estimated MOS is a continuous value that can be mapped to priority levels with arbitrary granularity. Through subjective evaluation, we show the validity of the calculated priority values.

  • Multiband Vector Quantization Based on Inner Product for Wideband Speech Coding

    Joon-Hyuk CHANG  Sanjit K. MITRA  

     
    LETTER-Speech and Hearing

      Vol:
    E88-D No:11
      Page(s):
    2606-2608

    This paper describes a multiband vector quantization (VQ) technique based on inner product for wideband speech coding at 16 kb/s. Our approach consists of splitting the input speech into two separate bands and then applying an independent coding scheme for each band. A code excited linear prediction (CELP) coder is used in the lower band while a transform based coding strategy is applied in the higher band. The spectral components in the higher frequency band are represented by a set of modulated lapped transform (MLT) coefficients. The higher frequency band is divided into three subbands, and the MLT coefficients construct a vector for each subband. Specifically, for the VQ of these vectors, an inner product-based distance measure is proposed as a new strategy. The proposed 16 kb/s coder with the inner-product based distortion measure achieves better performance than the 48 kb/s ITU-T G.722 in subjective quality tests.

  • A Fast Encoding Technique for Vector Quantization of LSF Parameters

    Sangwon KANG  Yongwon SHIN  Changyong SON  Thomas R. FISCHER  

     
    PAPER-Multimedia Systems for Communications

      Vol:
    E88-B No:9
      Page(s):
    3750-3755

    A fast encoding technique is described for vector quantization (VQ) of line spectral frequency parameters. A reduction in VQ encoding complexity is achieved by using a preliminary test that reduces the necessary codebook search range. The test is performed based on two criteria. One criterion uses the distance between a specific single element of the input vector and the corresponding element of the codevectors in the codebook. The other criterion makes use of the ordering property of LSF parameters. The fast encoding technique is implemented in the enhanced variable rate codec (EVRC) encoding algorithm. Simulation results show that the average searching range of the codebook can be reduced by 44.50% for the EVRC without degradation of spectral distortion (SD).
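
The first criterion in this abstract, a preliminary test on a single vector element, can be illustrated with the sketch below. It is an assumption-laden example rather than the paper's algorithm: it uses a squared Euclidean distance, tests element `k` against the best distance found so far, and ignores the second criterion (the LSF ordering property). The function names and the toy codebook are hypothetical.

```python
def full_search(x, codebook):
    """Exhaustive nearest-neighbour search; returns the index of the closest codevector."""
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def fast_search(x, codebook, k=0):
    """Same result as full_search, but skips codevectors rejected by a one-element test."""
    best_i, best_d, tested = -1, float("inf"), 0
    for i, c in enumerate(codebook):
        # If one element alone is already at least as far as the current best,
        # the full distance cannot be smaller, so the codevector is skipped.
        if (x[k] - c[k]) ** 2 >= best_d:
            continue
        tested += 1
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, tested


codebook = [[0.10, 0.20, 0.30], [0.50, 0.60, 0.70],
            [0.15, 0.25, 0.35], [0.90, 0.95, 0.99]]
x = [0.14, 0.26, 0.33]
index, tested = fast_search(x, codebook)
```

Since the single-element distance is a lower bound on the full distance, skipped codevectors can never beat the current best, so the reduced search returns the same index as the exhaustive one while computing fewer full distances.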

  • Harmonic Model Based Excitation Enhancement for Low-Bit-Rate Speech Coding

    Hong Kook KIM  Mi Suk LEE  Chul Hong KWON  

     
    LETTER-Speech and Hearing

      Vol:
    E87-D No:7
      Page(s):
    1974-1977

    A new excitation enhancement technique based on a harmonic model is proposed in this paper to improve the speech quality of low-bit-rate speech coders. This technique is employed only in the decoding process of speech coders and improves high-frequency components of excitation. We develop the procedure of harmonic model parameter estimation and harmonic generation and apply the technique to a current state-of-the-art low-bit-rate speech coder. Experiments on spectrum reading and spectrum distortion measurement show that the proposed excitation enhancement technique improves speech quality.

  • Design of a Robust LSP Quantizer for a High-Quality 4-kbit/s CELP Speech Coder

    Yusuke HIWASAKI  Kazunori MANO  Kazutoshi YASUNAGA  Toshiyuki MORII  Hiroyuki EHARA  Takao KANEKO  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:6
      Page(s):
    1496-1506

    This paper presents an efficient LSP quantizer implementation for low bit-rate coders. The major feature of the quantizer is that it uses a truncated cepstral distance criterion for the code selection procedure. This approach has generally been considered too computationally costly. We utilized the quantizer with a moving-average predictor, two-stage-split vector quantizer and delayed decision. We have investigated the optimal parameter settings in this case and incorporated the quantizer thus obtained into an ITU-T 4-kbit/s speech coding candidate algorithm with a bit budget of 21 bits. The objective performance is better than that with a conventional weighted mean-square criterion, while the complexity is still kept to a reasonable level. The paper also describes the codebook design and techniques that were employed to achieve robustness in noisy channel conditions.

  • Efficient Coding Translation of GSM and G.729 Speech Coders across Mobile and IP Networks

    Shu-Min TSAI  Jia-Ching WANG  Jar-Ferr YANG  Jhing-Fa WANG  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:2
      Page(s):
    444-452

    In this paper, we propose a speech coding translation scheme that transfers coding parameters between GSM half rate and G.729 coders. Compared to the conventional decode-then-encode (DTE) scheme, the proposed parameter conversions provide speech interoperability between mobile and IP networks while reducing computational complexity and coding delay. Simulation results show that the proposed methods can reduce the computational load and coding delay incurred in the target encoders by about 30% and achieve almost imperceptible degradation in performance.

  • A Packet Loss Recovery Method Using Packets Arrived behind the Playout Time for CELP Decoding

    Masahiro SERIZAWA  Hironori ITO  

     
    PAPER-Speech and Hearing

      Vol:
    E86-D No:12
      Page(s):
    2775-2779

    This paper proposes a packet loss recovery method using packets arrived behind the playout time for CELP (Code Excited Linear Prediction) decoding. The proposed method recovers synchronization of the filter states between encoding and decoding in the period following packet loss. The recovery is performed by replacing the degraded filter states with the ones calculated from the late arrival packet in decoding. When the proposed method is applied to the AMR (Adaptive Multi-Rate) speech decoder, it improves the segmental SNR (Signal-to-Noise Ratio) by 0.2 to 1.8 dB at packet loss rates of 2 to 10% when all the packet losses occur due to late arrival. PESQ (Perceptual Evaluation of Speech Quality) results also show that the proposed method slightly improves the speech quality. The subjective test results show that five-grade mean opinion scores are improved by 0.35 and 0.28 at a packet loss rate of 5% at speech coding bitrates of 7.95 and 12.2 kbit/s, respectively.

  • Modified Restricted Temporal Decomposition and Its Application to Low Rate Speech Coding

    Phu Chien NGUYEN  Takao OCHI  Masato AKAGI  

     
    PAPER-Speech and Audio Coding

      Vol:
    E86-D No:3
      Page(s):
    397-405

    This paper presents a method of temporal decomposition (TD) for line spectral frequency (LSF) parameters, called "Modified Restricted Temporal Decomposition" (MRTD), and its application to low rate speech coding. The LSF parameters have not been used for TD due to the stability problems in the linear predictive coding (LPC) model. To overcome this deficiency, a refinement process is applied to the event vectors in the proposed TD method to preserve their LSF ordering property. Meanwhile, the restricted second order TD model, where only two adjacent event functions can overlap and all event functions at any time sum up to one, is utilized to reduce the computational cost of TD. In addition, based on the geometric interpretation of TD the MRTD method enforces a new property on the event functions, named the "well-shapedness" property, to model the temporal structure of speech more effectively. This paper also proposes a method for speech coding at rates around 1.2 kbps based on STRAIGHT, a high quality speech analysis-synthesis method, using MRTD. In this speech coding method, MRTD based vector quantization is used for encoding spectral information of speech. Subjective test results indicate that the speech quality of the proposed speech coding method is close to that of the 4.8 kbps FS-1016 CELP coder.

  • A Silence Compression Algorithm for the Multi-Rate Dual-Bandwidth MPEG-4 CELP Standard

    Masahiro SERIZAWA  Hironori ITO  Toshiyuki NOMURA  

     
    PAPER-Speech and Audio Coding

      Vol:
    E86-D No:3
      Page(s):
    412-417

    This paper proposes a silence compression algorithm operating at multi-rates (MR) and with dual-bandwidths (DB), a narrowband and a wideband, for the MPEG (Moving Picture Experts Group)-4 CELP (Code Excited Linear Prediction) standard. The MR/DB operations are implemented by a Variable-Frame-size/Dual-Bandwidth Voice Activity Detection (VF/DB-VAD) module with bandwidth conversions of the input signal, and a Variable-Frame-size Comfort Noise Generator (VF-CNG) module. The CNG module adaptively smooths the Root Mean Square (RMS) value of the input signal to improve the coding quality during transition periods. The algorithm also employs a Dual-Rate Discontinuous Transmission (DR-DTX) module to reduce the average transmission bitrate during silence periods. Subjective test results show that the proposed silence compression algorithm gives no degradation in coding quality for clean and noisy speech signals. These signals include about 20 to 30% non-speech frames and the average transmission bitrates are reduced by 20 to 40%. The proposed algorithm has been adopted as a part of the ISO/IEC MPEG-4 CELP version 2 standard.
