Keyword Search Result

[Keyword] Microphone array (57 hits)

Showing 1-20 of 57 hits

  • CNN-Based Feature Integration Network for Speech Enhancement in Microphone Arrays (Open Access)

    Ji XI  Pengxu JIANG  Yue XIE  Wei JIANG  Hao DING  

     
    LETTER-Speech and Hearing
    Publicized: 2024/08/26
    Vol: E107-D No:12    Page(s): 1546-1549

    Models based on convolutional neural networks (CNNs) have proven to be an effective solution for speech enhancement. However, CNNs for microphone arrays remain underexplored, especially regarding the correlations between the network paths associated with different microphones. In this paper, we propose a CNN-based feature integration network for speech enhancement in microphone arrays. The input to the CNN is composed of short-time Fourier transforms (STFTs) from the different microphones, and the network comprises an encoding layer, a decoding layer, and skip connections. In addition, the designed feature integration layer enables information exchange between the microphones, and the designed feature fusion layer incorporates additional information. Experiments confirm the advantage of the designed structure.
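
    Purely as an illustration of the multi-microphone STFT input described above, here is a minimal NumPy/SciPy sketch (function and parameter names are ours, not the authors'):

    ```python
    import numpy as np
    from scipy.signal import stft

    def make_cnn_input(mic_signals, fs=16000, nperseg=512):
        """Stack per-microphone STFT magnitudes into a (mics, freq, time) tensor.

        mic_signals: (num_mics, num_samples) array of time-domain signals.
        """
        channels = []
        for x in mic_signals:
            _, _, Z = stft(x, fs=fs, nperseg=nperseg)
            channels.append(np.abs(Z))       # magnitude spectrogram per mic
        return np.stack(channels, axis=0)    # one CNN input channel per mic

    # Example: 4 microphones, 1 s of noise -> shape (4, 257, num_frames),
    # which would feed an encoder-decoder CNN with skip connections.
    X = make_cnn_input(np.random.randn(4, 16000))
    print(X.shape)
    ```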

  • Design and Analysis of First-Order Steerable Nonorthogonal Differential Microphone Arrays

    Qiang YU  Xiaoguang WU  Yaping BAO  

     
    LETTER-Engineering Acoustics
    Vol: E101-A No:10    Page(s): 1687-1692

    Differential microphone arrays (DMAs) have been widely used in hands-free communication systems because of their frequency-invariant beampatterns, high directivity factors, and small apertures. Considering that in real applications the position of the acoustic source typically moves within a certain range, this letter proposes an approach to constructing a steerable first-order differential beampattern using four omnidirectional microphones arranged in a nonorthogonal circular geometry. Theoretical analysis and simulation results show that the beampattern constructed with this method achieves the same directivity factor (DF) as traditional DMAs and a higher white noise gain (WNG) within a certain angular range. The simulations also show that the proposed method is applicable to speech signal processing. Experiments confirm the effectiveness and low computational cost of the proposed method.
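
    For orientation, the ideal first-order differential beampattern has the closed form B(θ) = a + (1 − a)cos(θ − θs). A toy NumPy sketch of this textbook pattern (not the letter's nonorthogonal four-microphone construction) is:

    ```python
    import numpy as np

    def first_order_beampattern(theta, a, steer=0.0):
        """First-order DMA beampattern B(theta) = a + (1-a)*cos(theta - steer).

        a = 0 gives a dipole, a = 0.5 a cardioid, a = 0.25 a hypercardioid.
        """
        return a + (1.0 - a) * np.cos(theta - steer)

    theta = np.linspace(0, 2 * np.pi, 360)
    for a, name in [(0.5, "cardioid"), (0.0, "dipole"), (0.25, "hypercardioid")]:
        B = first_order_beampattern(theta, a, steer=np.deg2rad(30))
        # The mainlobe follows the steering angle (30 degrees here).
        print(name, "peak at", np.rad2deg(theta[np.argmax(np.abs(B))]).round(1), "deg")
    ```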

  • Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement

    Tomoko KAWASE  Kenta NIWA  Masakiyo FUJIMOTO  Kazunori KOBAYASHI  Shoko ARAKI  Tomohiro NAKATANI  

     
    PAPER-Digital Signal Processing
    Vol: E100-A No:5    Page(s): 1127-1136

    We propose a microphone array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation and statistical speech-model-based PSD estimation. The goal of this research is to pick up target speech clearly even in noisy environments such as crowded places, factories, and cars running at high speed. Beamforming with post-Wiener filtering is commonly used in conventional studies on microphone-array noise reduction. Calculating a Wiener filter requires speech/noise PSDs, which are estimated using spatial cues obtained from the microphone observations. Assuming that the sound sources are sparse in the temporal-spatial domain, the speech/noise PSDs can be estimated accurately; however, the estimation errors increase in conditions that violate this assumption. In this study, we integrate speech models with PSD estimation in the beamspace to correct these speech/noise PSD estimation errors. A rough noise PSD estimate is obtained frame by frame by analyzing spatial cues from the array observations. By combining the noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, can be described without pre-training. By exploiting Bayes' theorem, a Wiener filter is statistically generated from the observation models. Experiments showed that the signal-to-noise ratio and the naturalness of the output speech signal were significantly better than those of conventional methods.
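
    The post-Wiener filtering stage that this work builds on can be sketched per time-frequency bin as follows (a generic NumPy sketch assuming the speech/noise PSDs have already been estimated upstream; the gain floor is our illustrative choice):

    ```python
    import numpy as np

    def post_wiener_filter(beamformer_stft, speech_psd, noise_psd, floor=0.1):
        """Apply the per-bin Wiener gain H = PSD_s / (PSD_s + PSD_n).

        All inputs are (freq, time) arrays; `floor` limits the gain from
        below to reduce musical-noise artifacts.
        """
        gain = speech_psd / (speech_psd + noise_psd + 1e-12)
        return np.maximum(gain, floor) * beamformer_stft
    ```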

  • An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

    Yuya SUGIMOTO  Shigeki MIYABE  Takeshi YAMADA  Shoji MAKINO  Biing-Hwang JUANG  

     
    PAPER-Engineering Acoustics
    Vol: E99-A No:6    Page(s): 1152-1162

    MUltiple SIgnal Classification (MUSIC) is a standard technique for high-resolution direction-of-arrival (DOA) estimation. However, MUSIC cannot estimate DOAs accurately under underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants, called 2q-MUSIC, has been proposed, but this method suffers greatly from the variance of the statistics, obtained as temporal means of the observation process, and therefore requires long observations. In this paper, we propose a new approach to extending MUSIC that exploits higher-order moments of the signal for underdetermined DOA estimation with smaller variance. The proposed algorithm nonlinearly maps the observed signal onto a space of expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the mapping increases the dimensionality of the noise subspace, the proposed method enables DOA estimation under underdetermined conditions. Furthermore, we describe the class of mappings that allows the higher-order moments of the observed signal to be analyzed in the original space. We compare 2q-MUSIC and the proposed method in an experiment where the true number of sources is given as prior information, evaluating the bias-variance tradeoff of the statistics and the computational complexity. The results clarify that the proposed method is advantageous in both computational complexity and estimation accuracy for short-time analysis, i.e., when the duration of the analyzed data is short.
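
    For reference, a minimal NumPy implementation of standard narrowband MUSIC, the baseline being extended here (the nonlinear-mapping extension itself is not reproduced):

    ```python
    import numpy as np

    def music_spectrum(X, mic_pos, num_src, freq, c=343.0,
                       grid=np.deg2rad(np.arange(0, 180))):
        """Narrowband MUSIC pseudospectrum for a linear array.

        X: (num_mics, num_snapshots) complex observations at one frequency bin.
        mic_pos: (num_mics,) microphone positions along the array axis [m].
        """
        R = X @ X.conj().T / X.shape[1]        # sample spatial covariance
        w, V = np.linalg.eigh(R)               # eigenvalues in ascending order
        En = V[:, : X.shape[0] - num_src]      # noise subspace
        P = np.empty(len(grid))
        for i, th in enumerate(grid):
            # steering vector of a plane wave from angle th
            a = np.exp(-2j * np.pi * freq * mic_pos * np.cos(th) / c)
            P[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
        return grid, P                         # peaks of P indicate the DOAs
    ```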

  • Integration of Multiple Microphone Arrays and Use of Sound Reflections for 3D Localization of Sound Sources

    Carlos T. ISHI  Jani EVEN  Norihiro HAGITA  

     
    PAPER
    Vol: E97-A No:9    Page(s): 1867-1874

    We propose a method for estimating sound source positions in 3D space by integrating the sound directions estimated by multiple microphone arrays and taking advantage of reflection information. Two types of sources with different directivity properties (human speech and loudspeaker speech) were evaluated at different positions and orientations. Experimental results showed the effectiveness of using reflection information, depending on the source type and on the position and orientation of the sources relative to the arrays and walls. Using reflection information increased the source position detection rates by 10% on average, and by up to 60% in the best case.
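
    At its core, integrating directions from multiple arrays amounts to intersecting bearing lines in 3D; a minimal least-squares sketch (ours; the paper's use of reflection information is omitted):

    ```python
    import numpy as np

    def triangulate(array_pos, doa_unit_vecs):
        """Least-squares 3D source position from bearing lines.

        Each array i at p_i reports a unit direction u_i; the returned x
        minimizes the summed squared distance to the lines x = p_i + t*u_i.
        """
        A, b = np.zeros((3, 3)), np.zeros(3)
        for p, u in zip(array_pos, doa_unit_vecs):
            M = np.eye(3) - np.outer(u, u)   # projector orthogonal to the bearing
            A += M
            b += M @ p
        return np.linalg.solve(A, b)

    # Two arrays observing a source at (1, 2, 1):
    p = [np.array([0., 0., 1.]), np.array([3., 0., 1.])]
    u = [np.array([1., 2., 0.]), np.array([-2., 2., 0.])]
    u = [v / np.linalg.norm(v) for v in u]
    print(triangulate(p, u))   # -> approximately [1, 2, 1]
    ```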

  • Compressed Sampling and Source Localization of Miniature Microphone Array

    Qingyun WANG  Xinchun JI  Ruiyu LIANG  Li ZHAO  

     
    LETTER
    Vol: E97-A No:9    Page(s): 1902-1906

    In traditional microphone array signal processing, performance degrades rapidly as the array aperture decreases, a barrier that has restricted its use in small-scale acoustic systems such as digital hearing aids. In this work, a new compressed sampling method for miniature microphone arrays is proposed, which compresses the information inside the ADC by means of a combined hardware/software system in order to remove the redundancy among the array element signals. The architecture was developed in the Verilog language and has been tested on an FPGA chip. Compressed sampling and reconstruction experiments show successful sparse representation and reconstruction of speech sources. Because it avoids the singularity problem of the correlation matrix of the miniature microphone array, the proposed method achieves higher resolution than the traditional GCC and MUSIC algorithms when used for direction-of-arrival (DOA) estimation in digital hearing aids.
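
    The reconstruction stage of such a scheme can be illustrated with generic orthogonal matching pursuit (a standard sparse recovery algorithm; the paper's ADC-internal hardware front end is not modeled):

    ```python
    import numpy as np

    def omp(Phi, y, sparsity):
        """Orthogonal matching pursuit: recover a sparse x from y = Phi @ x."""
        residual, support = y.copy(), []
        for _ in range(sparsity):
            # pick the atom most correlated with the current residual
            support.append(int(np.argmax(np.abs(Phi.T @ residual))))
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            residual = y - Phi[:, support] @ coef
        x_hat = np.zeros(Phi.shape[1])
        x_hat[support] = coef
        return x_hat

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((32, 128)) / np.sqrt(32)   # measurement matrix
    x = np.zeros(128); x[[5, 40, 99]] = [1.0, -0.7, 0.5]  # 3-sparse signal
    print(np.allclose(omp(Phi, Phi @ x, 3), x, atol=1e-6))  # expect True here
    ```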

  • 3D Sound-Space Sensing Method Based on Numerous Symmetrically Arranged Microphones

    Shuichi SAKAMOTO  Satoshi HONGO  Yôiti SUZUKI  

     
    PAPER
    Vol: E97-A No:9    Page(s): 1893-1901

    Sensing and reproducing precise sound-space information is important for highly realistic audio communications. This study aims at high-precision sensing of 3D sound-space information for transmission to distant places and for preservation of sound data for the future. The proposed method uses a compact spherical object fitted with numerous microphones. The signals recorded by the microphones, which are distributed uniformly on the sphere, are simply weighted and summed to synthesize the signals presented to a listener's left and right ears. The synthesized signals are presented binaurally via ordinary binaural systems such as headphones. Moreover, the weights can be changed according to the listener's 3D head movement, which is well known to be a crucial factor in human spatial hearing; the acquired 3D sound-space information thus accurately reflects the listener's head movement. We named the proposed method SENZI (Symmetrical object with ENchased ZIllion microphones). Computer simulations demonstrate that SENZI outperforms a conventional method (binaural Ambisonics) and can sense 3D sound space with high precision over a wide frequency range.
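
    The core synthesis step is a simple per-frequency weighted sum over the microphone signals; a minimal sketch (the weight design and head-rotation handling, which are the substance of SENZI, are assumed to be given):

    ```python
    import numpy as np

    def synthesize_ear_signal(mic_spectra, weights):
        """Weighted sum of spherical-array microphone spectra for one ear.

        mic_spectra: (num_mics, num_freqs) complex STFT frame from the array.
        weights:     (num_mics, num_freqs) complex weights, pre-computed so
                     the summed response approximates the listener's HRTF for
                     the current head orientation; on head rotation, only the
                     weights are exchanged, exploiting the symmetric layout.
        """
        return np.sum(weights * mic_spectra, axis=0)   # (num_freqs,) ear signal
    ```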

  • Sound Source Orientation Estimation Based on an Orientation-Extended Beamformer

    Hirofumi NAKAJIMA  Keiko KIKUCHI  Kazuhiro NAKADAI  Yutaka KANEDA  

     
    PAPER
    Vol: E97-A No:9    Page(s): 1875-1883

    This paper proposes a sound source orientation estimation method that is suitable for distributed microphone arrangements. The proposed method is based on orientation-extended beamforming (OEBF), which has four features: (a) robustness against reverberation, (b) robustness against noise, (c) freedom in the arrangement of microphones, and (d) feasibility for real-time processing. Regarding (a) and (c), since OEBF is based on a general propagation model using transfer functions (TFs) that include all propagation phenomena, such as reflections and diffractions, OEBF incurs no model errors for these phenomena and is applicable to arbitrary microphone arrangements. Regarding (b), OEBF overcomes noise effects by incorporating three additional processes (amplitude extraction, time-frequency masking, and histogram integration), which are also proposed in this paper. As for (d), OEBF is executable in real time because its execution process is the same as that of ordinary beamforming. A numerical experiment confirmed the theoretical validity of OEBF: the results showed that OEBF estimates sound source positions and orientations very precisely. Practical experiments were carried out using a 96-channel microphone array in real environments. The results indicated that OEBF works properly even in reverberant and noisy environments, with an average estimation error of only 4°.
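
    In spirit, OEBF scores candidate (position, orientation) pairs against pre-measured transfer functions. A heavily simplified matched-filter sketch (our reading; the proposed amplitude extraction, time-frequency masking, and histogram integration are omitted):

    ```python
    import numpy as np

    def oebf_scan(X, tf_bank):
        """Score candidate (position, orientation) pairs with measured TFs.

        X:       (num_mics, num_freqs) observed spectra for one frame.
        tf_bank: dict mapping (position, orientation) -> (num_mics, num_freqs)
                 pre-measured transfer functions; since these capture
                 reflections, no free-field propagation model is assumed.
        """
        scores = {}
        for key, H in tf_bank.items():
            # matched-filter response, normalized per frequency, then summed
            num = np.abs(np.sum(np.conj(H) * X, axis=0)) ** 2
            den = (np.sum(np.abs(H) ** 2, axis=0)
                   * np.sum(np.abs(X) ** 2, axis=0) + 1e-12)
            scores[key] = float(np.sum(num / den))
        return max(scores, key=scores.get)   # best (position, orientation)
    ```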

  • Effective Frame Selection for Blind Source Separation Based on Frequency Domain Independent Component Analysis

    Yusuke MIZUNO  Kazunobu KONDO  Takanori NISHINO  Norihide KITAOKA  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics
    Vol: E97-A No:3    Page(s): 784-791

    Blind source separation is a technique that can separate sound sources without information such as the source locations, the number of sources, or the utterance content. Multichannel source separation using many microphones separates signals with high accuracy even when there are many sources, but such methods have extremely high computational complexity, which must be reduced. In this paper, we propose a method for reducing the computational complexity of blind source separation based on frequency-domain independent component analysis (FDICA) and examine which temporal data are effective for source separation. Frames containing many sound sources are effective for FDICA source separation; we assume that a frame with low kurtosis contains many sound sources and preferentially select such frames. In the proposed method, we used the log power spectrum and the kurtosis of the magnitude distribution of the observed data as selection criteria and conducted source separation experiments using speech signals from twelve speakers. We evaluated separation performance by the signal-to-interference ratio (SIR) improvement score. The SIR improvement was 24.3 dB when all frames were used and 23.3 dB when the 300 frames selected by our criteria were used. These results confirm that the proposed selection criteria based on kurtosis and magnitude are effective. Furthermore, the computational complexity was reduced significantly because it is proportional to the number of selected frames.
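
    The kurtosis-based selection criterion is straightforward to sketch (assuming SciPy; the paper's log-power-spectrum criterion is applied analogously):

    ```python
    import numpy as np
    from scipy.stats import kurtosis

    def select_frames(spectrogram, num_frames):
        """Select the frames whose magnitude distribution has the lowest kurtosis.

        spectrogram: (num_freqs, total_frames) magnitude STFT of one channel.
        Low kurtosis is taken as a cue that many sources overlap in the frame.
        """
        k = kurtosis(spectrogram, axis=0)      # one kurtosis value per frame
        order = np.argsort(k)                  # ascending: flattest frames first
        return np.sort(order[:num_frames])     # keep the temporal order
    ```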

  • An Estimation Method of Sound Source Orientation Using Eigenspace Variation of Spatial Correlation Matrix

    Kenta NIWA  Yusuke HIOKA  Sumitaka SAKAUCHI  Ken'ichi FURUYA  Yoichi HANEDA  

     
    PAPER-Engineering Acoustics
    Vol: E96-A No:9    Page(s): 1831-1839

    A method is proposed for estimating the sound source orientation in a reverberant room using a microphone array. We extend the conventional image-method model of a room transfer function to take into account the directivity of the sound source. With this extension, the transfer function between a sound source and a listener (or a microphone) is described as the superposition of the transfer functions from each image source to the listener, each multiplied by the source directivity; thus, the source orientation can be estimated by analyzing, from the observed signals, how the image sources are distributed (the power distribution of the image sources). We apply eigenvalue analysis to the spatial correlation matrix of the microphone array observations to obtain this power distribution. Based on the assumption that the spatial correlation matrix for each pair of source position and orientation is known a priori, the variation of the eigenspace can be modeled. By comparing the eigenspace of the observed signals with that of pre-learned models, we estimate the sound source orientation. In experiments using seven microphones, the source orientation was estimated with high accuracy, and accuracy increased with the reverberation time of the room.
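
    Comparing the observed eigenspace with pre-learned ones can be illustrated with a principal-angle distance; a minimal sketch (the paper's model of eigenspace variation is richer than this single number):

    ```python
    import numpy as np

    def subspace_distance(R_obs, R_model, dim):
        """Distance between dominant eigenspaces of two spatial correlation
        matrices; the candidate (position, orientation) whose pre-learned
        R_model minimizes it would be selected.
        """
        def dominant(R):
            w, V = np.linalg.eigh(R)
            return V[:, -dim:]                 # top-`dim` eigenvectors
        U, W = dominant(R_obs), dominant(R_model)
        s = np.linalg.svd(U.conj().T @ W, compute_uv=False)
        # chordal subspace distance sqrt(sum of sin^2 of principal angles)
        return np.sqrt(max(dim - np.sum(s ** 2), 0.0))
    ```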

  • Multichannel Two-Stage Beamforming with Unconstrained Beamformer and Distortion Reduction

    Masahito TOGAMI  Yohei KAWAGUCHI  Yasunari OBUCHI  

     
    PAPER-Engineering Acoustics
    Vol: E96-A No:4    Page(s): 749-761

    This paper proposes a novel multichannel speech enhancement technique for reverberant rooms that is effective when the noise sources are spatially stationary, such as projector fan noise, air-conditioner noise, and unwanted speech sources behind the microphones. The speech enhancement performance of the conventional multichannel Wiener filter (MWF) degrades when the signal-to-noise ratio (SNR) of the current microphone input differs from that of the noise-only period. Furthermore, the MWF structure is computationally inefficient, because the MWF periodically updates the whole spatial beamformer to track switching of the speakers (e.g., turn-taking). In contrast, the proposed method reduces noise independently of the SNR. It has a novel two-stage structure that reduces noise and then distortion of the desired source signal in cascade, using two different beamformers. The first beamformer focuses on noise reduction without any constraint on the desired source, which makes it insensitive to SNR variation; however, its output signal is distorted. The second beamformer focuses on reducing this distortion of the desired source signal, and complete elimination of the distortion is theoretically assured. Additionally, the proposed method has a computationally efficient structure optimized for spatially stationary noise reduction problems: the first beamformer is updated only when the speech enhancement system is initialized, and only the second beamformer is updated periodically to track the active speaker. Experimental results indicate that the proposed method effectively reduces spatially stationary noise source signals with less distortion of the desired source signal, even in a reverberant conference room.

  • Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation

    Kai LI  Yanmeng GUO  Qiang FU  Junfeng LI  Yonghong YAN  

     
    PAPER-Speech and Hearing
    Vol: E95-D No:5    Page(s): 1454-1464

    Traditional two-microphone noise reduction algorithms for highly nonstationary directional noises generally use direction-of-arrival or phase difference information. The performance of these algorithms deteriorates when diffuse noise coexists with nonstationary directional noises in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm that uses a spatial-information-based speech estimator and a spatial-information-controlled soft-decision noise estimator to improve noise reduction performance in realistic nonstationary noisy environments. A target presence probability estimator based on Bayes' rule, using both the phase difference and the magnitude squared coherence, is proposed for the soft-decision noise estimation, so that the two cues contribute their complementary advantages when both directional and diffuse noises are present. The proposed two-microphone algorithm is evaluated in terms of noise reduction, log-spectral distance (LSD), and the word recognition rate (WRR) of a distant-talking ASR system in a noisy real-room environment. Experimental results show that the proposed algorithm achieves better noise suppression than comparative dual-channel noise reduction algorithms without further distorting the desired signal components.
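
    A toy version of fusing the two cues is sketched below with hard thresholds (our simplification; the paper combines them probabilistically via Bayes' rule):

    ```python
    import numpy as np

    def target_presence(X1, X2, expected_phase, msc_th=0.6, phase_th=0.5):
        """Per-frequency target-presence decision from two spatial cues.

        X1, X2: (num_freqs, num_frames) complex STFT blocks of the two mics.
        expected_phase: (num_freqs,) inter-mic phase for the target direction.
        """
        cross = np.mean(X1 * np.conj(X2), axis=1)   # averaged cross-spectrum
        msc = np.abs(cross) ** 2 / (np.mean(np.abs(X1) ** 2, axis=1)
                                    * np.mean(np.abs(X2) ** 2, axis=1) + 1e-12)
        phase_err = np.abs(np.angle(cross * np.exp(-1j * expected_phase)))
        directional = msc > msc_th           # coherent, i.e., not diffuse noise
        from_target = phase_err < phase_th   # consistent with the target DOA
        return (directional & from_target).astype(float)
    ```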

  • Blind Source Separation Using Dodecahedral Microphone Array under Reverberant Conditions

    Motoki OGASAWARA  Takanori NISHINO  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics
    Vol: E94-A No:3    Page(s): 897-906

    The separation and localization of sound source signals are important techniques for many applications, such as highly realistic communication and speech recognition systems. Such systems are expected to work without prior information such as the number of sound sources or the environmental conditions. In this paper, we developed a dodecahedral microphone array and propose a novel separation method for this device. The method draws on human sound localization cues and uses the acoustical characteristics imparted by the shape of the dodecahedral microphone array. Moreover, it includes a method for estimating the number of sound sources that operates without prior information. Sound source separation performance was evaluated under simulated and actual reverberant conditions, and the results were compared with a conventional method. The experimental results showed that the proposed method outperformed the conventional method.

  • Improving Power Spectra Estimation in 2-Dimensional Areas Using Number of Active Sound Sources

    Yusuke HIOKA  Ken'ichi FURUYA  Yoichi HANEDA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics
    Vol: E94-A No:1    Page(s): 273-281

    An improved method for estimating the power spectra of sound sources located in a particular 2-dimensional area is proposed. We previously proposed a method that estimates sound power spectra using multiple fixed beamformers in order to emphasize speech located in a particular 2-dimensional area. However, that method has a drawback: the number of areas containing active sound sources must be restricted. This restriction makes the method less effective when many noise sources located in different areas are active simultaneously. In this paper, we reveal the cause of this restriction and derive the maximum number of areas for which the method can simultaneously estimate sound power spectra. We then introduce a procedure for detecting the areas that contain active sound sources, which reduces the number of unknown power spectra to be estimated. The effectiveness of the proposed method is examined through experimental evaluation on sounds recorded in a practical environment.
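
    Per frequency bin, the estimation reduces to a small linear system relating beamformer output powers to per-area source powers, which is solvable only while the active areas do not outnumber the beams; a sketch using SciPy's nonnegative least squares:

    ```python
    import numpy as np
    from scipy.optimize import nnls

    def area_power_spectra(beam_powers, gain_matrix):
        """Recover per-area source powers from fixed-beamformer output powers.

        beam_powers: (num_beams,) measured output power of each fixed
                     beamformer at one frequency bin.
        gain_matrix: (num_beams, num_areas) directivity gain of each
                     beamformer toward each area. Restricting the estimate to
                     areas detected as active keeps the system well-posed.
        """
        powers, _ = nnls(gain_matrix, beam_powers)   # powers must be nonnegative
        return powers
    ```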

  • Distant Speech Recognition Using a Microphone Array Network

    Alberto Yoshihiro NAKANO  Seiichi NAKAGAWA  Kazumasa YAMAMOTO  

     
    PAPER-Microphone Array
    Vol: E93-D No:9    Page(s): 2451-2462

    In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the time delays for a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, in turn, is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel within a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated; it shows better performance than conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
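
    The position-based delay refinement feeds a standard delay-and-sum beamformer, sketched below (the ANN position estimator and the GMM-based CMN are not modeled):

    ```python
    import numpy as np

    def delay_and_sum(mic_signals, mic_pos, src_pos, fs, c=343.0):
        """Delay-and-sum beamformer with delays derived from an estimated
        3D source position (here, the position the ANN would supply).

        mic_signals: (num_mics, num_samples); mic_pos: (num_mics, 3);
        src_pos: (3,); fs: sampling rate [Hz]; c: speed of sound [m/s].
        """
        dists = np.linalg.norm(mic_pos - src_pos, axis=1)
        delays = (dists - dists.min()) / c        # relative delays [s]
        n = mic_signals.shape[1]
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        out = np.zeros(n)
        for x, tau in zip(mic_signals, delays):
            # advance each channel by its relative delay to align wavefronts
            X = np.fft.rfft(x) * np.exp(2j * np.pi * freqs * tau)
            out += np.fft.irfft(X, n)
        return out / len(mic_signals)
    ```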

  • Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise

    Tetsuji OGAWA  Shintaro TAKADA  Kenzo AKAGIRI  Tetsunori KOBAYASHI  

     
    PAPER-Speech and Hearing
    Vol: E93-A No:5    Page(s): 926-935

    We propose a new speech enhancement method suitable for mobile devices used in the presence of various types of noise. To achieve high-performance speech recognition and auditory perception on mobile devices, various types of noise must be removed under the constraints of a space-saving microphone arrangement and limited computational resources. The proposed method reduces both directional and diffuse noise under these constraints by employing a square microphone array and low-computational-cost processing consisting of multiple null beamforming, minimum-power channel selection, and Wiener filtering. The effectiveness of the proposed method is verified experimentally in terms of speech recognition accuracy and speech quality when directional and diffuse noise are observed simultaneously: compared with conventional methods, it reduces word errors and improves log-spectral distances.
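
    A crude illustration of the null-beamforming-plus-minimum-power-selection idea, using plain microphone-pair differences (our toy version; the actual beam designs, delay compensation, and the Wiener post-filter are omitted):

    ```python
    import numpy as np

    def min_power_null_select(mic_spectra, pairs):
        """Form simple two-mic null beams and keep the minimum-power one per bin.

        mic_spectra: (num_mics, num_freqs) spectra of the array for one frame.
        pairs: list of (i, j) mic index pairs; each difference X_i - X_j
        places a spatial null whose direction depends on the pair geometry.
        The minimum-power output per bin is the beam whose null best
        suppresses the directional noise; Wiener filtering would then be
        applied to attenuate the remaining diffuse noise.
        """
        nulls = np.stack([mic_spectra[i] - mic_spectra[j] for i, j in pairs])
        pick = np.argmin(np.abs(nulls) ** 2, axis=0)   # per-frequency choice
        return nulls[pick, np.arange(nulls.shape[1])]
    ```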

  • Probabilistic Adaptation Mode Control Algorithm for GSC-Based Noise Reduction

    Seungho HAN  Jungpyo HONG  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing
    Vol: E93-A No:3    Page(s): 627-630

    An efficient noise reduction algorithm is proposed to improve speech recognition performance for human-machine interfaces. In the algorithm, a probabilistic adaptation mode controller (AMC) is designed and applied to the generalized sidelobe canceller (GSC). To detect target speech intervals, the proposed AMC calculates the inter-channel correlation and estimates the speech absence probability (SAP). Based on the SAP, the adaptation mode of the adaptive filter in the GSC is controlled. Experimental results show that the proposed algorithm significantly improves speech recognition performance and signal-to-noise ratios in real noisy environments.
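
    The adaptation mode control can be sketched as an SAP-gated NLMS update of the GSC's noise canceller (a simplified stand-in for the proposed probabilistic AMC):

    ```python
    import numpy as np

    def gsc_adaptive_update(w, ref_block, err, sap, mu=0.5, sap_th=0.8):
        """One NLMS update of the GSC's adaptive noise canceller, gated by
        the speech absence probability.

        w:         current adaptive filter taps.
        ref_block: noise-reference samples aligned with the current output.
        err:       current GSC output (error) sample.
        sap:       estimated speech absence probability for this frame.
        """
        if sap > sap_th:                        # adapt only when speech is absent
            norm = ref_block @ ref_block + 1e-8
            w = w + mu * err * ref_block / norm
        return w                                # else freeze to avoid target leakage
    ```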

  • A Single-Chip Speech Dialogue Module and Its Evaluation on a Personal Robot, PaPeRo-Mini

    Miki SATO  Toru IWASAWA  Akihiko SUGIYAMA  Toshihiro NISHIZAWA  Yosuke TAKANO  

     
    PAPER-Digital Signal Processing
    Vol: E93-A No:1    Page(s): 261-271

    This paper presents a single-chip speech dialogue module and its evaluation on a personal robot. The module is implemented on an application processor originally developed for mobile phones, providing compact size, low power consumption, and low cost. It performs speech recognition with preprocessing functions such as direction-of-arrival (DOA) estimation, noise cancellation, beamforming with a microphone array, and echo cancellation; text-to-speech (TTS) conversion is also provided. Evaluation results on a new personal robot, PaPeRo-mini, a scaled-down version of PaPeRo, demonstrate an 85% correct rate in DOA estimation, and speech recognition rates that are as much as 54% and 30% higher in noisy environments and during robot utterances, respectively. These results are shown to be comparable to those obtained with PaPeRo.

  • Robust Relative Transfer Function Estimation for Dual Microphone-Based Generalized Sidelobe Canceller

    Kihyeon KIM  Hanseok KO  

     
    LETTER-Speech and Hearing
    Vol: E92-D No:9    Page(s): 1794-1797

    In this letter, a robust system identification method is proposed for the generalized sidelobe canceller using dual microphones. The conventional transfer-function generalized sidelobe canceller exploits the non-stationarity of the speech signal to estimate the relative transfer function and is therefore difficult to apply when the noise is also non-stationary. Under the assumption of W-disjoint orthogonality between the speech and the non-stationary noise, the proposed algorithm finds the speech-dominant time-frequency bins of the input signal by inspecting the system output and the inter-microphone time delay. Only these bins are used to estimate the relative transfer function, so reliable estimates can be obtained under non-stationary noise conditions. Experimental results show that the proposed algorithm significantly improves the performance of the transfer-function generalized sidelobe canceller while sustaining only a modest estimation error in adverse non-stationary noise environments.
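
    Restricting the estimate to speech-dominant bins can be sketched as a masked cross-spectral ratio (the W-disjoint-orthogonality test that produces the mask is assumed given):

    ```python
    import numpy as np

    def estimate_rtf(X1, X2, speech_mask):
        """Relative transfer function from speech-dominant time-frequency bins.

        X1, X2: (num_freqs, num_frames) STFTs of the two microphones.
        speech_mask: boolean array of the same shape, true where the bin was
        declared speech-dominant.
        """
        num = np.sum(np.where(speech_mask, X2 * np.conj(X1), 0.0), axis=1)
        den = np.sum(np.where(speech_mask, np.abs(X1) ** 2, 0.0), axis=1)
        return num / (den + 1e-12)   # per-bin H(f) = S_x2x1(f) / S_x1x1(f)
    ```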

  • Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems

    Yasunari OBUCHI  Nobuo HATAOKA  

     
    PAPER-Speech and Hearing
    Vol: E92-D No:4    Page(s): 662-670

    In this paper we describe a new framework for feature combination in the cepstral domain for multi-input robust speech recognition. Working in the cepstral domain has various advantages over working in the time or hypothesis domain: it is stable, easy to maintain, and less expensive because it does not require precise calibration, and it is easy to configure within a complex speech recognition system. However, it is not straightforward to improve recognition performance by increasing the number of inputs, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we exploit another advantage of working in the cepstral domain: speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also describe automatic optimization of some parameters of the proposed algorithms.
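
    The variance re-scaling idea can be sketched as follows (our illustrative choice of target variance; the paper's exact re-scaling rule may differ):

    ```python
    import numpy as np

    def combine_cepstra(features, target_std=None):
        """Average cepstral features over inputs, then re-scale the variance.

        features: (num_inputs, num_frames, num_ceps). Plain averaging shrinks
        the variance of the combined feature, mismatching the acoustic model;
        re-scaling restores a per-dimension standard deviation (here, the
        mean of the single-input standard deviations, as one plausible target).
        """
        avg = features.mean(axis=0)                          # (frames, ceps)
        if target_std is None:
            target_std = features.std(axis=1).mean(axis=0)   # (ceps,)
        mu = avg.mean(axis=0)
        return mu + (avg - mu) * target_std / (avg.std(axis=0) + 1e-12)
    ```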

