Kenta NIWA Yusuke HIOKA Sumitaka SAKAUCHI Ken'ichi FURUYA Yoichi HANEDA
A method is proposed for estimating the orientation of a sound source in a reverberant room using a microphone array. We extend the conventional image-method model of a room transfer function to take the directivity of the sound source into account. With this extension, the transfer function between a sound source and a listener (or a microphone) is described as a superposition of the transfer functions from each image source to the listener, each weighted by the source directivity; the source orientation can therefore be estimated by analyzing, from the observed signals, how power is distributed over the image sources. We apply eigenvalue analysis to the spatial correlation matrix of the microphone array observations to obtain this power distribution. Based on the assumption that the spatial correlation matrix for each pair of source position and orientation is known a priori, the variation of the eigenspace can be modeled. The source orientation is then estimated by comparing the eigenspace of the observed signals with those of the pre-learned models. In experiments using seven microphones, the source orientation was estimated with high accuracy, and the accuracy improved as the reverberation time of the room increased.
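A minimal Python/NumPy sketch of the eigenspace-comparison step is given below. It is not the authors' implementation: the pre-learned model dictionary, the signal-subspace rank, and all function names are illustrative assumptions. Each candidate orientation is scored by the overlap between the observed signal subspace and a pre-learned subspace.

    import numpy as np

    def spatial_correlation(X):
        """X: (frames, mics) complex STFT snapshots at one frequency bin."""
        return (X.conj().T @ X) / X.shape[0]          # (mics, mics), Hermitian

    def orientation_score(R_obs, U_model, rank=3):
        """Overlap between observed and pre-learned eigenspaces.
        U_model: (mics, rank) leading eigenvectors learned offline for one
        candidate (position, orientation) pair -- a hypothetical dictionary."""
        w, V = np.linalg.eigh(R_obs)                  # eigenvalues in ascending order
        U_obs = V[:, -rank:]                          # leading signal subspace
        # Squared Frobenius norm of U_model^H U_obs measures subspace overlap.
        return np.linalg.norm(U_model.conj().T @ U_obs, "fro") ** 2

    def estimate_orientation(X, models):
        """models: dict mapping an orientation label to its eigenvector matrix."""
        R = spatial_correlation(X)
        return max(models, key=lambda label: orientation_score(R, models[label]))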
Kenta NIWA Takanori NISHINO Kazuya TAKEDA
A sound field reproduction method is proposed that uses blind source separation and head-related transfer functions. In the proposed system, multichannel acoustic signals captured by distant microphones are decomposed into a set of location/signal pairs of virtual sound sources using frequency-domain independent component analysis. After the locations and signals of the virtual sources have been estimated, the spatial sound at a selected point is constructed by convolving the corresponding acoustic transfer functions with each virtual-source signal. In experiments, a sound field produced by six sound sources was captured with 48 distant microphones and decomposed into sets of virtual sound sources. Subjective evaluation showed no significant difference between the natural and the reconstructed sound when six virtual sources were used, confirming the effectiveness of both the decomposition algorithm and the virtual-source representation.
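The final rendering stage can be sketched as follows (Python/NumPy/SciPy). The sketch assumes the virtual-source signals have already been separated by the frequency-domain ICA step and that impulse responses for the estimated source locations (e.g., head-related impulse responses) are available; all names and shapes are illustrative assumptions, not the authors' code.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(sources, hrirs):
        """sources: list of 1-D arrays (separated virtual-source signals).
        hrirs: list of (left, right) impulse-response pairs, one per estimated
        virtual-source location. Returns a (samples, 2) binaural signal at the
        selected listening point."""
        n = max(len(s) + max(len(hl), len(hr)) - 1
                for s, (hl, hr) in zip(sources, hrirs))
        out = np.zeros((n, 2))
        for s, (hl, hr) in zip(sources, hrirs):
            out[: len(s) + len(hl) - 1, 0] += fftconvolve(s, hl)   # left ear
            out[: len(s) + len(hr) - 1, 1] += fftconvolve(s, hr)   # right ear
        return out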
Kento OHTANI Kenta NIWA Kazuya TAKEDA
A single-dimensional interface that enables users to obtain diverse auditory localizations of audio sources is proposed. Many conventional interfaces for arranging audio sources expose multiple arrangement parameters, some of which control the positions of the sources. However, it is difficult for users unfamiliar with such systems to optimize these parameters because the number of possible settings is enormous. We propose a simple, single-dimensional interface for adjusting the arrangement parameters, allowing users to sample several diverse audio source arrangements and easily find their preferred auditory localization. To select subsets of arrangement parameters from all possible choices, auditory-localization space vectors (ASVs) are defined to represent the auditory localization produced by each arrangement parameter. By selecting subsets of ASVs that are approximately orthogonal, we can choose arrangement parameters that produce diverse auditory localizations. Experimental evaluations were conducted using music composed of three audio sources, and subjective evaluations confirmed that novice users can obtain diverse localizations using the proposed interface.
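One way to realize the approximately-orthogonal subset selection can be sketched as follows (Python/NumPy); the greedy worst-case-cosine rule here is an illustrative assumption, not necessarily the authors' selection procedure.

    import numpy as np

    def select_diverse(asvs, k):
        """asvs: (n, d) matrix, one ASV per arrangement parameter; assumes k <= n.
        Greedily keeps the vector least aligned with those already chosen."""
        unit = asvs / np.linalg.norm(asvs, axis=1, keepdims=True)
        chosen = [0]                                      # start from the first ASV
        while len(chosen) < k:
            sim = np.abs(unit @ unit[chosen].T).max(axis=1)   # worst-case cosine
            sim[chosen] = np.inf                          # never re-pick a chosen ASV
            chosen.append(int(sim.argmin()))              # most orthogonal remaining
        return chosen                                     # indices of diverse parameters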
Tomoko KAWASE Kenta NIWA Masakiyo FUJIMOTO Kazunori KOBAYASHI Shoko ARAKI Tomohiro NAKATANI
We propose a microphone array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation with statistical speech-model-based PSD estimation. The goal of this research is to pick up target speech clearly even in noisy environments such as crowded places, factories, and cars traveling at high speed. Beamforming with post-Wiener filtering is commonly used in conventional studies on microphone-array noise reduction. Calculating a Wiener filter requires the speech and noise PSDs, which are estimated from spatial cues in the microphone observations. As long as the sound sources are sparse in the temporal-spatial domain, the speech and noise PSDs can be estimated accurately; however, the estimation errors grow when this assumption does not hold. In this study, we integrate a statistical speech model with the PSD-estimation-in-beamspace method to correct these speech/noise PSD estimation errors. A rough noise PSD estimate is obtained frame by frame by analyzing spatial cues from the array observations. By combining this noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, can be described without pre-training. Using Bayes' theorem, a Wiener filter is then derived statistically from the observation model. Experiments conducted to evaluate the proposed method showed that the signal-to-noise ratio and the naturalness of the output speech were significantly better than those obtained with conventional methods.
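The post-Wiener-filtering step can be sketched as follows (Python/NumPy), assuming per-bin speech and noise PSD estimates are available from the integration described above; the gain floor and all names are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def wiener_postfilter(Y, psd_speech, psd_noise, floor=0.05):
        """Y: (frames, bins) complex STFT of the beamformer output.
        psd_speech, psd_noise: (frames, bins) nonnegative PSD estimates."""
        gain = psd_speech / np.maximum(psd_speech + psd_noise, 1e-12)
        return np.maximum(gain, floor) * Y            # floored Wiener gain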