Hironobu TAKAHASHI Yoiti SUZUKI Shouichi TAKANE Futoshi ASANO
A new method for active suppression of reflected sound waves is proposed in this paper. The proposed control system is based on the state feedback control. FEM (Finite Element Method) was applied to represent the sound field under the system equations as proposed by Samejima et al. A new performance index was derived so as to minimize the sound intensity leaving a control region, which was set around the control source on a wall. On the basis of the system equations and the new performance index, an optimal feedback law governing suppression of waves reflected from the wall was derived. In order to evaluate the validity of the proposed method, computer simulations in one- and two-dimensional sound fields were executed. In a one-dimensional sound field, the time response was examined, and the distribution of the instantaneous sound intensity was evaluated in a two-dimensional sound field. The results showed that the reflected sound waves can be suppressed quite well in one-dimensional sound fields by using this method and that the proposed method can potentially suppress the reflected sound waves in the two-dimensional sound fields as well.
Futoshi ASANO Hideki ASOH Toshihiro MATSUI
As a preprocessor for an automatic speech recognizer in a noisy environment, a microphone array system has been investigated to reduce the environmental noise. In usual microphone array design, a plane wave is assumed for the sake of simplicity (far-field assumption). However, this far-field assumption does not always hold, resulting in distortion in the array output. In this report, the subspace method, which is one of the high-resolution spectrum estimators, is applied to the near-field source localization problem. A high-resolution method is necessary especially for near-field source localization with a small-sized array. By combining the source localization technique with a spatial inverse filter, the signals coming from multiple sources in the near-field range can be separated. The modified minimum variance beamformer is used to design the spatial inverse filter. In an experiment in a real environment with two sound sources in the near-field range, a word recognition rate of 60-70% was achieved.
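The difference between the far-field (plane wave) and near-field (spherical wave) models can be illustrated with steering vectors. The following is a minimal sketch, not the authors' implementation: the function names, the two-dimensional geometry, and the speed of sound value are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def near_field_steering(mics, source, freq_hz):
    """Spherical-wave model: amplitude 1/r and phase -k*r for each microphone."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    vec = []
    for mx, my in mics:
        r = math.hypot(source[0] - mx, source[1] - my)  # source-to-microphone distance
        vec.append(cmath.exp(-1j * k * r) / r)
    return vec


def far_field_steering(mics, angle_rad, freq_hz):
    """Plane-wave model: unit amplitude, phase set by the arrival direction only."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)  # unit vector toward the source
    return [cmath.exp(1j * k * (mx * ux + my * uy)) for mx, my in mics]
```

For a source far from the array, the inter-microphone ratios of the two models nearly coincide; for a nearby source they differ in both amplitude and phase, which is the mismatch the abstract attributes to the far-field assumption.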
Yoiti SUZUKI Shinji TSUKUI Futoshi ASANO Ryouichi NISHIMURA Toshio SONE
A new method of designing a microphone array with two outputs preserving binaural information is proposed in this paper. This system employs adaptive beamforming using multiple constraints. The binaural cues may be preserved in the two outputs by use of these multiple constraints, while simultaneous beamforming to enhance target signals is also available. A computer simulation was conducted to examine the performance of the beamforming. The results showed that the proposed array can perform both the generation of the binaural cues and the beamforming as intended. In particular, beamforming with double constraints exhibits the best performance; the DI is around 7 dB, and good interchannel (interaural) time/phase and level differences are generated within a target region in front. With triple constraints, however, the performance of the beamforming becomes poorer while the binaural information is better realized. The setting of the desired responses to give proper binaural information seems to become critical as the number of constraints increases.
In this paper, signal processing techniques which can be applied to automatic speech recognition to improve its robustness are reviewed. The choice of signal processing techniques is strongly dependent on the scenario of the applications. The analysis of scenario and the choice of suitable signal processing techniques are shown through two examples.
Ryouichi NISHIMURA Futoshi ASANO Yoiti SUZUKI Toshio SONE
A new speech enhancement technique is proposed assuming that a speech signal is represented in terms of a linear probabilistic process and that a noise signal is represented in terms of a stationary random process. Since the target signal, i.e., speech, cannot be represented by a stationary random process, a Wiener filter does not yield an optimum solution to this problem regarding the minimum mean variance. Instead, a Kalman filter may provide a suitable solution in this case. In the Kalman filter, a signal is represented as a sequence of varying state vectors, and the transition is dominated by transition matrices. Our proposal is to construct the state vectors as well as the transition matrices based on time-frequency pattern of signals calculated by a wavelet transformation (WT). Computer simulations verify that the proposed technique has a high potential to suppress noise signals.
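The predict/update cycle of a Kalman filter can be sketched for a scalar state. This is a generic textbook form, not the wavelet-based construction of state vectors and transition matrices proposed in the abstract; the function name and the parameters `a`, `q`, `r`, `x0`, and `p0` are assumptions made for illustration.

```python
def kalman_scalar(observations, a=1.0, q=0.0, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w, y[k] = x[k] + v.

    q is the variance of the process noise w, r the variance of the
    observation noise v, and (x0, p0) the initial estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction: propagate the state and its variance through the transition.
        x = a * x
        p = a * a * p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

In the vector case the scalar `a` becomes a transition matrix, which is the quantity the abstract proposes to derive from the time-frequency pattern of the signal.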
Futoshi ASANO Yoiti SUZUKI Toshio SONE
The convergence characteristics of the adaptive beamformer with the RLS algorithm are analyzed in this paper. In the case of the RLS adaptive beamformer, the convergence characteristics are significantly affected by the spatial characteristics of the signals and noises in the environment. The purpose of this paper is to show how these physical parameters affect the convergence characteristics. In this paper, a typical environment where a few directional noises are accompanied by background noise is assumed, and the influence of each component of the environment is analyzed separately using rank analysis of the correlation matrix. For directional components, the convergence speed is faster for a smaller number of noise sources since the effective rank of the input correlation matrix is reduced. In the presence of background noise, the convergence speed is slowed down due to the increase of the effective rank. However, the convergence speed can be improved by controlling the initial matrix of the RLS algorithm. The latter section of this paper focuses on the physical interpretation of this initial matrix, in an attempt to elucidate the mechanism of the convergence characteristics.
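To show where the initial matrix enters the RLS recursion, the following is a minimal real-valued sketch for system identification rather than the complex-valued adaptive beamformer analyzed in the paper. The function name and the parameters `delta` and `forgetting` are assumptions made for illustration; the initial matrix is taken as P(0) = I / delta, a common choice.

```python
def rls_identify(inputs, desired, n_taps, delta=1e-2, forgetting=1.0):
    """Real-valued RLS. inputs is a list of length-n_taps vectors, desired a list of scalars."""
    w = [0.0] * n_taps
    # Initial matrix P(0) = I / delta: a small delta gives a large initial P.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(n_taps)]
         for i in range(n_taps)]
    for x, d in zip(inputs, desired):
        Px = [sum(P[i][j] * x[j] for j in range(n_taps)) for i in range(n_taps)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(n_taps))
        k = [v / denom for v in Px]                        # gain vector
        err = d - sum(w[i] * x[i] for i in range(n_taps))  # a priori error
        w = [w[i] + k[i] * err for i in range(n_taps)]
        # P <- (P - k x^T P) / forgetting; P is symmetric, so x^T P equals (P x)^T.
        P = [[(P[i][j] - k[i] * Px[j]) / forgetting for j in range(n_taps)]
             for i in range(n_taps)]
    return w
```

With this initialization the recursion minimizes the weighted squared error plus a regularization term proportional to `delta`, so the initial matrix acts like a prior on the weights whose influence fades as data accumulate.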
Kazutaka ABE Futoshi ASANO Yoiti SUZUKI Toshio SONE
In the conventional sound field reproduction system with control of the transfer functions from the source to both ears of a listener, a slight shift of the ears caused by movement of the listener inevitably results in sound localization being different from that expected. In this paper, a method for reproducing a sound field by controlling the transfer function from the source to multiple points (called the "method of multiple-points control" hereafter) is applied to a sound reproduction system with the aim of expanding the area which can be controlled. The system is controlled so that the transfer functions from the input of the system to the multiple points adjacent to the original receiving points have the same desired transfer function. By placing the control points at appropriate intervals, a "zone of equalization" is formed. Based on a computer simulation, the intervals between control points are discussed. The configuration of the loudspeakers for sound reproduction is also discussed.
Nobuhiko KITAWAKI Takeshi YAMADA Futoshi ASANO
Appropriate test signals defined by formula or generated by algorithm are used for measuring objective QoS (Quality of Service) for voice-operated telecommunication devices such as telephones and speech codecs (coder-decoders). However, the test signal for measuring residual echo characteristics in hands-free telecommunications equipped with an acoustic echo canceller is still under study in ITU-T Recommendation G.167. This paper describes a comparative assessment of test signals for the measurement of residual echo characteristics. In hands-free telecommunications, acoustic echo cancellers have been developed to remove the room echo signal that passes from the loudspeaker to the microphone at the receiving end. The performance of an echo canceller system is evaluated by residual echo characteristics expressed as echo return loss enhancement (ERLE). The ERLE is conventionally measured by feeding white noise into the echo canceller system. However, white noise is not adequate as the test signal for measuring the performance of the echo canceller, since the performance may depend on the characteristics of the input test signal, and the characteristics of white noise differ from those of real voice. Therefore, this paper discusses the characteristics of real voice required for objective quality evaluation of echo canceller systems. The test signals used in the verification tests were real voice (RV), white noise (WN), frequency weighted noise (FWN), artificial voice (AV), and composite source signal (CSS), selected according to their degree of approximation of real voice characteristics. The comparative assessment showed that the ERLE characteristics measured with artificial voice conforming to ITU-T Recommendation P.50, which has the average characteristics of real voices in the time and frequency domains, are almost equivalent to those measured with real voice and are the best among the test signals. It is concluded that the P.50 artificial voice is satisfactory for the measurement of residual echo characteristics.
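ERLE is commonly computed as the ratio, in decibels, of the echo power at the microphone to the residual power after cancellation. The sketch below assumes that definition; the function name and the use of whole-signal average power (rather than a short-time measure) are assumptions made for illustration.

```python
import math


def erle_db(echo, residual):
    """ERLE in dB: 10*log10(mean echo power / mean residual echo power)."""
    p_echo = sum(v * v for v in echo) / len(echo)
    p_res = sum(v * v for v in residual) / len(residual)
    return 10.0 * math.log10(p_echo / p_res)
```

Because the residual depends on how well the canceller adapts to the excitation, the same canceller can yield different ERLE values for different test signals, which is why the choice of test signal matters.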
Hack-Yoon KIM Futoshi ASANO Yoiti SUZUKI Toshio SONE
In this paper, a new spectral subtraction technique with two microphone inputs is proposed. In conventional spectral subtraction using a single microphone, the averaged noise spectrum is subtracted from the observed short-time input spectrum. This reduces only the mean value of the noise spectrum, while the component varying around the mean value remains intact. In the method proposed in this paper, the short-time noise spectrum excluding the speech component is estimated by introducing the blocking matrix used in the Griffiths-Jim-type adaptive beamformer with two microphone inputs, combined with the spectral compensation technique. By subtracting the estimated short-time noise spectrum from the input spectrum, not only the mean value of the noise spectrum but also the component varying around the mean value can be reduced. This method can be interpreted as a partial construction of the adaptive beamformer where only the amplitude of the short-time noise spectrum is estimated, while the adaptive beamformer is equivalent to the estimator of the complex short-time noise spectrum. By limiting the estimation to the amplitude spectrum, the proposed system achieves better performance than the adaptive beamformer when the number of sound sources exceeds the number of microphones.
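The limitation of conventional single-microphone spectral subtraction described above can be seen in a few lines. This sketch covers only the conventional baseline, not the proposed two-microphone method; the function names and the zero floor are assumptions made for illustration, and the spectra are given directly as magnitude lists.

```python
def average_noise_magnitude(noise_frames):
    """Per-bin mean magnitude over noise-only frames."""
    n = len(noise_frames)
    return [sum(frame[b] for frame in noise_frames) / n
            for b in range(len(noise_frames[0]))]


def spectral_subtract(frame_mag, noise_mean_mag, floor=0.0):
    """Subtract the averaged noise magnitude from one short-time frame, bin by bin."""
    return [max(m - n, floor) for m, n in zip(frame_mag, noise_mean_mag)]


# Two noise-only frames whose first bin fluctuates around a mean of 2.0.
noise_frames = [[1.0, 2.0], [3.0, 2.0]]
mean_mag = average_noise_magnitude(noise_frames)
residual = spectral_subtract(noise_frames[1], mean_mag)
```

Here `residual` is nonzero in the first bin even though the frame contains only noise: the mean is removed, but the fluctuation around it is left behind.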
A method for recovering the LPC spectrum from a microphone array input signal corrupted by less directional ambient noise is proposed. This method is based on the subspace method, in which directional signals and non-directional noise are classified in the subspace domain using eigenvalue analysis of the spatial correlation matrix. In this paper, the coherent subspace (CSS) method, a broadband extension of the subspace method, is employed. The advantage of this method is that it requires a much smaller number of averages in the time domain for estimating the subspace, a suitable feature for frame processing such as speech recognition. To enhance the performance of noise reduction, elimination of the noise-dominant subspace using projection is further proposed, which is effective when the SNR is low and classification of noise and signals using eigenvalue analysis is difficult.
Kiyoshi YAMAMOTO Futoshi ASANO Takeshi YAMADA Nobuhiko KITAWAKI
In this paper, a method of detecting overlapping speech segments in meetings is proposed. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple microphone input reflects information on the number and relative power of sound sources. However, in a reverberant sound field, the feature of the number of sources in the eigenvalue distribution is degraded by the room reverberation. In the Support Vector Machines approach, the eigenvalue distribution is classified into two classes (overlapping speech segments and single speech segments). In the Support Vector Regression approach, the relative power of sound sources is estimated by using the eigenvalue distribution, and overlapping speech segments are detected based on the estimated relative power. The salient feature of this approach is that the sensitivity of detecting overlapping speech segments can be controlled simply by changing the threshold value of the relative power. The proposed method was evaluated using recorded data of an actual meeting.
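How the eigenvalue distribution of the spatial correlation matrix reflects the number of sources can be illustrated for two microphones at a single frequency bin. This is a minimal sketch under idealized, reverberation-free assumptions; the function names are hypothetical, and the closed-form eigenvalues apply only to the 2x2 Hermitian case.

```python
import math


def spatial_correlation_2ch(frames):
    """frames: list of (x1, x2) complex observations at one frequency bin."""
    n = len(frames)
    r11 = sum(abs(x1) ** 2 for x1, _ in frames) / n
    r22 = sum(abs(x2) ** 2 for _, x2 in frames) / n
    r12 = sum(x1 * x2.conjugate() for x1, x2 in frames) / n
    return r11, r22, r12


def eigenvalues_2x2_hermitian(r11, r22, r12):
    """Eigenvalues of [[r11, r12], [conj(r12), r22]], larger one first."""
    mean = 0.5 * (r11 + r22)
    radius = math.sqrt((0.5 * (r11 - r22)) ** 2 + abs(r12) ** 2)
    return mean + radius, mean - radius
```

With a single source, the second channel is a fixed complex multiple of the first, so the matrix has rank one and the smaller eigenvalue is zero; a second independent source raises the smaller eigenvalue. Reverberation blurs this distinction, which is the degradation the abstract describes.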