Yoshifumi CHISAKI Toshimichi TAKADA Masahiro NAGANISHI Tsuyoshi USAGAWA
The frequency domain binaural model (FDBM) has previously been proposed for localizing multiple sound sources. Because the method requires only two input signals and relies on the interaural phase and level differences caused by diffraction around the head, it offers great flexibility in application when the head is treated as an arbitrary object. However, when the object is symmetric with respect to the two microphones, localization performance degrades, just as a human listener suffers front-back confusion because of the symmetry about the median plane. This paper proposes to reduce this degradation by combining the outputs of multiple microphone pairs with the FDBM. The proposed method is evaluated on a security camera system, and the results show improved sound source localization owing to the reduced number of cones of confusion.
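As a rough illustration only, and not the authors' implementation, the following Python sketch shows how interaural phase and level differences can be extracted per frequency bin from a two-channel signal. The mapping from phase difference to azimuth assumes a hypothetical free-field model with an assumed microphone spacing, whereas the FDBM itself uses the diffraction characteristics of the object.

```python
# Hypothetical sketch of IPD/ILD extraction in the spirit of the FDBM.
# The free-field IPD-to-azimuth mapping below is a stand-in for illustration;
# a real system would use measured transfer functions of the object.
import numpy as np
from scipy.signal import stft

def estimate_directions(left, right, fs, n_fft=512):
    """Estimate a coarse azimuth per time-frequency bin from two channels."""
    _, _, L = stft(left, fs, nperseg=n_fft)
    _, _, R = stft(right, fs, nperseg=n_fft)
    ipd = np.angle(L * np.conj(R))                                   # interaural phase difference
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))   # interaural level difference, dB
    # Assumed microphone spacing d [m] and sound speed c [m/s].
    d, c = 0.15, 343.0
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)[:, None]
    sin_az = np.clip(ipd * c / (2 * np.pi * np.maximum(freqs, 1.0) * d), -1.0, 1.0)
    azimuth = np.degrees(np.arcsin(sin_az))   # front-back ambiguous for a single pair
    return azimuth, ild
```

A single microphone pair leaves the front-back (cone-of-confusion) ambiguity unresolved; combining estimates from a second, differently oriented pair, as the paper proposes, is what reduces that ambiguity.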
Yoshifumi CHISAKI Ryouji KAWANO Tsuyoshi USAGAWA
A binaural hearing assistance system based on the frequency domain binaural model has previously been proposed. The system can enhance a signal arriving from a specific direction. Because it operates on a binaural signal, inter-channel communication between the left and right subsystems is required, and reducing the bit rate of that communication is essential if the headset is to be detached from the processing unit. This paper examines the performance of a system that uses a differential pulse code modulation (DPCM) codec and discusses the relationship between bit rate and sound quality.
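For context only, a minimal first-order DPCM encoder/decoder is sketched below in Python; the step size, bit depth, and predictor are illustrative assumptions, not the codec parameters studied in the paper.

```python
# Minimal first-order DPCM sketch: quantize the difference between each input
# sample and the previously reconstructed sample. The step size and bit depth
# are arbitrary illustrative choices.
import numpy as np

def dpcm_encode(x, n_bits=4, step=0.05):
    codes = np.empty(len(x), dtype=np.int32)
    pred = 0.0
    q_max = 2 ** (n_bits - 1) - 1
    for i, s in enumerate(x):
        diff = s - pred
        code = int(np.clip(np.round(diff / step), -q_max - 1, q_max))
        codes[i] = code
        pred += code * step          # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=0.05):
    out = np.empty(len(codes))
    pred = 0.0
    for i, c in enumerate(codes):
        pred += c * step
        out[i] = pred
    return out
```

Lowering `n_bits` directly reduces the inter-channel bit rate at the cost of larger quantization error, which is the bit-rate versus sound-quality trade-off the paper examines.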
Hidetoshi NAKASHIMA Yoshifumi CHISAKI Tsuyoshi USAGAWA Masanao EBATA
This paper addresses a single-channel speech enhancement method that utilizes the mean and variance of the logarithmic noise power spectra. An important issue for any single-channel speech enhancement algorithm is determining the trade-off point between spectral distortion and residual noise, which requires accurate discrimination between speech and noise spectral components. Conventional methods set this trade-off with experimentally tuned parameters, so the spectral discrimination is inadequate and the enhanced speech is degraded by spectral distortion or residual noise; a principled criterion for determining the point is therefore necessary. The proposed method determines the trade-off between spectral distortion and residual noise level by discriminating speech from noise spectral components on a statistical basis. The discrimination is performed by hypothesis testing on the means and variances of the logarithmic power spectra, dividing the spectral components into speech-dominant and noise-dominant ones. Spectral subtraction is applied to the speech-dominant components to minimize spectral distortion, while the noise-dominant components are attenuated to reduce the noise level. The performance of the method is confirmed in terms of waveform, spectrogram, noise reduction level, and a speech recognition task. The results show improved noise reduction and recognition rate, indicating that the method effectively reduces musical noise and improves the quality of the enhanced speech.
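As an illustrative sketch only, the Python code below discriminates speech-dominant from noise-dominant bins using the per-bin mean and standard deviation of the logarithmic noise power spectrum, applies spectral subtraction to the former, and attenuates the latter. The simple one-sided z-style threshold, spectral floor, and attenuation value are assumptions standing in for the paper's actual statistical test and parameters.

```python
# Illustrative sketch of the split processing described above: a per-bin test of
# the log power spectrum against noise statistics, spectral subtraction for
# speech-dominant bins, and plain attenuation for noise-dominant bins.
# The threshold, floor, and attenuation values are assumptions for illustration.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, noise_sample, fs, n_fft=512, z_thresh=1.64, atten_db=15.0):
    _, _, N = stft(noise_sample, fs, nperseg=n_fft)
    _, _, Y = stft(noisy, fs, nperseg=n_fft)
    log_noise = np.log(np.abs(N) ** 2 + 1e-12)
    mu = log_noise.mean(axis=1, keepdims=True)            # per-bin mean of log noise power
    sigma = log_noise.std(axis=1, keepdims=True) + 1e-12  # per-bin std of log noise power
    log_y = np.log(np.abs(Y) ** 2 + 1e-12)
    speech_dominant = (log_y - mu) / sigma > z_thresh      # crude one-sided test
    noise_power = np.exp(mu)                               # estimated noise power per bin
    sub_mag = np.sqrt(np.maximum(np.abs(Y) ** 2 - noise_power, 0.01 * noise_power))
    att_mag = np.abs(Y) * 10 ** (-atten_db / 20)
    mag = np.where(speech_dominant, sub_mag, att_mag)
    _, enhanced = istft(mag * np.exp(1j * np.angle(Y)), fs, nperseg=n_fft)
    return enhanced
```

Keeping the subtraction restricted to speech-dominant bins while uniformly attenuating the rest is what suppresses the isolated spectral peaks that would otherwise be heard as musical noise.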