DOA Estimation of Multiple Speech Sources from a Stereophonic Mixture in Underdetermined Case

Ning DING; Nozomu HAMADA

doi:10.1587/transfun.E95.A.735

Summary:

This paper proposes a direction-of-arrival (DOA) estimation method for multiple speech sources from a stereophonic mixture in the underdetermined case, where the number of sources exceeds the number of sensors. The method relies on the sparseness of speech signals in the time-frequency (T-F) domain, which means that multiple independent speakers overlap only slightly. First, T-F cells bearing reliable spatial information are selected using a newly introduced reliability index, defined by the estimated interaural phase difference at each T-F cell. Then, a statistical error propagation model between the phase difference at a T-F cell and the resulting DOA is introduced. By employing this model and the sparseness in the T-F domain, the DOA estimation problem is reduced to finding the local peaks of the probability density function (PDF) of the DOA. Finally, a kernel density estimator based on the proposed statistical model is applied. The performance of the proposed method is assessed experimentally. The method outperforms other methods both in accuracy on real observed data and in robustness in simulations with additional diffuse noise.
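The summary names the processing chain (per-cell phase difference to DOA, error propagation, kernel density estimation, local peaks) but not its formulas. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a free-field, far-field two-microphone model sin θ = c·φ / (2π·f·d), a fixed IPD noise standard deviation σ_φ, a first-order (delta-method) error propagation σ_θ = σ_φ·|dθ/dφ|, and a Gaussian kernel whose bandwidth for each cell is that σ_θ. The paper's reliability index is not reproduced; a placeholder test simply discards cells whose IPD maps to no physical direction. All function names and the synthetic data generator are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def ipd_to_doa(ipd, freq, mic_distance, c=SPEED_OF_SOUND):
    """Far-field model sin(theta) = c * ipd / (2*pi*freq*d); returns theta in radians,
    or None when the IPD maps outside [-1, 1] (no physical direction)."""
    s = c * ipd / (2.0 * math.pi * freq * mic_distance)
    if abs(s) > 1.0:
        return None
    return math.asin(s)


def doa_std(theta, freq, mic_distance, ipd_std, c=SPEED_OF_SOUND, cos_floor=1e-3):
    """First-order error propagation: sigma_theta = |d theta / d ipd| * sigma_ipd,
    with d theta / d ipd = c / (2*pi*freq*d*cos(theta))."""
    cos_t = max(abs(math.cos(theta)), cos_floor)  # avoid division by zero near +-90 deg
    return ipd_std * c / (2.0 * math.pi * freq * mic_distance * cos_t)


def doa_density(cells, mic_distance, ipd_std, grid_deg):
    """Variable-bandwidth Gaussian KDE of the DOA evaluated on a grid of angles (degrees).
    cells: iterable of (freq_hz, ipd_rad) pairs, one per T-F cell."""
    thetas, sigmas = [], []
    for freq, ipd in cells:
        theta = ipd_to_doa(ipd, freq, mic_distance)
        if theta is None:  # placeholder reliability test (hypothetical): drop unphysical cells
            continue
        thetas.append(theta)
        sigmas.append(doa_std(theta, freq, mic_distance, ipd_std))
    n = max(len(thetas), 1)
    density = []
    for g in grid_deg:
        x = math.radians(g)
        total = sum(math.exp(-0.5 * ((x - t) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                    for t, s in zip(thetas, sigmas))
        density.append(total / n)
    return density


def pick_peaks(grid_deg, density, num_sources, min_sep_deg=10):
    """Greedily keep the strongest local maxima, skipping any within min_sep_deg of one
    already kept; return up to num_sources angles in ascending order."""
    candidates = [(density[i], grid_deg[i]) for i in range(1, len(density) - 1)
                  if density[i] > density[i - 1] and density[i] >= density[i + 1]]
    candidates.sort(reverse=True)
    chosen = []
    for _, angle in candidates:
        if all(abs(angle - a) >= min_sep_deg for a in chosen):
            chosen.append(angle)
        if len(chosen) == num_sources:
            break
    return sorted(chosen)


def estimate_doas(cells, mic_distance, ipd_std, num_sources):
    grid = list(range(-90, 91))  # 1-degree grid from -90 to +90 degrees
    return pick_peaks(grid, doa_density(cells, mic_distance, ipd_std, grid), num_sources)


def make_synthetic_cells(true_doas_deg, mic_distance, ipd_std, seed=0):
    """Hypothetical test data: each cell is dominated by one source (ideal sparseness)
    and its IPD is the far-field value plus Gaussian phase noise."""
    rng = random.Random(seed)
    cells = []
    for doa in true_doas_deg:
        for freq in range(200, 3001, 50):  # below c / (2d) for d = 5 cm, so no phase wrapping
            for _ in range(3):
                ipd = (2.0 * math.pi * freq * mic_distance
                       * math.sin(math.radians(doa)) / SPEED_OF_SOUND)
                cells.append((freq, ipd + rng.gauss(0.0, ipd_std)))
    return cells
```

For example, `estimate_doas(make_synthetic_cells([-40, 10, 50], 0.05, 0.05), 0.05, 0.05, 3)` runs the chain on synthetic cells for three sources and two microphones 5 cm apart, which is an underdetermined setting. A per-cell DOA is still defined because each cell is assumed to be dominated by a single source. Because dθ/dφ scales with 1/f, low-frequency cells receive wide kernels and contribute little to peak sharpness, which is one way an error propagation model can weight cells without a hard threshold.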

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E95-A No.4 pp.735-744

Publication Date: 2012/04/01

Online ISSN: 1745-1337

DOI: 10.1587/transfun.E95.A.735

Type of Manuscript: PAPER

Category: Engineering Acoustics

Ning DING, Nozomu HAMADA, "DOA Estimation of Multiple Speech Sources from a Stereophonic Mixture in Underdetermined Case," in IEICE TRANSACTIONS on Fundamentals, vol. E95-A, no. 4, pp. 735-744, April 2012, doi: 10.1587/transfun.E95.A.735.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E95.A.735/_p

@ARTICLE{e95-a_4_735,
author={Ning DING and Nozomu HAMADA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={DOA Estimation of Multiple Speech Sources from a Stereophonic Mixture in Underdetermined Case},
year={2012},
volume={E95-A},
number={4},
pages={735--744},
keywords={},
doi={10.1587/transfun.E95.A.735},
ISSN={1745-1337},
month={April},
}

TY  - JOUR
TI  - DOA Estimation of Multiple Speech Sources from a Stereophonic Mixture in Underdetermined Case
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 735
EP  - 744
AU  - Ning DING
AU  - Nozomu HAMADA
PY  - 2012
DO  - 10.1587/transfun.E95.A.735
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E95-A
IS  - 4
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - 2012/04/01/
ER  - 