Speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression

Ryouichi NISHIMURA; Seigo ENOMOTO; Hiroaki KATO

doi:10.1587/transinf.2017MUP0003

Summary :

Surveillance with multiple cameras and microphones is a promising way to trace the activities of suspicious persons for security purposes. When these sensors are connected to the Internet, however, they might also jeopardize innocent people's privacy because, as a result of human error, signals from the sensors might allow eavesdropping by malicious persons. This paper proposes exploiting super-resolution to address this problem. Super-resolution is a signal processing technique by which a high-resolution version of a signal can be reproduced from a low-resolution version of the same signal source. Because of this property, an intelligible speech signal can be reconstructed from multiple sensor signals, each of which is completely unintelligible on its own because of its sufficiently low sampling rate. A method based on Bayesian linear regression is proposed and compared with one based on maximum likelihood. Computer simulations using a simple sinusoidal input demonstrate that both methods restore the original signal from the signals that are actually measured. Moreover, the results show that the method based on Bayesian linear regression is more robust than the one based on maximum likelihood under various microphone configurations in noisy environments, and that this advantage is especially pronounced when the number of microphones used is as small as the minimum required. Finally, listening tests using speech signals confirmed that the mean opinion score (MOS) of the reconstructed signal reaches 3, while that of the original signal captured at each single microphone is almost 1.
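
The idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the sampling rates, the equally spaced and exactly known microphone offsets, the sinusoidal basis, and the names `design_matrix`, `fit_ml`, and `fit_bayes` are all assumptions made for illustration. Four simulated microphones each sample a 1500 Hz sinusoid at 2000 Hz, which is too slow for any single one to represent it, and the pooled samples are fitted once by least squares (the maximum-likelihood estimate under Gaussian noise) and once by the posterior mean of a standard Bayesian linear regression.

```python
import numpy as np

def design_matrix(t, freqs):
    """Cosine and sine columns at each candidate frequency, evaluated at times t."""
    arg = 2.0 * np.pi * np.outer(t, freqs)
    return np.hstack([np.cos(arg), np.sin(arg)])

def fit_ml(Phi, y):
    """Maximum-likelihood weights under Gaussian noise, i.e. ordinary least squares."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def fit_bayes(Phi, y, alpha, beta):
    """Posterior-mean weights for a zero-mean isotropic Gaussian prior
    (precision alpha) and Gaussian noise (precision beta)."""
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    return beta * np.linalg.solve(A, Phi.T @ y)

# All numeric values below are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)
fs_high = 8000.0    # target (high) sampling rate in Hz
decim = 4           # each microphone samples at fs_high / decim = 2000 Hz
n_high = 64         # number of high-rate samples to reconstruct
f0 = 1500.0         # test tone: above the 1000 Hz Nyquist limit of one microphone
noise_std = 0.05    # standard deviation of the measurement noise

t_high = np.arange(n_high) / fs_high
clean = np.sin(2.0 * np.pi * f0 * t_high)

# Microphone m observes every decim-th sample, starting at offset m (assumed known).
t_obs = np.concatenate([t_high[m::decim] for m in range(decim)])
y = np.sin(2.0 * np.pi * f0 * t_obs) + noise_std * rng.standard_normal(t_obs.size)

# Candidate frequencies on the high-rate grid, excluding DC and Nyquist.
freqs = np.arange(1, n_high // 2) * fs_high / n_high
Phi_obs = design_matrix(t_obs, freqs)
Phi_high = design_matrix(t_high, freqs)

w_ml = fit_ml(Phi_obs, y)
w_bayes = fit_bayes(Phi_obs, y, alpha=1.0, beta=1.0 / noise_std**2)

rms_ml = np.sqrt(np.mean((Phi_high @ w_ml - clean) ** 2))
rms_bayes = np.sqrt(np.mean((Phi_high @ w_bayes - clean) ** 2))
print(f"RMS error, maximum likelihood: {rms_ml:.4f}")
print(f"RMS error, Bayesian regression: {rms_bayes:.4f}")
```

With equally spaced offsets the problem is well conditioned, so the two estimates are nearly identical in this sketch. The prior term `alpha` matters mainly when the combined sampling pattern is irregular or the number of microphones is close to the minimum, which is the regime the summary highlights.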

Publication: IEICE TRANSACTIONS on Information Vol.E101-D No.1 pp.53-63

Publication Date: 2018/01/01

Publicized: 2017/10/16

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017MUP0003

Type of Manuscript: Special Section PAPER (Special Section on Enriched Multimedia — Potential and Possibility of Multimedia Contents for the Future —)

Authors

Ryouichi NISHIMURA
  National Institute of Information and Communications Technology
Seigo ENOMOTO
  National Institute of Information and Communications Technology
Hiroaki KATO
  National Institute of Information and Communications Technology

Keywords

sensor network, sound surveillance, maximum likelihood, Bayesian linear regression, Mean Opinion Score

Cite this

Ryouichi NISHIMURA, Seigo ENOMOTO, Hiroaki KATO, "Speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 1, pp. 53-63, January 2018, doi: 10.1587/transinf.2017MUP0003.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2017MUP0003/_p

@ARTICLE{e101-d_1_53,
author={Ryouichi NISHIMURA and Seigo ENOMOTO and Hiroaki KATO},
journal={IEICE TRANSACTIONS on Information},
title={Speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression},
year={2018},
volume={E101-D},
number={1},
pages={53--63},
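keywords={sensor network, sound surveillance, maximum likelihood, Bayesian linear regression, Mean Opinion Score},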
doi={10.1587/transinf.2017MUP0003},
ISSN={1745-1361},
month={January},}

TY - JOUR
TI - Speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression
T2 - IEICE TRANSACTIONS on Information
SP - 53
EP - 63
AU - Ryouichi NISHIMURA
AU - Seigo ENOMOTO
AU - Hiroaki KATO
PY - 2018
DO - 10.1587/transinf.2017MUP0003
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2018
ER -