Han MA Qiaoling ZHANG Roubing TANG Lu ZHANG Yubo JIA
Robust speech recognition for real-world applications has recently attracted much attention. This paper proposes a robust speech recognition method based on the teacher-student learning framework for domain adaptation. In particular, the student network is trained with a novel optimization criterion defined on the encoder outputs of both the teacher and student networks rather than on the final output posterior probabilities. This criterion aims to map noisy audio into the same embedding space as clean audio, so that the student network adapts to the noisy domain. Comparative experiments demonstrate that the proposed method achieves good robustness against noise.
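A minimal sketch of an encoder-level teacher-student criterion. It assumes (none of this is taken from the paper) a mean-squared-error distance between embeddings, linear encoders, and parallel clean/noisy feature matrices. The teacher is frozen and sees clean features, while the student starts from the teacher's weights and is adapted on the noisy copy. The names `ts_encoder_loss` and `train_student` are hypothetical.

```python
import numpy as np

def ts_encoder_loss(student_emb, teacher_emb):
    """Assumed criterion: mean squared distance between encoder outputs."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

def train_student(W_teacher, X_clean, X_noisy, lr=0.5, steps=200):
    """Adapt a linear student encoder on noisy features so that its embeddings
    match the frozen teacher's embeddings of the parallel clean features.
    X_* have shape (feature_dim, num_frames); W_* have shape (emb_dim, feature_dim)."""
    W_student = W_teacher.copy()          # student initialized from the teacher
    target = W_teacher @ X_clean          # teacher embeddings of clean audio (kept fixed)
    scale = 2.0 / target.size             # derivative factor of the mean over all entries
    history = []
    for _ in range(steps):
        pred = W_student @ X_noisy        # student embeddings of noisy audio
        history.append(ts_encoder_loss(pred, target))
        grad = scale * (pred - target) @ X_noisy.T
        W_student -= lr * grad            # plain gradient descent on the student only
    return W_student, history

# Toy usage with synthetic "features" (hypothetical data, not speech).
rng = np.random.default_rng(0)
X_clean = rng.standard_normal((8, 500))
X_noisy = X_clean + 0.5 * rng.standard_normal((8, 500))
W_t = rng.standard_normal((4, 8))
W_s, hist = train_student(W_t, X_clean, X_noisy)
print(hist[0], hist[-1])
```

Matching embeddings rather than posteriors means the student is pushed toward the teacher's representation of clean speech before any classifier layer; a real system would use deep nonlinear encoders and an automatic-differentiation framework instead of this hand-written gradient.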
Hangjin SUN Lei WANG Zhaoyang QIU Qi ZHANG
The Nyquist folding receiver (NYFR) is a novel analog-to-information architecture that achieves wideband reception with few system resources. The NYFR uses radio frequency (RF) non-uniform sampling to realize wideband reception, and a practical RF non-uniform sampling pulse train usually has a nonzero aperture. It is therefore necessary to investigate the impact of the aperture on the NYFR output. In this letter, the aperture impact on the NYFR is analyzed in terms of the NYFR output signal-to-noise ratio (SNR). Focusing on the aperture, the corresponding NYFR output signal power and noise power are derived first. Then, the relation between the aperture and the output SNR is analyzed. In addition, the distribution of the output SNR in the presence of the aperture is investigated. Finally, combined with a parameter estimation method, several simulations are conducted to verify the theoretical aperture impact.
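As a generic illustration of why a sampling aperture matters (this is the textbook rectangular-aperture model, not the letter's NYFR SNR derivation), averaging a tone of frequency f over an aperture of width τ scales its amplitude by sinc(fτ) = sin(πfτ)/(πfτ). The tone frequency, aperture width, and function names below are hypothetical.

```python
import numpy as np

def aperture_sample(f, t0, tau, n_points=4000):
    """Sample cos(2*pi*f*t) through a rectangular aperture, i.e. average it
    over [t0, t0 + tau] (midpoint rule)."""
    t = t0 + (np.arange(n_points) + 0.5) * tau / n_points
    return float(np.mean(np.cos(2 * np.pi * f * t)))

def aperture_gain(f, tau):
    """Closed-form amplitude factor sin(pi*f*tau)/(pi*f*tau); np.sinc is the normalized sinc."""
    return float(np.sinc(f * tau))

f, t0, tau = 1.0e9, 0.3e-9, 0.2e-9                 # hypothetical 1-GHz tone, 200-ps aperture
ideal = np.cos(2 * np.pi * f * (t0 + tau / 2))     # ideal impulse sample at the aperture center
print(aperture_sample(f, t0, tau), aperture_gain(f, tau) * ideal)
```

Both printed values agree: a wider aperture or a higher input frequency attenuates the sampled component more, which is one way an aperture can enter the output signal power of an RF sampling receiver.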
Taiki HAYASHI Kazuyoshi ISHIMURA Isao T. TOKUDA
Toward realizing noise-induced synchronization in natural environments, we carried out an experimental study using Van der Pol oscillator circuits. We focus on acoustic sound as a potential source of noise that may exist in nature. To mimic such a natural environment, white-noise sounds were generated from a loudspeaker and recorded by microphones. These microphone signals were then injected into the oscillator circuits. We show that the oscillator circuits spontaneously give rise to synchronized dynamics when the microphone signals are highly correlated with each other. As the correlation among the input microphone signals decreases, the level of synchrony decreases monotonically, implying that the input correlation is the key determinant of noise-induced synchronization. Our study provides an experimental basis for synchronizing clocks in distributed sensor networks, as well as other engineering devices, in natural environments.
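A numerical sketch of the setup, under loudly stated assumptions: the microphone signals are replaced by two synthetic white-noise sequences with a chosen correlation coefficient (shared component plus independent components), the circuits by the dimensionless Van der Pol equation x'' - ε(1 - x²)x' + x = D·ξ(t) integrated with Euler-Maruyama, and synchrony by a crude waveform-correlation index. All parameter values and names are hypothetical, and no particular synchrony level is claimed for them.

```python
import numpy as np

def correlated_noise(n, corr, rng):
    """Two unit-variance white-noise sequences with correlation coefficient `corr` (0..1)."""
    common = rng.standard_normal(n)
    a, b = np.sqrt(corr), np.sqrt(1.0 - corr)
    return (a * common + b * rng.standard_normal(n),
            a * common + b * rng.standard_normal(n))

def driven_vdp(noise, dt=0.01, eps=1.0, strength=0.5, x0=1.0, v0=0.0):
    """Euler-Maruyama integration of x'' - eps*(1 - x^2)*x' + x = strength * xi(t)."""
    x, v = x0, v0
    sq = np.sqrt(dt)
    xs = np.empty(len(noise))
    for k, xi in enumerate(noise):
        acc = eps * (1.0 - x * x) * v - x          # deterministic Van der Pol acceleration
        x, v = x + dt * v, v + dt * acc + strength * sq * xi
        xs[k] = x
    return xs

def sync_index(x1, x2):
    """Crude synchrony measure: correlation of the two waveforms over the second half."""
    half = len(x1) // 2
    return float(np.corrcoef(x1[half:], x2[half:])[0, 1])

rng = np.random.default_rng(1)
for corr in (1.0, 0.5, 0.0):
    n1, n2 = correlated_noise(20_000, corr, rng)
    x1 = driven_vdp(n1, x0=2.0)
    x2 = driven_vdp(n2, x0=-2.0)                   # different initial condition
    print(corr, sync_index(x1, x2))
```

Sweeping `corr` mirrors the experiment's control variable (the correlation between microphone signals); the noise strength and integration time would need tuning before drawing any conclusion from the printed indices.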
Masahiro MURAYAMA Toyohiro HIGASHIYAMA Yuki HARAZONO Hirotake ISHII Hiroshi SHIMODA Shinobu OKIDO Yasuyoshi TARUTA
High-quality depth images are required for stable and accurate computer vision. Depth images captured by depth cameras tend to be noisy, incomplete, and of low resolution. Therefore, it is desirable to increase the accuracy and resolution of depth images. We propose a method that reduces noise and fills holes in depth images pixel by pixel, and that increases their resolution. For each pixel in the target image, the line of sight from the focal point of the camera through the pixel to the object is divided into equally spaced grid points. For each grid point, the distance to the object surface is obtained from multiple tracked depth images, which contain noisy depth values for the respective pixels. The coordinates of the correct object surface can then be obtained with the random depth noise reduced, and missing values are filled in. The resolution can also be increased by creating new pixels between existing pixels and then applying the same process as that used for noise reduction. Evaluation results demonstrate that the proposed method requires less GPU memory. Furthermore, the proposed method reduced noise more accurately than the conventional method, especially around edges, and was able to process finer details of objects. The super-resolution of the proposed method also produced high-resolution depth images with smoother and more accurate edges than the conventional methods.
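The per-pixel procedure can be sketched for a single ray as follows. This is an illustration in the spirit of truncated signed-distance fusion, assuming that the "difference from each grid point to the surface" is averaged over frames and that the surface is the zero crossing of that average; the paper's exact weighting, truncation, and camera tracking are not reproduced, and `fuse_depth_along_ray` and its parameters are hypothetical.

```python
import numpy as np

def fuse_depth_along_ray(depth_obs, d_min, d_max, n_grid, trunc):
    """Estimate the surface depth along one pixel's ray from several noisy observations.
    Grid points are equally spaced between d_min and d_max. At each grid point the
    truncated distance to every observed surface is averaged over frames, and the
    surface is taken at the zero crossing of that average (linear interpolation)."""
    grid = np.linspace(d_min, d_max, n_grid)
    obs = np.asarray(depth_obs, dtype=float)
    obs = obs[np.isfinite(obs) & (obs > 0)]      # frames with a hole at this pixel are skipped
    if obs.size == 0:
        return np.nan                            # no frame saw this pixel
    dist = np.clip(obs[None, :] - grid[:, None], -trunc, trunc)   # > 0 in front of the surface
    mean_dist = dist.mean(axis=1)                # averaging over frames suppresses random noise
    crossings = np.where((mean_dist[:-1] > 0) & (mean_dist[1:] <= 0))[0]
    if crossings.size == 0:
        return np.nan
    i = crossings[0]
    f0, f1 = mean_dist[i], mean_dist[i + 1]
    return grid[i] + (grid[i + 1] - grid[i]) * f0 / (f0 - f1)

rng = np.random.default_rng(0)
obs = 2.0 + 0.02 * rng.standard_normal(100)      # hypothetical depths (m) of one pixel in 100 frames
obs[::7] = np.nan                                # some frames have a hole at this pixel
z = fuse_depth_along_ray(obs, d_min=0.5, d_max=4.0, n_grid=351, trunc=0.1)
print(z)
```

A pixel that is missing in some frames still receives a value from the frames that observed it, which is how multi-frame fusion of this kind can fill holes; a new sub-pixel ray for super-resolution would be processed the same way.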
Hikaru FUJISAKI Makoto NAKASHIZUKA
This paper presents a deep network based on morphological filters for Gaussian denoising. Morphological filters can be implemented with only addition, max, and min operations and require few computational resources. Therefore, the proposed network is suitable for implementation on a small microprocessor. Each layer of the proposed network consists of a top-hat transform, which extracts small peaks and valleys of the noise components from the input image. The noise components are reduced iteratively by subtracting them from the input image in each layer. In this paper, extensions of opening and closing are introduced as linear combinations of morphological filters for the top-hat transform of this deep network. Multiplications are required only for these linear combinations of morphological filters. Because almost all parameters of the network are structuring elements of the morphological filters, the feature maps and parameters can be represented as short-bit-length integers, which is suitable for implementation with single instruction, multiple data (SIMD) instructions. Denoising examples show that the proposed network obtains denoising results comparable to those of BM3D [1] without linear convolutions and with approximately one tenth the number of parameters of a full-scale deep convolutional neural network [2]. We also report the computation time of the proposed method using the SIMD instructions of a microprocessor.
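To make the top-hat mechanism concrete, here is a one-dimensional sketch using only additions, subtractions, max, and min, as the abstract describes. It assumes a single fixed, odd-length structuring element and a simple layer rule "input minus (white top-hat minus black top-hat)"; the paper's learned structuring elements and its extended openings and closings (linear combinations of filters) are not reproduced. The function names are hypothetical.

```python
import numpy as np

def erode(f, se):
    """Grayscale erosion with a (possibly non-flat) odd-length structuring element:
    out(x) = min_y f(x + y) - se(y). Borders are handled by edge replication."""
    r = len(se) // 2
    fp = np.pad(f, r, mode="edge")
    out = np.full(len(f), np.inf)
    for j, b in enumerate(se):                  # offset y = j - r
        out = np.minimum(out, fp[j:j + len(f)] - b)
    return out

def dilate(f, se):
    """Grayscale dilation: out(x) = max_y f(x - y) + se(y)."""
    r = len(se) // 2
    fp = np.pad(f, r, mode="edge")
    out = np.full(len(f), -np.inf)
    for j, b in enumerate(se):                  # offset y = j - r, padded index x - y + r
        out = np.maximum(out, fp[2 * r - j:2 * r - j + len(f)] + b)
    return out

def denoise_layer(f, se):
    """One illustrative layer: small peaks (white top-hat) minus small valleys
    (black top-hat) serve as the noise estimate, which is subtracted from the input."""
    opening = dilate(erode(f, se), se)
    closing = erode(dilate(f, se), se)
    peaks = f - opening                         # white top-hat
    valleys = closing - f                       # black top-hat
    return f - (peaks - valleys)

f = np.array([0.0, 0.0, 5.0, 0.0, 0.0])          # flat signal with a narrow spike
print(denoise_layer(f, np.zeros(3)))             # the spike is removed
```

Because a flat structuring element is all zeros, each filter reduces to running min/max operations, which is what makes short-integer SIMD implementations attractive.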
Exponential growth in data volumes has prompted widespread interest in data-selective adaptive algorithms. In pioneering work, Diniz developed the data-selective least mean square (DS-LMS) algorithm, which can reduce the amount of data used in computation by a prescribed amount without compromising performance. However, the existing framework does not consider impulse noise (IN), which can greatly undermine the benefits of reduced computation. In this letter, we present an error-based IN detection algorithm to be implemented in conjunction with the DS-LMS algorithm. Numerical evaluations confirm the effectiveness of the proposed IN-tolerant DS-LMS algorithm.
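A sketch of an LMS filter with a data-selective update and an error-based impulse check. The rules here are assumptions for illustration, not the letter's detector: a sample updates the filter only if |e| exceeds `tau_low` times a known noise standard deviation (skipping non-informative data), and it is flagged as impulse noise if e² exceeds `kappa` times a running error-power estimate that is updated only with unflagged samples. The names and parameter values are hypothetical.

```python
import numpy as np

def ds_lms_in(x, d, num_taps, mu, sigma_n, tau_low=1.0, kappa=9.0, lam=0.99, err_pow_init=1.0):
    """Data-selective LMS with a simple error-based impulse-noise (IN) check."""
    w = np.zeros(num_taps)
    err_pow = err_pow_init                       # running estimate of the normal error power
    n_updates = n_impulses = 0
    for k in range(num_taps - 1, len(x)):
        u = x[k - num_taps + 1:k + 1][::-1]      # regressor [x(k), x(k-1), ...]
        e = d[k] - w @ u
        if e * e > kappa * err_pow:              # error far above its usual level: treat as IN
            n_impulses += 1
            continue                             # no update, and no contamination of err_pow
        err_pow = lam * err_pow + (1.0 - lam) * e * e
        if abs(e) > tau_low * sigma_n:           # data-selective: skip non-informative samples
            w += mu * e * u
            n_updates += 1
    return w, n_updates, n_impulses

# System identification with hypothetical values: 4-tap plant, Gaussian noise, 1% impulses.
rng = np.random.default_rng(0)
N = 5000
w_true = np.array([0.5, -0.3, 0.2, 0.1])
x = rng.standard_normal(N)
d = np.convolve(x, w_true)[:N] + 0.05 * rng.standard_normal(N)
hit = rng.random(N) < 0.01
d[hit] += 10.0 * rng.standard_normal(hit.sum())
w, n_upd, n_imp = ds_lms_in(x, d, num_taps=4, mu=0.01, sigma_n=0.05)
print(w, n_upd, n_imp)
```

Using a running error power rather than the fixed noise level for the impulse test avoids rejecting every sample during the initial transient, when errors are legitimately large.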
Masato YOSHIDA Kozo SATO Toshihiko HIROOKA Keisuke KASAI Masataka NAKAZAWA
We present detailed measurements and analysis of guided acoustic wave Brillouin scattering (GAWBS)-induced depolarization noise in a multi-core fiber (MCF) used for digital coherent optical transmission. We first describe the GAWBS-induced depolarization noise in an uncoupled four-core fiber (4CF) with a 125-μm cladding and compare its depolarization noise spectrum with that of a standard single-mode fiber (SSMF). We found that, unlike the center core, the off-center cores in the 4CF are predominantly affected by higher-order $\mathrm{TR}_{n,m}$ modes rather than the $\mathrm{TR}_{2,m}$ mode, and that the total power of the depolarization noise in the 4CF is almost the same as that in the SSMF. We also report measurement results for the GAWBS-induced depolarization noise in an uncoupled 19-core fiber with a 240-μm cladding. The results indicate that almost identical amounts of depolarization noise are generated in the cores. Finally, we evaluate the influence of GAWBS-induced polarization crosstalk (XT) on coherent QAM transmission. We found that the XT limits the achievable QAM order to 64 in transoceanic transmission with an MCF.
Hiroki ISHIGURO Takashi ISHIDA Masashi SUGIYAMA
It has been demonstrated that large-scale labeled datasets facilitate the success of machine learning. However, collecting labeled data is often very costly and error-prone in practice. To cope with this problem, previous studies have considered the use of complementary labels, each of which specifies a class that an instance does not belong to and which can be collected more easily than ordinary labels. However, complementary labels can also be error-prone, so mitigating the influence of label noise is an important challenge in making complementary-label learning more useful in practice. In this paper, we derive conditions on the loss function under which the learning algorithm is not affected by noise in complementary labels. Experiments on benchmark datasets with noisy complementary labels demonstrate that loss functions satisfying our conditions significantly improve classification performance.
We theoretically analyze the effect of window choice on the zero-padding method and the corrected quadratically interpolated fast Fourier transform, using a harmonic signal in noise at both high and low signal-to-noise ratios (SNRs). We then validate the theoretical analysis with simulations. The theoretical analysis and simulation results for four traditional window functions show that the optimal window depends on the SNR: the estimation errors are smallest for the rectangular window at low SNR, for the Hamming and Hanning windows at mid SNR, and for the Blackman window at high SNR. In addition, we analyze the simulation results using the signal-to-noise floor ratio, which appears to be more effective than the conventional SNR for determining the optimal window.
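A minimal sketch of a windowed, zero-padded quadratically interpolated FFT frequency estimator, for experimenting with the four windows named above (NumPy's `np.hanning` is the Hanning window). It fits a parabola through the log-magnitudes of the peak bin and its neighbors; the bias-correction step of the *corrected* estimator analyzed in the paper is not reproduced, and `qifft_frequency` is a hypothetical name.

```python
import numpy as np

WINDOWS = {
    "rectangular": np.ones,
    "hamming": np.hamming,
    "hanning": np.hanning,
    "blackman": np.blackman,
}

def qifft_frequency(x, window="hanning", pad_factor=4):
    """Estimate the frequency (cycles/sample) of the dominant real sinusoid in x:
    window, zero-pad to pad_factor * len(x), take |FFT|, then fit a parabola through
    the log-magnitudes of the peak bin and its two neighbors."""
    n = len(x)
    n_fft = pad_factor * n
    spec = np.abs(np.fft.rfft(x * WINDOWS[window](n), n_fft))
    k = int(np.argmax(spec[1:-1])) + 1           # keep both neighbors inside the array
    a, b, c = np.log(spec[k - 1:k + 2])
    p = 0.5 * (a - c) / (a - 2 * b + c)          # vertex offset in bins, |p| <= 0.5
    return (k + p) / n_fft

rng = np.random.default_rng(0)
n = np.arange(256)
noisy = np.cos(2 * np.pi * 0.1234 * n + 0.3) + 0.5 * rng.standard_normal(256)
for name in WINDOWS:
    print(name, qifft_frequency(noisy, name) - 0.1234)
```

Repeating the last loop over many noise realizations and SNRs gives an empirical counterpart to the paper's window comparison; a single realization, as here, only shows the mechanics.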
Koichi MAEZAWA Tatsuo ITO Masayuki MORI
A hard-type oscillator is defined as an oscillator having stable fixed points within a stable limit cycle. For resonant tunneling diode (RTD) oscillators, the hard-type configuration has the significant advantage of suppressing spurious oscillations in the bias line. We fabricated hard-type oscillators using an InGaAs-based RTD and demonstrated their proper operation. Furthermore, their oscillation properties were compared with those of a soft-type oscillator having the same parameters. The same level of phase noise was obtained with a much smaller power consumption, approximately 1/20 of that of the soft-type oscillator.
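The definition of a hard-type oscillator can be illustrated with a generic amplitude equation, dr/dt = r(μ + r² - r⁴), which for -1/4 < μ < 0 has a stable fixed point at r = 0, an unstable cycle, and a stable limit cycle enclosing both. This normal form is a textbook stand-in chosen for illustration, not a model of the RTD circuit, and μ = -0.1 is an arbitrary choice.

```python
import numpy as np

def radial_rate(r, mu):
    """Amplitude equation dr/dt = r * (mu + r^2 - r^4) of a generic hard-excitation normal form."""
    return r * (mu + r**2 - r**4)

def cycle_radii(mu):
    """Radii of the unstable and stable limit cycles (both exist for -1/4 < mu < 0)."""
    disc = np.sqrt(1.0 + 4.0 * mu)
    return np.sqrt((1.0 - disc) / 2.0), np.sqrt((1.0 + disc) / 2.0)

def settle(r0, mu=-0.1, dt=1e-3, t_end=100.0):
    """Forward-Euler integration of the amplitude equation starting from r0."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += dt * radial_rate(r, mu)
    return r

r_unstable, r_stable = cycle_radii(-0.1)   # about 0.336 and 0.942
print(settle(0.2), settle(0.5))            # small start decays to the fixed point; large start reaches the cycle
```

A small disturbance (starting inside the unstable cycle) dies out, while a sufficiently large kick starts the oscillation; this coexistence of a stable rest state and a stable oscillation is what distinguishes hard-type from soft-type oscillators.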
Satoshi MIZOGUCHI Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing that degrades auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress musical noise generation and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order standardized moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term in DNN training that works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme uses moments of orders higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. The experimental results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-order moment matching also achieves low-musical-noise speech enhancement, as kurtosis matching does.
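The quantities involved can be sketched as follows: the n-th standardized moment E[(x - μ)ⁿ]/σⁿ (order 4 is kurtosis, order 6 the sixth-order moment) and a squared-mismatch penalty between an enhanced signal and a reference. Which signals are compared (for example, power spectra in non-speech regions) and the squared-difference form of the penalty are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def standardized_moment(x, order):
    """order-th standardized moment E[(x - mean)^order] / std^order (order 4 = kurtosis)."""
    x = np.asarray(x, dtype=float).ravel()
    centered = x - x.mean()
    return float(np.mean(centered ** order) / x.std() ** order)

def moment_matching_penalty(enhanced, reference, order=4):
    """Assumed penalty: squared mismatch of standardized moments, added with a
    weight to the usual DNN training loss (e.g. loss = mse + alpha * penalty)."""
    return (standardized_moment(enhanced, order) - standardized_moment(reference, order)) ** 2

rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)               # Gaussian: kurtosis 3, sixth moment 15
spiky = g * (rng.random(1_000_000) < 0.1)        # sparse, isolated peaks: much larger kurtosis
print(standardized_moment(g, 4), standardized_moment(g, 6), standardized_moment(spiky, 4))
```

The sparse signal's large kurtosis mimics isolated spectral peaks, which is why a rise in kurtosis after processing is used as an indicator of musical noise.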
To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) have been investigated for estimating different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning into DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. Domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weights of the loss function, homoscedastic uncertainty is introduced to learn the weights adaptively, which is shown to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves the overall speech enhancement performance compared with the conventional single-task methods. Moreover, joint direct mask and SPP estimation yields the best performance among all the considered techniques.
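Two pieces of the pipeline can be sketched in a few lines. First, the Wiener gain derived from an SIR, G = SIR/(1 + SIR), or equivalently from PSDs as 1 - Φ_interference/Φ_noisy, assuming the noisy PSD is the sum of speech and interference PSDs. Second, a homoscedastic-uncertainty loss weighting of the common form Σᵢ ½·exp(-sᵢ)·Lᵢ + ½·sᵢ with learned sᵢ = log σᵢ²; this exact form, the fixed task losses, and the function names are assumptions, not the paper's code.

```python
import numpy as np

def wiener_gain_from_sir(sir):
    """Wiener gain from a linear (not dB) signal-to-interference ratio: G = SIR / (1 + SIR)."""
    sir = np.asarray(sir, dtype=float)
    return sir / (1.0 + sir)

def wiener_gain_from_psd(noisy_psd, interference_psd, floor=0.0):
    """Same gain from PSDs, assuming noisy PSD = speech PSD + interference PSD."""
    return np.maximum(1.0 - np.asarray(interference_psd) / np.asarray(noisy_psd), floor)

def uncertainty_weighted_loss(task_losses, log_vars):
    """Assumed form: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2) learned."""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(0.5 * np.exp(-log_vars) * task_losses + 0.5 * log_vars))

def grad_log_vars(task_losses, log_vars):
    """Gradient of the weighted loss with respect to each s_i."""
    return 0.5 * (1.0 - np.exp(-np.asarray(log_vars)) * np.asarray(task_losses))

# Hypothetical, fixed losses for the main (mask) task and the auxiliary (SPP) task.
losses = np.array([0.2, 0.05])
s = np.zeros(2)
for _ in range(2000):
    s -= 0.5 * grad_log_vars(losses, s)
print(np.exp(-s))   # exp(-s_i) approaches 1 / L_i
```

Setting the gradient to zero gives sᵢ = log Lᵢ, so a task with a smaller loss receives a larger weight, while the ½·sᵢ term prevents the trivial solution of shrinking every weight to zero. In real training the losses change every step and the sᵢ are updated jointly with the network.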
Quantum noise ultimately limits the transmission distance in fiber communication systems that use optical amplifiers. This paper investigates the quantum-noise-limited performance of optical binary phase-shift keying transmission using gain-saturated phase-sensitive amplifiers (PSAs) as optical repeaters. It is shown that coherent state transmission, in which light that is ultimately clean in the classical sense is transmitted, and endless transmission, in which the transmission distance is not restricted, are theoretically achievable under certain system conditions owing to the noise suppression effect of the gain-saturated PSA.
Tomohiro TSUKUSHI Satoshi ONO Koji WADA
Realizing rectangular frequency characteristics with a planar circuit made of a normal conductor material, such as a printed circuit board (PCB), is difficult because the corners of the frequency response are rounded by the low unloaded quality factors of the resonators. Rectangular frequency characteristics are generally realized by a low-noise amplifier (LNA) with flat gain characteristics and a high-order bandpass filter (BPF) with resonators having high unloaded quality factors. Here, we use an LNA and a fourth-order flat-passband BPF made of a PCB to realize the desired characteristics. We first calculate the signal and noise powers to confirm the effects of the insertion loss of the BPF. Next, we explain the design and fabrication of an LNA, since no suitable LNA had been developed for this research. Finally, rectangular frequency characteristics are demonstrated by a circuit combining the fabricated LNA and the fabricated flat-passband BPF. We show that rectangular frequency characteristics can be realized using the flat-passband BPF technique.
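One standard way to reason about how a lossy BPF affects noise in such a chain is the Friis cascade formula, F = F₁ + (F₂ - 1)/G₁ + ..., with a passive filter at room temperature entered as gain = -IL and noise figure = IL. The sketch below assumes this model and uses hypothetical numbers (20 dB gain, 1 dB noise figure LNA; 3 dB insertion-loss BPF); it is not the paper's signal-and-noise power calculation.

```python
import math

def db_to_lin(db):
    return 10.0 ** (db / 10.0)

def cascade_noise_figure_db(stages):
    """Friis formula for stages given as (gain_dB, noise_figure_dB) in signal order.
    A passive filter at room temperature can be entered as (-loss_dB, loss_dB)."""
    f_total = 0.0
    gain = 1.0                                   # linear gain preceding the current stage
    for i, (g_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        f_total += f if i == 0 else (f - 1.0) / gain
        gain *= db_to_lin(g_db)
    return 10.0 * math.log10(f_total)

lna = (20.0, 1.0)    # hypothetical LNA: 20 dB gain, 1 dB noise figure
bpf = (-3.0, 3.0)    # hypothetical BPF: 3 dB insertion loss
print(cascade_noise_figure_db([lna, bpf]))   # about 1.03 dB
print(cascade_noise_figure_db([bpf, lna]))   # about 4.0 dB
```

With the LNA first, the filter's insertion loss adds only about 0.03 dB to the system noise figure, whereas placing the same filter first adds its full 3 dB.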
Satoshi DENNO Kazuma YAMAMOTO Yafei HOU
This paper proposes relay selection techniques for XOR physical-layer network coding with MMSE-based non-linear precoding in MIMO bi-directional wireless relaying networks. The proposed selection techniques are derived under different assumptions about the characteristics of the MMSE-based non-linear precoding in the wireless network. We show that the signal-to-noise power ratio (SNR) depends on the product of all the eigenvalues of the channels from the terminals to the relays. This paper shows that the best of the proposed selection techniques is to select the group of relays that maximizes this product. Therefore, this selection technique is called “product of all eigenvalues (PAE)” in this paper. The performance of the proposed relay selection techniques is evaluated in a MIMO bi-directional wireless relaying network in which two terminals, each with 2 antennas, exchange their information via relays. When the PAE is applied to select a group of 2 relays out of 10 single-antenna relays, the PAE attains a gain of more than 13 dB at a BER of $10^{-3}$.
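A sketch of PAE-style selection by exhaustive search. It assumes the metric is the product, over both terminals, of the products of the eigenvalues of Hᴴ H, where H is the channel from that terminal's antennas to the selected single-antenna relays (one row per relay). How the paper combines the two terminals, and the names below, are assumptions.

```python
import itertools
import numpy as np

def product_of_eigenvalues(H):
    """Product of all eigenvalues of H^H H (equal to det(H^H H))."""
    return float(np.prod(np.linalg.eigvalsh(H.conj().T @ H)))

def select_relays_pae(H_a, H_b, n_select):
    """H_a, H_b: (num_relays, num_antennas) channels from terminal A / B to each relay.
    Returns the relay group maximizing the assumed PAE metric, and that metric."""
    best, best_metric = None, -np.inf
    for group in itertools.combinations(range(H_a.shape[0]), n_select):
        rows = list(group)
        metric = product_of_eigenvalues(H_a[rows]) * product_of_eigenvalues(H_b[rows])
        if metric > best_metric:
            best, best_metric = group, metric
    return best, best_metric

# 10 single-antenna relays, two 2-antenna terminals, i.i.d. Rayleigh channels (hypothetical).
rng = np.random.default_rng(0)
shape = (10, 2)
H_a = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
H_b = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
print(select_relays_pae(H_a, H_b, 2))
```

For 2 relays out of 10 there are only 45 candidate groups, so exhaustive search is cheap; the product of eigenvalues penalizes ill-conditioned channels, where one small eigenvalue would hurt the precoded SNR.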
Go URAKAWA Hiroyuki KOBAYASHI Jun DEGUCHI Ryuichi FUJIMOTO
Because the in-band noise of phase-locked loops (PLLs) is mainly caused by charge pumps (CPs), large transistors that occupy a large area are generally used to reduce the in-band noise of CPs. With the high demand for low phase noise in recent high-performance communication systems, the trade-off between occupied area and noise in conventional CPs has become significant. This paper presents a noise-canceling CP circuit that mitigates this trade-off. The proposed CP achieves lower noise than conventional CPs by performing additional noise cancelation. According to the simulation results, compared with conventional CPs, the proposed CP can reduce the current noise to 57% with the same occupied area, or reduce the occupied area to 22% with the same noise performance. We fabricated a prototype of the proposed CP embedded in a 28-GHz LC-PLL using a 16-nm FinFET process and achieved a 1.2-dB improvement in single-sideband integrated phase noise.
Frequency delta-sigma modulation (FDSM) is a unique analog-to-digital conversion technique featuring a large dynamic range and a wide frequency bandwidth. It can be used for high-performance digital-output sensors if the oscillator in the FDSM is replaced by a variable-frequency oscillator whose frequency depends on an external physical quantity. One of the most important parameters governing the performance of these sensors is the phase noise of the oscillator. Phase noise is an essential error source in FDSM, and it is particularly important for this type of sensor because it uses a high-frequency oscillator and an extremely large oversampling ratio. In this paper, we discuss the quantitative effects of phase noise on the FDSM output on the basis of a simple model. The model was validated by experiments with three types of oscillators.
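A behavioral sketch of FDSM as an edge counter: in each sampling period the output is the number of oscillator cycles completed, i.e. the difference of floor(phase) at successive sampling instants. The quantization error is then the first difference of a bounded fractional part (first-order noise shaping). Modeling phase noise as a random walk of the phase in cycles (white frequency noise) is an assumption made for illustration, not necessarily the paper's model, and the frequencies are hypothetical.

```python
import numpy as np

def fdsm(f_osc, f_s, n_samples, phase_noise_std=0.0, rng=None):
    """Count oscillator rising edges per sampling period.
    f_osc: oscillator frequency (Hz); f_s: sampling frequency (Hz).
    phase_noise_std: std of the random phase increment per sample, in cycles (assumed model)."""
    rng = rng if rng is not None else np.random.default_rng()
    increments = f_osc / f_s + phase_noise_std * rng.standard_normal(n_samples)
    phase = np.concatenate(([0.0], np.cumsum(increments)))  # oscillator phase (cycles) at sampling instants
    return np.diff(np.floor(phase))                         # edges counted in each sampling period

y_clean = fdsm(10.3e6, 1e6, 1000)                           # hypothetical 10.3-MHz oscillator, 1-MHz sampling
y_noisy = fdsm(10.3e6, 1e6, 10_000, phase_noise_std=0.01, rng=np.random.default_rng(0))
print(np.unique(y_clean), y_noisy.mean())
```

Without phase noise the counts alternate between 10 and 11 so that their average equals f_osc/f_s; the added phase noise passes to the output without the first-order shaping that the quantization error receives, which is one intuition for why it matters at large oversampling ratios.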
A narrowband active noise control (NANC) system is very effective for controlling low-frequency periodic noise. A frequency mismatch (FM) in the reference signal degrades performance and can even cause the system to diverge. To deal with FM and obtain an accurate reference signal, NANC systems often employ a frequency estimator. By combining an autoregressive predictive filter with a variable step size (VSS) all-pass-based lattice adaptive notch filter (ANF), we propose a new frequency estimation method that does not require prior information about the primary signal and has much improved convergence characteristics. Simulation results show that the designed frequency estimator is more accurate than the conventional algorithm. Finally, hardware experiments are carried out to verify the noise reduction effect.
Takao WAHO Tomoaki KOIZUMI Hitoshi HAYASHI
A feedforward (FF) network using ΔΣ modulators is investigated to implement a non-binary analog-to-digital (A/D) converter. The weighting coefficients in the network are determined so as to suppress the generation of quantization noise. A moving average is adopted to prevent the analog signal amplitude from exceeding the allowable input range of the modulators. The noise transfer function is derived and used to estimate the signal-to-noise ratio (SNR). The FF network output is a non-uniformly distributed multi-level signal, which results in a better SNR than a uniformly distributed one. In addition, the effect of characteristic mismatch among the analog components on the SNR is analyzed. Our behavioral simulations show that the SNR is improved by more than 30 dB, equivalent to about 5 bits of resolution, compared with a conventional first-order ΔΣ modulator.
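For reference, the conventional baseline mentioned above, a first-order ΔΣ modulator with a 1-bit quantizer, can be sketched as below. Writing the quantizer output as y[n] = v[n] + e[n] gives y[n] = u[n] + e[n] - e[n-1], i.e. a signal transfer function of 1 and a noise transfer function NTF(z) = 1 - z⁻¹. The sketch also encodes the usual 6.02 dB-per-bit rule behind "30 dB ≈ 5 bits". The FF network itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def first_order_dsm(u):
    """First-order delta-sigma modulator with a 1-bit quantizer (outputs +/-1)."""
    v = 0.0            # integrator state
    y_prev = 0.0       # previous quantizer output fed back to the integrator
    y = np.empty(len(u))
    for n, u_n in enumerate(u):
        v += u_n - y_prev                    # integrate input minus fed-back output
        y_prev = 1.0 if v >= 0.0 else -1.0   # 1-bit quantizer
        y[n] = y_prev
    return y

def ntf_magnitude(f):
    """|NTF| = |1 - exp(-j 2 pi f)| = 2|sin(pi f)| at normalized frequency f (cycles/sample)."""
    return np.abs(1.0 - np.exp(-2j * np.pi * np.asarray(f)))

def snr_gain_to_bits(delta_snr_db):
    """Rule of thumb: about 6.02 dB of SNR per bit of resolution."""
    return delta_snr_db / 6.02

y = first_order_dsm(np.full(10_000, 0.3))
print(y.mean(), ntf_magnitude([0.001, 0.5]), snr_gain_to_bits(30.0))
```

The ±1 output's running average tracks the DC input, the NTF suppresses quantization noise near DC while amplifying it near half the sampling rate, and 30 dB divided by 6.02 dB per bit is about 4.98 bits.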
In this paper, we propose a robust parameter estimation algorithm for channel-coded systems based on low-density parity-check (LDPC) codes over fading channels with impulse noise. The estimated parameters are then used to generate bit log-likelihood ratios (LLRs) for a soft-input LDPC decoder. The expectation-maximization (EM) algorithm is used to estimate the parameters, including the channel gain and the parameters of the Bernoulli-Gaussian (B-G) impulse noise model. The parameters are estimated accurately, and the average number of iterations of the proposed algorithm is acceptable. Simulation results show that, over a wide range of impulse noise power, the proposed algorithm approaches the optimal performance under different Rician channel factors and even under the Middleton class-A (M-CA) impulse noise model.
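A simplified sketch of EM estimation under the B-G model and the resulting BPSK bit LLRs. It assumes a real channel gain h, BPSK symbols known to the estimator (pilots), and B-G noise written as the two-component zero-mean Gaussian mixture (1 - p)·N(0, v0) + p·N(0, v1), where v1 is the background-plus-impulse variance. The paper's algorithm for coded data over fading channels is not reproduced; here h is updated before the variances within each iteration (an ECM-style ordering), and the names and values are hypothetical.

```python
import numpy as np

def _gauss(x, var):
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_bernoulli_gaussian(y, s, n_iter=100):
    """EM for y = h*s + n with known symbols s and Bernoulli-Gaussian noise n.
    Returns (h, p, v0, v1)."""
    h = np.dot(y, s) / np.dot(s, s)              # least-squares starting point
    r = y - h * s
    v0 = np.median(r**2) / 0.4549                # median of a chi-square(1) variable is ~0.4549
    v1 = 10.0 * np.var(r)
    p = 0.05
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is hit by an impulse
        a1 = p * _gauss(r, v1)
        a0 = (1.0 - p) * _gauss(r, v0)
        g = a1 / (a0 + a1 + 1e-300)
        # M-step: weighted least squares for h, then mixture weight and variances
        w = (1.0 - g) / v0 + g / v1
        h = np.sum(w * y * s) / np.sum(w * s * s)
        r = y - h * s
        p = g.mean()
        v0 = np.sum((1.0 - g) * r**2) / np.sum(1.0 - g)
        v1 = np.sum(g * r**2) / np.sum(g)
    return h, p, v0, v1

def bpsk_llr(y, h, p, v0, v1):
    """Bit LLR log p(y | s=+1) - log p(y | s=-1) under the estimated B-G noise model."""
    def pdf(n):
        return (1.0 - p) * _gauss(n, v0) + p * _gauss(n, v1)
    return np.log(pdf(y - h)) - np.log(pdf(y + h))

rng = np.random.default_rng(0)
N = 20_000
s = rng.choice([-1.0, 1.0], size=N)
noise = 0.3 * rng.standard_normal(N)             # background: variance 0.09
hit = rng.random(N) < 0.1                        # impulse probability 0.1
noise[hit] += 3.0 * rng.standard_normal(hit.sum())
y = 0.8 * s + noise
h, p, v0, v1 = em_bernoulli_gaussian(y, s)
print(h, p, v0, v1)
```

Unlike a Gaussian LLR, which grows linearly with y, the mixture-based LLR saturates for large |y|, so impulse-corrupted samples cannot push the LDPC decoder toward confident wrong decisions.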