An online nonnegative matrix factorization (NMF) algorithm based on recursive least squares (RLS) is described in matrix form, and a simplified, low-complexity algorithm is developed for a frame-by-frame online audio source separation system. First, the online NMF algorithm based on the RLS method is described as solving the NMF problem recursively. Next, a simplified algorithm is developed to approximate the RLS-based online NMF algorithm with low complexity. The proposed algorithms are evaluated in terms of audio source separation, and the results show that their performance is superior to that of the conventional online NMF algorithm, with significantly reduced complexity.
The time delay of arrival (TDOA) estimation problem is important to acoustic source localization. It is defined as finding the relative delay of the direct sound between several microphone signals. The generalized cross-correlation (GCC) method is the most frequently used TDOA estimator, but it performs poorly in reverberant environments. To overcome this problem, the adaptive eigenvalue decomposition (AED) method has been developed, which estimates the room transfer function and finds the direct-path delay. However, the algorithm does not take into account the fact that the room transfer function is a sparse channel, so the estimated transfer function is sometimes too dense, resulting in failure to extract the direct path and its delay. In this paper, an enhanced AED algorithm is proposed that uses proportionate step-size control and a direct-path constraint instead of a constant step size and the L2-norm constraint. The simulation results show that the proposed algorithm performs better than both the conventional AED method and the phase-transform (PHAT) algorithm.
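For orientation, the GCC baseline with PHAT weighting mentioned above can be sketched as follows. This is a minimal illustration of the conventional method only, not of the proposed AED algorithm; the function name `gcc_phat` and its parameters are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, with GCC-PHAT.

    x, y    : 1-D microphone signals (hypothetical inputs for this sketch)
    fs      : sampling rate in Hz
    max_tau : optional bound on the absolute delay, in seconds
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

A positive return value means that `y` lags `x`. In a reverberant room the correlation peak can be pulled toward a strong reflection, which is the weakness that motivates channel-estimation approaches such as AED.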
Seokjin LEE Sang Ha PARK Koeng-Mo SUNG
In this paper, an on-line nonnegative matrix factorization (NMF) algorithm for acoustic signal processing systems is developed based on the recursive least squares (RLS) method. In order to develop the on-line NMF algorithm, we reformulate the NMF problem into multiple least squares (LS) normal equations, and solve the reformulated problems using RLS methods. In addition, we eliminate the irrelevant calculations based on the NMF model. The proposed algorithm has been evaluated with a well-known dataset used for NMF performance evaluation and with real acoustic signals; the results show that the proposed algorithm performs better than the conventional algorithm in on-line applications.
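The general idea of solving NMF frame by frame through recursively updated normal equations can be sketched as below. This is a simplified stand-in under stated assumptions, not the authors' RLS algorithm: it accumulates exponentially weighted statistics, re-solves the normal equations directly, and projects onto the nonnegative orthant. The function name `online_nmf_step`, the forgetting factor, and the inner multiplicative update are all choices made for this illustration.

```python
import numpy as np

def online_nmf_step(v, W, A, B, forget=0.98, n_inner=30, eps=1e-9):
    """Process one nonnegative frame v (shape [F]) and update the basis W ([F, K]).

    A ([K, K]) and B ([F, K]) hold exponentially weighted sums of
    h h^T and v h^T, i.e. the statistics of the normal equations W A = B.
    """
    K = W.shape[1]
    # 1) Activations for this frame: multiplicative updates for
    #    min ||v - W h||^2 subject to h >= 0 (assumed solver, not from the paper).
    h = np.full(K, v.mean() + eps)
    WtW = W.T @ W
    Wtv = W.T @ v
    for _ in range(n_inner):
        h *= Wtv / (WtW @ h + eps)
    # 2) Recursive update of the normal-equation statistics with forgetting.
    A = forget * A + np.outer(h, h)
    B = forget * B + np.outer(v, h)
    # 3) Solve W A = B (A is symmetric), then project onto W > 0.
    W = np.linalg.solve(A + eps * np.eye(K), B.T).T
    W = np.maximum(W, eps)
    return h, W, A, B
```

A consistent initialization is `A = delta * np.eye(K)` and `B = delta * W0` for a small `delta` and an initial nonnegative basis `W0`. A true RLS formulation would update the inverse of `A` recursively instead of re-solving the system at every frame.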
Recursive least squares-based online nonnegative matrix factorization (RLS-ONMF), an NMF algorithm based on the RLS method, was developed to solve the NMF problem online. However, this method suffers from a partial-data problem. In this study, the partial-data problem is resolved by developing an improved online NMF algorithm using RLS and a sparsity constraint. The proposed method, RLS-based online sparse NMF (RLS-OSNMF), consists of two steps: an estimation step that optimizes the Euclidean NMF cost function, and a shaping step that enforces the sparsity constraint. The proposed algorithm was evaluated with recorded speech and music data and with the RWC music database. The results show that the proposed algorithm performs better than conventional RLS-ONMF, especially during the adaptation process.
The estimation of the matrix rank of harmonic components of a music spectrogram provides some useful information, e.g., the determination of the number of basis vectors of the matrix-factorization-based algorithms, which is required for the automatic music transcription or in post-processing. In this work, we develop an algorithm based on Stein's unbiased risk estimator (SURE) algorithm with the matrix factorization model. The noise variance required for the SURE algorithm is estimated by suppressing the harmonic component via median filtering. An evaluation performed using the MIDI-aligned piano sounds (MAPS) database revealed an average estimation error of -0.26 (standard deviation: 4.4) for the proposed algorithm.
Sang Ha PARK Seokjin LEE Koeng-Mo SUNG
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
We analyze the effect of window choice on the zero-padding method and corrected quadratically interpolated fast Fourier transform using a harmonic signal in noise at both high and low signal-to-noise ratios (SNRs) on a theoretical basis. Then, we validate the theoretical analysis using simulations. The theoretical analysis and simulation results using four traditional window functions show that the optimal window is determined depending on the SNR; the estimation errors are the smallest for the rectangular window at low SNR, the Hamming and Hanning windows at mid SNR, and the Blackman window at high SNR. In addition, we analyze the simulation results using the signal-to-noise floor ratio, which appears to be more effective than the conventional SNR in determining the optimal window.
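The estimator family discussed above can be illustrated with a plain quadratically interpolated FFT on a zero-padded, windowed frame. This sketch omits the bias correction of the corrected method and is not the authors' implementation; the function name `qifft_frequency`, the window keys, and the default padding factor are assumptions for this example.

```python
import numpy as np

def qifft_frequency(x, fs, window="hann", pad_factor=4):
    """Estimate the dominant frequency of x (Hz) by parabolic peak interpolation."""
    n = len(x)
    windows = {
        "rect": np.ones,
        "hamming": np.hamming,
        "hann": np.hanning,
        "blackman": np.blackman,
    }
    w = windows[window](n)
    n_fft = pad_factor * n                      # zero-padding refines the bin grid
    spec = np.abs(np.fft.rfft(x * w, n=n_fft))
    k = int(np.argmax(spec[1:-1])) + 1          # skip edge bins so both neighbours exist
    a, b, c = np.log(spec[k - 1:k + 2] + 1e-12)
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)   # vertex of the fitted parabola, in bins
    return (k + delta) * fs / n_fft
```

Running the same noisy signal through each window at several SNRs gives a quick empirical view of the trade-off the abstract describes between main-lobe width and sidelobe leakage.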
Recently, array speaker products have received attention in the field of consumer electronics, and control technologies for arrayed speaker units, including beamforming and wave field synthesis (WFS), have been developed for various purposes. An important application of these algorithms is focused source reproduction. The focused source reproduction capability is strongly coupled with the array length. The array length is a very important design factor in consumer products, but it is very short in home entertainment systems, compared with ideal WFS systems or theater speaker systems. Therefore, a well-defined measure for the maximum focusing range is necessary for designing an array speaker product. In this paper, a maximum focusable range measure is proposed and is analyzed by simulation of a small array speaker. The analysis results show that the proposed maximum focusable range has properties strongly related to the capability for focused source reproduction.
The development of multichannel audio systems has increased the need for multichannel content. However, the supply of multichannel audio content is not sufficient for advanced multichannel systems. Home entertainment manufacturers therefore need upmixing systems, including systems that utilize monaural time-frequency domain information. To this end, a monaural ambience extraction algorithm based on nonnegative matrix factorization (NMF) has been developed recently. Ambience signals refer to sound components that do not have obvious spatial images, e.g., wind, rain, and diffuse sound. The developed algorithm provides good upmixing performance; however, it is a batch process and therefore cannot be used by home audio manufacturers. In this paper, we propose an on-line monaural ambience extraction algorithm. The proposed algorithm analyzes the dominant components with an on-line NMF algorithm, and extracts the remaining sound as ambience components. Experiments were performed with artificially mixed signals and real music signals, and the performance of the proposed algorithm was compared with that of the conventional batch algorithm as a reference. The experimental results show that the proposed algorithm extracts the ambience components as well as the batch algorithm does, despite the on-line constraints.
Sang Ha PARK Seokjin LEE Koeng-Mo SUNG
Non-negative matrix factorization (NMF) is widely used for music transcription because of its efficiency. However, the conventional NMF-based music transcription algorithm often causes harmonic confusion errors or time split-up errors, because NMF decomposes the time-frequency data according to the frequencies that are active at each time. To solve these problems, we proposed an NMF with temporal continuity and harmonicity constraints. The temporal continuity constraint prevented the time split-up of continuous components, and the harmonicity constraint helped to bind the fundamental to its harmonic frequencies, reducing additional octave errors. The transcription performance of the proposed algorithm was compared with that of the conventional algorithms; the comparison showed that the proposed method reduced additional false errors and increased the overall transcription performance.
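One common way to express a temporal continuity constraint on NMF activations is a penalty on frame-to-frame differences, normalized by the power of each component. The sketch below shows that generic form only; the abstract does not state the exact cost used, so the function name `temporal_continuity_penalty` and the normalization are assumptions.

```python
import numpy as np

def temporal_continuity_penalty(H, eps=1e-12):
    """Penalty on activation matrix H (shape [K, T]) that grows when a component
    switches on and off between adjacent frames.

    Each component's sum of squared frame-to-frame differences is divided by
    its mean power so that the penalty is invariant to scaling.
    """
    diffs = np.diff(H, axis=1)                 # h[k, t] - h[k, t-1]
    power = np.mean(H ** 2, axis=1) + eps      # per-component mean power
    return float(np.sum(np.sum(diffs ** 2, axis=1) / power))
```

A sustained note with constant activation yields a penalty of zero, whereas an activation that alternates between on and off is penalized heavily, which is the behaviour needed to discourage time split-up.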
Seokjin LEE Sang Ha PARK Koeng-Mo SUNG
In this paper, a geometric source separation system using nonnegative matrix factorization (NMF) is proposed. The adaptive beamformer is the best method for geometric source separation, but it suffers from a “target signal cancellation” problem in multi-path situations. We modified the HALS-NMF algorithm to decompose the signals into bases, and developed an interference suppression module to cancel the interference bases. The proposed system was compared with the subband GSC-RLS algorithm in a MATLAB® simulation; the results show that the proposed system is robust in multi-path situations.