Dubok PARK David K. HAN Changwon JEON Hanseok KO
Images captured under foggy conditions often exhibit poor contrast and color. This is primarily due to airlight, which degrades image quality exponentially with the fog depth between the scene and the camera. In this paper, we restore fog-degraded images by first estimating depth using a physical model characterizing the RGB channels in a single monocular image. The fog effects are then removed by subtracting the estimated irradiance, which is empirically related to the obtained scene depth, from the total irradiance received by the sensor. Effective restoration of the color and contrast of images taken under foggy conditions is demonstrated. In the experiments, we validate the effectiveness of our method against a conventional method.
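The subtraction step follows the standard haze imaging model I = J·t + A·(1 − t) with transmission t = exp(−β·d); a minimal per-pixel sketch, where the airlight A and extinction coefficient β values are illustrative rather than taken from the paper:

```python
import math

def restore_pixel(obs, depth, airlight=0.9, beta=0.8):
    """Invert the haze model I = J*t + A*(1 - t), with t = exp(-beta * depth).

    obs      -- observed RGB intensities in [0, 1]
    depth    -- estimated scene depth at this pixel
    airlight -- global airlight A (illustrative value)
    beta     -- atmospheric extinction coefficient (illustrative value)
    """
    t = math.exp(-beta * depth)   # transmission along the line of sight
    t = max(t, 0.1)               # floor t so division does not amplify noise
    # Subtract the airlight contribution, then compensate the attenuation.
    return [min(max((c - airlight * (1.0 - t)) / t, 0.0), 1.0) for c in obs]

# A distant, fog-washed pixel is pushed back toward its scene radiance.
foggy = [0.7, 0.72, 0.75]
print(restore_pixel(foggy, depth=2.0))
```

Deeper pixels get a smaller t, so more airlight is subtracted and the remaining signal is amplified more strongly, which is why the abstract ties restoration directly to the estimated depth.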
Chao-Yuan KAO Sangwook PARK Alzahra BADI David K. HAN Hanseok KO
Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed, trained with L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to speech-denoising models. The WGAN framework integrates a multi-task autoencoder, which estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the features it enhances, when tested in ASR, achieve WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
Jinsoo PARK Wooil KIM David K. HAN Hanseok KO
We propose a new algorithm to suppress both stationary background noise and nonstationary directional interference in a speech enhancement system that employs the generalized sidelobe canceller. Our approach builds on advances in generalized sidelobe canceller design involving the transfer function ratio, and is composed of three stages. The first stage estimates the transfer function ratio of the acoustic path from the nonstationary directional interference source to the microphones, along with the powers of the stationary background noise components. The second stage uses the estimated stationary-noise powers to perform spectral subtraction on the input signals. In the final stage, the estimated transfer function ratio is used for speech enhancement on the primary channel, and an adaptive filter reduces the residual correlated noise components of the signal. These algorithmic improvements give consistently better performance than the transfer function generalized sidelobe canceller when the input signal-to-noise ratio is 10 dB or lower.
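The second stage's spectral subtraction can be sketched as a per-bin power subtraction with a spectral floor; the floor factor here is illustrative, not a value from the paper:

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction: subtract the estimated stationary-noise
    power from each frequency bin of the noisy spectrum, flooring the
    result at a small fraction of the noisy power so over-subtraction
    never yields negative power (a common cause of musical noise)."""
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]

# Three toy bins: the middle bin is over-subtracted and hits the floor.
print(spectral_subtract([1.0, 0.5, 0.2], [0.3, 0.6, 0.1]))
```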
Byeonghak KIM Murray LOEW David K. HAN Hanseok KO
To date, many studies have employed clustering for the classification of unlabeled data. Deep clustering combines deep learning models with conventional clustering algorithms to separate the cluster distributions more clearly. In this paper, we employ a convolutional autoencoder to learn the features of input images. Following this, k-means clustering is conducted on the encoded-layer features learned by the convolutional autoencoder. A center loss function is then added to aggregate the data points into clusters, increasing intra-cluster homogeneity. Finally, we calculate and increase the inter-cluster separability. We combine all loss functions into a single global objective function. Our new deep clustering method surpasses the performance of existing clustering approaches when compared in experiments under the same conditions.
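The intra- and inter-cluster terms of such a combined objective can be sketched as follows; the exact weighting and the autoencoder reconstruction term, which would be added on top, are the paper's, not shown here:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def clustering_losses(points, assignments, centers):
    """Illustrative intra/inter-cluster terms of a combined clustering objective.

    center_loss -- mean squared distance of each encoded point to its assigned
                   center; minimizing it raises intra-cluster homogeneity.
    separation  -- negative mean squared distance between distinct center pairs;
                   minimizing it pushes cluster centers apart.
    """
    center_loss = sum(sq_dist(p, centers[a])
                      for p, a in zip(points, assignments)) / len(points)
    pairs = [(i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))]
    separation = -sum(sq_dist(centers[i], centers[j]) for i, j in pairs) / len(pairs)
    return center_loss, separation
```

Two tight, well-separated clusters give a small center loss and a strongly negative separation term, which is the regime the combined objective drives toward.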
Daehun KIM Bonhwa KU David K. HAN Hanseok KO
In this paper, an algorithm is proposed for license plate recognition (LPR) in video traffic surveillance applications. In an LPR system, the primary steps are license plate detection and character segmentation. In practice, however, false alarms often occur because some vehicle parts resemble license plates, and detection rates degrade under local illumination changes. To alleviate these difficulties, the proposed license plate segmentation employs adaptive binarization using a superpixel-based local contrast measurement. From the binarization result, we apply a set of rules to a sequence of characters in a sub-image region to determine whether it is part of a license plate. This process is effective in reducing false alarms and improving detection rates. Our experimental results demonstrate a significant improvement over conventional methods.
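The idea of contrast-gated adaptive binarization can be sketched as below; this simplified version thresholds each local region at its own mean, standing in for the paper's superpixel-based contrast measure:

```python
def binarize_region(region, min_contrast=0.15):
    """Adaptive binarization sketch for one local region (e.g. one superpixel).

    Threshold the region at its own mean intensity, but only if its contrast
    is high enough to plausibly contain character strokes; flat regions
    (uniform paint, sky, road) are rejected outright, which is what cuts
    false alarms under local illumination changes.
    """
    lo, hi = min(region), max(region)
    if hi - lo < min_contrast:            # flat region: no strokes here
        return [0] * len(region)
    mean = sum(region) / len(region)
    return [1 if v < mean else 0 for v in region]   # dark strokes -> foreground
```

Because the threshold is computed per region rather than globally, a shadowed plate and a sunlit plate are both binarized relative to their own local statistics.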
Dubok PARK David K. HAN Hanseok KO
This paper proposes a novel framework for enhancing underwater images using an underwater optical imaging model and non-local means denoising. The proposed approach first adjusts the color balance using biasness correction and the average luminance. Scene visibility is then enhanced based on the underwater optical imaging model. The noise amplified in the enhanced images is alleviated by non-local means (NLM) denoising. The final enhanced images exhibit improved visibility while retaining color fidelity and keeping noise low. The proposed method requires neither specialized hardware nor prior knowledge of the underwater environment.
Suwon SHON David K. HAN Jounghoon BEH Hanseok KO
This paper describes a method for estimating the Direction Of Arrival (DOA) of multiple sound sources over the full azimuth with three microphones. Estimating DOA with paired microphone arrays creates imaginary sound sources, because the time delay of arrival (TDOA) is identical for real and imaginary sources. Imaginary sound sources cause chronic problems in multiple Sound Source Localization (SSL), because they can be localized as if they were real. Our approach is based on the observation that although each microphone pair creates imaginary sources, the DOA of an imaginary source differs with the orientation of the pair. Since a real source is always localized in the same direction regardless of array orientation, we can suppress the imaginary sources by minimum filtering based on the Steered Response Power – Phase Transform (SRP-PHAT) method. Experiments conducted in a real noisy environment show that the proposed method accurately localizes multiple sound sources.
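The minimum-filtering idea can be sketched with toy per-array SRP-PHAT azimuth maps; the numbers below are illustrative, not measured data:

```python
def min_filter_srp(srp_maps):
    """Fuse per-array SRP-PHAT azimuth maps by element-wise minimum.

    A real source produces a peak at the same azimuth in every map, so its
    peak survives the minimum; an imaginary (ghost) source appears at
    array-dependent azimuths, so the minimum suppresses it.
    """
    return [min(vals) for vals in zip(*srp_maps)]

# Toy maps over 8 azimuth bins from two pair orientations: both peak at
# bin 2 (the real source); the ghosts at bins 5 and 6 disagree.
map_a = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1]
map_b = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1]
fused = min_filter_srp([map_a, map_b])
print(fused.index(max(fused)))   # -> 2, the real source direction
```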
Seongkyu MUN Minkyu SHIN Suwon SHON Wooil KIM David K. HAN Hanseok KO
Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to the limited availability of target event databases and the linearity of conventional filters, there is still room for improving performance. By exploiting the non-linear modeling of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, this letter proposes a DNN-based feature extraction scheme for the classification of acoustic events. The effectiveness and robustness to noise of the proposed method are demonstrated using a database of indoor surveillance environments.
Kyungdeuk KO Jaihyun PARK David K. HAN Hanseok KO
In-class species classification based on animal sounds is a highly challenging task, even with the latest deep learning techniques. Distinguishing the species becomes even harder when the number of species within the same class is large. This paper presents a novel approach for fine-grained categorization of animal species based on their sounds, using pre-trained CNNs and a new self-attention module well-suited for acoustic signals. The proposed method is shown to be effective, achieving an average species accuracy of 98.37% and a minimum species accuracy of 94.38%, the highest among the competing baselines, which include CNNs without self-attention and CNNs with CBAM, FAM, and CFAM but without pre-training.
Minkyu SHIN Seongkyu MUN David K. HAN Hanseok KO
In this paper, a multichannel speech enhancement system that adopts a denoising auto-encoder as part of the beamformer is proposed. The proposed generalized sidelobe canceller structure generates enhanced multi-channel signals, instead of merely one channel, to which the subsequent denoising auto-encoder can be applied. Because the beamformer exploits spatial information and compensates for differences in the transfer functions of each channel, the proposed system avoids the difficulty of modelling relative transfer functions, which consist of complex numbers that are hard to model with a denoising auto-encoder. The modelling capability of the denoising auto-encoder can therefore concentrate on removing the artefacts introduced by the beamformer. Unlike in conventional beamformers, which combine these artefacts into one channel, in the proposed method the artefacts remain separated per channel, so the denoising auto-encoder can remove them by referring to the other channels. Experimental results show that the proposed structure is effective on the six-channel CHiME data, as indicated by improvements in speech enhancement and in word error rate for automatic speech recognition.
Kyungsun LEE Minseok KEUM David K. HAN Hanseok KO
It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only small data samples are available. In this letter, the two approaches are compared in terms of sensitivity to the amount of training data and computing time, with the objective of determining the tipping point. The limited-training-data problem is addressed by straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights, ensuring a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated for visual recognition performance on three databases (two in the public domain and one developed in-house). Experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to DTW- and HMM-based algorithms, especially with limited data samples.
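The weight-injection idea can be sketched as a DTW recurrence in which off-diagonal steps carry a slightly larger weight than the diagonal step; the weight values are illustrative, not the paper's:

```python
def wdtw(x, y, w_diag=1.0, w_side=1.05):
    """Weighted DTW sketch: insert/delete steps are weighted slightly more
    than the diagonal match step, biasing the warping path toward the
    diagonal without changing the O(n*m) cost of plain DTW."""
    INF = float("inf")
    n, m = len(x), len(y)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = min(w_diag * cost + D[i - 1][j - 1],  # match (diagonal)
                          w_side * cost + D[i - 1][j],      # deletion
                          w_side * cost + D[i][j - 1])      # insertion
    return D[n][m]
```

Identical sequences still align at zero cost, while paths forced off the diagonal accumulate the extra `w_side` weighting, which is what smooths the alignment.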
Seongkyu MUN Suwon SHON Wooil KIM David K. HAN Hanseok KO
Various types of classifiers and feature extraction methods for acoustic scene classification were proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The final evaluation results, however, showed that even the top-10-ranked teams achieved extremely low accuracy on particular class pairs with similar sounds. Because such sound classes are difficult to distinguish even by human ears, the conventional deep learning based feature extraction methods used by most DCASE participating teams face performance limitations. To address the low performance on similar class pairs, this letter proposes to employ recurrent neural network (RNN) based source separation for each class prior to the classification step. Since the RNN structure can effectively extract the sound components it was trained on, its mid-layer can be considered to capture discriminative information about the trained class. This letter therefore proposes to use this mid-layer information as a novel discriminative feature. The proposed feature yields an average classification rate improvement of 2.3% over the conventional method, which uses additional classifiers for the similar class pair issue.