Author Search Result

[Author] David K. HAN (12 hits)

1-12 of 12 hits
  • Fast Single Image De-Hazing Using Characteristics of RGB Channel of Foggy Image

    Dubok PARK  David K. HAN  Changwon JEON  Hanseok KO  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E96-D No:8
      Page(s):
    1793-1799

    Images captured under foggy conditions often exhibit poor contrast and color. This is primarily due to the air-light, which degrades image quality exponentially with the fog depth between the scene and the camera. In this paper, we restore fog-degraded images by first estimating depth using a physical model characterizing the RGB channels in a single monocular image. The fog effects are then removed by subtracting the estimated irradiance, which is empirically related to the obtained scene depth, from the total irradiance received by the sensor. Effective restoration of color and contrast in images taken under foggy conditions is demonstrated. In the experiments, we validate the effectiveness of our method compared with a conventional method.
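
    To make the fog model concrete, the sketch below inverts the standard scattering equation I = J*t + A*(1-t) with t = exp(-beta*depth). It is only illustrative: the air-light value, the attenuation coefficient beta, and the fallback depth map are assumptions, and the paper's actual RGB-channel depth estimator is not reproduced here.

    ```python
    import numpy as np

    def dehaze(image, airlight=0.9, beta=1.0, depth=None):
        """Illustrative restoration via I = J * t + A * (1 - t), t = exp(-beta * depth).
        `image` is a float RGB array in [0, 1]; `airlight`, `beta`, and the
        fallback depth map are hypothetical, not the paper's estimates."""
        if depth is None:
            # crude stand-in depth: darker pixels treated as closer (NOT the paper's estimator)
            depth = 1.0 - image.min(axis=2)
        t = np.clip(np.exp(-beta * depth)[..., None], 0.1, 1.0)  # per-pixel transmission
        restored = (image - airlight * (1.0 - t)) / t            # invert the fog model
        return np.clip(restored, 0.0, 1.0)
    ```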

  • Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition

    Chao-Yuan KAO  Sangwook PARK  Alzahra BADI  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

      Publicized:
    2020/01/27
      Vol:
    E103-D No:5
      Page(s):
    1195-1198

    Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed using L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder which estimates not only speech features but also noise features from noisy speech. While achieving a 14.1% improvement in the Wasserstein distance convergence rate, the proposed OGP-enhanced features are tested in ASR and achieve WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
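
    The abstract does not give the exact form of the orthogonal gradient penalty, so the hedged sketch below only shows the baseline WGAN gradient penalty term that such a method modifies; the critic, tensor shapes, and weight `lam` are assumptions.

    ```python
    import torch

    def gradient_penalty(critic, real, fake, lam=10.0):
        """Baseline WGAN gradient penalty on interpolates between real and fake
        feature batches. The letter's orthogonal variant (OGP) modifies this
        term; its exact form is not specified in the abstract."""
        eps_shape = (real.size(0),) + (1,) * (real.dim() - 1)
        eps = torch.rand(eps_shape, device=real.device)
        interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
        grads, = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)
        grads = grads.reshape(grads.size(0), -1)
        return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
    ```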

  • Pre-Filtering Algorithm for Dual-Microphone Generalized Sidelobe Canceller Using General Transfer Function

    Jinsoo PARK  Wooil KIM  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

      Vol:
    E97-D No:9
      Page(s):
    2533-2536

    We propose a new algorithm to suppress both stationary background noise and nonstationary directional interference noise in a speech enhancement system that employs the generalized sidelobe canceller. Our approach builds on advances in generalized sidelobe canceller design involving the transfer function ratio. Our system is composed of three stages. The first stage estimates the transfer function ratio on the acoustic path from the nonstationary directional interference noise source to the microphones, along with the powers of the stationary background noise components. The second stage uses the estimated powers of the stationary background noise components to perform spectral subtraction on the input signals. Finally, the estimated transfer function ratio is used for speech enhancement on the primary channel, and an adaptive filter reduces the residual correlated noise components of the signal. These algorithmic improvements give consistently better performance than the transfer function generalized sidelobe canceller when the input signal-to-noise ratio is 10 dB or lower.
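
    As a rough illustration of the second stage, the snippet below applies per-bin spectral subtraction given a stationary-noise power estimate; the noise estimation of stage one and the transfer-function-ratio processing of stage three are not shown, and the spectral floor is an assumed parameter.

    ```python
    import numpy as np

    def spectral_subtraction(noisy_stft, noise_power, floor=0.01):
        """Subtract an estimated stationary-noise power from each bin of the
        input spectrum and keep the noisy phase. `noise_power` is assumed to
        come from the first-stage estimator, which is not reproduced here."""
        power = np.abs(noisy_stft) ** 2
        clean_power = np.maximum(power - noise_power, floor * power)  # spectral floor
        gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
        return gain * noisy_stft  # apply real-valued gain, retain original phase
    ```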

  • Deep Clustering for Improved Inter-Cluster Separability and Intra-Cluster Homogeneity with Cohesive Loss

    Byeonghak KIM  Murray LOEW  David K. HAN  Hanseok KO  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2021/01/28
      Vol:
    E104-D No:5
      Page(s):
    776-780

    To date, many studies have employed clustering for the classification of unlabeled data. Deep separate clustering applies several deep learning models to conventional clustering algorithms to more clearly separate the distribution of the clusters. In this paper, we employ a convolutional autoencoder to learn the features of input images. Following this, k-means clustering is conducted using the encoded layer features learned by the convolutional autoencoder. A center loss function is then added to aggregate the data points into clusters to increase the intra-cluster homogeneity. Finally, we calculate and increase the inter-cluster separability. We combine all loss functions into a single global objective function. Our new deep clustering method surpasses the performance of existing clustering approaches when compared in experiments under the same conditions.
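
    A minimal sketch of how the intra-cluster (center-loss) and inter-cluster terms described above could be combined is given below; the exact weighting, margin, and formulation in the paper may differ, and all names here are illustrative.

    ```python
    import torch

    def cohesive_loss(features, assignments, centers, margin=1.0):
        """Pull encoded features toward their assigned cluster centers
        (intra-cluster homogeneity) and push centers apart (inter-cluster
        separability). Only one plausible form of the combined objective."""
        intra = ((features - centers[assignments]) ** 2).sum(dim=1).mean()
        mask = ~torch.eye(centers.size(0), dtype=torch.bool, device=centers.device)
        inter = torch.clamp(margin - torch.cdist(centers, centers)[mask], min=0.0).mean()
        return intra + inter
    ```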

  • License Plate Detection and Character Segmentation Using Adaptive Binarization Based on Superpixels under Illumination Change

    Daehun KIM  Bonhwa KU  David K. HAN  Hanseok KO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2017/02/22
      Vol:
    E100-D No:6
      Page(s):
    1384-1387

    In this paper, an algorithm is proposed for license plate recognition (LPR) in video traffic surveillance applications. In an LPR system, the primary steps are license plate detection and character segmentation. In practice, however, false alarms often occur due to vehicle parts that are similar in appearance to a license plate, and detection rates degrade under local illumination changes. To alleviate these difficulties, the proposed license plate segmentation employs adaptive binarization using a superpixel-based local contrast measurement. From the binarization, we apply a set of rules to a sequence of characters in a sub-image region to determine whether it is part of a license plate. This process is effective in reducing false alarms and improving detection rates. Our experimental results demonstrate a significant improvement over conventional methods.
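
    To illustrate the superpixel-based adaptive binarization step, the sketch below segments a grayscale image (assumed float in [0, 1]) with SLIC and thresholds each superpixel independently with Otsu's method; the paper's actual local contrast measurement and character rules are not reproduced, and scikit-image (>= 0.19) is an assumed dependency.

    ```python
    import numpy as np
    from skimage.segmentation import slic
    from skimage.filters import threshold_otsu

    def superpixel_binarize(gray, n_segments=400):
        """Per-superpixel adaptive binarization (illustrative only): SLIC
        segmentation followed by an Otsu threshold inside each superpixel."""
        labels = slic(gray, n_segments=n_segments, channel_axis=None, start_label=0)
        binary = np.zeros_like(gray, dtype=bool)
        for lab in np.unique(labels):
            mask = labels == lab
            vals = gray[mask]
            if vals.max() - vals.min() < 1e-3:   # flat region, keep as background
                continue
            binary[mask] = vals > threshold_otsu(vals)
        return binary
    ```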

  • Enhancing Underwater Color Images via Optical Imaging Model and Non-Local Means Denoising

    Dubok PARK  David K. HAN  Hanseok KO  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2017/04/07
      Vol:
    E100-D No:7
      Page(s):
    1475-1483

    This paper proposes a novel framework for enhancing underwater color images using an optical imaging model and non-local means denoising. The proposed approach first adjusts the color balance using biasness correction and the average luminance. Scene visibility is then enhanced based on an underwater optical imaging model. The increase in noise in the enhanced images is alleviated by non-local means (NLM) denoising. The final enhanced images are characterized by improved visibility while retaining color fidelity and reducing noise. The proposed method requires neither specialized hardware nor prior knowledge of the underwater environment.
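
    The snippet below sketches two of the stages mentioned above, a gray-world style color balance and non-local means denoising with OpenCV; the paper's biasness correction and the underwater optical-imaging-model step are not reproduced, and the denoising parameters are assumptions.

    ```python
    import cv2
    import numpy as np

    def enhance_underwater(bgr):
        """Gray-world channel balance followed by OpenCV non-local means
        denoising; a rough stand-in for the full pipeline described above."""
        img = bgr.astype(np.float32)
        means = img.reshape(-1, 3).mean(axis=0)
        img *= means.mean() / np.maximum(means, 1e-6)     # equalize channel means
        img = np.clip(img, 0, 255).astype(np.uint8)
        return cv2.fastNlMeansDenoisingColored(img, None, 7, 7, 7, 21)
    ```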

  • Full Azimuth Multiple Sound Source Localization with 3-Channel Microphone Array

    Suwon SHON  David K. HAN  Jounghoon BEH  Hanseok KO  

     
    PAPER-Engineering Acoustics

      Vol:
    E95-A No:4
      Page(s):
    745-750

    This paper describes a method for estimating the Direction Of Arrival (DOA) of multiple sound sources over the full azimuth with three microphones. Estimating DOA with a paired microphone array creates imaginary sound sources because the time delay of arrival (TDOA) is identical for real and imaginary sources. Imaginary sound sources can create chronic problems in multiple Sound Source Localization (SSL), because they can be localized as real sound sources. Our proposed approach is based on the observation that although each paired microphone array creates imaginary sound sources, the DOA of the imaginary sources differs depending on the orientation of the pair. Since a real source is always localized in the same direction regardless of the array orientation, we can suppress the imaginary sound sources by minimum filtering based on the Steered Response Power Phase Transform (SRP-PHAT) method. A set of experiments conducted in a real noisy environment showed that the proposed method is accurate in localizing multiple sound sources.
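
    The TDOA estimation that underlies the method can be illustrated with a GCC-PHAT sketch for one microphone pair, as below; forming the SRP-PHAT map per pair and taking the minimum across pair orientations to suppress imaginary sources is the paper's contribution and is not shown.

    ```python
    import numpy as np

    def gcc_phat(sig_a, sig_b, fs, max_tau=None):
        """GCC-PHAT for a single microphone pair: whiten the cross-power
        spectrum by its magnitude so the inverse FFT peaks at the TDOA."""
        n = len(sig_a) + len(sig_b)
        R = np.fft.rfft(sig_a, n=n) * np.conj(np.fft.rfft(sig_b, n=n))
        cc = np.fft.irfft(R / np.maximum(np.abs(R), 1e-12), n=n)
        max_shift = n // 2 if max_tau is None else int(fs * max_tau)
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        return (np.argmax(np.abs(cc)) - max_shift) / float(fs)   # TDOA in seconds
    ```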

  • DNN Transfer Learning Based Non-Linear Feature Extraction for Acoustic Event Classification

    Seongkyu MUN  Minkyu SHIN  Suwon SHON  Wooil KIM  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

      Publicized:
    2017/06/09
      Vol:
    E100-D No:9
      Page(s):
    2249-2252

    Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to the limited availability of target event databases and the linearity of conventional filters, there is still room for improving performance. By exploiting the non-linear modeling of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, this letter proposes a DNN-based feature extraction scheme for the classification of acoustic events. The effectiveness and noise robustness of the proposed method are demonstrated using a database of indoor surveillance environments.
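
    As a generic illustration of DNN-based feature extraction by transfer learning, the sketch below captures a hidden layer's activations from a pre-trained network with a forward hook and returns them as features; the actual network, layer, and classifier used in the letter are not specified here.

    ```python
    import torch

    def extract_features(model, layer, batch):
        """Run `batch` through a pre-trained `model` and return the activations
        of `layer` as non-linear features for a downstream event classifier."""
        captured = {}
        handle = layer.register_forward_hook(lambda m, i, o: captured.update(out=o.detach()))
        with torch.no_grad():
            model(batch)
        handle.remove()
        return captured["out"]
    ```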

  • Channel and Frequency Attention Module for Diverse Animal Sound Classification

    Kyungdeuk KO  Jaihyun PARK  David K. HAN  Hanseok KO  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2019/09/17
      Vol:
    E102-D No:12
      Page(s):
    2615-2618

    In-class species classification based on animal sounds is a highly challenging task, even with the latest deep learning techniques applied. The difficulty of distinguishing the species is further compounded when the number of species within the same class is large. This paper presents a novel approach for fine categorization of animal species based on their sounds, using pre-trained CNNs and a new self-attention module well-suited for acoustic signals. The proposed method is shown to be effective as it achieves an average species accuracy of 98.37% and a minimum species accuracy of 94.38%, the highest among the competing baselines, which include CNNs without self-attention and CNNs with CBAM, FAM, and CFAM but without pre-training.
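
    One plausible reading of a channel and frequency attention module for a spectrogram feature map shaped (batch, channels, freq, time) is sketched below as two squeeze-excitation style gates; the published CFAM design may differ in its exact structure, and all dimensions are assumptions.

    ```python
    import torch
    import torch.nn as nn

    class ChannelFreqAttention(nn.Module):
        """Illustrative channel + frequency attention: one gating vector over
        the channel axis and one over the frequency axis of (B, C, F, T) input."""
        def __init__(self, channels, freq_bins, reduction=8):
            super().__init__()
            self.ch_gate = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())
            self.fr_gate = nn.Sequential(
                nn.Linear(freq_bins, freq_bins // reduction), nn.ReLU(),
                nn.Linear(freq_bins // reduction, freq_bins), nn.Sigmoid())

        def forward(self, x):                          # x: (B, C, F, T)
            c = self.ch_gate(x.mean(dim=(2, 3)))       # (B, C) channel weights
            f = self.fr_gate(x.mean(dim=(1, 3)))       # (B, F) frequency weights
            return x * c[:, :, None, None] * f[:, None, :, None]
    ```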

  • New Generalized Sidelobe Canceller with Denoising Auto-Encoder for Improved Speech Enhancement

    Minkyu SHIN  Seongkyu MUN  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

      Vol:
    E100-A No:12
      Page(s):
    3038-3040

    In this paper, a multichannel speech enhancement system which adopts a denoising auto-encoder as part of the beamformer is proposed. The proposed generalized sidelobe canceller structure generates enhanced multi-channel signals, instead of merely one channel, to which the following denoising auto-encoder can be applied. Because the beamformer exploits spatial information and compensates for differences in the transfer functions of each channel, the proposed system is expected to resolve the difficulty of modelling relative transfer functions, which consist of complex numbers that are hard to model with a denoising auto-encoder. As a result, the modelling capability of the denoising auto-encoder can concentrate on removing the artefacts caused by the beamformer. Unlike in conventional beamformers, which combine these artefacts into one channel, in the proposed method they remain separated per channel, so the denoising auto-encoder can remove the artefacts by referring to the other channels. Experimental results show that the proposed structure is effective on the six-channel CHiME data, as indicated by improvements in speech enhancement and in word error rate for automatic speech recognition.
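
    A toy version of the idea that the denoising auto-encoder can remove beamformer artefacts by referring to all channels at once is sketched below; the network size, feature dimension, and six-channel setting are illustrative, not the letter's actual architecture.

    ```python
    import torch.nn as nn

    class MultiChannelDAE(nn.Module):
        """Denoising auto-encoder over stacked per-channel beamformer outputs,
        mapping (batch, n_channels, feat_dim) to one enhanced channel."""
        def __init__(self, n_channels=6, feat_dim=257, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_channels * feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, feat_dim))

        def forward(self, x):                    # x: (batch, n_channels, feat_dim)
            return self.net(x.flatten(start_dim=1))
    ```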

  • Visual Speech Recognition Using Weighted Dynamic Time Warping

    Kyungsun LEE  Minseok KEUM  David K. HAN  Hanseok KO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2015/04/09
      Vol:
    E98-D No:7
      Page(s):
    1430-1433

    It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only small data samples are available. In this letter, the two approaches are compared in terms of sensitivity to the amount of training samples and computing time, with the objective of determining the tipping point. The limited training data problem is addressed by exploiting straightforward template matching via weighted DTW. The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. Subsequent experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to the DTW and HMM based algorithms, especially with limited data samples.
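
    A minimal weighted-DTW sketch is given below: the accumulated cost favors the diagonal by weighting the horizontal and vertical transitions more heavily. The specific weights and path constraints used in the letter are not reproduced; `w_diag` and `w_step` are assumed parameters.

    ```python
    import numpy as np

    def weighted_dtw(x, y, w_diag=1.0, w_step=1.5):
        """DTW distance between feature sequences x (n, D) and y (m, D) with
        transition weights that encourage a smooth diagonal warping path."""
        dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
        n, m = dist.shape
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = dist[i - 1, j - 1]
                D[i, j] = min(w_diag * d + D[i - 1, j - 1],   # diagonal step
                              w_step * d + D[i - 1, j],       # vertical step
                              w_step * d + D[i, j - 1])       # horizontal step
        return D[n, m]
    ```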

  • A Novel Discriminative Feature Extraction for Acoustic Scene Classification Using RNN Based Source Separation

    Seongkyu MUN  Suwon SHON  Wooil KIM  David K. HAN  Hanseok KO  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2017/09/14
      Vol:
    E100-D No:12
      Page(s):
    3041-3044

    Various types of classifiers and feature extraction methods for acoustic scene classification were recently proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The results of the final evaluation, however, showed that even the top 10 ranked teams achieved extremely low accuracy on particular class pairs with similar sounds. Because such sound classes are difficult to distinguish even by human ears, the conventional deep learning based feature extraction methods used by most DCASE participating teams face performance limitations. To address the low performance on similar class pairs, this letter proposes to employ recurrent neural network (RNN) based source separation for each class prior to the classification step. Since the RNN structure can effectively extract the sound components it was trained on, its mid-layer can be considered to capture discriminative information of the trained class. Therefore, this letter proposes to use this mid-layer information as a novel discriminative feature. The proposed feature shows an average classification rate improvement of 2.3% over the conventional method, which uses additional classifiers to handle the similar class pair issue.
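
    The feature idea described above can be illustrated as below: pass the input through an RNN trained for class-specific source separation and keep its hidden (mid-layer) activations as discriminative features. The separator training and the following classifier are not shown, and the GRU and time-pooling choices are assumptions.

    ```python
    import torch
    import torch.nn as nn

    def rnn_midlayer_features(feats, separator_rnn: nn.GRU):
        """Return time-pooled hidden activations of a trained separator RNN as
        discriminative scene features (assumes the GRU was built with
        batch_first=True so `feats` is shaped (batch, time, dim))."""
        with torch.no_grad():
            hidden_seq, _ = separator_rnn(feats)   # (batch, time, hidden)
        return hidden_seq.mean(dim=1)              # pool over time into one vector
    ```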
