1. Introduction
With continued industrialization, air pollution has worsened and haze occurs more and more frequently. Atmospheric particles suspended in foggy and hazy weather absorb and scatter light, causing reduced contrast, loss of detail, and color shift in images acquired by imaging equipment, which hinders image recognition and downstream applications. Fog and haze degrade not only the visual quality of captured pictures but also the performance of computer vision systems built on image processing. Classic image dehazing methods fall into two categories: image enhancement and image restoration.
Dehazing methods based on image enhancement mainly boost the contrast and saturation of details in the hazy image, highlighting the regions of interest to human vision; examples include histogram equalization [1] and the Retinex algorithm [2]. Methods based on image restoration rely on a physical model and estimate its parameters to recover a clear image. Researchers have proposed various forms of prior knowledge to estimate the model parameters more reliably and improve dehazing quality. Among them, the most influential is the dark channel prior (DCP) proposed by He et al. [3], and a large body of dehazing work improves on or extends DCP [4]-[8]. However, these methods suffer from poor dehazing robustness due to the limitations of the physical model and the handcrafted priors.
With its rapid development, deep learning has been widely applied to image dehazing. Early deep dehazing models were based on convolutional neural networks [9]-[14]. In recent years, with the development of the Transformer architecture [15], Transformer-based dehazing models have been proposed continually [16]-[21]. Although Transformer-based dehazing networks perform well owing to their long-range representation capability, they greatly increase the number of parameters and the computational overhead of the network, which hinders deployment on terminal devices with real-time requirements. Therefore, we propose IAD-Net, a lightweight dehazing network based on image attention, which has fewer parameters, lower computational overhead, and strong dehazing performance. Our contributions are summarized as follows:
- We propose a novel attention mechanism, called image attention, which is a plug-and-play module with global modeling capability that can better characterize image features.
- We design a lightweight deep image dehazing model, IAD-Net, based on image attention. IAD-Net is a parallel lightweight network structure that combines the global modeling ability of image attention with the local modeling ability of convolution, enabling the network to learn global and local features and fuse them. IAD-Net has good feature learning and feature expression capabilities and low computational overhead, and it can restore image details while removing haze.
- We conduct experimental comparisons with state-of-the-art methods. Experimental results show that the proposed image attention module effectively improves dehazing ability, and that IAD-Net is competitive with state-of-the-art image dehazing methods.
2. Related Work
The earliest deep-learning-based image dehazing work is the trainable end-to-end model DehazeNet proposed by Cai et al. [9], which estimates the transmission map. Li et al. [10] reformulated the atmospheric scattering model and proposed AODNet. Ren et al. [14] proposed MSCNN to coarsely estimate and then refine the transmission map. With the development of the deep self-attention architecture Transformer [15], many researchers have proposed dehazing networks based on the Transformer structure, such as HyLoG-ViT [16], Transweather [17], DeHamer [18], and DehazeFormer [19]. Although Transformer-based dehazing networks achieve strong dehazing performance, their parameter counts and computational overhead are large, which limits real-time applications on terminal devices. We therefore propose a novel image attention mechanism and the dehazing network IAD-Net, which greatly reduces the number of parameters and the computational overhead while maintaining dehazing performance.
3. Proposed Method
3.1 Motivation
The proposed lightweight image dehazing model IAD-Net is an end-to-end structure: the model directly learns a mapping from a hazy image to a clear image without estimating parameters of a physical model. This avoids problems such as inaccurate parameter estimation by various forms of prior knowledge and the limitations of the atmospheric scattering model (ASM). In addition, most existing convolutional dehazing models improve fitting ability by continually increasing network depth and width, or improve performance with Transformer modules; however, this greatly increases the network parameters and makes the models difficult to train and deploy. Therefore, we propose a novel attention mechanism, called image attention, which has global modeling capability and is integrated with conventional convolutional neural network modules to improve performance while reducing model parameters and computational overhead.
3.2 Architecture of IAD-Net
The overall architecture of IAD-Net is shown in Fig. 1, where \(1 \times 1\) CONV denotes a convolution of size \(1 \times 1\), and \(3 \times 3\) CONV denotes a convolution of size \(3 \times 3\). "Image-Attention" denotes the proposed image attention module, "Attention" denotes the attention calculation module, and "CNN" denotes the convolutional neural network module. IAD-Net is a parallel architecture: the "Image-Attention" module and the "CNN" module extract features from the input hazy image side by side, and the global and local features are integrated so that the network has better representation ability.
The CNN module in IAD-Net uses the down-sampling and up-sampling modules of the U-Net [22] structure, as shown in Fig. 2. This module extracts multi-scale features with very few parameters.
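To make the parallel layout concrete, the PyTorch sketch below shows one way to run a global branch and a local branch side by side and fuse their outputs. The branch internals and the \(1 \times 1\)-convolution fusion are assumptions for illustration; the paper specifies only the parallel structure, not the exact fusion rule.

```python
import torch
import torch.nn as nn

class ParallelFusion(nn.Module):
    """Illustrative parallel block in the spirit of Fig. 1: a global
    (image-attention) branch and a local (CNN) branch process the same
    input side by side, and their outputs are fused. Fusion by a 1x1
    convolution over the concatenated features is an assumption."""

    def __init__(self, channels: int, global_branch: nn.Module, local_branch: nn.Module):
        super().__init__()
        self.global_branch = global_branch  # e.g., the image attention module
        self.local_branch = local_branch    # e.g., a small U-Net-style CNN
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(x)  # global features
        l = self.local_branch(x)   # local, multi-scale features
        return self.fuse(torch.cat([g, l], dim=1))
```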
3.3 Module of Image Attention
The proposed image attention module consists of three parallel convolutions and an attention calculation module. Features are first extracted from the input image by a \(1 \times 1\) convolution, and then three parallel \(1 \times 1\) convolutions extract different high-level feature maps, named \(V\), \(K\), and \(P\) respectively. Global features are then extracted by the following attention calculation:
\[\begin{equation*} \begin{aligned} & Attention(V,K,P) \\ &= \sum_{j=1}^{N}\sum_{i=1}^{N}\mathrm{Softmax}\left(V^{j}\times\mathrm{Softmax}\left(P^{i}\otimes K^{j}\right)\right), \end{aligned} \tag{1} \end{equation*}\]
where the feature map of the input image is padded to a multiple of 16 and then cropped into image blocks of \(16\times16\) size. \(N\) denotes the number of image blocks, \(\times\) denotes element-wise (dot) multiplication, and \(\otimes\) denotes matrix multiplication. All of the proposed image attention calculations operate on image patches, as follows (a code sketch is given after the list):
(1) Cut the tensors \(V\), \(K\), and \(P\), extracted by the parallel convolutions, into \(N\) image blocks each;
(2) Compute attention by traversing the image patches and then concatenate the per-patch results to obtain the attention result for the entire image. The attention calculation for each patch pair is \(\mathrm{Softmax}(V^{j}\times\mathrm{Softmax}(P^{i}\otimes K^{j}))\), where \(i\) and \(j\) are patch indices, \(V^{j}\), \(P^{i}\), and \(K^{j}\) are the corresponding patches, and \(\mathrm{Softmax}\) is the usual normalization function in deep learning.
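The following PyTorch sketch makes the patch traversal of Eq. (1) concrete under stated assumptions: each \(16\times16\) patch is treated as \(p^2\) tokens of dimension \(C\), and the inner attention map is applied to \(V^{j}\) by matrix product, one plausible reading of the \(\times\) in Eq. (1). The actual IAD-Net implementation may arrange shapes differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageAttention(nn.Module):
    """Sketch of the patch-based image attention of Eq. (1).
    Per-patch tensor shapes are an illustrative assumption."""

    def __init__(self, channels: int, patch: int = 16):
        super().__init__()
        self.patch = patch
        self.proj = nn.Conv2d(channels, channels, 1)  # initial 1x1 conv
        self.to_v = nn.Conv2d(channels, channels, 1)  # three parallel 1x1 convs
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_p = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        p = self.patch
        # Pad H and W to multiples of the patch size (16 in the paper).
        x = F.pad(x, (0, (-w) % p, 0, (-h) % p))
        hp, wp = x.shape[-2:]
        x = self.proj(x)
        v, k, q = self.to_v(x), self.to_k(x), self.to_p(x)

        def to_patches(t):  # (B, C, Hp, Wp) -> (B, N, p*p, C)
            t = F.unfold(t, kernel_size=p, stride=p)  # (B, C*p*p, N)
            return t.transpose(1, 2).reshape(b, -1, c, p * p).transpose(-2, -1)

        v, k, q = map(to_patches, (v, k, q))
        n = v.shape[1]

        outs = []
        for j in range(n):      # double sum over patch pairs (i, j) in Eq. (1)
            acc = 0
            for i in range(n):
                # Inner term: Softmax(P^i (x) K^j), a (p*p, p*p) attention map.
                attn = torch.softmax(q[:, i] @ k[:, j].transpose(-2, -1), dim=-1)
                # Outer term: the map is applied to V^j by matrix product here
                # (the paper writes this step with element-wise "x").
                acc = acc + torch.softmax(attn @ v[:, j], dim=-1)
            outs.append(acc)
        out = torch.stack(outs, dim=1)  # (B, N, p*p, C)

        # Reassemble the patches into a full feature map and crop the padding.
        out = out.transpose(-2, -1).reshape(b, n, c * p * p).transpose(1, 2)
        out = F.fold(out, output_size=(hp, wp), kernel_size=p, stride=p)
        return out[..., :h, :w]
```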
The proposed image attention module combines the advantages of convolution and attention, combining the local feature extraction of convolution with the global feature extraction of attention.
3.4 Loss Function
The proposed IAD-Net is a very lightweight model, and its loss function is the simple L1 loss, shown below:
\[\begin{equation*} \zeta =\frac{1}{Num}\sum_{i=1}^{Num}\left \| \boldsymbol{J}^{i}-\boldsymbol{J}_{GT}^{i} \right \|_{1}, \tag{2} \end{equation*}\]
where \(Num\) denotes the number of images participating in training, \(\boldsymbol{J}^{i}\) denotes the \(i\)-th output image of the proposed model, and \(\boldsymbol{J}_{GT}^{i}\) denotes the corresponding ground-truth (GT) image.
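In PyTorch, Eq. (2) corresponds directly to the built-in L1 loss; the snippet below is a minimal sketch, with `output` and `gt` as illustrative variable names.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()  # mean absolute error; with 'mean' reduction this averages
                  # over every element, matching Eq. (2) up to the constant
                  # per-image normalization factor C*H*W

output = torch.rand(4, 3, 256, 256)  # dehazed images J from the model
gt = torch.rand(4, 3, 256, 256)      # ground-truth images J_GT
loss = l1(output, gt)
```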
4. Experiments
4.1 Implementation Details
IAD-Net is implemented on devices equipped with GTX 1080Ti GPUs. For synthetic hazy images, IAD-Net is trained on the outdoor subset (OTS) of the RESIDE dataset [30] and tested on the outdoor test set SOTS-Outdoor. For real-world hazy images, IAD-Net is trained on the O-Haze dataset [23] and tested on the NH-Haze dataset [24], and vice versa.
4.2 Comparison with State-of-the-Art Methods
To verify the performance of IAD-Net, eight state-of-the-art dehazing methods were selected for qualitative and quantitative comparison: DCP [3], DehazeNet [9], AODNet [10], GridDehazeNet [25], FFANet [26], RefineDNet [27], PSD [28], and D4 [29]. We use the publicly released code and models of these methods to conduct comparisons on the SOTS-Outdoor, O-Haze, and NH-Haze datasets. Visual comparisons are shown in Fig. 3. The images restored by DCP and AODNet are darker, and the images restored by DCP, AODNet, RefineDNet, and PSD exhibit distortions in the sky region. DehazeNet, GridDehazeNet, FFANet, and D4 have difficulty removing haze from real-world hazy images, whereas our method restores both synthetic and real-world hazy images well and is closest to the ground-truth image.
In addition, quantitative comparisons were conducted on all images of these three datasets, and the average PSNR and SSIM scores were computed. As shown in Table 1, our method achieves the highest scores, demonstrating the excellent performance of the proposed method.
4.3 Ablation Study
This section studies the impact of IAD-Net's Image-Attention module and CNN module on network performance through ablation experiments. We removed each of the two modules in turn, retrained on the ITS dataset, and tested on the SOTS-Indoor dataset. The results are shown in Table 2. Adding each module improves the PSNR and SSIM scores, which demonstrates the effectiveness of both modules.
5. Conclusions
In this letter, we propose IAD-Net, a novel lightweight deep dehazing model that contains a novel image attention module (Image-Attention) integrated with a conventional convolutional neural network. IAD-Net combines the local feature extraction capability of traditional convolution with the long-range modeling capability of the attention mechanism; it effectively extracts local and global image features and has good representation capability. Experimental results show that IAD-Net performs excellently on both synthetic and real-world hazy images. However, our model also has shortcomings: for example, the restored images show slight color distortion, possibly because the model fails to extract color information well. This is a direction for future research.
Acknowledgments
This work was supported by the Natural Science Research Key Project of the Department of Education of Anhui Province, China (Grant No. 2022AH051828).
References
[1] W. Kim, J. You, and J. Jeong, “Contrast enhancement using histogram equalization based on logarithmic mapping,” Optical Engineering, vol.51, no.6, 067002, 2012.
[2] H. Li, W.H. Xie, X.G. Wang, S.S. Liu, Y.Y. Gai, and L. Yang, “GPU implementation of multi-scale Retinex image enhancement algorithm,” 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), pp.1-5, 2016.
[3] K.M. He, J. Sun, and X.O. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol.33, no.12, pp.2341-2353, 2011.
[4] Z. Wei, G. Zhu, X. Liang, and W. Liu, “An image fusion dehazing algorithm based on dark channel prior and Retinex,” International Journal of Computational Science and Engineering, vol.23, no.2, pp.115-123, 2020.
[5] X.P. Yuan, Y.Y. Chen, and H. Shi, “Improved image dehazing algorithm based on haze-line and dark channel prior,” Laser & Optoelectronics Progress, vol.59, no.8, 0810014, 2022.
[6] Y. Wang, T.-Z. Huang, X.-L. Zhao, L.-J. Deng, and T.-Y. Ji, “A convex single image dehazing model via sparse dark channel prior,” Applied Mathematics and Computation, vol.375, 125085, 2020.
[7] Z. Lu, B. Long, and S. Yang, “Saturation based iterative approach for single image dehazing,” IEEE Signal Process. Lett., vol.27, pp.665-669, 2020.
[8] Y. Liu, Z.S. Yan, J.G. Tan, and Y.C. Li, “Multi-purpose oriented single nighttime image haze removal based on unified variational Retinex model,” IEEE Trans. Circuits Syst. Video Technol., vol.33, no.4, pp.1643-1657, 2023.
[9] B. Cai, X.M. Xu, K. Jia, C.M. Qing, and D.C. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process., vol.25, no.11, pp.5187-5198, 2016.
[10] B. Li, X. Peng, Z.Y. Wang, D. Xu, and J.Z. Feng, “AOD-Net: All-in-one dehazing network,” Proc. IEEE International Conference on Computer Vision, pp.4780-4788, 2017.
[11] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.H. Yang, “Gated fusion network for single image dehazing,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp.3253-3261, 2018.
[12] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” Proc. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp.1375-1383, 2019.
[13] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, “FFA-Net: Feature fusion attention network for single image dehazing,” Proc. AAAI Conference on Artificial Intelligence, vol.34, no.7, pp.11908-11915, 2020.
[14] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” Proc. European Conference on Computer Vision, pp.154-169, Springer, 2016.
[15] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol.30, 2017.
[16] D. Zhao, J. Li, H. Li, and L. Xu, “Hybrid local-global transformer for image dehazing,” arXiv preprint arXiv:2109.07100, 2021.
[17] J.M.J. Valanarasu, R. Yasarla, and V.M. Patel, “Transweather: Transformer-based restoration of images degraded by adverse weather conditions,” Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.2353-2363, 2022.
[18] C.L. Guo, Q.X. Yan, S. Anwar, R.M. Cong, W.Q. Ren, and C.Y. Li, “Image dehazing transformer with transmission-aware 3D position embedding,” Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.5812-5820, 2022.
[19] Y.D. Song, Z.Q. He, H. Qian, and X. Du, “Vision transformers for single image dehazing,” IEEE Trans. Image Process., vol.32, pp.1927-1941, 2023.
[20] H. Zhou, Z.K. Chen, Y. Liu, Y.P. Sheng, W.Q. Ren, and H.L. Xiong, “Physical-priors-guided DehazeFormer,” Knowledge-Based Systems, vol.266, 110410, 2023.
[21] Y. Liu, Z.S. Yan, S.X. Chen, T. Ye, W.Q. Ren, E. Chen, J.G. Tan, and Y.C. Li, “NightHazeFormer: Single nighttime haze removal using prior query transformer,” Proc. 31st ACM International Conference on Multimedia, pp.4119-4128, 2023.
[22] S.W. Zhang and C.L. Zhang, “Modified U-Net for plant diseased leaf image segmentation,” Computers and Electronics in Agriculture, vol.204, 107511, 2023.
[23] C.O. Ancuti, C. Ancuti, R. Timofte, and C.D. Vleeschouwer, “O-haze: A dehazing benchmark with real hazy and haze-free outdoor images,” Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.754-762, 2018.
[24] C.O. Ancuti, C. Ancuti, and R. Timofte, “NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images,” Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.444-445, 2020.
[25] X.H. Liu, Y.R. Ma, Z.H. Shi, and J. Chen, “GridDehazeNet: Attention-based multi-scale network for image dehazing,” Proc. IEEE/CVF International Conference on Computer Vision, pp.7314-7323, 2019.
[26] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, “FFA-Net: Feature fusion attention network for single image dehazing,” Proc. AAAI Conference on Artificial Intelligence, vol.34, no.7, pp.11908-11915, 2020.
[27] S. Zhao, L. Zhang, Y. Shen, and Y. Zhou, “RefineDNet: A weakly supervised refinement framework for single image dehazing,” IEEE Trans. Image Process., vol.30, pp.3391-3404, 2021.
[28] Z. Chen, Y. Wang, Y. Yang, and D. Liu, “PSD: Principled synthetic-to-real dehazing guided by physical priors,” Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7180-7189, 2021.
[29] Y. Yang, C. Wang, R. Liu, L. Zhang, X. Guo, and D. Tao, “Self-augmented unpaired image dehazing via density and depth decomposition,” Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.2037-2046, 2022.
[30] B.Y. Li, W.Q. Ren, D.P. Fu, D.C. Tao, D. Feng, W.J. Zeng, and Z.Y. Wang, “RESIDE: A benchmark for single image dehazing,” IEEE Trans. Image Process., vol.28, no.1, pp.492-505, 2019.