Keyword Search Result

[Keyword] mixed-precision computing (1 hit)

  • Accelerating CNN Inference with an Adaptive Quantization Method Using Computational Complexity-Aware Regularization (Open Access)

    Kengo NAKATA, Daisuke MIYASHITA, Jun DEGUCHI, Ryuichi FUJIMOTO

     
    PAPER-Neural Networks and Bioengineering

    Publicized: 2024/08/05
    Vol: E108-A, No: 2
    Page(s): 149-159

    Quantization is commonly used to reduce the inference time of convolutional neural networks (CNNs). To reduce the inference time without drastically reducing accuracy, optimal bit widths need to be allocated to each layer or filter of the CNN. In conventional methods, the optimal bit allocation is obtained by using the gradient descent algorithm while minimizing the model size. However, the model size has little to no correlation with the inference time. In this paper, we present a computational-complexity metric called MAC×bit that is strongly correlated with the inference time of quantized CNNs. We propose a gradient descent-based regularization method that uses this metric for the optimal bit allocation of a quantized CNN to improve the recognition accuracy and reduce the inference time. In experiments, the proposed method reduced the inference time of a quantized ResNet-18 model by 21.0% compared with a conventional regularization method based on model size, while maintaining comparable recognition accuracy.
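
    As a rough illustration of the regularization idea summarized in the abstract, the sketch below (not taken from the paper) shows how a MAC×bit penalty could be attached to a training loss in PyTorch so that gradient descent drives the bit allocation. The per-layer MAC counts, the weighting factor lam, and the continuous bit-width parameters are hypothetical placeholders, assumed here only for illustration.

    import torch

    # Hypothetical per-layer MAC counts for a small CNN (fixed by the architecture).
    macs = torch.tensor([1.8e8, 9.0e8, 4.5e8, 2.3e8])

    # Learnable continuous bit widths, one per layer (rounded to integers at deployment).
    bit_widths = torch.nn.Parameter(torch.full((4,), 8.0))

    def complexity_penalty(macs, bits, lam=1e-10):
        # MAC x bit metric: multiply-accumulate operations weighted by the assigned bit width.
        return lam * torch.sum(macs * bits)

    # In a real training loop this penalty would be added to the task loss, so gradient
    # descent trades recognition accuracy against the estimated inference cost.
    task_loss = torch.tensor(0.0)  # placeholder for the CNN's cross-entropy loss
    loss = task_loss + complexity_penalty(macs, bit_widths)
    loss.backward()
    print(bit_widths.grad)  # MAC-heavy layers receive the strongest push toward fewer bits

    Under this formulation, the gradient of the penalty with respect to each layer's bit width is proportional to that layer's MAC count, which is one plausible way a computational-complexity-aware regularizer could steer bits away from expensive layers; the paper's exact loss and bit parameterization are not given in the abstract.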
