Ahmed SWILEM Kousuke IMAMURA Hideo HASHIMOTO
In this paper, we propose two fast codebook generation algorithms for entropy-constrained vector quantization. The first algorithm uses an angular constraint to reduce the search area and accelerate the search process in codebook design. It employs the projection angles of the vectors to a reference line. The second algorithm uses a suitable hyperplane to partition the codebook and the image data. These algorithms significantly accelerate the codebook design process. Experimental results are presented on image block data. These results show that our new algorithms perform better than previously known methods.
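For context, the search that such algorithms accelerate can be sketched as follows. This is the standard full-search entropy-constrained encoding rule (minimize squared error plus a rate penalty), not the angular-constraint or hyperplane method proposed in the paper. The names `ecvq_encode`, `lengths`, and `lam`, and the toy codebook, are assumptions made for illustration.

```python
import numpy as np

def ecvq_encode(x, codebook, lengths, lam):
    """Full-search ECVQ: pick the index i minimizing ||x - c_i||^2 + lam * lengths[i]."""
    dist = np.sum((codebook - x) ** 2, axis=1)  # squared error to every codeword
    cost = dist + lam * lengths                 # Lagrangian cost: distortion + lam * rate
    return int(np.argmin(cost))

# Hypothetical toy data: three 2-D codewords and their codeword lengths in bits.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
lengths = np.array([1.0, 3.0, 2.0])
x = np.array([0.6, 0.6])

nearest = ecvq_encode(x, codebook, lengths, lam=0.0)    # pure nearest neighbour
rate_aware = ecvq_encode(x, codebook, lengths, lam=1.0)  # rate penalty can change the choice
```

Every input vector is compared against every codeword here, which is the cost that a reduced search area is meant to avoid.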
Masayuki MIYAMA Junichi MIYAKOSHI Kousuke IMAMURA Hideo HASHIMOTO Masahiko YOSHIMOTO
This paper describes a VLSI-oriented motion estimation algorithm using a steepest descent method (SDM) applied to MPEG-4 visual communication with a mobile terminal. The SDM algorithm is optimized for QCIF or CIF resolution video and for VLSI implementation. The SDM is combined with a subblock search method to enhance picture quality. Simulation results show that the mean PSNR change of the SDM algorithm on QCIF 15 fps video, relative to a full search algorithm, is -0.17 dB. The power consumption of a VLSI based on the SDM algorithm, assuming 0.18 µm CMOS technology, is estimated at 2 mW. The VLSI attains higher picture quality than one based on another fast motion estimation algorithm, and is applicable to mobile video applications.
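The general idea of descent-based block matching can be sketched as below. This is a simplified greedy variant that steps to the best of four neighbouring positions by sum of absolute differences (SAD); it is an illustrative assumption, not the authors' SDM or their subblock search. The function names and the synthetic ramp image are hypothetical.

```python
import numpy as np

def sad(cur_block, ref, y, x):
    """Sum of absolute differences between cur_block and the ref patch at (y, x)."""
    h, w = cur_block.shape
    patch = ref[y:y + h, x:x + w].astype(np.int64)
    return int(np.abs(cur_block.astype(np.int64) - patch).sum())

def descent_search(cur_block, ref, y0, x0, max_steps=32):
    """Greedy descent on SAD: move to the best 4-neighbour until none improves."""
    h, w = cur_block.shape
    H, W = ref.shape
    y, x = y0, x0
    best = sad(cur_block, ref, y, x)
    for _ in range(max_steps):
        moved = False
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny <= H - h and 0 <= nx <= W - w:  # keep the patch inside ref
                cost = sad(cur_block, ref, ny, nx)
                if cost < best:
                    best, by, bx = cost, ny, nx
                    moved = True
        if not moved:
            break  # local minimum of the SAD surface
        y, x = by, bx
    return (y - y0, x - x0), best

# Synthetic ramp image (assumption): the SAD surface has a single minimum.
ref = 10 * np.arange(16)[:, None] + np.arange(16)[None, :]
cur_block = ref[8:12, 9:13]
mv, cost = descent_search(cur_block, ref, 6, 7)
```

Unlike a full search, this evaluates only a few candidates per step, at the risk of stopping in a local minimum on real image content.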
Yu SUZUKI Masato ITO Satoshi KANDA Kousuke IMAMURA Yoshio MATSUDA Tetsuya MATSUMURA
This paper describes the design and implementation of a real-time optical flow processor using a single field-programmable gate array (FPGA) chip. By introducing a modified initial flow generation method, the successive over-relaxation (SOR) method for both layers, an optimized reciprocal operation method, and an image division method, it is possible to both reduce hardware requirements and improve flow accuracy. Additionally, by introducing a pipeline structure to this processor, a high-throughput hardware implementation is achieved. The total logic cell (LC) count and processor memory capacity are reduced by about 8% and 16%, respectively, compared to our previous hierarchical optical flow estimation (HOE) processor. The results of our evaluation confirm that this processor can perform real-time optical flow processing of wide extended graphics array (WXGA) video at 30 fps and 175.7 MHz with a single FPGA.
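As background on the SOR method mentioned above, the following is a minimal sketch of generic successive over-relaxation for a small linear system. It does not reproduce the optical flow equations or the hardware design of the paper; `sor_solve`, the relaxation factor `omega`, and the example matrix are assumptions for illustration.

```python
import numpy as np

def sor_solve(A, b, omega=1.2, iters=200, tol=1e-10):
    """Solve A x = b by successive over-relaxation (Gauss-Seidel blended by omega)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(iters):
        max_change = 0.0
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]      # off-diagonal contribution
            gs = (b[i] - sigma) / A[i, i]          # plain Gauss-Seidel update
            new = (1 - omega) * x[i] + omega * gs  # over-relaxed update
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            break
    return x

# Small symmetric positive-definite example (assumption), for which SOR converges.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_sor = sor_solve(A, b)
```

Setting `omega=1.0` reduces the update to Gauss-Seidel; values between 1 and 2 over-relax each step.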
Reo AOKI Kousuke IMAMURA Akihiro HIRANO Yoshio MATSUDA
Recently, the super-resolution convolutional neural network (SRCNN) has become widely known as a state-of-the-art method for single-image super-resolution. However, SRCNN suffers from performance problems such as jaggy and ringing artifacts. Moreover, problems such as processing delay and implementation cost remain in realizing a real-time upconverting system for high-resolution video streams such as 4K/8K at 60 fps. In the present paper, we propose high-performance super-resolution via a patch-based deep neural network (SR-PDNN) rather than a convolutional neural network (CNN). Despite being a very simple end-to-end learning system, the SR-PDNN achieves higher performance than the conventional CNN-based approach. In addition, this system is suitable for ultra-low-delay video processing by hardware implementation using an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
Noriyuki MINEGISHI Junichi MIYAKOSHI Yuki KURODA Tadayoshi KATAGIRI Yuki FUKUYAMA Ryo YAMAMOTO Masayuki MIYAMA Kousuke IMAMURA Hideo HASHIMOTO Masahiko YOSHIMOTO
An optical flow processor architecture is proposed. It offers accuracy and image-size scalability for video segmentation extraction. The Hierarchical Optical flow Estimation (HOE) algorithm [1] is optimized to provide an appropriate bit-length and iteration number for VLSI realization. The proposed processor architecture provides the following features. First, an algorithm-oriented data-path is introduced to execute all necessary processes of optical flow derivation, allowing hardware cost minimization. The data-path is designed using a 4-SIMD architecture, which enables high-throughput operation. Thereby, it achieves real-time optical flow derivation with 100% pixel density. Second, it has a scalable architecture for higher accuracy and higher resolution. A third feature is the CMOS-process-compatible on-chip 2-port DRAM for die-area reduction. The proposed processor achieves CIF 30 fr/s performance at a 189 MHz clock frequency. Its estimated core size is 6.02 × 5.33 mm² with six-metal 90-nm CMOS technology.
Masayuki MIYAMA Osamu TOOYAMA Naoki TAKAMATSU Tsuyoshi KODAKE Kazuo NAKAMURA Ai KATO Junichi MIYAKOSHI Kousuke IMAMURA Hideo HASHIMOTO Satoshi KOMATSU Mikio YAGI Masao MORIMOTO Kazuo TAKI Masahiko YOSHIMOTO
This paper describes an ultra-low-power motion estimation (ME) processor for MPEG2 HDTV resolution video. It adopts a Gradient Descent Search (GDS) algorithm that drastically reduces the required computational power to 6 GOPS. A SIMD datapath architecture optimized for the GDS algorithm decreases the clock frequency and operating voltage. A low-power 3-port SRAM with a write-disturb-free cell array arrangement is newly designed for the image data caches of the processor. The proposed ME processor contains 7-M transistors, integrated in a 4.50 mm × 3.35 mm area using 0.13 µm CMOS technology. Estimated power consumption is less than 100 mW at 81 MHz@1.0 V. The processor is applicable to a portable HDTV system.
Kousuke IMAMURA Ryota HONDA Yoshifumi KAWAMURA Naoki MIURA Masami URANO Satoshi SHIGEMATSU Tetsuya MATSUMURA Yoshio MATSUDA
The development of an extremely efficient packet inspection algorithm for lookup engines is important in order to realize high throughput and to lower energy dissipation. In this paper, we propose a new lookup engine based on a combination of a mismatch detection circuit and a linked-list hash table. The engine has an automatic rule registration and deletion function; as a result, only the rules need to be input, and the various tables included in the circuits, such as the Mismatch Table, Index Table, and Rule Table, are automatically configured by the embedded hardware. This function utilizes a match/mismatch assessment for normal packet inspection operations. An experimental chip was fabricated using 40-nm 8-metal CMOS process technology. The chip operates at a frequency of 100 MHz under a power supply voltage of VDD = 1.1 V. A throughput of 100 Mpacket/s (= 51.2 Gb/s) is obtained at an operating frequency of 100 MHz, which is three times greater than the throughput of 33 Mpacket/s obtained with a conventional lookup engine without a mismatch detection circuit. The measured energy dissipation was 1.58 pJ/b·Search.
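The throughput figures quoted above can be cross-checked with simple arithmetic. The sketch below only restates the numbers from the abstract. The 512 bits per packet is an inference from dividing the two quoted rates, not a value stated in the source, and the variable names are assumptions.

```python
# Figures quoted in the abstract.
packets_per_s = 100_000_000          # 100 Mpacket/s at 100 MHz
throughput_bps = 51_200_000_000      # 51.2 Gb/s
baseline_packets_per_s = 33_000_000  # conventional engine, 33 Mpacket/s

# Implied bits handled per packet (inferred, not stated in the abstract).
bits_per_packet = throughput_bps / packets_per_s

# Packets inspected per clock cycle at the stated 100 MHz operating frequency.
packets_per_cycle = packets_per_s / 100_000_000

# Speed-up over the conventional engine, described as "three times" in the text.
speedup = packets_per_s / baseline_packets_per_s
```

The ratio comes out slightly above 3, consistent with the abstract's rounded "three times greater" claim.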