1-4hit |
Shihao WANG Dajiang ZHOU Jianbin ZHOU Takeshi YOSHIMURA Satoshi GOTO
In this paper, VLSI architecture design of unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV application. We propose four techniques for these challenges. Firstly, this work unifies MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose the top-level CU-adaptive pipeline scheme by trading off between implementation complexity and performance. Thirdly, PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In 90nm process, our design costs 93.3k logic gates with 23.0kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps application at 249MHz in the worst case.
Jianbin ZHOU Dajiang ZHOU Shihao WANG Takeshi YOSHIMURA Satoshi GOTO
8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K for 8-bit design, and 251.1K for 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.
Jianbin ZHOU Dajiang ZHOU Li GUO Takeshi YOSHIMURA Satoshi GOTO
This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We proposed a structural random 0/1 measurement matrix, embedding the block boundary information that can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture is implemented for the proposed framework, by substituting the matrix multiplication with shared adders and shifters. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 Kin area, and achieves the 83% reduction in size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and bandwidth in communication of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.
Jianbin ZHOU Dajiang ZHOU Takeshi YOSHIMURA Satoshi GOTO
Compressed Sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces the power consumption. For CS-CIS, the image quality and data volume of output are two important issues to concern. In this paper, we first proposed an algorithm to generate a series of deterministic and ternary matrices, which improves the image quality, reduces the data volume and are compatible with CS-CIS. Proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property as DCT does. Moreover, we proposed matrix row operations adaptive to the proposed matrix to further compress data (measurements) without any image quality loss. At last, a low-cost VLSI architecture of measurements compression with proposed matrix row operations is implemented. Experiment results show our proposed matrix significantly improve the coding efficiency by BD-PSNR increase of 4.2 dB, comparing with the random binary matrix used in the-state-of-art CS-CIS. The proposed matrix row operations for measurement compression further increases the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and 0.3 mW in power consumption.