1-13hit |
Yiqing HUANG Xiaocong JIN Jin ZHOU Jia SU Takeshi IKENAGA
One high profile intra predictor generation engine is proposed in this paper. Firstly, hardware level algorithm optimization for intra 88 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 44 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, one lossless two-44-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 1616 (I16MB), which saves 37.5% processing cycles. For I8MB, similar methodology with different processing order of 44 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed in this paper. Except DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, one combined predictor generation process is introduced and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% cycles of original design. Totally, when compared with original full-mode design and latest dynamic mode reused design, the proposed predictor generation engine can achieve 89.5% and 73.2% saving of processing cycles, respectively. Synthesized by TSMC 0.18 µm technology under worst work conditions (1.62 V, 125°C), with 380 MHz and 37.2 k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4 k4 k@60 fps. The maximum work frequency of our design under worst condition is 468 MHz.
Qin LIU Yiqing HUANG Satoshi GOTO Takeshi IKENAGA
Compared with previous standards, H.264/AVC adopts variable block size motion estimation (VBSME) and multiple reference frames (MRF) to improve the video quality. Full search motion estimation algorithm (FS), which calculates every search candidate in the search window for 7 block type with multiple reference frames, consumes massive computation power. Mathematical analysis reveals that the aliasing problem of subsampling algorithm comes from high frequency signal components. Moreover, high frequency signal components are also the main issues that make MRF algorithm essential. As we know, a picture being rich of texture must contain lots of high frequency signals. So based on these mathematical investigations, two fast VBSME algorithms are proposed in this paper, namely edge block detection based subsampling method and motion vector based MRF early termination algorithm. Experiments show that strong correlation exists among the motion vectors of those blocks belonging to the same macroblock. Through exploiting this feature, a dynamically adjustment of the search ranges of integer motion estimation is proposed in this paper. Combing our proposed algorithms with UMHS almost saves 96-98% Integer Motion Estimation (IME) time compared to the exhaustive search algorithm. The induced coding quality loss is less than 0.8% bitrate increase or 0.04 dB PSNR decline on average.
Yiqing HUANG Qin LIU Shuijiong WU Zhewen ZHENG Takeshi IKENAGA
One fast inter mode decision algorithm is proposed in this paper. The whole algorithm is divided into two stages. In the pre-stage, by exploiting spatial and temporal information of encoded macrobocks (MBs), a skip mode early detection scheme is proposed. The homogeneity of current MB is also analyzed to filter out small inter modes in this stage. Secondly, during the block matching stage, a motion feature based inter mode decision scheme is introduced by analyzing the motion vector predictor's accuracy, the block overlapping situation and the smoothness of SAD (sum of absolute difference) value. Moreover, the rate distortion cost is checked in an early stage and we set some constraints to speed up the whole decision flow. Experiments show that our algorithm can achieve a speed up factor of up to 53.4% for sequences with different motion type. The overall bit increment and quality degradation is negligible compared with existing works.
Qin LIU Yiqing HUANG Satoshi GOTO Takeshi IKENAGA
H.264 is the latest HDTV video compression standard, which provides a significant improvement in coding efficiency at the cost of huge computation complexity. After transform and quantization, if all the coefficients of the block's residue data are zero, this block is called all-zero block (AZB). Provided that an AZB can be detected early, the process of transform and quantization on an AZB can be skipped, which reduces significant redundant computations. In this paper, a theoretical analysis is performed for the sufficient condition for AZB detection. As a result, a partial sum of absolute difference (SAD) based 44 AZB detection algorithm is derived. And then, a hardware-oriented AZB detection algorithm is proposed by modifying the order of SAD calculation. Furthermore, a quantization parameter (QP) oriented 88 AZB detection algorithm is proposed according to the AZB's statistical analysis. Experimental results show that the proposed algorithm outperforms the previous methods in all cases and achieves major improvement of computation reduction in the range from 6.7% to 42.3% for 44 blocks, from 0.24% to 79.48% for 88 blocks. The computation reduction increases as QP increases.
Xiaocong JIN Jun SUN Yiqing HUANG Jia SU Takeshi IKENAGA
Different encoding modes for variable block size are available in the H.264/AVC standard in order to offer better coding quality. However, this also introduces huge computation time due to the exhaustive check for all modes. In this paper, a fast spatial DIRECT mode decision method for profiles supporting B frame encoding (main profile, high profile, etc.) in H.264/AVC is proposed. Statistical analysis on multiple video sequences is carried out, and the strong relationship of mode selection and rate-distortion (RD) cost between the current DIRECT macroblock (MB) and the co-located MBs is observed. With the check of mode condition, predicted RD cost threshold and dynamic parameter update model, the complex mode decision process can be terminated at an early stage even for small QP cases. Simulation results demonstrate the proposed method can achieve much better performance than the original exhaustive rate-distortion optimization (RDO) based mode decision algorithm by reducing up to 56.8% of encoding time for IBPBP picture group and up to 67.8% of encoding time for IBBPBBP picture group while incurring only negligible bit increment and quality degradation.
Yiqing HUANG Qin LIU Takeshi IKENAGA
In H.264/AVC standard, many new techniques such as variable block size (VBS) and multiple reference frame (MRF) are used in motion estimation (ME) part to achieve superior coding performance. However, the use of new techniques will also cause great burden on computation complexity, which leads to problems in low power hardware implementation. Many software based fast ME algorithms are proposed to reduce complexity. For real-time hardwired encoder, the huge throughput of fractional motion estimation (FME) and integer motion estimation (IME) makes pipeline stage a must. In this case, IME is arranged in a single stage, which deteriorates the efficiency of many software based algorithms. Based on the hardware data flow, this paper provides a complexity reduction algorithm which speeds up ME procedure through three schemes. Firstly, the proposed algorithm executes similarity analysis to detect big mode MB and apply early termination in IME stage. Secondly, for normal MB, motion feature is extracted after IME of each frame and a 6-ring based search range adjustment scheme is introduced to remove redundant search positions. Thirdly, for MBs which have large motion feature, the pixel difference is very small due to the blur effect on video sensor. So, we use subsampling technique to reduce computation complexity for such MBs. Experimental results show that, compared with hardware friendly full search algorithm, the proposed fast ME algorithm can reduce 52.63% to 83.21% ME time with negligible video quality degradation. Furthermore, since the proposed algorithm works in a hardware friendly way, it can be embedded into 3-stage real-time hardwired video encoder to achieve low power design.
Yiqing HUANG Qin LIU Satoshi GOTO Takeshi IKENAGA
One VLSI friendly fast motion estimation (ME) algorithm is proposed in this paper. Firstly, theoretical analysis shows that image rich of sharp edges and texture is regarded as high frequency abundant image and macroblocks (MBs) in such image will express large pixel difference. In our paper, we apply adaptive subsampling method during ME process based on pixel difference analysis, so the computation complexity of full pixel pattern can be reduced. Secondly, statistic analysis shows that for MBs with static feature, the ratio of selecting previous reference frame as best one is very high and multiple reference frame technique is not required for these MBs. Based on this analysis, we give out a block overlapping method to pick out static MBs and apply MRF elimination process. Thirdly, since many redundant search positions exist in MB with small motion trend and large search range is only contributive to MB with big motion, we extract motion feature after ME on first reference frame and use it to adjust search range for rest ME process. So, the computation complexity of redundant search positions is eliminated. Experimental results show that, compared with hardware friendly full search algorithm, our proposed algorithm can reduce 71.09% to 95.26% ME time with negligible video quality degradation. Moreover, our fast algorithm can be combined with existing fast ME algorithms like UMHexagon method for further reduction in complexity and it is friendly to hardware implementation.
Shuijiong WU Peilin LIU Yiqing HUANG Qin LIU Takeshi IKENAGA
H.264/AVC encoder employs rate control to adaptively adjust quantization parameter (QP) to enable coded video to be transmitted over a constant bit-rate (CBR) channel. In this topic, bit allocation is crucial since it is directly related with actual bit generation and the coding quality. Meanwhile, the rate-distortion-optimization (RDO) based mode-decision technique also affects performance a lot for the strong relation among mode, bits, and quality. This paper presents a multi-stage rate control scheme for R-D optimized H.264/AVC encoders under CBR video transmission. To enhance the precision of the complexity estimation and bit allocation, a frequency-domain parameter named mean-absolute-transform-difference (MATD) is adopted to represent frame and macroblock (MB) residual complexity. Second, the MATD ratio is utilized to enhance the accuracy of frame layer bit prediction. Then, by considering the bit usage status of whole sequence, a measurement combining forward and backward bit analysis is proposed to adjust the Lagrange multiplier λMODE on frame layer to optimize the mode decision for all MBs within the current frame. On the next stage, bits are allocated on MB layer by proposed remaining complexity analysis. Computed QP is further adjusted according to predicted MB texture bits. Simulation results show the PSNR improvement is up to 1.13 dB by using our algorithm, and the stress of output buffer control is also largely released compared with the recommended rate control in H.264/AVC reference software JM13.2.
One Super Hi-Vision (SHV) 4k4k@60 fps fractional motion estimation (FME) engine is proposed in our paper. Firstly, two complexity reduction schemes are proposed in the algorithm level. By analyzing the integer motion cost of sub blocks in each inter mode, the mode reduction based mode pre-filtering scheme can achieve 48% clock cycle saving compared with previous algorithm. By further check the motion cost of search points around best integer candidate, the motion cost oriented directional one-pass scheme can provide 50% clock cycle saving and 36% reduction in the number of processing units (PU). Secondly, in the hardware level, two parallel improved schemes namely 16-Pel processing and MB-parallel scheme are given out in our paper, which reduces design effort to only 145 MHz for SHV FME processing. Also, quarter sub-sampling is adopted in our design and 75% hardware cost is reduced for each PU. Thirdly, one unified pixel block loading scheme is proposed. About 28.67% to 86.39% pixels are reused and the related memory access is saved. Furthermore, we also give out one parity pixel organization scheme to solve memory access conflict of MB-parallel scheme. By using TSMC 0.18 µm technology in worst work conditions (1.62 V, 125), our FME engine can achieve real-time processing for SHV 4k4k@60 fps with 412k gates hardware.
Yiqing HUANG Qin LIU Satoshi GOTO Takeshi IKENAGA
This paper presents a reconfigurable SAD Tree (RSADT) architecture based on adaptive sub-sampling algorithm for HDTV application. Firstly, to obtain the the feature of HDTV picture, pixel difference analysis is applied on each macroblock (MB). Three hardware friendly sub-sampling patterns are selected adaptively to release complexity of homogeneous MB and keep video quality for texture MB. Secondly, since two pipeline stages are inserted, the whole clock speed of RSADT structure is enhanced. Thirdly, to solve data reuse and hardware utilization problem of adaptive algorithm, the RSADT structure adopts pixel data organization in both memory and architecture level, which leads to full data reuse and hardware utilization. Additionally, a cross reuse structure is proposed to efficiently generate 16 pixel scaled configurable SAD (sum of absolute difference). Experimental results show that, our RSADT architecture can averagely save 61.71% processing cycles for integer motion estimation engine and accomplish twice or four times processing capability for homogeneous MBs. The maximum clock frequency of our design is 208 MHz under TSMC 0.18 µm technology in worst work conditions(1.62 V, 125C). Furthermore, the proposed algorithm and reconfigurable structure are favorable to power aware real-time encoding system.
Yiqing HUANG Zhenyu LIU Yang SONG Satoshi GOTO Takeshi IKENAGA
One hardware efficient and high speed architecture for variable block size motion estimation (VBSME) in H.264 is presented in this paper. By improving the pipeline structure and processing element (PE) circuits, the system latency and hardware cost is reduced, which makes this structure more hardware efficient than the original Propagate Partial SAD architecture. For small and middle frame size picture's coding, the proposed structure can save 12.1% hardware cost compared with original Propagate Partial SAD structure. In the case of HDTV, since small inter modes trivially contribute to the coding quality, we remove modes below 88 in our design. By adopting mode reduction technique, when the set number of PE array is less than 8, the proposed mode reduction based Propagate Partial SAD structure can work at faster clock speed and consume less hardware cost than widely used SAD Tree architecture. It is more robust to the high speed timing constraint when parallel processing is considered. With TSMC 0.18 µm technology in worst work conditions (1.62 V, 125), its peak throughput of 8-set PE array structure is 720p@30 Hz with 12864 search range and 5 reference frames. 12 k gates hardware cost can be reduced by our design compared with the parallel SAD Tree architecture.
Lei SUN Jie LENG Jia SU Yiqing HUANG Hiroomi MOTOHASHI Takeshi IKENAGA
Scalable Video Coding (SVC) was standardized as an extension of H.264/AVC with the intention to provide flexible adaptation to heterogeneous networks and different end-user requirements, which provides great scalability in multi-point applications such as video conferencing. However, due to the existence of H.264/AVC-based systems, transcoding between AVC and SVC becomes necessary. Most existing works focus on temporal transcoding, quality transcoding or SVC-to-AVC spatial transcoding while the straightforward re-encoding method requires high computational cost. This paper proposes a low-complexity AVC-to-SVC spatial transcoder based on coarse-level mode mapping for video conferencing scenes. First, to omit unnecessary motion estimations (ME) for layers with reduced resolution, an ME skipping scheme based on AVC mode distribution is proposed with an adaptive search range. Then a probability-profile based scheme is proposed for further mode skipping. After that 3 coarse-level mode-mapping methods are presented for fast mode decision and the adaptive usage of the 3 methods is discussed. Finally, motion vector (MV) refinement is introduced for further lower-layer time reduction. As for the top layer, direct encapsulation is proposed to preserve better quality and another scheme involving inter-layer predictions is also provided for bandwidth-crucial applications. Simulation results show that proposed transcoder achieves up to 92.6% time reduction without significant coding efficiency loss compared to re-encoding method.
Jia SU Yiqing HUANG Lei SUN Shinichi SAKAIDA Takeshi IKENAGA
With the increasing demand of high video quality and large image size, adaptive interpolation filter (AIF) addresses these issues and conquers the time varying effects resulting in increased coding efficiency, comparing with recent H.264 standard. However, currently most AIF algorithms are based on either frame level or macroblock (MB) level, which are not flexible enough for different video contents in a real codec system, and most of them are facing a severe time consuming problem. This paper proposes a content based coarse to fine AIF algorithm, which can adapt to video contents by adding different filters and conditions from coarse to fine. The overall algorithm has been mainly made up by 3 schemes: frequency analysis based frame level skip interpolation, motion vector modeling based region level interpolation, and edge detection based macroblock level interpolation. According to the experiments, AIF are discovered to be more effective in the high frequency frames, therefore, the condition to skip low frequency frames for generating AIF coefficients has been set. Moreover, by utilizing the motion vector information of previous frames the region level based interpolation has been designed, and Laplacian of Gaussian based macroblock level interpolation has been proposed to drive the interpolation process from coarse to fine. Six 720p and six 1080p video sequences which cover most typical video types have been tested for evaluating the proposed algorithm. The experimental results show that the proposed algorithm reduce total encoding time about 41% for 720p and 25% for 1080p sequences averagely, comparing with Key Technology Areas (KTA) Enhanced AIF algorithm, while obtains a BDPSNR gain up to 0.004 and 3.122 BDBR reduction.