Chuang ZHU Jie LIU Xiao Feng HUANG Guo Qing XIANG
This paper reports a high-quality, hardware-friendly integer motion estimation (IME) scheme. Depending on the characteristics of the CTU content, the proposed method adopts different adaptive multi-resolution strategies, coupled with accurate IME over all PU modes at the finest level. In addition, motion vector derivation simplifies IME for the second reference frame, and processing element (PE) sharing greatly reduces hardware resources. The proposed architecture supports real-time processing of 4K-UHD video at 60 fps, with a BD-rate increase of only 0.53%.
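A minimal sketch of the general coarse-to-fine idea behind multi-resolution IME, assuming a two-level pyramid built by 2×2 averaging, a plain SAD cost, and a single square block. The CTU-adaptive strategy selection, the full-PU-mode search, and the MV derivation for the second reference frame are not reproduced, and all function names and the `coarse_range`/`refine_range` parameters are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by
    (dx, dy) in ref; candidates that leave the frame cost infinity."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
        return float("inf")
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def downsample(img):
    """One pyramid level: average each 2x2 neighbourhood (integer division)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def full_search(cur, ref, bx, by, n, cx, cy, r):
    """Exhaustive search in a (2r+1) x (2r+1) window centred on (cx, cy)."""
    _, mvx, mvy = min((sad(cur, ref, bx, by, cx + dx, cy + dy, n), cx + dx, cy + dy)
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1))
    return mvx, mvy


def multires_search(cur, ref, bx, by, n, coarse_range=4, refine_range=1):
    """Two-level coarse-to-fine IME: a wide search at half resolution, then a
    small refinement around the up-scaled vector at full resolution."""
    mvx, mvy = full_search(downsample(cur), downsample(ref),
                           bx // 2, by // 2, n // 2, 0, 0, coarse_range)
    return full_search(cur, ref, bx, by, n, 2 * mvx, 2 * mvy, refine_range)
```

With the default ranges, the coarse pass covers about ±8 full-resolution pixels using 81 candidates on quarter-size blocks, and the refinement evaluates only 9 full-resolution candidates.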
Leilei HUANG Yibo FAN Chenhao GU Xiaoyang ZENG
The High Efficiency Video Coding (HEVC) standard is becoming one of the most widely used video coding standards in the world. As the successor to H.264, it aims to provide much better encoding performance. To achieve this goal, the standard introduces several new concepts along with the corresponding computation processes. Among these processes, integer motion estimation (IME) is one of the bottlenecks because of the complex partitioning of inter prediction units (PUs) and the large search windows commonly adopted. Many algorithms have been proposed to address this issue, and they usually assume a large search window and a heavy computational load. However, the coding effort should depend on the scene. Specifically, for relatively static videos, a small search window and a simple search scheme should be adopted to reduce time cost and power consumption. In view of this, this paper proposes a micro-code-based IME engine that can run search schemes of different complexity. To evaluate its performance, three search schemes based on this engine are designed and tested under HEVC test model (HM) 16.9, yielding BD-rate changes of 0.55%, -0.07%, and -0.14%, respectively. Compared with our previous work, the optimized hardware implementation reduces SRAM bits by 64.2% and logic gate count by 32.8%. The final design supports 4K×2K video at 139, 85, and 37 fps, respectively, at 500 MHz.
Nobuaki KOBAYASHI Tadayoshi ENOMOTO
To fully exploit the advantages of dynamic voltage and frequency scaling (DVFS), a quantized decoder (QNT-D) was developed. The QNT-D generates a quantized signal processing quantity (Q) from a predicted signal processing quantity (M). Q is then used to produce the optimum clock frequency (opt.fc) and the optimum supply voltage (opt.VD), both proportional to Q. To build a DVFS-controlled motion estimation (ME) processor, we combined the QNT-D with a fast ME algorithm called A2BC (Adaptively Assigned Breaking-off Condition), using them to predict M for each macroblock (MB). The DVFS-controlled ME processor was fabricated in 90-nm CMOS technology. Its total power dissipation (PT) was significantly reduced, ranging from 38.65 to 99.5 µW depending on the test video sequence, which is only 3.27% to 8.41% of the PT of a conventional ME processor.
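A minimal sketch of the quantize-then-scale step described above, assuming a linear mapping from M to a small number of levels and rounding up so each macroblock still meets its deadline. The level count, `m_max`, `f_max_hz`, and `v_max` are hypothetical parameters, and a real regulator's minimum-voltage floor is ignored.

```python
import math


def quantize_workload(m, m_max, levels):
    """Map a predicted processing quantity M to an integer level Q in 1..levels.

    Rounding up (an assumption) keeps enough headroom for the macroblock to
    finish within its time slot; M is clamped to [0, m_max] first.
    """
    m = min(max(m, 0), m_max)
    return max(1, math.ceil(levels * m / m_max))


def dvfs_setting(m, m_max, levels, f_max_hz, v_max):
    """Return (Q, clock frequency, supply voltage), both proportional to Q."""
    q = quantize_workload(m, m_max, levels)
    return q, f_max_hz * q / levels, v_max * q / levels


# Example: a macroblock predicted to need 30% of the worst-case workload.
print(dvfs_setting(300, 1000, 4, 100e6, 1.0))  # (2, 50000000.0, 0.5)
```

Because dynamic CMOS power scales roughly with C·V²·f, lowering the supply voltage together with the clock frequency is what gives DVFS its leverage over frequency scaling alone.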
Chihiro TSUTAKE Toshiyuki YOSHIDA
Many of the affine motion compensation techniques proposed thus far employ least-squares-based techniques to estimate affine parameters, which require a hardware structure different from the conventional block-matching-based one. This paper proposes a new affine motion estimation/compensation framework suited to block-matching-based parameter estimation, and applies it to an HEVC encoder to demonstrate its coding efficiency and computational cost. To avoid nested search loops, a new affine motion model is first introduced by decomposing the conventional 4-parameter affine model into two 3-parameter models. Then, a block-matching-based fast parameter estimation technique is proposed for these models. The experimental results show that our approach has advantages over conventional techniques.
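For reference, the conventional 4-parameter affine model (rotation/zoom plus translation) maps a pixel (x, y) to (a·x - b·y + c, b·x + a·y + d), and block matching evaluates a candidate parameter set by the SAD between the block and its warped counterpart. The sketch below assumes nearest-integer sampling and leaves bounds checking to the caller. The paper's decomposition into two 3-parameter models is not reproduced, and the function names are hypothetical.

```python
def affine4(x, y, a, b, c, d):
    """Conventional 4-parameter affine mapping of (x, y) to a reference position."""
    return a * x - b * y + c, b * x + a * y + d


def affine_sad(cur, ref, block, params):
    """SAD between a square block of cur and its affine-warped counterpart in ref.

    block = (x0, y0, n); params = (a, b, c, d). Positions are rounded to the
    nearest integer (no sub-pixel interpolation), and the caller must keep
    every warped position inside ref.
    """
    x0, y0, n = block
    total = 0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            rx, ry = affine4(x, y, *params)
            total += abs(cur[y][x] - ref[round(ry)][round(rx)])
    return total
```

A block-matching estimator would call `affine_sad` for each candidate parameter set and keep the one with the smallest cost, which is the search structure the abstract aims to keep hardware-friendly.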
Shuping ZHANG Jinjia ZHOU Dajiang ZHOU Shinji KIMURA Satoshi GOTO
In this paper, a hamburger architecture with 3D stacked reconfigurable memory is proposed for a 4K motion estimation (ME) processor. By placing memory dies on both the top and bottom of the processor die, the proposed hamburger architecture reduces the usage of signal through-silicon vias (TSVs) and balances the power delivery network and the clock tree of the entire system. This reduces signal TSV usage by one third. Moreover, a stacked reconfigurable memory architecture is proposed to reduce fabrication complexity and further reduce the number of signal TSVs by more than half. Overall, signal TSVs in the entire design are reduced by 71.24%. Finally, we address issues unique to electronic design automation (EDA) tools in 3D large-scale integration (LSI) design. As a result, a 4K ME processor is implemented as a seven-die stacked 3D system-on-chip. The proposed design supports real-time 3840 × 2160 @ 120 fps encoding at 130 MHz with less than 540 mW.
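As a consistency check on the reported numbers, assume (this is not stated in the abstract) that the memory-side reduction applies to the signal TSVs that remain after the hamburger placement, so the two reductions compound multiplicatively. The helper names are hypothetical.

```python
def combined_reduction(*fractions):
    """Overall reduction when each fraction applies to what the previous step left."""
    remaining = 1.0
    for r in fractions:
        remaining *= 1.0 - r
    return 1.0 - remaining


def implied_second_reduction(first, total):
    """Second-stage reduction needed to reach `total` after a `first` reduction."""
    return 1.0 - (1.0 - total) / (1.0 - first)


print(f"{combined_reduction(1/3, 0.5):.4f}")          # 0.6667
print(f"{implied_second_reduction(1/3, 0.7124):.4f}")  # 0.5686
```

Under that assumption, savings of exactly 1/3 and 1/2 would give about 66.7% overall, and the reported 71.24% corresponds to a second-stage saving of about 56.9%, which agrees with "more than half".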
Ran LI Hongbing LIU Jie CHEN Zongliang GAN
Conventional bilateral motion estimation (BME) for motion-compensated frame rate up-conversion (MC-FRUC) avoids overlapped areas and holes, but it usually produces many inaccurate motion vectors (MVs) because 1) the MV of an object between the previous and following frames often lacks temporal symmetry with respect to the target block of the interpolated frame, and 2) repetitive patterns in video frames lead to mismatches, since the interpolated block itself is not available for matching. In this paper, a new low-complexity BME algorithm is proposed to resolve these problems. The proposed algorithm incorporates multi-resolution search into BME, because this search readily exploits the MV consistency between two adjacent pyramid levels and spatially neighboring MVs to correct inaccurate MVs caused by the lack of temporal symmetry, while keeping the computational cost low. In addition, the multi-resolution search uses the fast wavelet transform to construct the wavelet pyramid, which not only keeps the computational complexity low but also preserves the high-frequency components of the image at each level during subsampling. These high-frequency components are used to regularize the traditional block-matching criterion, reducing the probability of mismatch in BME. Experiments show that the proposed algorithm significantly improves both the objective and subjective quality of the interpolated frame with low computational complexity, and outperforms existing BME algorithms.
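To make the symmetry assumption concrete, here is a minimal sketch of plain full-search bilateral ME for one block, assuming integer MVs, a SAD cost, and an interpolated frame lying midway between the two references. It shows only the conventional BME baseline; the wavelet pyramid, MV correction, and high-frequency regularization of the proposed algorithm are not included, and all names are hypothetical.

```python
def bilateral_sad(prev, nxt, bx, by, n, vx, vy):
    """Bilateral cost for the block at (bx, by) of the frame to be interpolated:
    the symmetric trajectory samples prev at -v and nxt at +v."""
    return sum(abs(prev[by + j - vy][bx + i - vx] - nxt[by + j + vy][bx + i + vx])
               for j in range(n) for i in range(n))


def bilateral_me(prev, nxt, bx, by, n, r):
    """Full search over symmetric candidates v in [-r, r]^2; returns (vx, vy)."""
    h, w = len(prev), len(prev[0])
    best = None
    for vy in range(-r, r + 1):
        for vx in range(-r, r + 1):
            # Both displaced blocks must stay inside the frames.
            if (bx - abs(vx) < 0 or by - abs(vy) < 0
                    or bx + n + abs(vx) > w or by + n + abs(vy) > h):
                continue
            cost = bilateral_sad(prev, nxt, bx, by, n, vx, vy)
            if best is None or cost < best[0]:
                best = (cost, vx, vy)
    return best[1], best[2]
```

Because no block of the interpolated frame exists yet, the cost compares the two reference frames directly; this is why repetitive patterns can produce a low cost at a wrong vector, the mismatch problem the abstract addresses.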
Shuping ZHANG Jinjia ZHOU Dajiang ZHOU Shinji KIMURA Satoshi GOTO
Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to coding efficiency, but it also consumes more power than any other component of a video encoder. In this paper, an ME processor with a 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with the ME die. By adding face-to-face (F2F) pad and through-silicon via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a dedicated memory controller manages data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system, including TSV/F2F placement, floorplan optimization, and power network generation. Compared with 2D technology, the number of input/output (IO) pins is reduced by 77%. After floorplan optimization of the processor die and the memory die, routing wire lengths are reduced by 13.4% and 50%, respectively. The stacked static random-access memory (SRAM) contributes the largest power reduction in this work. Simulation results show that the design supports real-time 720p @ 60 fps encoding at 8 MHz with less than 65 mW of power, which is much better than state-of-the-art ME processors.
Dang Ngoc Hai NGUYEN NamUk KIM Yung-Lyul LEE
A new video frame rate up-conversion (FRUC) technique is presented that combines a median filter and motion estimation (ME) with an occlusion detection (OD) method. First, ME is performed to obtain motion vectors (MVs). Then, the OD method is used to refine the MVs in occlusion regions. Where occlusion occurs, median filtering is applied; otherwise, bidirectional motion-compensated interpolation (BDMC) is used to create the interpolated frames. Experimental results show that the proposed algorithm performs better than the conventional approach, and its average PSNR (peak signal-to-noise ratio) is consistently higher than those of the other methods on the Full HD test sequences.
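A minimal sketch of BDMC for one block, assuming a linear trajectory, an even integer MV from the previous to the next frame, and an interpolated frame lying exactly midway. Occlusion detection and the median-filter fallback are not reproduced, and the function name is hypothetical.

```python
def bdmc_block(prev, nxt, bx, by, n, mvx, mvy):
    """Bidirectional MC interpolation of the n x n block at (bx, by).

    (mvx, mvy) is the motion from prev to nxt and is assumed even, so the
    half-vectors land on integer pixels. Each output pixel is the rounded
    average of prev sampled at -v/2 and nxt sampled at +v/2.
    """
    hx, hy = mvx // 2, mvy // 2
    return [[(prev[by + j - hy][bx + i - hx] + nxt[by + j + hy][bx + i + hx] + 1) // 2
             for i in range(n)]
            for j in range(n)]
```

Averaging the two motion-compensated samples suppresses noise where the vector is correct, but in occluded regions one of the two samples belongs to a different object, which is why the abstract switches to a different filter there.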
Zhu LI Yoichi TOMIOKA Hitoshi KITAZAWA
Detailed tracking is required for many vision applications. Most conventional motion estimation methods rely on a visual-feature-based constraint; for example, optical flow methods assume that the brightness of each pixel is constant across two consecutive frames. However, accurate extraction and tracking are difficult to achieve using visual feature information alone, because viewpoint changes and inconsistent illumination make the visual features of some object regions appear different in consecutive frames. A structure-based constraint on objects is therefore also necessary for tracking. In the proposed method, both visual feature matching and structure matching are formulated as a linear assignment problem and then integrated.
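A minimal sketch of integrating two matching cues in one linear assignment problem, assuming (hypothetically) that the visual-feature and structure cost matrices are combined by a weighted sum with weight `w`; the abstract does not say how the paper integrates them. The exact solver enumerates permutations, which is only practical for a handful of points; a real tracker would use a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def combined_cost(feat, struct, w):
    """Weighted sum of a visual-feature cost matrix and a structure cost matrix."""
    return [[w * f + (1 - w) * s for f, s in zip(frow, srow)]
            for frow, srow in zip(feat, struct)]


def solve_assignment(cost):
    """Exact linear assignment by brute force: p[i] is the match of point i."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)


feat = [[0.2, 0.1, 0.9], [0.1, 0.2, 0.9], [0.9, 0.9, 0.1]]  # appearance costs
struct = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]                   # structure costs
print(solve_assignment(combined_cost(feat, struct, 1.0)))  # [1, 0, 2]
print(solve_assignment(combined_cost(feat, struct, 0.5)))  # [0, 1, 2]
```

In this toy data, appearance alone swaps the first two points (as a lighting change might), while adding the structure cost restores the identity matching.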
Tadayoshi ENOMOTO Nobuaki KOBAYASHI
A motion estimation (ME) multimedia processor was developed that employs dynamic voltage and frequency scaling (DVFS) to greatly reduce power dissipation. To make full use of DVFS, a fast ME algorithm was also developed. Before the ME process starts for each macroblock to be encoded, the algorithm adaptively predicts the optimum supply voltage and the optimum clock frequency. The power dissipation of the 90-nm CMOS DVFS-controlled multimedia processor, which contains an absolute difference accumulator, a small on-chip DC/DC level converter, a minimum value detector, and a DVFS controller, was reduced to 38.48 µW, only 3.261% of that of a conventional multimedia processor.
Junsang CHO Jung Wook SUH Gwanggil JEON Jechang JEONG
In this letter, we propose an error-surface-modeling-based segmentalized motion estimation method for video coding. We previously proposed two algorithms, MBQME [1] and HMBQME [2]. However, these algorithms are not based on locally quadratic motion-compensated (MC) prediction errors around an integer-pixel motion vector or on the hypothesis that the local error surface is a convex function. Therefore, we propose a segmentalized modeling algorithm that takes the error surface into account. In this scheme, the tendency of the error surface is first assessed: by applying the Sobel operator to the error surface, we classify each region as plain or textured. For plain regions, conventional MBQME is appropriate as the quarter-pixel motion estimation method. For textured regions, we search additional interpolation points for more accurate modeling. After interpolation, we perform double-precision mathematical modeling to find the best motion vector (MV). Experiments show that the proposed scheme achieves better PSNR performance than conventional modeling algorithms with minimal operation time.
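To make the error-surface idea concrete, here is a minimal one-dimensional sketch of quadratic error-surface modeling: fit a parabola through the matching errors at integer offsets -1, 0, +1 around the best integer-pixel MV and take its vertex as the sub-pixel estimate, applied separately per axis (an assumption). This generic model is an illustration only; it is not MBQME, HMBQME, or the letter's segmentalized double-precision model, the Sobel classification is not shown, and the function names are hypothetical.

```python
def parabolic_offset(e_minus, e_zero, e_plus):
    """Vertex of the parabola through (-1, e_minus), (0, e_zero), (+1, e_plus).

    If the three errors are not convex, the integer-pixel position is kept.
    """
    denom = 2 * (e_minus - 2 * e_zero + e_plus)
    if denom <= 0:
        return 0.0
    return (e_minus - e_plus) / denom


def to_quarter_pel(offset):
    """Round a sub-pixel offset to the nearest quarter pixel."""
    return round(offset * 4) / 4


print(parabolic_offset(4, 1, 2), to_quarter_pel(0.3))  # 0.25 0.25
```

The convexity check mirrors the abstract's point: the closed-form vertex is only meaningful when the local error surface is convex, and textured regions are where that assumption tends to fail.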
Seung-Jin BAEK Seung-Won JUNG Hahyun LEE Hui Yong KIM Sung-Jea KO
In this paper, an improved B-picture coding algorithm based on symmetric bi-directional motion estimation (ME) is proposed. In addition to the block-matching error between blocks in the forward and backward reference frames, the proposed method exploits previously reconstructed template regions in the current and reference frames for bi-directional ME. The side-match error between the predicted target block and its template is also used to alleviate block discontinuities. To perform ME efficiently, an initial motion vector (MV) is adaptively derived by exploiting temporal correlations. Experimental results show that the number of generated bits is reduced by up to 9.31% when the proposed algorithm is employed as a new macroblock (MB) coding mode for the H.264/AVC standard.
Yibo FAN Jialiang LIU Dexue ZHANG Xiaoyang ZENG Xinhua CHEN
Fidelity Range Extension (FRExt), i.e., the High Profile, was added to the H.264/AVC recommendation in its second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). To conform to FRExt, a Fractional Motion Estimation (FME) architecture is proposed that supports the 8×8/4×4 adaptive Hadamard transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and better encoding performance. To increase the utilization of the SATD (Sum of Absolute Transformed Differences) Generator (SG) per unit time, the proposed architecture employs two 8-pel interpolators (IPs) that time-share one SG. The two IPs work in turn to supply data to the SG continuously, which increases data throughput and significantly reduces the number of cycles needed to process one macroblock. Furthermore, the architecture exploits the linearity of the Hadamard transform to generate the quarter-pel SATD. This helps shorten the long datapath in the second step of the two-iteration FME algorithm. Finally, experimental results show that the architecture can be adapted to applications with different performance requirements by adjusting the supported modes and the operating frequency. It supports real-time encoding of 4K×2K video sequences at 24 fps with seven modes or at 30 fps with six modes.
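The linearity the abstract relies on can be checked with a small sketch, assuming a 4×4 unnormalized Hadamard transform and ignoring the rounding of the interpolation filter. If a quarter-pel prediction is the average of two neighbouring predictions, its residual is the average of their residuals, so its transformed residual is the average of their transformed residuals and no separate transform pass is needed. The names `H4`, `hadamard4`, `satd4`, and `average_transform` are hypothetical, and the 8×8 path is not shown.

```python
# Unnormalized 4x4 Hadamard matrix (symmetric, rows mutually orthogonal).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard4(block):
    """2-D transform H * X * H; H is symmetric, so no transpose is needed."""
    return matmul4(matmul4(H4, block), H4)


def satd4(residual):
    """Sum of absolute transformed differences (no normalization)."""
    return sum(abs(v) for row in hadamard4(residual) for v in row)


def average_transform(t1, t2):
    """Quarter-pel transform from two already-transformed residuals, using
    H * ((A + B) / 2) * H == (H * A * H + H * B * H) / 2."""
    return [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(t1, t2)]
```

The quarter-pel SATD is then the sum of absolute values of `average_transform(t1, t2)`, computed from coefficients the half-pel stage already produced.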
Leiqi ZHU Dongkai YANG Qishan ZHANG
To reduce the convergence time of the iterative procedure, several gradient-based preliminary processes are employed to eliminate outliers. An adaptive variable block size is also introduced to balance accuracy and computational complexity. Moreover, using the Canberra distance instead of the Euclidean distance yields better performance in measuring motion similarity.
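A minimal comparison of the two distances, assuming (the abstract does not say) that they are applied to 2-D motion vectors. Canberra normalizes each component difference by the component magnitudes, so the same one-pixel discrepancy counts more between small motions than between large ones, whereas the Euclidean distance treats both cases equally. The function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|); components where both are zero contribute 0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if abs(a) + abs(b) > 0)


print(euclidean((1, 0), (2, 0)), euclidean((10, 0), (11, 0)))  # 1.0 1.0
print(canberra((1, 0), (2, 0)), canberra((10, 0), (11, 0)))
```

The second line prints 1/3 and 1/21: the same absolute difference is weighted about seven times more heavily for the slower pair of motions.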
Yoshiki YUNBE Masayuki MIYAMA Yoshio MATSUDA
This paper describes an affine motion estimation processor for real-time video segmentation. The processor estimates the dominant motion of a target region with affine parameters and is based on the Pseudo-M-estimator algorithm. Introducing an image division method and a binary weight method into the original algorithm reduces data traffic and hardware cost. A proposed pixel sampling method reduces the clock frequency by 50%, and a pixel pipeline architecture together with a frame overlap method doubles the throughput. The processor was prototyped on an FPGA, where its function and performance were verified, and was also implemented as an ASIC. The core size is 5.0 × 5.0 mm² in a 0.18-µm standard-cell process. The ASIC can process VGA video at 30 fps with a 120 MHz clock frequency.
Zhenyu LIU Dongsheng WANG Takeshi IKENAGA
Variable block size motion estimation, adopted by the latest video coding standard H.264/AVC, is an efficient approach to reducing temporal redundancy. The intensive computational complexity of the variable block size technique makes hardwired accelerators essential for real-time applications. The Propagate Partial SAD (propagate partial sums of absolute differences) and SAD Tree hardwired engines outperform other counterparts, especially when supporting variable block sizes. In this paper, the authors apply architecture-level and circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while maintaining the other metrics of the original architectures, namely latency, memory bandwidth, and hardware utilization. Experiments demonstrate that, at an operating frequency of 110.8 MHz, the proposed approaches save 14.7% and 18.0% of the gate count for Propagate Partial SAD and SAD Tree, respectively, compared with the original architectures. In TSMC 0.18-µm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves a 231.6 MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum operating frequency of the optimized SAD Tree architecture is improved to 204.8 MHz, almost twice that of the original, while its hardware overhead is only 88.5 k gates.
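For reference, the reuse that SAD Tree engines exploit can be sketched in software, assuming a 16×16 macroblock and a single candidate MV: the sixteen 4×4 SADs are computed once, and larger partitions are formed by adding neighbouring partial sums. This is a behavioral model with hypothetical names only; it does not reflect the paper's architecture-level or circuit-level optimizations, the Propagate Partial SAD dataflow is not shown, and the 8×4/4×8 partitions are omitted for brevity.

```python
def sad4x4_grid(cur, ref, bx, by, dx, dy):
    """SADs of the sixteen 4x4 sub-blocks of a 16x16 macroblock for one MV."""
    return [[sum(abs(cur[by + 4 * r + j][bx + 4 * c + i]
                     - ref[by + dy + 4 * r + j][bx + dx + 4 * c + i])
                 for j in range(4) for i in range(4))
             for c in range(4)] for r in range(4)]


def merge2x2(grid):
    """One adder-tree level: sum each 2x2 group of neighbouring SADs."""
    n = len(grid) // 2
    return [[grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
             for c in range(n)] for r in range(n)]


def variable_block_sads(grid4):
    """Derive larger H.264 partitions from the 4x4 SADs without revisiting pixels."""
    s8 = merge2x2(grid4)
    return {
        "4x4": grid4,
        "8x8": s8,
        "16x8": [s8[0][0] + s8[0][1], s8[1][0] + s8[1][1]],  # top, bottom
        "8x16": [s8[0][0] + s8[1][0], s8[0][1] + s8[1][1]],  # left, right
        "16x16": merge2x2(s8)[0][0],
    }
```

Only the 4×4 stage touches pixels; every larger partition costs a few additions, which is why this structure supports variable block sizes cheaply in hardware.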
Zhen LI Atushi UEMURA Hitoshi KIYA
An FFT-based full-search block matching algorithm (BMA) that uses the sum of squared differences (SSD) criterion is described. The proposed method does not need to extend a real signal into a complex one, which reduces the computational load of FFT-based approaches. In addition, two macroblocks that share the same search window can be matched simultaneously. In a motion estimation simulation, the proposed method achieved the same performance as a direct SSD full search, and it ran faster than other FFT-based BMAs.
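The standard expansion behind SSD-based FFT matching is given below as a reference, not as the letter's specific real-valued formulation, which the abstract does not detail. Here b is the macroblock, s the search window, b_0 the macroblock zero-padded to the window size, and (u, v) the candidate displacement measured from the window's top-left corner. The first term does not depend on the displacement, the last term can be accumulated with running sums, and only the cross term c(u, v) needs a correlation, which the correlation theorem lets the FFT compute for all displacements at once.

```latex
\begin{aligned}
\mathrm{SSD}(u,v) &= \sum_{x,y} \bigl(b(x,y) - s(x+u,\,y+v)\bigr)^2 \\
  &= \underbrace{\sum_{x,y} b(x,y)^2}_{\text{independent of }(u,v)}
     \;-\; 2\,\underbrace{\sum_{x,y} b(x,y)\, s(x+u,\,y+v)}_{c(u,v)}
     \;+\; \sum_{x,y} s(x+u,\,y+v)^2, \\
c &= \mathcal{F}^{-1}\!\left\{ \overline{\mathcal{F}\{b_0\}} \cdot \mathcal{F}\{s\} \right\}
  \quad \text{(valid for displacements where the block lies inside the window).}
\end{aligned}
```

Because F{s} depends only on the search window, it can be computed once and reused for every macroblock matched against that window.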
Vinh TRUONG QUANG Sung-Hoon HONG Young-Chul KIM
We propose a new motion vector (MV) smoothing method for frame rate up-conversion that uses fuzzy weighting and vector median filtering. A fuzzy reasoning system adjusts the weights according to local characteristics of the MV field, including block difference and block boundary distortion. The fuzzy weighting removes the effect of outliers and irregular MVs from the MV smoothing process. Simulation results show that the proposed algorithm efficiently corrects wrong MVs and thus improves the visual quality of interpolated frames more than conventional methods.
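A minimal sketch of weighted vector median filtering over a window of neighbouring MVs, assuming Euclidean distances and caller-supplied weights. The fuzzy reasoning system that would produce the weights from block difference and boundary distortion is not reproduced, and the function name is hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def weighted_vector_median(mvs, weights=None):
    """Return the MV in the window that minimises the weighted sum of Euclidean
    distances to all MVs in the window (plain vector median if weights is None)."""
    if weights is None:
        weights = [1.0] * len(mvs)
    return min(mvs, key=lambda v: sum(w * math.dist(v, u) for u, w in zip(mvs, weights)))


# The outlier (15, -9) cannot be selected because it is far from every other MV.
print(weighted_vector_median([(1, 0), (1, 1), (2, 0), (1, 0), (15, -9)]))  # (1, 0)
```

Unlike a component-wise median, the vector median always returns one of the input vectors, so smoothing never invents a motion that no neighbouring block actually has; lowering the weight of unreliable neighbours further limits their influence.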
Trung Thanh NGO Yuichiro KOJIMA Hajime NAGAHARA Ryusuke SAGAWA Yasuhiro MUKAIGAWA Masahiko YACHIDA Yasushi YAGI
For fast egomotion of a camera, computing feature correspondences and motion parameters by global search becomes highly time-consuming, so the complexity of the estimation must be reduced for real-time applications. In this paper, we propose a compound omnidirectional vision sensor and an algorithm for estimating its fast egomotion. The proposed sensor has both multiple baselines and a large field of view (FOV). Our method uses the multi-baseline stereo capability to classify feature points as near or far. After the classification, the camera rotation and translation can be estimated separately using random sample consensus (RANSAC), which reduces the computational complexity. The large FOV also improves robustness because translation and rotation are clearly distinguished. To date, no work has combined multi-baseline stereo with a large FOV for egomotion estimation, even though each of these characteristics is individually important for improving it. Experiments showed that the proposed method is robust and achieves reasonable accuracy in real time under fast motion of the sensor.
Miki HASEYAMA Makoto TAKIZAWA Takashi YAMAMOTO
In this paper, a new video frame interpolation method based on image morphing is proposed for frame rate up-conversion. In this method, image features are extracted in each frame with the Scale-Invariant Feature Transform (SIFT), and their correspondences between two contiguous frames are computed separately in the foreground and background regions. Using these two steps, the proposed method accurately generates interpolated frames and thus achieves frame rate up-conversion.