Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE
An MPEG-2 encoder LSI with 64 Mb of embedded on-chip DRAM and a multimedia processor has been developed. To implement this large-scale, high-speed LSI, we have developed hierarchical skew control of multiple clocks, timing verification that takes cross-talk noise into account, and simple countermeasures against IR drop in the power lines using decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been attained and verified with cross-talk noise taken into account, and the IR drop in the 162 MHz operation block has been restrained to 166 mV.
Guaranteeing quality of service (QoS) in the real-time transmission of video over the Internet is a challenging task and is essential to the success of many video applications. The Internet Engineering Task Force (IETF) has proposed the Guaranteed Service (GS) within the Integrated Services model, with firm delay and bandwidth guarantees. For GS, traffic sources must be able to calculate the traffic characteristics to be declared to the network from a limited set of parameters that statistically characterize the traffic and the required level of QoS. In this paper, we develop an algorithm for evaluating the traffic parameters that characterize a video stream when a QoS requirement is given. To this end, an analytical traffic model for VBR MPEG video is introduced. Simulation results show that the proposed method can evaluate the traffic parameters precisely and efficiently.
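The traffic parameters declared for GS are token-bucket parameters: a rate r and a bucket depth b. As a rough illustration, the sketch below computes, for a measured trace rather than the paper's analytical model, the smallest depth b that a given rate r needs. It assumes that each frame arrives as an instantaneous burst once per frame interval and that the bucket starts full; the function names are hypothetical.

```python
def min_bucket_depth(frame_bits, rate_per_frame):
    """Smallest depth b with sum(frame_bits[i..j]) <= b + r * (j - i) for all i <= j.

    Uses the recursion x_j = a_j + max(0, x_{j-1} - r); b is the maximum x_j.
    """
    depth = 0.0
    backlog = 0.0
    for bits in frame_bits:
        backlog = bits + max(0.0, backlog - rate_per_frame)
        depth = max(depth, backlog)
    return depth


def descriptor_curve(frame_bits, rates):
    """(r, b) pairs tracing the rate/depth trade-off for one trace."""
    return [(r, min_bucket_depth(frame_bits, r)) for r in rates]
```

For example, with a periodic trace where large I-pictures alternate with small pictures, raising r toward the I-picture size shrinks b toward the size of a single I-picture. An analytical model such as the paper's would replace the empirical trace with a statistical description.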
Daiji ISHII Masao IKEKAWA Ichiro KURODA
This paper introduces fast methods for variable length decoding (VLD) and inverse quantization (IQ) in software MPEG-2 decoders that use Single Instruction stream, Multiple Data stream (SIMD) instructions for multimedia applications. In the VLD implementation, the VLD tables are made as small as possible to minimize cache misses, and variable-length codewords are decoded concurrently. In the IQ implementation, the VLD results are inverse-quantized in parallel. With these methods, the combined clock cycles for VLD and IQ are roughly 30% fewer than with conventional methods, and the effect is especially pronounced for high-bitrate streams.
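The IQ step parallelizes well because the same arithmetic is applied to every coefficient of a block. The scalar reference below follows our reading of the MPEG-2 video inverse quantization rules (intra DC scaling, the `(2*QF + k) * W * quantiser_scale / 32` formula with truncating division, saturation to [-2048, 2047], and mismatch control). It is not the authors' SIMD code. It assumes coefficients and weights are already in raster order, with zig-zag or alternate scan undone.

```python
def _div_trunc(num, den):
    """Integer division truncating toward zero, as '/' does in the MPEG-2 text."""
    q = abs(num) // den
    return q if num >= 0 else -q


def inverse_quantize(qf, weights, quantiser_scale, intra, intra_dc_mult=8):
    """Inverse-quantize one 8x8 block (64 entries, raster order)."""
    out = []
    for i, (level, w) in enumerate(zip(qf, weights)):
        if intra and i == 0:
            value = intra_dc_mult * level                    # intra DC has its own scaling
        else:
            k = 0 if intra else (level > 0) - (level < 0)    # Sign(QF) for non-intra blocks
            value = _div_trunc((2 * level + k) * w * quantiser_scale, 32)
        out.append(max(-2048, min(2047, value)))             # saturation
    # Mismatch control: if the coefficient sum is even, toggle the LSB of F[7][7].
    if sum(out) % 2 == 0:
        out[63] += -1 if out[63] % 2 else 1
    return out
```

Every iteration of the loop is independent except for the final mismatch sum, which is why SIMD lanes can process several coefficients at once.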
Naiwala Pathirannehelage CHANDRASIRI Takeshi NAEMURA Hiroshi HARASHIMA
This paper discusses the real-time recognition of mixtures of primary facial expressions, including their intensities. The proposed recognition method is compatible with the MPEG-4 high-level expression Facial Animation Parameter (FAP). In our method, the whole facial image is treated as a single pattern without any block segmentation. As model features, we use an expression vector, namely the changes in the low-frequency global DCT coefficients relative to a neutral facial image of the same person. These features are robust and efficient enough for real-time processing. To construct a person-specific model, apex images of the primary facial expression categories are used as references. A personal facial expression space (PFES) is constructed using multidimensional scaling. With its generalization capability, PFES maps an unknown input image relative to the known reference images. Because PFES has linear mapping characteristics, the MPEG-4 high-level expression FAP can easily be calculated from the location of the input face in PFES. Temporal variations of facial expressions can also be observed in PFES as trajectories. Experimental results demonstrate the effectiveness of the proposed method.
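The expression vector can be pictured as follows: take a 2-D DCT of the whole face image, keep only a small block of the lowest-frequency coefficients, and subtract the same coefficients of the person's neutral face. The sketch assumes an orthonormal DCT-II, a grayscale image stored as a list of rows, and a hypothetical k = 4 retained frequencies per axis. The paper's exact normalization and coefficient count are not given here, and the PFES/multidimensional scaling step is not shown.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(img):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def expression_vector(face, neutral, k=4):
    """Change of the k x k lowest-frequency DCT coefficients relative to neutral."""
    f, n = dct_2d(face), dct_2d(neutral)
    return [f[u][v] - n[u][v] for u in range(k) for v in range(k)]
```

Because only a handful of global low-frequency coefficients are kept, the feature is cheap to compute per frame and insensitive to small local detail.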
Jongho NANG Seungwook HONG Ohyeong KWON
Caption processing, which adds descriptive text (as in cinema captions) to a sequence of frames, is an important video manipulation function that a video editor should support. This paper proposes an efficient MC-DCT compressed-domain approach for inserting a caption into an MPEG compressed video stream. As in [6], it basically adds the DCT blocks of the caption image to the corresponding DCT blocks of the input frames one by one in the MC-DCT domain. However, the strength of the caption image is adjusted in the DCT domain to prevent the resulting DCT coefficients from exceeding the maximum value allowed in MPEG. Adjusting the strength of the caption image adaptively requires the exact pixel values of the input image, which are difficult to obtain in the DCT domain. We therefore propose an approximation scheme in which the DC value of a block is used as the expected value of every pixel in that block. Although this approximation may introduce some errors in the caption area, it still provides relatively high image quality in the non-caption area, and the processing is about 4.9 times faster than the decode-captioning-reencode method.
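The DC approximation can be sketched for a single intra block. With an orthonormal 8x8 DCT, the DC coefficient equals 8 times the block mean, so DC/8 serves as the expected pixel value. Since the DCT is linear, scaling and adding the caption's DCT block is equivalent to scaling and adding its pixels. The strength rule used here (fit the caption peak into the remaining headroom below 255) is a hypothetical stand-in for the paper's rule, and motion-compensated blocks are not handled.

```python
def insert_caption_dct(block_dct, caption_dct, caption_peak, max_pixel=255.0):
    """Add a caption to an 8x8 block in the DCT domain.

    block_dct, caption_dct: 64 orthonormal DCT coefficients, DC first.
    caption_peak: largest pixel value of the caption block (known to the editor).
    Returns (alpha, new_block_dct). The headroom rule is an assumption.
    """
    expected_pixel = block_dct[0] / 8.0        # DC of an orthonormal 8x8 DCT = 8 * mean
    headroom = max_pixel - expected_pixel
    if caption_peak <= 0:
        alpha = 1.0
    else:
        alpha = max(0.0, min(1.0, headroom / caption_peak))
    # Linearity of the DCT: adding scaled coefficients equals adding scaled pixels.
    return alpha, [b + alpha * c for b, c in zip(block_dct, caption_dct)]
```

Bright blocks receive a weaker caption and dark blocks the full-strength caption. Pixels that are brighter than the block mean can still overflow, which is the kind of error in the caption area that the abstract mentions.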
Ayuko TAKAGI Kiyoshi NISHIKAWA Hitoshi KIYA
This paper proposes a method for improving the image quality obtained when motion estimation (ME) is performed on low-bit images. By using edge-enhanced images for quantization, we can increase the accuracy of ME and improve the image quality. Using low-bit images for ME is known to be effective for reducing power consumption, but it slightly degrades image quality. Because the quality of the encoded image depends on the thresholds used for data quantization, algorithms for determining these thresholds have been studied. The proposed method uses linear quantization, which simply truncates the least significant bits. It is simple, requiring no complicated threshold calculations, yet it improves the resulting image quality as much as methods that use threshold calculations do. To evaluate its effectiveness, we simulate the image quality and estimate the power consumption using synthesis results from a VHDL model of a motion estimator.
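Linear quantization by truncation keeps only the most significant bits of each pixel before block matching. The sketch below uses plain full-search SAD matching on truncated frames. The edge-enhancement preprocessing is omitted, and the block size, search range, and bit width are hypothetical defaults. Frames are lists of rows of 8-bit integers, and the caller must keep the search window inside the frame.

```python
def truncate(frame, bits):
    """Keep the `bits` most significant bits of each 8-bit pixel."""
    shift = 8 - bits
    return [[p >> shift for p in row] for row in frame]


def full_search(cur, ref, bx, by, size=8, rng=4):
    """Exhaustive block matching with SAD; returns the best (dx, dy)."""
    best, best_sad = (0, 0), None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            sad = sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
                      for y in range(size) for x in range(size))
            if best_sad is None or sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best


def low_bit_me(cur, ref, bx, by, bits=4, **kw):
    """Motion estimation on truncated (low-bit) versions of both frames."""
    return full_search(truncate(cur, bits), truncate(ref, bits), bx, by, **kw)
```

The narrower pixels shrink the absolute-difference and accumulation datapath, which is where the power saving comes from. The cost is that differences hidden in the dropped bits can no longer separate candidate vectors.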
Shinfeng D. LIN Chien-Chuang LIN Shih-Chieh SHIE
MPEG-4 emphasizes coding efficiency and allows content-based access to and transmission of arbitrarily shaped objects. It addresses the encoding of video objects using shape coding, motion estimation, and texture coding to provide interactivity, high compression ratios, and scalability. In this letter, an advanced object-adaptive vertex-based shape coding method is proposed for encoding the shapes of video objects. The method uses an octant-based representation of the relation between adjacent vertices, and this relation is exploited to improve coding efficiency. Simulation results demonstrate that the proposed method saves more bits when vertices are closely spaced.
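An octant-based representation of adjacent vertices can be sketched by classifying each step (dx, dy) into one of eight 45-degree sectors. Within a known octant, the signs are implied and the smaller component is bounded by the larger one. The octant numbering and the (octant, major, minor) symbol layout below are hypothetical illustrations, not the letter's actual bitstream syntax.

```python
def octant(dx, dy):
    """Index 0..7 of the sector [k*45, (k+1)*45) degrees containing (dx, dy)."""
    if dx == 0 and dy == 0:
        raise ValueError("coincident vertices have no direction")
    if dx > 0 and dy >= 0:          # [0, 90)
        return 0 if dy < dx else 1
    if dx <= 0 and dy > 0:          # [90, 180)
        return 2 if -dx < dy else 3
    if dx < 0 and dy <= 0:          # [180, 270)
        return 4 if -dy < -dx else 5
    return 6 if dx < -dy else 7     # [270, 360): dx >= 0, dy < 0


def encode_polyline(vertices):
    """(octant, major, minor) symbol for each step between adjacent vertices."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        dx, dy = x1 - x0, y1 - y0
        major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
        symbols.append((octant(dx, dy), major, minor))
    return symbols
```

With closely spaced vertices, major is small, so minor (which is at most major) needs very few bits once the octant is known.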
Hiroe IWASAKI Jiro NAGANUMA Makoto ENDO Takeshi OGURA
This paper proposes a very small on-chip multimedia real-time OS for embedded system LSIs and demonstrates its usefulness in MPEG-2 multimedia applications. The real-time OS, which provides a conditional cyclic task with suspend and resume for the interaction between the hardware (HW) and software (SW) of embedded system LSIs, implements a minimum set of task, interrupt, and semaphore management functions based on an analysis of embedded software requirements. It requires only about 2.5 Kbytes of memory at run time and reduces the redundant execution steps of conventional cyclic tasks for HW/SW interactions to about half. Its real-time performance proved sufficient when two typical embedded software applications for practical multimedia system LSIs were implemented: an MPEG-2 system protocol LSI and an MPEG-2 video encoder LSI. This on-chip multimedia real-time OS, requiring only 2.5 Kbytes of memory, should be suitable for future multimedia embedded system LSIs.
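One way to picture a conditional cyclic task: like an ordinary cyclic task it is due every `period` ticks, but it is dispatched only if it is not suspended and its condition (for example, "the hardware has data ready") holds. Polls of idle hardware are therefore skipped. The tick-driven kernel below is a hypothetical sketch of that idea, not the paper's OS interface.

```python
class CyclicTask:
    def __init__(self, name, period, condition=lambda: True):
        self.name = name
        self.period = period          # due every `period` ticks
        self.condition = condition    # e.g. "hardware has data ready"
        self.suspended = False


class TinyKernel:
    """Hypothetical tick-driven kernel with conditional cyclic tasks."""

    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}
        self.trace = []               # (tick, task name) for every dispatch

    def suspend(self, name):
        self.tasks[name].suspended = True

    def resume(self, name):
        self.tasks[name].suspended = False

    def tick(self, now):
        for task in self.tasks.values():
            # Skip suspended tasks and tasks whose condition is false, so no
            # execution steps are spent on a cycle with nothing to do.
            if not task.suspended and now % task.period == 0 and task.condition():
                self.trace.append((now, task.name))
```

A hardware interrupt handler would typically call `resume` or flip the condition, and the task would `suspend` itself once the hardware interaction ends.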
Akira NAKAGAWA Eishi MORIMATSU Takashi ITOH Kiichi MATSUDA
High-speed digital data transmission services for mobile equipment are becoming available. Although video is one of the media expected to use such transmission capabilities, the bandwidth of a video signal is generally much broader than the available transmission bandwidth, so efficient video encoding algorithms are required. ITU-T Recommendation H.263 and ISO/IEC MPEG-4 are very powerful encoding algorithms for a wide range of video sequences. However, these conventional methods generate a large number of bits when they encode highly active scenes, which results in frame skipping and degraded decoded picture quality. To keep these degradations as low as possible, we have proposed a Dynamic Resolution Conversion (DRC) method for the prediction error, in which reduced-resolution encoding is carried out when the input scene is highly active. Simulation results show that the proposed scheme can improve both the coding frame rate and the picture quality in highly active scenes. In this paper, we also present an analysis of the performance of the DRC method in the error-prone environments that are inevitable in mobile communications.
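The core DRC decision can be sketched as follows: measure the activity of the prediction error and, if it exceeds a threshold, code a half-resolution version (a quarter as many samples) and upsample it again at the decoder. The mean-absolute-error activity measure, the 2x2 averaging filter, nearest-neighbour upsampling, and the threshold are all hypothetical choices. Transform coding and quantization are left out.

```python
def mean_abs(block):
    return sum(abs(v) for row in block for v in row) / (len(block) * len(block[0]))


def downsample2(block):
    """Average each 2x2 group (block dimensions assumed even)."""
    return [[(block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1]) / 4.0
             for x in range(0, len(block[0]), 2)]
            for y in range(0, len(block), 2)]


def upsample2(block):
    """Nearest-neighbour 2x upsampling back to full resolution."""
    out = []
    for row in block:
        wide = [v for v in row for _ in (0, 1)]
        out.extend([wide, list(wide)])
    return out


def drc_residual(residual, activity_threshold):
    """Return (mode, samples_to_code, decoder_side_reconstruction)."""
    if mean_abs(residual) > activity_threshold:
        reduced = downsample2(residual)      # 1/4 of the samples to encode
        return "reduced", reduced, upsample2(reduced)
    return "full", residual, residual
```

Coding fewer samples in active scenes lowers the bit count, which can avoid skipped frames at the cost of some spatial detail in the residual.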
Koyo NITTA Toshihiro MINAMI Toshio KONDO Takeshi OGURA
This paper describes a unique motion estimation and compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. By statistically analyzing the characteristics of the scene being encoded and controlling the encoding parameters according to the scene, the quality of the decoded image can be enhanced. The most significant feature of the architecture is that its two ME/MC modules can operate independently. Because a time interval can be inserted between the operations of the two modules, a scene-adaptive algorithm can be implemented in the architecture. The ME/MC architecture is implemented in a single-chip MPEG-2 video encoder.
Hiroshi SEGAWA Yoshinori MATSUURA Satoshi KUMAKI Tetsuya MATSUMURA Stefan SCOTZNIOVSKY Shu MURAYAMA Tetsuro WADA Ayako HARADA Eiji OHARA Ken-ichi ASANO Toyohiko YOSHIDA Yasutaka HORIBA
This paper describes an embedded software scheme for a single-chip MPEG-2 encoder that executes concurrent video, audio, and system encoding in real time. The software has a scalable, hierarchically composed module structure with expandable plug-in modules. For wider applicability, separate task modules are provided for video, audio, and system processing. In addition, we propose an effective task management scheme that combines polling with interrupt-based task switching to achieve real-time operation. The software, with these features and all task modules, is implemented on a single D30V media processor in a single-chip MPEG-2 video, audio, and system encoder. This encoder performs real-time MPEG-2 video encoding, Dolby Digital or MPEG-1 audio encoding, and system encoding that generates a TS or PS at over 50 Mbps for various applications. For a DVD or DTV encoder system, the software is reconfigured to require less than 56.6 kbytes of instruction memory and 145.6 MIPS. The single media processor, with 64 kbytes of instruction RAM and a performance of 162 MIPS at a clock rate of 162 MHz, successfully achieves real-time operation with the proposed embedded software.
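A generic picture of combining polling with interrupt-based task switching: interrupt handlers only flag a task as runnable, and at every switch point the scheduler polls the runnable set and runs the most urgent task. A preempted task therefore stays runnable. The generator-based tasks, the priority numbers, and the class below are hypothetical. The abstract does not specify the D30V software's actual mechanism.

```python
class Scheduler:
    """Hypothetical priority scheduler mixing interrupt flags and polling."""

    def __init__(self):
        self.tasks = {}       # name -> (priority, generator); lower = more urgent
        self.pending = set()  # names marked runnable
        self.log = []         # (step, task) for every executed slice

    def add(self, name, priority, gen):
        self.tasks[name] = (priority, gen)

    def interrupt(self, name):
        """What an interrupt handler would do: just flag the task as runnable."""
        self.pending.add(name)

    def run(self, steps, events=None):
        events = events or {}
        current = None
        for step in range(steps):
            for name in events.get(step, []):
                self.interrupt(name)
            if current is not None:
                self.pending.add(current)     # a preempted task stays runnable
            ready = [n for n in self.pending if n in self.tasks]
            if not ready:
                current = None
                continue
            # Poll at each switch point and take the most urgent runnable task.
            current = min(ready, key=lambda n: (self.tasks[n][0], n))
            self.pending.discard(current)
            try:
                next(self.tasks[current][1])  # run one slice up to the next yield
                self.log.append((step, current))
            except StopIteration:
                del self.tasks[current]
                current = None
```

Switching only at yield points keeps each encoding task's state consistent, while the interrupt flags let a time-critical task (here, the higher-priority one) take over at the very next switch point.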
Tetsuya MATSUMURA Satoshi KUMAKI Hiroshi SEGAWA Kazuya ISHIHARA Atsuo HANAMI Yoshinori MATSUURA Stefan SCOTZNIOVSKY Hidehiro TAKATA Akira YAMADA Shu MURAYAMA Tetsuro WADA Hideo OHIRA Toshiaki SHIMADA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Koji TSUCHIHASHI Yasutaka HORIBA
A single-chip MPEG-2 video, audio, and system encoder LSI has been developed. It performs concurrent real-time processing of MPEG-2 422P@ML video encoding, 2-channel Dolby Digital or MPEG-1 audio encoding, and system encoding that generates a multiplexed transport stream (TS) or program stream (PS). An advanced hybrid architecture, combining a high-performance VLIW media processor (D30V) and hardwired video processing circuits, has been adopted to meet the demands for both high flexibility and enormous computational capability. A new unified control scheme is proposed that hierarchically manages adaptive task priority control over the asynchronous video, audio, and system encoding processes in order to achieve real-time concurrent processing on a single D30V. Dual dedicated motion estimation cores, a coarse ME core (CME) for wide-range searches and a fine ME core (FME) for precise searches, have been integrated to achieve high picture quality with a small amount of hardware. With these features, the single-chip encoder has been fabricated using 0.25-micron four-layer metal CMOS technology and integrated on a 14.2 mm × 14.2 mm die with 11 million transistors.
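The division of labour between a coarse and a fine ME core can be illustrated with a two-stage search. A coarse stage evaluates a sparse grid of vectors over a wide range using subsampled pixels. A fine stage then evaluates a ±1 window at full resolution around the coarse winner. The grid spacing, subsampling factor, and window sizes are hypothetical, because the abstract does not describe the CME/FME algorithms. Like any coarse-to-fine search, this one relies on a reasonably smooth SAD surface.

```python
def sad(cur, ref, bx, by, dx, dy, size, step=1):
    """SAD of the current block against a displaced reference block; step>1 subsamples."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(0, size, step) for x in range(0, size, step))


def search(cur, ref, bx, by, size, candidates, step):
    """Best candidate vector by SAD (ties broken by the vector itself)."""
    return min(candidates, key=lambda v: (sad(cur, ref, bx, by, v[0], v[1], size, step), v))


def coarse_fine_me(cur, ref, bx, by, size=8, coarse_range=8, coarse_grid=2):
    # Coarse stage: wide range, sparse vector grid, 2:1 pixel subsampling.
    coarse = [(dx, dy)
              for dy in range(-coarse_range, coarse_range + 1, coarse_grid)
              for dx in range(-coarse_range, coarse_range + 1, coarse_grid)]
    cx, cy = search(cur, ref, bx, by, size, coarse, step=2)
    # Fine stage: full-resolution SAD in a +-1 window around the coarse winner.
    fine = [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return search(cur, ref, bx, by, size, fine, step=1)
```

The coarse stage evaluates 81 candidates on 16 pixels each and the fine stage evaluates 9 candidates on 64 pixels each. A full search over the same ±8 range would take 289 candidates on 64 pixels each, which hints at why splitting the work saves hardware.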
Watermarking techniques proposed to date offer some robustness against non-geometric alterations; however, most of them are not robust against geometric alterations. In video watermarking as well, small translations, scaling, rotations, and affine transformations can be effective attacks on the watermark. In this paper, we propose an image correction scheme for video watermark extraction. The scheme embeds a 'patchwork' pattern into digital video data for position detection and corrects images attacked by geometric alteration on the basis of the detected positions. We also present simulation results of applying the proposed scheme to a conventional watermarking technique.
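The classic patchwork statistic brightens a pseudo-random pixel set A and darkens a disjoint set B, so mean(A) - mean(B) is about twice the embedding strength where the pattern is present and near zero elsewhere. Used as a position marker, the detector can search candidate displacements for the one that maximizes this statistic. The sketch handles cyclic translation only. Scaling or rotation would require searching those parameters too. The seed, set sizes, strength, and function names are hypothetical, and clipping to the pixel range is omitted.

```python
import random


def patch_sets(seed, height, width, pairs):
    """Pseudo-random disjoint pixel sets A and B shared by embedder and detector."""
    rng = random.Random(seed)
    cells = rng.sample([(y, x) for y in range(height) for x in range(width)], 2 * pairs)
    return cells[:pairs], cells[pairs:]


def embed(frame, a, b, delta):
    out = [row[:] for row in frame]
    for y, x in a:
        out[y][x] += delta      # brighten set A (no clipping in this sketch)
    for y, x in b:
        out[y][x] -= delta      # darken set B
    return out


def score(frame, a, b, oy=0, ox=0):
    """mean(A) - mean(B), reading the sets at a cyclic offset (oy, ox)."""
    h, w = len(frame), len(frame[0])
    sa = sum(frame[(y + oy) % h][(x + ox) % w] for y, x in a)
    sb = sum(frame[(y + oy) % h][(x + ox) % w] for y, x in b)
    return (sa - sb) / len(a)


def detect_shift(frame, a, b, search=4):
    """Displacement whose patchwork score is largest."""
    offsets = [(oy, ox) for oy in range(-search, search + 1)
               for ox in range(-search, search + 1)]
    return max(offsets, key=lambda o: score(frame, a, b, *o))
```

Once the displacement is known, the frame can be shifted back before the ordinary watermark extractor runs.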
Byeong-Hee ROH Seung-Wha YOO Tae-Yong KIM Jae-Kyoon KIM
Two main characteristics of VBR MPEG video traffic are the different statistics of the different picture types and the periodic traffic pattern caused by the GOP structure. In particular, the I-pictures at the beginning of each GOP generate much more traffic than the other pictures. Therefore, when several VBR MPEG video sources are superposed, the multiplexing performance can vary with their relative I-picture start times. In this paper, using a U-NDPP/D/1/B queueing model, we show how the arrangement of the start times of superposed VBR MPEG videos can significantly affect the cell loss ratio at the multiplexers. It is also shown that, from the viewpoint of queueing applications, the lognormal distribution is more suitable for modeling VBR MPEG video traffic than the normal and gamma distributions.
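A toy deterministic example gives the intuition behind the start-time effect. Each source sends one large I-picture per 12-frame GOP and small pictures otherwise. Aligning the I-pictures of three sources triples the peak per-frame load, while the mean load stays the same, and a higher peak means more loss at a finite multiplexer buffer. The picture sizes and GOP length are hypothetical, and this is not the paper's U-NDPP/D/1/B analysis.

```python
def gop_trace(n_frames, gop=12, i_bits=10, other_bits=2, start=0):
    """Per-frame load of one source: an I-picture every `gop` frames from `start`."""
    return [i_bits if (t - start) % gop == 0 else other_bits for t in range(n_frames)]


def aggregate_peak(starts, n_frames=48, **kw):
    """Peak per-frame load of superposed sources with the given I-picture phases."""
    traces = [gop_trace(n_frames, start=s, **kw) for s in starts]
    return max(sum(frame) for frame in zip(*traces))
```

For instance, `aggregate_peak([0, 0, 0])` (aligned I-pictures) gives 30, while `aggregate_peak([0, 4, 8])` (staggered) gives 14.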
Hirokazu TANAKA Shoichiro YAMASAKI
GSRI Pragmatic TCM, a pragmatic trellis-coded modulation (TCM) scheme that allows bandwidth expansion, has been proposed, and [1] shows that it can achieve higher performance than the conventional pragmatic TCM scheme. Meanwhile, real-time video multimedia communication is one of the possible applications of third-generation mobile communication systems. Such a system needs a multiplexer that combines various types of media, such as video, voice, and data, into a single bitstream. ITU-T has standardized the H.223 Annex A, B, C, and D multimedia multiplexing protocols for low-bit-rate mobile communications. This paper evaluates the performance of GSRI Pragmatic TCM applied to a mobile multimedia system that uses the H.223 Annex D multiplexing scheme and MPEG-4 video coding.
Ayako HARADA Shin-ichi HATTORI Tadashi KASEZAWA Hidenori SATO Tetsuya MATSUMURA Satoshi KUMAKI Kazuya ISHIHARA Hiroshi SEGAWA Atsuo HANAMI Yoshinori MATSUURA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Tokumichi MURAKAMI
An MPEG-2 422P@HL encoder chip set composed of a preprocessing LSI, an encoding LSI, and a motion estimation LSI is described. The chip set provides two types of scalability, in picture resolution and in picture quality, and executes hierarchical coding control over the entire encoder system. Owing to its scalable architecture, the chip set realizes a 422P@HL video encoder in a multi-chip configuration. A single encoding LSI achieves 422P@ML video, audio, and system encoding in real time. It employs an advanced hybrid architecture with a 162 MHz media processor and dedicated video processing hardware. It also has dual communication ports for parallel processing in a multi-chip configuration, through which reconstructed data and macroblock characteristic data are transferred between neighboring encoder modules. The preprocessing LSI is fabricated using 0.25-micron three-layer metal CMOS technology and integrates 560 K gates in an area of 12.0 mm × 12.0 mm. The encoding LSI is fabricated using 0.25-micron four-layer metal CMOS technology and integrates 11 million transistors in an area of 14.2 mm × 14.2 mm. The motion estimation LSI is fabricated using 0.35-micron three-layer metal CMOS technology and integrates 1.9 million transistors in an area of 8.5 mm × 8.5 mm. The chip set makes various system configurations possible and allows a compact, cost-effective video encoder with high picture quality.
Ayuko TAKAGI Shogo MURAMATSU Hitoshi KIYA
In the MPEG standards, motion estimation (ME) is used to eliminate the temporal redundancy between video frames. ME is the most time-consuming task in encoding video sequences and also consumes the most power. Using low-bit images can reduce the power consumed by ME, and conventional low-bit motion estimators use an architecture fixed to a certain bit width. There is a known trade-off between power and image quality. ME may be used in various situations, and the relative importance of power and image quality depends on those circumstances. We therefore developed an architecture for a low-bit motion estimator with adjustable power consumption, in which the bit width of the input image can be selected to adjust the power consumed by ME. To evaluate its effectiveness, we designed the motion estimator in VHDL and used the synthesis results to estimate its performance.
Mitsuo IKEDA Toshio KONDO Koyo NITTA Kazuhito SUGURI Takeshi YOSHITOME Toshihiro MINAMI Jiro NAGANUMA Takeshi OGURA
This paper presents an architecture for a single-chip MPEG-2 video encoder and demonstrates its flexibility and usefulness. The architecture, based on three-layer cooperation, provides flexible data transfer that improves the encoder's versatility, scalability, and video quality. The LSI was successfully fabricated in a 0.25-µm four-metal CMOS process. Its small size and low power consumption make it ideal for a wide range of applications, such as DVD recorders, PC-card encoders, and HDTV encoders.
Hiroshi OGAWA Takao NAKAMURA Atsuki TOMIOKA Youichi TAKASHIMA
A quantization-based watermarking system for motion pictures is proposed. In particular, we describe methods for improving the image quality of watermarked video, the tolerance of the watermark data, and the accuracy of watermark detection. We also perform a quantitative evaluation of the reliability of watermarked data, a topic that has seldom been discussed before.
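For readers unfamiliar with quantization-based embedding, the generic idea (often called quantization index modulation) is to quantize a host coefficient onto one of two interleaved lattices, chosen by the bit, and to decode by finding which lattice is closer. The sketch below is that textbook scheme with a hypothetical step size. It is not the authors' specific method. Decoding is correct whenever the added distortion is smaller than step/4 in magnitude.

```python
def qim_embed(coeff, bit, step):
    """Quantize onto multiples of step (bit 0) or multiples offset by step/2 (bit 1)."""
    offset = step / 2.0 if bit else 0.0
    return round((coeff - offset) / step) * step + offset


def qim_extract(coeff, step):
    """Return the bit whose lattice lies closer to the received coefficient."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1
```

The step size sets the trade-off that the abstract addresses. A larger step tolerates more distortion before bits flip, but it moves coefficients further and so lowers the image quality of the watermarked video.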
Sang-Jo YOO Sung-Hoon HONG Seong-Dae KIM
In this paper, we propose an analytic method for dimensioning the traffic descriptors of leaky-bucket-based usage parameter control (UPC) for VBR MPEG video traffic on ATM networks. We analytically derive the cell violation probabilities at the UPC using a proposed scene-based video traffic model, and then show that sets of traffic descriptors producing the required violation probability can be selected. For two example video traces, the numerical results show that the proposed traffic descriptor dimensioning method closely approximates the simulation-based traffic control results for the real video traces. For the case in which an effective bandwidth allocation method based on the ON/OFF model is used for call admission control in the network, we compare the effective bandwidth allocated to each set of traffic descriptors that produces zero UPC losses.
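To check an analytical dimensioning against a trace, one can count violations directly with the Generic Cell Rate Algorithm in its virtual-scheduling form, which is equivalent to the continuous-state leaky bucket. The sketch measures an empirical violation ratio for given descriptors (increment I and limit L). It is not the paper's analytical derivation, and how I and L map to a particular rate and burst tolerance is left to the reader's traffic contract.

```python
def gcra_violations(arrivals, increment, limit):
    """Count non-conforming cells with GCRA virtual scheduling.

    arrivals: non-decreasing cell arrival times.
    increment (I): nominal inter-cell spacing; limit (L): tolerance.
    """
    violations = 0
    tat = None                      # theoretical arrival time
    for t in arrivals:
        if tat is None:
            tat = t                 # the first cell initializes TAT
        if t < tat - limit:
            violations += 1         # too early: non-conforming, TAT unchanged
        else:
            tat = max(t, tat) + increment
    return violations


def violation_probability(arrivals, increment, limit):
    return gcra_violations(arrivals, increment, limit) / len(arrivals)
```

Sweeping `limit` for a fixed `increment` on a real trace yields the descriptor pairs that meet a target violation probability, which is the quantity the analytical model predicts.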