4K/8K satellite broadcasting featuring ultra-high definition video and sound was launched in Japan in 2018. This is the first 8K ultra high definition television (UHDTV) broadcasting in the world, with 16 times as many pixels as HDTV and 3D sound with 22.2ch audio. The large amount of information that has to be transmitted means that a new satellite broadcasting transmission system had to be developed. In this paper, we describe this transmission system, focusing on the technology that enables 4K/8K UHDTV satellite broadcasting.
Jianbin ZHOU Dajiang ZHOU Shihao WANG Takeshi YOSHIMURA Satoshi GOTO
8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K for 8-bit design, and 251.1K for 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.
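The throughput requirement implied by the abstract above can be checked with simple arithmetic. The sketch below divides the sample rate of 4320p@120fps by the reported 260 MHz clock. The 4:2:0 chroma format (1.5 samples per pixel) is an assumption made here for illustration; the abstract does not state the chroma format.

```python
# Back-of-the-envelope throughput check for 4320p@120fps at 260 MHz.
# Assumption (not stated in the abstract): 4:2:0 chroma subsampling,
# i.e. 1.5 samples per pixel on average.

WIDTH, HEIGHT, FPS = 7680, 4320, 120
CLOCK_HZ = 260e6
SAMPLES_PER_PIXEL_420 = 1.5  # assumed chroma format


def samples_per_cycle(width, height, fps, clock_hz, samples_per_pixel=1.0):
    """Average number of samples that must be produced per clock cycle."""
    return width * height * fps * samples_per_pixel / clock_hz


luma_only = samples_per_cycle(WIDTH, HEIGHT, FPS, CLOCK_HZ)
with_chroma = samples_per_cycle(WIDTH, HEIGHT, FPS, CLOCK_HZ, SAMPLES_PER_PIXEL_420)

print(f"luma only : {luma_only:.2f} samples/cycle")
print(f"4:2:0     : {with_chroma:.2f} samples/cycle")
```

Under these assumptions the predictor has to sustain roughly 15 luma samples per cycle on average, which illustrates why idle cycles between dependent TUs are costly at this resolution.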
This paper presents an Adapting Block-Propagative Background Subtraction (ABPBGS) method designed for Ultra High Definition Television (UHDTV) foreground detection. The main idea is to detect block after block along the objects in order to skip all areas of the image in which there is no moving object. This is particularly interesting for UHDTV, where the objects of interest may represent less than 0.1% of the total area. From a seed block determined in a previous iteration, the detection spreads along an object as long as it detects a part of that object. A block history map guarantees that each block is processed only once. Moreover, only small blocks are loaded and processed, thus saving computational time and memory usage. The processing of each block is independent enough to be easily parallelized. Compared to 9 state-of-the-art works, the ABPBGS achieved the best results with an average global quality score of 0.57 (1 being the maximum) on a dataset of 4K and 8K UHDTV sequences developed for this work. None of the state-of-the-art methods could process 4K videos in reasonable time, while the ABPBGS has shown an average speed of 5.18 fps. In comparison, 5 of the 9 state-of-the-art methods performed slower on a 270p down-scaled version of the same videos. The experiments have also shown that, when processing an 8K UHDTV video, the ABPBGS can divide the memory required by about 24, for a total of 450 MB.
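The block-propagation idea described above can be sketched as a breadth-first traversal over a block grid. This is a minimal illustration, not the authors' algorithm: the function name `propagate_blocks`, the 16-pixel block size, the mean-absolute-difference test against a static background image, and the threshold value are all assumptions introduced here.

```python
from collections import deque

import numpy as np


def propagate_blocks(frame, background, seeds, block=16, thresh=25.0):
    """Spread detection from seed blocks while blocks still contain foreground.

    frame, background: 2-D grayscale arrays of equal shape.
    seeds: iterable of (block_row, block_col) starting points.
    Returns (set of foreground blocks, number of blocks actually visited).
    """
    rows, cols = frame.shape[0] // block, frame.shape[1] // block
    visited = np.zeros((rows, cols), dtype=bool)  # block history map
    foreground = set()
    queue = deque()

    for r, c in seeds:
        if 0 <= r < rows and 0 <= c < cols and not visited[r, c]:
            visited[r, c] = True
            queue.append((r, c))

    while queue:
        r, c = queue.popleft()
        ys, xs = r * block, c * block
        # Only this small block is read and compared (hypothetical test).
        cur = frame[ys:ys + block, xs:xs + block].astype(np.float32)
        ref = background[ys:ys + block, xs:xs + block].astype(np.float32)
        if np.abs(cur - ref).mean() <= thresh:
            continue  # no object here: do not spread from this block
        foreground.add((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                visited[nr, nc] = True  # each block is queued at most once
                queue.append((nr, nc))

    return foreground, int(visited.sum())
```

Because the traversal stops at blocks without foreground, the number of visited blocks scales with object size rather than with frame resolution, which is the property the abstract relies on.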
Shihao WANG Dajiang ZHOU Jianbin ZHOU Takeshi YOSHIMURA Satoshi GOTO
This paper presents the VLSI architecture design of a unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for an 8K UHDTV HEVC decoder. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and the memory bandwidth requirement, especially for 8K UHDTV applications. We propose four techniques to address these challenges. Firstly, this work unifies the MV and BS parameter decoders so that they share line buffer memory. Secondly, to support high throughput, we propose a top-level CU-adaptive pipeline scheme that trades off implementation complexity against performance. Thirdly, an optimized PDec process engine is adopted for a 43.2k area reduction. Finally, a PU-based coding scheme is proposed for a 30% DRAM bandwidth reduction. In a 90 nm process, our design costs 93.3k logic gates with a 23.0 kB line buffer. The proposed architecture can support real-time decoding for 7680×4320@60fps applications at 249 MHz in the worst case.
Moving objects, or more generally foreground objects, are the simplest objects in the field of computer vision after the pixel. Indeed, a moving object can be defined by only 4 integers: either two pairs of coordinates, or a pair of coordinates and a size. In fixed-camera scenes, moving objects (or blobs) can be extracted quite easily, but the methods that produce them cannot tell whether a blob corresponds to remaining background noise, to a single target, or to an occlusion between several targets that are close enough together to merge into a single blob. In this paper we propose a novel method to refine moving object detection results in order to obtain as many blobs as there are targets in the scene, using a tracking system for additional information. Knowing whether a blob is in proximity to a tracker allows us to remove noise blobs, keep the rest, and handle occlusions when more than one tracker lies on a blob. The results show that the refinement is an efficient tool for separating good blobs from noise blobs, and is accurate enough to perform tracking based on moving objects. The tracking process is a resolution-free system able to reach speeds such as 20,000 fps even for UHDTV sequences. The refinement process itself runs in real time, at more than 2000 fps in difficult situations. Different tests are presented to show the efficiency of the noise removal and to confirm that the refinement and tracking system is independent of the resolution of the videos.
Kosuke MIZUNO Kenta TAKAGI Yosuke TERACHI Shintaro IZUMI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a Histogram of Oriented Gradients (HOG) feature extraction accelerator that features a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, dual core architecture for parallel feature extraction and multiple object detection, and detection-window-size scalable architecture with reconfigurable MAC array for processing objects of several shapes. To achieve low-power consumption for mobile applications, early classification reduces the amount of computations in SVM classification efficiently with no accuracy degradation. The dual core architecture enables parallel feature extraction in one frame for high-speed or low-power computing and detection of multiple objects simultaneously with low power consumption by HOG feature sharing. Objects of several shapes, a vertically long object, a horizontally long object, and a square object, can be detected because of cooperation between the two cores. The proposed methods provide processing capability for HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The test chip, which has been fabricated using 65 nm CMOS technology, occupies 4.2 × 2.1 mm² containing 502 Kgates and 1.22 Mbit on-chip SRAMs. The simulated data show 99.5 mW power consumption at 42.9 MHz and 1.1 V.
Koyo NITTA Hiroe IWASAKI Takayuki ONISHI Takashi SANO Atsushi SAGATA Yasuyuki NAKAJIMA Minoru INAMORI Ryuichi TANIDA Atsushi SHIMIZU Ken NAKAMURA Mitsuo IKEDA Jiro NAGANUMA
An H.264/AVC encoder LSI (named “SARA”) that supports the High422 profile, as well as the 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains three motion estimation and compensation (ME/MC) engines with wide search ranges of -217.75 to +199.75 horizontally and -109.75 to +145.75 vertically, which can utilize almost all H.264/AVC ME/MC coding tools, such as multiple reference frames, variable block size, quarter-pel prediction, macroblock adaptive field/frame prediction (MBAFF), spatial/temporal direct mode, and weighted prediction. Our evaluations show that it can encode fast-moving scenes with 1.2 dB to 1.7 dB higher picture quality than the JM reference encoder. It was successfully fabricated in a 90-nm technology, and integrates 140 million transistors.
Kosuke MIZUNO Hiroki NOGUCHI Guangji HE Yosuke TERACHI Tetsuya KAMINO Tsuyoshi FUJINAGA Shintaro IZUMI Yasuo ARIKI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI-oriented SIFT algorithm, a three-stage pipelined architecture, and novel systolic array architectures for Gaussian filtering and key-point extraction. An ROI-based scheme has been employed for the VLSI-oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycles and memory accesses. The cycle count of the Gaussian filtering module is reduced by 82% compared with the SIMD architecture. The numbers of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66%, respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 × 4.2 mm² containing 1.1 M gates and 1.38 Mbit on-chip memory. The measured data demonstrate 38.2 mW power consumption at 78 MHz and 1.2 V.
Yiqing HUANG Qin LIU Satoshi GOTO Takeshi IKENAGA
This paper presents a reconfigurable SAD Tree (RSADT) architecture based on an adaptive sub-sampling algorithm for HDTV applications. Firstly, to obtain the features of an HDTV picture, pixel difference analysis is applied to each macroblock (MB). Three hardware-friendly sub-sampling patterns are selected adaptively to reduce complexity for homogeneous MBs and preserve video quality for textured MBs. Secondly, since two pipeline stages are inserted, the overall clock speed of the RSADT structure is enhanced. Thirdly, to solve the data reuse and hardware utilization problems of the adaptive algorithm, the RSADT structure adopts pixel data organization at both the memory and architecture levels, which leads to full data reuse and hardware utilization. Additionally, a cross reuse structure is proposed to efficiently generate 16 pixel scaled configurable SAD (sum of absolute difference). Experimental results show that our RSADT architecture saves 61.71% of processing cycles on average for the integer motion estimation engine and achieves two or four times the processing capability for homogeneous MBs. The maximum clock frequency of our design is 208 MHz under TSMC 0.18 µm technology in worst-case operating conditions (1.62 V, 125 °C). Furthermore, the proposed algorithm and reconfigurable structure are well suited to power-aware real-time encoding systems.
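The adaptive sub-sampling idea above can be illustrated with a short sketch: a macroblock's pixel activity selects how densely the SAD is sampled. The three patterns (full, 1/2, 1/4), the activity measure, and the thresholds below are hypothetical choices made for illustration; the abstract does not specify the actual patterns or the pixel difference analysis.

```python
import numpy as np

# Hypothetical sub-sampling patterns as (row_step, col_step).
# The abstract mentions three patterns but does not define them.
PATTERNS = {"full": (1, 1), "half": (1, 2), "quarter": (2, 2)}


def choose_pattern(mb, low=4.0, high=12.0):
    """Pick a pattern from the mean absolute difference between
    horizontally adjacent pixels (an assumed activity measure)."""
    activity = np.abs(np.diff(mb.astype(np.int32), axis=1)).mean()
    if activity < low:
        return "quarter"  # homogeneous MB: fewest samples
    if activity < high:
        return "half"
    return "full"         # textured MB: keep every pixel


def subsampled_sad(cur, ref, pattern):
    """Sum of absolute differences over the pixels kept by the pattern."""
    row_step, col_step = PATTERNS[pattern]
    a = cur[::row_step, ::col_step].astype(np.int32)
    b = ref[::row_step, ::col_step].astype(np.int32)
    return int(np.abs(a - b).sum())
```

With these assumed patterns, a homogeneous 16 × 16 MB needs only 64 or 128 absolute differences instead of 256, which is consistent with the four-fold or two-fold gain in processing capability reported for homogeneous MBs.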
Though millimeter-wave applications have attracted much attention in recent years, they have not yet been put to practical use. The major reason may be the large transmission loss peculiar to the short wavelength. To overcome this difficulty, it may be promising to introduce the technology of millimeter-wave NRD-guide circuits. In this technology, not only the NRD-guide but also Gunn diodes and Schottky diodes play an important role in high bit-rate millimeter-wave applications. A variety of practical millimeter-wave wireless systems have been proposed and fabricated. Their performance and applications are discussed in detail as well.
Seong-Hee PARK Seong-Hee LEE Il-Soon JANG Sang-Sung CHOI Je-Hoon LEE Younggap YOU
This paper presents a new method for transferring isochronous data through an IEEE 1394 over UWB (ultra wideband) network. The goal of this research is to implement a complete heterogeneous system without commercial IEEE 1394 link chips that support the bridge-aware function. To work around the absence of such dedicated chips, the method employs a new bridge that adopts a pseudo connection management protocol (CMP). This approach makes a wired IEEE 1394 device behave as an IEEE 1394 over UWB device, allowing IEEE 1394 equipment to transfer isochronous data over a UWB wireless communication network. The approach was demonstrated successfully with an IEEE 1394 over UWB bridge module: the proposed CMP and bridge module can exchange isochronous data through an IEEE 1394 over UWB network.
Jin-Ho KIM Oh-Kyong KWON Byong-Deok CHOI
We present our recent results of the 10-bit data driver LSI for 42-inch diagonal TFT-LCD TV with full HD format. To develop data driver LSIs for a true 10-bit TFT-LCD TV with full HD (1920 × 1080) format, small chip area, low power consumption, and output uniformity between channels are key problems that must be solved. By applying a two-stage DAC which combines 8-bit resistor-string DAC and 2-bit binary weighted capacitor DAC, the area increase is limited to only 30% compared to the area of 8-bit resistor-string DAC. The output deviation between channels is successfully limited within 5 mV and the driver LSI with 414 outputs consumes the maximum total current of 16 mA when driving 42-inch HDTV panel. We confirmed that the picture with 10-bit shades of gray is much more natural than that with 8-bit shades of gray.
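The two-stage conversion described above can be modeled behaviorally: the upper 8 bits select a pair of adjacent resistor-string taps, and the lower 2 bits interpolate between them. The sketch below is an idealized model under stated assumptions, not the circuit from the paper: it assumes a linear resistor string between `v_low` and `v_high` (real LCD drivers typically use a gamma-corrected string) and ideal charge sharing over four unit capacitances. The function name and parameters are hypothetical.

```python
def two_stage_dac(code, v_low=0.0, v_high=1.0):
    """Ideal output voltage of an 8-bit + 2-bit two-stage DAC for a 10-bit code."""
    if not 0 <= code < 1024:
        raise ValueError("code must be a 10-bit value (0..1023)")

    coarse = code >> 2    # upper 8 bits: resistor-string tap index
    fine = code & 0b11    # lower 2 bits: capacitor DAC weight

    step = (v_high - v_low) / 256      # assumed linear string, 256 segments
    tap_lo = v_low + coarse * step     # lower tap selected by the coarse code
    tap_hi = tap_lo + step             # adjacent upper tap

    # Idealized charge sharing: `fine` of four unit capacitances are
    # referenced to the upper tap, the remainder to the lower tap.
    return (fine * tap_hi + (4 - fine) * tap_lo) / 4


# Example: the four codes inside one coarse segment.
for code in range(4, 8):
    print(code, two_stage_dac(code))
```

In this model only 256 taps need to be routed per channel instead of 1024, which gives an intuition for why the area increase over an 8-bit resistor-string DAC stays modest.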
Osamu NAKAMURA Kazunori SUGIURA Seiichi YAMAMOTO Noriyuki SHIGECHIKA Akira KATO Katsuyuki HASEBE Jun MURAI
An experimental remote jazz jam session using uncompressed HDTV over the Internet was conducted on September 21st as a Grand Final event of the Aichi Exposition 2005. Professional jazz musicians located at the venue of the Aichi Exposition and at SARA in Amsterdam performed the jam session with new mechanisms called "Internet Metronome" and "delay-control unit" over an international "lightpath." This was the first music collaboration using this new methodology, and one of the more challenging demonstrations of transporting uncompressed HDTV streams with timing control under current software and hardware architectures. The "Internet Metronome" and "delay-control unit" made it possible to establish a tempo by using and controlling delay, and the "lightpath" minimized network jitter. Using these new mechanisms and technologies, the musicians could play in a new music collaboration environment over the Internet despite the long communication delay, and enjoyed the remote jazz jam session at both ends.
Yuichiro MURACHI Koji HAMANO Tetsuro MATSUNO Junichi MIYAKOSHI Masayuki MIYAMA Masahiko YOSHIMOTO
This paper describes a 95 mW MPEG2 MP@HL motion estimation processor core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080 @ 30 fps resolution video. The search range is 128 × 64 pixels. The ME core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.
Junichi YAMAZAKI Masayuki MIYAZAKI Tsuneo IHARA Itaru MIZUNO Kazuo YOSHIKAWA Shigehiro KANAYAMA Nobuo MATSUI Takayoshi HIRUMA Masaharu NISHIMURA
An ultra-high-sensitivity HDTV color camcorder (camera with VTR) has been developed featuring image intensifiers with GaAsP photocathodes, which provide very high quantum efficiency. To achieve superior performance and a compact camera body, we combined three 1-inch image intensifiers with a 2/3-inch taking lens and three 2/3-inch CCDs by means of a new optical system capable of enlarging and reducing images. The camcorder provides excellent color reproducibility even under low light level conditions (0.2 lx) at an iris setting of f/2, with a signal-to-noise ratio of 55 dB at pedestal level. Its sensitivity is about 400 times greater than that of current HDTV CCD camcorders, making it particularly well suited for capturing images of faint objects in space, aurora, etc., filming the nocturnal activities of animals in their natural settings, and reporting breaking news at night.
Masayuki MIYAMA Osamu TOOYAMA Naoki TAKAMATSU Tsuyoshi KODAKE Kazuo NAKAMURA Ai KATO Junichi MIYAKOSHI Kousuke IMAMURA Hideo HASHIMOTO Satoshi KOMATSU Mikio YAGI Masao MORIMOTO Kazuo TAKI Masahiko YOSHIMOTO
This paper describes an ultra low power, motion estimation (ME) processor for MPEG2 HDTV resolution video. It adopts a Gradient Descent Search (GDS) algorithm that drastically reduces required computational power to 6 GOPS. A SIMD datapath architecture optimized for the GDS algorithm decreases the clock frequency and operating voltage. A low power 3-port SRAM with a write-disturb-free cell array arrangement is newly designed for image data caches of the processor. The proposed ME processor contains 7-M transistors, integrated in 4.50 mm × 3.35 mm area using 0.13 µm CMOS technology. Estimated power consumption is less than 100 mW at 81 MHz@1.0 V. The processor is applicable to a portable HDTV system.
Ayako HARADA Shin-ichi HATTORI Tadashi KASEZAWA Hidenori SATO Tetsuya MATSUMURA Satoshi KUMAKI Kazuya ISHIHARA Hiroshi SEGAWA Atsuo HANAMI Yoshinori MATSUURA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Tokumichi MURAKAMI
An MPEG-2 422P@HL encoder chip set composed of a preprocessing LSI, an encoding LSI, and a motion estimation LSI is described. This chip set realizes a two-type scalability of picture resolution and quality, and executes a hierarchical coding control in the overall encoder system. Due to its scalable architecture, the chip set realizes a 422P@HL video encoder with multi-chip configuration. This single encoding LSI achieves 422P@ML video, audio, and system encoding in real time. It employs an advanced hybrid architecture with a 162 MHz media processor and dedicated video processing hardware. It also has dual communication ports for parallel processing with multi-chip configuration. Transferring of reconstructed data and macroblock characteristic data between neighboring encoder modules is executed via these ports. The preprocessing LSI is fabricated using 0.25 micron three-layer metal CMOS technology and integrates 560 K gates in an area of 12.0 mm × 12.0 mm. The encoding LSI is fabricated using 0.25 micron four-layer metal CMOS technology and integrates 11 million transistors in an area of 14.2 mm × 14.2 mm. The motion estimation LSI is fabricated using 0.35 micron three-layer metal CMOS technology. It integrates 1.9 million transistors in an area of 8.5 mm × 8.5 mm. This chip set makes various system configurations possible and allows for a compact and cost-effective video encoder with high picture quality.
Kohji MITANI Hiroshi SHIMAMOTO Yoshihiro FUJITA
We have developed an experimental 4K × 2K-pixel progressive scan color camera system. This new camera system has a data rate of 297 MHz pixel/sec at 60 frame/sec, and we are confident that a horizontal and vertical limiting resolution of 1500 TVL (TV lines) can be achieved on a color monitor. Instead of the previous approach of improving resolution simply by increasing the pixel count in an imager, a novel four-sensor pickup method with 2/3-inch 2-million-pixel CMD (Charge Modulation Device) imagers is used in this system. These sensors have 1920 (H) × 1035 (V) pixels within a 16:9 wide aspect image area and are successfully driven at 148 M pixel/sec in the progressive scan mode. In the four-sensor pickup method, two sensors are used for green and the other two for red and blue. A spatial offset imaging method in the diagonal direction was applied to the two green sensors to improve the horizontal and vertical resolution effectively. The horizontal and vertical resolution of the red and blue signals is half that of the green signal, because only one 2 M-pixel imager is used for each signal. The resolution of this system, however, is not degraded much, because the luminance signal is mainly composed of green signals.
Li JIANG Dongju LI Shintaro HABA Chawalit HONSAWEK Hiroaki KUNIEDA
In this paper, a dedicated hardware design for a motion estimation LSI for MPEG2 is presented. By combining our bits truncation adaptive pyramid (BTAP) algorithm with the Window-MSPA architecture, the hardware cost is greatly reduced without PSNR degradation relative to the mean pyramid algorithm. The core of the test chip, working at 83 MHz, performs a search range of 67 for an image size of 1920 × 1152 and achieves a video rate of 60 field/s. It can be used for HDTV purposes. The chip size is 4.8 mm × 4.8 mm with 0.5 µm 2-level metal CMOS technology. The results in this paper show promise for realizing a one-chip HDTV MPEG2 encoder.
Takao ONOYE Gen FUJITA Masamichi TAKATSU Isao SHIRAKAWA Nariyoshi YAMAI
A single-chip motion estimator dedicated to MPEG2 MP@HL moving pictures is described. By adopting a two-level hierarchical searching algorithm for detecting motion vectors, the computational labor can be reduced to 1/70 of that of the conventional algorithm. A novel mechanism is introduced into the full-search procedure, which attempts the maximum possible reuse of reference pixels in order to reduce the bandwidth of the frame memory interface. The proposed motion estimator is integrated in a 0.6 µm triple-metal CMOS chip, which contains 1,450 K transistors on a 12.7 × 13.7 mm² die. The input clock rate can reach up to 133 MHz, which enables real-time motion estimation for MPEG2 MP@HL.
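A two-level hierarchical search of the kind mentioned above can be sketched as a coarse full search on decimated images followed by a small refinement at full resolution. This is a generic illustration, not the chip's algorithm: the 2:1 decimation without pre-filtering, the block size, both search radii, and all function names are assumptions made here.

```python
import numpy as np


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())


def full_search(block, ref, top, left, center, radius):
    """Exhaustive search of (dy, dx) within `radius` of `center`."""
    h, w = block.shape
    best_cost, best_mv = None, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            cost = sad(block, ref[y:y + h, x:x + w])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv


def hierarchical_search(cur, ref, top, left, size=16,
                        coarse_radius=4, refine_radius=1):
    """Two-level search: coarse on 2:1 decimated frames, then refine."""
    # Level 1: full search on decimated images (no pre-filter, for brevity).
    cur_lo, ref_lo = cur[::2, ::2], ref[::2, ::2]
    t, l, s = top // 2, left // 2, size // 2
    dy, dx = full_search(cur_lo[t:t + s, l:l + s], ref_lo, t, l,
                         (0, 0), coarse_radius)
    # Level 2: small full-resolution search around the scaled-up vector.
    block = cur[top:top + size, left:left + size]
    return full_search(block, ref, top, left, (2 * dy, 2 * dx), refine_radius)
```

With these assumed radii, the coarse level covers a ±8-pixel range at a quarter of the per-candidate cost, and the refinement tests only 9 full-resolution candidates, which shows where the savings over a single-level full search come from.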