Takao ONOYE Toshihiro MASAKI Yasuo MORIMOTO Yoh SATO Isao SHIRAKAWA Kenji MATSUMURA
A single-chip MPEG2 MP@HL video decoder has been developed, which consists mainly of specific functional units and macroblock-level pipeline buffers. A new organization is also devised for a set of off-chip frame memories and the interfaces associated with it. Owing to sophisticated I/O interfaces among functional units, the macroblock-level pipeline, in conjunction with different decoding facilities, attains a throughput high enough to decode HDTV images in real time. Moreover, a set of these functional units, pipeline buffers, and frame memory interfaces, together with a sequence controller, is integrated for the first time in a single chip, which has a total area of 8.8 × 9.2 mm² in a 0.6 µm triple-metal CMOS technology and dissipates 1.2 W from a single 3.3 V supply.
Kenshiro KATO Daichi WATARI Ittetsu TANIGUCHI Takao ONOYE
Solar energy is an important energy resource for a sustainable society and is being introduced on a massive scale. Households generally sell their excess solar energy through reverse power flow, but massive reverse power flow usually sacrifices grid stability. In order to utilize renewable energy effectively and reduce solar energy waste, electric vehicles (EVs) play an important role in filling the spatiotemporal gap of solar energy. This paper proposes a novel EV aggregation framework for spatiotemporal shifting of solar energy without any reverse power flow. The proposed framework induces charging and discharging via an EV aggregator by intentionally changing the price, and the solar energy waste is expected to be reduced by the energy trade. Simulation results show that the proposed framework reduced the solar energy waste by 68%.
Twe Ta OO Takao ONOYE Kilho SHIN
The MPEG-1 layer-III compressed audio format, widely known as MP3, is the most popular format for audio distribution. However, it is not equipped with security features to protect the content from unauthorized access. Although encryption ensures content security, the naive method of encrypting the entire MP3 file would destroy compliance with the MPEG standard. In this paper, we propose a low-complexity partial encryption method that is embedded in the MP3 encoding process. Our method reduces time consumption by encrypting only the perceptually important parts of an MP3 file rather than the whole file, and the resulting encrypted file remains compatible with the MPEG standard so that it can be rendered by any existing MP3 player. For full-quality rendering, decryption using the appropriate cryptographic key is necessary. Moreover, the effect of encryption on audio quality can be flexibly controlled by adjusting the percentage of encryption. On the basis of this feature, we can realize the try-before-purchase model, which is one of the important business models of Digital Rights Management (DRM): users can render encrypted MP3 files for trial and enjoy the contents in original quality by purchasing decryption keys. Our experiments show that encrypting 2-10% of the MP3 data suffices to generate trial music, and that the increase in file size after encryption is slight.
Shoichi IIZUKA Yuma HIGUCHI Masanori HASHIMOTO Takao ONOYE
The ring-oscillator (RO)-based sensor is one of the easily implementable variation sensors, but decomposing the observed variability into multiple unique device-parameter variations requires a large number of ROs with different structures and sensitivities to device parameters. This paper proposes an area-efficient device-parameter estimation method with a sensitivity-configurable RO. This sensitivity-configurable RO has a number of configurations, and the proposed method exploits this property to reduce sensor area and/or improve estimation accuracy. The proposed method selects multiple sets of sensitivity configurations, obtains multiple estimates, and computes their average, exploiting an averaging effect to improve accuracy. Experimental results with a 32-nm predictive technology model show that the proposed averaging with multiple estimates can reduce the estimation error by 49% or reduce the sensor area by 75% while keeping the accuracy. Compared to previous work with iterative estimation, a 23% accuracy improvement is achieved.
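The averaging step described in this abstract can be illustrated with a minimal sketch. It assumes a linear sensor model in which each set of sensitivity configurations gives a 2x2 sensitivity matrix `S` and a measured deviation vector `f` with `f = S p` for two device parameters `p`. The model, the two-parameter restriction, and the function names are assumptions made for illustration, not the authors' formulation.

```python
from typing import List, Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]

def solve_2x2(S: Matrix2, f: Sequence[float]) -> Tuple[float, float]:
    """Solve S p = f for two parameters using Cramer's rule."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("sensitivity matrix is singular")
    return ((f[0] * d - b * f[1]) / det, (a * f[1] - c * f[0]) / det)

def averaged_estimate(
    config_sets: List[Tuple[Matrix2, Sequence[float]]]
) -> Tuple[float, float]:
    """Estimate the parameters once per configuration set, then average."""
    estimates = [solve_2x2(S, f) for S, f in config_sets]
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n,
            sum(e[1] for e in estimates) / n)
```

With noisy measurements, each configuration set yields a slightly different estimate, and averaging them is what the abstract calls the averaging effect.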
Hiroaki KONOURA Takashi IMAGAWA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
Fault-tolerant methods using dynamically reconfigurable devices have been studied to overcome wear-out failures. However, the device lifetime enhancement achieved by these methods has not been compared quantitatively; they have mainly been evaluated individually from various viewpoints such as additional hardware overhead, performance, and downtime for fault recovery. This paper presents quantitative lifetime evaluations performed by simulating the fault-avoidance procedures of five representative methods under the same conditions in wear-out scenarios, applications, and device architecture. The simulation results indicated that improvements of up to 70% mean-time-to-failure (MTTF) in comparison with ideal fault avoidance could be achieved by using methods of fault avoidance with ‘row direction shift’ and ‘dynamic partial reconfiguration’. ‘Column shift’, on the other hand, attained a high degree of stability with moderate improvements in MTTF. The experimental results also revealed that spare basic elements (BEs) should be prevented from aging so that improvements in MTTF are not adversely affected. Moreover, we found that selecting initial mappings guided by wire utilization could increase the lifetimes of partial-reconfiguration-based fault avoidance.
Ken-ichi SHINKAI Masanori HASHIMOTO Takao ONOYE
Device-parameter estimation sensors inside a chip are gaining importance as post-fabrication tuning becomes practical. In the estimation of variational parameters using on-chip sensors, it is often assumed that the outputs of variation sensors are not affected by random variations. However, random variations can deteriorate the accuracy of the estimation result. In this paper, we propose a device-parameter estimation method with on-chip variation sensors that explicitly considers random variability. The proposed method derives the global variation parameters and the standard deviation of the random variability using maximum likelihood estimation. We experimentally verified that the proposed method improves the accuracy of device-parameter estimation by 11.1 to 73.4% compared to the conventional method that neglects random variations.
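As a minimal sketch of maximum likelihood estimation in this setting, assume each sensor reading equals a shared global variation plus independent zero-mean Gaussian random variation. Under that assumption, the maximum likelihood estimates are the sample mean and the (biased) sample standard deviation. The one-parameter Gaussian model and the function name are illustrative assumptions, not the formulation used in the paper.

```python
import math
from typing import Sequence, Tuple

def mle_global_and_random(readings: Sequence[float]) -> Tuple[float, float]:
    """Return (global_variation, random_sigma) under a Gaussian model.

    Assumed model: reading_i = global_variation + r_i, with r_i ~ N(0, sigma^2).
    The ML estimate of sigma divides by n, not by n - 1.
    """
    n = len(readings)
    if n == 0:
        raise ValueError("at least one reading is required")
    mean = sum(readings) / n
    variance = sum((x - mean) ** 2 for x in readings) / n
    return mean, math.sqrt(variance)
```

A method that neglects random variation would, in this simplified picture, take a single reading as the global variation; the estimate above instead pools all readings and reports how much spread remains.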
Hiroaki KONOURA Toshihiro KAMEDA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
Negative Bias Temperature Instability (NBTI) is one of the serious concerns for long-term circuit performance degradation. NBTI degrades PMOS transistors under negative bias, whereas they recover once the negative bias is removed. In this paper, we propose a mitigation method for NBTI-induced performance degradation that exploits this recovery property by shifting random input sequences through scan paths. With this method, we prevent the consecutive stress that causes large degradation. Experimental results reveal that random scan-in vectors successfully mitigate NBTI and that the path delay degradation is reduced by 71% in a test case when standby mode occupies 10% of total time. We also confirmed that an 8-bit LFSR is capable of random number generation for this purpose with low area and power overhead.
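The 8-bit LFSR mentioned above can be modeled in a few lines. The sketch below is a Fibonacci LFSR with tap positions 8, 6, 5, 4, a commonly tabulated maximal-length choice; the abstract does not state which taps or LFSR structure the authors used, so the taps and all function names here are assumptions.

```python
from typing import List

def lfsr8_step(state: int) -> int:
    """Advance an 8-bit Fibonacci LFSR by one shift (assumed taps 8, 6, 5, 4)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)

def scan_in_bits(seed: int, length: int) -> List[int]:
    """Generate `length` pseudo-random bits to shift into a scan path."""
    if not 1 <= seed <= 0xFF:
        raise ValueError("seed must be a nonzero 8-bit value")
    bits = []
    state = seed
    for _ in range(length):
        bits.append(state & 1)  # the low bit is the serial output
        state = lfsr8_step(state)
    return bits

def lfsr8_period(seed: int = 1) -> int:
    """Count the shifts needed for the register to return to `seed`."""
    state = lfsr8_step(seed)
    count = 1
    while state != seed:
        state = lfsr8_step(state)
        count += 1
    return count
```

Calling `scan_in_bits(0x5A, 64)` yields 64 bits for a hypothetical 64-cell scan chain. The all-zero state is excluded because an XOR-feedback LFSR never leaves it.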
Koichi MITSUNARI Jaehoon YU Takao ONOYE Masanori HASHIMOTO
Visual object detection on embedded systems involves a multi-objective optimization problem in the presence of trade-offs between power consumption, processing performance, and detection accuracy. For a new Pareto solution with high processing performance and low power consumption, this paper proposes a hardware architecture for decision tree ensemble using multiple channels of features. For efficient detection, the proposed architecture utilizes the dimensionality of feature channels in addition to parallelism in image space and adopts task scheduling to attain random memory access without conflict. Evaluation results show that an FPGA implementation of the proposed architecture with an aggregated channel features pedestrian detector can process 229 million samples per second at 100 MHz operation frequency while it requires a relatively small amount of resources. Consequently, the proposed architecture achieves 350 fps processing performance for 1080p Full HD images and outperforms conventional object detection hardware architectures developed for embedded systems.
Shuta KIMURA Masanori HASHIMOTO Takao ONOYE
Post-silicon tuning is attracting a lot of attention for coping with increasing process variation. However, its tuning cost via testing is still a crucial problem. In this paper, we propose tuning-friendly body bias clustering with multiple bias voltages. The proposed method provides a small set of compensation levels so that the speed and leakage current vary monotonically according to the level. Thanks to this monotonic leveling and the limited number of levels, the test cost of post-silicon tuning is significantly reduced. During the body bias clustering, the proposed method explicitly estimates and minimizes the average leakage after the post-silicon tuning. Experimental results demonstrate that the proposed method reduces the average leakage by 25.3 to 51.9% compared to the non-clustering case. In a test case of four clusters, the number of necessary tests is reduced by 83% compared to the conventional exhaustive test approach. We reveal that two bias voltages are sufficient when only a small number of compensation levels are allowed for test-cost reduction. We also discuss implications for how to synthesize a circuit to which post-silicon tuning will be applied.
Takuya OKAMOTO Takafumi YUASA Tomonori IZUMI Takao ONOYE Yukihiro NAKAMURA
A configurable device "PCA-Chip2" implements the concept of Plastic Cell Architecture, which is an extension of programmable logic devices. This paper presents basic design tools for the PCA-Chip2 as the first step to develop the total design environment. Given a C description of a target function, configuration data for PCA-Chip2 is automatically generated by the tools. Trial designs by the tools are also presented to demonstrate the practicability of the proposed approach.
Masakazu IWAI Takuya FUTAGAMI Noboru HAYASAKA Takao ONOYE
In this paper, we improve upon the automatic building extraction method, which uses a variational inference Gaussian mixture model for color clustering, by accelerating its computation. The improved method decreases the computational time by applying color clustering to an image with reduced resolution. According to our experiment, in which we used 106 scenery images, the improved method could extract buildings at a rate 86.54% faster than that of the conventional methods. Furthermore, the improved method significantly increased the extraction accuracy by 1.8% or more by preventing over-clustering through the use of the reduced image, which also had a reduced number of colors.
Ken-ichi SHINKAI Masanori HASHIMOTO Takao ONOYE
This paper investigates whether the self-heating effect in short intra-block wires will become apparent with technology scaling. These wires seem to have good thermal radiation characteristics, but we validate that the self-heating effect in local signal wires will be greater than that in optimal repeater-inserted global wires. Our numerical experiment shows that the maximum temperature increase from the silicon junction temperature will reach 40.4 °C in a steady state at a 14-nm process. Our attribution analysis also demonstrates that miniaturizing the wire cross-sectional area exacerbates self-heating, as do the use of low-κ material and increased power dissipation in advanced technologies below 28 nm. It is revealed that the impact of self-heating on performance in local wires is limited, while underestimating the temperature may cause an unexpected reliability failure.
Takao ONOYE Akihisa YAMADA Itthichai ARUNGSRISANGCHAI Masakazu TANAKA Isao SHIRAKAWA
An automatic layout scheme dedicated to bipolar analog modules is described. A layout model is set up in such a way that the VCC/GND line is laid out on the top/bottom edge of a rectangular region, within which all elements are placed and interconnected. According to this simple modeling, a layout scheme can be constructed from the following series of algorithms. First, clustering is executed to partition a given circuit into clusters, each having connections with the VCC and GND lines, and then linear ordering is applied to the clusters so that they are placed in a one-dimensional array. After a relative placement of circuit elements in each cluster, a block compactor packs the blocks in each cluster into idle space, and then a detailed router is run to attain 100% interconnection. Finally, a layout compactor is invoked to pack all layout patterns into a rectangle of the minimum possible area. A number of implementation results are also shown to demonstrate the practicability of the proposed analog module generator.
Katsuya NAKAGAWA Masaru KAWAKITA Koji SATO Mitsuru MINAKUCHI Takao ONOYE Toru CHIBA Isao SHIRAKAWA
In recent years, information devices with network communication ability have become very popular, and many people own such devices. Those information devices, however, do not share users' data in spite of their communication ability. This paper proposes the "OCEAN: Object Communication Environment for Arbitrary Network" architecture, which provides liaison of objects stored in each device according to their profiles and situations. It eliminates redundant user operations on information devices and enables a novel communication scheme among users by sharing common objects in those devices. Furthermore, it makes the most effective use of each device within its limitations according to each environment. Finally, we discuss our prototype of OCEAN.
Xuzhen XIE Takao ONO Tomio HIRATA
Karger, Motwani and Sudan presented a graph coloring algorithm based on semidefinite programming, which colors any k-colorable graph with maximum degree Δ using
Yukio MITSUYAMA Motoki KIMURA Takao ONOYE Isao SHIRAKAWA
A VLSI architecture for the IEEE802.11i cipher algorithms is devised specifically for embedded implementation of IEEE802.11a/g wireless communication systems. The proposed architecture consists mainly of an RC4 unit for WEP/TKIP and an AES unit. The RC4 unit adopts a packed memory access architecture. As for the AES unit, an overlapped pipeline scheme of CBC-MAC and Counter-Mode is exploited in order to conceal processing latency. The cipher core has been implemented with 18 Kgates in 0.18 µm CMOS technology, and it achieves the maximum transmission rate of IEEE802.11a/g at a 60 MHz clock frequency while consuming 14.5 mW of power.
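For orientation, the RC4 stream cipher that underlies WEP/TKIP can be written as a short software reference model. The sketch below shows only the standard key scheduling and keystream generation; it does not model the packed memory access architecture or any other hardware detail of the proposed core, and the function names are illustrative.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` RC4 keystream bytes for `key`."""
    if not key:
        raise ValueError("key must not be empty")
    # Key-scheduling algorithm (KSA): permute the 256-entry state array.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): one byte per iteration.
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` by XOR with the RC4 keystream."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Each keystream byte needs several reads and writes of the 256-entry state array, which suggests why memory access organization matters in a hardware RC4 unit. Because encryption is an XOR with the keystream, applying `rc4_crypt` twice with the same key restores the input.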
A fusion framework combining a CNN and an RNN is proposed specifically for air-writing recognition. By modeling air-writing using both spatial and temporal features, the proposed network can learn more information than existing techniques. The performance of the proposed network is evaluated using the alphabet and numeric datasets from the public 6DMG database. The proposed fusion network outperforms other techniques in average accuracy: 99.25% and 99.83% are observed for the alphabet and numeric gestures, respectively. A simplified RNN structure is also proposed, which attains about a twofold speed-up over an ordinary BLSTM network. It is also confirmed that the distance between consecutive sampling points alone is enough to attain high recognition performance.
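The last observation, that the distance between consecutive sampling points is a sufficient feature, can be illustrated with a short sketch. It assumes a trajectory given as a list of 3-D points and uses Euclidean distance; the input format, the distance metric, and the function name are assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

def consecutive_distances(points: Sequence[Sequence[float]]) -> List[float]:
    """Return the Euclidean distance between each pair of consecutive points."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]
```

A trajectory of n sampling points yields n - 1 distances, which could then be fed to the temporal (RNN) branch as a one-dimensional sequence.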
Hiroshi TSUTSUI Akihiko TOMITA Shigenori SUGIMOTO Kazuhisa SAKAI Tomonori IZUMI Takao ONOYE Yukihiro NAKAMURA
In this paper, a design of a Programmable Logic Device (PLD) and a synthesis approach are proposed. Our PLD is derived from the traditional Programmable Logic Array (PLA). The key extension is that the programmable AND devices in the PLA are replaced by Look-Up Tables (LUTs). A series of cascaded LUTs in the array can generate terms more complex than product terms, which we call generalized complex terms (GCTs). In order to utilize this capability, a synthesis approach to map a given function into the array is also proposed. Our approach generates an expression as a sum of GCTs, aiming to minimize the number of terms. A number of experimental results demonstrate that the number of terms for our PLD generated by our approach is 14.9% fewer than that by an existing approach. The PLD is built from a fundamental unit named the nGCT cell, which can be used as LUTs of multiple sizes or as random access memories, and its implementation is also described.
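To illustrate why a cascade of LUTs can express more than a product term, the sketch below models each cell as a 2-input LUT that takes the cascade signal from the previous cell and one primary input. This cell model, the truth-table encoding, and all names are simplifying assumptions for illustration and do not describe the actual nGCT cell.

```python
from typing import List, Sequence, Tuple

# A cell is (input_index, truth_table), where
# truth_table[2 * cascade + x] is the cell's output bit.
Cell = Tuple[int, Sequence[int]]

AND = (0, 0, 0, 1)  # output = cascade AND x
XOR = (0, 1, 1, 0)  # output = cascade XOR x

def eval_gct(cells: Sequence[Cell], inputs: Sequence[int]) -> int:
    """Evaluate one cascaded term; the cascade input of the first cell is 1."""
    cascade = 1
    for index, table in cells:
        cascade = table[2 * cascade + inputs[index]]
    return cascade

def eval_pld(terms: List[Sequence[Cell]], inputs: Sequence[int]) -> int:
    """OR together all terms, as in the OR plane of a PLA."""
    return int(any(eval_gct(term, inputs) for term in terms))

# An ordinary product term x0 AND x1 is the special case where every cell is AND.
product_term = [(0, AND), (1, AND)]
# A single cascaded term computing x0 XOR x1 XOR x2.
parity_term = [(0, AND), (1, XOR), (2, XOR)]
```

In a sum-of-products form, three-input parity needs four product terms, whereas in this simplified model it fits in one cascaded term. This is the kind of term-count reduction the abstract reports, though the figures there come from the authors' own synthesis approach.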
Hiroyuki OKADA Altan-Erdene SHIITEV Hak-Sop SONG Gen FUJITA Takao ONOYE Isao SHIRAKAWA
This paper describes a new approach to the digital watermarking of motion pictures, designed specifically for MPEG-4 video coding and intended to enhance error detection ability. The conventional method lacks not only detection ability but also compatibility with the video decoders widely used today. Thus, in this approach the digital watermarks are embedded into the quantized DCT (Discrete Cosine Transform) coefficients for error detection, while degradation of the picture quality is also kept in check. Experimental results demonstrate that the error detection ability of the proposed approach is significantly improved compared with that of the conventional method, and that the degradation of the picture quality caused by the watermarking is extremely small.