1-13hit |
A new hardware algorithm for the block matching video motion estimation is presented. The algorithm works in the full-search fashion but unlike the Full-Search Block Matching Algorithm (FSBMA) it adjusts the number of computations dynamically to variable picture contents. Due to incorporated mechanism of data-driven thresholding, the proposed algorithm performs as four times as less operations comparing to the FSBMA while maintaining the same quality of results. Its hardware implementation is simple and compact. A supportive hardware design as well as simulation results on benchmarks are outlined.
Vasily G. MOSHNYAGA Tomoyuki YAMANAKA
Design of portable battery operated multimedia devices requires energy-efficient multiplication circuits. This paper proposes a novel architectural technique to reduce power consumption of digital multipliers. Unlike related approaches which focus on multiplier transition activity reduction, we concentrate on dynamic reduction of supply voltage. Two implementation schemes capable of dynamically adjusting a double voltage supply to input data variation are presented. Simulations show that using these schemes we can reduce energy consumption of 1616-bit multiplier by 34% and 29% on peak and by 10% and 7% on average with area overhead of 15% and 4%, respectively, while maintaining the performance of traditional multiplier.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose an instruction encoding scheme to reduce power consumption of instruction ROMs. The power consumption of the instruction ROM strongly depends on the switching activity of bit-lines due to their large load capacitance. In our approach, the binary-patterns to be assigned as op-codes are determined based on the frequency of instructions in order to reduce the number of bit-line dis-charging. Simulation results show that our approach can reduce 40% of bit-line switchings from a conventional organization.
Vasily G. MOSHNYAGA Keikichi TAMARU
As IC fabrication technology enters a deepsubmicron region with device feature sizes <0.35µm, interconnect becomes the most dominant factor in design of high-speed Application Specific Integrated Circuits (ASICs). This paper proposes a novel methodology for automated data-path synthesis of such circuits and outlines algorithms to support it. In contrast to other approaches, we formulate interconnect area/delay optimizations as high-level synthesis transformations and use them during the synthesis to minimize the impact of wiring on circuit characteristics. Experiments with FIR filter implementations show that such formulation jointly with on the fly" module generation and performance-driven floorplanning provides more than a 30% reduction in wiring delay for deep sub-micron designs.
With increased size and issue-width, instruction issue queue becomes one of the most energy consuming units in today's superscalar microprocessors. This paper presents a novel architectural technique to reduce energy dissipation of adaptive issue queue, whose functionality is dynamically adjusted at runtime to match the changing computational demands of instruction stream. In contrast to existing schemes, the technique exploits a new freedom in queue design, namely the voltage per access. Since loading capacitance operated in the adaptive queue varies in time, the clock cycle budget becomes inefficiently exploited. We propose to trade-off the unused cycle time with supply voltage, lowering the voltage level when the queue functionality is reduced and increasing it with the activation of resources in the queue. Experiments show that the approach can save up to 39% of the issue queue energy without large performance and area overhead.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
One of uncompromising requirements from portable computing is energy efficiency, because that affects directly the battery life. On the other hand, portable computing will target more demanding applications, for example moving pictures, so that higher performance is still required. Cache memories have been employed as one of the most important components of computer systems. In this paper, we briefly survey architectural techniques for high performance, low power cache memories.
Vasily G. MOSHNYAGA Hiroshi TSUJI
Due to a large capacitance and enormous access rate, caches dissipate about a third of the total energy consumed by today's processors. In this paper we present a new architectural technique to reduce energy consumption in caches. Unlike previous approaches, which have focused on lowering cache capacitance and the number of accesses, our method exploits a new freedom in cache design, namely the voltage per access. Since in modern block-buffered caches, the loading capacitance operated on block-hit is much less than the capacitance operated on miss, the given clock cycle time is inefficiently utilized during the hit. We propose to trade-off this unused time with the supply voltage, lowering the voltage level on the hit and increasing it on the miss. Experiments show that the approach can half the cache energy dissipation without large performance and area overhead.
Vasily G. MOSHNYAGA Keikichi TAMARU Hiroto YASUURA
A new applicative design language is proposed for developing generators of data-path modules from hardware algorithms. The language includes a set of primitives that represent placement operations, parameterized cells, routing patterns and a set of transformation rules specifying modifications of the module topology without changing its functionality. Using the language, a hardware algorithm designer can easily define both the topological and geometrical specifications of module generation directly at the functional level without engaged in the layout details. A sketch of the language and an example of module design with the language is presented.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called "history-based tag-comparison (HBTC) cache. " The cache attempts to reuse tag-comparison results for avoiding unnecessary tag checks. Execution footprints are recorded into an extended BTB (Branch Target Buffer). In our evaluation, it is observed that the energy for tag comparison can be reduced by more than 90% in many applications.
Vasily G. MOSHNYAGA Yutaka MORI Keikichi TAMARU
In order to shorten the time-to-market, Application-Specific Integrated Circuits (ASIC's) are designed from a library of pre-defined layout implementations for register-transfer modules such as multipliers, adders, RAM, ROM, etc. Current approaches to selecting the implementations from the library usually deal with their timing-area estimates and do not consider delay of the intermodule wiring. However, as sub-micron design rules are utilized for IC fabrication, wiring delay becomes comparable to the functional unit delay and can not longer be ignored even in register-transfer synthesis. In this paper we propose an algorithm that combines module selection with Performance-Driven module placement and reduces an impact of wiring on sub-micron ASIC performance. The algorithm not only efficiently exploits multiple module realizations in the design library, but also finds the module placement which minimizes wiring delay. Experimental results on several benchmarks show that considering both module and wiring issues, more than 30% reduction of the total circuit delay can be achieved.
Tomoaki ANDO Vasily G. MOSHNYAGA Koji HASHIMOTO
This paper introduces new FPGA design of user-monitoring system for power management of PC display. From the camera readings the system detects whether the user looks at the screen or not and produces signals to control the display backlight. The system provides over 88% eye detection accuracy at 8f/s image processing rate. We describe new eye-tracking algorithm and hardware and present the results of its experimental evaluation in prototype display power management system.
Vasily G. MOSHNYAGA Koichi MASUNAGA
A new algorithm and architecture to eliminate redundant operations in block-matching (BM) motion estimation is proposed. The key step of this work is to use binary-matching to define image regions with the static background content and then exclude these regions from the actual motion estimation. According to experiments, the approach maintains the highest PSNR, while making as half as less computations in comparison to the adaptive BM or 1/8 of the computations required by the full-search BM. An implementation scheme is outlined.
Reiko KOMIYA Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
As the transistor feature sizes and threshold voltages reduce, leakage energy consumption has become an inevitable issue for high-performance microprocessor designs. Since on-chip caches are major contributors of the leakage, a number of researchers have proposed efficient leakage reduction techniques. However, it is still not clear that 1) what kind of algorithm can be considered and 2) how much they have impact on energy and performance. To answer these questions, we explore run-time cache management algorithm, and evaluate the energy-performance efficiency for several alternatives.