Yutaka MASUDA Jun NAGAYAMA TaiYu CHENG Tohru ISHIHARA Yoichi MOMIYAMA Masanori HASHIMOTO
This work proposes a design methodology that reduces power dissipation under voltage over-scaling (VOS) operation. The key idea is to combine critical path isolation (CPI) and bit-width scaling (BWS) under a computational quality constraint, e.g., peak signal-to-noise ratio (PSNR) in the image processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict its power saving effect. In contrast, the proposed methodology targets both intrinsic and non-intrinsic CPs, so the design can substantially lower the supply voltage and power dissipation while satisfying the quality constraint. Moreover, to reduce the co-design exploration space, the proposed methodology exploits the exclusiveness of the paths targeted by CPI and BWS: CPI aims at reducing the minimum supply voltage of non-intrinsic CPs, whereas BWS focuses on intrinsic CPs in arithmetic units. Based on this exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems: (1) determining the bit-width reduction, (2) optimizing the timing of non-intrinsic CPs, and (3) finding the minimum supply voltage of the BWS- and CPI-applied circuit under the quality constraint, in order to reduce power dissipation. Thanks to this problem splitting, the proposed methodology can efficiently find a quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible and that they significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design reduces power dissipation by 42.7% with an image processing workload and by 51.2% with a neural network inference workload.
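The quality constraint in sub-problem (1) can be illustrated with a minimal sketch. It assumes 8-bit samples and models bit-width scaling as zeroing the least significant bits, which is a simplification for illustration and not necessarily how the authors apply BWS to arithmetic units. The function names and the 30 dB floor are hypothetical.

```python
import math

def truncate_lsbs(pixels, dropped_bits):
    """Zero the lowest `dropped_bits` bits of each 8-bit sample (assumed BWS model)."""
    mask = 0xFF & ~((1 << dropped_bits) - 1)
    return [p & mask for p in pixels]

def psnr(reference, degraded, max_value=255):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)

def max_dropped_bits(pixels, psnr_floor_db):
    """Largest number of dropped LSBs that still meets the PSNR floor."""
    best = 0
    for bits in range(1, 8):
        if psnr(pixels, truncate_lsbs(pixels, bits)) >= psnr_floor_db:
            best = bits
        else:
            break  # PSNR only gets worse as more bits are dropped
    return best

# Hypothetical test data: every 8-bit value once.
ramp = list(range(256))
allowed = max_dropped_bits(ramp, psnr_floor_db=30.0)
```

A search like this only fixes the bit-width; the timing optimization and the supply voltage search remain separate steps, as the abstract describes.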
Yu CHENG Anguo MA Minxuan ZHANG
Soft errors caused by energetic particle strikes in on-chip cache memories have become a critical challenge for microprocessor design. The architectural vulnerability factor (AVF), defined as the probability that a transient fault in a structure results in a visible error in the final output of a program, has been widely employed for accurate soft error rate estimation. Recent studies have found that designing soft error protection techniques with awareness of AVF greatly helps to achieve a tradeoff between performance and reliability for several structures (e.g., issue queue, reorder buffer). For large on-chip L2 caches, redundancy-based protection techniques (such as ECC) have been widely employed to ensure data integrity, but at high cost. Protecting caches without accurate knowledge of their vulnerability characteristics may lead to over-protection and thus incur high overheads. Therefore, AVF-aware protection techniques are attractive for designers seeking cost-efficient protection for caches, especially at an early design stage. In this paper, we propose an improved AVF estimation framework for conducting a comprehensive characterization of the dynamic behavior and predictability of L2 cache vulnerability. We employ the Bayesian Additive Regression Trees (BART) method to accurately model the variation of L2 cache AVF and to quantitatively explain the effects of several key performance metrics on L2 cache AVF. We then employ a bump hunting technique to extract simple selection rules based on these key performance metrics for a simplified and fast estimation of L2 cache AVF. Using the simplified L2 cache AVF estimator, we develop an AVF-aware ECC technique as an example to demonstrate the cost-efficiency of dynamic fault-tolerant techniques based on AVF prediction. Experimental results show that, compared with the traditional full ECC technique, the AVF-aware ECC technique reduces the L2 cache access latency by 16.5% and power consumption by 11.4% on average for the SPEC2K benchmarks.
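As a rough illustration of the quantity being estimated, the sketch below computes AVF in the commonly used form of ACE bit-cycles divided by total bit-cycles, where ACE denotes bits required for architecturally correct execution. The abstract does not describe this computation or the ECC policy, so the `ecc_enabled` threshold rule and its default value are hypothetical assumptions, not the authors' method.

```python
def avf(ace_cycles_per_bit, total_cycles):
    """AVF as the fraction of bit-cycles that hold ACE state.

    ace_cycles_per_bit: for each bit of the structure, the number of cycles
    during which a flip would corrupt the program output.
    """
    if total_cycles <= 0 or not ace_cycles_per_bit:
        raise ValueError("need at least one bit and a positive cycle count")
    return sum(ace_cycles_per_bit) / (len(ace_cycles_per_bit) * total_cycles)

def ecc_enabled(predicted_avf, threshold=0.3):
    """Hypothetical policy: pay for ECC only while predicted AVF is high."""
    return predicted_avf >= threshold

# Four bits observed over 100 cycles (made-up numbers).
interval_avf = avf([100, 50, 0, 50], total_cycles=100)
protect = ecc_enabled(interval_avf)
```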
The quasi-ARX neurofuzzy (Q-ARX-NF) model has shown great approximation ability and usefulness in nonlinear system identification and control. It has an ARX-like linear structure whose coefficients are expressed by an incorporated neurofuzzy (InNF) network. However, the Q-ARX-NF model suffers from the curse of dimensionality, because the number of fuzzy rules in the InNF network increases exponentially with the input space dimension. This may result in high computational complexity and over-fitting. In this paper, the curse of dimensionality is addressed in two ways. First, a support vector regression (SVR) based approach is used to reduce computational complexity through a dual form of quadratic programming (QP) optimization, whose solution is independent of the input dimension. Second, genetic algorithm (GA) based input selection is applied with a novel fitness evaluation function, generating a parsimonious model structure with only the important inputs for the InNF network. Simulations on mathematical and real systems are carried out to demonstrate the effectiveness of the proposed method.
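The exponential growth mentioned above can be made concrete with a small sketch. It assumes a grid partition in which every input has the same number of membership functions, so that the rule count is that number raised to the input dimension. The abstract does not state the partitioning scheme, so this is an assumption, and `rule_count` is a hypothetical helper.

```python
def rule_count(mfs_per_input, n_inputs):
    """Number of fuzzy rules under an assumed full grid partition."""
    return mfs_per_input ** n_inputs

# Three membership functions per input, for growing input dimension.
growth = {n: rule_count(3, n) for n in (2, 4, 8)}  # {2: 9, 4: 81, 8: 6561}
```

Under this assumption, selecting 4 of 8 candidate inputs cuts the rule base from 6561 to 81 rules, which is why input selection is an effective remedy.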
Tzu-Yu CHENG Yoshio YAMAGUCHI Kun-Shan CHEN Jong-Sen LEE Yi CUI
In this paper, a multi-temporal analysis of polarimetric synthetic aperture radar (Pol-SAR) data over a sandbank and oyster farm area is presented. Specifically, a four-component scattering model, which can identify single-bounce, double-bounce, volume, and helix scattering power contributions, is employed to retrieve information. Decomposition results for a time series of RADARSAT Pol-SAR images acquired over the western Taiwan coast indicate that the coastal tide level plays a key role in sandbank and oyster farm monitoring. At high tide levels, the underlying sandbank creates a shallow area with increased roughness of the sea surface above it, leading to enhanced surface scattering power compared with the ambient water. In contrast, at low tide levels, the exposed sandbank appears to be a smooth scatterer, generating lower backscattering power than the surrounding area. In the oyster farms, on the other hand, the double-bounce scattering power is shown to be highly correlated with the tide level because of their vertical structures. This also demonstrates the promising potential of the four-component scattering power decomposition for coastal tide level monitoring applications.
Qinghua SHENG Yu CHENG Xiaofang HUANG Changcai LAI Xiaofeng HUANG Haibin YIN
Dependent Quantization (DQ) is a new quantization tool introduced in the Versatile Video Coding (VVC) standard. While it provides better rate-distortion calculation accuracy, it also increases the computational complexity and hardware cost compared to the widely used scalar quantization. To address this issue, this paper proposes a parallel dependent quantization hardware architecture implemented in Verilog HDL. The architecture preprocesses the coefficients with a scalar quantizer and a high-frequency filter, and then segments the coefficients and processes them in parallel using the Viterbi algorithm. Additionally, the weight bit width of the rate-distortion calculation is reduced to decrease the quantization cycle and computational complexity. The final quantization of the TU is then determined through sequential scanning and evaluation of the rate-distortion cost. Experimental results show that the proposed algorithm reduces the quantization cycle by an average of 56.96% compared to VVC’s reference platform VTM, with a Bjøntegaard delta bit rate (BDBR) loss of 1.03% and 1.05% under the Low-delay P and Random Access configurations, respectively. Verification on the AMD FPGA development platform demonstrates that the hardware implementation meets the quantization requirements for 1080P@60Hz video hardware encoding.
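For readers unfamiliar with DQ, the following sketch shows why the encoder needs a trellis search such as the Viterbi algorithm: the quantizer used for each coefficient depends on the parities of the preceding levels. It covers only decoder-side reconstruction, using the four-state machine and two quantizers as DQ is commonly described for VVC. None of these details come from the abstract, the `dequantize` function and the unit step size are assumptions, and the proposed hardware architecture is not modelled.

```python
# Next state indexed by [current state][parity of the level].
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))

def sign(k):
    """Return -1, 0, or 1 according to the sign of k."""
    return (k > 0) - (k < 0)

def dequantize(levels, step):
    """Reconstruct coefficients from quantization levels in coding order.

    States 0 and 1 select quantizer Q0 (even multiples of `step`);
    states 2 and 3 select Q1 (odd multiples of `step`, plus zero).
    """
    state = 0
    out = []
    for k in levels:
        if state < 2:
            out.append(2 * k * step)              # Q0
        else:
            out.append((2 * k - sign(k)) * step)  # Q1
        state = STATE_TRANSITION[state][k & 1]
    return out

recon = dequantize([1, 1, 0, 2], step=1.0)
```

Because each level changes the state seen by all later coefficients, the encoder cannot choose levels independently, which is the source of the complexity the paper targets.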
Kuo-Hsiung TSENG Ching-Lin HUANG Pei-Yu CHENG Zih-Ciao WEI
This paper discusses low-voltage lightning protection systems, in particular the testing equipment for surge arresters. Only by demonstrating the performance and applicability of arresters can we find the most feasible and economical low-voltage solutions. After performing repeated experiments on the same test samples with different testing equipment, we compare the test results in order to select the most suitable testing equipment. In addition, the basis of a surge current parameter design theory is confirmed and verified through test results obtained by using a simple and compact Impulse Current Generator to test a wide range of samples. Through the analyses and experiments, we gain a deeper understanding of how R, L, and C affect the surge current, the current waveform, and the current wave time. The ideal testing equipment specifications have been set as follows: (1) test voltage up to 20 kV; (2) an expanded current range from 1.5 kA to 46.5 kA, with a resolution of 1.5 kA; and (3) simple operational procedures.
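The influence of R, L, and C on the surge current can be sketched with the textbook discharge of a charged capacitor through a series RLC loop. This is an idealized model assumed here for illustration; the abstract does not give the generator's circuit, and the component values below are hypothetical.

```python
import math

def surge_current(t, v0, r, l, c):
    """Current at time t for a capacitor charged to v0 discharging into series R-L."""
    alpha = r / (2 * l)           # damping coefficient
    omega0_sq = 1 / (l * c)       # squared undamped natural frequency
    disc = alpha ** 2 - omega0_sq
    if math.isclose(disc, 0.0, abs_tol=1e-9 * omega0_sq):
        # Critically damped: R = 2 * sqrt(L / C)
        return (v0 / l) * t * math.exp(-alpha * t)
    if disc < 0:
        # Underdamped: decaying oscillation
        omega_d = math.sqrt(-disc)
        return (v0 / (omega_d * l)) * math.exp(-alpha * t) * math.sin(omega_d * t)
    # Overdamped: unidirectional impulse
    beta = math.sqrt(disc)
    return (v0 / (beta * l)) * math.exp(-alpha * t) * math.sinh(beta * t)

def peak_current(v0, r, l, c, t_end, steps=10_000):
    """Return (peak current, time of peak) found by uniform sampling on [0, t_end]."""
    times = (k * t_end / steps for k in range(steps + 1))
    return max((surge_current(t, v0, r, l, c), t) for t in times)

# Hypothetical values: 20 kV charge, 2 ohm, 1 uH, 1 uF (critically damped).
i_peak, t_peak = peak_current(20e3, 2.0, 1e-6, 1e-6, t_end=5e-6)
```

In this model, raising R lowers the peak and removes the oscillation, while L and C together set the time scale of the waveform.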
TaiYu CHENG Yutaka MASUDA Jun NAGAYAMA Yoichi MOMIYAMA Jun CHEN Masanori HASHIMOTO
Reducing power consumption is crucial for making industrial designs, such as mobile SoCs, competitive. Voltage scaling (VS) is the classical yet most effective technique, contributing a quadratic power reduction. A recent design technique called activation-aware slack assignment (ASA) enhances voltage scaling by allocating the timing margin of critical paths based on a stochastic mean-time-to-failure (MTTF) analysis. However, such stochastic treatment of timing errors is accepted only in limited application domains, such as image processing. This paper proposes a design optimization methodology that achieves a mode-wise voltage-scalable (MWVS) design guaranteeing no timing errors in each operation mode. This work formulates the MWVS design as an optimization problem that minimizes the overall power consumption while explicitly considering each mode's duration, the achievable voltage lowering, and the accompanying circuit overhead, and explores the solution space with the downhill simplex algorithm, which requires neither numerical derivatives nor frequent objective function evaluations. To obtain a solution, i.e., a design, in the optimization process, we exploit the multi-corner multi-mode design flow of a commercial tool to perform mode-wise ASA with sets of false paths dedicated to individual modes. We applied the proposed design methodology to a RISC-V design. Experimental results show that the proposed methodology saves 13% to 20% more power than the conventional VS approach and attains an 8% to 15% gain over the conventional single-mode ASA. We also found that cycle-by-cycle fine-grained false path identification reduced leakage power by 31% to 42%.
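A duration-weighted objective of the kind described can be sketched as follows. The sketch assumes that power in each mode scales with the square of its supply voltage and that circuit overhead acts as a single multiplicative factor. Both are simplifying assumptions, and the function names and numbers are hypothetical rather than taken from the paper.

```python
def scaled_power(p_nominal, v, v_nominal):
    """Power at supply voltage v, assuming quadratic scaling from the nominal point."""
    return p_nominal * (v / v_nominal) ** 2

def overall_power(modes, v_nominal, overhead_fraction=0.0):
    """Duration-weighted average power over all modes.

    modes: list of (duration_share, nominal_power, supply_voltage) tuples.
    overhead_fraction: assumed relative power cost of the added circuitry.
    """
    total_share = sum(share for share, _, _ in modes)
    weighted = sum(share * scaled_power(p, v, v_nominal) for share, p, v in modes)
    return (1 + overhead_fraction) * weighted / total_share

# Two hypothetical modes of equal duration, one running at half the nominal voltage.
modes = [(0.5, 100.0, 1.0), (0.5, 100.0, 0.5)]
average = overall_power(modes, v_nominal=1.0)
```

A derivative-free optimizer such as the downhill simplex method mentioned in the abstract would then search over the per-mode voltages, with each evaluation of the objective corresponding to one design run.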