Dehua LIANG Jun SHIOMI Noriyuki MIURA Masanori HASHIMOTO Hiromitsu AWANO
Reservoir computing (RC) is an attractive alternative to machine learning models owing to its computationally inexpensive training process and simplicity. In this work, we propose EnsembleBloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to organize an RC system. In contrast to most existing RC systems, EnsembleBloomCA eliminates all floating-point calculation and integer multiplication. EnsembleBloomCA adopts CA as the reservoir in the RC system because it can be implemented using only binary operations and is thus energy efficient. The rich pattern dynamics created by CA can map the original input into a high-dimensional space and provide more features for the classifier. Utilizing an ensemble Bloom filter as the classifier, the features provided by the reservoir can be effectively memorized. Our experiment revealed that applying the ensemble mechanism to the Bloom filter resulted in a significant reduction in memory cost during the inference phase. In comparison with Bloom WiSARD, one of the state-of-the-art reference work, the EnsembleBloomCA model achieves a 43× reduction in memory cost while maintaining the same accuracy. Our hardware implementation also demonstrated that EnsembleBloomCA achieved over 23× and 8.5× reductions in area and power, respectively.
Takehiko AMAKI Masanori HASHIMOTO Takao ONOYE
We propose a jitter amplifier architecture for an oscillator-based true random number generator (TRNG). Two types of latency-controllable (LC) buffer, which are the key components of the proposed jitter amplifier, are presented. We derive an equation to estimate the gain of the jitter amplifier, and analyze sufficient conditions for the proposed circuit to work properly. The proposed jitter amplifier was fabricated with a 65 nm CMOS process. The jitter amplifier with the two-voltage LC buffer occupied 3,300 µm2 and attained 8.4x gain, and that with the single-voltage LC buffer achieved 2.2x gain with an 1,700 µm2 area. The jitter amplification of the sampling clock increased the entropy of a bit stream and improved the results of the NIST test suite so that all the tests passed whereas TRNGs with simple correctors failed. The jitter amplifier attained higher throughput per area than a frequency divider when the required amount of jitter was more than two times larger than the inherent jitter in our test-chip implementations.
Hiroaki KONOURA Dawood ALNAJJAR Yukio MITSUYAMA Hajime SHIMADA Kazutoshi KOBAYASHI Hiroyuki KANBARA Hiroyuki OCHI Takashi IMAGAWA Kazutoshi WAKABAYASHI Masanori HASHIMOTO Takao ONOYE Hidetoshi ONODERA
This paper proposes a mixed-grained reconfigurable architecture consisting of fine-grained and coarse-grained fabrics, each of which can be configured for different levels of reliability depending on the reliability requirement of target applications, e.g. mission-critical applications to consumer products. Thanks to the fine-grained fabrics, the architecture can accommodate a state machine, which is indispensable for exploiting C-based behavioral synthesis to trade latency with resource usage through multi-step processing using dynamic reconfiguration. In implementing the architecture, the strategy of dynamic reconfiguration, the assignment of configuration storage and the number of implementable states are key factors that determine the achievable trade-off between used silicon area and latency. We thus split the configuration bits into two classes; state-wise configuration bits and state-invariant configuration bits for minimizing area overhead of configuration bit storage. Through a case study, we experimentally explore the appropriate number of implementable states. A proof-of-concept VLSI chip was fabricated in 65nm process. Measurement results show that applications on the chip can be working in a harsh radiation environment. Irradiation tests also show the correlation between the number of sensitive bits and the mean time to failure. Furthermore, the temporal error rate of an example application due to soft errors in the datapath was measured and demonstrated for reliability-aware mapping.
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses the frequency to extract RLC values from interconnects. In circuit design, frequency-independent equivalent circuit is widely used, and many design and analysis techniques based on this equivalent circuit are proposed so far. However in reality, characteristics of interconnects are frequency-dependent. Also pulse waveforms in digital circuits contain multiple frequency components. The frequency used for RLC extraction affects the accuracy of interconnect characterization, and hence careful determination of extraction frequency is critical. We propose a representative frequency for RLC extraction. Conventionally, representative frequencies are determined by input pulse. The proposed method decides the representative frequency based on the interconnect length, whereas conventional representative frequencies are determined by input pulse shape, period and patterns. We verify that the extraction at the proposed frequency provides the most accurate transition waveform against various input signals and interconnect structures in digital circuits.
Atsushi KUROKAWA Masanori HASHIMOTO Akira KASEBE Zhangcai HUANG Yun YANG Yasuaki INOUE Ryosuke INAGAKI Hiroo MASUDA
Simple closed-form expressions for efficiently calculating on-chip interconnect capacitances are presented. The formulas are expressed with second-order polynomial functions which do not include exponential functions. The runtime of the proposed formulas is about 2-10 times faster than those of existing formulas. The root mean square (RMS) errors of the proposed formulas are within 1.5%, 1.3%, 3.1%, and 4.6% of the results obtained by a field solver for structures with one line above a ground plane, one line between ground planes, three lines above a ground plane, and three lines between ground planes, respectively. The proposed formulas are also superior in accuracy to existing formulas.
TaiYu CHENG Yutaka MASUDA Jun NAGAYAMA Yoichi MOMIYAMA Jun CHEN Masanori HASHIMOTO
Reducing power consumption is a crucial factor making industrial designs, such as mobile SoCs, competitive. Voltage scaling (VS) is the classical yet most effective technique that contributes to quadratic power reduction. A recent design technique called activation-aware slack assignment (ASA) enhances the voltage-scaling by allocating the timing margin of critical paths with a stochastic mean-time-to-failure (MTTF) analysis. Meanwhile, such stochastic treatment of timing errors is accepted in limited application domains, such as image processing. This paper proposes a design optimization methodology that achieves a mode-wise voltage-scalable (MWVS) design guaranteeing no timing errors in each mode operation. This work formulates the MWVS design as an optimization problem that minimizes the overall power consumption considering each mode duration, achievable voltage lowering and accompanied circuit overhead explicitly, and explores the solution space with the downhill simplex algorithm that does not require numerical derivation and frequent objective function evaluations. For obtaining a solution, i.e., a design, in the optimization process, we exploit the multi-corner multi-mode design flow in a commercial tool for performing mode-wise ASA with sets of false paths dedicated to individual modes. We applied the proposed design methodology to RISC-V design. Experimental results show that the proposed methodology saves 13% to 20% more power compared to the conventional VS approach and attains 8% to 15% gain from the conventional single-mode ASA. We also found that cycle-by-cycle fine-grained false path identification reduced leakage power by 31% to 42%.
Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses a gate resizing method for performance enhancement based on statistical static timing analysis. The proposed method focuses on timing uncertainties caused by local random fluctuation. Our method aims to remove both over-design and under-design of a circuit, and realize high-performance and high-reliability LSI design. The effectiveness of our method is examined by 6 benchmark circuits. We verify that our method can reduce the delay time further from the circuits optimized for minimizing the delay without the consideration of delay fluctuation.
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses the resistive termination of on-chip high-performance interconnects. Resistive termination is effective to improve the bandwidth of on-chip interconnects, on the other hands, increases the power dissipation and the area. Therefore trade-off analysis about resistive termination is necessary. This paper proposes a method to determine the termination of on-chip interconnects. The termination derived by the proposed method provides minimum sensitivity to process variation as well as maximum eye-opening in voltage.
Igors HOMJAKOVS Masanori HASHIMOTO Tetsuya HIROSE Takao ONOYE
This paper presents an architecture of signal-dependent analog-to-digital converter (ADC) based on MINIMAX sampling scheme that allows achieving high data compression rate and power reduction. The proposed architecture consists of a conventional synchronous ADC, a timer and a peak detector. AD conversion is carried out only when input signal peaks are detected. To improve the accuracy of signal reconstruction, MINIMAX sampling is improved so that multiple points are captured for each peak, and its effectiveness is experimentally confirmed. In addition, power reduction, which is the primary advantage of the proposed signal-dependent ADC, is analytically discussed and then validated with circuit simulations.
Takashi ENAMI Shinyu NINOMIYA Ken-ichi SHINKAI Shinya ABE Masanori HASHIMOTO
Clock driver suffers from delay variation due to manufacturing and environmental variabilities as well as combinational cells. The delay variation causes clock skew and jitter, and varies both setup and hold timing margins. This paper presents a timing verification method that takes into consideration delay variation inside a clock network due to both manufacturing variability and dynamic power supply noise. We also discuss that setup and hold slack computation inherently involves a structural correlation problem due to common paths, and demonstrate that assigning individual random variables to upstream clock drivers provides a notable accuracy improvement in clock skew estimation with limited increase in computational cost. We applied the proposed method to industrial designs in 90 nm process. Experimental results show that dynamic delay variation reduces setup slack by over 500 ps and hold slack by 16.4 ps in test cases.
Yasumichi TAKAI Masanori HASHIMOTO Takao ONOYE
This paper investigates power gating implementations that mitigate power supply noise. We focus on the body connection of power-gated circuits, and examine the amount of power supply noise induced by power-on rush current and the contribution of a power-gated circuit as a decoupling capacitance during the sleep mode. To figure out the best implementation, we designed and fabricated a test chip in 65 nm process. Experimental results with measurement and simulation reveal that the power-gated circuit with body-tied structure in triple-well is the best implementation from the following three points; power supply noise due to rush current, the contribution of decoupling capacitance during the sleep mode and the leakage reduction thanks to power gating.
Ryo HARADA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
This paper presents a measurement circuit structure for capturing SET pulse-width suppressing pulse-width modulation and within-die process variation effects. For mitigating pulse-width modulation while maintaining area efficiency, the proposed circuit uses massively parallelized short inverter chains as a target circuit. Moreover, for each inverter chain on each die, pulse-width calibration is performed. In measurements, narrow SET pulses ranging 5ps to 215ps were obtained. We confirm that an overestimation of pulse-width may happen when ignoring die-to-die and within-die variation of the measurement circuit. Our evaluation results thus point out that calibration for within-die variation in addition to die-to-die variation of the measurement circuit is indispensable.
Daisuke FUKUDA Kenichi WATANABE Yuji KANAZAWA Masanori HASHIMOTO
As the technology of VLSI manufacturing process continues to shrink, it becomes a challenging problem to generate layout patterns that can satisfy performance and manufacturability requirements. Wire width variation is one of the main issues that have a large impact on chip performance and yield loss. Particularly, etching process is the last and most influential process to wire width variation, and hence models for predicting etching induced variation have been proposed. However, they do not consider an effect of global layout variation. This work proposes a prediction model of etching induced wire width variation which takes into account global layout pattern variation. We also present a wire width adjustment method that modifies etching process on the fly according to the critical dimension loss estimated by the proposed prediction model and wire space measurement just before etching process. Experimental results show that the proposed model achieved good performance in prediction, and demonstrated that the potential reduction of the gap between the target wire width and actual wire width thanks to the proposed on-the-fly etching process modification was 68.9% on an average.
Masanori HASHIMOTO Hidetoshi ONODERA Keikichi TAMARU
We present a method for power and delay optimization by input reordering. We observe that the reordering has a significant effect on the power dissipation of the gate which drives the reordered gate. This is because the input capacitance depends on the signal values of other inputs. This property, however, has not been utilized for power reduction. Previous approaches focus on the reduction of the power dissipated by internal capacitances of the reordered gate. We propose a heuristic algorithm considering the total power consumed in the driving gate and the reordered gate. Experimental results using 30 benchmark circuits show that our method reduces the power dissipation in all the circuits by 5.9% on average. There is a possibility that power dissipation is reduced by 22.5% maximum. In the case of delay and power optimization, our method reduces delay by 6.7% and power dissipation by 5.3% on average.
Soft error jeopardizes the reliability of semiconductor devices, especially those working at low voltage. In recent years, silicon-on-thin-box (SOTB), which is a FD-SOI device, is drawing attention since it is suitable for ultra-low-voltage operation. This work evaluates the contributions of SRAM, FF and combinational circuit to chip-level soft error rate (SER) based on irradiation test results. For this evaluation, this work performed neutron irradiation test for characterizing single event transient (SET) rate of SOTB and bulk circuits at 0.5 V. Using the SBU and MCU data in SRAMs from previous work, we calculated the MBU rate with/without error correcting code (ECC) and with 1/2/4-col MUX interleaving. Combining FF error rates reported in literature, we estimated chip-level SER and each contribution to chip-level SER for embedded and high-performance processors. For both the processors, without ECC, 95% errors occur at SRAM in both SOTB and bulk chips at 0.5 V and 1.0 V, and the overall chip-level SERs of the assumed SOTB chip at 0.5 V is at least 10 x lower than that of bulk chip. On the other hand, when ECC is applied to SRAM in the SOTB chip, SEUs occurring at FFs are dominant in the high-performance processor while MBUs at SRAMs are not negligible in the bulk embedded chips.
Takaaki OKUMURA Fumihiro MINAMI Kenji SHIMAZAKI Kimihiko KUWADA Masanori HASHIMOTO
This paper presents a gate delay estimation method that takes into account dynamic power supply noise. We review STA based on static IR-drop analysis and a conventional method for dynamic noise waveform, and reveal their limitations and problems that originate from circuit structures and higher delay sensitivity to voltage in advanced technologies. We then propose a gate delay computation that overcomes the problems with iterative computations and consideration of input voltage drop. Evaluation results with various circuits and noise injection timings show that the proposed method estimates path delay fluctuation well within 1% error on average.
Takashi SATO Junji ICHIMIYA Nobuto ONO Masanori HASHIMOTO
In this paper, we propose a methodology for calculating on-chip temperature gradient and leakage power distributions. It considers the interdependence between leakage power and local temperature using a general circuit simulator as a differential equation solver. The proposed methodology can be utilized in the early stages of the design cycle as well as in the final verification phase. Simulation results proved that consideration of the temperature dependence of the leakage power is critically important for achieving reliable physical designs since the conventional temperature analysis that ignores the interdependence underestimates leakage power considerably and may overlook potential thermal runaway.
Yukio MITSUYAMA Kazuma TAKAHASHI Rintaro IMAI Masanori HASHIMOTO Takao ONOYE Isao SHIRAKAWA
An area-efficient dynamically reconfigurable architecture is proposed, which is dedicated to media processing. To implement a compact but high performance device, which can be used in consumer applications, the reconfigurable architecture distinctively performs 8-bit operations required for media processing whereas fine-grained operations are executed with the cooperation of a host processor. A heterogeneous reconfigurable array is composed of four types of cells, for which configuration data size is reduced by focusing application domain on media processing. Implementation results show that a multi-standard video decoding can be achieved by the proposed reconfigurable architecture with 1.11.4 mm2 in a 90 nm CMOS technology.
Toshiki KANAMOTO Shigekiyo AKUTSU Tamiyo NAKABAYASHI Takahiro ICHINOMIYA Koutaro HACHIYA Atsushi KUROKAWA Hiroshi ISHIKAWA Sakae MUROMOTO Hiroyuki KOBAYASHI Masanori HASHIMOTO
In this letter, we discuss the impact of intrinsic error in parasitic capacitance extraction programs which are commonly used in today's SoC design flows. Most of the extraction programs use pattern-matching methods which introduces an improvable error factor due to the pattern interpolation, and an intrinsically inescapable error factor from the difference of boundary conditions in the electro-magnetic field solver. Here, we study impact of the intrinsic error on timing and crosstalk noise estimation. We experimentally show that the resulting delay and noise estimation errors show a scatter which is normally distributed. Values of the standard deviations will help designers consider the intrinsic error compared with other variation factors.
Takahito MIYAZAKI Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses performance prediction of clock generation PLLs using a ring oscillator based VCO (RingVCO) and an LC oscillator based VCO (LCVCO). For clock generation, we generally design PLLs using RingVCOs because of their superiority in tunable frequency range, chip area and power consumption, in spite of their poor noise characteristics. In the future, it is predicted that operating frequency will rapidly increase and supply voltage will dramatically decrease. Besides, rigid noise performances will be required. In this condition, it is not clear neither how performances of both PLLs will change nor the performance differences between both PLLs will change. This paper predicts and compares future performances of PLLs using a RingVCO and an LCVCO with a qualitative evaluation by an analytical approach and with design experiments based on predicted process parameters. Our discussion reveals that the relative performance difference between both PLLs will be unchanged. As technology advances, power dissipation and chip area of both PLLs favorably decrease, while, noise characteristics of both PLLs degrade, which indicates low noise PLL circuit design will be more important.