Yuki ABE Kazutoshi KOBAYASHI Jun SHIOMI Hiroyuki OCHI
Energy harvesting has been widely investigated as a potential solution to supply power for Internet of Things (IoT) devices. Computing devices must operate intermittently rather than continuously, because harvested energy is unstable and some IoT applications are periodic. Therefore, processors for IoT devices with intermittent operation must feature a hibernation mode with zero standby power in addition to an energy-efficient normal mode. In this paper, we describe the layout design and measurement results of a nonvolatile standard cell memory (NV-SCM) and nonvolatile flip-flops (NV-FF) with a nonvolatile memory using Fishbone-in-Cage Capacitor (FiCC) suitable for IoT processors with intermittent operations. They can be fabricated in any conventional CMOS process without any additional mask. NV-SCM and NV-FF are fabricated in a 180nm CMOS process technology. The area overhead of nonvolatility for a bit cell is 74% in NV-SCM and 29% in NV-FF. We confirmed full functionality of the NV-SCM and NV-FF. Simulation shows that the nonvolatile system using the proposed NV-SCM and NV-FF can reduce the energy consumption by 24.3% compared to the volatile system when the hibernation/normal operation time ratio is 500.
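The trade-off behind the hibernation/normal time ratio in the abstract above can be sketched with a first-order energy model. This is a generic illustration, not the authors' simulation setup: the function names and every numeric value are hypothetical, and the model assumes that a volatile system pays standby power throughout hibernation while a nonvolatile system pays only a fixed store and restore energy per cycle.

```python
def cycle_energy_volatile(p_active, t_active, p_standby, ratio):
    """Energy of one active + hibernation cycle when state is kept by standby power.

    `ratio` is the hibernation time divided by the active time.
    """
    return p_active * t_active + p_standby * ratio * t_active


def cycle_energy_nonvolatile(p_active, t_active, e_store, e_restore):
    """Energy of one cycle when state is stored, power is cut, and state is restored."""
    return p_active * t_active + e_store + e_restore


def energy_saving(e_volatile, e_nonvolatile):
    """Fractional energy reduction of the nonvolatile system (negative means a loss)."""
    return (e_volatile - e_nonvolatile) / e_volatile


def break_even_ratio(p_standby, t_active, e_store, e_restore):
    """Hibernation/active ratio above which the nonvolatile system wins.

    Assumes both systems have the same active power (a simplification).
    """
    return (e_store + e_restore) / (p_standby * t_active)


# Hypothetical, unit-free numbers chosen only to exercise the model.
e_v = cycle_energy_volatile(p_active=1.0, t_active=1.0, p_standby=0.001, ratio=500)
e_nv = cycle_energy_nonvolatile(p_active=1.0, t_active=1.0, e_store=0.05, e_restore=0.05)
saving = energy_saving(e_v, e_nv)
```

In this model the saving grows with the ratio, because the store and restore cost is fixed while the standby energy of the volatile system scales with hibernation time.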
Yuki IMAI Shinichi NISHIZAWA Kazuhito ITO
Environmental power generation devices such as solar cells are used as power sources for IoT devices. Due to the large internal resistance of such power sources, LSIs in the IoT devices may malfunction when they operate at high speed, because a large current flows and the supply voltage drops. In this paper, a standard cell library of stacked structured cells is proposed to increase the delay of logic circuits within the range not exceeding the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing the energy consumption of the LSIs.
Daijoon HYUN Younggwang JUNG Youngsoo SHIN
Multiple patterning lithography allows fine patterns beyond lithography limit, but it suffers from a large process cost. In this paper, we address a method to reduce the number of V0 masks; it consists of two sub-problems. First, stitch-induced via (SIV) is introduced to reduce the number of V0 masks. It involves the redesign of standard cells to replace some vias in V0 layer with SIVs, such that the remaining vias can be assigned to the reduced masks. Since SIV formation requires metal stitches in different masks, SIV replacement and metal mask assignment should be solved simultaneously. This sub-problem is formulated as integer linear programming (ILP). In the second sub-problem, inter-row via conflict aware detailed placement is addressed. Single row placement optimization is performed for each row to remove metal and inter-row via conflicts, while minimizing cell displacements. Since it is time consuming to consider many cell operations at once, we apply a few operations iteratively, where different operations are applied to each iteration and to each cell depending on whether the cell has a conflict in the previous iteration. Remaining conflicts are then removed by mapping conflict cells to white spaces. To this end, we minimize the number of cells to move and maximize the number of large white spaces before mapping. Experimental results demonstrate that the cell placement with two V0 masks is completed by proposed methods, with 7 times speedup and 21% reduction in total cell displacement, compared to conventional detailed placement.
Yusuke YOSHIDA Kimiyoshi USAMI
This paper describes a design of energy-efficient Standard Cell Memory (SCM) using Silicon-on-Thin-BOX (SOTB). We present an automatic place-and-route (P&R) methodology for optimal body-bias separation (BBS) for SCM, which enables different body bias voltages to be applied to latches and to other peripheral circuits within SCM. The capability of SOTB to effectively reduce leakage by body biasing is fully exploited in BBS. Simulation results demonstrated that our approach allows us to design SCM with 40% smaller energy dissipation at the energy minimum voltage as compared to the conventional design flow. Under process and temperature variations, Adaptive Body Bias (ABB) for SCM with our BBS provided 70% smaller leakage energy than ABB for the conventional SCM, while achieving the same clock frequency.
Tian WANG Xiaoxin CUI Kai LIAO Nan LIAO Xiaole CUI Dunshan YU
With the decrease in transistor feature size, power consumption, especially leakage power, has become a most important design concern. Because of their superior electrical properties and design flexibility, fin-type field-effect transistors (FinFETs) seem to be the most promising option in low-power applications. In order to support the VLSI digital system design flow based on logic synthesis, this paper proposes a design method for low-power high-performance standard cells based on IG-mode FinFETs. Such a method is derived on the basis of appropriately and artfully designing and optimizing the stacked structures in each standard cell, and applying the mixed forward and reverse back-gate bias technique in a well-chosen manner. The proposed method is also applicable when the supply voltage reduces to 0.5V to further reduce the leakage power consumption. By applying this design method, optimized IG-mode FinFET standard cells are generated, and they form a low-power high-performance standard cell library. Simulation results of the library cells indicate that the performance of the standard cells designed with the proposed method can be maintained while reducing leakage consumption by a factor of 58.9 at most. The 16-bit ripple carry adder implemented with this library can acquire up to 17.5% leakage power reduction.
Dongsheng YANG Tomohiro UENO Wei DENG Yuki TERASHIMA Kengo NAKATA Aravind Tharayil NARAYANAN Rui WU Kenichi OKADA Akira MATSUZAWA
A fully synthesizable all-digital phase-locked loop (AD-PLL) with a stochastic time-to-digital converter (STDC) is proposed in this paper. The whole AD-PLL circuit design is based only on standard cells from a digital library; thus the layout of this AD-PLL can be automatically synthesized by a commercial place-and-route (P&R) tool with a foundry-provided standard-cell library. No manual layout or process modification is required in the whole AD-PLL design. In order to solve the delay mismatch issue in the delay-line-based time-to-digital converter (TDC), an STDC employing only standard D flip-flops (DFFs) is presented to mitigate the sensitivity to layout mismatch resulting from automatic P&R. For the stochastic TDC, the key idea is to utilize the layout uncertainty due to automatic P&R, which follows a Gaussian distribution according to statistics theory. Moreover, the fully synthesized STDC can achieve a finer resolution compared to the conventional TDC. Implemented in a 28nm fully depleted silicon on insulator (FDSOI) technology, the fully synthesized PLL consumes only 480µW under a 1.0V power supply while operating at 0.9GHz. It achieves a figure of merit (FoM) of -231.1dB with 4.0ps RMS jitter while occupying only 0.0055mm² of chip area.
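The stochastic TDC principle in the abstract above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' circuit: each flip-flop is modeled as an ideal comparator with a Gaussian-distributed timing offset, and the names `make_offsets`, `stdc_count`, and `estimate_dt`, the 10 ps sigma, and the flip-flop count are all hypothetical.

```python
import random
from statistics import NormalDist


def make_offsets(n, sigma_ps, seed=0):
    """Draw one random timing offset per flip-flop (models P&R layout mismatch)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_ps) for _ in range(n)]


def stdc_count(dt_ps, offsets):
    """Number of flip-flops that decide the first edge arrived early by more than their offset."""
    return sum(1 for off in offsets if dt_ps > off)


def estimate_dt(count, n, sigma_ps):
    """Invert the Gaussian CDF to map the count of ones back to a time difference."""
    # Clamp so that all-zeros or all-ones do not hit the undefined ends of inv_cdf.
    p = min(max(count / n, 0.5 / n), 1.0 - 0.5 / n)
    return NormalDist(0.0, sigma_ps).inv_cdf(p)


N_FF = 4096          # hypothetical number of flip-flops
SIGMA_PS = 10.0      # hypothetical offset spread in picoseconds
offsets = make_offsets(N_FF, SIGMA_PS)
count = stdc_count(3.0, offsets)
estimate = estimate_dt(count, N_FF, SIGMA_PS)
```

Because many flip-flops vote in parallel, the count resolves time differences much smaller than the spread of any single flip-flop's offset, which is the sense in which mismatch is used rather than fought.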
Tsang-Chi KAN Ying-Jung CHEN Hung-Ming HONG Shanq-Jang RUAN
Well-designed redundant via-aware standard cells (SCs) can increase the redundant via1 insertion rate in cell-based designs. However, in conventional methods, manual- and visual-based checks are required to locate pins and tune the geometries of layouts. These tasks can be very time consuming and unreliable. In this work, an O(N log N) redundant via-aware standard cell optimization scheme is developed. The proposed method is an efficient layout check and optimization scheme that considers various redundant via configurations including the double-via and rectangle-via to shorten the design time for standard cells. The optimized SCs effectively increase the redundant via insertion rate, and in particular the insertion rate of via1 for both concurrent routing and post-layout optimization. Furthermore, an automatic layout checker and optimizer are more efficient in identifying expandable metal 1 pins in libraries that contain numerous cells than are conventional visual check and manual optimization. Therefore, the proposed scheme not only solves the problem of a low via1 insertion rate in nanometer regimes, but also provides an efficient layout optimizer for designing standard cells. Experimental results indicate that the optimized standard cells increase the double-via1 insertion rates by 21.9%.
Shinichi NISHIZAWA Tohru ISHIHARA Hidetoshi ONODERA
This paper proposes a structure of standard cells where the P/N boundary ratio of each cell can be independently customized for near-threshold operation. Lowering the supply voltage is one of the most promising approaches for reducing the power consumption of VLSI circuits; however, this increases the imbalance between rise and fall delays for cells having transistor stacks. A conventional cell library with a fixed P/N boundary cannot efficiently compensate for this delay imbalance. The proposed structure achieves individual P/N boundary ratio optimization for each standard cell, and therefore cancels the imbalance between rise and fall delays at the expense of cell area. The proposed structure is verified using measured results of ring oscillator circuits and simulation results of benchmark circuits in 65nm CMOS. The experiments with ISCAS'85 benchmark circuits demonstrate that the standard cell library consisting of the proposed cells reduces the power consumption of the benchmark circuits by 16% on average without increasing the circuit area, compared to that of the same circuit synthesized with a library which is not optimized for near-threshold operation.
Toshiyuki YAMAGISHI Tatsuo SHIOZAWA Koji HORISAKI Hiroyuki HARA Yasuo UNEKAWA
A completely digital, on-chip performance monitor is proposed in this paper. In addition to a traditional ring oscillator, the proposed monitor has a special buffer chain whose output duty ratio is emphasized by the difference between NMOS and PMOS performances. Thus the performances of NMOS and PMOS transistors can be accurately estimated independently. By using only standard cells, the monitor achieves a small occupied area and process portability. To demonstrate the accuracy of performance estimation and the usability of the monitor, we have fabricated the proposed monitor using a 90 nm CMOS process. The estimation errors of the drain saturation current of NMOS and PMOS transistors are 2.0% and 3.4%, respectively. A D/A converter has also been fabricated to verify the usability of the proposed monitor. The output amplitude variation of the D/A converter is successfully reduced to 50.0% by the calibration using the proposed monitor.
Li-Rong WANG Ming-Hsien TU Shyh-Jye JOU Chung-Len LEE
This paper presents a well-structured modified Booth encoding (MBE) multiplier which is applied in the design of a reconfigurable multiply-accumulator (MAC) core. The multiplier adopts an improved Booth encoder and selector to achieve an extra-row-removal and uses a hybrid approach in the two's complementation circuit to reduce the area and improve the speed. The multiplier is used to form a 32-bit reconfigurable MAC core which can be flexibly configured to execute one 32×32, two 16×16 or four 8×8 signed multiply-accumulation. Experimentally, when implemented with a 130 nm CMOS single-Vt standard cell library, the multiplier achieved a 15.8% area saving and 11.7% power saving over the classical design, and the reconfigurable MAC achieved a 4.2% area and a 7.4% power saving over the MAC design published so far if implemented with a mixed-Vt standard cell library.
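For readers unfamiliar with modified Booth encoding, the following sketch shows the textbook radix-4 recoding that halves the number of partial products. It is not the improved encoder, the extra-row removal, or the reconfigurable datapath of the paper; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2..2}.

    Digits are returned least-significant first; `bits` must be even.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    # Two's-complement bit pattern with an implicit 0 appended below the LSB.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    digits = []
    for i in range(0, bits, 2):
        triplet = (pattern >> i) & 0b111          # bits y[i+1], y[i], y[i-1]
        b_hi, b_mid, b_lo = triplet >> 2, (triplet >> 1) & 1, triplet & 1
        digits.append(-2 * b_hi + b_mid + b_lo)   # standard MBE digit value
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Sum the bits/2 partial products selected by the Booth digits."""
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_radix4_digits(multiplier, bits)))


digits_of_7 = booth_radix4_digits(7, 4)   # [-1, 2], since 2*4 - 1 = 7
```

Each digit selects 0, ±1 or ±2 times the multiplicand, so an n-bit multiplication needs only n/2 partial-product rows before any further row-reduction technique is applied.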
In this paper, we propose novel transmission-gate-based (TG-based) AND gates, TG-based OR gates, and pass-transistor logic gates that have new structures and have lower transistor counts than those proposed by other authors. All our proposed gates operate in full swing and have less leakage currents and shorter delays than conventional CMOS gates. Compared with the conventional 65 nm CMOS gates, our proposed 65 nm gates in this paper can improve leakage currents, dynamic power consumption, and propagation delays by averages of 42.4%, 8.1%, and 13.5%, respectively. Logic synthesizers can use them to facilitate power reduction. The experimental results show that a commercial power optimization tool can further reduce the leakage current and dynamic power up to 39.85% and 18.69%, respectively, when the standard cell library used by the tool contains our proposed gates.
Hirokazu MUTA Hidetoshi ONODERA
We focus our attention on the layout-dependent Across Chip Linewidth Variability (ACLV) of gate-forming poly-silicon patterns as a measure for manufacturability, which is a major contributor to systematic gate-length variation. First, we study the ACLV of standard cell layouts by lithography simulation. Then, we introduce regularity in gate-forming poly-silicon patterns and examine how it improves the ACLV and how it incurs area overhead. Based on this investigation, we propose two design guidelines for standard-cell layout that can reduce ACLV with reasonable area overhead. Those guidelines include on-grid fixed-pitch layout with dummy-poly insertion and stretched gate-poly extension. Design experiments assuming a 65 nm process technology indicate that a D-FF designed with the first guideline reduces ACLV by 35% with 14% area overhead and the second guideline reduces ACLV by 75% with 29% area overhead at the best focus condition. Under defocus conditions, both layouts exhibit stable characteristics whereas the variability of the conventional layout grows rapidly as the level of defocus increases. Circuit-level lithography simulation over benchmark circuits also supports that the proposed guidelines considerably reduce the amount of gate length variation.
Yongqiang LU Chin-Ngai SZE Xianlong HONG Qiang ZHOU Yici CAI Liang HUANG Jiang HU
As VLSI designs advance, the increasingly severe power problem calls for minimizing clock routing wirelength so that both power consumption and power supply noise can be alleviated. In contrast to most traditional works, which handle this problem only in clock routing, we propose to guide standard cell register placement to locations that enable further reductions in clock routing wirelength and power. To minimize adverse impacts on conventional cell placement goals such as signal net wirelength and critical path delay, the register placement is carried out in the context of a quadratic placement. The proposed technique is particularly effective for the recently popular prescribed skew clock routing. Experiments on benchmark circuits show encouraging results.
Futabako MATSUZAKI Kenichi YODA Junichi KOSHIYAMA Kei MOTOORI Nobuyuki YOSHIKAWA
We have proposed a top-down design methodology for the RSFQ logic circuits based on the Binary Decision Diagram (BDD). In order to show the effectiveness of the methodology, we have designed a small RSFQ microprocessor based on simple architecture. We have compared the performance of the 8-bit RSFQ microprocessor with its CMOS version. It was found that the RSFQ system is superior in terms of the operating speed though it requires extremely large area. We have also implemented and tested a 1-bit ALU that is one of the important components of the microprocessor and confirmed its correct operation.
An analog standard cell layout configuration is proposed for simplifying the design and reducing the man-hours for designing mixed analog-digital LSIs, and analog standard cells are fabricated for A-D and D-A converters with Δ-Σ modulators. This work seeks to implement 2-D cell placement with up-down and left-right mirror rotation and shorter high-impedance analog wiring than conventional 1-D placement in order to obtain high-performance analog characteristics. By considering sensitivity to noise, routing channels have been classified into 4 types: high-impedance analog, low-impedance analog, analog-digital, and digital, and efforts have been made to prevent analog wires from crossing over digital wires. In addition to power and analog ground wires, analog standard cells have built-in analog ground wires with attached wells optimized for shielding. These wires are interconnected to a new isolation cell that separates analog circuits from digital circuits and routing channels. Based on the above layout structure, 46 different types of analog standard cells have been designed. Also, the analog part of Δ-Σ type A-D and D-A converters can be automatically designed in conjunction with interactive processing and chips fabricated by using these cells. It was found that, compared to manual design, one could easily obtain a chip occupying less than 1.5-times the area with about 2/3 the man-days using this approach. In comparison with manual design, it was also found that the S/N ratio could be reduced from about 6 to 7 dB.
Keiichi KOIKE Kenji KAWAI Akira ONOZAWA Yuichiro TAKEI Yoshiji KOBAYASHI Haruhiko ICHINO
A computer-aided low-power design methodology for very high-speed Si bipolar standard cell LSI is described. In order to obtain Gbit/s-speed operation, it features a pair of differential clock channels inside cells and a highly accurate static timing analysis for back annotation. A newly developed CAD-based power optimization scheme minimizes cell currents while maintaining circuit speed. A 5.6 k gate SDH signal-processing LSI operating at 1.6 Gbit/s with only 3.9 W power consumption demonstrates the effectiveness of this design technology.
Yuk-Wah PANG Wing-yun SIT Chiu-sing CHOY Cheong-fat CHAN Wai-kuen CHAM
The performance of a synchronous VLSI system is limited by the speed of the global clock, which is further constrained by the clock skew. The self-timed design technique, based on the Muller model, improves performance by eliminating the global clock. In order to prevent hazards, a self-timed system should satisfy certain assumptions and timing constraints; therefore special cells are required. The novel Self-timed Cell Library is designed for 1.2µm CMOS technology and contains Muller C-elements, DCVSL circuits, latches and delay elements. It is very useful because: (1) It avoids any possible violations of the assumptions and timing constraints since all cells are custom designed; (2) It provides a fast and reliable model for self-timed system verification using either a SPICE simulator or a Verilog simulator; (3) It is flexible since it is compatible with an existing Standard Cell Library. In this paper, the library is described. Moreover, the simulated and measured cell characteristics are compared. Using the library, two [1×8] [8×1] matrix multipliers employing (1) the DCVSL technique and (2) the micropipeline technique have been implemented as design examples and the results are compared. In addition, this paper also demonstrates the benefits of custom-layout C-elements and a new way to realize a delay element for micropipelines. Last but not least, two new HCCs are also proposed.
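The Muller C-element named in the abstract above has a simple, well-known behavior: its output follows the inputs when they agree and holds its previous value when they differ. The following is a minimal behavioral sketch of a two-input C-element only; the class name `MullerCElement` is hypothetical, and no timing, hazard, or circuit detail from the library is modeled.

```python
class MullerCElement:
    """Behavioral model of a two-input Muller C-element (state-holding gate)."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, a, b):
        """Apply new input values and return the resulting output."""
        if a == b:
            # Inputs agree: the output switches to their common value.
            self.output = a
        # Inputs disagree: the previous output is held.
        return self.output


c = MullerCElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
# trace == [0, 1, 1, 0]
```

This hold behavior is what lets a C-element act as a rendezvous point in handshake circuits: the output changes only after both request and acknowledge signals have arrived.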
Tetsushi KOIDE Takeshi SUZUKI Shin'ichi WAKABAYASHI Noriyoshi YOSHIDA
This paper presents a new timing-driven global routing method for standard cell layout. The proposed method can explicitly consider the timing constraint between two registers and minimize the channel density under the given timing constraint. In the proposed method, first, we determine the initial global routes. Next, we improve the global routes to satisfy the timing constraint between two registers as well as to minimize the channel density. Finally, for each cell row, the nets incident to terminals on the cell row are assigned to channels to minimize the channel density using 0-1 integer linear programming. We also show the experimental results of the proposed method implemented on an engineering workstation. Experimental results show that the proposed method is quite promising.
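The objective minimized in the abstract above, channel density, is commonly defined as the maximum number of nets whose horizontal spans cross any single column of a channel. The sketch below computes only that quantity with a sweep over span endpoints; it does not reproduce the timing-driven routing or the 0-1 integer linear programming assignment, and the function name `channel_density` and the span representation are assumptions.

```python
def channel_density(spans):
    """Return the maximum number of overlapping net spans in one channel.

    `spans` is an iterable of (left_column, right_column) pairs, one per net,
    with both end columns treated as occupied.
    """
    events = []
    for left, right in spans:
        lo, hi = min(left, right), max(left, right)
        # Key 0 sorts openings before closings at the same column, so spans
        # that merely touch at a column are counted as overlapping there.
        events.append((lo, 0, +1))
        events.append((hi, 1, -1))
    events.sort()

    density = current = 0
    for _, _, delta in events:
        current += delta
        density = max(density, current)
    return density


example_density = channel_density([(0, 3), (2, 5), (4, 6)])   # 2
```

Since the density is a lower bound on the number of tracks a channel needs, an assignment of nets to channels can be compared by evaluating this function per channel and taking the sum or the maximum.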
Yasunori OGAWA Kuniichi IKEMURA Shouhei SEKI
Six chips of the GaAs standard cell LSIs have been developed for a synchronous digital hierarchy (SDH) interface unit in 10 Gbit/s optical communication systems. Two of them are the frame termination LSIs for SDH, and four are the byte multiplexing and demultiplexing LSIs. An LSI configuration with a careful thermal design was needed to realize natural air-cooling operation. As a result, the unit was composed of eight chips with six kinds of LSIs, and these LSIs consist of 1 K to 3 K gates. The LSIs were designed with the standard cell libraries based on 0.5µm gate DCFL (Direct Coupled FET Logic) operating at a low power supply voltage of 1.5V. The propagation delay time of a standard DCFL inverter was 25 ps with a power consumption of 0.45mW in the experimental results. The LSI design methodology using these libraries is discussed with the aim of processing 1.25 Gbit/s signals under a natural air-cooling condition. The maximum operating speeds of the LSIs were at least 1.4 GHz and their power consumption was below 1.8 W, which resulted in fully high-speed operation under a natural air-cooling condition at an ambient temperature of 100.
Masahiro AKIYAMA Seiji NISHI Yasushi KAWAKAMI
High-speed GaAs ICs (Integrated Circuits) using FETs (Field Effect Transistors) are reported. As fabrication techniques, ion implantation processes for both 0.5 µm and 0.2 µm gate FETs using W/Al refractory metal and a 0.2 µm recessed gate process with MBE-grown epitaxial wafers are shown. These fabrication processes are selected depending on the circuit speed and the integration level. The outline of the circuit design and examples of ICs developed for 10 Gb/s optical communication systems are also shown with the obtained characteristics.