This letter introduces an innovation for the heterogeneous storage architecture of AI chips, specifically focusing on the integration of six transistors(6T) and eight transistors(8T) hybrid SRAM. Traditional approaches to reducing SRAM power consumption typically involve lowering the operating voltage, a method that often substantially diminishes the recognition rate of neural networks. However, the innovative design detailed in this letter amalgamates the strengths of both SRAM types. It operates at a voltage lower than conventional SRAM, thereby significantly reducing the power consumption in neural networks without compromising performance.
Masaya MIYAHARA Zule XU Takehito ISHII Noritoshi KIMURA
In this paper, we propose a hybrid crystal oscillator which achieves both quick startup and low steady-state power consumption. At startup, a large negative resistance is realized by configuring a Pierce oscillating circuit with a multi-stage inverter amplifier, resulting in high-speed startup. During steady-state oscillation, the oscillator is reconfigured as a class-C complementary Colpitts circuit for low power consumption and low phase noise. Prototype chips were fabricated in 65nm CMOS process technology. With Pierce-type configuration, the measured startup time and startup energy of the oscillator are reduced to 1/11 and 1/5, respectively, compared with the one without Pierce-type configuration. The power consumption during steady oscillation is 30 µW.
Lingxiao HOU Yutaka MASUDA Tohru ISHIHARA
The approximate logarithmic multiplier proposed by Mitchell provides an efficient alternative for processing dense multiplication or multiply-accumulate operations in applications such as image processing and real-time robotics. It offers the advantages of small area, high energy efficiency and is suitable for applications that do not necessarily achieve high accuracy. However, its maximum error of 11.1% makes it challenging to deploy in applications requiring relatively high accuracy. This paper proposes a novel operand decomposition method (OD) that decomposes one multiplication into the sum of multiple approximate logarithmic multiplications to widely reduce Mitchell multiplier errors while taking full advantage of its area savings. Based on the proposed OD method, this paper also proposes an accuracy reconfigurable multiply-accumulate (MAC) unit that provides multiple reconfigurable accuracies with high parallelism. Compared to a MAC unit consisting of accurate multipliers, the area is significantly reduced to less than half, improving the hardware parallelism while satisfying the required accuracy for various scenarios. The experimental results show the excellent applicability of our proposed MAC unit in image smoothing and robot localization and mapping application. We have also designed a prototype processor that integrates the minimum functionality of this MAC unit as a vector accelerator and have implemented a software-level accuracy reconfiguration in the form of an instruction set extension. We experimentally confirmed the correct operation of the proposed vector accelerator, which provides the different degrees of accuracy and parallelism at the software level.
Xiangyu MENG Kangfeng WEI Zhiyi YU Xinlun CAI
This paper proposes a low-power 100Gb/s four-level pulse amplitude modulation driver (PAM-4 Driver) based on linear distortion compensation structure for thin-film Lithium Niobate (LiNbO3) modulators, which manages to achieve high linearity in the output. The inductive peaking technology and open drain structure enable the overall circuit to achieve a 31-GHz bandwidth. With an area of 0.292 mm2, the proposed PAM-4 driver chip is designed in a 65-nm process to achieve power consumption of 37.7 mW. Post-layout simulation results show that the power efficiency is 0.37 mW/Gb/s, RLM is more than 96%, and the FOM value is 8.84.
Hongjie XU Jun SHIOMI Hidetoshi ONODERA
Hardware accelerators are designed to support a specialized processing dataflow for everchanging deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties to describe the cost of data movement in each memory hierarchy. Based on the hardware properties, this paper proposes a set of evaluation metrics that are able to evaluate the number of memory accesses and the required memory capacity according to the specialized processing dataflow. Proposed metrics are able to analytically predict energy, throughput, and area of a hardware design without detailed implementation. Once a processing dataflow and constraints of hardware resources are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.
Akira KITAYAMA Goichi ONO Tadashi KISHIMOTO Hiroaki ITO Naohiro KOHMU
Reducing power consumption is crucial for edge devices using convolutional neural network (CNN). The zero-skipping approach for CNNs is a processing technique widely known for its relatively low power consumption and high speed. This approach stops multiplication and accumulation (MAC) when the multiplication results of the input data and weight are zero. However, this technique requires large logic circuits with around 5% overhead, and the average rate of MAC stopping is approximately 30%. In this paper, we propose a precise zero-skipping method that uses input data and simple logic circuits to stop multipliers and accumulators precisely. We also propose an active data-skipping method to further reduce power consumption by slightly degrading recognition accuracy. In this method, each multiplier and accumulator are stopped by using small values (e.g., 1, 2) as input. We implemented single shot multi-box detector 500 (SSD500) network model on a Xilinx ZU9 and applied our proposed techniques. We verified that operations were stopped at a rate of 49.1%, recognition accuracy was degraded by 0.29%, power consumption was reduced from 9.2 to 4.4 W (-52.3%), and circuit overhead was reduced from 5.1 to 2.7% (-45.9%). The proposed techniques were determined to be effective for lowering the power consumption of CNN-based edge devices such as FPGA.
Zheng SUN Hanli LIU Dingxin XU Hongye HUANG Bangan LIU Zheng LI Jian PANG Teruki SOMEYA Atsushi SHIRANE Kenichi OKADA
This paper presents a high jitter performance injection-locked clock multiplier (ILCM) using an ultra-low power (ULP) voltage-controlled oscillator (VCO) for IoT application in 65-nm CMOS. The proposed transformer-based VCO achieves low flicker noise corner and sub-100µW power consumption. Double cross-coupled NMOS transistors sharing the same current provide high transconductance. The network using high-Q factor transformer (TF) provides a large tank impedance to minimize the current requirement. Thanks to the low current bias with a small conduction angle in the ULP VCO design, the proposed TF-based VCO's flicker noise can be suppressed, and a good PN can be achieved in flicker region (1/f3) with sub-100µW power consumption. Thus, a high figure-of-merit (FoM) can be obtained at both 100kHz and 1MHz without additional inductor. The proposed VCO achieves phase noise of -94.5/-115.3dBc/Hz at 100kHz/1MHz frequency offset with a 97µW power consumption, which corresponds to a -193/-194dBc/Hz VCO FoM at 2.62GHz oscillation frequency. The measurement results show that the 1/f3 corner is below 60kHz over the tuning range from 2.57GHz to 3.40GHz. Thanks to the proposed low power VCO, the total ILCM achieves 78 fs RMS jitter while using a high reference clock. A 960 fs RMS jitter can be achieved with a 40MHz common reference and 107µW corresponding power.
Yasuaki ISSHIKI Dai SUZUKI Ryo ISHIDA Kousuke MIYAJI
This paper proposes and demonstrates a 65nm CMOS process cascode single-inductor-dual-output (SIDO) boost converter whose outputs are Li-ion battery and 1V low voltage supply for RF wireless power transfer (WPT) receiver. The 1V power supply is used for internal control circuits to reduce power consumption. In order to withstand 4.2V Li-ion battery output, cascode 2.5V I/O PFETs are used at the power stage. On the other hand, to generate 1V while maintaining 4.2V tolerance at 1V output, cascode 2.5V I/O NFETs output stage is proposed. Measurement results show conversion efficiency of 87% at PIN=7mW, ILOAD=1.6mA and VBAT=4.0V, and 89% at PIN=7.9mW, ILOAD=2.1mA and VBAT=3.4V.
Xiao XU Tsuyoshi SUGIURA Toshihiko YOSHIMASU
This paper presents two ultra-low voltage and high performance VCO ICs with two novel transformer-based harmonic tuned tanks. The first proposed harmonic tuned tank effectively shapes the pseudo-square drain-node voltage waveform for close-in phase noise reduction. To compensate the voltage drop caused by the transformer, an improved second tank is proposed. It not only has tuned harmonic impedance but also provides a voltage gain to enlarge the output voltage swing over supply voltage limitation. The VCO with second tank exhibits over 3 dB better phase noise performance in 1/f2 region among all tuning range. The two VCO ICs are designed, fabricated and measured on wafer in 45-nm SOI CMOS technology. With only 0.3 V supply voltage, the proposed two VCO ICs exhibit best phase noise of -123.3 and -127.2 dBc/Hz at 10 MHz offset and related FoMs of -191.7 and -192.2 dBc/Hz, respectively. The frequency tuning ranges of them are from 14.05 to 15.14 GHz and from 14.23 to 15.68 GHz, respectively.
Makoto NISHIZAWA Kento HASEGAWA Nozomu TOGAWA
In IoT (Internet-of-Things) era, the number and variety of hardware devices becomes continuously increasing. Several IoT devices are utilized at infrastructure equipments. How to maintain such IoT devices is a serious concern. Capacitance measurement is one of the powerful ways to detect anomalous states in the structure of the hardware devices. Particularly, measuring capacitance while the hardware device is running is a major challenge but no such researches proposed so far. This paper proposes a capacitance measuring device which measures device capacitance in operation. We firstly combine the AC (alternating current) voltage signal with the DC (direct current) supply voltage signal and generates the fluctuating signal. We supply the fluctuating signal to the target device instead of supplying the DC supply voltage. By effectively filtering the observed current in the target device, the filtered current can be proportional to the capacitance value and thus we can measure the target device capacitance even when it is running. We have implemented the proposed capacitance measuring device on the printed wiring board with the size of 95mm × 70mm and evaluated power consumption and accuracy of the capacitance measurement. The experimental results demonstrate that power consumption of the proposed capacitance measuring device is reduced by 65% in low-power mode from measuring mode and proposed device successfully measured capacitance in 0.002μF resolution.
Guiping JIN Guangde ZENG Long LI Wei WANG Yuehui CUI
A triple-band CP rectenna for ambient RF energy harvesting is presented in this paper. A simple broadband CP slot antenna has been proposed with the bandwidth of 51.1% operating from 1.53 to 2.58GHz, which can cover GSM-1800, UMTS-2100 and 2.45GHz WLAN bands. Accordingly, a triple-band rectifying circuit is designed to convert RF energy in the above bands, with the maximum RF-DC conversion efficiency of 42.5% at a relatively low input power of -5dBm. Additionally, the rectenna achieves the maximum conversion efficiency of 12.7% in the laboratory measurements. The measured results show a good performance in the laboratory measurements.
Deng-Fong LU Chin HSIA Kun-Chu LEE
The paper presents a low power, wideband operational trans-conductance amplifier (OTA) for applications to drive large capacitive loads. In order to satisfy the low static power dissipation, high-speed, while reserving high current driving capability, the complementary slew-rate enhancer in conjunction with a dual class AB input stage to improve the slew-rate of a rail-to-rail two-stage OTA is proposed. The proposed architecture was implemented using 0.5µm CMOS process with a supply voltage of 5V. The slew-rate can achieve 68V/µsec at static power dissipation of 0.9mW, which can be used to efficiently drive larger than 6 nF capacitive load. The measured output has a total harmonic distortion of less than 5%.
Po-Yu KUO Chia-Hsin HSIEH Jin-Fa LIN Ming-Hwa SHEU Yi-Ting HUNG
A novel low power sense-amplifier based flip-flop (FF) is presented. By using a simplified SRAM based latch design and pass transistor logic (PTL) circuit scheme, the transistor-count of the FF design is greatly reduced as well as leakage power performance. The performance claims are verified through extensive post-layout simulations. Compared to the conventional sense-amplifier FF design, the proposed circuit achieves 19.6% leakage reduction. Moreover, the delay, and area are reduced by 21.8% and 31%, respectively. The performance edge becomes even better when the flip-flop is integrated in N-bit register file.
Mizuki MOTOYOSHI Suguru KAMEDA Noriharu SUEMATSU
In this paper, we proposed low power consumption ASK transmitter based on the direct modulated oscillator at 60GHz-band. To achieve the proposed transmitter, high power-efficient oscillator and loss less modulator are designed. Moreover combined on-chip resonator and antenna to remove the buffer amplifier of the transmitter to reduce the power consumption and size. The proposed transmitter has been fabricated in standard 65nm CMOS process. The core area is 1130µm×590µm with pads. The operation frequency is 60.4GHz. The BER of 10-6 is achieved under 50Mbps with power consumption of less than 260µW including the buffer amplifier. Using the proposed combined on-chip resonator and antenna, which need no buffer amplifier for transmitter and the power consumption is reduced to 180µW.
Xiao XU Tsuyoshi SUGIURA Toshihiko YOSHIMASU
This paper presents two ultra-low voltage and high performance VCO ICs with novel harmonic tuned LC tank which provides different harmonic impedance and shapes the pseudo-square drain voltage waveform of transistors. In the novel tank, two additional inductors are connected between the drains of the cross-coupled pMOSFETs and the conventional LC tank, and they effectively decrease second harmonic load impedance and increase third harmonic load impedance of the transistors. In this paper, the novel harmonic tuned LC tank is applied to two different structure VCOs. These two VCOs exhibit over 2 dB better phase noise performance than conventional LC tank VCOs among all tuning range. The conventional and proposed VCO ICs are designed, fabricated and measured on wafer in 45-nm SOI CMOS technology. With novel harmonic tuned LC tank, the novel two VCOs exhibit measured best phase-noise of -125.7 and -129.3 dBc/Hz at 10 MHz offset and related FoM of -190.2 and -190.5 dBc/Hz at a supply voltage of 0.3 V and 0.35 V, respectively. Frequency tuning range of the two VCOs are from 13.01 to 14.34 GHz and from 15.02 to 16.03GHz, respectively.
Kenya HAYASHI Shigeki ARATA Ge XU Shunya MURAKAMI Cong Dang BUI Atsuki KOBAYASHI Kiichi NIITSU
This work presents an FSK inductive-coupling transceiver using a load-modulated transmitter and LC-oscillator-based receiver for energy-budget-unbalanced applications. By introducing the time-domain load modulated transmitter for FSK instead of the conventional current-driven scheme, energy reduction of the transmitter side is possible. For verifying the proposed scheme, a test chip was fabricated in 65nm CMOS, and two chips were stacked for verifying the inter-chip communication. The measurement results show 0.64fJ/bit transmitter power consumption while its input voltage is 60mV, and the communication distance is 150μm. The footprint of the transmitter is 0.0016mm2.
Takuya KOYANAGI Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
Body bias generators are useful circuits that can reduce variability and power dissipation in LSI circuits. However, the amplifier implemented into the body bias generator is difficult to design because of its complexity. To overcome the difficulty, this paper proposes a clearer cell-based design method of the amplifier than the existing cell-based design methods. The proposed method is based on a simple analytical model, which enables to easily design the amplifiers under various operating conditions. First, we introduce a small signal equivalent circuit of two-stage amplifiers by which we approximate a three-stage amplifier, and introduce a method for determining its design parameters based on the analytical model. Second, we propose a method of tuning parameters such as cell-based phase compensation elements and drive-strength of the output stage. Finally, based on the test chip measurement, we show the advantage of the body bias generator we designed in a cell-based flow over existing designs.
Seong Jin CHOE Ju Sang LEE Sung Sik PARK Sang Dae YU
This paper presents an ultra-low-power class-AB bulk-driven operational transconductance amplifier operating in the subthreshold region. Employing the partial positive feedback in current mirrors, the effective transconductance and output voltage swing are enhanced considerably without additional power consumption and layout area. Both traditional and proposed OTAs are designed and simulated for a 180 nm CMOS process. They dissipate an ultra low power of 192 nW. The proposed OTA features not only a DC gain enhancement of 14 dB but also a slew rate improvement of 200%. In addition, the improved gain leads to a 5.3 times wider unity-gain bandwidth than that of the traditional OTA.
Tongxin YANG Tomoaki UKEZONO Toshinori SATO
Many applications, such as image signal processing, has an inherent tolerance for insignificant inaccuracies. Multiplication is a key arithmetic function for many applications. Approximate multipliers are considered an efficient technique to trade off energy relative to performance and accuracy for the error-tolerant applications. Here, we design and analyze four approximate multipliers that demonstrate lower power consumption and shorter critical path delay than the conventional multiplier. They employ an approximate tree compressor that halves the height of the partial product tree and generates a vector to compensate accuracy. Compared with the conventional Wallace tree multiplier, one of the evaluated 8-bit approximate multipliers reduces power consumption and critical path delay by 36.9% and 38.9%, respectively. With a 0.25% normalized mean error distance, the silicon area required to implement the multiplier is reduced by 50.3%. Our multipliers outperform the previously proposed approximate multipliers relative to power consumption, critical path delay, and design area. Results from two image processing applications also demonstrate that the qualities of the images processed by our multipliers are sufficiently accurate for such error-tolerant applications.
In this paper, we review a super-steep subthreshold slope (SS) (<1 mV/dec) body-tied (BT) silicon on insulator (SOI) metal oxide semiconductor field effect transistor (MOSFET) fabricated with 0.15 µm SOI technology and discuss the possibility of its use in ultralow voltage applications. The mechanism of the super-steep SS in the BT SOI MOSFET was investigated with technology computer-aided design simulation. The gate length/width and Si thickness optimizations promise further reductions in operation voltage, as well as improvement of the ION/IOFF ratio. In addition, we demonstrated control of the threshold voltage and hysteresis characteristics using the substrate and body bias in the BT SOI MOSFET.