1-17hit |
Komang OKA SAPUTRA Wei-Chung TENG Takaaki NARA
A network-based remote host clock skew measurement involves collecting the offsets, the differences between sending and receiving times, of packets from the host within a period of time. Although the variant and immeasurable delay in each packet prevents the measurer from getting the real clock offset, the local minimum delays and the majority of delays delineate the clock offset shifts, and are used by existing approaches to estimate the skew. However, events during skew measurement like time synchronization and rerouting caused by switching network interface or base transceiver station may break the trend into multi-segment patterns. Although the skew in each segment is theoretically of the same value, the skew derived from the whole offset-set usually differs with an error of unpredictable scale. In this work, a method called dynamic region of offset majority locating (DROML) is developed to detect multi-segment cases, and to precisely estimate the skew. DROML is designed to work in real-time, and it uses a modified version of the HT-based method [8] both to measure the skew of one segment and to detect the break between adjacent segments. In the evaluation section, the modified HT-based method is compared with the original method and with a linear programming algorithm (LPA) on accumulated-time and short-term measurement. The fluctuation of the modified method in the short-term experiment is 0.6 ppm (parts per million), which is obviously less than the 1.23 ppm and 1.45 ppm from the other two methods. DROML, when estimating a four-segment case, is able to output a skew of only 0.22 ppm error, compared with the result of the normal case.
Koichi FUJIWARA Kazushi KAWAMURA Masao YANAGISAWA Nozomu TOGAWA
Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.
Yukihide KOHIRA Atsushi TAKAHASHI
Multi-domain clock skew scheduling in general-synchronous framework is an effective technique to improve the performance of sequential circuits by using practical clock distribution network. Although the upper bound of performance of a circuit increases as the number of clock domains increases in multi-domain clock skew scheduling, the improvement of the performance becomes smaller while the cost of clock distribution network increases much. In this paper, a linear time algorithm that finds an optimum two-domain clock skew schedule in general-synchronous framework is proposed. Experimental results on ISCAS89 benchmark circuits and artificial data show that optimum circuits are efficiently obtained by our method in short time.
Susumu KOBAYASHI Fumihiro MINAMI
As the LSI process technology advances and the gate size becomes smaller, the signal delay on interconnect becomes a significant factor in the signal path delay. Also, as the size of interconnect structure becomes smaller, the interconnect process variations have become one of the dominant factors which influence the signal delay and thus clock skew. Therefore, controlling the influence of interconnect process variations on clock skew is a crucial issue in the advanced process technologies. In this paper, we propose a method for minimizing clock skew fluctuations caused by interconnect process variations. The proposed method identifies the suitable balance of clock buffer size and wire length in order to minimize the clock skew fluctuations caused by the interconnect process variations. Experimental results on test circuits of 28nm process technology show that the proposed method reduces the clock skew fluctuations by 30-92% compared to the conventional method.
Yanling ZHI Wai-Shing LUK Yi WANG Changhao YAN Xuan ZENG
Yield-driven clock skew scheduling was previously formulated as a minimum cost-to-time ratio cycle problem, by assuming that variational path delays are in Gaussian distributions. However in today's nanometer technology, process variations show growing impacts on this assumption, as variational delays with non-Gaussian distributions have been observed on these paths. In this paper, we propose a novel yield-driven clock skew scheduling method for arbitrary distributions of critical path delays. Firstly, a general problem formulation is proposed. By integrating the cumulative distribution function (CDF) of critical path delays, the formulation is able to handle path delays with any distributions. It also generalizes the previous formulations on yield-driven clock skew scheduling and indicates their statistical interpretations. Generalized Howard algorithm is derived for finding the critical cycles of the underlying timing constraint graphs. Moreover, an effective algorithm based on minimum balancing is proposed for the overall yield improvement. Experimental results on ISCAS89 benchmarks show that, compared with two representative existing methods, our method remarkably improves the yield by 10.25% on average (up to 14.66%).
Kazuyoshi TAKAGI Yuki ITO Shota TAKESHIMA Masamitsu TANAKA Naofumi TAKAGI
In this paper, we propose a method for layout-driven skewed clock tree synthesis for SFQ logic circuits. For a given logic circuit without a clock tree, our algorithm outputs a circuit with a synthesized clock tree and timing adjustments achieving the given clock period and a rough placement of the clocked gates. In the proposed algorithm, clocked gates are grouped into levels and the clock tree is synthesized for each level. For each level, we estimate the clock timing for all possible placements of each gate, and then we search a placement of all gates that minimizes the total number of delay elements for timing adjustment. Once the placement is obtained, we synthesize a clock tree without wire intersections. We applied the proposed method to a moderate size circuit and confirmed that clock trees satisfying given timing requirements can be synthesized automatically.
Koji OBATA Kazuyoshi TAKAGI Naofumi TAKAGI
An algorithm for clock scheduling of concurrent-flow clocking rapid single-flux-quantum (RSFQ) digital circuits is proposed. RSFQ circuit technology is an emerging technology of digital circuits. In concurrent-flow clocking RSFQ digital circuits, all logic gates are driven by clock pulses. Appropriate clock scheduling makes clock frequency of the circuits higher. Given a clock period, the proposed algorithm determines the arrival time of clock pulses and the delay that should be inserted. Experimental results show that inserted delay elements by the proposed algorithm are 59.0% fewer and the height of clock trees are 40.4% shorter on average than those by a straightforward algorithm. The proposed algorithm can also be used to minimize the clock period, thus obtaining 19.0% shorter clock periods on average.
Shinya ABE Masanori HASHIMOTO Takao ONOYE
Influence of manufacturing variability on circuit performance has been increasing because of finer manufacturing process and lowered supply voltage. In this paper, we focus on mesh-style clock distribution which is believed to be effective for reducing clock skew, and we evaluate clock skew considering manufacturing and design variabilities. Considering MOS transistor variation -- random and spatially-correlated variation -- and non-uniform flip-flop (FF) placement, we demonstrate that spatially-correlated variation and severe non-uniform FF distribution can be major sources of clock skew. We also examine the dependency of clock skew on design parameters, and reveal that finer clock mesh does not necessarily reduce clock skew.
Akira UTAGAWA Tetsuya ASAI Tetsuya HIROSE Yoshihito AMEMIYA
We present on-chip oscillator arrays synchronized by random noises, aiming at skew-free clock distribution on synchronous digital systems. Nakao et al. recently reported that independent neural oscillators can be synchronized by applying temporal random impulses to the oscillators [1],[2]. We regard neural oscillators as independent clock sources on LSIs; i.e., clock sources are distributed on LSIs, and they are forced to synchronize through the use of random noises. We designed neuron-based clock generators operating at sub-RF region (< 1 GHz) by modifying the original neuron model to a new model that is suitable for CMOS implementation with 0.25-µm CMOS parameters. Through circuit simulations, we demonstrate that i) the clock generators are certainly synchronized by pseudo-random noises and ii) clock generators exhibited phase-locked oscillations even if they had small device mismatches.
Xu ZHANG Xiaohong JIANG Susumu HORIGUCHI
The evolution of VLSI chips towards larger die size, smaller feature size and faster clock speed makes the clock distribution an increasingly important issue. In this paper, we propose a new clock distribution network (CDN), namely Variant X-Tree, based on the idea of X-Architecture proposed recently for efficient wiring within VLSI chips. The Variant X-Tree CDN keeps the nice properties of equal-clock-path and symmetric structure of the typical H-Tree CDN, but results in both a lower maximal clock delay and a lower clock skew than its H-Tree counterpart, as verified by an extensive simulation study that incorporates simultaneously the effects of process variations and on-chip inductance. We also propose a closed-form statistical models for evaluating the skew and delay of the Variant X-Tree CDN. The comparison between the theoretical results and the simulation results indicates that the proposed statistical models can be used to efficiently and rapidly evaluate the performance of the variant X-Tree CDNs.
Zheng LIU Masanori FURUTA Shoji KAWAHITO
The RC mismatch among S/H stages for time-interleaved ADCs causes a phase error and a gain error and the phase error is dominant. The paper points out that clock skew and the phase error caused by the RC mismatch have similar effects on the sampling error and then can be compensated with the clock skew compensation. Simulation results agree well with the theoretical analysis. With the phase error compensation of RC mismatch, the SNDR in 14b ADC can be improved by more than 15 dB in the case that the bandwidth of S/H circuits is 3 times the sampling frequency. This paper also proposes a method of clock skew and RC mismatch compensation in time-interleaved sample-and-hold (S/H) circuits by sampling clock phase adjusting.
Takashi SATO Junji ICHIMIYA Nobuto ONO Koutaro HACHIYA Masanori HASHIMOTO
This paper quantitatively analyzes thermal gradient of SoC and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as area occupancy of memory/logic block, power density, and floorplan on thermal gradient are studied quantitatively. Temperature difference is also evaluated from timing and reliability standpoints. Important results obtained here are 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is very floorplan sensitive. Then, we propose a procedure to amend thermal gradient. A slight floorplan modification using the proposed procedure improves on-chip thermal gradient significantly.
Masanori HASHIMOTO Tomonori YAMAMOTO Hidetoshi ONODERA
This paper discusses clock skew due to manufacturing variability and environmental change. In clock tree design, transition time constraint is an important design parameter that controls clock skew and power dissipation. In this paper, we evaluate clock skew under several variability models, and demonstrate relationship among clock skew, transition time constraint and power dissipation. Experimental results show that constraint of small transition time reduces clock skew under manufacturing and supply voltage variabilities, whereas there is an optimum constraint value for temperature gradient. Our experiments in a 0.18 µm technology indicate that clock skew is minimized when clock buffer is sized such that the ratio of output and input capacitance is four.
Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE
An on-chip, 64-Mb, embedded, DRAM MPEG-2 encoder LSI with a multimedia processor has been developed. To implement this large-scale and high-speed LSI, we have developed the hierarchical skew control of multi-clocks, with timing verification, in which cross-talk noise is considered, and simple measures taken against the IR drop in the power lines through decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been successfully attained and verified, the cross-talk noise has been considered, and, in addition, it has become possible to restrain the IR drop to 166 mV in the 162 MHz operation block.
Xiaohong JIANG Susumu HORIGUCHI
Available statistical skew models are too conservative in estimating the expected clock skew of a well-balanced H-tree. New closed form expressions are presented for accurately estimating the expected values and the variances of both the clock skew and the largest clock delay of a well-balanced H-tree. Based on the new model, clock period optimizations of wafer scale H-tree clock network are investigated under both conventional clocking mode and pipelined clocking mode. It is found that when the conventional clocking mode is used, clock period optimization of wafer scale H-tree is reduced to the minimization of expected largest clock delay under both area restriction and power restriction. On the other hand, when the pipelined clocking mode is considered, the optimization is reduced to the minimization of expected clock skew under power restriction. The results obtained in this paper are very useful in the optimization design of wafer scale H-tree clock distribution networks.
Hiroki SUTOH Kimihiro YAMAKOSHI
This paper describes a low-skew clock distribution technique for multiple targets. An automatic skew compensation circuit, that detects the round-trip delay through a pair of matched interconnection lines and corrects the delay of the variable delay lines, maintains clock skew and delay from among multiple targets below the resolution time of the variable delay lines without any manual adjustment. Measured results show that the initial clock skew of 900 ps is automatically reduced to 30 ps at a clock frequency of up to 250 MHz with 60 ps of clock jitter. Moreover, they show that the initial clock delay of 1500 ps is cancelled and 60 ps of clock delay can be achieved. The power dissipation is 100 mW at 250 MHz.
Hidenori SATO Hiroaki MATSUDA Akira ONOZAWA
This paper presents a clock routing technique called Balanced-Mesh Method (BMM) which incorporates the advantages of two famous conventional-clock-routing techniques. One is the balanced-tree method (BTM) where the clock net is routed as a tree so that the delay times of clock signal are balanced, and the other is the fixed-mesh method (FMM) where the clock net is routed as a fixed mesh driven by a large buffer. In BMM, the clock net is routed as a set of relatively small meshes of interconnects driven by relatively small buffers. Each mesh covers an area called a Mesh-Routing Region (MR) in which its delay and skew can be suppressed within a certain range. These small meshes are connected by a balanced tree with the chip clock source as its root. To implement BMM, we developed an MR-partitioning program that partitions the circuit into MR's according to a set of pre-determined constraints on the number of flip-flops and the area in each MR, and a clock-global-routing program that provides each mesh routing and the tree routing connecting meshes. We applied BMM to the design of an MPEG2-encoder LSI and achieved a skew of 210ps. In addition, the experimental results show BMM yields the lowest power dissipation compared to conventional methods.