Junnosuke HOSHIDO Tonan KAMATA Tsutomu ANSAI Ryuhei UEHARA
Shin-ichi NAKANO
Shang LU Kohei HATANO Shuji KIJIMA Eiji TAKIMOTO
Lin ZHOU Yanxiang CAO Qirui WANG Yunling CHENG Chenghao ZHUANG Yuxi DENG
Zhen WANG Longye WANG
Naohiro TODA Tetsuya NAKAGAMI
Haijun Wang Tao Hu Dongdong Chen Huiwei Yao Runze He Di Wu Zhifu Tian
Jianqiang NI Gaoli WANG Yingxin LI Siwei SUN
Rui CHENG Yun JIANG Qinglin ZHANG Qiaoqiao XIA
Ren TOGO Rintaro YANAGI Masato KAWAI Takahiro OGAWA Miki HASEYAMA
Naoki TATTA Yuki SAKATA Rie JINKI Yuukou HORITA
Kundan LAL DAS Munehisa SEKIKAWA Naohiko INABA
Menglong WU Tianao YAO Zhe XING Jianwen ZHANG Yumeng LIN
Jian ZHANG Zhao GUANG Wanjuan SONG Zhiyan XU
Shinya Matsumoto Daiki Ikemoto Takuya Abe Kan Okubo Kiyoshi Nishikawa
Kazuki HARADA Yuta MARUYAMA Tomonori TASHIRO Gosuke OHASHI
Zezhong WANG Masayuki SHIMODA Atsushi TAKAHASHI
Pierpaolo AGAMENNONE
Jianmao XIAO Jianyu ZOU Yuanlong CAO Yong ZHOU Ziwei YE Xun SHAO
Kazumasa ARIMURA Ryoichi MIYAUCHI Koichi TANNO
Shinichi NISHIZAWA Shinji KIMURA
Zhe LIU Wu GUAN Ziqin YAN Liping LIANG
Shuichi OHNO Shenjian WANG Kiyotsugu TAKABA
Yindong CHEN Wandong CHEN Dancheng HUANG
Xiaohe HE Zongwang LI Wei HUANG Junyan XIANG Chengxi ZHANG Zhuochen XIE Xuwen LIANG
Conggai LI Feng LIU Yingying LI Yanli XU
Siwei Yang Tingli Li Tao Hu Wenzhi Zhao
Takahiro FUJITA Kazuyuki WADA
Kazuma TAKA Tatsuya ISHIKAWA Kosei SAKAMOTO Takanori ISOBE
Quang-Thang DUONG Kohei MATSUKAWA Quoc-Trinh VO Minoru OKADA
Sihua LIU Xiaodong ZHU Kai KANG Li WAN Yong WANG
Kazuya YAMAMOTO Nobukazu TAKAI
Yasuhiro Sugimoto Nobukazu Takai
Ho-Lim CHOI
Weibang DAI Xiaogang CHEN Houpeng CHEN Sannian SONG Yichen SONG Shunfen LI Tao HONG Zhitang SONG
Duo Zhang Shishan Qi
Young Ghyu Sun Soo Hyun Kim Dong In Kim Jin Young Kim
Hongbin ZHANG Ao ZHAN Jing HAN Chengyu WU Zhengqiang WANG
Yuli YANG Jianxin SONG Dan YU Xiaoyan HAO Yongle CHEN
Kazuki IWAHANA Naoto YANAI Atsuo INOMATA Toru FUJIWARA
Rikuto KURAHARA Kosei SAKAMOTO Takanori ISOBE
Elham AMIRI Mojtaba JOODAKI
Qingqi ZHANG Xiaoan BAO Ren WU Mitsuru NAKATA Qi-Wei GE
Jiaqi Wang Aijun Liu Changjun Yu
Ruo-Fei Wang Jia Zhang Jun-Feng Liu Jing-Wei Tang
Yingnan QI Chuhong TANG Haiyang LIU Lianrong MA
Yi XIONG Senanayake THILAK Daisuke ARAI Jun IMAOKA Masayoshi YAMAMOTO
Zhenhai TAN Yun YANG Xiaoman WANG Fayez ALQAHTANI
Chenrui CHANG Tongwei LU Feng YAO
Takuma TSUCHIDA Rikuho MIYATA Hironori WASHIZAKI Kensuke SUMOTO Nobukazu YOSHIOKA Yoshiaki FUKAZAWA
Shoichi HIROSE Kazuhiko MINEMATSU
Toshimitsu USHIO
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Qingping YU Yuan SUN You ZHANG Longye WANG Xingwang LI
Qiuyu XU Kanghui ZHAO Tao LU Zhongyuan WANG Ruimin HU
Lei Zhang Xi-Lin Guo Guang Han Di-Hui Zeng
Meng HUANG Honglei WEI
Yang LIU Jialong WEI Shujian ZHAO Wenhua XIE Niankuan CHEN Jie LI Xin CHEN Kaixuan YANG Yongwei LI Zhen ZHAO
Ngoc-Son DUONG Lan-Nhi VU THI Sinh-Cong LAM Phuong-Dung CHU THI Thai-Mai DINH THI
Lan XIE Qiang WANG Yongqiang JI Yu GU Gaozheng XU Zheng ZHU Yuxing WANG Yuwei LI
Jihui LIU Hui ZHANG Wei SU Rong LUO
Shota NAKAYAMA Koichi KOBAYASHI Yuh YAMASHITA
Wataru NAKAMURA Kenta TAKAHASHI
Chunfeng FU Renjie JIN Longjiang QU Zijian ZHOU
Masaki KOBAYASHI
Shinichi NISHIZAWA Masahiro MATSUDA Shinji KIMURA
Keisuke FUKADA Tatsuhiko SHIRAI Nozomu TOGAWA
Yuta NAGAHAMA Tetsuya MANABE
Baoxian Wang Ze Gao Hongbin Xu Shoupeng Qin Zhao Tan Xuchao Shi
Maki TSUKAHARA Yusaku HARADA Haruka HIRATA Daiki MIYAHARA Yang LI Yuko HARA-AZUMI Kazuo SAKIYAMA
Guijie LIN Jianxiao XIE Zejun ZHANG
Hiroki FURUE Yasuhiko IKEMATSU
Longye WANG Lingguo KONG Xiaoli ZENG Qingping YU
Ayaka FUJITA Mashiho MUKAIDA Tadahiro AZETSU Noriaki SUETAKE
Xingan SHA Masao YANAGISAWA Youhua SHI
Jiqian XU Lijin FANG Qiankun ZHAO Yingcai WAN Yue GAO Huaizhen WANG
Sei TAKANO Mitsuji MUNEYASU Soh YOSHIDA Akira ASANO Nanae DEWAKE Nobuo YOSHINARI Keiichi UCHIDA
Kohei DOI Takeshi SUGAWARA
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Mingjie LIU Chunyang WANG Jian GONG Ming TAN Changlin ZHOU
Hironori UCHIKAWA Manabu HAGIWARA
Atsuko MIYAJI Tatsuhiro YAMATSUKI Tomoka TAKAHASHI Ping-Lun WANG Tomoaki MIMOTO
Kazuya TANIGUCHI Satoshi TAYU Atsushi TAKAHASHI Mathieu MOLONGO Makoto MINAMI Katsuya NISHIOKA
Masayuki SHIMODA Atsushi TAKAHASHI
Yuya Ichikawa Naoko Misawa Chihiro Matsui Ken Takeuchi
Katsutoshi OTSUKA Kazuhito ITO
Rei UEDA Tsunato NAKAI Kota YOSHIDA Takeshi FUJINO
Motonari OHTSUKA Takahiro ISHIMARU Yuta TSUKIE Shingo KUKITA Kohtaro WATANABE
Iori KODAMA Tetsuya KOJIMA
Yusuke MATSUOKA
Yosuke SUGIURA Ryota NOGUCHI Tetsuya SHIMAMURA
Tadashi WADAYAMA Ayano NAKAI-KASAI
Li Cheng Huaixing Wang
Beining ZHANG Xile ZHANG Qin WANG Guan GUI Lin SHAN
Soh YOSHIDA Nozomi YATOH Mitsuji MUNEYASU
Ryo YOSHIDA Soh YOSHIDA Mitsuji MUNEYASU
Nichika YUGE Hiroyuki ISHIHARA Morikazu NAKAMURA Takayuki NAKACHI
Ling ZHU Takayuki NAKACHI Bai ZHANG Yitu WANG
Toshiyuki MIYAMOTO Hiroki AKAMATSU
Yanchao LIU Xina CHENG Takeshi IKENAGA
Kengo HASHIMOTO Ken-ichi IWATA
Hiroshi FUJISAKI
Tota SUKO Manabu KOBAYASHI
Akira KAMATSUKA Koki KAZAMA Takahiro YOSHIDA
Manabu HAGIWARA
Hongwei ZHU Ilie I. LUICAN Florin BALASA
In real-time multimedia processing systems a very large part of the power consumption is due to the data storage and data transfer. Moreover, the area cost is often largely dominated by the memory modules. In deriving an optimized (for area and/or power) memory architecture, memory size computation is an important step in the exploration of the possible algorithmic specifications of multimedia applications. This paper presents a novel non-scalar approach for computing exactly the memory size in real-time multimedia algorithms. This methodology uses both algebraic techniques specific to the data-flow analysis used in modern compilers and, also, more recent advances in the theory of polyhedra. In contrast with all the previous works which are only estimation methods, this approach performs exact memory computations even for applications significantly large in terms of the code size, number of scalars, and number of array references.
Thanyapat SAKUNKONCHAK Satoshi KOMATSU Masahiro FUJITA
Concurrency is one of the most important issues in system-level design. Interleaving among parallel processes can cause an extremely large number of different behaviors, making design and verification difficult tasks. In this work, we propose a synchronization verification method for system-level designs described in the SpecC language. Instead of modeling the design with timed FSMs and using a model checker for timed automata (such as UPPAAL or KRONOS), we formulate the timing constraints with equalities/inequalities that can be solved by integer linear programming (ILP) tools. Verification is conducted in two steps. First, similar to other software model checkers, we compute the reachability of an error state in the absence of timing constraints. Then, if a path to an error state exists, its feasibility is checked by using the ILP solver to evaluate the timing constraints along the path. This approach can drastically increase the sizes of the designs that can be verified. Abstraction and abstraction refinement techniques based on the Counterexample-Guided Abstraction Refinement (CEGAR) paradigm are applied.
Yu LIU Satoshi KOMATSU Masahiro FUJITA
Recently, system level design languages (SLDLs), which can describe both hardware and software aspects of the design, are receiving attentions. Analog mixed-signal (AMS) extensions to SLDLs enable current discrete-oriented SLDLs to describe and simulate not only digital systems but also digital-analog mixed-signal systems. In this paper, we present our work on the AMS extension to one of the system level design language--SpecC. The extended language supports designer to describe all the analog, digital and software aspects in a universal language.
A unified representation for various kinds of speculations and global scheduling algorithms is presented. After introducing several types of local and global speculations, reviewing our conventional method called conditional vector-based list scheduling, and discussing some of its limitations, we introduce the unique notion of generalized condition vectors (GCVs), which can represent most varieties of speculations and multiple branches as a single vector. The unification of parallel branches and partially unresolved nested conditional branches is discussed. Then, a scheduling algorithm using GCVs is proposed. Experimental results show the effectiveness of the GCV-based scheduling method.
Mitsuru TOMONO Masaki NAKANISHI Shigeru YAMASHITA Kazuo NAKAJIMA Katsumasa WATANABE
In a partially reconfigurable FPGA of the future, arbitrary portions of its logic resources and interconnection networks will be reconfigured without affecting the other parts. Multiple tasks will be mapped and executed concurrently in such an FPGA. Efficient execution of the tasks using the limited resources of the FPGA will necessitate effective resource management. A number of online FPGA placement methods have recently been proposed for such an FPGA. However, they cannot handle I/O communications of the tasks. Taking such I/O communications into consideration, we introduce a new approach to online FPGA placement. We present an algorithm for placing each arriving task in an empty area so as to complete all the tasks efficiently. We develop two fitting strategies to effectively handle I/O communications of the tasks. Our experimental results show that properly weighted combinations of these and two other previously proposed strategies enable this algorithm to run very fast and make an effective placement of the tasks. In fact, we show that the overhead associated with the use of this algorithm is negligible as compared to the total execution time of the tasks.
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA
High-level synthesis is a novel method to generate a RT-level hardware description automatically from a high-level language such as C, and is used at recent digital circuit design. Floating-point to fixed-point conversion with bit-length optimization is one of the key issues for the area and speed optimization in high-level synthesis. However, the conversion task is a rather tedious work for designers. This paper introduces automatic bit-length optimization method on floating-point to fixed-point conversion for high-level synthesis. The method estimates computational errors statistically, and formalizes an optimization problem as a non-linear problem. The application of NLP technique improves the balancing between computational accuracy and total hardware cost. Various constraints such as unit sharing, maximum bit-length of function units can be modeled easily, too. Experimental result shows that our method is fast compared with typical one, and reduces the hardware area.
Bakhtiar Affendi ROSDI Atsushi TAKAHASHI
A new algorithm is proposed to reduce the number of intermediate registers of a pipelined circuit using a combination of multi-clock cycle paths and clock scheduling. The algorithm analyzes the pipelined circuit and determines the intermediate registers that can be removed. An efficient subsidiary algorithm is presented that computes the minimum feasible clock period of a circuit containing multi-clock cycle paths. Experiments with a pipelined adder and multiplier verify that the proposed algorithm can reduce the number of intermediate registers without degrading performance, even when delay variations exist.
Debatosh DEBNATH Tsutomu SASAO
This paper presents an efficient technique for solving a Boolean matching problem in cell-library binding, where the number of cells in the library is large. As a basis of the Boolean matching, we use the notion NP-representative (NPR): two functions have the same NPR if one can be obtained from the other by a permutation and/or complementation(s) of the variables. By using a table look-up and a tree-based breadth-first search strategy, our method quickly computes the NPR for a given function. Boolean matching of the given function against the whole library is determined by checking the presence of its NPR in a hash table, which stores NPRs for all the library functions and their complements. The effectiveness of our method is demonstrated through experimental results, which show that it is more than two orders of magnitude faster than the Hinsberger-Kolla's algorithm.
Xingwen XU Shinji KIMURA Kazunari HORIKAWA Takehiko TSUCHIYA
Lack of complete formal specification is one of the major obstacles to the deployment of model checking. Coverage estimation addresses this issue by revealing the unverified part of the design according to the specified properties. In this paper we propose a new transition-based coverage metric to evaluate the completeness of properties for symbolic model checking. Our coverage metric pinpoints the transitions through which the values of signals are checked. An efficient symbolic algorithm is presented for computing the transition coverage for a subset of ACTL. Our coverage estimator has been applied to the model checking of a cache coherence protocol. We uncovered several coverage holes including one that eventually led to the discovery of a design bug.
Yuichi NAKAMURA Takeshi YOSHIMURA
This paper presents a novel power estimation method for large and complex LSIs. The proposed method is based on simulation and is used for analyzing the ways in chip-scale gate-level circuits including processors and memory are affected by gated-clock power reduction and the voltage drop due to electrical resistance. The chip-scale power estimation based on simulation patterns generally takes enormous time. In order to reduce the time to obtain accurate estimation results based on simulation patterns, we introduce three approaches: "partitioning of target LSIs and simulation pattern," "memory modeling," and "processor modeling." After placing and routing, the target LSIs are partitioned into hierarchical blocks, memory, and processors. The power consumption of each hierarchical block is calculated by using the partitioned patterns generated from chip-scale simulation patterns. The power consumption of the processor and memory blocks is estimated by a method considering the static power consumption and the rate of LSI activity ratio. Experimental results for a commercial 0.18 µm-technology media processing chip show that the proposed method is 23 times faster than the conventional method without partitioning and that both the results are almost the same.
Yuichi NAKAMURA Kouhei HOSOKAWA
This paper describes a new method for the simulation environment for a custom processor. It is generally very hard to develop an accurate simulator for a custom processor rapidly, even if simple instruction-set-level simulator (ISS). The proposed method uses a field-programmable-gate-array emulator with a PCI interface and debugging GUI software on a PC. Since the emulator implements the processor design at the register-transfer or net-list level, the emulation results are almost the same as the results obtained with the actual processor. To support rich debugging functions like those provided by the conventional software simulator, we use a debugging buffer and break-control circuits. Experimental results show that a simulator constructed by the proposed method can be constructed within several hours and that it can break the processor operation at any specified point and observe the internal signals when the emulated system is running at 1-30 MHz. The accuracy of the constructed simulator is the same as that of RTL simulation and much higher than that of software ISS simulation. We show that we can provide a fast, accurate, and useful simulator for any processor design specified at the register-transfer level.
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA
This paper represents a cycle-based logic simulation method using an LUT cascade emulator, where an LUT cascade consists of multiple-output LUTs (cells) connected in series. The LUT cascade emulator is an architecture that emulates LUT cascades. It has a control part, a memory for logic, and registers. It connects the memory to registers through a programmable interconnection circuit, and evaluates the given circuit stored in the memory. The LUT cascade emulator runs on an ordinary PC. This paper also compares the method with a Levelized Compiled Code (LCC) simulator and a simulator using a Quasi-Reduced Multi-valued Decision Diagram (QRMDD). Our simulator is 3.5 to 10.6 times faster than the LCC, and 1.1 to 3.9 times faster than the one using a QRMDD. The simulation setup time is 2.0 to 9.8 times shorter than the LCC. The necessary amount of memory is 1/1.8 to 1/5.5 of the one using a QRMDD.
One of the critical issues in MTCMOS design is how to estimate a circuit delay quickly. In MTCMOS circuit, voltage on virtual ground fluctuates due to a discharge current of a logic cell. This event affects to the cell delay and makes static timing analysis (STA) difficult. In this paper, we propose a delay modeling and static STA methodology targeting at MTCMOS circuits. In the proposed method, we prepare a delay look-up table (LUT) consisting of the input slew, the output load capacitance, the virtual ground length, and a power-switch size. Using this LUT, we compute a circuit delay for each logic cell by applying the linear interpolation. This technique enables to calculate the cell delay considering the delay increase by the voltage fluctuation of virtual ground line. Experimental results show that the proposed methodology enables to estimate the cell delay and the critical path delay within 8% errors compared with SPICE simulation.
Takashi SATO Junji ICHIMIYA Nobuto ONO Masanori HASHIMOTO
In this paper, we propose a methodology for calculating on-chip temperature gradient and leakage power distributions. It considers the interdependence between leakage power and local temperature using a general circuit simulator as a differential equation solver. The proposed methodology can be utilized in the early stages of the design cycle as well as in the final verification phase. Simulation results proved that consideration of the temperature dependence of the leakage power is critically important for achieving reliable physical designs since the conventional temperature analysis that ignores the interdependence underestimates leakage power considerably and may overlook potential thermal runaway.
Naofumi HOMMA Yuki WATANABE Takafumi AOKI Tatsuo HIGUCHI
This paper presents a formal design of arithmetic circuits using an arithmetic description language called ARITH. The key idea in ARITH is to describe arithmetic algorithms directly with high-level mathematical objects (i.e., number representation systems and arithmetic operations/formulae). Using ARITH, we can provide formal description of arithmetic algorithms including those using unconventional number systems. In addition, the described arithmetic algorithms can be formally verified by equivalence checking with formula manipulations. The verified ARITH descriptions are easily translated into the equivalent HDL descriptions. In this paper, we also present an application of ARITH to an arithmetic module generator, which supports a variety of hardware algorithms for 2-operand adders, multi-operand adders, multipliers, constant-coefficient multipliers and multiply accumulators. The language processing system of ARITH incorporated in the generator verifies the correctness of ARITH descriptions in a formal method. As a result, we can obtain highly-reliable arithmetic modules whose functions are completely verified at the algorithm level.
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER
This paper presents an architecture and a synthesis method for compact numerical function generators (NFGs) for trigonometric, logarithmic, square root, reciprocal, and combinations of these functions. Our NFG partitions a given domain of the function into non-uniform segments using an LUT cascade, and approximates the given function by a quadratic polynomial for each segment. Thus, we can implement fast and compact NFGs for a wide range of functions. Experimental results show that: 1) our NFGs require, on average, only 4% of the memory needed by NFGs based on the linear approximation with non-uniform segmentation; 2) our NFG for 2x-1 requires only 22% of the memory needed by the NFG based on a 5th-order approximation with uniform segmentation; and 3) our NFGs achieve about 70% of the throughput of the existing table-based NFGs using only a few percent of the memory. Thus, our NFGs can be implemented with more compact FPGAs than needed for the existing NFGs. Our automatic synthesis system generates such compact NFGs quickly.
Kouichi WATANABE Masashi IMAI Masaaki KONDO Hiroshi NAKAMURA Takashi NANYA
As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of the single-rail circuits because signal transitions occur every cycle in all bits regardless of the input bit pattern. However, in functional units, a significant number of input bits may not change from the previous input in many cases. In such a situation, calculation of these bits is not required. Thus, we propose a method, called unflip-bits control, makes use of the above situation, to reduce energy consumption. We evaluate the energy consumption and performance penalty for the method using HSPICE and the verilog-XL simulator, and compare the method with the conventional dual-rail circuit and a synchronous circuit. Our evaluation results reveal that the proposed asynchronous dual-rail circuit has a 12-60% lower energy consumption compared with a conventional asynchronous dual-rail circuit.
Hiroaki YOSHIDA Makoto IKEDA Kunihiro ASADA
This paper presents a structural approach for synthesizing arbitrary multi-output multi-stage static CMOS circuits at the transistor level, targeting the reduction of transistor counts. To make the problem tractable, the solution space is restricted to the circuit structures which can be obtained by performing algebraic transformations on an arbitrary prime-and-irredundant two-level circuit. The proposed algorithm is guaranteed to find the optimal solution within the solution space. The circuit structures are implicitly enumerated via structural transformations on a single graph structure, then a dynamic-programming based algorithm efficiently finds the minimum solution among them. Experimental results on a benchmark suite targeting standard cell implementations demonstrate the feasibility and effectiveness of the proposed approach. We also demonstrated the efficiency of the proposed algorithm by a numerical analysis on randomly-generated problems.
Shingo TAKAHASHI Shuji TSUKIYAMA Masanori HASHIMOTO Isao SHIRAKAWA
In the design of an active matrix LCD (Liquid Crystal Display), the ratio of the pixel voltage to the video voltage (RPV) of a pixel is an important factor of the performance of the LCD, since the pixel voltage of each pixel determines its transmitted luminance. Thus, of practical importance is the issue of how to maintain the admissible allowance of RPV of each pixel within a prescribed narrow range. This constraint on RPV is analyzed in terms of circuit parameters associated with the sampling switch and sampling pulse of a column driver in the LCD. With the use of a minimal set of such circuit parameters, a design procedure is described dedicatedly for the sampling switch, which intends to seek an optimal sampling switch as well as an optimal sampling pulse waveform. A number of experimental results show that an optimal sampling switch attained by the proposed procedure yields a source driver with almost 18% less power consumption than the one by manual design. Moreover, the percentage of the RPVs within 100
Taisuke KAZAMA Makoto IKEDA Kunihiro ASADA
We propose a shot reduction technique of character projection (CP) Electron Beam Direct Writing (EBDW) using combined cell stencil (CCS) or the advanced process technology. CP EBDW is expected both to reduce mask costs and to realize quick turn around time. One of major issue of the conventional CP EBDW, however, is a throughput of lithography. The throughput is determined by numbers of shots, which are proportional to numbers of cell instances in LSIs. The conventional shot reduction techniques focus on optimization of cell stencil extraction, without any modifications on designed LSI mask patterns. The proposed technique employs the proposed combined cell stencil, with proposed modified design flow, for further shot reduction. We demonstrate 22.4% shot reduction within 4.3% area increase for a microprocessor and 28.6% shot reduction for IWLS benchmarks compared with the conventional technique.
Yoichi TOMIOKA Atsushi TAKAHASHI
Ball Grid Array packages in which I/O pins are arranged in a grid array pattern realize a number of connections between chips and PCB, but it takes much time in manual routing. So the demand for automation of package routing is increasing. In this paper, we give the necessary and sufficient condition that all nets can be connected by monotonic routes when a net consists of a finger and a ball and fingers are on the two parallel boundaries of the Ball Grid Array package, and propose a monotonic routing method based on this condition. Moreover, we give a necessary condition and a sufficient condition when fingers are on the two orthogonal boundaries, and propose a monotonic routing method based on the necessary condition.
Toshiki KANAMOTO Tatsuhiko IKEDA Akira TSUCHIYA Hidetoshi ONODERA Masanori HASHIMOTO
This paper proposes a simple yet sufficient Si-substrate modeling for interconnect resistance and inductance extraction. The proposed modeling expresses Si-substrate as four filaments in a filament-based extractor. Although the number of filaments is small, extracted loop inductances and resistances show accurate frequency dependence resulting from the proximity effect. We experimentally prove the accuracy using FEM (Finite Element Method) based simulations of electromagnetic fields. We also show a method to determine optimal size of the four filaments. The proposed model realizes substrate-aware extraction in SoC design flow.
Danardono Dwi ANTONO Kenichi INAGAKI Hiroshi KAWAGUCHI Takayasu SAKURAI
A simple analytical model based on Delayed Quadratic (DQ) Transfer Function approximation is proposed for estimating waveforms of inductive single-line interconnects in VLSI's. An expression for overshoot voltage is derived by the model within 17% error for the line width less than 10 times the minimum line width and typical input signal. A delay expression is also proposed within 15% for the same condition. The strength of the inductive effect is shown to be expressed by a closed-form expression, A=2(L(CT+0.5C))1/2/(RT(CT+CJ)+RTC+RCT+0.4RC). By using the criteria, a scaling trend of inductive effects in VLSI's is discussed. It is shown that the inductive effect of single-line, minimum-width VLSI interconnect peaks off at 90 nm based on the ITRS predicted parameters.
Takumi UEZONO Kenichi OKADA Kazuya MASU
In this paper, we propose a via distribution model for yield estimation. This model expresses a relationship between the number of vias and wire length. We also provide an estimate for the total number of vias in a circuit, derived from the via distribution and the wire-length distribution. The via distribution is modeled as a function of track utilization, and the wire-length distribution can be derived from the gate-level netlist and the layout area. We extract model parameters from the commercial chips designed for 0.18-µm and 0.13-µm CMOS processes, and demonstrate the yield degradation caused by vias.
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA
This paper proposes a method to determine a single frequency for interconnect RL extraction. Resistance and inductance of interconnects depend on frequency, and hence the extraction frequency strongly affects the modeling accuracy of interconnects. The proposed method determines an extraction frequency based on the transfer characteristic of interconnects. By choosing the frequency where the transfer characteristic becomes maximum, the extracted RL values achieve the accurate modeling of the waveform. Experimental results show that the proposed method provides accurate transition waveforms over various interconnect topologies.
Yang SONG Zhenyu LIU Takeshi IKENAGA Satoshi GOTO
A one-dimensional (1-D) full search variable block size motion estimation (VBSME) architecture is presented in this paper. By properly choosing the partial sum of absolute differences (SAD) registers and scheduling the addition operations, the architecture can be implemented with simple control logic and regular workflow. Moreover, only one single-port SRAM is used to store the search area data. The design is realized in TSMC 0.18 µm 1P6M technology with a hardware cost of 67.6K gates. In typical working conditions (1.8 V, 25
Kazunori SHIMIZU Tatsuyuki ISHIKAWA Nozomu TOGAWA Takeshi IKENAGA Satoshi GOTO
In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation not to read and write intermediate messages simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in a parity check matrix as well as a classical schedule while achieving the accelerated message-passing schedule. Implementation results in 0.18 [µm] CMOS technology show that the proposed decoder architecture reduces an area of the LDPC decoder by 43% and a power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.
Win-Bin HUANG Alvin W. Y. SU Yau-Hwang KUO
Set Partitioning in Hierarchical Trees (SPIHT) is a highly efficient technique for compressing Discrete Wavelet Transform (DWT) decomposed images. Though its compression efficiency is a little less famous than Embedded Block Coding with Optimized Truncation (EBCOT) adopted by JPEG2000, SPIHT has a straight forward coding procedure and requires no tables. These make SPIHT a more appropriate algorithm for lower cost hardware implementation. In this paper, a modified SPIHT algorithm is presented. The modifications include a simplification of coefficient scanning process, a 1-D addressing method instead of the original 2-D arrangement of wavelet coefficients, and a fixed memory allocation for the data lists instead of a dynamic allocation approach required in the original SPIHT. Although the distortion is slightly increased, it facilitates an extremely fast throughput and easier hardware implementation. The VLSI implementation demonstrates that the proposed design can encode a CIF (352
Junichi MIYAKOSHI Yuichiro MURACHI Tetsuro MATSUNO Masaki HAMAMOTO Takahiro IINUMA Tomokazu ISHIHARA Hiroshi KAWAGUCHI Masayuki MIYAMA Masahiko YOSHIMOTO
We propose a sub-mW H.264 baseline-profile motion estimation processor for portable video applications. It features a VLSI-oriented block partitioning strategy and low-power SIMD/systolic-array datapath architecture, where the datapath can be switched between an SIMD and systolic array depending on processing flow. The processor supports all the seven kinds of block modes, and can handle three reference frames for a CIF (352
Yasuhiro MORITA Hidehiro FUJIWARA Hiroki NOGUCHI Kentaro KAWAKAMI Junichi MIYAKOSHI Shinji MIKAMI Koji NII Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
We propose a voltage control scheme for 6T SRAM cells that makes a minimum operation voltage down to 0.3 V under DVS environment. A supply voltage to the memory cells and wordline drivers, bitline voltage, and body bias voltage of load pMOSFETs are controlled according to read and write operations, which secures operation margins even at a low operation voltage. A self-aligned timing control with a dummy wordline and its feedback is also introduced to guarantee stable operation in a wide range of the supply voltage. A measurement result of a 64-kb SRAM in a 90-nm process technology shows that a power reduction of 30% can be achieved at 100 MHz. In a 65-nm 64-Mb SRAM, a 74% power saving is expected at 1/6 of the maximum operating frequency. The performance penalty by the proposed scheme is less than 1%, and area overhead is 5.6%.
Kentaro KAWAKAMI Jun TAKEMURA Mitsuhiko KURODA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
We propose an elastic pipeline that can apply dynamic voltage scaling (DVS) to hardwired logic circuits. In order to demonstrate its feasibility, a hardwired H.264/AVC HDTV decoder is designed as a real-time application. An entropy decoding process is divided into context-based adaptive binary arithmetic coding (CABAC) and syntax element decoding (SED), which has advantages of smoothing workload for CABAC and keeping efficiency of the elastic pipeline. An operating frequency and supply voltage are dynamically modulated every slot depending on workload of H.264 decoding to minimize power. We optimize the number of slots per frame to enhance power reduction. The proposed decoder achieves a power reduction of 50% in a 90-nm process technology, compared to the conventional clock-gating scheme.
Kentaro NAKAHARA Shin'ichi KOUYAMA Tomonori IZUMI Hiroyuki OCHI Yukihiro NAKAMURA
Reconfigurable devices are expected to be utilized in such mission-critical fields as space development and undersea cables, because system updates and pseudo-repair can be achieved remotely by reconfiguring. However, conventional reconfigurable devices suffer from memory-bit upset caused by charged particles in space which results in fatal system problems. In this paper, we propose an architecture of a fault-tolerant reconfigurable device. The proposed device is divided into "autonomous-repair cells" with embedded control circuits. The autonomous-repair cell proposed in this paper is based on error detection and correction (EDAC) and uses hardware and time redundancy. From evaluation, it is shown that the proposed architecture achieves sufficient reliability against configuration memory upset. Trade-offs between performance and cost are also analyzed.
This paper presents a hardware-efficient folding technique for high-order FIR filtering while considering the tradeoff between the number of processing elements and throughput rate. Given the throughput rate, one can always employ the minimum number of processing elements for saving the implementation cost and figure out a folded architecture. However, applying inefficient folding techniques may result in costly switches and registers. Therefore, our work intends to evaluate the efficiency for folding techniques in terms of the number of registers, and the power dissipation of registers. As shown in the estimation results, while comparing with the published folded architectures under the same throughput rate, the proposed folding technique can turn out less power dissipation and low hardware complexity than the others. The proposed design has been implemented using TSMC 0.18 µm 1P6M technology. As seen in the post-layout simulation, our design can meet the requirement of IS-95 WCDMA pulse shaping FIR filter while the power consumption can be as low as 16.66 mW.
Toshiki KANAMOTO Shigekiyo AKUTSU Tamiyo NAKABAYASHI Takahiro ICHINOMIYA Koutaro HACHIYA Atsushi KUROKAWA Hiroshi ISHIKAWA Sakae MUROMOTO Hiroyuki KOBAYASHI Masanori HASHIMOTO
In this letter, we discuss the impact of intrinsic error in parasitic capacitance extraction programs which are commonly used in today's SoC design flows. Most of the extraction programs use pattern-matching methods which introduces an improvable error factor due to the pattern interpolation, and an intrinsically inescapable error factor from the difference of boundary conditions in the electro-magnetic field solver. Here, we study impact of the intrinsic error on timing and crosstalk noise estimation. We experimentally show that the resulting delay and noise estimation errors show a scatter which is normally distributed. Values of the standard deviations will help designers consider the intrinsic error compared with other variation factors.
Yuan WEN Jun YANG Woon-Seng GAN
A multiple-source system for rendering the sound pressure distribution in a target region can be modeled as a multi-input-multi-output (MIMO) system with the inputs being the source strengths and the outputs being the pressures on multiple measuring points/sensors. In this paper, we propose a target-oriented acoustic radiation generation technique (TARGET) for sound field control. For the MIMO system of a given geometry, a series of basic radiation modes, namely, target-oriented radiation modes (TORMs) can be derived using eigenvector analysis. Different TORMs have different contributions to the system control gain, which is defined as the ratio of the acoustic energy generated in the target zone to the transmitter output power. The TARGET can be effectively applied to the sound reproduction and suppression, which correspond the generations of bright and dark zone respectively. In acoustically bright zone generation and sound beamforming, the highest-gain TORM can be employed to determine the optimal source strengths. In active noise control, the strengths of the secondary sources can be derived using low-gain TORMs. Simulation results show that the proposed method has better or comparable performance than the traditional techniques.
Tomoko YOKOKAWA Masaru KAMADA Yasuhiro OHTAKI Tatsuhiro YONEKURA
An experimental test has been done on the suitability of box splines for the estimation of color images from their observation through the honeycomb color filter. The estimation is made by following the framework of consistent sampling by Unser and Aldroubi. Numerical evaluation of the estimation errors has shown that the best estimation may be made by choosing the box splines which are twice locally averaged along each of the three axes consisting of the unilateral triangular mesh. A close look at estimated images has indicated that those box splines are almost free from directional jaggy noises that the traditional B-splines suffer from.
This paper proposes a Miller capacitor which has a wide input signal range. By discharging the charge of the capacitor connected between the input and output terminals of an amplifier before the output voltage of the amplifier exceeds its maximum range, the amplifier always operates in the active region and the Miller operation can be guaranteed. Thus a large value capacitor with a wide dynamic operation range can be realized using a small value capacitor. The Miller capacitor proposed in this paper is applied to a loop filter of phase locked loop (PLL) circuit that requires a large value capacitor to realize a low cutoff frequency. SPICE simulation of the PLL circuit using the Miller capacitor confirms the operation of the Miller capacitor and shows good performances that are similar to those obtained using a passive capacitor of a large value.
Tetsuo NISHI Norikazu TAKAHASHI Hajime HARA
We give the necessary and sufficient conditions for a one-dimensional discrete-time autonomous binary cellular neural networks to be stable in the case of fixed boundary. The results are complete generalization of our previous one [16] in which the symmetrical connections were assumed. The conditions are compared with some stability conditions so far known.
Katsuyuki HAGIWARA Hiroshi ISHITANI
In this article, we considered the asymptotic expectations of the prediction error and the fitting error of a regression model, in which the component functions are chosen from a finite set of orthogonal functions. Under the least squares estimation, we showed that the asymptotic bias in estimating the prediction error based on the fitting error includes the true number of components, which is essentially unknown in practical applications. On the other hand, under a suitable shrinkage method, we showed that an asymptotically unbiased estimate of the prediction error is given by the fitting error plus a known term except the noise variance.
Takashi ISHIDA Masayuki GOTO Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
Recently, a word-valued source has been proposed as a new class of information source models. A word-valued source is regarded as a source with a probability distribution over a word set. Although a word-valued source is a nonstationary source in general, it has been proved that an entropy rate of the source exists and the Asymptotic Equipartition Property (AEP) holds when the word set of the source is prefix-free. However, when the word set is not prefix-free (non-prefix-free), only an upper bound on the entropy density rate for an i.i.d. word-valued source has been derived so far. In this paper, we newly derive a lower bound on the entropy density rate for an i.i.d. word-valued source with a finite non-prefix-free word set. Then some numerical examples are given in order to investigate the behavior of the bounds.
Shin'ichi SHIRAISHI Miki HASEYAMA Hideo KITAJIMA
This paper analyzes the steady-state properties of a CORDIC-based adaptive ARMA lattice filter. In our previous study, the convergence properties of the filter in the non-steady state were clarified; however, its behavior in the steady state was not discussed. Therefore, we develop a distinct analysis technique based on a Markov chain in order to investigate the steady-state properties of the filter. By using the proposed technique, the relationship between step size and coefficient estimation error is revealed.
Despite the extensive literature on current conveyor-based voltage-mode first-order all-pass filters, no filter circuits have been reported to date that simultaneously achieve all of the advantageous features: (i) the employment of only one current conveyor, (ii) the employment of only one grounded capacitor, (iii) the employment of only one resistor, (iv) no need to impose component choice conditions, and (v) low active and passive sensitivities. In this letter, we describe such a filter structure with all of the above features simultaneously present, without trade-offs. H-Spice simulation results using the TSMC025 process and
InKoo KANG Kishore SINHA Heung-Kyu LEE
Combinatorial designs have been used to construct digital fingerprint codes. Here, a new constructive algorithm for an anticollusion fingerprint code based on group-divisible designs is presented. These codes are easy to construct and available for a large number of individuals, which is important from a business point of view. Group-divisible designs have not been used previously as a tool for fingerprint code construction.
Jung-Wook PARK Byoung-Kon CHOI Kyung-Bin SONG
This letter describes the first derivatives estimation of nonlinear parameters through an embedded identifier in the hybrid system by using a feed-forward neural network (FFNN). The hybrid systems are modelled by the differential-algebraic-impulsive-switched (DAIS) structure. The FFNN is used to identify the full dynamics of the hybrid system. Moreover, the partial derivatives of an objective function J with respect to the parameters are estimated by the proposed identifier. Then, it is applied for the identification and estimation of the non-smooth nonlinear dynamic behaviors due to a saturation limiter in a practical engineering system.