Junnosuke HOSHIDO Tonan KAMATA Tsutomu ANSAI Ryuhei UEHARA
Shin-ichi NAKANO
Shang LU Kohei HATANO Shuji KIJIMA Eiji TAKIMOTO
Lin ZHOU Yanxiang CAO Qirui WANG Yunling CHENG Chenghao ZHUANG Yuxi DENG
Zhen WANG Longye WANG
Naohiro TODA Tetsuya NAKAGAMI
Haijun Wang Tao Hu Dongdong Chen Huiwei Yao Runze He Di Wu Zhifu Tian
Jianqiang NI Gaoli WANG Yingxin LI Siwei SUN
Rui CHENG Yun JIANG Qinglin ZHANG Qiaoqiao XIA
Ren TOGO Rintaro YANAGI Masato KAWAI Takahiro OGAWA Miki HASEYAMA
Naoki TATTA Yuki SAKATA Rie JINKI Yuukou HORITA
Kundan LAL DAS Munehisa SEKIKAWA Naohiko INABA
Menglong WU Tianao YAO Zhe XING Jianwen ZHANG Yumeng LIN
Jian ZHANG Zhao GUANG Wanjuan SONG Zhiyan XU
Shinya Matsumoto Daiki Ikemoto Takuya Abe Kan Okubo Kiyoshi Nishikawa
Kazuki HARADA Yuta MARUYAMA Tomonori TASHIRO Gosuke OHASHI
Zezhong WANG Masayuki SHIMODA Atsushi TAKAHASHI
Pierpaolo AGAMENNONE
Jianmao XIAO Jianyu ZOU Yuanlong CAO Yong ZHOU Ziwei YE Xun SHAO
Kazumasa ARIMURA Ryoichi MIYAUCHI Koichi TANNO
Shinichi NISHIZAWA Shinji KIMURA
Zhe LIU Wu GUAN Ziqin YAN Liping LIANG
Shuichi OHNO Shenjian WANG Kiyotsugu TAKABA
Yindong CHEN Wandong CHEN Dancheng HUANG
Xiaohe HE Zongwang LI Wei HUANG Junyan XIANG Chengxi ZHANG Zhuochen XIE Xuwen LIANG
Conggai LI Feng LIU Yingying LI Yanli XU
Siwei Yang Tingli Li Tao Hu Wenzhi Zhao
Takahiro FUJITA Kazuyuki WADA
Kazuma TAKA Tatsuya ISHIKAWA Kosei SAKAMOTO Takanori ISOBE
Quang-Thang DUONG Kohei MATSUKAWA Quoc-Trinh VO Minoru OKADA
Sihua LIU Xiaodong ZHU Kai KANG Li WAN Yong WANG
Kazuya YAMAMOTO Nobukazu TAKAI
Yasuhiro Sugimoto Nobukazu Takai
Ho-Lim CHOI
Weibang DAI Xiaogang CHEN Houpeng CHEN Sannian SONG Yichen SONG Shunfen LI Tao HONG Zhitang SONG
Duo Zhang Shishan Qi
Young Ghyu Sun Soo Hyun Kim Dong In Kim Jin Young Kim
Hongbin ZHANG Ao ZHAN Jing HAN Chengyu WU Zhengqiang WANG
Yuli YANG Jianxin SONG Dan YU Xiaoyan HAO Yongle CHEN
Kazuki IWAHANA Naoto YANAI Atsuo INOMATA Toru FUJIWARA
Rikuto KURAHARA Kosei SAKAMOTO Takanori ISOBE
Elham AMIRI Mojtaba JOODAKI
Qingqi ZHANG Xiaoan BAO Ren WU Mitsuru NAKATA Qi-Wei GE
Jiaqi Wang Aijun Liu Changjun Yu
Ruo-Fei Wang Jia Zhang Jun-Feng Liu Jing-Wei Tang
Yingnan QI Chuhong TANG Haiyang LIU Lianrong MA
Yi XIONG Senanayake THILAK Daisuke ARAI Jun IMAOKA Masayoshi YAMAMOTO
Zhenhai TAN Yun YANG Xiaoman WANG Fayez ALQAHTANI
Chenrui CHANG Tongwei LU Feng YAO
Takuma TSUCHIDA Rikuho MIYATA Hironori WASHIZAKI Kensuke SUMOTO Nobukazu YOSHIOKA Yoshiaki FUKAZAWA
Shoichi HIROSE Kazuhiko MINEMATSU
Toshimitsu USHIO
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Qingping YU Yuan SUN You ZHANG Longye WANG Xingwang LI
Qiuyu XU Kanghui ZHAO Tao LU Zhongyuan WANG Ruimin HU
Lei Zhang Xi-Lin Guo Guang Han Di-Hui Zeng
Meng HUANG Honglei WEI
Yang LIU Jialong WEI Shujian ZHAO Wenhua XIE Niankuan CHEN Jie LI Xin CHEN Kaixuan YANG Yongwei LI Zhen ZHAO
Ngoc-Son DUONG Lan-Nhi VU THI Sinh-Cong LAM Phuong-Dung CHU THI Thai-Mai DINH THI
Lan XIE Qiang WANG Yongqiang JI Yu GU Gaozheng XU Zheng ZHU Yuxing WANG Yuwei LI
Jihui LIU Hui ZHANG Wei SU Rong LUO
Shota NAKAYAMA Koichi KOBAYASHI Yuh YAMASHITA
Wataru NAKAMURA Kenta TAKAHASHI
Chunfeng FU Renjie JIN Longjiang QU Zijian ZHOU
Masaki KOBAYASHI
Shinichi NISHIZAWA Masahiro MATSUDA Shinji KIMURA
Keisuke FUKADA Tatsuhiko SHIRAI Nozomu TOGAWA
Yuta NAGAHAMA Tetsuya MANABE
Baoxian Wang Ze Gao Hongbin Xu Shoupeng Qin Zhao Tan Xuchao Shi
Maki TSUKAHARA Yusaku HARADA Haruka HIRATA Daiki MIYAHARA Yang LI Yuko HARA-AZUMI Kazuo SAKIYAMA
Guijie LIN Jianxiao XIE Zejun ZHANG
Hiroki FURUE Yasuhiko IKEMATSU
Longye WANG Lingguo KONG Xiaoli ZENG Qingping YU
Ayaka FUJITA Mashiho MUKAIDA Tadahiro AZETSU Noriaki SUETAKE
Xingan SHA Masao YANAGISAWA Youhua SHI
Jiqian XU Lijin FANG Qiankun ZHAO Yingcai WAN Yue GAO Huaizhen WANG
Sei TAKANO Mitsuji MUNEYASU Soh YOSHIDA Akira ASANO Nanae DEWAKE Nobuo YOSHINARI Keiichi UCHIDA
Kohei DOI Takeshi SUGAWARA
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Mingjie LIU Chunyang WANG Jian GONG Ming TAN Changlin ZHOU
Hironori UCHIKAWA Manabu HAGIWARA
Atsuko MIYAJI Tatsuhiro YAMATSUKI Tomoka TAKAHASHI Ping-Lun WANG Tomoaki MIMOTO
Kazuya TANIGUCHI Satoshi TAYU Atsushi TAKAHASHI Mathieu MOLONGO Makoto MINAMI Katsuya NISHIOKA
Masayuki SHIMODA Atsushi TAKAHASHI
Yuya Ichikawa Naoko Misawa Chihiro Matsui Ken Takeuchi
Katsutoshi OTSUKA Kazuhito ITO
Rei UEDA Tsunato NAKAI Kota YOSHIDA Takeshi FUJINO
Motonari OHTSUKA Takahiro ISHIMARU Yuta TSUKIE Shingo KUKITA Kohtaro WATANABE
Iori KODAMA Tetsuya KOJIMA
Yusuke MATSUOKA
Yosuke SUGIURA Ryota NOGUCHI Tetsuya SHIMAMURA
Tadashi WADAYAMA Ayano NAKAI-KASAI
Li Cheng Huaixing Wang
Beining ZHANG Xile ZHANG Qin WANG Guan GUI Lin SHAN
Soh YOSHIDA Nozomi YATOH Mitsuji MUNEYASU
Ryo YOSHIDA Soh YOSHIDA Mitsuji MUNEYASU
Nichika YUGE Hiroyuki ISHIHARA Morikazu NAKAMURA Takayuki NAKACHI
Ling ZHU Takayuki NAKACHI Bai ZHANG Yitu WANG
Toshiyuki MIYAMOTO Hiroki AKAMATSU
Yanchao LIU Xina CHENG Takeshi IKENAGA
Kengo HASHIMOTO Ken-ichi IWATA
Hiroshi FUJISAKI
Tota SUKO Manabu KOBAYASHI
Akira KAMATSUKA Koki KAZAMA Takahiro YOSHIDA
Manabu HAGIWARA
Tetsunao MATSUTA Tomohiko UYEMATSU
Weissman introduced a coding problem for channels with action-dependent states. In this coding problem, there are two encoders and a decoder. An encoder outputs an action that affects the state of the channel. Then, the other encoder outputs a codeword of the message into the channel by using the channel state. The decoder receives a noisy observation of the codeword, and reconstructs the message. In this paper, we show an exponential error bound for channels with action-dependent states based on the random coding argument.
Mitsuharu ARIMURA Hiroki KOGA Ken-ichi IWATA
In this paper we consider fixed-to-fixed length (FF) coding of a general source X with vanishing error probability and define two kinds of optimalities with respect to the coding rate and the redundancy, where the redundancy is defined as the difference between the coding rate and the symbolwise ideal codeword length. We first show that the infimum achievable redundancy coincides with the asymptotic width W(X) of the entropy spectrum. Next, we consider the two sets $mCH(X)$ and $mCW(X)$ and investigate relationships between them, where $mCH(X)$ and $mCW(X)$ denote the sets of all the optimal FF codes with respect to the coding rate and the redundancy, respectively. We give two necessary and sufficient conditions corresponding to $mCH(X) subseteq mCW(X)$ and $mCW(X) subseteq mCH(X)$, respectively. We can also show the existence of an FF code that is optimal with respect to both the redundancy and the coding rate.
Takahiro OTA Hiroyoshi MORITA Adriaan J. de Lind van WIJNGAARDEN
This paper presents a real-time and memory-efficient arrhythmia detection system with binary classification that uses antidictionary coding for the analysis and classification of electrocardiograms (ECGs). The measured ECG signals are encoded using a lossless antidictionary encoder, and the system subsequently uses the compression rate to distinguish between normal beats and arrhythmia. An automated training data procedure is used to construct the automatons, which are probabilistic models used to compress the ECG signals, and to determine the threshold value for detecting the arrhythmia. Real-time computer simulations with samples from the MIT-BIH arrhythmia database show that the averages of sensitivity and specificity of the proposed system are 97.8% and 96.4% for premature ventricular contraction detection, respectively. The automatons are constructed using training data and comprise only 11 kilobytes on average. The low complexity and low memory requirements make the system particularly suitable for implementation in portable ECG monitors.
In spite of continuous improvement of computational power of multi/many-core processors, the memory access performance of the processors has not been improved sufficiently, and thus the overall performance of recent processors is often restricted by the delay of off-chip memory accesses. Low-delay data compression for last level cache (LLC) would be effective to improve the processor performance because the compression increases the effective size of LLC, and thus reduces the number of off-chip memory accesses. This paper proposes a novel data compression method suitable for high-speed parallel decoding in the LLC. Since cache line data often have periodicity of certain lengths, such as 32- or 64-bit instructions, 32-bit integers, and 64-bit floating point numbers, an information word is encoded as a base pattern and a differential pattern between the original word and the base pattern. Evaluation using a GPU simulator shows that the compression ratio of the proposed coding is comparable to LZSS coding and X-Match Pro and superior to other conventional compression algorithms for cache memories. Also this paper presents an experimental decoder designed for ASIC, and the synthesized result shows that the decoder can decompress cache line data of length 32bytes in four clock cycles. Evaluation of the IPC on the GPU simulator shows that, for several benchmark programs, the IPC achieved by the proposed coding is higher than that by the conventional BΔI coding, where the maximum improvement of the IPC is 20%.
Michael Joseph TAN Yuichi KAJI
Novel flash codes with small average write deficiency are proposed. A flash code is a coding scheme for avoiding the wearing of cells in flash memory. One approach to develop flash codes with large parameters is to make use of slices which are small groups of cells. Preliminary study shows that using small slices brings several favorable characteristics, but naive use of small slices induces a certain overhead. In this study, a new structure which is called a cluster is devised to develop a good slice-based flash code. Two different slice encoding schemes are used in a cluster, which decreases the overhead of using small slices while retaining its advantage. The proposed flash codes show much smaller write deficiency compared to another slice-based flash code.
Shan LU Jun CHENG Yoichiro WATANABE
A recursive construction of (k+1)-ary error-correcting signature code is proposed to identify users for MAAC, even in the presence of channel noise. The recursion is originally from a trivial signature code. In the (j-1)-th recursion, from a signature code with minimum distance of 2j-2, a longer and larger signature code with minimum distance of 2j-1 is obtained. The decoding procedure of signature code is given, which consists of error correction and user identification.
We consider two methods for constructing high rate punctured convolutional codes. First, we present the best high rate R=(n-1)/n punctured convolutional codes, for n=5,6,…,16, which are obtained by exhaustive searches. To obtain the best code, we use a regular convolutional code whose weight spectrum is equivalent to that of each punctured convolutional code. We search these equivalent codes for the best one. Next, we present a method that searches for good punctured convolutional codes by partial searches. This method searches the codes that are derived from rate 1/2 original codes obtained in the first method. By this method, we obtain some good punctured convolutional codes relatively faster than the case in which we search for the best codes.
Takayuki NOZAKI Masaki MAEHARA Kenta KASAI Kohichi SAKANIWA
This paper derives the average symbol and bit weight distributions for the irregular non-binary cluster low-density parity-check (LDPC) code ensembles. Moreover, we give the exponential growth rates of the average weight distributions in the limit of large code length. We show the condition that the typical minimum distances linearly grow with the code length.
Akira HIRABAYASHI Jumpei SUGIMOTO Kazushi MIMURA
The main target of compressed sensing is recovery of one-dimensional signals, because signals more than two-dimension can also be treated as one-dimensional ones by raster scan, which makes the sensing matrix huge. This is unavoidable for general sensing processes. In separable cases like discrete Fourier transform (DFT) or standard wavelet transforms, however, the corresponding sensing process can be formulated using two matrices which are multiplied from both sides of the target two-dimensional signals. We propose an approximate message passing (AMP) algorithm for the separable sensing process. Typically, we suppose DFT for the sensing process, in which the measurements are complex numbers. Therefore, the formulation includes cases in which both target signal and measurements are complex. We show the effectiveness of the proposed algorithm by computer simulations.
Taisaku ISHIWATA Yoshinao SHIRAKI
In this paper, we propose a rectangular weighting function that can be used in the method of iteratively reweighted least squares (IRWLS) for designing equiripple all-pass IIR filters. The purpose of introducing this weighting function is to improve the convergence performance in the solution of the IRWLS. The height of each rectangle is designed to be equal to the local maximum of each ripple, and the width of each rectangle is designed so that the area of each rectangle becomes equal to the area of each ripple. Here, the ripple is the absolute value of the phase error. We show experimentally that the convergence performance in the solution of the IRWLS can be improved by using the proposed weighting function.
Wei HOU Tadashi FUJINO Toshiharu KOJIMA
Lattice-reduction (LR) technique has been adopted to improve the performance and reduce the complexity in MIMO data detection. This paper presents an improved quantization scheme for LR aided MIMO detection based on Gram-Schmidt orthogonalization. For the LR aided detection, the quantization step applies the simple rounding operation, which often leads to the quantization errors. Meanwhile, these errors may result in the detection errors. Hence the purpose of the proposed detection is to further solve the problem of degrading the performance due to the quantization errors in the signal estimation. In this paper, the proposed quantization scheme decreases the quantization errors using a simple tree search with a threshold function. Through the analysis and the simulation results, we observe that the proposed detection can achieve the nearly optimal performance with very low complexity, and require a little additional complexity compared to the conventional LR-MMSE detection in the high Eb/N0 region. Furthermore, this quantization error reduction scheme is also efficient even for the high modulation order.
Tetsuya KOBAYASHI Akiko MANADA Takahiro OTA Hiroyoshi MORITA
A shift of finite type (SFT) is a set of all bi-infinite sequences over some alphabet which is characterized by a finite set of forbidden words. It is a typical example of sofic shifts and has been used in media storage area, such as CD's or DVD's. The study of sofic shifts is based on graph theory, and the irreducibility of shifts is an important property to be considered for the study. In this paper, we will provide some sufficient conditions for an SFT to be irreducible from the perspective of the antidictionary of a word and the number of forbidden words. We also present a necessary and sufficient condition for an SFT to be irreducible when the number of forbidden words is one less than the alphabet size.
Nasima BEGUM Toru NAKANISHI Nobuo FUNABIKI
To enhance user privacy, anonymous credential systems allow the user to convince a verifier of the possession of a certificate issued by the issuing authority anonymously. In the systems, the user can prove relations on his/her attributes embedded into the certificate. Previously, a pairing-based anonymous credential system with constant-size proofs in the number of attributes of the user was proposed. This system supports the proofs of the inner product relations on attributes, and thus can handle the complex logical relations on attributes as the CNF and DNF formulas. However this system suffers from the computational cost: The proof generation needs exponentiations depending on the number of the literals in OR relations. In this paper, we propose a pairing-based anonymous credential system with the constant-size proofs for CNF formulas and the more efficient proof generation. In the proposed system, the proof generation needs only multiplications depending on the number of literals, and thus it is more efficient than the previously proposed system. The key of our construction is to use an extended accumulator, by which we can verify that multiple attributes are included in multiple sets, all at once. This leads to the verification of CNF formulas on attributes. Since the accumulator is mainly calculated by multiplications, we achieve the better computational costs.
Hidenori KUWAKADO Shoichi HIROSE
A hash function is an important primitive for cryptographic protocols. Since algorithms of well-known hash functions are almost serial, it seems difficult to take full advantage of recent multi-core processors. This paper proposes a multilane hashing (MLH) mode that achieves both of high parallelism and high security. The MLH mode is designed in such a way that the processing speed is almost linear in the number of processors. Since the MLH mode exploits an existing hash function as a black box, it is applicable to any hash function. The bound on the indifferentiability of the MLH mode from a random oracle is beyond the birthday bound on the output length of an underlying primitive.
Mitsuharu ARIMURA Hiroki KOGA Ken-ichi IWATA
In this letter, we first introduce a stronger notion of the optimistic achievable coding rate and discuss a coding theorem. Next, we give a necessary and sufficient condition under which the coding rates of all the optimal FF codes asymptotically converge to a constant.
Hiroyuki IHARA Tomoharu SHIBUYA
Spatially coupled (SC) low-density parity-check (LDPC) codes are defined by bipartite graphs that are obtained by assembling prototype graphs. The combination and connection of prototype graphs are designated by specifying some parameters, and Kudekar et al. showed that BP threshold of the ensemble of SC LDPC codes agrees with MAP threshold of the ensemble of regular LDPC codes when those parameters are grown up so that the code length tends to infinity. When we design SC LDPC codes with practical code length, however, it is not clear how to set those parameters to enhance the performance of SC LDPC codes. In this paper, we provide the result of numerical experiments that suggest the dependence of error performance of SC LDPC codes over BEC on their design parameters.
To analyze the structure of a set of perfect sequences over a composition algebra of the real number field, transforms of a set of sequences similar to the discrete Fourier transform (DFT) are introduced. The discrete cosine transform, discrete sine transform, and generalized discrete Fourier transform (GDFT) of the sequences are defined and the fundamental properties of these transforms are proved. We show that GDFT is bijective and that there exists a relationship between these transforms and a convolution of sequences. Applying these properties to the set of perfect sequences, a parameterization theorem of such sequences is obtained.
Rimon IKENO Takashi MARUYAMA Satoshi KOMATSU Tetsuya IIZUKA Makoto IKEDA Kunihiro ASADA
Character projection (CP) is a high-speed mask-less exposure technique for electron-beam direct writing (EBDW). In CP exposure of VIA layers, higher throughput is realized if more VIAs are exposed in each EB shot, but it will result in huge number of VIA characters to cover arbitrary VIA arrangements. We adopt one-dimensional VIA arrays as the basic CP character architecture to increase VIA numbers in an EB shot while saving the stencil area by superposed character arrangement. In addition, CP throughput is further improved by layout constraints on the VIA placement in the detail routing phase. Our experimental results proved the feasibility of our exposure strategy in the practical CP use in 14nm lithography.
Katherine Shu-Min LI Yingchieh HO Liang-Bi CHEN
Crosstalk-induced noise has become a key problem in interconnect optimization when technology improves, spacing diminishes, and coupling capacitance/inductance increases. Buffer insertion/sizing is one of the most effective and popular techniques to reduce interconnect delay and decouple coupling effects. It is traditionally applied to post-layout optimization. However, it is obviously infeasible to insert/size hundreds of thousands buffers during the post-layout stage when most routing regions are occupied. Therefore, it is desirable to incorporate buffer planning into floorplanning to ensure timing closure and design convergence. In this paper, we first derive formulae of buffer insertion for timing and noise optimization, and then apply the formulae to compute the feasible regions for inserting buffers to meet both timing and noise constraints. Experimental results show that our approach achieves an average success rate of 80.9% (78.2%) of nets meeting timing constraints alone (both timing and noise constraints) and consumes an average extra area of only 0.49% (0.66%) over the given floorplan, compared with the average success rate of 75.6% of nets meeting timing constraints alone and an extra area of 1.33% by the BBP method proposed previously.
Bo YANG Qing DONG Jing LI Shigetoshi NAKATAKE
This paper proposes a novel design method involving the stages from analog circuit design to layout synthesis in hope of suppressing the process-induced variations with a design style called transistor array. We manage to decompose the transistors into unified sub-transistors, and arrange the sub-transistors on a uniform placement grid so that a better post-CMP profile is expected to be achieved, and that the STI-stress is evened up to alleviate the process variations. However, since lack of direct theoretical support to the transistor decomposition, we analyze and evaluate the errors arising from the decomposition in both large and small signal analysis. A test chip with decomposed transistors on it confirmed our analysis and suggested that the errors are negligibly small and the design with transistor array is applicable. Based on this conclusion, a design flow with transistor array covering from circuit design to layout synthesis is proposed, and several design cases, including three common-source amplifiers, three two-stage OPAMPS and a nano-watt current reference, are implemented on a test chip with the proposed method, to demonstrate the feasibility of our idea. The measurement results from the chip confirmed that the designs with transistor array are successful, and the proposed method is applicable.
Yu ZHANG Gong CHEN Bo YANG Jing LI Qing DONG Ming-Yu LI Shigetoshi NAKATAKE
As CMOS devices scaling down in nowadays integrated circuits, the impact of layout-dependent effects (LDEs) to circuit performances becomes to be significant. This paper mainly focuses on LDE-aware analog circuit synthesis. Our circuit synthesis follows an optimization framework of transistor sizing based on geometric programming (GP) in which analog circuit performances are formulated in terms of monomials and posynomials. Providing GP models for the LDEs such as the shallow trench isolation (STI) stress and the well proximity effect (WPE), we can generate layout constraints related to LDEs during the circuit synthesis. Applying our circuit synthesis to a typical two-stage op-amp, we showed that the resultant circuit, which generated by GP with circuit performance and layout constraints, satisfied all the specifications with the verification of HSPICE simulation based on the BSIM model with LDE options.
Shinichi NISHIZAWA Tohru ISHIHARA Hidetoshi ONODERA
This paper propose a structure of standard cells where the P/N boundary ratio of each cell can be independently customized for near-threshold operation. Lowering the supply voltage is one of the most promising approaches for reducing the power consumption of VLSI circuit, however, this causes an increase of imbalance between rise and fall delays for cells having transistor stacks. Conventional cell library with fixed P/N boundary is not efficient to compensate this delay imbalance. Proposed structure achieves individual P/N boundary ratio optimization for each standard cell, therefore it cancels the imbalance between rise and fall delays at the expense of cell area. Proposed structure is verified using measured result of Ring Oscillator circuits and simulation result of benchmark circuits in 65nm CMOS. The experiments with ISCAS'85 benchmark circuits demonstrate that the standard cell library consisting of the proposed cells reduces the power consumption of the benchmark circuits by 16% on average without increasing the circuit area, compared to that of the same circuit synthesized with a library which is not optimized for the near-threshold operation.
Hyunui LEE Masaya MIYAHARA Akira MATSUZAWA
This paper presents a 12-bit interpolated pipeline analog to digital converter (ADC) using body voltage controlled amplifier for current biasing and common mode feedback (CMFB). The proposed body voltage control method allows the amplifier to achieve small power consumption and large output swing. The proposed amplifier has a power consumption lower than 15.6mW, almost half of the folded cascode amplifier satisfying 12-bit, 400MS/s ADC operation. Moreover, the proposed amplifier secures 600mV output swing, which is one drain source voltage (VDS) wider compared with the telescopic amplifier. The 12-bit interpolated pipeline ADC using the proposed amplifier is fabricated in a 1P9M 90nm CMOS technology with a 1.2V supply voltage. The ADC achieves an effective number of bit (ENOB) of about 10-bit at 300MS/s and an figure of merit (FoM) of 0.2pJ/conv. when the frequency of the input signal is sufficiently low.
Yuuki ARAGA Nao UEDA Yasumasa TAKAGI Makoto NAGATA
A probing front end circuit (PFE) senses and digitizes voltage noises at in-circuit locations on such as power supply wiring and substrate taps in a chip, with the simplest circuit construction only with a source follower or a unity gain buffer, followed by a latch comparator. The PFE with 2.5V I/O transistors in a 65nm CMOS technology node demonstrates 9.0 ENOB and 60.7dB SFDR in equivalent sampling at 1.0Gs/s, for a sinusoidal waveform of 10MHz with 200mV peak-to-peak amplitude. Behavioral modeling of an entire waveform acquisition system using PFEs includes the statistical variations of reference voltage and sampling timing. The simulation quantitatively explains the measured dynamic properties of on-chip noise monitoring, such as the AC response in SNDR and digitizing throughputs, with the clear dependency on the frequency and amplitude of input waveforms.
Zhou JIN Xiao WU Dan NIU Yasuaki INOUE
Recently, the compound element pseudo transient analysis, CEPTA, method is regarded as an efficient practical method to find DC operating points of nonlinear circuits when the Newton-Raphson method fails. In the previous CEPTA method, an effective SPICE3 implementation algorithm was proposed without expanding the Jacobian matrix. However the limitation of step size was not well considered. Thus, the non-convergence problem occurs and the simulation efficiency is still a big challenge for current LSI nonlinear cicuits, especially for some practical large-scale circuits. Therefore, in this paper, we propose a new SPICE3 implementation algorithm and an embedding algorithm, which is where to insert the pseudo capacitors, for the CEPTA method. The proposed implementation algorithm has no limitation for step size and can significantly improve simulation efficiency. Considering the existence of various types of circuits, we extend some possible embedding positions. Numerical examples demonstrate the improvement of simulation efficiency and convergence performance.
Daisuke FUJIMOTO Toshihiro KATASHITA Akihiko SASAKI Yohei HORI Akashi SATOH Makoto NAGATA
Capacitor charging modeling accelerates the time-domain simulation of power current of cryptographic VLSI circuits in a CMOS technology. The model finely represents the amount of charges consumed during the operation of Advanced Encryption Standard (AES) cores in a variety of logical implementations, reflecting their internal logical activities. This approach significantly reduces the complexity of power current simulation, and accomplishes acceleration by a factor of more than 200 over the traditional transistor-level circuit simulation. The correlated power analysis (CPA) attack against AES cores is successfully simulated with a conventional circuit simulator, with the models individually derived for 10,000 different cipher texts. The CPA is also experimentally performed against AES cores fabricated in a 65nm as well as 130nm CMOS technologies, using SASEBO measurement standards. The fast power current simulation is demonstrated to be accurate enough to evaluate the vulnerability of AES cores in various logical implementations as well as in different technologies, and exhibits general agreements with the silicon measurements.
Shuji TSUKIYAMA Masahiro FUKUI
The long-term degradation due to aging such as NBTI (Negative Bias Temperature Instability) is a hot issue in the current circuit design using nanometer process technologies, since it causes a delay fault in the field. In order to resolve the problem, we must estimate delay variation caused by long-term degradation in design stage, but over estimation must be avoided so as to make timing design easier. If we can treat such a variation statistically, and if we treat it together with delay variations due to process variability, then we can reduce over margin in timing design. Moreover, such a statistical static timing analyzer treating process variability and long-term degradation together will help us to select an appropriate set of paths for which field testing are conducted to detect delay faults. In this paper, we propose a new delay model with a half triangular distribution, which is introduced for handling a random factor with unknown distribution such as long term degradation. Then, we show an algorithm for finding the statistical maximum, which is one of key operations in statistical static timing analysis. We also show a few experimental results demonstrating the effect of the proposed model and algorithm.
Taeko MATSUNAGA Shinji KIMURA Yusuke MATSUNAGA
Multi-operand adders that calculate the summation of more than two operands usually consist of compressor trees, which reduce the number of operands to two without any carry propagation, and carry-propagate adders for the two operands in the ASIC implementation. Compressor trees that consist of full adders and half adders cannot be implemented efficiently on LUT-based FPGAs, and carry-chains or dedicated structures have been utilized to produce multi-operand adders on FPGAs. Recent studies indicate that compressor trees can be implemented efficiently on LUTs using Generalized Parallel Counters (GPCs) as the building blocks of compressor trees. This paper addresses the problem of synthesizing compressor trees based on GPCs. Based on the observation that characteristics such as the area, power, and delay correlate roughly to the total number and the maximum level of GPCs, the target problem can be regarded as a minimization problem for the total number of GPCs and the maximum levels of the GPCs, for which an ILP-based approach is proposed. The key point of our formulation is not to model the problem based on the structures of compressor trees like the existing approach, but instead the compression process itself is used to reduce the number of variables and constraints in the ILP formulation. The experimental results demonstrate the advantage of our formulation in terms of the quality and runtime.
Jun YAMASHITA Hiroyuki YOTSUYANAGI Masaki HASHIZUME Kozo KINOSHITA
Open faults are difficult to test since the voltage at the floating line is unpredictable and depends on the voltage at the adjacent lines. The effect of open faults can be easily excited if a test pattern provides the opposite logic value to most of the adjacent lines. In this paper, we present a procedure to generate as high a quality test as possible. We define the test quality for evaluating the effect of adjacent lines by assigning an opposite logic value to the faulty line. In our proposed test generation method, we utilize the SAT-based ATPG method. We generate test patterns that propagate the faulty effect to primary outputs and assign logic values to adjacent lines opposite that of the faulty line. In order to estimate test quality for open faults, we define the excitation effectiveness Eeff. To reduce the test volume, we utilize the open fault simulation. We calculate the excitation effectiveness by open fault simulation in order to eliminate unnecessary test patterns. The experimental results for the benchmark circuits prove the effectiveness of our procedure.
Pseudo Power Gating (Pseudo PG) is one of gate level power reduction methods for combinational circuits by stopping unnecessary input changes of gates. In Pseudo PG, an extra control signal might be added to a gate and other input changes of the gate are deactivated when the control signal takes the controlling value. To improve the power reduction capability, the paper newly introduces dual-stage Pseudo PG with advanced clustering algorithm where up to two extra control signals are added to a gate if effective. The advanced clustering algorithm selects the first control signal to be compatible with the second control signal based on the propagation of controlling condition via a path, with which candidates of controllable gates excluded by the maximum depth constraint can be controlled. Experimental results show that the proposed dual-stage Pseudo PG method has obtained 23.23% average power reduction with 5.28% delay penalty with respect to the original circuits, and has obtained 10.46% more power reduction with 2.75% delay penalty compared with respect to circuits applying the original single-stage Pseudo PG.
Yasuhiro TAKEI Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA
Heterogeneous multi-core architectures with CPUs and accelerators attract many attentions since they can achieve power-efficient computing in various areas from low-power embedded processing to high-performance computing. Since the optimal architecture is different from application to application, finding the most suitable accelerator is very important. In this paper, we propose an FPGA-based heterogeneous multi-core platform with custom accelerators for power-efficient computing. Using the proposed platform, we evaluate several applications and accelerators to identify many key requirements of the applications and properties of the accelerators. Such an evaluation is very important to select and optimize the most suitable accelerator according to the requirements of an application to achieve the best performance.
In this paper, we present a pure hardware implementation of the advanced encryption standard (AES) with 8-bit data path with both encryption/decryption abilities for applications of wireless network. To achieve the requirements of low area resource and high throughput performance, the 8-bit AES design overlaps the MixColumns (MC) and ShiftRows (SR), Inverse MixColumns (IMC) and Inverse ShiftRows (ISR) operations in order to reduce the required clock cycles and critical path delay of transformations involved. The combinations of SB with ISB, MC with IMC, and SR with ISR can effectively reduce the area cost of the AES realization. We implement the AES processor in an ASIC chip. The design has the area cost of 4.3 k-gates with throughput of 72Mbps which can meet the throughput requirement of IEEE 802.11g wireless network standard. From the experimental results, we observe that our AES design has better performance compared with other previous designs.
Shin-ya ABE Youhua SHI Kimiyoshi USAMI Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.
Jiayi ZHU Dajiang ZHOU Satoshi GOTO
High efficiency video coding (HEVC) is the next generation video compression standard. In-loop filter is an important component of HEVC which is composed of two parts, deblocking filter (DBF) and sample adaptive offset (SAO). In this article, we propose a high performance in-loop filter architecture for HEVC which integrate both deblocking filter and SAO. To achieve it, several ideas are adopted. Firstly, SAO is processed based on drifted block, which suits the output pattern of deblocking filter and ease the coupling of deblocking filter and SAO. Secondly, luma and chroma samples of each 4×4 block are organized in same memory storage unit and they are processed simultaneously to raise the parallelism. Thirdly, in both deblocking filter and SAO, calculation core is implemented in combinational logic and data storage is implemented in register groups. Calculation core keeps processing data continually, which greatly raises the utilization of DBF core and SAO core. Fourthly, task level pipeline in processing 8×8 block is employed between deblocking filter and SAO. By these means, a high performance in-loop filter including both deblocking filter and SAO is achieved without any intermediate storage or circuit. It takes only four cycles to finish the deblocking filter and SAO of one 8×8 block. The implementation results show that the proposed solution can be synthesized to 240MHz with 65nm technology. Thus this solution can process 3.84G pixels/s at maximum. UHDTV 4320p (7680×4320) @ 60fps decoding can be realized with 124.4MHz working frequency by the proposed architecture.
Xiongxin ZHAO Zhixiang CHEN Xiao PENG Dajiang ZHOU Satoshi GOTO
In this paper, we propose a synthesizable LDPC decoder IP core for the WiMAX system with high parallelism and enhanced error-correcting performance. By taking the advantages of both layered scheduling and fully-parallel architecture, the decoder can fully support multi-mode decoding specified in WiMAX with the parallelism much higher than commonly used partial-parallel layered LDPC decoder architecture. 6-bit quantized messages are split into bit-serial style and 2bit-width serial processing lines work concurrently so that only 3 cycles are required to decode one layer. As a result, 12∼24 cycles are enough to process one iteration for all the code-rates specified in WiMAX. Compared to our previous bit-serial decoder, it doubles the parallelism and solves the message saturation problem of the bit-serial arithmetic, with minor gate count increase. Power synthesis result shows that the proposed decoder achieves 5.83pJ/bit/iteration energy efficiency which is 46.8% improvement compared to state-of-the-art work. Furthermore, an advanced dynamic quantization (ADQ) technique is proposed to enhance the error-correcting performance in layered decoder architecture. With about 2% area overhead, 6-bit ADQ can achieve the error-correcting performance close to 7-bit fixed quantization with improved error floor performance.
Zhen ZHANG Shouyi YIN Leibo LIU Shaojun WEI
TSV-interconnected 3D chips face problems such as high cost, low yield and large power dissipation. We propose a wireless 3D on-chip-network architecture for application-specific SoC design, using inductive-coupling interconnect instead of TSV for inter-layer communication. Primary design challenge of inductive-coupling 3D SoC is allocating wireless links in the 3D on-chip network effectively. We develop a design flow fully exploiting the design space brought by wireless links while providing flexible tradeoff for user's choice. Experimental results show that our design brings great improvement over uniform design and Sunfloor algorithm on latency (5% to 20%) and power consumption (10% to 45%).
The objective of this research is to design a high-performance NAND flash memory system with a data buffer. The proposed buffer system in the NAND flash memory consists of two parts, i.e., a fully associative temporal buffer for temporal locality and a fully associative spatial buffer for spatial locality. We propose a new operating mechanism for reducing overhead of flash memory, that is, erase and write operations. According to our simulation results, the proposed buffer system can reduce the write and erase operations by about 73% and 79% for spec application respectively, compared with a fully associative buffer with two times more space. Futhermore, the average memory access time can improve by about 60% compared with other large buffer systems.
Yichao LU Gang HE Guifen TIAN Satoshi GOTO
Recently, non-binary low-density parity-check (NB-LDPC) codes starts to show their superiority in achieving significant coding gains when moderate codeword lengths are adopted. However, the overwhelming decoding complexity keeps NB-LDPC codes from being widely employed in modern communication devices. This paper proposes a hybrid message-passing decoding algorithm which consumes very low computational complexity. It achieves competitive error performance compared with conventional Min-max algorithm. Simulation result on a (255,174) cyclic code shows that this algorithm obtains at least 0.5dB coding gain over other state-of-the-art low-complexity NB-LDPC decoding algorithms. A partial-parallel NB-LDPC decoder architecture for cyclic NB-LDPC codes is also developed based on this algorithm. Optimization schemes are employed to cut off hard decision symbols in RAMs and also to store only part of the reliability messages. In addition, the variable node units are redesigned especially for the proposed algorithm. Synthesis results demonstrate that about 24.3% gates and 12% memories can be saved over previous works.
Energy-harvesting devices are materials that allow ambient energy sources to be converters into usable electrical power. While a battery powers the modern embedded systems, these energy-harvesting devices power the energy-harvesting embedded systems. This claims a new energy efficient management techniques for the energy-harvesting systems dislike the previous management techniques. The higher entire system efficiency in an energy-harvesting system can be obtained by a higher generating efficiency, a higher consuming efficiency, or a higher transferring efficiency. This paper presents a generalized technique for a dynamic reconfiguration and a task scheduling considering the power loss in DC-DC converters in the system. The proposed technique minimizes the power loss in the DC-DC converter and charger of the system. The proposed technique minimizes the power loss in the DC-DC converters and charger of the system. Experiments with actual application demonstrate that our approach reduces the total energy consumption by 22% in average over the conventional approach.
Trung Anh DINH Shigeru YAMASHITA Tsung-Yi HO Yuko HARA-AZUMI
Microfluidic biochips, also referred to “lab-on-a-chip,” have been recently proposed to integrate all the necessary functions for biochemical analyses. This technology starts a new era of biology science, where a combination of electronic and biology is first introduced. There are several types of microfluidic biochips; among them there has been a great interest in flow-based microfluidic biochips, in which the flows of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex resource units such as micropumps, switches and mixers can be built. For efficient execution, the flows of liquid routes in microfluidic biochips need to be scheduled under some resource constraints and routing constraints. The execution time of a biochemical application depends strongly on the binding and scheduling result. The most previously developed binding and scheduling algorithm is based on heuristics, and there has been no method to obtain optimal results. Considering the above, we propose an optimal method by casting the problem to a clique problem. Moreover, this paper also presents some heuristic techniques for computational time reduction. Experiments demonstrate that the proposed method is able to reduce the execution time of biochemical applications by more than 15% compared with the previous approach. Moreover, the proposed heuristic method is able to produce the results at no or little cost of optimality, in significantly shorter time than the optimal method.
The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
This paper addresses a high-level synthesis (HLS) using dual-edge-triggered flip-flops (DETFFs) as memory elements. In DETFF-based HLS, the duty cycle becomes a manageable resource to improve the timing performance. To utilize the duty cycle radically, a programmable duty cycle (PDC) mechanism is built into this HLS, and captured by a new HLS task named PDC scheduling. As a first step toward DETFF-based HLS with PDC, the execution time minimization problem is formulated for given results of operation scheduling. A linear program is presented to solve this problem in polynomial time. As a next step, simultaneous operation scheduling and PDC scheduling problem for the same objective is tackled. A mixed integer linear programming-based (MILP) approach is presented to solve this problem. The experimental results show that the MILP can reduce the execution time for several benchmarks.
Hatsuhiro KATO Hatsuyoshi KATO
The recursive transfer method (RTM) is a numerical technique that was developed to analyze scattering phenomena and its formulation is constructed with a difference equation derived from a differential equation by Numerov's discretization method. However, the differential equation to which Numerov's method is applicable is restricted and therefore the application range of RTM is also limited. In this paper, we provide a new discretization scheme to extend RTM formulation using the weak form theory framework. The effectiveness of the proposed formulation is confirmed by microwave scattering induced by a metallic pillar placed asymmetrically in the waveguide. A notable feature of RTM is that it can extract a localized wave from scattering waves even if the tail of the localized wave reaches to the ends of analyzing region. The discrepancy between the experimental and theoretical data is suppressed with in an upper bound determined by the standing wave ratio of the waveguide.
Jining ZHAO Chunxiang XU Fagen LI Wenzheng ZHANG
In the Cloud computing era, users could have their data outsourced to cloud service provider (CSP) to enjoy on-demand high quality service. On the behalf of the user, a third party auditor (TPA) which could verify the real data possession on CSP is critically important. The central challenge is to build efficient and provably secure data verification scheme while ensuring that no users' privacy is leaked to any unauthorized party, including TPA. In this paper, we propose the first identity-based public verification scheme, based on the identity-based aggregate signature (IBAS). In particular, by minimizing information that verification messages carry and TPA obtains or stores, we could simplify key management and greatly reduce the overheads of communication and computation. Unlike the existing works based on certificates, in our scheme, only a private key generator (PKG) has a traditional public key while the user just keeps its identity without binding with certificate. Meanwhile, we utilize privacy-preserving technology to keep users' private data off TPA. We also extend our scheme with the support of batch verification task to enable TPA to perform public audits among different users simultaneously. Our scheme is provably secure in the random oracle model under the hardness of computational Diffie-Hellman assumption over pairing-friendly groups and Discrete Logarithm assumption.
Pradit MITTRAPIYANURUK Pakorn KAEWTRAKULPONG
We present an algorithm for simultaneously recognizing and localizing planar textured objects in an image. The algorithm can scale efficiently with respect to a large number of objects added into the database. In contrast to the current state-of-the-art on large scale image search, our algorithm can accurately work with query images consisting of several specific objects and/or multiple instances of the same object. Our proposed algorithm consists of two major steps. The first step is to generate a set of hypotheses that provides information about the identities and the locations of objects in the image. To serve this purpose, we extend Bag-Of-Visual-Word (BOVW) image retrieval by incorporating a re-ranking scheme based on the Hough voting technique. Subsequently, in the second step, we propose a geometric verification algorithm based on A Contrario decision framework to draw out the final detection results from the generated hypotheses. We demonstrate the performance of the algorithm on the scenario of recognizing CD covers with a database consisting of more than ten thousand images of different CD covers. Our algorithm yield to the detection results of more than 90% precision and recall within a few seconds of processing time per image.
Shin-Woong PARK Jeonghong PARK Bang Chul JUNG
In this letter, parallel orthogonal matching pursuit (POMP) is proposed to supplement orthogonal matching pursuit (OMP) which has been widely used as a greedy algorithm for sparse signal recovery. Empirical simulations show that POMP outperforms the existing sparse signal recovery algorithms including OMP, compressive sampling matching pursuit (CoSaMP), and linear programming (LP) in terms of the exact recovery ratio (ERR) for the sparse pattern and the mean-squared error (MSE) between the estimated signal and the original signal.
Xin LIAO Qiaoyan WEN Jie ZHANG
This letter improves two adaptive steganographic methods in Refs. [5], [6], which utilize the remainders of two consecutive pixels to record the information of secret data. Through analysis, we point out that they perform mistakenly under some conditions, and the recipient cannot extract the secret data exactly. We correct these by enlarging the adjusting range of the remainders of two consecutive pixels within the block in the embedding procedure. Furthermore, the readjusting phase in Ref. [6] is improved by allowing every two-pixel block to be fully modified, and then the sender can select the best choice that introduces the smallest embedding distortion. Experimental results show that the improved method not only extracts secret data exactly but also reduces the embedding distortion.
Saransh MALIK Sangmi MOON Bora KIM Huaping LIU Cheolwoo YOU Jeong-Ho KIM Intae HWANG
In this letter, we propose an Adaptive Modulation and Coding (AMC) scheme with relay protocols, such as Amplify-and-Forward (AF), Decode-and-Forward (DF) and De-Modulate-and-Forward (DMF). We perform simulations based on 3GPP Long Term Evolution-Advanced (LTE-A) parameters to compare the performance of an adaptive Modulation and Coding Scheme (MCS) using relay protocols of AF, DF, and DMF with non-adaptive MCS, with the same relay protocols. We analyze the performance of the proposed scheme and observe how the proposed AMC scheme with DMF performs at various Signal to Noise Ratio (SNR) regions. The simulation results have shown that the performance of the proposed AMC scheme with relay protocols of DMF is much better at lower and a higher SNR regions and also provides higher average throughput.
In this letter, we present a novel interference-aware clustering scheme for cell broadcasting service. The proposed approach is based on a genetic algorithm for re-clustering. Using the genetic algorithm, the suggested method efficiently re-clusters the user nodes when the relays fail in receiving the cell broadcasting message from the base station. The simulation results exhibit that the proposed clustering scheme can maintain much higher capacity than the conventional clustering scheme in the cases of relay outage. The re-clustering method based on genetic algorithm also shows lower complexity than the re-clustering approach based on exhaustive search.
Kensuke SAWADA Shigenobu SASAKI Shinichiro MORI
Geomagnetic information is informative because it has the ability to detect information about orientation by way of a ubiquitous device. However, a magnetic disturbance easily influences geomagnetic information. The magnetic disturbance detection method is needed in order to use geomagnetic information. Firstly, in this paper, the availability of geomagnetic information in Japan is investigated by field measurement work. Then, a new magnetic disturbance detection method which is better than the conventional method is proposed. The basic function of the proposed method is tested in actual condition.