Keyword Search Result

[Keyword] pipeline(141hit)

21-40hit(141hit)

  • A Low Power Tone Recognition for Automatic Tonal Speech Recognizer

    Jirabhorn CHAIWONGSAI  Werapon CHIRACHARIT  Kosin CHAMNONGTHAI  Yoshikazu MIYANAGA  Kohji HIGUCHI  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1403-1411

    This paper proposes a low power tone recognition suitable for automatic tonal speech recognizer (ATSR). The tone recognition estimates fundamental frequency (F0) only from vowels by using a new magnitude difference function (MDF), called vowel-MDF. Accordingly, the number of operations is considerably reduced. In order to apply the tone recognition in portable electronic equipment, the tone recognition is designed using parallel and pipeline architecture. Due to the pipeline and parallel computations, the architecture achieves high throughput and consumes low power. In addition, the architecture is able to reduce the number of input frames depending on vowels, making it more adaptable depending on the maximum number of frames. The proposed architecture is evaluated with words selected from voice activation for GPS systems, phone dialing options, and words having the same phoneme but different tones. In comparison with the autocorrelation method, the experimental results show 35.7% reduction in power consumption and 27.1% improvement of tone recognition accuracy (110 words comprising 187 syllables). In comparison with ATSR without the tone recognition, the speech recognition accuracy indicates 25.0% improvement of ATSR with tone recogntion (2,250 training data and 45 testing words).

  • Non-binary Pipeline Analog-to-Digital Converter Based on β-Expansion

    Hao SAN  Tomonari KATO  Tsubasa MARUYAMA  Kazuyuki AIHARA  Masao HOTTA  

     
    PAPER

      Vol:
    E96-A No:2
      Page(s):
    415-421

    This paper proposes a pipeline analog-to-digital converter (ADC) with non-binary encoding technique based on β-expansion. By using multiply-by-β switched-capacitor (SC) multiplying digital-to-analog converter (MDAC) circuit, our proposed ADC is composed by radix-β (1 < β < 2) 1 bit pipeline stages instead of using the conventional radix-2 1.5 bit/1 bit pipeline stages to realize non-binary analog-to-digital conversion. Also with proposed β-value estimation algorithm, there is not any digital calibration technique is required in proposed pipeline ADC. The redundancy of non-binary ADC tolerates not only the non-ideality of comparator, but also the mismatch of capacitances and the gain error of operational amplifier (op-amp) in MDAC. As a result, the power hungry high gain and wide bandwidth op-amps are not necessary for high resolution ADC, so that the reliability-enhanced pipeline ADC with simple amplifiers can operate faster and with lower power. We analyse the β-expansion of AD conversion and modify the β-encoding technique for pipeline ADC. In our knowledge, this is the first proposal architecture for non-binary pipeline ADC. The reliability of the proposed ADC architecture and β-encoding technique are verified by MATLAB simulations.

  • A Verification-Aware Design Methodology for Thread Pipelining Parallelization

    Guo-An JIAN  Cheng-An CHIEN  Peng-Sheng CHEN  Jiun-In GUO  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E95-D No:10
      Page(s):
    2505-2513

    This paper proposes a verification-aware design methodology that provides developers with a systematic and reliable approach to performing thread-pipelining parallelization on sequential programs. In contrast to traditional design flow, a behavior-model program is constructed before parallelizing as a bridge to help developers gradually leverage the technique of thread-pipelining parallelization. The proposed methodology integrates verification mechanisms into the design flow. To demonstrate the practicality of the proposed methodology, we applied it to the parallelization of a 3D depth map generator with thread pipelining. The parallel 3D depth map generator was further integrated into a 3D video playing system for evaluation of the verification overheads of the proposed methodology and the system performance. The results show the parallel system can achieve 33.72 fps in D1 resolution and 12.22 fps in HD720 resolution through a five-stage pipeline. When verifying the parallel program, the proposed verification approach keeps the performance degradation within 23% and 21.1% in D1 and HD720 resolutions, respectively.

  • Design of High-Performance Asynchronous Pipeline Using Synchronizing Logic Gates

    Zhengfan XIA  Shota ISHIHARA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E95-C No:8
      Page(s):
    1434-1443

    This paper introduces a novel design method of an asynchronous pipeline based on dual-rail dynamic logic. The overhead of handshake control logic is greatly reduced by constructing a reliable critical datapath, which offers the pipeline high throughput as well as low power consumption. Synchronizing Logic Gates (SLGs), which have no data dependency problem, are used in the design to construct the reliable critical datapath. The design targets latch-free and extremely fine-grain or gate-level pipeline, where the depth of every pipeline stage is only one dual-rail dynamic logic. HSPICE simulation results, in a 65 nm design technology, indicate that the proposed design increases the throughput by 120% and decreases the power consumption by 54% compared with PS0, a classic dual-rail asynchronous pipeline implementation style, in 4-bit wide FIFOs. Moreover, this method is applied to design an array style multiplier. It shows that the proposed design reduces power by 37.9% compared to classic synchronous design when the workloads are 55%. A chip has been fabricated with a 44 multiplier function, which works well at 2.16G data-set/s (Post-layout simulation).

  • Energy Minimum Operation with Self Synchronous Gate-Level Autonomous Power Gating and Voltage Scaling

    Benjamin DEVLIN  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    546-554

    A 65 nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signaling allows the FPGA to operate at voltages down to 370 mV without any parameter tuning. We show both 2.6x total energy reduction and 6.4x performance improvement at the same time for energy minimum operation compared to the non-power gated SSFPGA, and compared to the latest research 1.8x improvement in power-delay product (PDP) and 2x performance improvement. When compared to a synchronous FPGA in a similar process we are able to show up to 84.6x PDP improvement. We also show energy minimum operation for maximum throughput on the power gated SSFPGA is achieved at 0.6 V, 27 fJ/operation at 264 MHz.

  • A 64 Cycles/MB, Luma-Chroma Parallelized H.264/AVC Deblocking Filter for 4 K2 K Applications

    Weiwei SHEN  Yibo FAN  Xiaoyang ZENG  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    441-446

    In this paper, a high-throughput debloking filter is presented for H.264/AVC standard, catering video applications with 4 K2 K (40962304) ultra-definition resolution. In order to strengthen the parallelism without simply increasing the area, we propose a luma-chroma parallel method. Meanwhile, this work reduces the number of processing cycles, the amount of external memory traffic and the working frequency, by using triple four-stage pipeline filters and a luma-chroma interlaced sequence. Furthermore, it eliminates most unnecessary off-chip memory bandwidth with a highly reusable memory scheme, and adopts a “slide window” buffer scheme. As a result, our design can support 4 K2 K at 30 fps applications at the working frequency of only 70.8 MHz.

  • Design of Area- and Power-Efficient Pipeline FFT Processors for 8x8 MIMO-OFDM Systems

    Shingo YOSHIZAWA  Yoshikazu MIYANAGA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:2
      Page(s):
    550-558

    We present area- and power-efficient pipeline 128- and 128/64-point fast Fourier transform (FFT) processors for 8x8 multiple-input multiple-output orthogonal frequency multiplexing (MIMO-OFDM) systems based on the specification framework of IEEE 802.11ac WLANs. Our new FFT processors use mixed-radix multipath delay commutator (MRMDC) architecture from the point of view of low complexity and high memory use. A conventional MRMDC architecture induces large circuits in delay commutators, which change the order of data sequences for the butterfly units. The proposed architecture replaces delay elements with new commutators that cooperate with other MIMO-OFDM processing blocks. These commutators are inserted in the front and rear of the input and output memory units. Our FFT processors exhibit a 50–51% reduction in logic gates and 70–72% reduction in power dissipation as compared with conventional ones.

  • Low-Complexity Multi-Mode Memory-Based FFT Processor for DVB-T2 Applications

    Kisun JUNG  Hanho LEE  

     
    PAPER-Digital Signal Processing

      Vol:
    E94-A No:11
      Page(s):
    2376-2383

    This paper presents a low-complexity multi-mode fast Fourier transform (FFT) processor for Digital Video Broadcasting-Terrestrial 2 (DVB-T2) systems. DVB-T2 operations need 1K/2K/4K/8K/16K/32K-point multiple mode FFT processors. The proposed architecture employs pipelined shared-memory architecture in which radix-2/22/23/24 FFT algorithms, multi-path delay commutator (MDC), and a novel data scaling approach are exploited. Based on this architecture, a novel low-cost data scaling unit is proposed to increase area efficiency, and an elaborate memory configuration scheme is designed to make single-port SRAM without degrading throughput rate. Also, new scheduling method of twiddle factor is proposed to reduce the area. The SQNR performance of 32K-point FFT mode is about 45.3 dB at 11-bit internal word length for 256QAM modulation. The proposed FFT processor has a lower hardware complexity and memory size compared to conventional FFT processors.

  • A Single Amplifier-Based 12-bit 100 MS/s 1 V 19 mW 0.13 µm CMOS ADC with Various Power and Area Minimized Circuit Techniques

    Byeong-Woo KOO  Seung-Jae PARK  Gil-Cho AHN  Seung-Hoon LEE  

     
    PAPER-Electronic Circuits

      Vol:
    E94-C No:8
      Page(s):
    1282-1288

    This work describes a 12-bit 100 MS/s 0.13 µm CMOS three-stage pipeline ADC with various circuit design techniques to reduce power and die area. Digitally controlled timing delay and gate-bootstrapping circuits improve the linearity and sampling time mismatch of the SHA-free input network composed of an MDAC and a FLASH ADC. A single two-stage switched op-amp is shared between adjacent MDACs without MOS series switches and memory effects by employing two separate NMOS input pairs based on slightly overlapped switching clocks. The interpolation, open-loop offset sampling, and two-step reference selection schemes for a back-end 6-bit flash ADC reduce both power consumption and chip area drastically compared to the conventional 6-bit flash ADCs. The prototype ADC in a 0.13 µm CMOS process demonstrates measured differential and integral non-linearities within 0.44LSB and 1.54LSB, respectively. The ADC shows a maximum SNDR and SFDR of 60.5 dB and 71.2 dB at 100 MS/s, respectively. The ADC with an active die area of 0.92 mm2 consumes 19 mW at 100 MS/s from a 1.0 V supply. The measured FOM is 0.22 pJ/conversion-step.

  • Background Self-Calibration Algorithm for Pipelined ADC Using Split ADC Scheme

    Takuya YAGI  Kunihiko USUI  Tatsuji MATSUURA  Satoshi UEMORI  Satoshi ITO  Yohei TAN  Haruo KOBAYASHI  

     
    BRIEF PAPER-Electronic Circuits

      Vol:
    E94-C No:7
      Page(s):
    1233-1236

    This brief paper describes a background calibration algorithm for a pipelined ADC with an open-loop amplifier using a Split ADC structure. The open-loop amplifier is employed as a residue amplifier in the first stage of the pipelined ADC to realize low power and high speed. However the residue amplifier as well as the DAC suffer from gain error and non-linearity, and hence they need calibration; conventional background calibration methods take a long time to converge. We investigated the split ADC structure for its background calibration with fast convergence, and validated its effectiveness by MATLAB simulation.

  • An Energy Efficiency 4-bit Multiplier with Two-Phase Non-overlap Clock Driven Charge Recovery Logic

    Yimeng ZHANG  Leona OKAMURA  Tsutomu YOSHIHARA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    605-612

    A novel charge-recovery logic structure called Pulse Boost Logic (PBL) is proposed in this paper. PBL is a high-speed low-energy-dissipation charge-recovery logic with dual-rail evaluation tree structure. It is driven by 2-phase non-overlap clock, and requires no DC power supply. PBL belongs to boost logic family, which includes boost logic, enhanced boost logic and subthreshold boost logic. In this paper, PBL has been compared with other charge-recovery logic technologies. To demonstrate the performance of PBL structure, a 4-bit pipeline multiplier is designed and fabricated with 0.18 µm CMOS process technology. The simulation results indicate that the 4-bit multiplier can work at a frequency of 1.8 GHz, while the measurement of test chip is at operation frequency of 161 MHz, and the power dissipation at 161 MHz is 772 µW.

  • Asynchronous Pipeline Controller Based on Early Acknowledgement Protocol

    Chammika MANNAKKARA  Tomohiro YONEDA  

     
    PAPER-Computer System

      Vol:
    E93-D No:8
      Page(s):
    2145-2161

    A new pipeline controller based on the Early Acknowledgement (EA) protocol is proposed for bundled-data asynchronous circuits. The EA protocol indicates acknowledgement by the falling edge of the acknowledgement signal in contrast to the 4-phase protocol, which indicates it on the rising edge. Thus, it can hide the overhead caused by the resetting period of the handshake cycle. Since we have designed our controller assuming several timing constraints, we first analyze the timing constraints under which our controller correctly works and then discuss their appropriateness. The performance of the controller is compared both analytically and experimentally with those of two other pipeline controllers, namely, a very high-speed 2-phase controller and an ordinary 4-phase controller. Our controller performs better than a 4-phase controller when pipeline has processing elements. We have obtained interesting results in the case of a non-linear pipeline with a Conditional Branch (CB) operation. Our controller has slightly better performance even compared to 2-phase controller in the case of a pipeline with processing elements. Its superiority lies in the EA protocol, which employs return-to-zero control signals like the 4-phase protocol. Hence, our controller for CB operation is simple in construction just like the 4-phase controller. A 2-phase controller for the same operation needs to have a slightly complicated mechanism to handle the 2-phase operation because of the non-return-to-zero control signals, and this results in a performance overhead.

  • A Low Power and High Throughput Self Synchronous FPGA Using 65 nm CMOS with Throughput Optimization by Pipeline Alignment

    Benjamin STEFAN DEVLIN  Toru NAKURA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E93-A No:7
      Page(s):
    1319-1328

    We detail a self synchronous field programmable gate array (SSFPGA) with dual-pipeline (DP) architecture to conceal pre-charge time for dynamic logic, and its throughput optimization by using pipeline alignment implemented on benchmark circuits. A self synchronous LUT (SSLUT) consists of a three input tree-type structure with 8 bits of SRAM for programming. A self synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12 bits of SRAM. One common block with one SSLUT and one SSSB occupies 2.2 Mλ2 area with 35 bits of SRAM, and the prototype SSFPGA with 3430 (1020) blocks is designed and fabricated using 65 nm CMOS. Measured results show at 1.2 V 430 MHz and 647 MHz operation for a 3 bit ripple carry adder, without and with throughput optimization, respectively. We find that using the proposed pipeline alignment techniques we can perform at maximum throughput of 647 MHz in various benchmarks on the SSFPGA. We demonstrate up to 56.1 times throughput improvement with our pipeline alignment techniques. The pipeline alignment is carried out within the number of logic elements in the array and pipeline buffers in the switching matrix.

  • A 0.9-V 12-bit 40-MSPS Pipeline ADC for Wireless Receivers

    Tomohiko ITO  Tetsuro ITAKURA  

     
    PAPER

      Vol:
    E93-A No:2
      Page(s):
    395-401

    A 0.9-V 12-bit 40-MSPS pipeline ADC with I/Q amplifier sharing technique is presented for wireless receivers. To achieve high linearity even at 0.9-V supply, the clock signals to sampling switches are boosted over 0.9 V in conversion stages. The clock-boosting circuit for lifting these clocks is shared between I-ch ADC and Q-ch ADC, reducing the area penalty. Low supply voltage narrows the available output range of the operational amplifier. A pseudo-differential (PD) amplifier with two-gain-stage common-mode feedback (CMFB) is proposed in views of its wide output range and power efficiency. This ADC is fabricated in 90-nm CMOS technology. At 40 MS/s, the measured SNDR is 59.3 dB and the corresponding effective number of bits (ENOB) is 9.6. Until Nyquist frequency, the ENOB is kept over 9.3. The ADC dissipates 17.3 mW/ch, whose performances are suitable for ADCs for mobile wireless systems such as WLAN/WiMAX.

  • A 300 MHz Embedded Flash Memory with Pipeline Architecture and Offset-Free Sense Amplifiers for Dual-Core Automotive Microcontrollers

    Shinya KAJIYAMA  Masamichi FUJITO  Hideo KASAI  Makoto MIZUNO  Takanori YAMAGUCHI  Yutaka SHINAGAWA  

     
    PAPER

      Vol:
    E92-C No:10
      Page(s):
    1258-1264

    A novel 300 MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables reduced access pitch and therefore reduces performance penalty due to conflict of shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with two-cycle latency one-cycle pitch as a result of a shortened sense time of 0.63 ns. The combination of the pipeline architecture and proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances performance of 32-bit RISC dual-core microcontrollers by 30%.

  • Interacting Self-Timed Pipelines and Elementary Coupling Control Modules

    Kazuhiro KOMATSU  Shuji SANNOMIYA  Makoto IWATA  Hiroaki TERADA  Suguru KAMEDA  Kazuo TSUBOUCHI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:7
      Page(s):
    1642-1651

    The self-timed pipeline (STP) is one of the most promising VLSI/SoC architectures. It achieves efficient utilization of tens of billions of transistors, consumes ultra low power, and is easy-to-design because of its signal integrity and low electro-magnetic interference. These basic features of the STP have been proven by the development of self-timed data-driven multimedia processors, DDMP's. This paper proposes a novel scheme of interacting self-timed (clockless) pipelines by which the various distributed and interconnected pipelines can achieve highly functional stream processing in future giga-transistor chips. The paper also proposes a set of elementary coupling control modules that facilitate various combinations of flow-thru processing between pipelines, and then discusses the practicality of the proposed scheme through the LSI design of application modules such as a priority-based queue, a mutual interconnection network, and a pipelined sorter.

  • Duty Cycle Corrector for Pipelined ADC with Low Added Jitter

    Zhengchang DU  Jianhui WU  Shanli LONG  Meng ZHANG  Xincun JI  

     
    LETTER

      Vol:
    E92-C No:6
      Page(s):
    864-866

    A wide range, low jitter Duty Cycle Corrector (DCC) based on continuous-time integrator is proposed. It introduces little added jitter in the sampling edge, which make it good candidate for pipelined ADC application. The circuit is implemented in CMOS 0.35 µm 2P4M Mixed Signal process. The experimental results show the circuit can work for a wide frequency range from 500 kHz to 280 MHz, with a correction error within 50%1% under 200 MHz, and the acceptable duty cycle can be as wide as 1-99% for low frequency inputs.

  • A 150 MS/s 10-bit CMOS Pipelined Subranging ADC with Time Constant Reduction Technique

    Xian Ping FAN  Pak Kwong CHAN  Piew Yoong CHEE  

     
    PAPER-Electronic Circuits

      Vol:
    E92-C No:5
      Page(s):
    719-727

    A 150 MS/s 10-bit MOS-inverter-based subranging analog-to-digital converter (ADC) dedicated to a high-speed low-power application is presented in this paper. A new time constant reduction technique is proposed in the multi-stage preamplifier design which aims to further increase the speed of the coarse ADC. A synchronized switch is introduced to minimize the sample-time mismatch in the interleaved architecture of fine ADCs. An internal pipelined scheme incorporating the double sampling and interleaving techniques in fine ADCs allows the ADC sample input signal to run on a consecutive clock, thus maximizing the throughput. The prototype ADC achieves 52 dB SNDR for a 10 MHz input frequency at 150 MS/s. Without calibration, the measured differential nonlinearity (DNL) is 0.5 LSB, while the integral nonlinearity (INL) is 0.9 LSB. The CMOS ADC is fabricated in a 0.35 µm CMOS technology, with an active area of 2.7 mm2, consuming only 178 mW from a single 3 V supply. Comparing technology normalized figure-of-merits, it achieves better power-speed efficiency than other similar types of ADCs.

  • Mismatch-Insensitive High Precision Switched-Capacitor Multiply-by-Four Amplifier

    Seunghyun LIM  Gunhee HAN  

     
    LETTER-Electronic Circuits

      Vol:
    E92-C No:3
      Page(s):
    377-379

    This letter proposes a mismatch insensitive switched-capacitor multiply-by-four (4X) amplifier using the voltage addition scheme. The proposed circuit provides 2-times faster speed and about half of silicon area when compared with the cascade of conventional 2X amplifiers. Monte-Carlo simulation results show about 15% gain accuracy improvement over the cascaded 2X- amplifiers.

  • A Reference Voltage Buffer with Settling Boost Technique for a 12 bit 18 MHz Multibit/Stage Pipelined A/D Converter

    Shunsuke OKURA  Tetsuro OKURA  Toru IDO  Kenji TANIGUCHI  

     
    PAPER

      Vol:
    E92-A No:2
      Page(s):
    367-373

    A reference voltage buffer for a multibit/stage pipelined ADC is described, where a settling boost technique is used to improve the settling response of the pipelined stages. A 12 bit 18 MHz pipelined ADC with the buffer is designed and simulated based on a 0.35 µm CMOS process. According to simulation results, the power consumed by the reference voltage buffer is reduced by 33% compared to that without the settling boost technique.

21-40hit(141hit)

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.