Keyword Search Result

[Keyword] pipeline(141hit)

81-100hit(141hit)

  • A Low Cost Reconfigurable Architecture for a UMTS Receiver

    Ronny VELJANOVSKI  Aleksandar STOJCEVSKI  Jugdutt SINGH  Aladin ZAYEGH  Michael FAULKNER  

     
    PAPER

      Vol:
    E86-B No:12
      Page(s):
    3441-3451

    A novel reconfigurable architecture is proposed for a mobile terminal receiver that can drastically reduce power dissipation depending on the level of adjacent channel interference. The proposed design automatically scales the number of filter coefficients and the word length by monitoring the in-band and out-of-band powers. The performance of the new architecture was evaluated in a simulated UTRA-TDD environment because of the large near-far problem caused by adjacent channel interference from adjacent mobiles and base stations. The UTRA-TDD downlink mode was examined statistically, and the results show that the reconfigurable architecture can save an average of up to 75% of power dissipation compared with a fixed filter length of 57 and a word length of 16 bits. This power saving applies only to the filter and ADC, not the whole receiver. It will prolong talk and standby time in a mobile terminal. The average numbers of taps and bits were calculated to be 14.98 and 10, respectively, for an outage of 97%.

  • On Practical Implementation of the PIC Algorithm in Asynchronous CDMA Systems

    Young Wha KIM  Sung Ho CHO  

     
    LETTER-Wireless Communication Technology

      Vol:
    E86-B No:8
      Page(s):
    2508-2511

    In this letter, we present a practical method of implementing the parallel interference cancellation (PIC) algorithm in an asynchronous CDMA system. A novel pipelined structure is employed in this method to reduce the processing delay and the memory space compared with the conventional PIC processing scheme.

  • A High Throughput Pipelined Architecture for Blind Adaptive Equalizer with Minimum Latency

    Masashi MIZUNO  James OKELLO  Hiroshi OCHI  

     
    PAPER

      Vol:
    E86-A No:8
      Page(s):
    2011-2019

    In this paper, we propose a pipelined architecture for an equalizer based on the Multilevel Modified Constant Modulus Algorithm (MMCMA). We also provide the correction factor that mathematically converts the proposed pipelined adaptive equalizer into an equivalent non-pipelined conventional MMCMA based equalizer. The proposed method of pipelining uses modules with 6 filter coefficients, resulting in an overall latency of a single sampling period, along the main transmission line. The basic concept of the proposed architecture is to implement the Finite Impulse Response (FIR) filter and the algorithm portion of the adaptive equalizer, such that the critical path of the whole circuit has a maximum of three complex multipliers and three adders.

  • A Pipeline Structure for High-Speed Step-by-Step RS Decoding

    Tung-Chou CHEN  Che-Ho WEI  Shyue-Win WEI  

     
    LETTER-Fundamental Theories

      Vol:
    E86-B No:2
      Page(s):
    847-849

    Based on a modified step-by-step decoding procedure, a high-speed pipelined Reed-Solomon decoder is presented. The decoder requires only the delay time of three 2-input XOR gates for decoding each coded symbol. The decoder can operate at bit rates on the order of Gbit/s and is thus suitable for very-high-speed data transmission systems.

  • Data Transfer Time by HTTP 1.0/1.1 on Asymmetric Networks Composed of Satellite and Terrestrial Links

    Hiroyasu OBATA  Kenji ISHIDA  Junichi FUNASAKA  Kitsutaro AMANO  

     
    PAPER-Internet

      Vol:
    E85-B No:12
      Page(s):
    2895-2903

    Asymmetric networks, which provide asymmetric bandwidth or delay for upstream and downstream transfer, have recently gained much attention since they support popular applications such as the World Wide Web (WWW). HTTP (Hypertext Transfer Protocol) is the basis of most WWW services, so evaluating the performance of HTTP on asymmetric networks is increasingly important, particularly on real-world networks. However, the performance of HTTP on asymmetric networks composed of satellite and terrestrial links has not been sufficiently evaluated. This paper proposes new formulas to evaluate the performance of both HTTP1.0 and HTTP1.1 on asymmetric networks. Using these formulas, we calculate the time taken to transfer web data by HTTP1.0/1.1. The calculation results are compared to the results of an existing theoretical formula and to experimental results gained from a system that combines a VSAT (Very Small Aperture Terminal) satellite communication system for satellite links (downstream) and the Internet for terrestrial links (upstream). The comparison shows that the proposed formulas yield more accurate results (compared to the measured values) than the existing formula. Furthermore, this paper proposes an evaluation formula for pipelined HTTP1.1 and shows that the values output by the proposed formula agree with those obtained by experiments (on the VSAT system) and simulations.

  • Design Exploration of an Industrial Embedded Microcontroller: Performance, Cost and Software Compatibility

    Ing-Jer HUANG  Li-Rong WANG  Yu-Min WANG  Tai-An LU  

     
    PAPER-VLSI Design

      Vol:
    E85-A No:12
      Page(s):
    2624-2635

    This paper presents a case study of synthesis of the industrial embedded microcontroller HT48100 and analysis of performance, cost and software compatibility for its implementation alternatives, using the hardware/software co-design system for microcontrollers/microprocessors PIPER-II. The synthesis tool accepts as input the instruction set architecture (behavioral) specification, and produces as outputs the pipelined RTL designs with their simulators, and the reordering constraints which guide the compiler backend to optimize the code for the synthesized designs. A compiler backend is provided to optimize the application software according to the reordering constraints. The study shows that the co-design approach was able to help the original design team to analyze the architectural properties, identify inefficient architecture features, and explore possible architectural improvements and their impacts in both hardware and software. Feasible future upgrades for the microcontroller family have been identified by the study.

  • Pipelined Simple Matching for Input Buffered Switches

    Man-Soo HAN  Bongtae KIM  

     
    LETTER-Antenna and Propagation

      Vol:
    E85-B No:11
      Page(s):
    2539-2543

    We present pipelined simple matching, called PSM, for an input buffered switch to relax the scheduling timing constraint by modifying pipelined maximal-sized matching (PMM). As in the pipelined manner of PMM, to produce the matching results in every time slot, PSM employs multiple subschedulers which take more than one time slot to complete matching. Using only head-of-line information of input buffers, PSM successively sends each request to all subschedulers to provide a better matching opportunity. To obtain better performance, PSM uses unique starting points of scheduling pointers in which the difference between the starting points is equal for any two adjacent subschedulers for the same output. Using computer simulations under uniform traffic, we show that PSM is more appropriate than PMM for pipelined scheduling of an input buffered switch.

  • A Digital Calibration Technique of Capacitor Mismatch for Pipelined Analog-to-Digital Converters

    Masanori FURUTA  Shoji KAWAHITO  Daisuke MIYAZAKI  

     
    PAPER

      Vol:
    E85-C No:8
      Page(s):
    1562-1568

    A digital calibration technique, which corrects errors due to capacitor mismatch in pipelined ADCs and directly measures the error coefficients using the ADC INL plot, is described. The proposed technique can be applied to various types of pipelined ADC architectures. Test results using an implemented 10-bit pipelined ADC show that the ADC achieves a peak signal-to-noise-and-distortion ratio of 56.5 dB, a peak integral non-linearity of 0.3 LSB, and a peak differential non-linearity of 0.3 LSB using the digital calibration.
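
As a rough cross-check of the figures above, the peak SNDR can be converted into an effective number of bits (ENOB) with the usual ideal-quantizer relation ENOB = (SNDR - 1.76 dB) / 6.02 dB. This relation is not stated in the abstract; it is the standard textbook conversion, and the function name below is hypothetical.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Convert a signal-to-noise-and-distortion ratio in dB to effective bits.

    Uses the ideal-quantizer relation SNDR = 6.02 * N + 1.76 dB.
    This is an assumption for illustration, not a formula from the abstract.
    """
    return (sndr_db - 1.76) / 6.02


if __name__ == "__main__":
    # Peak SNDR reported for the calibrated 10-bit pipelined ADC.
    print(round(enob_from_sndr(56.5), 2))
```

Under this relation, the reported 56.5 dB corresponds to roughly 9.1 effective bits for the nominally 10-bit converter.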

  • A 3.2-mA 6-Bit Pipelined A/D Converter for a Bluetooth RF Transceiver

    Tatsuji MATSUURA  Junya KUDOH  Eiki IMAIZUMI  

     
    PAPER

      Vol:
    E85-C No:8
      Page(s):
    1538-1545

    A low-power-consumption 6-bit pipelined analog-to-digital converter for use in a Bluetooth™ RF transceiver has been developed. The RF transceiver chip was fabricated using a 0.35-µm BiCMOS process, and the A/D converter is based on CMOS technology for digital logic. To reduce the power consumption of the converter, we used a look-ahead pipeline architecture to reduce the required settling time of an amplifier in the critical path of the converter. We show that through this reduction, amplifier power consumption of 600 µA can be reduced to 250 µA to achieve a 13-MHz conversion rate. We have also developed a low-power two-capacitor switched-capacitor common-mode feedback circuit which enables an offset cancellation of an amplifier during the reset phase. Offset cancellation is used in each stage of the S/H amplifier to reduce the overall offset of the converter. It achieves an effective number of bits of 5.7 at a conversion rate of 13 Msps and 5.0 at 26 Msps. The residual offset of the converter is only 4 mV. It has a low total current consumption of 3.2 mA at 13 Msps and a supply voltage of 2.8 V.

  • A Pipelined Maximal-Sized Matching Scheme for High-Speed Input-Buffered Switches

    Eiji OKI  Roberto ROJAS-CESSA  H. Jonathan CHAO  

     
    PAPER-Switching

      Vol:
    E85-B No:7
      Page(s):
    1302-1311

    This paper proposes an innovative Pipeline-based Maximal-sized Matching scheduling approach, called PMM, for input-buffered switches. It dramatically relaxes the limitation of a single time slot for completing a maximal matching into any number of time slots. In the PMM approach, arbitration is operated in a pipelined manner, where K subschedulers are used. Each subscheduler is allowed to take more than one time slot for its matching. Every time slot, one of the subschedulers provides the matching result. We adopt an extended version of Dual Round-Robin Matching (DRRM), called iterative DRRM (iDRRM), as a maximal matching algorithm in a subscheduler. PMM maximizes the efficiency of the adopted arbitration scheme by allowing sufficient time for the number of iterations. We show that PMM preserves 100% throughput under uniform traffic and fairness for best-effort traffic of the non-pipelined adopted algorithm, while ensuring that cells from the same virtual output queue (VOQ) are transmitted in sequence. In addition, we confirm that the delay performance of PMM is not significantly degraded by increasing the pipeline degree, or the number of subschedulers, when the number of outstanding requests for each subscheduler from a VOQ is limited to 1.
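
The staggered timing described above can be sketched as follows. This is a minimal illustration, assuming that each of the K subschedulers needs exactly K time slots per matching and that subscheduler k starts a new matching at every slot t with t mod K = k; the abstract only says that each subscheduler may take more than one slot. The function name is hypothetical, and the matching algorithm itself (iDRRM) is not modeled.

```python
from typing import List, Optional


def pmm_timeline(K: int, num_slots: int) -> List[Optional[int]]:
    """For each time slot, return which subscheduler delivers a matching.

    Assumption (not from the abstract): every matching takes exactly K slots,
    and subscheduler k starts a new one at slots t where t % K == k.
    During the first K slots the pipeline is still filling, so nothing is
    delivered and the entry is None.
    """
    timeline: List[Optional[int]] = []
    for t in range(num_slots):
        start = t - K  # slot in which the matching delivered now was started
        timeline.append(start % K if start >= 0 else None)
    return timeline


if __name__ == "__main__":
    print(pmm_timeline(3, 7))
```

With K = 3, the first three slots deliver nothing while the pipeline fills, after which exactly one subscheduler delivers a result in every slot, in round-robin order.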

  • Assignment-Driven Loop Pipeline Scheduling and Its Application to Data-Path Synthesis

    Toshiyuki YOROZUYA  Koji OHASHI  Mineo KANEKO  

     
    PAPER

      Vol:
    E85-A No:4
      Page(s):
    819-826

    In this paper, we study the loop pipeline scheduling problem under a given resource assignment (operation to functional unit assignments and data to register assignments), which is one of the key tasks in data-path synthesis based on assignment solution space exploration. We show an approach using a precedence constraint graph with parametric disjunctive arcs generated from the specified assignment information, and derive a scheduling method using branch-and-bound exploration of the parameter space. As an application, the proposed scheduling method is incorporated into Simulated-Annealing (SA) based exploration of the assignment solution space, and it is demonstrated that data-paths of the fifth-order elliptic wave filter are successfully synthesized.

  • Design and Demonstration of Pipelined Circuits Using SFQ Logic

    Akira AKAHORI  Akito SEKIYA  Takahiro YAMADA  Akira FUJIMAKI  Hisao HAYAKAWA  

     
    PAPER-Digital Devices and Their Applications

      Vol:
    E85-C No:3
      Page(s):
    641-644

    We have designed the Half Adder (HA) circuit and the Carry Save Serial Adder (CSSA) circuit based on pipeline architecture. Our HA has the structure of a two-stage pipeline and consists of 160 Josephson Junctions (JJs). Our CSSA has the structure of a four-stage pipeline with a feedback loop and consists of 360 JJs. These circuits were fabricated by the NEC standard process. There are two issues which should be considered in the design. One is parameter spreads generated by the fabrication process and the other is leakage currents between the gates. We have introduced a parameter optimization method to deal with the parameter spreads. We have also inserted three stages of JTLs to reduce leakage currents. We have experimentally confirmed the correct operations of these circuits. The obtained bias margins were 33.1% for the HA and 24.6% for the CSSA.
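
For readers unfamiliar with the two logic functions, the sketch below shows a half adder and a generic bit-serial adder whose carry is fed back from one bit position to the next. It is a purely functional illustration: it does not model SFQ gates, pipeline stages, or timing, and whether the feedback structure matches the CSSA in the paper is an assumption. All names are hypothetical.

```python
from typing import List, Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def serial_add(x_bits: List[int], y_bits: List[int]) -> List[int]:
    """Add two equal-length LSB-first bit streams, one bit per step.

    The carry produced at each step is stored and fed back into the next
    step, which is the feedback loop of a generic bit-serial adder.
    """
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s1, c1 = half_adder(a, b)       # add the two operand bits
        s2, c2 = half_adder(s1, carry)  # add the stored carry
        out.append(s2)
        carry = c1 | c2                 # at most one of c1, c2 is set
    out.append(carry)                   # final carry-out bit
    return out


if __name__ == "__main__":
    # 3 + 5 with LSB-first streams [1, 1, 0] and [1, 0, 1]
    print(serial_add([1, 1, 0], [1, 0, 1]))
```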

  • Logic Design of a Single-Flux-Quantum (SFQ) 2×2 Unit Switch for Banyan Networks

    Yoshio KAMEDA  Shinichi YOROZU  Shuichi TAHARA  

     
    PAPER-Digital Devices and Their Applications

      Vol:
    E85-C No:3
      Page(s):
    625-630

    We describe the logic design of a single-flux-quantum (SFQ) 2×2 unit switch. It is the main component of the SFQ Banyan packet switch we are developing that enables a switching capacity of over 1 Tbit/s. In this paper, we focus on the design of the controller in the unit switch. The controller does not have a simple "off-the-shelf" conventional circuit, like those used in shift registers or adders. To design such a complicated random logic circuit, we need to adopt a systematic top-down design approach. Using a graphical technique, we first obtained logic functions. Next, to use the deep pipeline architecture, we broke down the functions into one-level logic operations that can be executed within one clock cycle. Finally, we mapped the functions onto the physical circuits using pre-designed SFQ standard cells. The 2×2 unit switch consists of 59 logic gates and needs about 600 Josephson junctions without gate interconnections. We tested the gate-level circuit by logic simulation and found that it operates correctly at a throughput of 40 GHz.

  • A 500-MHz Embedded Out-of-Order Superscalar Microprocessor

    Masayuki DAITO  Kazumasa SUZUKI  Ken-ichi UEHIGASHI  Hiroshi MORITA  Hitoshi SONODA  Nobuhito MORIKAWA  Masatoshi MORIYAMA  Shoichiro SATO  Terumi FUKUDA  Saori NAKAMURA  

     
    INVITED PAPER

      Vol:
    E85-C No:2
      Page(s):
    243-252

    A MIPS-architecture-based embedded out-of-order superscalar microprocessor targeting broadband applications has been developed. Aggressive microarchitectures, such as superpipelining and out-of-order execution, have been applied to realize better performance scalability in order to fit with next-generation broadband applications. The chip includes a 32 K-Byte instruction cache, a 32 K-Byte data cache, 6 independent execution units, and has been designed using an ASIC-style design methodology on a 0.13-µm CMOS 5-layer aluminum technology. It can operate up to 500 MHz and achieves 1005 MIPS (Dhrystone 2.1) at 500-MHz operation.

  • High-Level Synthesis of Pipelined Circuits from Modular Queue-Based Specifications

    Maria-Cristina MARINESCU  Martin RINARD  

     
    PAPER-High Level Synthesis

      Vol:
    E84-A No:11
      Page(s):
    2655-2664

    This paper describes a novel approach to high-level synthesis of complex pipelined circuits, including pipelined circuits with feedback. This approach combines a high-level, modular specification language with an efficient implementation. In our system, the designer specifies the circuit as a set of independent modules connected by conceptually unbounded queues. Our synthesis algorithm automatically transforms this modular, asynchronous specification into a tightly coupled, fully synchronous implementation in synthesizable Verilog.

  • A Systolic Array RLS Processor

    Takahiro ASAI  Tadashi MATSUMOTO  

     
    PAPER-Terrestrial Radio Communications

      Vol:
    E84-B No:5
      Page(s):
    1356-1361

    This paper presents the outline of the systolic array recursive least-squares (RLS) processor prototyped primarily with the aim of broadband mobile communication applications. To execute the RLS algorithm effectively, this processor uses an orthogonal triangularization technique known in matrix algebra as QR decomposition for parallel pipelined processing. The processor board comprises 19 application-specific integrated circuit chips, each with approximately one million gates. Thirty-two bit fixed-point signal processing takes place in the processor, with which one cycle of internal cell signal processing requires approximately 500 nsec, and boundary cell signal processing requires approximately 80 nsec. The processor board can estimate up to 10 parameters. It takes approximately 35 µs to estimate 10 parameters using 41 known symbols. To evaluate the signal processing performance of the prototyped systolic array processor board, the processing time required to estimate a certain number of parameters using the prototyped board was compared with that required using a digital signal processing (DSP) board. The DSP board performed a standard form of the RLS algorithm. Additionally, we conducted minimum mean-squared error adaptive array in-lab experiments using a complex baseband fading/array response simulator. In terms of parameter estimation accuracy, the processor is found to produce virtually the same results as a conventional software engine using floating-point operations.

  • A Pipeline Chip for Quasi Arithmetic Coding

    Yair WISEMAN  

     
    PAPER-Digital Signal Processing

      Vol:
    E84-A No:4
      Page(s):
    1034-1041

    A combination of a software and a systolic hardware implementation for the Quasi Arithmetic compression algorithm is presented. The hardware is implemented as a pipeline. The implementation doesn't change the algorithm; it just splits it into two parts. The combination of parallel software and pipeline hardware can give very fast compression without a decline in compression efficiency.

  • A Cascade ALU Architecture for Asynchronous Super-Scalar Processors

    Motokazu OZAWA  Masashi IMAI  Yoichiro UENO  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER

      Vol:
    E84-C No:2
      Page(s):
    229-237

    Wire delays, instead of gate delays, are moving into dominance in modern VLSI design. Current synchronous processors have the critical path not in the ALU function but in the cache access. Since the cache performance enhancement is limited by the memory access delay, which mainly consists of wire delays, a reduction in gate delays may no longer imply any enhancement in processor performance. To solve this problem, this paper presents a novel architecture, called the Cascade ALU. The Cascade ALU allows super-scalar processors with future technologies to move the critical path into the ALU part. Therefore, the Cascade ALU can enjoy the expected progress in future device speed. Since the delay of the Cascade ALU varies depending on the executed instructions, an asynchronous system is shown to be suitable for implementing the Cascade ALU. Because an asynchronous system may have a large handshake overhead, this paper also presents an asynchronous Fine Grain Pipeline technique that hides the handshake overhead. Finally, this paper presents results of performance and area evaluation for an asynchronous implementation of the Cascade ALU. The results show that the Cascade ALU architecture has good performance scalability with respect to the reduction of the ALU latency and imposes little area penalty compared with current synchronous processors.

  • An Efficient Implementation Method of a Metric Computation Accelerator for Fractal Image Compression Using Reconfigurable Hardware

    Hidehisa NAGANO  Akihiro MATSUURA  Akira NAGOYA  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E84-A No:1
      Page(s):
    372-377

    This paper proposes a method for implementing a metric computation accelerator for fractal image compression using reconfigurable hardware. The most time-consuming part of the encoding in this compression is the computation of metrics among image blocks. In our method, each processing element (PE) configured for an image block accelerates these computations by pipeline processing. Furthermore, by configuring the PE for a specific image block, we can reduce the number of adders, which are the main computing elements, by half even in the worst case.

  • Fast Implementation Technique for Improving Throughput of RLS Adaptive Filters

    Kiyoshi NISHIKAWA  Hitoshi KIYA  

     
    PAPER-Adaptive Signal Processing

      Vol:
    E83-A No:8
      Page(s):
    1545-1550

    This paper proposes a fast implementation technique for RLS adaptive filters. The technique has an adjustable parameter to trade off the throughput against the rate of convergence of the filter according to the application. The conventional methods for improving the throughput do not have this kind of adjustability, so the proposed technique will expand the area of applications for the RLS algorithm. We show that the improvement of the throughput can be easily achieved by rearranging the formula of the RLS algorithm and that there is no need for faster PEs for the improvement.
