IEICE globals.ieice.org Site

Keyword Search Result

[Keyword] pipeline(141hit)

81-100hit(141hit)

A Low Cost Reconfigurable Architecture for a UMTS Receiver
Ronny VELJANOVSKI Aleksandar STOJCEVSKI Jugdutt SINGH Aladin ZAYEGH Michael FAULKNER

PAPER

Vol:
E86-B No:12
Page(s):
3441-3451
A novel reconfigurable architecture has been proposed for a mobile terminal receiver that can drastically reduce power dissipation depending on the level of adjacent channel interference. The proposed design automatically scales the number of filter coefficients and the word length by monitoring the in-band and out-of-band powers. The performance of the new architecture was evaluated in a simulated UTRA-TDD environment, chosen because adjacent channel interference from adjacent mobiles and base stations causes a severe near-far problem. The UTRA-TDD downlink mode was examined statistically, and the results show that the reconfigurable architecture can save on average up to 75% of the power dissipation compared with a fixed filter length of 57 taps and a word length of 16 bits. This power saving applies only to the filter and ADC, not to the whole receiver. It will prolong talk and standby time in a mobile terminal. The average number of taps and bits were calculated to be 14.98 and 10, respectively, for an outage of 97%.
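The two knobs this architecture scales, filter length and coefficient word length, can be illustrated with a minimal sketch. It assumes a Hamming-windowed sinc low-pass design and signed fixed-point rounding; the helper names `design_lowpass` and `quantize` and the cutoff value are hypothetical and are not the authors' filter or control logic.

```python
import math

def design_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR; cutoff is normalized to Nyquist (0..1)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal low-pass impulse response sampled at offset t (limit value cutoff at t == 0)
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc = sum(taps)
    return [h / dc for h in taps]  # normalize to unity DC gain

def quantize(taps, word_bits):
    """Round each coefficient to a signed fixed-point grid of word_bits bits (1 sign bit)."""
    step = 2.0 ** -(word_bits - 1)
    lo, hi = -1.0, 1.0 - step
    return [min(hi, max(lo, round(h / step) * step)) for h in taps]
```

For example, `quantize(design_lowpass(57, 0.25), 16)` corresponds to the fixed baseline in the abstract, while a reduced setting such as `quantize(design_lowpass(15, 0.25), 10)` trades selectivity for fewer multiply-accumulates and narrower arithmetic.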
On Practical Implementation of the PIC Algorithm in Asynchronous CDMA Systems
Young Wha KIM Sung Ho CHO

LETTER-Wireless Communication Technology

Vol:
E86-B No:8
Page(s):
2508-2511
In this letter, we present a practical method of implementing the parallel interference cancellation (PIC) algorithm in an asynchronous CDMA system. The method employs a novel pipelined structure to reduce the processing delay and the memory space compared with the conventional PIC processing scheme.
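For readers unfamiliar with PIC, the sketch below shows one cancellation stage in a simplified synchronous, real-valued, single-symbol model with known amplitudes. This model is an assumption made for clarity; it does not reproduce the letter's pipelined structure for asynchronous systems, and the function names are hypothetical.

```python
def correlate(a, b):
    """Chip-level correlation (matched filter) of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def sign(x):
    return 1 if x >= 0 else -1

def pic_one_stage(received, codes, amplitudes):
    """One stage of parallel interference cancellation.

    Returns (first-stage decisions, decisions after cancellation).
    """
    n = len(codes[0])
    # Stage 0: conventional matched-filter decisions for every user
    first = [sign(correlate(received, c)) for c in codes]
    refined = []
    for k, code_k in enumerate(codes):
        # Regenerate every other user's signal and subtract it (all users in parallel)
        cleaned = list(received)
        for j, code_j in enumerate(codes):
            if j != k:
                for i in range(n):
                    cleaned[i] -= amplitudes[j] * first[j] * code_j[i]
        refined.append(sign(correlate(cleaned, code_k)))
    return first, refined
```

With a weak user (amplitude 1, bit -1, code `[1, 1, 1, 1]`) and a strong user (amplitude 5, bit +1, code `[1, 1, 1, -1]`), the matched filter misjudges the weak user's bit because of the near-far effect, while one PIC stage recovers it.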
A High Throughput Pipelined Architecture for Blind Adaptive Equalizer with Minimum Latency
Masashi MIZUNO James OKELLO Hiroshi OCHI

PAPER

Vol:
E86-A No:8
Page(s):
2011-2019
In this paper, we propose a pipelined architecture for an equalizer based on the Multilevel Modified Constant Modulus Algorithm (MMCMA). We also provide the correction factor that mathematically converts the proposed pipelined adaptive equalizer into an equivalent non-pipelined conventional MMCMA-based equalizer. The proposed pipelining method uses modules with 6 filter coefficients, resulting in an overall latency of a single sampling period along the main transmission line. The basic concept of the proposed architecture is to implement the Finite Impulse Response (FIR) filter and the algorithm portion of the adaptive equalizer such that the critical path of the whole circuit contains at most three complex multipliers and three adders.
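A common way to shorten the critical path of an FIR filter is to place a register after every tap, as in the transposed form. The sketch below is a generic illustration and is not the paper's architecture. It checks that a transposed-form FIR, whose per-sample path is one multiply plus one add, produces the same output as the direct form. The MMCMA update and the correction factor are not modeled, and the function names are hypothetical.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_transposed(x, h):
    """Transposed-form FIR: one register per tap, so each adder sees one product
    plus one registered partial sum instead of a long adder chain."""
    m = len(h)
    regs = [0.0] * (m + 1)  # regs[k] = partial sum waiting for tap k; regs[m] stays 0
    y = []
    for sample in x:
        y.append(h[0] * sample + regs[1])
        # Ascending order: regs[k + 1] still holds the previous cycle's value here
        for k in range(1, m):
            regs[k] = h[k] * sample + regs[k + 1]
    return y
```

The two forms are algebraically identical. The transposed form simply moves the delay elements into the accumulation chain, which is the basic retiming idea behind pipelined adaptive filters.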
A Pipeline Structure for High-Speed Step-by-Step RS Decoding
Tung-Chou CHEN Che-Ho WEI Shyue-Win WEI

LETTER-Fundamental Theories

Vol:
E86-B No:2
Page(s):
847-849
Based on a modified step-by-step decoding procedure, a high-speed pipelined Reed-Solomon decoder is presented. The decoder requires only the delay time of three 2-input XOR gates to decode each coded symbol. It can operate at bit rates on the order of Gbit/s and is thus suitable for very-high-speed data transmission systems.
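The XOR-based timing above reflects the fact that addition in GF(2^m) is bitwise XOR. The sketch below covers only syndrome computation, the front end common to RS decoders, over GF(2^8). It assumes the primitive polynomial 0x11D and the generator element 2, which are common choices but not necessarily those used in the letter. The step-by-step decoding procedure itself is not reproduced.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed primitive polynomial)

def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add, where 'add' is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM  # reduce modulo the primitive polynomial
        b >>= 1
    return result

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval(poly, x):
    """Horner evaluation; poly[0] is the highest-degree coefficient."""
    acc = 0
    for coef in poly:
        acc = gf_mul(acc, x) ^ coef
    return acc

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def generator(nsym):
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)); minus equals plus in GF(2^m)."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, gf_pow(2, i)])
    return g

def syndromes(codeword, nsym):
    """S_i = r(a^i); all zero if and only if r is a multiple of g (no detectable error)."""
    return [poly_eval(codeword, gf_pow(2, i)) for i in range(nsym)]
```

A (non-systematic) codeword is any multiple of `generator(nsym)`, for example `poly_mul(message, generator(4))`. Flipping bits in one symbol makes the syndromes nonzero, which is the information any RS decoder works from.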
Data Transfer Time by HTTP 1.0/1.1 on Asymmetric Networks Composed of Satellite and Terrestrial Links
Hiroyasu OBATA Kenji ISHIDA Junichi FUNASAKA Kitsutaro AMANO

PAPER-Internet

Vol:
E85-B No:12
Page(s):
2895-2903
Asymmetric networks, which provide asymmetric bandwidth or delay for upstream and downstream transfer, have recently gained much attention since they support popular applications such as the World Wide Web (WWW). HTTP (Hypertext Transfer Protocol) is the basis of most WWW services, so evaluating the performance of HTTP on asymmetric networks, particularly on real-world networks, is increasingly important. However, the performance of HTTP on asymmetric networks composed of satellite and terrestrial links has not been sufficiently evaluated. This paper proposes new formulas to evaluate the performance of both HTTP1.0 and HTTP1.1 on asymmetric networks. Using these formulas, we calculate the time taken to transfer web data by HTTP1.0/1.1. The calculation results are compared with the results of an existing theoretical formula and with experimental results obtained from a system that combines a VSAT (Very Small Aperture Terminal) satellite communication system for the satellite links (downstream) and the Internet for the terrestrial links (upstream). The comparison shows that the proposed formulas yield more accurate results (relative to the measured values) than the existing formula. Furthermore, this paper proposes an evaluation formula for pipelined HTTP1.1 and shows that its values agree with those obtained by experiments (on the VSAT system) and by simulations.
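To show why connection handling matters so much on long-delay links, here is a deliberately simplified first-order model. It is not the formulas proposed in the paper. It ignores TCP slow start, losses and server time, and it assumes the round-trip time is the sum of the upstream (terrestrial) and downstream (satellite) delays and that only the downstream bandwidth limits the response. The function and mode names are hypothetical.

```python
def transfer_time(mode, n_objects, object_bytes, rtt_s, down_bps):
    """Rough page transfer time in seconds under a simplified model.

    rtt_s: round-trip time = upstream delay + downstream delay (assumed).
    down_bps: downstream bandwidth that serializes each response.
    """
    tx = object_bytes * 8 / down_bps  # serialization time of one object
    if mode == "http1.0":
        # New TCP connection per object: handshake RTT + request/response RTT + transmission
        return n_objects * (2 * rtt_s + tx)
    if mode == "http1.1":
        # One persistent connection: one handshake, then one RTT per request
        return rtt_s + n_objects * (rtt_s + tx)
    if mode == "http1.1-pipelined":
        # Requests sent back-to-back: handshake + one RTT, then responses stream out
        return 2 * rtt_s + n_objects * tx
    raise ValueError(f"unknown mode: {mode}")
```

With 3 objects of 1000 bytes, a 0.5 s RTT and 8 kbit/s downstream, this model gives 6.0 s, 5.0 s and 4.0 s for the three modes. The gap grows with RTT, which is why pipelining is attractive over satellite links.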
Design Exploration of an Industrial Embedded Microcontroller: Performance, Cost and Software Compatibility
Ing-Jer HUANG Li-Rong WANG Yu-Min WANG Tai-An LU

PAPER-VLSI Design

Vol:
E85-A No:12
Page(s):
2624-2635
This paper presents a case study of the synthesis of the industrial embedded microcontroller HT48100 and an analysis of the performance, cost and software compatibility of its implementation alternatives, using PIPER-II, a hardware/software co-design system for microcontrollers/microprocessors. The synthesis tool accepts the instruction set architecture (behavioral) specification as input and produces as output the pipelined RTL designs with their simulators, together with the reordering constraints that guide the compiler backend in optimizing code for the synthesized designs. A compiler backend is provided to optimize the application software according to the reordering constraints. The study shows that the co-design approach helped the original design team analyze the architectural properties, identify inefficient architecture features, and explore possible architectural improvements and their impacts on both hardware and software. Feasible future upgrades for the microcontroller family have been identified by the study.
Pipelined Simple Matching for Input Buffered Switches
Man-Soo HAN Bongtae KIM

LETTER-Antenna and Propagation

Vol:
E85-B No:11
Page(s):
2539-2543
We present pipelined simple matching, called PSM, for an input-buffered switch; it relaxes the scheduling timing constraint by modifying pipelined maximal-sized matching (PMM). As in PMM, PSM employs multiple subschedulers, each of which takes more than one time slot to complete a matching, so that a matching result is produced in every time slot. Using only the head-of-line information of the input buffers, PSM successively sends each request to all subschedulers to provide a better matching opportunity. To obtain better performance, PSM assigns distinct starting points to the scheduling pointers such that the difference between the starting points of any two adjacent subschedulers for the same output is equal. Using computer simulations under uniform traffic, we show that PSM is more appropriate than PMM for pipelined scheduling in an input-buffered switch.
A Digital Calibration Technique of Capacitor Mismatch for Pipelined Analog-to-Digital Converters
Masanori FURUTA Shoji KAWAHITO Daisuke MIYAZAKI

PAPER

Vol:
E85-C No:8
Page(s):
1562-1568
A digital calibration technique that corrects errors due to capacitor mismatch in pipelined ADCs and directly measures the error coefficients from the ADC INL plot is described. The proposed technique can be applied to various types of pipelined ADC architectures. Test results for an implemented 10-bit pipelined ADC show that, with the digital calibration, the ADC achieves a peak signal-to-noise-and-distortion ratio of 56.5 dB, a peak integral non-linearity of 0.3 LSB, and a peak differential non-linearity of 0.3 LSB.
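Why capacitor mismatch can be corrected digitally is easiest to see in a behavioural model. The sketch assumes 1.5-bit stages with a flip-around MDAC, where a mismatch Cs/Cf = 1 + e gives a residue gain of 2 + e and a DAC step of 1 + e (Vref = 1). It also assumes the per-stage values are already known. The paper instead measures its error coefficients from the INL plot, which is not reproduced here. The mismatch values and function names are hypothetical.

```python
def stage(v, gain, dac):
    """1.5-bit stage: coarse decision d in {-1, 0, 1}, then the amplified residue."""
    d = 1 if v > 0.25 else (-1 if v < -0.25 else 0)
    return d, gain * v - d * dac

def convert(v, gains, dacs):
    """Pass the input through every stage and collect the raw decisions."""
    decisions = []
    for g, c in zip(gains, dacs):
        d, v = stage(v, g, c)
        decisions.append(d)
    return decisions

def reconstruct(decisions, gains, dacs):
    """Invert the stages from the back: v_i = (d_i * c_i + v_{i+1}) / g_i (last residue taken as 0)."""
    est = 0.0
    for d, g, c in reversed(list(zip(decisions, gains, dacs))):
        est = (d * c + est) / g
    return est

MISMATCH = [0.02, -0.01, 0.015] + [0.0] * 9  # hypothetical Cs/Cf - 1 for 12 stages
TRUE_GAINS = [2 + e for e in MISMATCH]
TRUE_DACS = [1 + e for e in MISMATCH]
IDEAL_GAINS = [2.0] * len(MISMATCH)
IDEAL_DACS = [1.0] * len(MISMATCH)

def max_error(gains, dacs):
    """Worst reconstruction error over an input sweep, digitized by the mismatched chain."""
    worst = 0.0
    for i in range(-1000, 1001):
        v = i / 1000
        bits = convert(v, TRUE_GAINS, TRUE_DACS)
        worst = max(worst, abs(reconstruct(bits, gains, dacs) - v))
    return worst
```

Reconstructing with the ideal weights (`IDEAL_GAINS`, `IDEAL_DACS`) leaves an error of several millivolts near the stage thresholds. Using the actual stage weights reduces the error to the truncation floor of the final residue. This is the basic reason a measured set of error coefficients is enough to calibrate the converter.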
A 3.2-mA 6-Bit Pipelined A/D Converter for a Bluetooth RF Transceiver
Tatsuji MATSUURA Junya KUDOH Eiki IMAIZUMI

PAPER

Vol:
E85-C No:8
Page(s):
1538-1545
A low-power-consumption 6-bit pipelined analog-to-digital converter for use in a Bluetooth™ RF transceiver has been developed. The RF transceiver chip was fabricated using a 0.35-µm BiCMOS process, and the A/D converter is based on the CMOS technology used for the digital logic. To reduce the power consumption of the converter, we used a look-ahead pipeline architecture that reduces the required settling time of an amplifier in the critical path of the converter. We show that through this reduction, the amplifier current consumption can be reduced from 600 µA to 250 µA while achieving a 13-MHz conversion rate. We have also developed a low-power two-capacitor switched-capacitor common-mode feedback circuit that enables offset cancellation of an amplifier during the reset phase. Offset cancellation is used in each stage of the S/H amplifier to reduce the overall offset of the converter. The converter achieves an effective number of bits of 5.7 at a conversion rate of 13 Msps and 5.0 at 26 Msps. Its residual offset is only 4 mV, and its total current consumption is a low 3.2 mA at 13 Msps with a supply voltage of 2.8 V.
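ENOB and SINAD figures such as those quoted above are related by the standard full-scale-sine formula ENOB = (SINAD - 1.76 dB) / 6.02 dB. The helper names below are hypothetical, and the sketch simply applies that formula.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD in dB (full-scale sine input)."""
    return (sinad_db - 1.76) / 6.02

def sinad_from_enob(enob_bits):
    """SINAD in dB implied by an effective number of bits."""
    return 6.02 * enob_bits + 1.76
```

By this formula, 5.7 effective bits corresponds to a SINAD of about 36.1 dB and 5.0 bits to about 31.9 dB. For comparison, the 56.5-dB peak SNDR reported for the calibrated 10-bit ADC above corresponds to roughly 9.09 effective bits.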
A Pipelined Maximal-Sized Matching Scheme for High-Speed Input-Buffered Switches
Eiji OKI Roberto ROJAS-CESSA H. Jonathan CHAO

PAPER-Switching

Vol:
E85-B No:7
Page(s):
1302-1311
This paper proposes an innovative Pipeline-based Maximal-sized Matching scheduling approach, called PMM, for input-buffered switches. It dramatically relaxes the constraint that a maximal matching must be completed within a single time slot, allowing it to take any number of time slots. In the PMM approach, arbitration is operated in a pipelined manner using K subschedulers. Each subscheduler is allowed to take more than one time slot for its matching, and in every time slot one of the subschedulers provides the matching result. We adopt an extended version of Dual Round-Robin Matching (DRRM), called iterative DRRM (iDRRM), as the maximal matching algorithm in each subscheduler. PMM maximizes the efficiency of the adopted arbitration scheme by allowing sufficient time for the required number of iterations. We show that PMM preserves the properties of the adopted non-pipelined algorithm, namely 100% throughput under uniform traffic and fairness for best-effort traffic, while ensuring that cells from the same virtual output queue (VOQ) are transmitted in sequence. In addition, we confirm that the delay performance of PMM is not significantly degraded by increasing the pipeline degree, or the number of subschedulers, when the number of outstanding requests from a VOQ to each subscheduler is limited to 1.
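The arbitration inside each PMM subscheduler is based on DRRM. The sketch below shows a single DRRM iteration as it is commonly described: input-side round-robin requests, output-side round-robin grants, and pointers that advance only on a successful grant. The iterative extension (iDRRM), the distribution of requests across K subschedulers and the outstanding-request limit are not modeled. The function name is hypothetical.

```python
def drrm_iteration(requests, in_ptr, out_ptr):
    """One iteration of Dual Round-Robin Matching for an N x N switch.

    requests[i][j] is True when VOQ (i, j) is non-empty.
    in_ptr[i] and out_ptr[j] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)
    # Request phase: each input picks one non-empty VOQ, starting at its pointer
    picks = {}
    for i in range(n):
        for step in range(n):
            j = (in_ptr[i] + step) % n
            if requests[i][j]:
                picks.setdefault(j, []).append(i)
                break
    # Grant phase: each output grants one requesting input, starting at its pointer
    match = {}
    for j, inputs in picks.items():
        for step in range(n):
            i = (out_ptr[j] + step) % n
            if i in inputs:
                match[i] = j
                # Pointers move one past the granted partner, only on success
                in_ptr[i] = (j + 1) % n
                out_ptr[j] = (i + 1) % n
                break
    return match
```

When input pointers start desynchronized, a fully loaded switch is matched completely in one iteration. When they all point at the same output, only one input wins, which is why extra iterations, and hence the extra time that pipelining provides, help.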
Assignment-Driven Loop Pipeline Scheduling and Its Application to Data-Path Synthesis
Toshiyuki YOROZUYA Koji OHASHI Mineo KANEKO

PAPER

Vol:
E85-A No:4
Page(s):
819-826
In this paper, we study the loop pipeline scheduling problem under a given resource assignment (assignments of operations to functional units and of data to registers), which is one of the key tasks in data-path synthesis based on exploration of the assignment solution space. We present an approach using a precedence constraint graph with parametric disjunctive arcs generated from the specified assignment information, and derive a scheduling method using branch-and-bound exploration of the parameter space. As an application, the proposed scheduling method is combined with a Simulated-Annealing (SA) based exploration of the assignment solution space, and data-paths of the fifth-order elliptic wave filter are shown to be synthesized successfully.
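Any loop pipeline schedule is bounded below by the standard modulo-scheduling limits on the initiation interval: the resource bound and the recurrence bound. The sketch computes these textbook bounds, assuming fully pipelined units that each accept one operation per cycle. It is not the paper's assignment-driven branch-and-bound method, and the operation counts in the usage example are hypothetical.

```python
import math

def res_mii(op_counts, unit_counts):
    """Resource bound: each resource type must fit its operations into II cycles."""
    return max(math.ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def rec_mii(cycles):
    """Recurrence bound; each dependence cycle is (total latency, total iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)

def min_ii(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval of a pipelined loop."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))
```

For example, a hypothetical loop body with 5 additions on 2 adders and 3 multiplications on 1 multiplier has a resource bound of 3. If it also contains a dependence cycle of latency 4 and distance 1, the recurrence bound of 4 dominates, so no schedule can start iterations more often than every 4 cycles.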
Design and Demonstration of Pipelined Circuits Using SFQ Logic
Akira AKAHORI Akito SEKIYA Takahiro YAMADA Akira FUJIMAKI Hisao HAYAKAWA

PAPER-Digital Devices and Their Applications

Vol:
E85-C No:3
Page(s):
641-644
We have designed a Half Adder (HA) circuit and a Carry Save Serial Adder (CSSA) circuit based on a pipeline architecture. Our HA has a two-stage pipeline structure and consists of 160 Josephson junctions (JJs). Our CSSA has a four-stage pipeline structure with a feedback loop and consists of 360 JJs. These circuits were fabricated using the NEC standard process. Two issues must be considered in the design: parameter spreads caused by the fabrication process, and leakage currents between the gates. We have introduced a parameter optimization method to deal with the parameter spreads, and we have inserted three stages of Josephson transmission lines (JTLs) to reduce leakage currents. We have experimentally confirmed the correct operation of these circuits. The obtained bias margins were 33.1% for the HA and 24.6% for the CSSA.
Logic Design of a Single-Flux-Quantum (SFQ) 2×2 Unit Switch for Banyan Networks
Yoshio KAMEDA Shinichi YOROZU Shuichi TAHARA

PAPER-Digital Devices and Their Applications

Vol:
E85-C No:3
Page(s):
625-630
We describe the logic design of a single-flux-quantum (SFQ) 2×2 unit switch. It is the main component of the SFQ Banyan packet switch we are developing, which enables a switching capacity of over 1 Tbit/s. In this paper, we focus on the design of the controller in the unit switch. Unlike shift registers or adders, the controller has no simple "off-the-shelf" conventional circuit. To design such a complicated random logic circuit, we need to adopt a systematic top-down design approach. Using a graphical technique, we first obtained the logic functions. Next, to exploit a deep pipeline architecture, we broke the functions down into one-level logic operations that can each be executed within one clock cycle. Finally, we mapped the functions onto physical circuits using pre-designed SFQ standard cells. The 2×2 unit switch consists of 59 logic gates and needs about 600 Josephson junctions, excluding gate interconnections. We tested the gate-level circuit by logic simulation and found that it operates correctly at a throughput of 40 GHz.
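Banyan-class networks are self-routing: each 2×2 switch examines one bit of the destination address and sends the cell to its upper (0) or lower (1) output. The sketch below traces that routing in an omega network, one member of the Banyan family. Whether the switch described above uses this exact topology is an assumption. The sketch does not model contention, where two cells request the same output of a 2×2 switch, which the unit-switch controller must resolve. The function name is hypothetical.

```python
def omega_route(src, dst, n_bits):
    """Trace a cell through an omega network of 2x2 switches with 2**n_bits ports.

    Each stage applies a perfect shuffle (rotate the port address left by one bit),
    then the 2x2 switch selects its upper (0) or lower (1) output using the next
    destination bit, most significant bit first. Returns the position after each stage.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask  # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1           # routing tag for this stage
        pos = (pos & ~1) | bit                              # 2x2 switch setting
        path.append(pos)
    return path
```

After `n_bits` stages, every destination bit has been shifted into place, so the cell always arrives at `dst` regardless of `src`. This is why each unit switch needs only local, bit-level control logic.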
A 500-MHz Embedded Out-of-Order Superscalar Microprocessor
Masayuki DAITO Kazumasa SUZUKI Ken-ichi UEHIGASHI Hiroshi MORITA Hitoshi SONODA Nobuhito MORIKAWA Masatoshi MORIYAMA Shoichiro SATO Terumi FUKUDA Saori NAKAMURA

INVITED PAPER

Vol:
E85-C No:2
Page(s):
243-252
A MIPS-architecture-based embedded out-of-order superscalar microprocessor targeting broadband applications has been developed. Aggressive microarchitectural techniques, such as superpipelining and out-of-order execution, have been applied to achieve better performance scalability for next-generation broadband applications. The chip includes a 32-KByte instruction cache, a 32-KByte data cache and 6 independent execution units, and it has been designed using an ASIC-style design methodology in a 0.13-µm CMOS 5-layer aluminum technology. It can operate at up to 500 MHz and achieves 1005 MIPS (Dhrystone 2.1) at 500-MHz operation.
High-Level Synthesis of Pipelined Circuits from Modular Queue-Based Specifications
Maria-Cristina MARINESCU Martin RINARD

PAPER-High Level Synthesis

Vol:
E84-A No:11
Page(s):
2655-2664
This paper describes a novel approach to high-level synthesis of complex pipelined circuits, including pipelined circuits with feedback. This approach combines a high-level, modular specification language with an efficient implementation. In our system, the designer specifies the circuit as a set of independent modules connected by conceptually unbounded queues. Our synthesis algorithm automatically transforms this modular, asynchronous specification into a tightly coupled, fully synchronous implementation in synthesizable Verilog.
A Systolic Array RLS Processor
Takahiro ASAI Tadashi MATSUMOTO

PAPER-Terrestrial Radio Communications

Vol:
E84-B No:5
Page(s):
1356-1361
This paper outlines a systolic array recursive least-squares (RLS) processor prototyped primarily for broadband mobile communication applications. To execute the RLS algorithm efficiently, the processor uses an orthogonal triangularization technique known in matrix algebra as QR decomposition, which enables parallel pipelined processing. The processor board comprises 19 application-specific integrated circuit chips, each with approximately one million gates. The processor performs thirty-two-bit fixed-point signal processing; one cycle of internal cell signal processing requires approximately 500 nsec, and boundary cell signal processing requires approximately 80 nsec. The processor board can estimate up to 10 parameters. It takes approximately 35 µs to estimate 10 parameters using 41 known symbols. To evaluate the signal processing performance of the prototyped systolic array processor board, the processing time required to estimate a given number of parameters was compared with that of a digital signal processing (DSP) board running a standard form of the RLS algorithm. Additionally, we conducted in-lab minimum mean-squared error adaptive array experiments using a complex baseband fading/array response simulator. In terms of parameter estimation accuracy, the processor is found to produce virtually the same results as a conventional software engine using floating-point operations.
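QR-based RLS arrays absorb each new data row into a triangular matrix R with Givens rotations. Boundary cells compute the rotation (c, s), and internal cells apply it to the rest of the row. The sketch below is a real-valued, floating-point, sequential emulation of those cell operations, without a forgetting factor, followed by back substitution. The prototype board works with complex fixed-point data in parallel hardware, and the function name is hypothetical.

```python
import math

def givens_qr_solve(rows, targets):
    """Least-squares weights via row-by-row Givens rotations (QR-RLS style, no forgetting)."""
    n = len(rows[0])
    r = [[0.0] * (n + 1) for _ in range(n)]  # augmented triangular array [R | z]
    for x, d in zip(rows, targets):
        a = [float(v) for v in x] + [float(d)]
        for i in range(n):
            # Boundary cell: rotation that zeroes a[i] against r[i][i]
            rr = math.hypot(r[i][i], a[i])
            c, s = (1.0, 0.0) if rr == 0 else (r[i][i] / rr, a[i] / rr)
            r[i][i] = rr
            # Internal cells: apply the same rotation to the rest of the row
            for j in range(i + 1, n + 1):
                r[i][j], a[j] = c * r[i][j] + s * a[j], -s * r[i][j] + c * a[j]
    # Back substitution: solve R w = z
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (r[i][n] - sum(r[i][j] * w[j] for j in range(i + 1, n))) / r[i][i]
    return w
```

Because each rotation only touches one row of R and the incoming data vector, the work maps naturally onto a triangular grid of cells that operate concurrently on successive rows.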
A Pipeline Chip for Quasi Arithmetic Coding
Yair WISEMAN

PAPER-Digital Signal Processing

Vol:
E84-A No:4
Page(s):
1034-1041
A combination of a software implementation and a systolic hardware implementation of the Quasi Arithmetic compression algorithm is presented. The hardware is implemented as a pipeline. The implementation does not change the algorithm; it simply splits it into two parts. The combination of parallel software and pipelined hardware can provide very fast compression without any loss of compression efficiency.
A Cascade ALU Architecture for Asynchronous Super-Scalar Processors
Motokazu OZAWA Masashi IMAI Yoichiro UENO Hiroshi NAKAMURA Takashi NANYA

PAPER

Vol:
E84-C No:2
Page(s):
229-237
Wire delays, rather than gate delays, are becoming dominant in modern VLSI design. In current synchronous processors, the critical path lies not in the ALU but in the cache access. Since cache performance enhancement is limited by the memory access delay, which consists mainly of wire delays, a reduction in gate delays may no longer improve processor performance. To solve this problem, this paper presents a novel architecture called the Cascade ALU. The Cascade ALU allows super-scalar processors built with future technologies to move the critical path into the ALU, so that they can benefit from the expected progress in future device speed. Since the delay of the Cascade ALU varies depending on the executed instructions, an asynchronous system is shown to be suitable for implementing it. Because an asynchronous system may incur a large handshake overhead, this paper also presents an asynchronous Fine Grain Pipeline technique that hides this overhead. Finally, this paper presents performance and area evaluation results for an asynchronous implementation of the Cascade ALU. The results show that the Cascade ALU architecture scales well in performance as ALU latency is reduced and imposes little area penalty compared with current synchronous processors.
An Efficient Implementation Method of a Metric Computation Accelerator for Fractal Image Compression Using Reconfigurable Hardware
Hidehisa NAGANO Akihiro MATSUURA Akira NAGOYA

LETTER-VLSI Design Technology and CAD

Vol:
E84-A No:1
Page(s):
372-377
This paper proposes a method for implementing a metric computation accelerator for fractal image compression using reconfigurable hardware. The most time-consuming part of encoding in this compression scheme is the computation of metrics between image blocks. In our method, each processing element (PE) configured for an image block accelerates these computations by pipeline processing. Furthermore, by configuring the PE for a specific image block, we can reduce the number of adders, which are the main computing elements, by half even in the worst case.
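A widely used block metric in fractal encoding is the least-squares fit of a range block r by a contrast-scaled, offset domain block s·d + o, with the residual squared error as the metric. The paper does not state which metric it accelerates, so treating it as this one is an assumption, and the function name is hypothetical. The sketch shows that the sums Σd and Σd² depend on one block only and can be prepared in advance. Only the cross term Σd·r requires work per block pair.

```python
def collage_fit(domain, rng):
    """Least-squares contrast s and offset o mapping a domain block onto a range block,
    plus the squared error sum((s*d + o - r)^2). Blocks are equal-length pixel lists."""
    n = len(domain)
    sd = sum(domain)                                  # depends on the domain block only
    sdd = sum(d * d for d in domain)                  # depends on the domain block only
    sr = sum(rng)                                     # depends on the range block only
    sdr = sum(d * r for d, r in zip(domain, rng))     # cross term, needed per block pair
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom
    o = (sr - s * sd) / n
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, rng))
    return s, o, err
```

For a range block that is exactly 2·d + 3 of its domain block, the fit returns s = 2, o = 3 and zero error.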
Fast Implementation Technique for Improving Throughput of RLS Adaptive Filters
Kiyoshi NISHIKAWA Hitoshi KIYA

PAPER-Adaptive Signal Processing

Vol:
E83-A No:8
Page(s):
1545-1550
This paper proposes a fast implementation technique for RLS adaptive filters. The technique has an adjustable parameter that trades off throughput against the rate of convergence of the filter according to the application. Conventional methods for improving the throughput lack this kind of adjustability, so the proposed technique will expand the range of applications for the RLS algorithm. We show that the throughput can be easily improved by rearranging the formulas of the RLS algorithm, and that faster PEs are not needed for this improvement.
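For reference, the recursion that such techniques rearrange is the standard exponentially weighted RLS update: gain vector, a priori error, weight update and inverse-correlation update. The sketch implements only that textbook recursion with list-based matrix arithmetic. The paper's rearrangement and its throughput/convergence parameter are not reproduced, and the function name and default values are hypothetical.

```python
def rls_identify(x, d, order, lam=0.99, delta=0.01):
    """Exponentially weighted RLS; returns the final tap weights.

    x: input samples, d: desired samples, lam: forgetting factor,
    P is initialized to (1/delta) * I.
    """
    w = [0.0] * order
    p = [[(1.0 / delta) if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]  # tap-input vector
        pu = [sum(p[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * pu[i] for i in range(order))
        k = [v / denom for v in pu]                                   # gain vector
        e = d[n] - sum(w[i] * u[i] for i in range(order))            # a priori error
        w = [w[i] + k[i] * e for i in range(order)]
        # P <- (P - k u^T P) / lam; P is symmetric, so u^T P equals (P u)^T
        p = [[(p[i][j] - k[i] * pu[j]) / lam for j in range(order)] for i in range(order)]
    return w
```

Each update depends on the previous P and w, and this feedback is what limits throughput in a direct hardware mapping and motivates reformulations such as the one proposed.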
