1-6hit |
Gi-Ho PARK Kil-Whan LEE Tack-Don HAN Shin-Dug KIM
This paper presents a dual data cache system structure, called a cooperative cache system, that is designed as a low power cache structure for embedded processors. The cooperative cache system consists of two caches, i.e., a direct-mapped temporal oriented cache (TOC) and a four-way set-associative spatial oriented cache (SOC). The cooperative cache system achieves improvement in performance and reduction in power consumption by virtue of the structural characteristics of the two caches designed inherently to help each other. An evaluation chip of an embedded processor having the cooperative cache system is manufactured by Samsung Electronics Co. with 0.25 µm 4-metal process technology.
Woo-Chan PARK Shi-Wha LEE Oh-Young KWON Tack-Don HAN Shin-Dug KIM
A model for the floating point adder/subtractor which can perform rounding and addition/subtraction operations in parallel is presented. The major requirements and structure to achieve this goal are described and algebraically verified. Processing flow of the conventional floating point addition/subtraction operation consists of alignment, addition/subtraction, normalization, and rounding stages. In general, the rounding stage requires a high speed adder for increment, increasing the overall execution time and occupying a large amount of chip area. Furthermore, it accompanies additional execution time and hardware logics for renormalization stage which may occur by an overflow from the rounding operation. A floating adder/subtractor performing addition/subtraction and IEEE rounding in parallel is designed by optimizing the operational flow of floating point addition/subtraction operation. The floating point adder/subtractor presented does not require any additional execution time nor any high speed adder for rounding operation. In addition, the renormalization step is not required because the rounding step is performed prior to the normalization operation. Thus, performance improvement and cost-effective design can be achieved by this approach.
Won-Jong LEE Vason P. SRINI Woo-Chan PARK Shigeru MURAKI Tack-Don HAN
We present an adaptive dynamic load balancing scheme for 3D texture based sort-last parallel volume rendering on a PC cluster equipped with GPUs. Our scheme exploits not only task parallelism but also data parallelism during rendering by combining the hierarchical data structures (octree and parallel BSP tree) in order to skip empty regions and distribute proper workloads to rendering nodes. Our scheme can also conduct a valid parallel rendering and image compositing in visibility order by employing a 3D clustering algorithm. To alleviate the imbalance when the transfer function is changed, a load rebalancing is inexpensively supported by exchanging only needed data. A detailed performance analysis is provided and scaling characteristics of our scheme are discussed. These show that our scheme can achieve significant performance gains by increasing parallelism and decreasing synchronizing costs compared to the traditional static distribution schemes.
Yong-Jin PARK Woo-Chan PARK Jun-Hyun BAE Jinhong PARK Tack-Don HAN
In this paper, we proposed that an area- and speed-effective fixed-point pipelined divider be used for reducing the bit-width of a division unit to fit a mobile rendering processor. To decide the bit-width of a division unit, error analysis has been carried out in various ways. As a result, when the original bit-width was 31-bit, the proposed method reduced the bit-width to 24-bit and reduced the area by 42% with a maximum error of 0.00001%.
Hyung-Min YOON Woo-Shik KANG Oh-Young KWON Seong-Hun JEONG Bum-Seok KANG Tack-Don HAN
New service concepts involving mobile devices with a diverse range of embedded sensors are emerging that share contexts supporting communication on a wireless network infrastructure. To promote these services in mobile devices, we propose a method that can efficiently detect a context provider by partitioning the location, time, speed, and discovery sensitivities.
Woo-Chan PARK Cheol-Ho JEONG Tack-Don HAN
The format conversion operations between a floating-point number and an integer number and a round operation are the important standard floating-point operations. In most cases, these operations are implemented by adding additional hardware to the floating-point adder. The SR (simultaneous rounding) method, one of the techniques used to improve the performance of the floating-point adder, can perform addition and rounding operations at the same stage and is an efficient method with respect to the silicon area and its performance. In this paper, a hardware model to execute CRops (conversion and rounding operations) for the SR floating-point adder is presented and CRops are analyzed on the proposed hardware model. Implementation details are also discussed. The proposed scheme can maintain the advantages of the SR method and can perform each CRop with three pipeline stages.