Xiaoying GAN Shiying SUN Wentao SONG Bo LIU
A novel threshold-selection method for the threshold-based skip mechanism is presented, in which the threshold is derived from an analysis of the noise variance induced by the video device. Simulation results show that the proposed method substantially reduces computation time with only a marginal performance penalty.
Min ZHU Leibo LIU Shouyi YIN Chongyong YIN Shaojun WEI
This paper introduces SimREMUS, a cycle-accurate simulator for a dynamically REconfigurable MUlti-media System (REMUS). SimREMUS can be used either at the transaction level, which allows higher-level hardware and embedded software to be modeled and simulated, or at the register transfer level, when dynamic system behavior must be observed at the signal level. Trade-offs among criteria frequently used to characterize the design of a reconfigurable computing system, such as granularity, programmability, configurability, and the architecture of processing elements and route modules, can be evaluated quickly. Moreover, a complete tool chain for SimREMUS, including a compiler and a debugger, has been developed. SimREMUS simulates 270 k cycles per second for a million-gate SoC (System-on-a-Chip) and produces one H.264 1080p frame in 15 minutes, a task that might take days on VCS (platform: CPU E5200 @ 2.5 GHz, RAM 2.0 GB). Simulation shows that H.264 High Profile @ Level 4 at 1080p@30 fps can be achieved with a 200 MHz working frequency on the VLSI architecture of REMUS.
Xunchao CONG Guan GUI Keyu LONG Jiangbo LIU Longfei TAN Xiao LI Qun WAN
Synthetic aperture radar (SAR) imagery is significantly degraded by random phase noise generated by frequency jitter of the transmit signal and by atmospheric turbulence. In this paper, we recast the SAR imaging problem with phase-corrupted data as a special case of quadratic compressed sensing (QCS). Although the quadratic measurement model has the potential to mitigate the effects of the phase noise, it also leads to a nonconvex, quartic optimization problem. To overcome these challenges and increase the robustness of reconstruction to phase noise, we propose a QCS-based SAR imaging algorithm that uses greedy local search to exploit the spatial sparsity of scatterers. The proposed algorithm not only avoids precise estimation of the random phase noise but also obtains a highly accurate sparse representation of the SAR target from the phase-corrupted data. Experiments are conducted on a synthetic scene and on the moving and stationary target recognition Sandia laboratories implementation of cylinders (MSTAR SLICY) target. Simulation results demonstrate the effectiveness and robustness of the proposed SAR imaging algorithm.
Dajiang LIU Shouyi YIN Chongyong YIN Leibo LIU Shaojun WEI
A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for regular, compute-intensive tasks, and most compute-intensive tasks spend most of their running time in nested loops. The polyhedron model is a powerful tool for applying reasonable transformations to such nested loops. In this paper, a number of issues are addressed towards the goal of optimizing affine loop nests for a reconfigurable cell array (RCA): an approach that makes the most use of processing elements (PEs) while minimizing the communication volume through loop transformation in the polyhedron model, determination of the tiling form by intra-statement dependence analysis, and determination of the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the proposed mapping optimization approaches. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication is improved by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.
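As a minimal illustration of the loop tiling discussed in this abstract (not the authors' polyhedron-model mapping flow), the following Python sketch tiles a matrix multiplication into blocks. The function names and the `tile` parameter are hypothetical; in an RCA setting the tile size would presumably be derived from the tiling form and the array size, which this sketch does not model.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with the iteration space split into tile x tile x tile blocks.

    The three outer loops walk over tiles; the three inner loops work inside one tile.
    'tile' is an illustrative assumption, not a value taken from the paper.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Both functions compute the same product; tiling changes only the order in which iterations are visited, which is what lets a block of iterations be assigned to a fixed-size array of PEs with less data movement between blocks.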
Shouyi YIN Chongyong YIN Leibo LIU Min ZHU Shaojun WEI
Coarse-grained reconfigurable architecture (CGRA), which combines the performance of application-specific integrated circuits (ASICs) with the flexibility of general-purpose processors (GPPs), is a promising solution for embedded systems. With the increasing complexity of reconfigurable resources (processing elements, routing cells, I/O blocks, etc.), the reconfiguration cost is becoming the performance bottleneck. The major reconfiguration cost comes from the frequent memory read/write operations needed to transfer the configuration context from main memory to the context buffer. To improve overall performance, it is critical to reduce the amount of configuration context. In this paper, we propose a configuration context reduction method for CGRA. The proposed method exploits the structural correlation of computation tasks mapped onto the CGRA and reduces the redundancies in the configuration context. Experimental results show that the proposed method can, on average, reduce the configuration context size by up to 71% and speed up execution by up to 68%. The proposed method does not depend on any architectural feature and can be applied to a CGRA with an arbitrary architecture.
Sheng-Bin HU Zhi-Min YUAN Wei ZHANG Bo LIU Lei WAN Rui XIAN
The interaction between the slider, the lubricant, and the disk surface is becoming the most crucial robustness concern in advanced data storage systems. This paper reports comparative studies of various techniques for measuring head-disk spacing. It is observed that, in the on-track center case, the triple harmonic method gives a reading much closer to the optically measured head-disk spacing than the PW50 method does. Specially prepared disks with different carbon overcoat thicknesses (6.5 nm, 11 nm, 16 nm, and 22 nm) were also used to study the reliability and repeatability of the triple harmonic method.
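The abstract does not state the formula behind the triple harmonic method. A common formulation, assumed here, follows from the Wallace spacing-loss relation, in which the readback amplitude of a component with wavelength λ decays as exp(-2πd/λ) with spacing d. Taking the ratio of the first harmonic (wavelength λ) to the third harmonic (wavelength λ/3) gives a spacing change of (λ/4π) times the change in ln(V1/V3). The function and parameter names below are hypothetical.

```python
import math


def spacing_change_nm(ratio_ref, ratio_now, fundamental_wavelength_nm):
    """Spacing change implied by a change in the V1/V3 harmonic amplitude ratio.

    Assumes the Wallace spacing-loss model:
        V1 ~ exp(-2*pi*d / lam),  V3 ~ exp(-6*pi*d / lam)
    so ln(V1/V3) = 4*pi*d / lam + const, and
        delta_d = lam / (4*pi) * ln(ratio_now / ratio_ref).
    """
    if ratio_ref <= 0 or ratio_now <= 0 or fundamental_wavelength_nm <= 0:
        raise ValueError("ratios and wavelength must be positive")
    return fundamental_wavelength_nm / (4.0 * math.pi) * math.log(ratio_now / ratio_ref)
```

Only changes in spacing are recovered this way; an absolute reading needs a reference point, which is one reason comparison against an optical measurement, as in the abstract, is useful.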
Jienan ZHANG Shouyi YIN Peng OUYANG Leibo LIU Shaojun WEI
In this paper we propose a method that uses the features of an individual object to locate and recognize that object concurrently in a static image, using multi-feature fusion based on a sample library of multiple objects. The method is motivated by the observation that much previous work focuses on category recognition and exploits the common characteristics of a specific category to detect its presence. However, these algorithms cease to be effective when we search for individual objects rather than categories against a complex background. To solve this problem, we abandon the concept of category and propose an effective way to use the features of an individual object directly as clues for detection and recognition. Our system introduces a multi-feature fusion method based on the colour histogram and the prominent SIFT (p-SIFT) feature to improve detection and recognition accuracy. The p-SIFT feature, which we propose, is an improved SIFT feature obtained by further extracting correlation information based on a Feature Matrix, aiming at low computational complexity with a good matching rate. In the detection process, we abandon conventional methods and instead make full use of multiple features, starting with a simple but effective step: using the colour feature to reduce the number of patches of interest (POI). Our method is evaluated on several publicly available datasets, including the Pascal VOC 2005 dataset, Objects101, and the datasets provided by Achanta et al.
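The colour-based pruning of patches of interest described in this abstract can be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the abstract does not say which histogram similarity is used, so histogram intersection is swapped in here, and the bin count, threshold, and function names are all hypothetical.

```python
def colour_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells; pixels are (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        idx = ((r * bins) // 256) * bins * bins + ((g * bins) // 256) * bins + (b * bins) // 256
        hist[idx] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def filter_patches(patches, target_hist, threshold=0.5, bins=4):
    """Return indices of patches whose colour distribution resembles the target object."""
    return [
        i
        for i, patch in enumerate(patches)
        if histogram_intersection(colour_histogram(patch, bins), target_hist) >= threshold
    ]
```

Only the patches that survive this cheap test would then be passed to the more expensive local-feature matching stage.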
Shouyi YIN Zhongfu SUN Leibo LIU Shaojun WEI
Motivated by the needs of modern agriculture, in this paper we present CropNET, a wireless multimedia sensor network system for agriculture monitoring. Both hardware and software designs of CropNET are tailored for sensing in wide farmland without human supervision. We have carried out multiple rounds of deployments. The evaluation results show that CropNET performs well and facilitates modern agriculture.
Peng OUYANG Shouyi YIN Leibo LIU Shaojun WEI
More and more mobile devices adopt multi-battery designs and dynamic voltage scaling (DVS) policies to reduce energy consumption and extend battery runtime. However, because the nonlinear characteristics of the multi-battery system are not considered, the efficiency achieved in practice is limited. To reduce energy consumption and extend battery runtime, this paper proposes an approach based on battery characteristics that co-optimizes multi-battery scheduling and dynamic voltage scaling on multi-battery powered systems. Considering the nonlinear discharging characteristics of existing batteries, we use a Markov process to describe the multi-battery discharging behavior, build a multi-objective optimization model to represent the energy consumption and battery states, and then propose a binary-tree-based algorithm to solve this model. With this method, we obtain an optimal and applicable scheme for multi-battery scheduling and dynamic voltage scaling. Experimental results show that, in a physical implementation, this approach achieves an average improvement in battery runtime of 17.5% over current methods.
An integrated slider-suspension system was designed and prototyped. The structure has a fully flying air-bearing surface with a contamination-resistant feature in the leading part, and it accommodates a slider with a 5-15 nm head-disk spacing in the trailing part. Performance analysis and simulation were conducted to validate the high performance of the design. Two key issues, the rigid motions (vibrations) and the elastic motions of the slider, were investigated systematically. For the rigid motions, it was found that the natural frequencies of the slider system depend on the disk contact stiffness and that the slider vibrations under excitation exhibit various nonlinear resonances. For the elastic motions, the average elastic response of the slider body under the random interaction of the interface was derived and characterized.
The effect of surface roughness is crucial for contact recording and proximity recording. In this paper a probability model is developed to investigate the influence of surface roughness on the flying performance and the contact force of the slider. Simulations are conducted for both the contact recording slider and the proximity recording slider, and the results agree well with both previously reported experimental results and our own experimental results. The study is further extended to characterize the roughness of the air bearing surface and the disk surface that can support a head/disk spacing between 5 nm and 15 nm.
Bo LIU Yao-Long ZHU Ying-Hui LI
A head-disk spacing tester that includes the effect of the lubricant is necessary if the slider-disk interaction is to be considered. The interaction and the interaction-induced spacing variation can be characterized quantitatively by an optical method, by replacing the functional disk media with a glass disk covered with a carbon layer and a lubricant layer of the same materials and the same layer thicknesses as the functional disk media. This paper reports a tester configuration based on that concept. Experimental investigations of the nanometer-spaced head-disk interface with this setup are also presented. Results indicate that the lubricant plays an important role in the slider-disk interaction and in the vibration of the slider-disk interface. Two types of interface vibration were observed: contact vibration and bouncing vibration. In the bouncing case, the natural frequency of the air bearing and its fold frequencies are excited, and the air bearing plays a more important role in determining the slider vibration than in the contact-vibration case.
Load/unload techniques are widely used in mobile hard disk drives, which must frequently endure external shocks. Air-bearing surface (ABS) designs must therefore consider both load/unload performance and shock resistance. Three ABS designs with different positions of the suction force center are studied in simulation. It is observed that when the suction force center moves forward, the anti-shock performance improves but the unload performance degrades, and vice versa. A slider need not be designed with its suction force center significantly behind its geometric center, as traditional load/unload sliders are. Instead, the suction force center can be placed near the geometric center if a hook limiter is used.
Tongsheng GENG Leibo LIU Shouyi YIN Min ZHU Shaojun WEI
This paper proposes approaches to HW/SW (Hardware/Software) partitioning and parallelization of the computing-intensive tasks of the H.264 HiP (High Profile) decoding algorithm on an embedded coarse-grained reconfigurable multimedia system called REMUS (REconfigurable MUltimedia System). Several techniques, such as MB (Macro-Block) based parallelization and unfixed sub-block operation, are used to speed up the decoding process and satisfy the requirements of real-time, high-quality H.264 applications. Tests show that, compared with XPP PACT (a commercial reconfigurable processor), the execution performance of MC (Motion Compensation), deblocking, and IDCT-IQ (Inverse Discrete Cosine Transform-Inverse Quantization) on REMUS is improved by 60%, 73%, and 88.5%, respectively, in the typical case and by 60%, 69%, and 88.5% in the worst case. Compared with ASIC solutions, the performance of MC is improved by 70% in the typical case and 74% in the worst case, while that of deblocking remains the same. For IDCT-IQ, the performance is improved by 17% in both the typical and worst cases. With the proposed techniques, 1080p@30 fps H.264 HiP @ Level 4 decoding can be achieved on REMUS at a 200 MHz working frequency.
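To put the 1080p@30 fps at 200 MHz figure in perspective, the sketch below works out the cycle budget per frame and per 16×16 macroblock. It is back-of-the-envelope arithmetic only: the function name is hypothetical, the frame is assumed to be padded to whole macroblocks (1920×1088), and the budget is a total across all decoding stages that says nothing about how the work is spread over parallel processing elements.

```python
import math


def cycle_budget(clock_hz, fps, width, height, mb_size=16):
    """Return (macroblocks per frame, cycles per frame, cycles per macroblock)."""
    mbs_per_row = math.ceil(width / mb_size)   # 1920 / 16 = 120
    mb_rows = math.ceil(height / mb_size)      # 1080 / 16 = 67.5, padded to 68
    macroblocks = mbs_per_row * mb_rows
    cycles_per_frame = clock_hz / fps
    return macroblocks, cycles_per_frame, cycles_per_frame / macroblocks


mbs, per_frame, per_mb = cycle_budget(200e6, 30, 1920, 1080)
print(mbs, round(per_frame), round(per_mb))
```

Under these assumptions a frame contains 8160 macroblocks, leaving roughly 6.67 million cycles per frame, or about 817 cycles per macroblock, which suggests why macroblock-level parallelization matters at this clock rate.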
Shouyi YIN Rui SHI Leibo LIU Shaojun WEI
Coarse-grained Reconfigurable Architecture (CGRA) is a parallel computing platform that provides both the high performance of hardware and the high flexibility of software. It is becoming a promising platform for embedded and mobile applications. Since embedded and mobile devices are usually battery-powered, improving battery lifetime is one of the primary design issues in using CGRAs. In this paper, we propose a battery-aware task-mapping method to optimize energy consumption and improve battery lifetime. The proposed method addresses two main problems that arise when mapping applications onto a CGRA: task partitioning and task scheduling. Task partitioning and scheduling are formulated as a joint optimization problem that minimizes energy consumption, and the nonlinear effects of real batteries are taken into account in the formulation. Using insights from the problem formulation, we design the task-mapping algorithm. We have used several real-world benchmarks to test the effectiveness of the proposed method. Experimental results show that our method can dramatically lower energy consumption and prolong battery life.
Zhen ZHANG Shouyi YIN Leibo LIU Shaojun WEI
TSV-interconnected 3D chips face problems such as high cost, low yield, and large power dissipation. We propose a wireless 3D on-chip-network architecture for application-specific SoC design that uses inductive-coupling interconnects instead of TSVs for inter-layer communication. The primary design challenge of an inductive-coupling 3D SoC is allocating wireless links in the 3D on-chip network effectively. We develop a design flow that fully exploits the design space opened up by wireless links while offering users a flexible trade-off. Experimental results show that our design improves on both the uniform design and the Sunfloor algorithm in latency (by 5% to 20%) and power consumption (by 10% to 45%).
Leibo LIU Dong WANG Yingjie CHEN Min ZHU Shouyi YIN Shaojun WEI
This paper presents the design of a multiple-standard 1080 high definition (HD) video decoder on a mixed-grained reconfigurable computing platform that integrates coarse-grained reconfigurable processing units (RPUs) and FPGAs. The proposed RPU, which includes 16×16 multi-functional processing elements (PEs), is used to accelerate compute-intensive tasks in video decoding. A soft-core-based microprocessor array is implemented on the FPGA to speed up the dynamic reconfiguration of the RPU. Furthermore, a mailbox-based communication scheme is used to improve the communication efficiency between the RPUs and the FPGAs. By exploiting dynamic reconfiguration of the RPUs and static reconfiguration of the FPGAs, the proposed platform achieves scalable performance and cost trade-offs to support a variety of video coding standards, including MPEG-2, AVS, H.264, and HEVC. Measured results show that the proposed platform can support H.264 1080 HD video streams at up to 57 frames per second (fps) and HEVC 1080 HD video streams at up to 52 fps at 250 MHz. It also achieves a 3.6× performance gain over an industrial coarse-grained reconfigurable processor for H.264 decoding and a 6.43× performance boost over a general-purpose-processor-based implementation for HEVC decoding.
Bo LIU Junzhou LUO Feng SHAN Wei LI Jiahui JIN Xiaojun SHEN
Provisioning multiple paths can improve fault tolerance and transport capability of multi-routing in wireless networks. Disjoint paths can improve the diversity of paths and further reduce the risk of simultaneous link failure and network congestion. In this paper we first address a many-to-one disjoint-path problem (MOND) for multi-path routing in a multi-hop wireless network. The objective of this problem is to maximize the minimum number of disjoint paths of every source to the destination. We prove that it is NP-hard to obtain k disjoint paths for every source when k ≥ 3. To solve this problem efficiently, we propose a heuristic algorithm called TOMAN based on network flow theory. Experimental results demonstrate that it outperforms three related algorithms.
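The link between disjoint paths and network flow mentioned in this abstract can be sketched with a unit-capacity maximum flow. This is not the TOMAN heuristic, whose details the abstract does not give. It is a standard textbook construction, simplified to edge-disjoint paths on an undirected graph, and it counts paths for each source independently rather than modeling contention between sources. All function names are hypothetical.

```python
from collections import defaultdict, deque


def max_edge_disjoint_paths(edges, source, sink):
    """Count edge-disjoint source-sink paths via unit-capacity max flow (BFS augmentation)."""
    if source == sink:
        raise ValueError("source and sink must differ")
    cap = defaultdict(int)   # residual capacity per directed arc
    adj = defaultdict(set)
    for u, v in edges:       # each undirected link becomes two unit-capacity arcs
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow      # no augmenting path left
        v = sink
        while parent[v] is not None:   # push one unit along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def min_disjoint_paths(edges, sources, sink):
    """Smallest per-source path count: an upper bound on the max-min objective."""
    return min(max_edge_disjoint_paths(edges, s, sink) for s in sources)
```

Because each source is evaluated on its own copy of the network, the returned minimum is only an upper bound on what a joint assignment could achieve when sources compete for the same links.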
Yu PENG Shouyi YIN Leibo LIU Shaojun WEI
Coarse-grained Reconfigurable Architecture (CGRA) is a promising mobile computing platform that provides both high performance and high energy efficiency. In an application, loop nests are usually mapped onto the CGRA for acceleration, so optimizing this mapping is an important goal in the design of CGRAs. Moreover, because almost all mobile devices are powered by batteries, reducing energy consumption is also one of the primary concerns in using CGRAs. This paper makes three contributions: a) proposing an energy consumption model for CGRA; b) formulating the loop-nest mapping problem so as to minimize the battery charge loss; and c) deriving an efficient heuristic algorithm called BPMap. Experimental results on most kernels of the benchmarks and on real-life applications show that our methods can improve the performance of the kernels and lower energy consumption.
Peng OUYANG Shouyi YIN Hui GAO Leibo LIU Shaojun WEI
The Scale Invariant Feature Transform (SIFT) algorithm is a highly effective approach to feature detection, and it is characterized by data-intensive computation. Current studies on accelerating SIFT fall mainly into three categories: optimizing the parallel parts of the algorithm on general-purpose multi-core processors, designing customized multi-core processors dedicated to SIFT, and implementing the algorithm on FPGA platforms. These efforts have greatly improved the real-time performance of SIFT. However, some solutions restrict factors such as the input image size, the number of octaves, and the scale factors, so the flexibility needed to maintain high execution performance under variable factors should be improved. This paper proposes a reconfigurable solution to this problem. We fully exploit the algorithm and adopt several techniques, such as fully parallel execution, block computation, and CORDIC transformation, to improve execution efficiency on a REconfigurable MUltimedia System called REMUS. Experimental results show that the execution performance of SIFT is improved by 33%, 50%, and 8 times compared with execution on a multi-core platform, an FPGA, and an ASIC, respectively. The dynamic reconfiguration scheme in this work can configure the circuits to meet the computation requirements for different input image sizes, numbers of octaves, and scale factors during computation.
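The CORDIC transformation mentioned in this abstract is commonly used to obtain a gradient's magnitude and orientation with shift-and-add operations only. The sketch below shows CORDIC in vectoring mode using floating-point arithmetic for readability. It is a generic illustration, not the REMUS implementation: hardware would typically use fixed-point shifts and a small arctangent table, and the function name and iteration count are assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Return (magnitude, angle in radians) of the vector (x, y) using CORDIC vectoring mode."""
    angle = 0.0
    # Pre-rotate by 90 degrees so the vector lies in the right half-plane,
    # where the CORDIC iterations converge.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i                      # a right shift by i bits in hardware
        if y > 0:                          # rotate clockwise to drive y towards 0
            x, y = x + y * t, y - x * t
            angle += math.atan(t)          # table lookup in hardware
        else:                              # rotate counter-clockwise
            x, y = x - y * t, y + x * t
            angle -= math.atan(t)
        gain *= math.sqrt(1.0 + t * t)     # each micro-rotation scales the vector
    return x / gain, angle
```

For a nonzero input the angle converges to `atan2(y, x)` and the magnitude to the Euclidean norm; the result for the zero vector is not meaningful.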