Zhengxue CHENG Heming SUN Dajiang ZHOU Shinji KIMURA
High Efficiency Video Coding (HEVC/H.265) achieves a 50% bit rate reduction over the H.264/AVC standard at comparable quality, at the cost of high computational complexity. Merge mode is one of the most important new features introduced in HEVC's inter prediction. Merge mode and traditional inter mode consume about 90% of the total encoding time. To address this high complexity, this paper utilizes the merge mode to accelerate inter prediction with four strategies. 1) A merge candidate decision based on the sum of absolute transformed difference (SATD) cost is proposed. 2) An early merge termination is presented with more than 90% accuracy. 3) Due to the compensation effect of merge candidates, symmetric motion partition (SMP) mode is disabled for non-8×8 coding units (CUs). 4) A fast coding unit filtering strategy is proposed to reduce the number of CUs that need to be fine-processed. Experimental results demonstrate that our fast strategies can achieve 35.4%-58.7% time reduction with 0.68%-1.96% BD-rate increment in the RA case. Compared with similar works, the proposed strategies are not only among the best performing in average-case complexity reduction, but also notably better in the worst cases.
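As an illustration of the SATD cost mentioned above, the following is a minimal sketch that assumes a 4×4 block and an unnormalized Hadamard transform. The abstract does not give the paper's actual candidate decision rule, so `best_merge_candidate` (a hypothetical helper) simply picks the candidate prediction with the smallest SATD. Real encoders typically apply a scaling factor and larger block sizes.

```python
# 4x4 Hadamard matrix with entries +1/-1; it is symmetric, so H^T == H.
H4 = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def satd4x4(original, prediction):
    """Unnormalized SATD: sum of |H * (original - prediction) * H|."""
    residual = [[o - p for o, p in zip(orow, prow)]
                for orow, prow in zip(original, prediction)]
    transformed = matmul(matmul(H4, residual), H4)
    return sum(abs(v) for row in transformed for v in row)


def best_merge_candidate(original, candidates):
    """Hypothetical rule: index of the candidate block with the lowest SATD."""
    costs = [satd4x4(original, c) for c in candidates]
    return min(range(len(costs)), key=costs.__getitem__)


if __name__ == "__main__":
    block = [[1] * 4 for _ in range(4)]
    zeros = [[0] * 4 for _ in range(4)]
    print(satd4x4(block, zeros))                       # cost of a flat residual
    print(best_merge_candidate(block, [zeros, block]))  # index of best candidate
```

A flat residual concentrates all energy in the DC coefficient, which is why SATD tends to track the post-transform coding cost more closely than a plain sum of absolute differences.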
Lianming SUN Yuanming DING Akira SANO
The paper is concerned with an identification-based predistortion scheme for compensating the nonlinearity of high power amplifiers (HPAs). Identification algorithms for the Wiener-Hammerstein nonlinear model are developed in the frequency domain. By approximately modeling the nonlinear distortion part of the HPA with polynomial or spline functions, and introducing linear distortion parts at the input and output of the nonlinear element, iterative identification schemes are proposed to estimate all the uncertain parameters and to construct an inverse system for the predistortion.
Heming SUN Dajiang ZHOU Shuping ZHANG Shinji KIMURA
In this paper, we present a low-power system for the de-quantization and inverse transform of HEVC. Firstly, we present a low-delay circuit to process the coded results of the syntax elements, and then reduce the number of multipliers from 16 to 4 for the de-quantization process of each 4x4 block. Secondly, we give two efficient data mapping schemes for the memory between de-quantization and inverse transform, and the memory for transpose. Thirdly, the zero information is utilized throughout the whole system. For the two memory parts, the write and read operations of zero blocks/rows/coefficients can all be skipped to save power. The results show that up to 86% of the power consumption can be saved for the memory part under the “Random-access” configuration and common QPs. For the logical part, the proposed architecture for de-quantization can reduce area consumption by 77%. Overall, our system can support real-time coding for 8K x 4K 120fps video sequences, and the normalized area consumption can be reduced by 68% compared with the latest work.
Heming SUN Dajiang ZHOU Peilin LIU Satoshi GOTO
In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for an HEVC decoder. Compared with previous work, this work reduces the hardware cost in two respects. First, we reduce the logical cost of the 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the required calculations for butterfly inputs in each cycle. Second, we reduce the area of the transpose architecture by proposing a cyclic data mapping scheme that can achieve 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for the row and column transforms. The results show that the area normalized by maximum throughput for the logical IDCT part can be reduced by 25%, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K × 2K 60fps video sequence.
Jiu XU Ning JIANG Wenxin YU Heming SUN Satoshi GOTO
In this paper, a feature named Non-Redundant Gradient Semantic Local Binary Patterns (NRGSLBP) is proposed for human detection as a modified version of the conventional Semantic Local Binary Patterns (SLBP). Calculations of this feature are performed for both intensity and gradient magnitude image so that texture and gradient information are combined. Moreover, and to the best of our knowledge, non-redundant patterns are adopted on SLBP for the first time, allowing better discrimination. Compared with SLBP, no additional cost of the feature dimensions of NRGSLBP is necessary, and the calculation complexity is considerably smaller than that of other features. Experimental results on several datasets show that the detection rate of our proposed feature outperforms those of other features such as Histogram of Orientated Gradient (HOG), Histogram of Templates (HOT), Bidirectional Local Template Patterns (BLTP), Gradient Local Binary Patterns (GLBP), SLBP and Covariance matrix (COV).
Jian H. ZHAO Kuang SHENG Yongxi ZHANG Ming SU
This paper reviews the development of SiC power devices, especially SiC power junction field-effect transistors (JFETs). The rationale and different approaches to the development of SiC power JFETs are presented, focusing on normally-OFF power JFETs that can provide the highly desired fail-safe feature for reliable power switching applications. New results for the first demonstration of SiC Power ICs are presented, and the potential for distributed DC-DC power converters at frequencies higher than 35 MHz is discussed.
Lu SHEN Shifang FENG Jinjin SUN Zhongwei LI Ming SU Gang WANG Xiaoguang LIU
With the growth in data volume, people have begun to attach importance to cloud storage. However, numerous security incidents have recently occurred on cloud servers, raising concerns about the security of the traditional single cloud. In other words, a traditional single cloud cannot fully ensure the privacy of users' data. To solve these security issues, multi-cloud systems, which spread data over multiple cloud storage servers, have emerged. They employ a series of erasure codes and other keyless dispersal algorithms to achieve high-level security. However, non-systematic codes such as RS require relatively complex arithmetic, and systematic codes have relatively weaker security. Keyless dispersal algorithms avoid key management issues but are not suited to complete parallel optimization or to deduplication, which is important for limited cloud storage resources. In this paper, we therefore design a new kind of XOR-based non-systematic erasure code, Privacy Protecting Codes (PPC), and a SIMD encoding algorithm for better performance. To achieve higher-level security, we put forward a novel deduplication-friendly dispersal algorithm called Hash Cyclic Encryption-PPC (HCE-PPC), which can achieve complete parallelization. With these new technologies, we present a multi-cloud storage system called CloudS. For a better user experience and to balance security against performance, CloudS provides multiple levels of security through various combinations of compression, encryption and coding schemes. We implement CloudS as a web application that does not require users to perform complicated operations locally.
Guo-Ming SUNG Ying-Tzu LAI Yueh-Hung HOU
This paper presents a fully differential third-order (2-1) switched-current (SI) cascaded delta-sigma modulator (DSM), with an analog error cancellation logic circuit and a digital decimation filter, that is fabricated using 0.18-µm CMOS technology. The 2-1 architecture, with only the quantizer input being fed into the second stage, is introduced not only to reduce the circuit complexity but also to be implemented easily using the switched-current approach. Measurements reveal that the dominant error is the quantization error of the second one-bit quantizer (e2). This error can be eliminated using an analog error cancellation logic circuit. In the proposed differential sample-and-hold circuit, low input impedance is obtained with feedback and width-length adjustment in the SI feedback memory cell (FMC), and a coupled differential replicate (CDR) common-mode feedforward circuit (CMFF) is used to compensate for the error of the current mirror. Also, measurements indicate that the signal-to-noise ratio (SNR), dynamic range (DR), effective number of bits (ENOB), power consumption and chip size are 67.3 dB, 69 dB, 10.9 bits, 12.3 mW, and 0.20 × 0.21 mm2, respectively, with a bandwidth of 40 kHz, a sampling rate of 10.24 MHz, an OSR of 128 and a supply voltage of 1.8 V.
Lianming SUN Hiromitsu OHMORI Akira SANO
This paper is concerned with blind identification of a nonminimum phase transfer function model. By over-sampling the output at a higher rate than the input, it is shown that its input-output relation can be described by a single input multiple output model (SIMO) with a common denominator polynomial. Based on the model expression, we present an algorithm to estimate numerator polynomials and common denominator polynomial in a blind manner. Furthermore, identifiability of the proposed scheme is clarified, and some numerical results are given for demonstrating its effectiveness.
Hajime KAGIWADA Lianming SUN Akira SANO Wenjiang LIU
A new identification algorithm based on an output over-sampling scheme is proposed for an IIR model whose input signal is not directly available. By using only an output signal sampled at a higher rate than the unknown input, the parameters of the IIR model can be identified. It is clarified that the consistency of the obtained parameter estimates is assured under some specified conditions. Furthermore, an efficient recursive algorithm for blind parameter estimation is given for practical applications. Simulation results demonstrate its effectiveness in both system and channel identification.
Guo-Ming SUNG Leenendra Chowdary GUNNAM Wen-Sheng LIN Ying-Tzu LAI
This work develops a third-order multibit switched-current (SI) delta-sigma modulator (DSM) with a four-bit switched-capacitor (SC) flash analog-to-digital converter (ADC) and an incremental data weighted averaging circuit (IDWA), which is fabricated using 0.18µm 1P6M CMOS technology. In the proposed DSM, a 4-bit SC flash ADC is used to improve its resolution, and an IDWA is used to reduce the nonlinearity of digital-to-analog converter (DAC) by moving the quantization noise out of the signal band by first-order noise shaping. Additionally, the proposed differential sample-and-hold circuit (SH) exhibits low input impedance with feedback and width-length adjustment in the SI feedback memory cell (FMC) to increase the conversion rate. A coupled differential replicate (CDR) common-mode feedforward circuit (CMFF) is used to compensate for the mirror error that is caused by the current mirror. Measurements indicate that the signal-to-noise ratio (SNR), dynamic range (DR), effective number of bits (ENOB), power consumption, and chip area are 64.1 dB, 64.4 dB, 10.36 bits, 18.82 mW, and 0.45 × 0.67 mm2 (without I/O pad), respectively, with a bandwidth of 20 kHz, an oversampling ratio (OSR) of 256, a sampling frequency of 10.24 MHz, and a supply voltage of 1.8 V.
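The reported figures can be cross-checked with two standard data-converter relations: OSR = fs / (2 × BW) and ENOB = (SNDR - 1.76 dB) / 6.02 dB. The sketch below assumes that the reported SNR can stand in for SNDR, which the abstract does not state. The function names are illustrative only.

```python
def oversampling_ratio(sampling_hz, bandwidth_hz):
    """OSR = fs / (2 * BW), i.e. sampling rate over the Nyquist rate."""
    return sampling_hz / (2.0 * bandwidth_hz)


def effective_number_of_bits(sndr_db):
    """ENOB from the ideal-quantizer relation SNDR = 6.02 * N + 1.76 dB."""
    return (sndr_db - 1.76) / 6.02


if __name__ == "__main__":
    # Values quoted in the abstract: fs = 10.24 MHz, BW = 20 kHz, SNR = 64.1 dB.
    print(oversampling_ratio(10.24e6, 20e3))
    print(round(effective_number_of_bits(64.1), 2))
```

Under that assumption, the quoted bandwidth and sampling frequency are consistent with the stated OSR of 256, and the quoted SNR is consistent with the stated ENOB of 10.36 bits.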
Yi GUO Heming SUN Ping LEI Shinji KIMURA
Approximate computing has emerged as a promising approach for error-tolerant applications to improve hardware performance at the cost of some loss of accuracy. Multiplication is a key arithmetic operation in these applications. In this paper, we propose a low-cost approximate multiplier design by employing new probability-driven inexact compressors. This compressor design is introduced to reduce the height of the partial product matrix to two rows, based on the probability distribution of the sum of the partial products. To compensate for the accuracy loss of the multiplier, a grouped error recovery scheme is proposed that achieves different levels of accuracy. In terms of mean relative error distance (MRED), the accuracy losses of the proposed multipliers range from 1.07% to 7.86%. Compared with the Wallace multiplier in a 40nm process, the most accurate variant of the proposed multipliers can reduce power by 59.75% and area by 42.47%. The critical path delay reduction is larger than 12.78%. The proposed multiplier design has a better accuracy-performance trade-off than other designs with comparable accuracy. In addition, the efficiency of the proposed multiplier design is assessed in an image processing application.
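To make the MRED metric concrete, here is a minimal sketch that evaluates it exhaustively over all nonzero unsigned operand pairs. The `truncated_multiply` function is a hypothetical stand-in that zeroes the low bits of each operand. It is not the compressor-based design described in the abstract, and zero operands are excluded here as an assumption so that the relative error is defined.

```python
def exact_multiply(a, b):
    return a * b


def truncated_multiply(a, b, dropped_bits=2):
    """Hypothetical approximate multiplier: clear the low bits of each operand."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


def mred(multiplier, bits=8):
    """Mean relative error distance over all nonzero operand pairs."""
    total = 0.0
    count = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(multiplier(a, b) - exact) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    print(f"exact:     {mred(exact_multiply):.4%}")
    print(f"truncated: {mred(truncated_multiply):.4%}")
```

Because MRED averages the error relative to the exact product, small operands weigh as heavily as large ones, which is why it is usually reported alongside hardware metrics such as area and power.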
Peng QIAN Yan GUO Ning LI Baoming SUN
Compressive sensing (CS) theory has been recognized as a promising technique for target localization in wireless sensor networks. However, most existing works require prior knowledge of the transmitting powers of targets, which does not fit the case where the information about targets is completely unknown. To address this problem, we propose a novel CS-based approach for multiple target localization and power estimation. It is achieved by formulating the locations and transmitting powers of targets as a sparse vector in the discrete spatial domain, and the received signal strengths (RSSs) of targets are used to recover the sparse vector. The key element of CS-based localization is the sensing matrix, which in our approach is constructed by collecting RSSs from RF emitters, avoiding the disadvantage of relying on a radio propagation model. Moreover, since collecting RSSs to construct the sensing matrix is tedious and time-consuming, we propose a CS-based method for reconstructing the sensing matrix from only a small number of RSS measurements. It is achieved by exploiting CS theory and designing a difference matrix to reveal the sparsity of the sensing matrix. Finally, simulation results demonstrate the effectiveness and robustness of our localization and power estimation approach.
Guo-Ming SUNG Ying-Tsu LAI Chien-Lin LU
This paper presents a resistor-compensation technique for a CMOS bandgap and current reference, which utilizes various high positive temperature coefficient (TC) resistors, a two-stage operational transconductance amplifier (OTA) and a simplified start-up circuit in the 0.35-µm CMOS process. In the proposed bandgap and current reference, numerous compensated resistors, which have a high positive TC, are added to the parasitic n-p-n and p-n-p bipolar junction transistor devices to generate a temperature-independent voltage reference and current reference. The measurements verify a current reference of 735.6 nA, a voltage reference of 888.1 mV, and a power consumption of 91.28 µW at a supply voltage of 3.3 V. The voltage TC is 49 ppm/°C in the temperature range from 0 to 100 °C and 12.8 ppm/°C from 30 to 100 °C. The current TC is 119.2 ppm/°C at temperatures of 0 to 100 °C. Measurement results also demonstrate a stable voltage reference at high temperature (> 30 °C), and a constant current reference at low temperature (< 70 °C).
Linear complexity profile and correlation measure of order k are important pseudorandomness measures for sequences used in cryptography. We study both measures for a class of binary sequences called Legendre-Sidelnikov sequences. The proofs involve character sums.
Zhangjie FU Xingming SUN Qi LIU Lu ZHOU Jiangang SHU
Cloud computing is becoming increasingly popular. A large amount of data is outsourced to the cloud by data owners, motivated by access to large-scale computing resources and by economic savings. To protect data privacy, sensitive data should be encrypted by the data owner before outsourcing, which makes the traditional and efficient plaintext keyword search technique useless. Designing a searchable encryption scheme over encrypted cloud data that is both accurate and efficient is therefore a very challenging task. In this paper, for the first time, we propose a practical, efficient, and flexible searchable encryption scheme which supports both multi-keyword ranked search and parallel search. To support multi-keyword search and result relevance ranking, we adopt the Vector Space Model (VSM) to build the searchable index and achieve accurate search results. To improve search efficiency, we design a tree-based index structure which supports parallel search to take advantage of the powerful computing capacity and resources of the cloud server. With our parallel search algorithm, the search efficiency is considerably improved. We propose two secure searchable encryption schemes to meet different privacy requirements in two threat models. Extensive experiments on a real-world dataset validate our analysis and show that our proposed solution is very efficient and effective in supporting multi-keyword ranked parallel searches.
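The Vector Space Model ranking that the abstract builds on can be sketched in plaintext as follows. This is only an illustration of multi-keyword relevance ranking with an unnormalized TF-IDF inner product. It involves no encryption, no tree index and no parallelism, and the weighting formula, function names and toy documents are all assumptions rather than the paper's scheme.

```python
import math
from collections import Counter


def build_index(docs):
    """Map each document id to a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(1 + n / d) for term, d in df.items()}
    return {
        doc_id: {t: (1 + math.log(tf)) * idf[t] for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def ranked_search(index, keywords, top_k):
    """Score = inner product of the document vector with a 0/1 query vector."""
    scores = {
        doc_id: sum(vec.get(k, 0.0) for k in keywords)
        for doc_id, vec in index.items()
    }
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    return [d for d in ranked if scores[d] > 0][:top_k]


if __name__ == "__main__":
    docs = {
        "d1": "cloud search encryption keyword search",
        "d2": "video coding merge mode",
        "d3": "cloud storage erasure codes",
    }
    index = build_index(docs)
    print(ranked_search(index, ["cloud", "search"], top_k=2))
```

In a searchable encryption setting, the index and query vectors would be protected before being sent to the server, which is the part this sketch deliberately leaves out.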
Zhili ZHOU Ching-Nung YANG Beijing CHEN Xingming SUN Qi LIU Q.M. Jonathan WU
For detecting the image copies of a given original image generated by arbitrary rotation, the existing image copy detection methods can not simultaneously achieve desirable performances in the aspects of both accuracy and efficiency. To address this challenge, a novel effective and efficient image copy detection method is proposed based on two global features extracted from rotation invariant partitions. Firstly, candidate images are preprocessed by an averaging operation to suppress noise. Secondly, the rotation invariant partitions of the preprocessed images are constructed based on pixel intensity orders. Thirdly, two global features are extracted from these partitions by utilizing image gradient magnitudes and orientations, respectively. Finally, the extracted features of images are compared to implement copy detection. Promising experimental results demonstrate our proposed method can effectively and efficiently resist rotations with arbitrary degrees. Furthermore, the performances of the proposed method are also desirable for resisting other typical copy attacks, such as flipping, rescaling, illumination and contrast change, as well as Gaussian noising.
The k-error linear complexity of periodic sequences is an important security index of stream cipher systems. By using an interesting decomposing approach, we investigate the intrinsic structure of the set of 2^n-periodic binary sequences with fixed complexity measures. For k ≤ 4, we construct the complete set of error vectors that give the k-error linear complexity. As auxiliary results we obtain the counting functions of the k-error linear complexity of 2^n-periodic binary sequences for k ≤ 4, as well as the expectations of the k-error linear complexity of a random sequence for k ≤ 3. Moreover, we study the 2^t-error linear complexity of the set of 2^n-periodic binary sequences with some fixed linear complexity L, where t < n-1 and the Hamming weight of the binary representation of 2^n-L is t. Also, we extend some results to p^n-periodic sequences over F_p. Finally, we discuss some potential applications.
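For binary sequences whose period is a power of two, the linear complexity can be computed with the Games-Chan algorithm, and the k-error linear complexity is the minimum linear complexity over all sequences obtained by changing at most k bits per period. The sketch below pairs Games-Chan with a brute-force search over error vectors. It is only a small-period illustration of the definition, not the decomposition approach of the paper, and the function names are assumptions.

```python
from itertools import combinations


def linear_complexity_2n(period):
    """Games-Chan algorithm for one period (length 2^n) of a binary sequence."""
    s = list(period)
    n = len(s)
    if n == 0 or n & (n - 1):
        raise ValueError("period length must be a power of two")
    complexity = 0
    while len(s) > 1:
        half = len(s) // 2
        left, right = s[:half], s[half:]
        diff = [a ^ b for a, b in zip(left, right)]
        if any(diff):
            # Halves differ: add half the length and recurse on the difference.
            complexity += half
            s = diff
        else:
            # Halves agree: the sequence already has the shorter period.
            s = left
    return complexity + (1 if s[0] else 0)


def k_error_linear_complexity(period, k):
    """Brute force: minimum complexity over all changes of at most k bits."""
    n = len(period)
    best = linear_complexity_2n(period)
    for weight in range(1, min(k, n) + 1):
        for positions in combinations(range(n), weight):
            changed = list(period)
            for p in positions:
                changed[p] ^= 1
            best = min(best, linear_complexity_2n(changed))
    return best


if __name__ == "__main__":
    seq = [0, 0, 0, 0, 0, 0, 0, 1]
    print(linear_complexity_2n(seq))
    print(k_error_linear_complexity(seq, 1))
```

The example sequence shows why the measure matters for stream ciphers: a sequence with maximal linear complexity can collapse to complexity zero after a single bit change. The brute-force search grows combinatorially with the period and k, so it is practical only for short periods.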
Yan GUO Baoming SUN Ning LI Peng QIAN
Many basic tasks in Wireless Sensor Networks (WSNs) rely heavily on the availability and accuracy of target locations. Since the number of targets is usually limited, localization benefits from Compressed Sensing (CS) in the sense that measurements can be greatly reduced. Though some CS-based localization schemes have been proposed, all of these solutions make an assumption that all targets are located on a pre-sampled and fixed grid, and perform poorly when some targets are located off the grid. To address this problem, we develop an adaptive dictionary algorithm where the grid is adaptively adjusted. To achieve this, we formulate localization as a joint parameter estimation and sparse signal recovery problem. Additionally, we transform the problem into a tractable convex optimization problem by using Taylor approximation. Finally, the block coordinate descent method is leveraged to iteratively optimize over the parameters and sparse signal. After iterations, the measurements can be linearly represented by a sparse signal which indicates the number and locations of targets. Extensive simulation results show that the proposed adaptive dictionary algorithm provides better performance than state-of-the-art fixed dictionary algorithms.
Yi GUO Heming SUN Ping LEI Shinji KIMURA
Approximate multiplier design is an effective technique to improve hardware performance at the cost of accuracy loss. Current approximate multipliers are mostly ASIC-based and are dedicated to one particular application. In contrast, the FPGA has been an attractive choice for many applications because of its high performance, reconfigurability, and fast development cycle. This paper presents a novel methodology for designing approximate multipliers by employing FPGA-based fabrics (primarily look-up tables and carry chains). The area and latency are significantly reduced by applying approximation to carry results and cutting the carry propagation path in the multiplier. Moreover, we explore the architectural space of higher-order multipliers by using our proposed small-size approximate multipliers as elementary modules. For different accuracy-hardware requirements, eight configurations of the approximate 8×8 multiplier are discussed. In terms of mean relative error distance (MRED), the error of the proposed 8×8 multiplier is as low as 1.06%. Compared with the exact multiplier, our proposed design can reduce area by 43.66% and power by 24.24%. The critical path latency reduction is up to 29.50%. The proposed multiplier design has a better accuracy-hardware tradeoff than other designs with comparable accuracy. Moreover, image sharpening is used to assess the efficiency of the approximate multipliers in an application.