Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG
In video-based point cloud compression (V-PCC), the partitioning of the Coding Unit (CU) has ultra-high computational complexity. Just Noticeable Difference Model (JND) is an effective metric to guide this process. However, in this paper, it is found that the performance of traditional JND model is degraded in V-PCC. For the attribute video, due to the pixel-filling operation, the capability of brightness perception is reduced for the JND model. For the geometric video, due to the depth filling operation, the capability of depth perception is degraded in the boundary area for depth based JND models (JNDD). In this paper, a joint JND model (J_JND) is proposed for the attribute video to improve the brightness perception capacity, and an occupancy map guided JNDD model (O_JNDD) is proposed for the geometric video to improve the depth difference estimation accuracy of the boundaries. Based on the two improved JND models, a fast V-PCC Coding Unit (CU) partitioning algorithm is proposed with adaptive CU depth prediction. The experimental results show that the proposed algorithm eliminates 27.46% of total coding time at the cost of only 0.36% and 0.75% Bjontegaard Delta rate increment under the geometry Point-to-Point (D1) error and attribute Luma Peak-signal-Noise-Ratio (PSNR), respectively.
Yuan LI Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA
1 millisecond (1-ms) vision systems are gaining increasing attention in diverse fields like factory automation and robotics, as the ultra-low delay ensures seamless and timely responses. Superpixel segmentation is a pivotal preprocessing to reduce the number of image primitives for subsequent processing. Recently, there has been a growing emphasis on leveraging deep network-based algorithms to pursue superior performance and better integration into other deep network tasks. Superpixel Sampling Network (SSN) employs a deep network for feature generation and employs differentiable SLIC for superpixel generation. SSN achieves high performance with a small number of parameters. However, implementing SSN on FPGAs for ultra-low delay faces challenges due to the final layer’s aggregation of intermediate results. To address this limitation, this paper proposes an aggregated to pipelined structure for FPGA implementation. The final layer is decomposed into individual final layers for each intermediate result. This architectural adjustment eliminates the need for memory to store intermediate results. Concurrently, the proposed structure leverages decomposed layers to facilitate a pipelined structure with pixel streaming input to achieve ultra-low latency. To cooperate with the pipelined structure, layer-partitioned memory architecture is proposed. Each final layer has dedicated memory for storing superpixel center information, allowing values to be read and calculated from memory without conflicts. Calculation results of each final layer are accumulated, and the result of each pixel is obtained as the stream reaches the last layer. Evaluation results demonstrate that boundary recall and under-segmentation error remain comparable to SSN, with an average label consistency improvement of 0.035 over SSN. From a hardware performance perspective, the proposed system processes 1000 FPS images with a delay of 0.947 ms/frame.
Dongyue JIN Luming CAO You WANG Xiaoxue JIA Yongan PAN Yuxin ZHOU Xin LEI Yuanyuan LIU Yingqi YANG Wanrong ZHANG
Fast switching speed, low power consumption, and good stability are some of the important properties of spin transfer torque assisted voltage controlled magnetic anisotropy magnetic tunnel junction (STT-assisted VCMA-MTJ) which makes the non-volatile full adder (NV-FA) based on it attractive for Internet of Things. However, the effects of process variations on the performances of STT-assisted VCMA-MTJ and NV-FA will be more and more obvious with the downscaling of STT-assisted VCMA-MTJ and the improvement of chip integration. In this paper, a more accurate electrical model of STT-assisted VCMA-MTJ is established on the basis of the magnetization dynamics and the process variations in film growth process and etching process. In particular, the write voltage is reduced to 0.7 V as the film thickness is reduced to 0.9 nm. The effects of free layer thickness variation (γtf) and oxide layer thickness variation (γtox) on the state switching as well as the effect of tunnel magnetoresistance ratio variation (β) on the sensing margin (SM) are studied in detail. Considering that the above process variations follow Gaussian distribution, Monte Carlo simulation is used to study the effects of the process variations on the writing and output operations of NV-FA. The result shows that the state of STT-assisted VCMA-MTJ can be switched under -0.3%≤γtf≤6% or -23%≤γtox≤0.2%. SM is reduced by 16.0% with β increases from 0 to 30%. The error rates of writing ‘0’ in the NV-FA can be reduced by increasing Vb1 or increasing positive Vb2. The error rates of writing ‘1’ can be reduced by increasing Vb1 or decreasing negative Vb2. The reduction of the output error rates can be realized effectively by increasing the driving voltage (Vdd).
This paper presents two different algorithms for random number generation. One algorithm generates a random sequence with an arbitrary distribution from a sequence of pure random numbers, i.e. a sequence with uniform distribution. The other algorithm generates a sequence of pure random numbers from a sequence of a given i.i.d. source. Both algorithms can be regarded as an implementation of the interval algorithm by using the integer arithmetic with limited precision. We analyze the approximation error measured by the variational distance between probability distributions of the desired random sequence and the output sequence generated by the algorithms. Further, we give bounds on the expected length of input sequence per one output symbol, and compare it with that of the original interval algorithm.