Jianfeng XU Satoshi KOMORITA Kei KAWAMURA
We propose a framework for integrating heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods improve the accuracy of HPE by using multiple video frames, they also increase the computational complexity. The key difference is that the proposed heterogeneous framework applies different networks to different types of frames, whereas existing methods use the same network for all frames. Specifically, we divide the video frames into two types, key frames and non-key frames, and adopt three types of networks in our heterogeneous framework: slow networks, fast networks, and transfer networks. For key frames, a slow network with high accuracy but high computational complexity is used. For a non-key frame that follows a key frame, we warp the heatmap produced by the slow network at the key frame via a transfer network and fuse it with the output of a fast network, which has low accuracy but low computational complexity. Furthermore, when a large number of non-key frames follow a key frame over a long period, the temporal correlation decreases; when necessary, we therefore use an additional transfer network that warps the heatmap from a neighboring non-key frame. Experimental results on the PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed framework, called FSPose, achieves a better balance between accuracy and computational complexity than the competing method. Our source code is available at https://github.com/Fenax79/fspose.
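A minimal sketch of the per-frame routing described above, assuming hypothetical `slow_net`, `fast_net`, and `transfer_net` modules and a simple averaging fusion; the actual architectures and fusion scheme are those of the paper, not this code.

```python
import torch

def estimate_pose_sequence(frames, key_interval, slow_net, fast_net, transfer_net):
    """Sketch of heterogeneous per-frame processing: a slow network on key frames,
    and a fast network fused with a warped key-frame heatmap on non-key frames."""
    heatmaps = []
    key_heatmap, key_frame = None, None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:                  # key frame
            key_heatmap = slow_net(frame)          # accurate but computationally heavy
            key_frame = frame
            heatmaps.append(key_heatmap)
        else:                                      # non-key frame
            # Warp the key-frame heatmap to the current frame via the transfer network
            warped = transfer_net(key_heatmap, key_frame, frame)
            fast = fast_net(frame)                 # cheap but less accurate
            heatmaps.append(0.5 * (warped + fast))  # illustrative fusion by averaging
    return heatmaps
```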
Jianfeng XU Koichi TAKAGI Shigeyuki SAKAZAWA
This paper presents a system for automatically generating dancing animation that is synchronized with a piece of music by re-using motion capture data. The dancing motion is synthesized according to the rhythm and intensity features of the music. For this purpose, we propose a novel meta motion graph structure that embeds the necessary features, including both rhythm and intensity, and is constructed on the motion capture database beforehand. We consider two scenarios, non-streaming music and streaming music, which require global search and local search, respectively. In the former case, once a piece of music is input, an efficient dynamic programming algorithm is employed to globally search for the best path in the meta motion graph, with an objective function designed to measure the quality of beat synchronization, intensity matching, and motion smoothness. In the latter case, the input music is stored in a buffer in streaming mode, and an efficient search method is presented for a certain amount of music data (called a segment) in the buffer with the same objective function, resulting in a segment-based search approach. For streaming applications, we define an additional property in the meta motion graph to deal with the unpredictable future music, which guarantees that there is always some motion to match the unknown remaining music. A user study with 60 subjects in total demonstrates that our system outperforms state-of-the-art techniques in both scenarios. Furthermore, our system greatly improves the synthesis speed (the maximum speedup is more than 500 times), which is essential for mobile applications. We have implemented our system on commercially available smartphones and confirmed that it works well on them.
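A minimal sketch of the global dynamic-programming search over a motion graph, assuming placeholder `node_cost` (how well a motion node matches a music segment, e.g. beat synchronization and intensity matching) and `transition_cost` (motion smoothness) functions; these names are illustrative, not the paper's API.

```python
import numpy as np

def best_motion_path(num_segments, motion_nodes, transition_cost, node_cost):
    """Viterbi-style dynamic programming: find a minimum-cost sequence of
    motion-graph nodes, one per music segment."""
    n = len(motion_nodes)
    dp = np.full((num_segments, n), np.inf)     # dp[t, j]: best cost ending at node j
    back = np.zeros((num_segments, n), dtype=int)
    dp[0] = [node_cost(0, j) for j in range(n)]
    for t in range(1, num_segments):
        for j in range(n):
            costs = dp[t - 1] + np.array([transition_cost(i, j) for i in range(n)])
            back[t, j] = int(np.argmin(costs))
            dp[t, j] = costs[back[t, j]] + node_cost(t, j)
    # Trace back the optimal path
    path = [int(np.argmin(dp[-1]))]
    for t in range(num_segments - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))
```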
Jianfeng XU Toshihiko YAMASAKI Kiyoharu AIZAWA
3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, a key frame extraction method is developed using a rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed on the feature vectors as a preprocessing step, followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Based on a linearity assumption, an R-D curve is generated in each shot, where the locations of the key frames are optimized. Finally, the R-D trade-off is achieved by minimizing a cost function with a Lagrange multiplier, where the number of key frames is optimized in each shot. Therefore, our system automatically determines the best locations and number of key frames in the R-D sense. Our experimental results show that the extracted key frames are compact and faithful to the original 3D video.
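A minimal sketch of the Lagrangian trade-off, assuming the rate is simply the number of key frames in a shot and that the distortion for each candidate count is already available from the R-D curve; the cost form J = D + λR is the standard Lagrangian formulation, while the inputs here are hypothetical.

```python
def choose_num_key_frames(distortion_per_count, lam):
    """Pick the number of key frames n minimizing J(n) = D(n) + lambda * R(n),
    with the rate R taken here as the key-frame count n itself."""
    best_n, best_cost = None, float("inf")
    for n, d in distortion_per_count.items():   # e.g., {1: 12.0, 2: 7.5, 3: 5.9, ...}
        cost = d + lam * n
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n
```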
Jianfeng XU Hong LI Wen-Yan YIN Junfa MAO Le-Wei LI
The element-by-element finite element method (EBE-FEM) combined with the preconditioned conjugate gradient (PCG) technique is employed in this paper to calculate the coupling capacitances of multi-level high-density three-dimensional interconnects (3DIs). All capacitive couplings among the 3DIs can be captured, with the effects of all geometric and physical parameters taken into account. It is numerically demonstrated that this hybrid method for capacitance extraction yields an effective and accurate convergent solution to the Laplace equation, while requiring less memory and CPU time than the commercial FEM software MAXWELL 3D and ANSYS.
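A minimal sketch of a preconditioned conjugate gradient solver with a simple Jacobi (diagonal) preconditioner, assuming a symmetric positive-definite system A x = b assembled by the FEM; the paper's element-by-element assembly and actual preconditioner may differ.

```python
import numpy as np

def pcg(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (dense or scipy sparse)."""
    inv_diag = 1.0 / A.diagonal()          # Jacobi preconditioner: M^-1 = diag(A)^-1
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                          # residual
    z = inv_diag * r                       # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```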
Jianfeng XU Wen-Yan YIN Junfa MAO Le-Wei LI
In this paper, the thermal characteristics of GaN HFETs are analyzed using a hybrid finite element method (FEM). Both steady-state and transient thermal operations are quantitatively studied, with the temperature-dependent thermal conductivities of GaN and the substrate materials properly treated. The temperature distributions and maximum temperatures of the HFETs under continuous-wave (CW) and pulsed-wave (PW) excitations, including double-exponential PW such as electromagnetic pulse (EMP) and ultra-wideband (UWB) signals, are studied and compared.
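For reference, a double-exponential pulse of the kind mentioned above has the standard form p(t) = p0 (e^{-αt} - e^{-βt}); the sketch below uses illustrative amplitude and rate constants, not values from the paper.

```python
import numpy as np

def double_exponential_pulse(t, p0=1.0, alpha=4.0e7, beta=6.0e8):
    """p(t) = p0 * (exp(-alpha*t) - exp(-beta*t)), taken as 0 for t < 0.
    p0, alpha, and beta here are illustrative assumptions."""
    t = np.maximum(np.asarray(t, dtype=float), 0.0)
    return p0 * (np.exp(-alpha * t) - np.exp(-beta * t))
```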
Jianfeng XU Haruhisa KATO Akio YONEYAMA
This paper presents a content-based retrieval algorithm for motion capture data, which is needed to re-use a large-scale database containing many variations within the same category of motions. The most challenging problem is that logically similar motions may not be numerically similar because of these variations. Our algorithm can effectively retrieve motions that are logically similar to a query, with a properly defined distance metric between our novel short-term features as the fundamental component of the system. We extract the features by short-term analysis of joint velocities after dividing an entire motion capture sequence into many small overlapping clips. In each clip, we select not only the magnitude but also the dynamic pattern of the joint velocities as our features, which discards the motion variations while keeping the significant motion information of a category. At the same time, the amount of data is reduced, lowering the computational cost. Using the extracted features, we define a novel distance metric between two motion clips, and a motion dissimilarity measure between two motion capture sequences is calculated by dynamic time warping. Given a query, we then rank all the motions in our dataset according to their motion dissimilarity measures. Our experiments on a test dataset of more than 190 motions demonstrate that our algorithm greatly improves the retrieval performance compared with two conventional methods according to the popular evaluation measure P(NR).
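A minimal dynamic time warping sketch for turning clip-to-clip distances into a sequence-level dissimilarity; `clip_distance` is a placeholder standing in for the paper's short-term feature metric.

```python
import numpy as np

def dtw_dissimilarity(clips_a, clips_b, clip_distance):
    """Accumulate clip-to-clip distances along the optimal warping path to obtain
    a dissimilarity measure between two motion capture sequences."""
    n, m = len(clips_a), len(clips_b)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = clip_distance(clips_a[i - 1], clips_b[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```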