Haruhisa KATO Yoshitaka KIDANI Kei KAWAMURA
We propose a method for adaptively selecting merge candidates for the geometric partitioning mode (GPM) in versatile video coding (VVC). The conventional GPM contributes to improved coding efficiency and subjective quality by partitioning a block into two nonrectangular partitions, each with its own motion vector. Each GPM motion vector is encoded as an index into the merge candidate list, but this design does not take into account that the GPM partitions are nonrectangular. In this paper, the distribution of merge candidates was evaluated for each GPM mode and partition, and a characteristic bias was revealed. To improve the coding efficiency of VVC, the proposed method allows the GPM to select merge candidates specific to each partition. The method also introduces adaptive reference frame selection using template matching of adjacent samples. Following the common test conditions of the Joint Video Experts Team (JVET), the experimental results showed an improvement in coding efficiency, with a bitrate saving of 0.16%, compared to the JVET reference software for exploration experiments on enhanced compression beyond VVC capability.
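The adaptive reference frame selection relies on template matching of adjacent samples: the reconstructed samples above and to the left of the current block are compared against the corresponding samples around each motion-compensated candidate, and the reference with the lowest cost is chosen. The following is only a rough sketch of that idea; the integer-pel motion, SAD cost, template thickness, and missing boundary handling are our simplifying assumptions, not the ECM implementation.

import numpy as np

def template_cost(ref, rec, x, y, mv_x, mv_y, bw, bh, t=4):
    # SAD between the reconstructed template (t rows above and t columns to the
    # left of the current block) and the same-shaped template around the
    # motion-compensated position in the candidate reference frame.
    cur = np.concatenate((rec[y - t:y, x:x + bw].ravel(),
                          rec[y:y + bh, x - t:x].ravel()))
    rx, ry = x + mv_x, y + mv_y
    cand = np.concatenate((ref[ry - t:ry, rx:rx + bw].ravel(),
                           ref[ry:ry + bh, rx - t:rx].ravel()))
    return int(np.abs(cur.astype(int) - cand.astype(int)).sum())

def select_reference(ref_frames, rec, x, y, mv, bw, bh):
    # Choose the reference frame with the smallest template-matching cost.
    costs = [template_cost(ref, rec, x, y, mv[0], mv[1], bw, bh) for ref in ref_frames]
    return int(np.argmin(costs))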
Federated Learning (FL) facilitates deep learning model training across distributed networks while ensuring data privacy. When deployed on edge devices, network pruning becomes essential due to the constraints of computational resources. However, traditional FL pruning methods face bias issues arising from the varied distribution of local data, which poses a significant challenge. To address this, we propose DDPruneFL, an FL pruning framework that utilizes Discriminative Data (DD). Specifically, we use minimally pre-trained local models, allowing each client to extract semantic concepts as DD, which then inform an iterative pruning process. As a result, DDPruneFL significantly outperforms existing methods on four benchmark datasets, handling both IID and non-IID distributions as well as client-selection scenarios, and achieves state-of-the-art (SOTA) performance in this field. Moreover, our studies comprehensively validate the effectiveness of DD. Furthermore, a detailed computational complexity analysis focused on floating-point operations (FLOPs) is also conducted. The FLOPs analysis reveals that DDPruneFL significantly improves inference performance while only marginally increasing training costs. Additionally, it exhibits a cost advantage at inference compared to other FL pruning methods of the same type, further emphasizing its cost-effectiveness and practicality.
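DDPruneFL builds on an iterative pruning process whose importance scores are informed by the clients' discriminative data. The sketch below shows only the generic iterative pruning loop such a framework rests on; importance_fn and retrain_fn are hypothetical placeholders for the DD-based scoring and the local retraining, which the abstract does not specify in detail.

import numpy as np

def iterative_prune(weights, target_sparsity, steps, importance_fn, retrain_fn):
    # Generic iterative pruning: score parameters, remove the least important
    # fraction, retrain, and repeat until the target sparsity is reached.
    # Importance scores are assumed non-negative in this sketch.
    mask = np.ones_like(weights, dtype=bool)
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps          # gradually raise sparsity
        scores = importance_fn(weights) * mask             # only surviving weights compete
        k = int(weights.size * (1.0 - sparsity))           # number of weights to keep
        threshold = np.sort(scores.ravel())[::-1][k - 1]   # k-th largest score
        mask = scores >= threshold
        weights = retrain_fn(weights * mask, mask)         # fine-tune the remaining weights
    return weights * mask, mask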
Kenta YUMOTO Ami YAMAMOTO Takahiro MATSUDA Junichi HIGUCHI Takeshi KODAMA Hitoshi UENO Takashi SHIRAISHI
In cloud computing environments with virtual machines (VMs), we propose a VM placement (VMP) method based on traffic estimation to balance loads caused by traffic volumes within physical hosts (PHs) and through physical network interface cards (NICs). We refer to a VM or a NIC in a cloud environment as a node, and define a flow as a pair of nodes. To balance loads for both PHs and NICs, it is necessary to measure flow traffic volumes because each VM may connect to other VMs on different PHs. However, directly measuring flow traffic volumes is not cost-effective because the number of flows grows as O(N²) with the number N of nodes. To solve this problem, we propose a VMP method using a compressed sensing (CS)-based traffic estimator. In the proposed method, the relationship between flow traffic volumes and node traffic volumes is formulated as a system of underdetermined linear equations, and the flow traffic volumes are estimated with CS from the measured node traffic volumes. From the estimated flow traffic volumes, each VM is assigned to the optimal host for load balancing by solving a mixed-integer optimization problem.
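The estimator treats the measured node traffic volumes y and the unknown flow traffic volumes x as an underdetermined linear system y = Ax, where each row of A indicates which flows contribute to one node's volume, and recovers x by exploiting its sparsity. The following is a minimal sketch with a generic CS solver (iterative soft thresholding); the routing matrix, regularization weight, and solver choice are illustrative assumptions rather than the authors' exact estimator.

import numpy as np

def ista(A, y, lam=0.1, iters=500):
    # Recover a sparse flow-volume vector x from node measurements y = A @ x
    # by iterative soft thresholding, one standard compressed-sensing solver.
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # step size from the spectral norm
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                    # gradient of the least-squares term
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft threshold
    return x

# Toy example: each row of A marks which flows pass through one node.
rng = np.random.default_rng(0)
A = rng.binomial(1, 0.3, size=(20, 100)).astype(float)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.uniform(1.0, 10.0, 5)
x_hat = ista(A, A @ x_true)                         # estimated flow volumes from node volumes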
Souhei TAKAGI Takuya KOJIMA Hideharu AMANO Morihiro KUGA Masahiro IIDA
SLM (Scalable Logic Module) is a fine-grained reconfigurable logic architecture developed at Kumamoto University. It is characterized by a small amount of configuration information, which results in a smaller area for logic cells. We have been developing an SoC-type FPGA called SLMLET to take advantage of SLM. It keeps multiple sets of configuration data in an on-chip memory module in compressed form and switches between them quickly. This paper proposes a simple run-length compression technique called TLC (Tag Less Compression), which achieves compression ratios of 1.01-3.06 and is embedded in the currently available SLMLET prototype. We then propose DMC (Duplication Module Compression), which exploits repeatedly appearing patterns in the SLM configuration data and achieves a better compression ratio for complicated designs that are hard to compress with TLC.
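TLC is described as a simple run-length compression of the configuration data; the exact tag-less bit packing is not given in the abstract. The sketch below therefore shows only a generic run-length encoder and decoder over configuration words as a point of reference, not the TLC format itself.

def run_length_encode(words, max_run=255):
    # Generic run-length encoding of a sequence of configuration words:
    # each output pair is (repeat_count, word). The real TLC format packs this
    # information without explicit tags, which this sketch does not reproduce.
    out, i = [], 0
    while i < len(words):
        run = 1
        while i + run < len(words) and words[i + run] == words[i] and run < max_run:
            run += 1
        out.append((run, words[i]))
        i += run
    return out

def run_length_decode(pairs):
    words = []
    for run, word in pairs:
        words.extend([word] * run)
    return words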
Shoichi HIROSE Hidenori KUWAKADO
In 2005, Nandi introduced a class of double-block-length compression functions hπ(x) := (h(x), h(π(x))), where h is a random oracle with an n-bit output and π is a non-cryptographic public permutation. Nandi demonstrated that, in the classical setting, the collision resistance of hπ is optimal if π has no fixed point. Our study explores the collision resistance of hπ and the Merkle-Damgård hash function using hπ in the quantum random oracle model. Firstly, we reveal that the quantum collision resistance of hπ may not be optimal even if π has no fixed point. If π is an involution, then a colliding pair of inputs can be found for hπ with only O(2^(n/2)) queries by Grover search. Secondly, we present a sufficient condition on π for the optimal quantum collision resistance of hπ, under which any collision attack needs Ω(2^(2n/3)) queries to find a colliding pair of inputs. The proof uses Zhandry's recent compressed-oracle technique. Thirdly, we show that the quantum collision resistance of the Merkle-Damgård hash function using hπ can be optimal even if π is an involution. Finally, we discuss the quantum collision resistance of double-block-length compression functions using a block cipher.
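For intuition on the involution case, one plausible reading of the attack (ours, not necessarily the paper's exact argument) is as follows: search for an input x with h(x) = h(π(x)), which Grover search finds with O(2^(n/2)) quantum queries; since π(π(x)) = x, we then have hπ(x) = (h(x), h(π(x))) = (v, v) and hπ(π(x)) = (h(π(x)), h(x)) = (v, v) with v = h(x), so x and π(x) form a colliding pair.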
Nawras KHUDHUR Aryo PINANDITO Yusuke HAYASHI Tsukasa HIRASHIMA
This study investigates the efficacy of a partial decomposition approach in concept map recomposition tasks to reduce cognitive load while maintaining the benefits of traditional recomposition approaches. Prior research has demonstrated that concept map recomposition, which involves rearranging unconnected concepts and links, can enhance reading comprehension. However, this task often imposes a significant burden on learners' working memory. To address this challenge, this study proposes a partial recomposition approach in which learners recompose only a portion of the concept map, thereby reducing the problem space. The proposed approach aims to lower the cognitive load while maintaining the benefits of the traditional recomposition task, namely, learning effect and motivation. To investigate the differences in cognitive load, learning effect, and motivation between full decomposition (the traditional approach) and partial decomposition (the proposed approach), we conducted an experiment (N=78) in which the participants were divided into a “full decomposition” group and a “partial decomposition” group. The full decomposition group was assigned the task of recomposing a concept map from a set of unconnected concept nodes and links, while the partial decomposition group worked with partially connected nodes and links. The experimental results show a significant reduction in the cognitive load imposed by concept map recomposition across different dimensions, while the learning effect and motivation remained similar between the conditions. On the basis of these findings, educators are recommended to incorporate partially disconnected concept maps in recomposition tasks to optimize time management and sustain learner motivation. By implementing this approach, instructors can conserve learners' cognitive resources and allocate the saved energy and time to other activities that enhance the overall learning process.
Masahiro NISHIMURA Taito MANABE Yuichiro SHIBATA
This paper presents an FPGA implementation of real-time high dynamic range (HDR) synthesis, which expresses a wide dynamic range by combining multiple images with different exposures using image pyramids. We have implemented a pipeline that performs streaming processing on images without using external memory. However, implementation for high-resolution images has been difficult due to large memory usage for line buffers. Therefore, we propose an image compression algorithm based on adaptive differential pulse code modulation (ADPCM). Compression modules based on the algorithm can be easily integrated into the pipeline. When the image resolution is 4K and the pyramid depth is 7, memory usage can be halved from 168.48% to 84.32% by introducing the compression modules, resulting in better quality.
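ADPCM stores each sample as a small quantized difference from a running prediction, adapting the quantization step as it goes, which is what lets a line buffer hold fewer bits per sample. The following is a generic first-order ADPCM sketch for one image line; the 4-bit code, step limits, and adaptation rule are illustrative assumptions, not the exact hardware algorithm.

def adpcm_encode_line(samples, step=8):
    # Quantize the prediction error of each pixel to a 4-bit code and adapt
    # the step size; the prediction tracks the decoder-side reconstruction.
    first = samples[0]
    codes, pred = [], first
    for s in samples[1:]:
        q = max(-8, min(7, int(round((s - pred) / step))))
        codes.append(q)
        pred = max(0, min(255, pred + q * step))
        if abs(q) >= 6:          # large error -> widen the step
            step = min(64, step * 2)
        elif abs(q) <= 1:        # small error -> narrow the step
            step = max(1, step // 2)
    return first, codes

def adpcm_decode_line(first, codes, step=8):
    # Decoder mirrors the encoder's reconstruction and step adaptation.
    out, pred = [first], first
    for q in codes:
        pred = max(0, min(255, pred + q * step))
        out.append(pred)
        if abs(q) >= 6:
            step = min(64, step * 2)
        elif abs(q) <= 1:
            step = max(1, step // 2)
    return out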
In this study, we consider data compression with side information available at both the encoder and the decoder. The information source is assigned to a variable-length code that does not have to satisfy the prefix-free constraint. We define several classes of codes whose codeword lengths and error probabilities satisfy worst-case criteria with respect to the side information. As a main result, we investigate the exact first-order asymptotics with second-order bounds scaled as Θ(√n) as the blocklength n increases under the regime of nonvanishing error probabilities. To obtain this result, we also derive one-shot bounds by employing the cutoff operation.
Kengo HASHIMOTO Ken-ichi IWATA
The class of k-bit delay decodable codes, source codes allowing decoding delay of at most k bits for k≥0, can attain a shorter average codeword length than Huffman codes. This paper discusses the general properties of the class of k-bit delay decodable codes with a finite number of code tables and proves two theorems which enable us to limit the scope of codes to be considered when discussing optimal k-bit delay decodable codes.
Koshi SHIMADA Shota SAITO Toshiyasu MATSUSHIMA
The context tree model, in which the occurrence probability of a symbol is determined by a finite past sequence, is a broader class of sources that includes i.i.d. and Markov sources. This paper proposes a non-stationary source in which the context tree model changes from interval to interval. The Bayes code for this source requires weighting of the posterior probabilities of the context tree models and change points, so its computational complexity usually grows to exponential order. Therefore, the challenge is how to reduce the computational complexity. In this paper, we propose a special class of prior probability distributions over context tree models and change points and develop an efficient Bayes coding algorithm by combining two existing Bayes coding algorithms. The proposed algorithm minimizes the Bayes risk function of the proposed source, and its computational complexity is of polynomial order. We investigate the behavior and performance of the proposed algorithm through experiments.
In this paper, we introduce a framework of distributed orthogonal approximate message passing for recovering a sparse vector based on sensing by multiple nodes. The iterative recovery process consists of local computation at each node and global computation, which is performed either by a designated node or jointly over the whole network by exchanging messages. We then propose a method to reduce the communication cost between the nodes while maintaining the recovery performance.
Sohei SHIMOMAI Kei UEDA Shinji KIMURA
Recently, Quantum Annealing (QA) has attracted attention as an efficient algorithm for combinatorial optimization problems. In QA, the input data size becomes large, and its reduction is important for acceleration by hardware emulation since the usable memory size and bandwidth are limited. This paper proposes a compression method for the input sparse matrices of a QA emulator. The proposed method exploits the sparseness of the coefficient matrix and the reappearance of the same values. An independent value table is introduced, and data are compressed by searching for and registering pairs of two consecutive data items in the value table. The proposed method is applied to the Traveling Salesman Problem (TSP) with 32, 64, and 96 cities and to the Nurse Scheduling Problem (NSP). The proposed method reduced the amount of data to 1/40 for the 96-city TSP, which made it possible to handle the 96-city TSP on the hardware emulator. When applied to the NSP, we confirmed the effectiveness of the proposed method, with compression ratios ranging from 1/4 to 1/11.8. The data reduction also improves simulation/emulation performance when the compressed data are used directly, and a 1.9-times speedup over the CSR-based method was observed for the 96-city TSP.
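The method compresses the input coefficient matrix by exploiting its sparsity and the reappearance of identical values, which are registered in an independent table. The abstract does not give the bit-level format or the exact handling of two consecutive data items, so the sketch below only illustrates the general coordinate-plus-value-table idea with de-duplicated coefficients.

def compress_qubo(matrix):
    # Store only nonzero entries as (row, col, value_index), with repeated
    # coefficient values de-duplicated through a separate value table.
    value_table, value_index = [], {}
    entries = []
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v == 0:
                continue
            if v not in value_index:               # register each distinct value once
                value_index[v] = len(value_table)
                value_table.append(v)
            entries.append((i, j, value_index[v]))
    return value_table, entries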
While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drops drastically for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. The framework adapts a pre-trained compression model to each target image during testing to address the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder together with the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To evaluate the proposed universal deep image compression, we constructed a benchmark dataset containing uncompressed images from four domains: natural images, line drawings, comics, and vector arts. We compare our proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
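For each test image, the framework refines the latent representation from the encoder together with small adapter parameters inserted into the decoder under a rate-distortion objective, and the adapter parameters are then transmitted with the bitstream. The PyTorch-style sketch below illustrates such a per-image refinement loop; model.encode, model.decode_with_adapters, the returned rate estimate, and the Lagrange multiplier are hypothetical placeholders rather than the interfaces of the released code.

import torch

def adapt_to_image(model, adapters, image, lam=0.01, steps=200, lr=1e-3):
    # Per-image refinement: jointly optimize the latent y and the adapter
    # weights for a rate-distortion objective (interfaces are placeholders).
    y = model.encode(image).detach().requires_grad_(True)   # latent from the frozen encoder
    params = [y] + list(adapters.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat, rate = model.decode_with_adapters(y, adapters)  # hypothetical interface
        dist = torch.mean((x_hat - image) ** 2)
        loss = rate + lam * dist                 # rate-distortion trade-off
        loss.backward()
        opt.step()
    return y.detach(), adapters                  # adapter weights are sent per image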
Xiaotian WANG Tingxuan LI Takuya TAMURA Shunsuke NISHIDA Takehito UTSURO
In research on machine reading comprehension for Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty dealing with cases in which the answer string spans multiple locations in the context. Fine-tuning the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model for Japanese how-to tips by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two multi-task learning methods for fine-tuning the generative model. The first is multi-task learning with a generative and extractive hybrid training dataset, in which both generative and extractive datasets are trained simultaneously on a single model. The second is multi-task learning with inter-sentence semantic similarity and answer generation, in which, in addition to the answer generation task, the model learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both multi-task learning methods significantly outperformed single-task learning on generative question-and-answer examples. Of the two multi-task learning methods, the one with inter-sentence semantic similarity and answer generation performed best according to the manual evaluation. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real life and the guiding behavior of normal people, and construct a high-precision, lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module, based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the accompanying changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the perception process of the normal model; the model uses a multi-scale class-residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a guidance and fusion module from the conventional model into the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task 1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy improves to 45.2% and the loss decreases by 0.634 with 551.89K parameters and 23.6M MACs.
Compressed sensing is a rapidly growing research field in signal and image processing, machine learning, statistics, and systems control. In this survey paper, we provide a review of the theoretical foundations of compressed sensing and present state-of-the-art algorithms for solving the corresponding optimization problems. Additionally, we discuss several practical applications of compressed sensing, such as group testing, sparse system identification, and sparse feedback gain design, and demonstrate their effectiveness through Python programs. This survey paper aims to contribute to the advancement of compressed sensing research and its practical applications in various scientific disciplines.
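In the spirit of the survey's Python programs, here is a small self-contained example of ours (not taken from the paper): greedy sparse recovery by orthogonal matching pursuit, which can be read as a toy instance of sparse system identification with a random sensing matrix.

import numpy as np

def omp(A, y, k):
    # Orthogonal matching pursuit: greedily pick k columns of A that best
    # explain y, then least-squares fit on the selected support.
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))     # most correlated column
        if j not in support:
            support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x[support] = x_s
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[3, 42, 77]] = [1.0, -2.0, 0.5]
# Exact recovery is expected for this well-conditioned toy example.
print(np.allclose(omp(A, A @ x_true, 3), x_true, atol=1e-6))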
Siyi HU Makiko ITO Takahide YOSHIKAWA Yuan HE Hiroshi NAKAMURA Masaaki KONDO
Sparse matrix-vector multiplication (SpMV), widely adopted by machine learning and graph processing applications, is a very popular kernel in linear algebra. This is especially the case for fully-connected MLP layers, which account for many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparse storage formats such as CSR and CSC, SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and floating-point data used in SpMV kernels are handled plainly, without any pre-processing. We therefore believe that bandwidth conservation techniques, such as data compression, can dramatically help SpMV kernels when data are transferred between the main memory and the Last Level Cache (LLC). Furthermore, we observe that the convergence of some typical scientific computing benchmarks built on SpMV kernels is not degraded when lower-precision floating-point data are adopted. Based on these findings, we propose a simple yet effective data compression scheme that can be extended to general-purpose computing architectures and HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. Moreover, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of the final results.
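As a rough software analogue of the bandwidth-oriented idea (not the authors' hardware scheme), the sketch below keeps the CSR value array of an SpMV in float32 instead of float64: the traffic of the value stream is halved, while for many well-conditioned iterative kernels the result changes only marginally.

import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(10_000, 10_000, density=1e-3, format="csr", random_state=0)
x = rng.standard_normal(10_000)

y64 = A @ x                                    # baseline: float64 value array
A32 = A.copy()
A32.data = A32.data.astype(np.float32)         # "compressed" value stream
y32 = A32 @ x

print(A.data.nbytes, A32.data.nbytes)          # value-array traffic is halved
print(np.max(np.abs(y64 - y32)))               # small precision loss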
We propose a non-photorealistic rendering method for automatically generating point-light-style (PLS) images from photographic images using peripheral difference filters with different window sizes. The proposed method can express PLS patterns near the edges of photographic images as dots. To verify the effectiveness of the proposed method, experiments were conducted to visually confirm PLS images generated from various photographic images.
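The abstract does not define the peripheral difference filters precisely, so the following is only one plausible interpretation for illustration (the window sizes, threshold, and dot rendering are our assumptions): max-minus-min differences over several window sizes are thresholded to place bright marks near edges.

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def point_light_style(gray, window_sizes=(3, 5, 7), threshold=24):
    # One plausible reading of a peripheral difference filter: the gap between
    # the local maximum and minimum around each pixel. Large gaps mark edges
    # and are rendered as bright marks on a dark background; the sub-sampling
    # that turns edge bands into isolated dots is omitted from this sketch.
    g = gray.astype(int)
    out = np.zeros_like(g, dtype=np.uint8)
    for w in window_sizes:
        diff = maximum_filter(g, size=w) - minimum_filter(g, size=w)
        out[diff > threshold] = 255
    return out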
For massive multiple-input multiple-output (MIMO) communication systems, simple linear detectors such as zero forcing (ZF) and minimum mean square error (MMSE) can achieve near-optimal detection performance with reduced computational complexity. However, such linear detectors involve complicated matrix inversion, which incurs high computational overhead in practical implementations. Owing to their massively parallel processing and efficient hardware implementation, neural networks have become a promising approach to signal processing for future wireless communications. In this paper, we first propose an efficient neural network that calculates the pseudo-inverse of any type of matrix based on an improved Newton's method, termed the PINN. Through detailed analysis and derivation, the linear massive MIMO detectors are mapped onto PINNs, which can take full advantage of the research achievements of neural networks in both algorithms and hardware. Furthermore, an improved limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is studied as the learning algorithm of PINNs to achieve a better performance/complexity trade-off. Simulation results validate the efficiency of the proposed scheme.
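As background (a textbook baseline, not the paper's improved variant), the classical Newton-Schulz/Ben-Israel iteration X_{k+1} = 2X_k - X_k A X_k converges to the Moore-Penrose pseudo-inverse of A when started from X_0 = α A^H with a sufficiently small α; the PINN is described as building on an improved form of this Newton-type iteration.

import numpy as np

def newton_pinv(A, iters=50):
    # Classical Newton-Schulz iteration for the Moore-Penrose pseudo-inverse,
    # started from X0 = A^H / (||A||_1 * ||A||_inf), which guarantees convergence.
    Ah = A.conj().T
    X = Ah / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(iters):
        X = 2 * X - X @ A @ X
    return X

rng = np.random.default_rng(0)
# e.g., a 64x16 Rayleigh-fading-like channel matrix
H = (rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))) / np.sqrt(2)
print(np.max(np.abs(newton_pinv(H) - np.linalg.pinv(H))))   # agreement with the reference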
Keiichiro TAKADA Yasuaki TOKUMO Tomohiro IKAI Takeshi CHUJOH
Video-based point cloud compression (V-PCC) utilizes video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computational burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. On the other hand, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural-network-based post-processing filter on attribute frames to alleviate this degradation. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3%, and 22.2% for Y, U, and V, respectively.
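One natural reading of the occupancy-based training (ours; the paper may differ) is that the reconstruction loss of the post-processing filter is evaluated only on occupied pixels, since unoccupied regions of attribute frames carry no point-cloud attributes. A minimal PyTorch-style sketch of such a masked loss is shown below.

import torch

def occupancy_masked_loss(filtered, original, occupancy):
    # L2 loss over occupied attribute-frame pixels only; `occupancy` is a
    # 0/1 map derived from the V-PCC occupancy frame (our assumed usage).
    mask = occupancy.float()
    return torch.sum(mask * (filtered - original) ** 2) / torch.clamp(mask.sum(), min=1.0)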