Kengo NAKATA Daisuke MIYASHITA Jun DEGUCHI Ryuichi FUJIMOTO
Quantization is commonly used to reduce the inference time of convolutional neural networks (CNNs). To reduce the inference time without drastically reducing accuracy, optimal bit widths need to be allocated for each layer or filter of the CNN. In conventional methods, the optimal bit allocation is obtained by using the gradient descent algorithm while minimizing the model size. However, the model size has little to no correlation with the inference time. In this paper, we present a computational-complexity metric called MAC×bit that is strongly correlated with the inference time of quantized CNNs. We propose a gradient descent-based regularization method that uses this metric for optimal bit allocation of a quantized CNN to improve the recognition accuracy and reduce the inference time. In experiments, the proposed method reduced the inference time of a quantized ResNet-18 model by 21.0% compared with the conventional regularization method based on model size while maintaining comparable recognition accuracy.
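The metric itself is simple to compute from a network description. Below is a minimal sketch of a MAC×bit-style complexity term used as a regularizer, assuming per-layer MAC counts and bit widths are available; the layer list, field names, and weighting constant are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: compute a MACxbit complexity metric and use it as a
# regularization term. Values and field names below are illustrative.

def mac_x_bit(layers):
    """Sum over layers of (MAC count) x (weight bit width)."""
    return sum(layer["macs"] * layer["bits"] for layer in layers)

# Toy two-layer network with per-layer bit allocations.
layers = [
    {"macs": 118_013_952, "bits": 4},  # conv block quantized to 4-bit weights
    {"macs": 115_605_504, "bits": 8},  # conv block quantized to 8-bit weights
]

task_loss = 0.42            # placeholder task loss (hypothetical value)
lam = 1e-12                 # regularization strength (assumed)
total_loss = task_loss + lam * mac_x_bit(layers)
print(f"MACxbit = {mac_x_bit(layers)}, regularized loss = {total_loss:.4f}")
```

In a real training loop the bit widths would be relaxed to continuous variables so that the regularizer is differentiable for gradient descent.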
Anoop A Christo K. THOMAS Kala S
In this paper, a novel Enhanced Spatial Modulation-based Orthogonal Time Frequency Space (ESM-OTFS) system is proposed to maximize the benefits of enhanced spatial modulation (ESM) and orthogonal time frequency space (OTFS) transmission. The primary objective of this novel modulation is to enhance transmission reliability, meeting the demanding requirements of high transmission rates and rapid data transfer in future wireless communication systems. The paper first outlines the system model and the specific signal processing techniques employed in ESM-OTFS. Furthermore, a novel detector based on sparse signal estimation is presented specifically for ESM-OTFS. The sparse signal estimation uses a fully factorized posterior approximation obtained via variational Bayesian inference, which leads to a low-complexity solution without any matrix inversions. Simulation results indicate that ESM-OTFS surpasses traditional spatial modulation-based OTFS, and that the newly introduced detection algorithm outperforms other linear detection methods.
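For intuition, the key structural choice of the detector can be written compactly. A schematic of a fully factorized posterior approximation (the notation is illustrative, not necessarily the paper's exact model):

```latex
q(\mathbf{x}) \;=\; \prod_{i=1}^{N} q(x_i)
```

Because every factor q(x_i) is a scalar density, each variational update involves only scalar moments, so the joint covariance matrix, and hence any matrix inversion, never appears in the update equations.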
Lei ZHOU Ryohei SASANO Koichi TAKEDA
In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. In the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it more difficult. Likewise, a training example that causes a large discrepancy between inference and reference implies a higher learning difficulty for the MT model. Therefore, we propose to adopt the inference discrepancy of each training example as the difficulty criterion and to rank training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. This training scheme is analogous to a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and examine the influence of translation direction, evaluation metrics, and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English and the Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, and English ⇔ Russian demonstrate that our proposed method consistently improves translation performance over the strong Transformer baseline.
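The ranking step can be illustrated with a toy discrepancy measure. The sketch below uses a simple token-overlap score as a stand-in for whatever sentence-level discrepancy metric a real system would use; the function and data are illustrative assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch: rank training examples from easy to hard by the
# discrepancy between a trained model's output and the reference.

def discrepancy(hypothesis: str, reference: str) -> float:
    """Toy token-overlap discrepancy: 0 = easy (perfect match), 1 = hard."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    return 1.0 - len(hyp & ref) / max(len(ref), 1)

# (source, reference, model hypothesis) triples -- toy data.
examples = [
    ("src-1", "the cat sat on the mat", "the cat sat on the mat"),
    ("src-2", "a long and difficult sentence", "something unrelated entirely"),
]

# Easy-to-hard ordering for the curriculum schedule.
ranked = sorted(examples, key=lambda ex: discrepancy(ex[2], ex[1]))
for src, ref, hyp in ranked:
    print(src, round(discrepancy(hyp, ref), 2))
```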
Ryota HIGASHIMOTO Soh YOSHIDA Takashi HORIHATA Mitsuji MUNEYASU
Noisy labels in training data can significantly harm the performance of deep neural networks (DNNs). Recent research on learning with noisy labels exploits a property of DNNs called the memorization effect to divide the training data into a set of data with reliable labels and a set of data with unreliable labels. Methods that introduce semi-supervised learning strategies discard the unreliable labels and assign pseudo-labels generated from the model's confident predictions. So far, this semi-supervised strategy has yielded the best results in this field. However, we observe that even when models are trained on balanced data, the distribution of the pseudo-labels can still exhibit an imbalance driven by data similarity. Additionally, a data bias arises from the division of the training data performed by the semi-supervised method. If we address both types of bias arising from the pseudo-labels, we can avoid the decrease in generalization performance caused by biased noisy pseudo-labels. We propose a method for learning with noisy labels that introduces unbiased pseudo-labeling based on causal inference. The proposed method achieves significant accuracy gains in experiments at high noise rates on the standard benchmarks CIFAR-10 and CIFAR-100.
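The division step driven by the memorization effect is commonly implemented as a small-loss criterion, which the sketch below illustrates, assuming per-example losses are available; the quantile threshold is an illustrative choice, not the paper's rule.

```python
import numpy as np

# Hypothetical sketch: split training data into reliable/unreliable label
# sets via the memorization effect (clean labels tend to have small loss).

losses = np.array([0.05, 1.90, 0.12, 2.30, 0.08])  # per-example losses (toy)
threshold = np.quantile(losses, 0.6)               # keep the smallest 60% (assumed)

reliable = losses <= threshold    # labels kept for the supervised loss
unreliable = ~reliable            # labels discarded; pseudo-labels assigned instead
print(reliable)                   # [ True False  True False  True]
```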
Cong ZHOU Jing TAO Baosheng WANG Na ZHAO
As a key technology of 5G, NFV has attracted much attention. Monitoring plays an important role in NFV and can be widely used for virtual network function placement and resource optimisation. Existing monitoring methods focus on the monitoring load without considering the resources that monitoring itself consumes. This raises a unique challenge: jointly optimising an NFV monitoring system while minimising its monitoring load at runtime. The objective is to enhance the gain in real-time monitoring metrics at minimum monitoring cost. In this context, we propose a novel NFV monitoring solution, namely iMon (Monitoring by inferring), that jointly optimises the monitoring process and reduces resource consumption. We formalise the monitoring process as a multitarget regression problem and propose three regression models. These models are implemented with a deep neural network, and an experimental platform is built to demonstrate their feasibility and effectiveness. Finally, experiments also show that monitoring resource requirements are reduced, and the monitoring load is just 0.6% of that of the monitoring tool cAdvisor on our dataset.
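As a rough illustration of the multitarget regression formulation, the sketch below trains one network to predict several monitoring metrics from a few cheap features; the feature and target shapes are illustrative assumptions, not iMon's actual models.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical sketch: infer several runtime monitoring metrics at once from
# cheaply collected features (multitarget regression with one neural network).

rng = np.random.default_rng(0)
X = rng.random((200, 4))                                   # cheap features (toy)
Y = X @ rng.random((4, 3)) + 0.01 * rng.random((200, 3))   # 3 metrics to infer

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, Y)                       # learn the feature -> metrics mapping
print(model.predict(X[:2]))           # inferred metrics instead of measured ones
```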
In this paper, we propose a selective membership inference attack method that determines whether data corresponding to a specific class were used as training data for a machine learning model. With the proposed method, membership or non-membership can be inferred by generating a decision model from the predictions of the inference models and training it on the confidence values for the data corresponding to the selected class. We used MNIST as the experimental dataset and TensorFlow as the machine learning library. Experimental results show that the proposed method achieves a 92.4% success rate with 5 inference models for data corresponding to a specific class.
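The decision-model step can be pictured as follows: confidence values produced by several inference models serve as the features of a binary member/non-member classifier. The sketch below is a minimal illustration; the shapes, synthetic confidences, and the choice of logistic regression are assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: train a decision model on per-model confidence values
# for samples of the selected class (1 = training member, 0 = non-member).

rng = np.random.default_rng(0)
n_models = 5
member_conf = rng.uniform(0.7, 1.0, (100, n_models))     # members: high confidence
nonmember_conf = rng.uniform(0.2, 0.8, (100, n_models))  # non-members: lower

X = np.vstack([member_conf, nonmember_conf])
y = np.array([1] * 100 + [0] * 100)

decision_model = LogisticRegression().fit(X, y)
print(decision_model.predict(X[:3]))   # inferred membership for query samples
```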
This paper proposes a low-complexity variational Bayesian inference (VBI)-based method for massive multiple-input multiple-output (MIMO) downlink channel estimation. The temporal correlation at the mobile user side is jointly exploited to enhance the channel estimation performance. The key to the success of the proposed method is the column-independent factorization imposed in the VBI framework. Since we separate the Bayesian inference for each column vector of the signal of interest, the computational complexity of the proposed method is significantly reduced. Moreover, the temporal correlation is automatically decoupled, which facilitates deriving the update rule for the temporal correlation itself. Simulation results illustrate the substantial performance improvement achieved by the proposed method.
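The column-independent factorization can be stated compactly. Using illustrative notation (not necessarily the paper's), with the signal of interest X = [x_1, ..., x_T] collecting one column per time slot:

```latex
q(\mathbf{X}) \;=\; \prod_{t=1}^{T} q(\mathbf{x}_t),
\qquad \mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_T]
```

Each q(x_t) is then updated independently, so the inference never forms a joint covariance across columns, which is where the complexity reduction comes from.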
We consider a reliable decentralized supervisory control problem for discrete event systems in the inference-based framework. This problem requires us to synthesize local supervisors such that the controlled system achieves the specification and is nonblocking, even if local control decisions of some local supervisors are not available for making the global control decision. In the case of single-level inference, we introduce a notion of reliable 1-inference-observability and show that reliable 1-inference-observability together with controllability and Lm(G)-closedness is a necessary and sufficient condition for the existence of a solution to the reliable decentralized supervisory control problem.
Young H. OH Yunho JIN Tae Jun HAM Jae W. LEE
Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and is required to satisfy two, often conflicting, optimization goals: maximizing system throughput and satisfying the quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.
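To convey the layer-wise scheduling idea, here is a minimal greedy sketch: a layer of the throughput-oriented model is admitted only if the latency-critical model can still finish before its deadline. The durations, two-model setup, and admission rule are illustrative assumptions, not Layerweaver+'s actual policy.

```python
# Hypothetical sketch of layer-wise weaving of two DNNs on one accelerator.

def weave(latency_layers, batch_layers, deadline):
    """Greedily interleave: run a throughput-model layer only if the
    latency-critical model can still meet its deadline afterwards."""
    schedule, t, li, bi = [], 0.0, 0, 0
    while li < len(latency_layers):
        remaining = sum(latency_layers[li:])          # work left for the QoS model
        if bi < len(batch_layers) and t + batch_layers[bi] + remaining <= deadline:
            t += batch_layers[bi]; schedule.append(("batch", bi)); bi += 1
        else:
            t += latency_layers[li]; schedule.append(("latency", li)); li += 1
    schedule += [("batch", i) for i in range(bi, len(batch_layers))]
    return schedule

# QoS model: 3 layers, deadline 6.0; the batch model fills the idle time.
print(weave([1.0, 2.0, 1.0], [0.5, 0.5, 3.0], deadline=6.0))
```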
Ryosuke KURAMOCHI Hiroki NAKAHARA
Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency CNN inference architecture for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs have several convolution layers that have no direct dependencies between them, our architecture can process them efficiently using a pipeline method. At each layer, the calculation results of multiple layers must be used as the input. We use an FPGA with HBM2 to enable parallel access to the input data via multiple HBM2 channels. We schedule the order of execution of the layers to improve the pipeline efficiency, build a conflict graph from the scheduling results, and then allocate the calculation results of each layer to the HBM2 channels by coloring the graph. Because the pipeline execution needs to be properly controlled, we developed an automatic generation tool for the hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA and investigated the trade-off between latency and recognition accuracy for the ImageNet classification task by comparing the inference performance for different input image sizes. Compared with a conventional accelerator for ResNet-50, our accelerator reduces the latency by a factor of 2.21. We also obtained 12.6 and 4.93 times better efficiency than a CPU and a GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
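The channel-allocation step is a classic graph-coloring application: colors correspond to HBM2 channels, and layers whose outputs must be read in parallel are adjacent in the conflict graph. The sketch below uses a generic greedy coloring on a toy graph; the graph and the highest-degree-first ordering are illustrative assumptions, not the paper's exact scheduler output.

```python
# Hypothetical sketch: assign layer output buffers to HBM2 channels by
# greedily coloring a conflict graph (adjacent layers get distinct channels).

conflicts = {                       # toy conflict graph (adjacency sets)
    "conv1": {"conv2", "conv3"},
    "conv2": {"conv1", "conv3"},
    "conv3": {"conv1", "conv2", "conv4"},
    "conv4": {"conv3"},
}

channel = {}
for layer in sorted(conflicts, key=lambda v: -len(conflicts[v])):  # high degree first
    used = {channel[n] for n in conflicts[layer] if n in channel}
    channel[layer] = next(c for c in range(len(conflicts)) if c not in used)

print(channel)   # {'conv3': 0, 'conv1': 1, 'conv2': 2, 'conv4': 1}
```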
The test of homogeneity for normal mixtures has been used in various fields, but its theoretical understanding is limited because the parameter set for the null hypothesis corresponds to singular points in the parameter space. In this paper, we shed light on this issue from a new perspective, variational Bayes, and offer a theory for testing homogeneity based on it. In conventional theory, the stochastic behavior of the variational free energy, which is necessary for constructing a hypothesis test, has remained unknown. We clarify it for the first time and construct a new test based on it. Numerical experiments show the validity of our results.
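For reference, the variational free energy in question is the standard quantity from variational Bayes (the notation here is generic, not necessarily the paper's): for data X^n, parameter w, and variational posterior q(w),

```latex
\bar{F} \;=\; \mathbb{E}_{q(w)}\!\left[\log \frac{q(w)}{p(X^n, w)}\right]
\;=\; -\log p(X^n) \;+\; \mathrm{KL}\!\left(q(w) \,\|\, p(w \mid X^n)\right)
```

Characterizing the stochastic behavior of this quantity under the null hypothesis is what allows a test statistic and a rejection threshold to be constructed.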
Asuka MAKI Daisuke MIYASHITA Shinichi SASAKI Kengo NAKATA Fumihiko TACHIBANA Tomoya SUZUKI Jun DEGUCHI Ryuichi FUJIMOTO
Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
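The quantization scheme can be sketched in a few lines: each output filter gets its own bit width, and the savings scale with the average bits per weight. The uniform symmetric quantizer and the per-filter bit allocation below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

# Hypothetical sketch: filter-by-filter quantization with variable bit widths.

def quantize_filter(w, bits):
    """Uniform symmetric quantization of one filter to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

weights = np.random.randn(4, 3, 3, 3)   # (filters, in_ch, kH, kW), toy tensor
bits_per_filter = [2, 4, 4, 8]          # assumed per-filter allocation

quantized = np.stack([quantize_filter(w, b)
                      for w, b in zip(weights, bits_per_filter)])
print("average weight precision:",
      sum(bits_per_filter) / len(bits_per_filter), "bits")
```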
Tian XIE Hongchang CHEN Tuosiyu MING Jianpeng ZHANG Chao GAO Shaomei LI Yuehang DING
In partial label data, the ground-truth label of a training example is concealed in a set of candidate labels associated with the instance. As the ground-truth label is inaccessible, it is difficult to train the classifier directly from the label information. Consequently, manifold structure information is adopted, under the assumption that neighboring/similar instances in the feature space have similar labels in the label space. However, real-world data may not fully satisfy this assumption. In this paper, a partial label metric learning method based on the likelihood-ratio test is proposed to make partial label data satisfy the manifold assumption. Moreover, the proposed method needs no objective function and treats data pairs asymmetrically. Experimental results on several real-world PLL datasets indicate that the proposed method outperforms existing partial label metric learning methods in terms of classification accuracy and disambiguation accuracy while requiring less time.
This letter presents a novel technique for fast inference of binarized convolutional neural networks (BCNNs). The proposed technique modifies the structure of the constituent blocks of the BCNN model so that the input elements of the max-pooling operation are binary. In this structure, if any of the input elements is +1, the result of the pooling can be produced immediately; the proposed technique eliminates the computations involved in obtaining the remaining input elements, thereby reducing the inference time effectively. The proposed technique reduces the inference time by up to 34.11% while maintaining the classification accuracy.
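The early-exit behavior is easy to demonstrate: with +1/-1 inputs, the max is known as soon as a +1 appears. In the sketch below a lazy generator stands in for the expensive computation of each pooling input (an illustrative simplification of the hardware-level skipping the letter describes).

```python
# Hypothetical sketch: binary max pooling that stops at the first +1 and
# never computes the remaining window elements.

def binary_maxpool(window_values):
    for v in window_values:     # elements are produced lazily, one at a time
        if v == +1:
            return +1           # result decided; remaining elements skipped
    return -1

def expensive_element(i):
    print(f"computing element {i}")   # stands in for a binarized conv output
    return +1 if i == 1 else -1

print(binary_maxpool(expensive_element(i) for i in range(4)))
# prints "computing element 0" and "computing element 1" only, then 1
```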
Deep Graphical Model (DGM) based on Generative Adversarial Nets (GANs) has shown promise in image generation and latent variable inference. One of the typical models is the Iterative Adversarial Inference model (GibbsNet), which learns the joint distribution between the data and its latent variable. We present RGNet (Re-inference GibbsNet) which introduces a re-inference chain in GibbsNet to improve the quality of generated samples and inferred latent variables. RGNet consists of the generative, inference, and discriminative networks. An adversarial game is cast between the generative and inference networks and the discriminative network. The discriminative network is trained to distinguish between (i) the joint inference-latent/data-space pairs and re-inference-latent/data-space pairs and (ii) the joint sampled-latent/generated-data-space pairs. We show empirically that RGNet surpasses GibbsNet in the quality of inferred latent variables and achieves comparable performance on image generation and inpainting tasks.
Dongping YU Yan GUO Ning LI Qiao SU
As an emerging and promising technique, device-free localization (DFL) has drawn considerable attention in recent years. By exploiting the inherent spatial sparsity of target localization, compressive sensing (CS) theory has been applied in DFL to reduce the number of measurements. In practical scenarios, prior knowledge about target locations is usually available, which can be obtained by coarse localization or tracking techniques. Among existing CS-based DFL approaches, however, few works consider the utilization of prior knowledge. To make use of prior knowledge that may be partial or erroneous, this paper proposes a novel faulty prior knowledge aided multi-target device-free localization (FPK-DFL) method. It first incorporates the faulty prior knowledge into a three-layer hierarchical prior model. Then, it estimates the location vector and learns the model parameters under a variational Bayesian inference (VBI) framework. Simulation results show that the proposed method can improve the localization accuracy by taking advantage of the faulty prior knowledge.
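The hierarchical structure can be illustrated with a generic sparse Bayesian learning prior; this is a common three-layer template given only as an illustration, and the paper's actual model encoding faulty prior knowledge may differ:

```latex
p(\mathbf{x} \mid \boldsymbol{\alpha}) = \prod_i \mathcal{N}(x_i;\, 0,\, \alpha_i^{-1}),
\qquad
p(\boldsymbol{\alpha}) = \prod_i \mathrm{Gamma}(\alpha_i;\, a_i,\, b_i)
```

In such templates, prior knowledge about likely target locations can be encoded by biasing the hyperparameters (a_i, b_i) of the corresponding grid cells, and the VBI updates can then downweight hyperparameters that conflict with the measurements.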
Xiaojuan ZHU Yang LU Jie ZHANG Zhen WEI
Topological inference is the foundation of network performance analysis and optimization. Because obtaining prior topology information of wireless sensor networks is difficult, we propose routing topology inference (RTI), which reconstructs the routing topology from source nodes to the sink based on marking packets and probing locally. RTI is not limited to any specific routing protocol and can adapt to dynamic and lossy networks. We select topological distance and reconstruction time to evaluate the correctness and effectiveness of RTI and then compare it with PathZip and iPath. Simulation results indicate that RTI maintains adequate reconstruction performance in dynamic and lossy environments and provides a global routing topology view for wireless sensor networks at a lower reconstruction cost.
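The marking-based reconstruction can be pictured with a toy example: if the sink can recover the node sequence carried by each packet, the union of the hop-by-hop links yields the routing topology. The explicit per-hop ID list below is an illustrative simplification; real marking schemes encode far less per packet.

```python
# Hypothetical sketch: rebuild routing-tree edges from paths recorded in
# marked packets arriving at the sink.

packets = [
    ["n7", "n3", "n1", "sink"],   # forwarding path carried by one packet
    ["n8", "n3", "n1", "sink"],
    ["n9", "n1", "sink"],
]

edges = set()
for path in packets:
    edges.update(zip(path, path[1:]))   # parent links along each path
print(sorted(edges))                    # reconstructed routing topology
```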
Hongbin LIN Zheng WU Dong LEI Wei WANG Xiuping PENG
This letter presents a novel tensor voting mechanism, analytic tensor voting (ATV), to overcome the difficulties of the original tensor voting (OTV), especially its inefficiency. One of its main advantages is its explicit voting formulations, which benefit both the completeness of tensor voting theory and computational efficiency. Firstly, a new decaying function was designed following the basic spirit of the decaying function in OTV. Secondly, analytic stick tensor voting (ASTV) was formulated using the new decaying function. Thirdly, analytic plate and ball tensor voting (APTV, ABTV) were formulated through controllable stick tensor construction and tensorial integration. As a result, each tensor vote can be computed with a few non-iterative matrix operations, remarkably improving the efficiency of tensor voting. Experimental results validate the effectiveness of the proposed method.
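To make the notion of an explicit, non-iterative vote concrete, the sketch below computes a closed-form stick vote with a Gaussian distance decay. It follows the general shape of closed-form tensor voting; the ATV formulation itself is not reproduced here, and the decay constant and reflection rule are illustrative.

```python
import numpy as np

# Hypothetical sketch: a closed-form stick tensor vote (no iteration).

def stick_vote(voter_normal, receiver_offset, sigma=5.0):
    """Second-order tensor cast by a stick voter at a receiver position."""
    d = np.linalg.norm(receiver_offset)
    if d == 0:
        n = voter_normal
    else:
        v = receiver_offset / d
        n = voter_normal - 2 * np.dot(voter_normal, v) * v  # reflected normal
    strength = np.exp(-(d ** 2) / sigma ** 2)   # Gaussian distance decay
    return strength * np.outer(n, n)            # rank-1 (stick) tensor

print(stick_vote(np.array([0.0, 1.0]), np.array([3.0, 1.0])))
```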
Takayoshi SHOUDAI Yuta YOSHIMURA Yusuke SUZUKI Tomoyuki UCHIDA Tetsuhiro MIYAHARA
A cograph (complement reducible graph) is a graph which can be generated by disjoint union and complement operations on graphs, starting with a single vertex graph. Cographs arise in many areas of computer science and are studied extensively. With the goal of developing an effective data mining method for graph structured data, in this paper we introduce a graph pattern expression, called a cograph pattern, which is a special type of cograph having structured variables. Firstly, we show that the problem of deciding whether or not a given cograph pattern g matches a given cograph G is NP-complete. Given this result, we consider the polynomial-time learnability of cograph pattern languages defined by cograph patterns whose variables are labeled with mutually distinct labels, called linear cograph patterns. Secondly, we present a polynomial-time matching algorithm for linear cograph patterns. Next, we give a polynomial-time algorithm for obtaining a minimally generalized linear cograph pattern that explains given positive data. Finally, we show that the class of linear cograph pattern languages is polynomial-time inductively inferable from positive data.
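The two generating operations are simple to write down, which the sketch below does for intuition; the (vertex set, edge set) representation is an illustrative choice. Cographs built this way are exactly the P4-free graphs.

```python
# Hypothetical sketch: build cographs from single vertices via disjoint
# union and complement.

def single(v):
    return ({v}, set())

def disjoint_union(g, h):
    return (g[0] | h[0], g[1] | h[1])

def complement(g):
    vs, es = g
    all_pairs = {frozenset({u, w}) for u in vs for w in vs if u != w}
    return (vs, all_pairs - es)

# complement(union(a, b)) joins two isolated vertices with an edge.
g = complement(disjoint_union(single(1), single(2)))
print(g)   # ({1, 2}, {frozenset({1, 2})})
```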
Zhenghang CUI Issei SATO Masashi SUGIYAMA
With the emergence and rapid development of social networks, huge numbers of short texts are accumulated and need to be processed. Inferring the latent topics of collected short texts is an essential task for understanding their hidden structure and predicting new content. The biterm topic model (BTM) was recently proposed for short texts to overcome the sparseness of document-level word co-occurrences by directly modeling the generation process of word pairs. Stochastic inference algorithms based on collapsed Gibbs sampling (CGS) and collapsed variational inference have been proposed for BTM. However, they either incur high computational complexity or rely on very crude estimation that does not preserve sufficient statistics. In this work, we develop a stochastic divergence minimization (SDM) inference algorithm for BTM to achieve better predictive likelihood in a scalable way. Experiments show that SDM-BTM trained on 30% of the data outperforms the best existing algorithm trained on the full data.
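For intuition on what BTM models, the biterm extraction step looks like the sketch below: every unordered word pair co-occurring in a short text is one biterm, so co-occurrence statistics are pooled over the whole corpus rather than per document. The toy corpus is illustrative.

```python
from itertools import combinations

# Hypothetical sketch: extract biterms (unordered co-occurring word pairs)
# from a corpus of short texts.

docs = ["apple banana cherry", "banana cherry"]
biterms = [frozenset(pair)
           for doc in docs
           for pair in combinations(doc.split(), 2)]
print(biterms)
# four biterms in total; {'banana', 'cherry'} occurs twice across the corpus
```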