A binarized neural network (BNN) inference accelerator is designed in which weights are stored in loadless four-transistor static random access memory (4T SRAM) cells. A time-multiplexed exclusive NOR (XNOR) multiplier with switched capacitors is proposed that prevents the data stored in the loadless 4T SRAM cells from being destroyed during operation. An accumulator with a current-sensing scheme is also proposed to make the multiply-accumulate (MAC) operation completely linear and free of read disturb. The BNN inference accelerator is applied to MNIST recognition and achieves an accuracy of 96.2% on 500 samples; HSPICE simulation in 32 nm technology confirms a throughput of 15.50 TOPS, an energy efficiency of 72.17 TOPS/W, and an area efficiency of 50.13 TOPS/mm². Compared with conventional SRAM cell based BNN inference accelerators scaled to 32 nm technology, the synapse cell size is reduced to less than 16% (0.235 μm²), and the cell efficiency (the ratio of the synapse array area to the area of the synapse array plus peripheral circuits) is 73.27%, which is comparable to that of state-of-the-art SRAM cell based BNN accelerators.
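In a BNN, weights and activations take the values +1 and -1, so each multiplication reduces to an XNOR of their bit encodings and the accumulation reduces to counting agreeing bits. The Python sketch below illustrates only this arithmetic, assuming the common encoding +1 → 1 and -1 → 0. It does not model the switched-capacitor multiplier or the current-sensing accumulator, and the function names are hypothetical.

```python
def pack_bits(values):
    """Encode a list of +1/-1 values as an integer bit mask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors from their packed bit masks.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement -1, so the sum is matches - (n - matches).
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, 1]
n = len(activations)
mac = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), n)
reference = sum(a * w for a, w in zip(activations, weights))
print(mac, reference)  # prints: 0 0
```

Because the result depends only on the number of agreeing bit positions, a linear accumulator of the XNOR outputs is sufficient to recover the exact dot product.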
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG
In this paper, the three-dimensional (3D) point cloud object detection model Voxel-RCNN (from “Towards High Performance Voxel-based 3D Object Detection”) is used as the baseline network. To address problems in current mainstream 3D point cloud voxelization methods, such as the limited feature extraction ability of the backbone and the lack of feature expression ability under the bird’s-eye view (BEV), a high-performance voxel-based 3D object detection network (Reinforced Voxel-RCNN) is proposed. First, a 3D feature extraction module that integrates an inverted residual convolutional network with weight normalization is designed for the 3D backbone. This module retains more point cloud feature information, strengthens information interaction between convolutional layers, and improves the feature extraction ability of the backbone network. Second, a spatial feature-semantic fusion module based on spatial and channel attention is proposed from the BEV perspective. Combining channel features with semantic features further improves the network’s ability to express point cloud features. In experiments on the public KITTI dataset, the proposed method outperforms many voxel-based methods. Compared with the baseline network, both the 3D average accuracy and the BEV average accuracy improve on all three categories: Car, Cyclist, and Pedestrian. For 3D average accuracy, the improvements are 0.23% for Car, 0.78% for Cyclist, and 2.08% for Pedestrian; for BEV average accuracy, they are 0.32% for Car, 0.99% for Cyclist, and 2.38% for Pedestrian. These results demonstrate that the proposed improvements effectively increase the detection accuracy for the target categories.
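The abstract does not describe the internals of the spatial feature-semantic fusion module. As a rough illustration only, the NumPy sketch below applies channel attention followed by spatial attention to a BEV feature map, in the style of the widely used CBAM design: a shared MLP over average- and max-pooled channel descriptors, then a convolution over channel-pooled maps. The weights, shapes, and function names are hypothetical, and this is not the Reinforced Voxel-RCNN module itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) map with a shared two-layer MLP.

    w1: (C_r, C) and w2: (C, C_r) are hypothetical learned weights.
    """
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    gate = sigmoid(mlp(avg) + mlp(mx))  # one weight in (0, 1) per channel
    return feat * gate[:, None, None]


def spatial_attention(feat, kernel):
    """Reweight BEV locations using channel-pooled maps and a (2, k, k) kernel (k odd)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = feat.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def fuse_bev_features(feat, w1, w2, kernel):
    """Channel attention followed by spatial attention on a BEV feature map."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


rng = np.random.default_rng(0)
bev = rng.standard_normal((8, 16, 16))  # hypothetical 8-channel 16x16 BEV map
out = fuse_bev_features(bev, rng.standard_normal((2, 8)),
                        rng.standard_normal((8, 2)), rng.standard_normal((2, 3, 3)))
print(out.shape)
```

Both gates lie in (0, 1), so the module only rescales existing features; it emphasizes informative channels and BEV locations rather than creating new ones.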
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG
Image denoising is an indispensable step in many high-level tasks in image processing and computer vision. However, traditional low-rank minimization-based methods suffer from bias because only the noisy observation is used to estimate the underlying clean matrix. To overcome this issue, a new low-rank minimization-based method, called nuclear norm minus Frobenius norm rank residual minimization (NFRRM), is proposed for image denoising. The proposed method transforms the ill-posed image denoising problem into rank residual minimization problems by exploiting the nonlocal self-similarity prior. The proposed NFRRM model can accurately estimate the underlying clean matrix by treating each rank residual component flexibly. More importantly, the global optimum of the proposed NFRRM model can be obtained in closed form. Extensive experiments demonstrate that the proposed NFRRM method outperforms many state-of-the-art image denoising methods.
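The closed-form solvability claimed above can be illustrated with a simpler relative: the proximal operator of λ(‖X‖_* - α‖X‖_F) applied to a noisy matrix Y. Since the Frobenius norm is the ℓ2 norm of the singular values, the problem reduces to an ℓ1 - αℓ2 proximal step on the singular values, which has a closed form. The sketch below assumes that form with 0 ≤ α ≤ 1 and applies it directly to Y; it does not model the rank residual or the nonlocal patch grouping used by NFRRM, and the function names and test matrix are hypothetical.

```python
import numpy as np


def prox_l1_minus_l2(y, lam, alpha):
    """argmin_x lam*(||x||_1 - alpha*||x||_2) + 0.5*||x - y||_2^2, for 0 <= alpha <= 1."""
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)  # stays zero when max|y_i| <= (1 - alpha) * lam
    ymax = np.max(np.abs(y))
    if ymax > lam:
        # soft-threshold, then rescale: stationarity gives ||x|| = ||z|| + alpha*lam
        z = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
        nz = np.linalg.norm(z)
        x = z * (nz + alpha * lam) / nz
    elif ymax > (1.0 - alpha) * lam:
        # only the largest-magnitude entry survives
        i = int(np.argmax(np.abs(y)))
        x[i] = np.sign(y[i]) * (ymax - (1.0 - alpha) * lam)
    return x


def nuclear_minus_frobenius_prox(Y, lam, alpha):
    """Apply the l1 - alpha*l2 proximal step to the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * prox_l1_minus_l2(s, lam, alpha)) @ Vt


Y = np.diag([3.0, 1.0, 0.1])  # hypothetical matrix with a weak third component
X = nuclear_minus_frobenius_prox(Y, lam=0.5, alpha=0.5)
print(np.linalg.svd(X, compute_uv=False), np.linalg.matrix_rank(X))
```

Compared with plain singular value soft-thresholding, the rescaling step shrinks the surviving (large) singular values less, which is one way a nuclear-minus-Frobenius penalty reduces the estimation bias mentioned above.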
Because abnormal sounds, called adventitious sounds, can be present in the lung sounds of patients with pulmonary disease, the objective of this study was to automatically detect abnormal sounds in auscultatory sounds. To this end, we modeled the acoustic features of the normal lung sounds of healthy people and the abnormal lung sounds of patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because abnormal sounds that occur intermittently and repeatedly were represented with a limited number of states, these GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and frame interval were long, the GMM-HMMs could not express the acoustic features of short segments, such as heart sounds. Consequently, the classification rate for normal and abnormal respiration was low (86.60%). In this study, we propose constructing ergodic GMM-HMMs with a repetitive structure for intermittent sounds. Furthermore, we investigated a suitable frame length and frame interval for analyzing acoustic features. Using the ergodic GMM-HMMs, which can express in detail the acoustic features of abnormal sounds and heart sounds that occur repeatedly, the classification rate increased to 89.34%. These results demonstrate the effectiveness of the proposed method.
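The difference between the two topologies lies in the transition matrix: a left-to-right HMM can never return to an earlier state, whereas an ergodic HMM can revisit any state, which suits sounds that recur intermittently. The NumPy sketch below scores a toy alternating sequence with the forward algorithm under both topologies. It assumes a single Gaussian per state instead of a GMM, a one-dimensional feature, and hypothetical transition probabilities; it is not the authors' trained model.

```python
import numpy as np


def left_to_right(n_states, p_stay=0.6):
    """Left-to-right topology: a state can only stay or move to the next one."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i], a[i, i + 1] = p_stay, 1.0 - p_stay
    a[-1, -1] = 1.0
    return a


def ergodic(n_states, p_stay=0.6):
    """Ergodic topology: every state can be reached from every other state."""
    a = np.full((n_states, n_states), (1.0 - p_stay) / (n_states - 1))
    np.fill_diagonal(a, p_stay)
    return a


def gaussian_loglik(x, means, var):
    """(T, N) matrix of log N(x_t | mean_j, var); one Gaussian per state for brevity."""
    return -0.5 * (np.log(2 * np.pi * var) + (x[:, None] - means[None, :]) ** 2 / var)


def forward_loglik(log_b, trans, init):
    """Log-likelihood of an observation sequence via the forward algorithm."""
    with np.errstate(divide="ignore"):  # log(0) = -inf for forbidden transitions
        log_a, log_pi = np.log(trans), np.log(init)
    alpha = log_pi + log_b[0]
    for t in range(1, len(log_b)):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)


# Hypothetical 1-D feature: a sound that alternates between two acoustic states
x = np.array([0.0] * 5 + [5.0] * 5 + [0.0] * 5 + [5.0] * 5)
log_b = gaussian_loglik(x, means=np.array([0.0, 5.0]), var=1.0)
init = np.array([1.0, 0.0])
score_ergodic = forward_loglik(log_b, ergodic(2), init)
score_ltr = forward_loglik(log_b, left_to_right(2), init)
print(score_ergodic > score_ltr)
```

Under the left-to-right model, the second low-valued block must be emitted by the high-valued state, which costs far more likelihood than the ergodic model's extra transitions.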
Gang LIU Xin CHEN Zhixiang GAO
Photo animation transforms photos of real-world scenes into anime-style images, which is a challenging task in AI-generated content (AIGC). Although previous methods have achieved promising results, they often introduce noticeable artifacts or distortions. In this paper, we propose a novel double-tail generative adversarial network (DTGAN) for fast photo animation. DTGAN is the third version of the AnimeGAN series and is therefore also called AnimeGANv3. The generator of DTGAN has two output tails: a support tail that outputs coarse-grained anime-style images and a main tail that refines them. In DTGAN, we propose a novel learnable normalization technique, termed linearly adaptive denormalization (LADE), to prevent artifacts in the generated images. To improve the visual quality of the generated anime-style images, two novel loss functions suitable for photo animation are proposed: 1) the region smoothing loss function, which weakens the texture details of the generated images to achieve anime effects with abstract details; and 2) the fine-grained revision loss function, which eliminates artifacts and noise in the generated anime-style images while preserving clear edges. Furthermore, the generator of DTGAN is lightweight, with only 1.02 million parameters in the inference phase. The proposed DTGAN can be easily trained end-to-end with unpaired training data. Extensive qualitative and quantitative experiments demonstrate that our method produces high-quality anime-style images from real-world photos and outperforms state-of-the-art models.
In this letter, we propose a feature-based knowledge distillation scheme that transfers knowledge between intermediate blocks of a teacher and a student with flow-based architectures, specifically normalizing flows in our implementation. In addition to the knowledge transfer scheme, we examine how the configuration of the distillation positions affects knowledge transfer performance. To evaluate the proposed ideas, we choose two baseline models for knowledge distillation, both based on normalizing flows but from different domains: CS-Flow for anomaly detection and SRFlow-DA for super-resolution. Performance comparisons with the baseline models on popular benchmark datasets show promising results along with improved inference speed. The comparisons include an analysis of performance under various configurations of the distillation positions in the proposed scheme.
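The abstract does not state the distillation loss. A minimal sketch, assuming a plain mean-squared error between paired intermediate outputs, is shown below; the list of (teacher block, student block) pairs makes the "distillation position" configuration explicit. Block counts, shapes, function names, and the two configurations are hypothetical.

```python
import numpy as np


def feature_distillation_loss(teacher_feats, student_feats, positions):
    """Mean-squared error between paired intermediate features, summed over pairs.

    positions: list of (teacher_block, student_block) index pairs; choosing
    these pairs is the distillation-position configuration.
    """
    loss = 0.0
    for t_idx, s_idx in positions:
        t, s = teacher_feats[t_idx], student_feats[s_idx]
        if t.shape != s.shape:
            raise ValueError(f"shape mismatch at {(t_idx, s_idx)}: {t.shape} vs {s.shape}")
        loss += float(np.mean((t - s) ** 2))
    return loss


rng = np.random.default_rng(0)
x_shape = (4, 3, 8, 8)  # hypothetical batch of 4 inputs, 3x8x8 each
teacher = [rng.standard_normal(x_shape) for _ in range(8)]  # 8 flow blocks
student = [rng.standard_normal(x_shape) for _ in range(4)]  # 4 flow blocks

every_other = [(1, 0), (3, 1), (5, 2), (7, 3)]  # hypothetical configuration A
last_only = [(7, 3)]                             # hypothetical configuration B
print(feature_distillation_loss(teacher, student, every_other),
      feature_distillation_loss(teacher, student, last_only))
```

In a flow whose blocks preserve the input dimensionality, teacher and student features at any block have the same shape, so they can be compared directly, without the projection layers that CNN-based feature distillation often needs.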
Tian FANG Feng LIU Conggai LI Fangjiong CHEN Yanli XU
Underwater acoustic (UWA) channels are usually sparse, and this sparsity can be exploited in adaptive equalization to improve system performance. For shallow UWA channels, an adaptive equalization framework based on the proportional minimum symbol error rate (PMSER) criterion requires a choice of sparsity measure. Because the L0 norm promotes sparsity more strongly than the L1 norm, we choose it to achieve better convergence. However, because the L0 norm leads to NP-hard problems, it is difficult to find an efficient solution. To solve this problem, we approximate the L0 norm with a Gaussian function. Simulation results show that the proposed scheme outperforms its L1-based counterpart.
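A common Gaussian-based approximation of the L0 norm replaces each indicator 1[w_i ≠ 0] with 1 - exp(-w_i²/(2σ²)), which tends to the exact count of nonzero taps as σ → 0 while remaining differentiable. The sketch below assumes this form; the abstract does not give the exact function or how its gradient enters the PMSER update, and the tap values and function names are hypothetical.

```python
import numpy as np


def gaussian_l0(w, sigma):
    """Smooth surrogate for ||w||_0: sum_i (1 - exp(-w_i^2 / (2 sigma^2)))."""
    w = np.asarray(w, dtype=float)
    return float(np.sum(1.0 - np.exp(-w ** 2 / (2.0 * sigma ** 2))))


def gaussian_l0_grad(w, sigma):
    """Gradient of gaussian_l0, usable as a zero-attracting term in a tap update."""
    w = np.asarray(w, dtype=float)
    return (w / sigma ** 2) * np.exp(-w ** 2 / (2.0 * sigma ** 2))


taps = np.array([0.0, 0.5, -1.0, 0.0, 2.0])  # hypothetical equalizer taps
for sigma in (1.0, 0.1, 0.01):
    print(sigma, gaussian_l0(taps, sigma))  # approaches the 3 nonzero taps
print("L1 norm:", np.abs(taps).sum())
```

In a sparse adaptive equalizer, a small multiple of `gaussian_l0_grad` would typically be subtracted from the taps at each iteration as a zero attractor, pulling near-zero taps toward zero while leaving large taps almost untouched. Unlike the L1 norm (3.5 for these taps), the surrogate counts taps rather than summing their magnitudes.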
Lead bromide-based perovskite organic-inorganic quantum-well films incorporating polycyclic aromatic chromophores into the organic layer (in other words, hybrid quantum wells combining a lead bromide semiconductor with organic semiconductors) were prepared by spin-coating from DMF solutions containing PbBr₂ and alkyl ammonium bromides linked to the polycyclic aromatics pyrene, phenanthrene, and anthracene. When pyrene-linked methyl ammonium bromide, whose molecular cross-section relative to the inorganic semiconductor plane is relatively small, was employed, a lead bromide-based perovskite structure was successfully formed in the spin-coated films. When phenanthrene-linked and anthracene-linked ammonium bromides, whose chromophores have large molecular cross-sections, were employed, lead bromide-based perovskite structures were not formed. However, introducing longer alkyl chains into the aromatic-linked ammonium bromides made it possible to form the perovskite structure.
This letter theoretically analyzes and minimizes the L2-sensitivity of all-pass fractional delay digital filters realized with the normalized lattice structure. The L2-sensitivity is a well-known evaluation function for measuring the performance degradation caused by quantizing filter coefficients to a finite number of bits. This letter deals with two cases: the L2-sensitivity minimization problem with a scaling constraint and the one without. It is proved analytically that, in both cases, any all-pass fractional delay digital filter with the normalized lattice structure is an optimal structure that minimizes the L2-sensitivity.
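In the normalized lattice structure, each stage applies a plane rotation with coefficients k_m and sqrt(1 - k_m²), so the structure is lossless and therefore all-pass. The sketch below filters a signal through such a lattice and checks that its impulse response has unit magnitude at every frequency. The reflection coefficients are hypothetical and are not derived from a fractional-delay design, the sign convention is one common choice, and the L2-sensitivity itself is not computed.

```python
import numpy as np


def normalized_lattice_allpass(x, k):
    """Filter x through a normalized lattice all-pass with reflection coefficients k.

    Stage m maps (f_m, g_{m-1}(n-1)) to (f_{m-1}, g_m) with the rotation
    [[c, -k], [k, c]], c = sqrt(1 - k^2); the output is g_N and g_0 = f_0.
    """
    k = np.asarray(k, dtype=float)
    n_stages = len(k)
    c = np.sqrt(1.0 - k ** 2)
    s = np.zeros(n_stages)  # s[m] holds g_m(n-1) for m = 0..N-1
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        f = xn
        g_new = np.zeros(n_stages)
        for m in range(n_stages, 0, -1):  # stages N down to 1
            f, g = c[m - 1] * f - k[m - 1] * s[m - 1], k[m - 1] * f + c[m - 1] * s[m - 1]
            if m == n_stages:
                y[n] = g
            else:
                g_new[m] = g
        g_new[0] = f
        s = g_new
    return y


impulse = np.r_[1.0, np.zeros(2047)]
h = normalized_lattice_allpass(impulse, [0.5, -0.3])  # hypothetical coefficients
print(np.sum(h ** 2), np.abs(np.fft.fft(h)).min(), np.abs(np.fft.fft(h)).max())
```

For a single stage, this realization has the transfer function (k + z^-1)/(1 + k z^-1), so its impulse response starts k, 1 - k², -k(1 - k²), and so on.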
In many cases, abnormal sounds, called adventitious sounds, are included in the lung sounds of subjects with pulmonary diseases. Thus, a method to automatically detect abnormal sounds during auscultation was proposed. The acoustic features of normal lung sounds of control subjects and abnormal lung sounds of patients are expressed using hidden Markov models (HMMs) to distinguish between normal and abnormal lung sounds. Furthermore, abnormal sounds were detected in noisy conditions that include heart sounds by using a heart-sound model. However, the F1-score for detecting abnormal respiration was low (0.8493). Moreover, segments of respiratory, heart, and adventitious sounds differ in duration and acoustic properties. In our previous method, appropriate HMMs were constructed for the heart and adventitious sound segments. However, although the properties of the different types of adventitious sounds vary, an appropriate topology for each type was not considered. In this study, appropriate HMMs were constructed for the segments of each type of adventitious sound and for the other segments. Selecting a suitable topology for each segment increased the F1-score to 0.8726. These results demonstrate the effectiveness of the proposed method.
It is widely recognized that, in compressed sensing, many restricted isometry property (RIP) conditions for sparse signal recovery can be easily obtained through a proof by contradiction based on the null space property (NSP) with null space constant (NSC) 0<θ≤1. However, the traditional NSP with θ=1 leads to conservative RIP conditions. In this paper, we extend the NSP with 0<θ<1 to a scale NSP, which uses a factor τ to scale down all vectors belonging to the null space of a sensing matrix. Following the popular proof procedure and using the scale NSP, we establish more relaxed RIP conditions with the scale factor τ, which guarantee bounded approximate recovery of all sparse signals under bounded noise through constrained l1 minimization. An application verifies the advantage of the scale factor in terms of the number of measurements.
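For reference, the standard forms of the three objects involved are given below for a sensing matrix A ∈ ℝ^{m×n}: the RIP of order k, one common statement of the NSP with constant θ, and the constrained l1 program under bounded noise. These are textbook forms; the scale NSP with factor τ introduced in the paper is not reproduced here.

```latex
% RIP of order k with constant \delta_k
(1-\delta_k)\,\|\mathbf{x}\|_2^2 \le \|\mathbf{A}\mathbf{x}\|_2^2 \le (1+\delta_k)\,\|\mathbf{x}\|_2^2
\quad \text{for all } k\text{-sparse } \mathbf{x} \in \mathbb{R}^n,

% NSP of order k with null space constant \theta
\|\mathbf{h}_S\|_1 \le \theta\,\|\mathbf{h}_{S^c}\|_1
\quad \text{for all } \mathbf{h} \in \mathcal{N}(\mathbf{A}) \text{ and all } S \subset \{1,\dots,n\},\ |S| \le k,

% Constrained l1 minimization for y = Ax + e with \|e\|_2 \le \epsilon
\hat{\mathbf{x}} = \arg\min_{\mathbf{z} \in \mathbb{R}^n} \|\mathbf{z}\|_1
\quad \text{subject to} \quad \|\mathbf{A}\mathbf{z} - \mathbf{y}\|_2 \le \epsilon .
```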
Yuki OKABE Daisuke KANEMOTO Osamu MAIDA Tetsuya HIROSE
We propose a sampling method for EEG measurement using compressed sensing that incorporates a normally distributed sampling series. We confirmed that the number of ADC samples and the amount of wirelessly transmitted data can be reduced by 11% while maintaining reconstruction accuracy similar to that of the conventional method.
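The abstract does not define the "normally distributed sampling series". Purely as one possible interpretation, the sketch below assumes that the gaps between successive ADC sampling instants are drawn from a normal distribution. It shows how such a series defines a compressed-sensing measurement matrix that keeps only the sampled points. The window length, gap parameters, stand-in signal, and function names are hypothetical, and reconstruction is not shown.

```python
import numpy as np


def normal_interval_sampling(n_samples, mean_gap, std_gap, seed=0):
    """Hypothetical sampling series: gaps between sample indices ~ N(mean_gap, std_gap^2).

    Gaps are rounded and clipped to at least 1 so indices stay strictly increasing.
    """
    rng = np.random.default_rng(seed)
    idx, t = [0], 0
    while True:
        t += max(1, int(round(rng.normal(mean_gap, std_gap))))
        if t >= n_samples:
            return np.array(idx)
        idx.append(t)


def selection_matrix(indices, n_samples):
    """Compressed-sensing measurement matrix that keeps only the sampled indices."""
    phi = np.zeros((len(indices), n_samples))
    phi[np.arange(len(indices)), indices] = 1.0
    return phi


n = 512  # hypothetical Nyquist-rate window length
idx = normal_interval_sampling(n, mean_gap=4.0, std_gap=1.0)
eeg = np.sin(2 * np.pi * 10 * np.arange(n) / 256)  # stand-in 10 Hz signal at 256 Hz
y = selection_matrix(idx, n) @ eeg  # equivalent to eeg[idx]
print(len(idx), "ADC samples instead of", n)
```

Only `len(idx)` conversions are performed and transmitted; recovering the full window would then require a sparse reconstruction, for example l1 minimization in a suitable basis.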
Manaya TOMIOKA Tsuneo KATO Akihiro TAMURA
A neural conversational model (NCM) based on an encoder-decoder recurrent neural network (RNN) with an attention mechanism learns sequence-to-sequence mappings that differ from those learned by neural machine translation (NMT), even when both are based on the same technique. In the NCM, we confirmed that the target-word-to-source-word mappings captured by the attention mechanism are not as clear and stationary as those in NMT. Considering that vector norms indicate the magnitude of information in processing, we analyzed the inner workings of an encoder-decoder GRU-based NCM, focusing on the norms of word embedding vectors and hidden vectors. First, to understand what the norms correlate with in the encoder and decoder, we analyzed the correlations of the word embedding norms with word frequencies in the training set and with the conditional entropies of a bigram language model. Second, to understand how the magnitude of information propagates through the network, we analyzed, for the encoder and decoder respectively, the correlations between the norms of the changes in the hidden vector of the recurrent layer and the norms of the corresponding input vectors. The results suggested that the norms of the word embedding vectors are associated with semantic information in the encoder, whereas in the decoder they are associated with predictability as a language model. The results further revealed how the norms propagate through the recurrent layer in the encoder and decoder.
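The first correlation analysis needs a per-word conditional entropy from a bigram language model. The sketch below assumes the entropy of the next-word distribution given a word, estimated by maximum likelihood on a toy corpus, together with a Pearson correlation between vector norms and a per-word statistic. The corpus and function names are hypothetical; a real analysis would use the NCM's training set and learned embeddings.

```python
import math
from collections import Counter, defaultdict

import numpy as np


def bigram_conditional_entropy(sentences):
    """H(next word | w) in nats for every word w that has a successor."""
    successors = defaultdict(Counter)
    for sent in sentences:
        for w, nxt in zip(sent, sent[1:]):
            successors[w][nxt] += 1
    entropy = {}
    for w, counts in successors.items():
        total = sum(counts.values())
        entropy[w] = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return entropy


def norm_correlation(vectors, values):
    """Pearson correlation between vector L2 norms and a per-item statistic."""
    norms = np.linalg.norm(np.asarray(vectors, dtype=float), axis=1)
    return float(np.corrcoef(norms, values)[0, 1])


corpus = [["i", "am", "fine"], ["i", "am", "here"], ["you", "are", "here"]]
ent = bigram_conditional_entropy(corpus)
print(ent)
```

Here "i" is always followed by "am", giving zero entropy, while "am" has two equally likely successors, giving log 2 nats; low entropy corresponds to high predictability as a language model.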
Takashi ISHIO Naoto MAEDA Kensuke SHIBUYA Kenho IWAMOTO Katsuro INOUE
Software developers may write a number of similar source code fragments containing the same mistake in software products. To remove such faulty code fragments, developers inspect code clones when they find a bug in their code. While various code clone detection methods have been proposed to identify clones of either code blocks or functions, those tools do not always fit the code inspection task, because a faulty code fragment may be much smaller than a code block, e.g., a single line of code. To enable developers to search for code clones of such a small faulty code fragment in a large-scale software product, we propose a method using the Lempel-Ziv Jaccard Distance, which is an approximation of the Normalized Compression Distance. We conducted an experiment using an existing research dataset and a user survey in a company. The results show that our method efficiently reports cloned faulty code fragments and that its performance is acceptable to software developers.
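The Lempel-Ziv Jaccard Distance collects the set of substrings produced by an LZ78-style parse of each byte string and takes the Jaccard distance between the two sets, avoiding the repeated compression that the Normalized Compression Distance requires. The sketch below computes both for illustration. The published LZJD additionally uses min-hashing to scale to large inputs, which is omitted here, and the code fragments are hypothetical.

```python
import zlib


def lz_set(data: bytes) -> set:
    """LZ78-style parse: collect each shortest substring not seen before."""
    seen = set()
    start, end = 0, 1
    while end <= len(data):
        piece = data[start:end]
        if piece not in seen:
            seen.add(piece)
            start = end
        end += 1
    return seen


def lzjd(a: bytes, b: bytes) -> float:
    """Lempel-Ziv Jaccard Distance: 1 - |A & B| / |A | B| over the two LZ sets."""
    sa, sb = lz_set(a), lz_set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def ncd(a: bytes, b: bytes) -> float:
    """Normalized Compression Distance with zlib as the compressor."""
    ca, cb, cab = (len(zlib.compress(s)) for s in (a, b, a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


query = b"if (buf == NULL) return -1; memcpy(buf, src, len);"  # hypothetical faulty line
clone = b"if (dst == NULL) return -1; memcpy(dst, src, len);"
other = b"for (i = 0; i < n; i++) total += weights[i] * x[i];"
print(lzjd(query, clone), lzjd(query, other))
print(ncd(query, clone), ncd(query, other))
```

Because each LZ set can be computed once per fragment and then compared cheaply, LZJD fits a search in which one small query is compared against many candidate lines.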
The effect of providing “Neither-Good-Nor-Bad” (NGNB) information on the perceived trustworthiness of agents has been investigated in previous studies. Their experimental results revealed several conditions under which providing NGNB information effectively increases users' trust in agents. However, those experiments were carried out in a situation in which a user chose, with the agent's advice, one of a limited number of options. In practical problems, we are often at a loss as to which option to choose because there are too many possible options and it is not easy to narrow them down. Furthermore, in those previous studies, it was easy to predict the size of the profit a user would obtain because the profit patterns were also limited. In this paper, we therefore investigate the effect of providing NGNB information on users' trust in agents under conditions where numerous options appear to be available to the users. Our experimental results reveal that an agent that reliably provides NGNB information tends to gain greater user trust in situations where numerous options and consequences appear to be available and the size of profits is not easy to predict. However, contrary to the previous study, the results also reveal that stable provision of NGNB information in the context of numerous options is less effective in situations where it is harder to obtain larger profits.
Yoichi HINAMOTO Shotaro NISHIMURA
This paper investigates an adaptive notch digital filter that employs a normal state-space realization of a single-frequency second-order IIR notch digital filter. An adaptive algorithm is developed to iteratively minimize the mean-squared output error of the filter. This algorithm is based on a simplified form of the gradient-descent method. The stability and frequency-estimation bias of the adaptive iterative algorithm are analyzed. Finally, a numerical example is presented to demonstrate the validity and effectiveness of the proposed adaptive notch digital filter and to confirm the frequency-estimation bias analysis of the adaptive iterative algorithm.
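As a rough illustration of the simplified-gradient idea, the sketch below adapts a constrained second-order notch filter with transfer function (1 + a z^-1 + z^-2)/(1 + ρa z^-1 + ρ² z^-2), approximating ∂e(n)/∂a by the delayed internal signal s(n-1). It uses a direct-form realization for brevity rather than the normal state-space realization studied in the paper. The pole radius ρ, step size μ, test signal, and function name are hypothetical.

```python
import numpy as np


def adaptive_notch(x, rho=0.9, mu=0.005, a0=0.0):
    """Track a single sinusoid with a simplified-gradient adaptive notch filter.

    Returns the estimated frequency (rad/sample) and the notch output e(n).
    """
    a = a0
    s1 = s2 = 0.0  # s(n-1), s(n-2) of the all-pole section
    e = np.zeros(len(x))
    for n, xn in enumerate(x):
        s0 = xn - rho * a * s1 - rho ** 2 * s2  # poles at rho * exp(+-j w)
        e[n] = s0 + a * s1 + s2                  # zeros on the unit circle
        a -= mu * e[n] * s1                      # simplified gradient: de/da ~ s(n-1)
        a = min(max(a, -2.0), 2.0)               # keep the notch frequency real
        s2, s1 = s1, s0
    return np.arccos(-a / 2.0), e


w0 = 0.3 * np.pi
w_est, err = adaptive_notch(np.sin(w0 * np.arange(20000)))
print(w_est, w0)
```

For a noise-free sinusoid, e(n) reduces to (a - a*) s(n-1) with a* = -2cos ω0, so each update moves a toward a*. With additive noise, simplified updates of this kind may become biased, which is why a frequency-estimation bias analysis matters.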
Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Fecal contamination is commonly surveilled by routinely monitoring a single type or a few types of microorganisms. To design a feasible routine for periodic monitoring and to control the risks of exposure to pathogens, reliable statistical algorithms for inferring correlations between the concentrations of microorganisms in water need to be established. Moreover, because pathogens are often present at low concentrations, some measurements are likely to fall below the detection limit. This yields a pairwise left-censored dataset and complicates the computation of correlation coefficients. Correlation estimation errors can be reduced if undetected values are imputed more accurately. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model, an extension of the Tobit model that allows domain knowledge to be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.
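For reference, the standard (symmetric) Tobit model treats a value below the detection limit L as a censored draw from N(μ, σ²): a detected value contributes the normal density to the likelihood, and an undetected one contributes the probability Φ((L - μ)/σ). A natural imputation is then the conditional mean E[Y | Y < L]. The sketch below implements only these standard pieces, assuming log-transformed concentrations. It does not implement the asymmetric extension or the pairwise correlation estimate, and the function names and numbers are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(values, detected, limit, mu, sigma):
    """Log-likelihood of a left-censored normal sample (standard Tobit).

    values: observations (ignored where not detected); detected: booleans.
    """
    ll = 0.0
    for y, d in zip(values, detected):
        if d:
            ll += math.log(norm_pdf((y - mu) / sigma) / sigma)
        else:
            ll += math.log(norm_cdf((limit - mu) / sigma))
    return ll


def impute_below_limit(limit, mu, sigma):
    """Conditional mean E[Y | Y < limit] for Y ~ N(mu, sigma^2)."""
    a = (limit - mu) / sigma
    return mu - sigma * norm_pdf(a) / norm_cdf(a)


# Hypothetical log10 concentrations with a detection limit of 1.0
obs = [2.3, 1.7, 0.0, 2.9, 0.0]
det = [True, True, False, True, False]
print(tobit_loglik(obs, det, limit=1.0, mu=1.8, sigma=0.8))
print(impute_below_limit(1.0, mu=1.8, sigma=0.8))
```

Imputing the conditional mean rather than a fixed substitute such as half the detection limit uses the fitted distribution; the asymmetric Tobit model described above goes further by also injecting domain knowledge into the fit.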
Masayuki FUKUMITSU Shingo HASEGAWA
The Schnorr signature is one of the representative signature schemes, and its security has been widely discussed. In the random oracle model (ROM), its security is provable from the discrete logarithm (DL) assumption, whereas there is negative circumstantial evidence in the standard model. Fleischhacker, Jager, and Schröder showed that the tight security of the Schnorr signature is unprovable in the ROM from strong cryptographic assumptions, such as the One-More DL (OM-DL) assumption and the computational and decisional Diffie-Hellman assumptions, via a generic reduction as long as the underlying cryptographic assumption holds. However, it remains open whether the security of the Schnorr signature is also unprovable from a strong assumption via a non-tight and reasonable reduction. In this paper, we show that the security of the Schnorr signature is unprovable from the OM-DL assumption in the non-programmable ROM as long as the OM-DL assumption holds. Our impossibility result is proven via a non-tight Turing reduction.
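For readers unfamiliar with the scheme, the sketch below implements Schnorr signing and verification in a prime-order subgroup. The parameters (p = 2039, q = 1019, g = 4) are toy values chosen for readability and are completely insecure, and the hash-to-challenge encoding is likewise a hypothetical choice. The random-oracle and reduction arguments discussed above concern an idealized hash and are not captured by code.

```python
import hashlib
import secrets

# Toy, insecure parameters: P = 2Q + 1 is prime, and G = 4 = 2^2 generates the order-Q subgroup
P, Q, G = 2039, 1019, 4


def challenge(r, msg):
    """Hash the commitment and message to an integer challenge mod Q (hypothetical encoding)."""
    digest = hashlib.sha256(r.to_bytes(2, "big") + msg).digest()
    return int.from_bytes(digest, "big") % Q


def keygen():
    x = secrets.randbelow(Q - 1) + 1  # secret key in [1, Q-1]
    return x, pow(G, x, P)            # (secret x, public y = g^x)


def sign(x, msg):
    k = secrets.randbelow(Q - 1) + 1  # fresh nonce; reusing k leaks x
    r = pow(G, k, P)
    e = challenge(r, msg)
    s = (k + x * e) % Q
    return e, s


def verify(y, msg, sig):
    e, s = sig
    # recompute r = g^s * y^(-e); y has order Q, so y^(-e) = y^(Q - e)
    r = (pow(G, s, P) * pow(y, (Q - e) % Q, P)) % P
    return challenge(r, msg) == e


x, y = keygen()
sig = sign(x, b"hello")
print(verify(y, b"hello", sig))
```

Verification succeeds for an honest signature because g^s · y^(-e) = g^(k + xe) · g^(-xe) = g^k = r.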
Shogo NAKAMURA Sho IWAZAKI Koichi ICHIGE
This paper presents a method to optimize 2-D sparse array configurations, along with a technique to interpolate holes, for accurate direction-of-arrival (DOA) estimation. Conventional 2-D sparse arrays are often defined by a closed-form representation and have the property that they create hole-free difference co-arrays, which can estimate the DOAs of more incident signals than there are physical elements. However, this property restricts the array configuration to a limited structure and results in a significant mutual coupling effect between adjacent sensors. In this paper, we introduce an optimization-based method for designing 2-D sparse arrays that enhances both the flexibility of the array configuration and the DOA estimation accuracy. We also propose a method that permits holes in 2-D co-arrays, interpolates them by nuclear norm minimization (NNM), and extends the array aperture to further enhance DOA estimation accuracy. The performance of the proposed optimum arrays is evaluated through numerical examples.
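The hole-free property concerns the difference co-array, the set of all pairwise differences of sensor positions. The sketch below computes it for integer sensor positions on a 2-D grid and lists its holes, taken here as missing lattice points inside the co-array's bounding rectangle (one simple convention). The positions and function names are hypothetical, and the optimization and NNM-based hole interpolation are not shown.

```python
def difference_coarray(positions):
    """All pairwise differences p_i - p_j of integer 2-D sensor positions."""
    return {(xi - xj, yi - yj) for (xi, yi) in positions for (xj, yj) in positions}


def coarray_holes(positions):
    """Lattice points inside the co-array's bounding rectangle that it does not cover."""
    diffs = difference_coarray(positions)
    xs = [d[0] for d in diffs]
    ys = [d[1] for d in diffs]
    box = {(u, v) for u in range(min(xs), max(xs) + 1) for v in range(min(ys), max(ys) + 1)}
    return box - diffs


ura = [(0, 0), (0, 1), (1, 0), (1, 1)]  # 2x2 uniform rectangular array
l_shape = [(0, 0), (1, 0), (0, 1)]      # hypothetical sparse L-shaped array
print(sorted(coarray_holes(ura)), sorted(coarray_holes(l_shape)))
```

Spreading sensors apart lowers mutual coupling but tends to open holes like those of the L-shaped example, which is the trade-off that permitting holes and interpolating them addresses.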
Shuoyan LIU Enze YANG Kai FANG
Abnormal behavior detection, especially in crowded scenes, is now a widely studied research field. However, most traditional unsupervised approaches struggle when normal events in a scene exhibit large visual variety. This paper proposes a self-learning probabilistic Latent Semantic Analysis (pLSA) model that aims to take full advantage of high-level information about abnormalities to solve this problem. We select informative observations from the training set to construct “reference events” as a high-level guidance cue. Specifically, the training set is randomly divided into two separate subsets. One subset is used to learn the model and defines the initial set of “reference events”. The other is used to update the model, and its infrequent samples are added to the “reference events”. Finally, anomalies are defined as the events least similar to the “reference events”. Experimental results demonstrate that the proposed model detects anomalies accurately and robustly in real-world crowd environments.
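Probabilistic latent semantic analysis models each document (for example, a video clip represented by counts of quantized motion features) as a mixture of latent topics. The sketch below fits standard pLSA by EM to a count matrix and scores a topic mixture by its dissimilarity to a set of reference mixtures. The self-learning update of the reference events is not reproduced, the cosine-based score is a hypothetical choice, and the counts are toy data.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for pLSA on a (docs x words) count matrix; every doc needs >= 1 count.

    Returns P(z|d), P(w|z), and the log-likelihood recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        p_w_d = joint.sum(axis=1)
        resp = joint / np.maximum(p_w_d[:, None, :], 1e-300)
        loglik.append(float(np.sum(counts * np.log(np.maximum(p_w_d, 1e-300)))))
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z, loglik


def anomaly_score(mixture, reference_mixtures):
    """1 - max cosine similarity between a topic mixture and the reference events."""
    ref = np.asarray(reference_mixtures, dtype=float)
    m = np.asarray(mixture, dtype=float)
    sims = ref @ m / (np.linalg.norm(ref, axis=1) * np.linalg.norm(m))
    return float(1.0 - sims.max())


counts = np.array([[5, 3, 0, 0], [4, 4, 0, 1], [0, 0, 6, 2], [0, 1, 5, 3]])
p_z_d, p_w_z, loglik = plsa(counts, n_topics=2)
reference = p_z_d[:2]  # hypothetical: the first two clips serve as reference events
print([round(anomaly_score(m, reference), 3) for m in p_z_d])
```

Clips whose topic mixtures resemble none of the reference events receive scores near 1, matching the idea of flagging the events least similar to the “reference events”.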