This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performance year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other.” Our research question is whether mixing (i.e., taking a weighted sum of) the signals separated by state-of-the-art MDX models can attain either the best of each model or even higher separation performance. Previous studies on singing voice separation and MDX have mixed separated signals of the same sound source with each other using time-invariant or time-varying positive mixing weights. In contrast, this study is novel in that it allows negative weights as well and performs time-varying mixing using all of the separated source signals together with the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals divided into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using a 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that MDX-Mixer achieved a higher SDR than the separated signals given by three state-of-the-art MDX models.
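As a rough illustration of the mixing operation described above, the following sketch forms a per-segment weighted sum of separated signals and the original mixture; the number of models, the segment length, and the weights are placeholders rather than the learned quantities of the paper.

```python
import numpy as np

# Hypothetical setup: K separated estimates of one source (e.g., "vocals")
# produced by K existing MDX models, plus the original mixture signal.
K, num_samples, seg_len = 3, 44100 * 4, 4096
separated = np.random.randn(K, num_samples)    # stand-ins for model outputs
mixture = separated.sum(axis=0)                # stand-in for the unseparated mix

num_segments = num_samples // seg_len
# One weight per input signal (K separated + 1 mixture) per segment.
# In the paper these time-varying weights are estimated by a network;
# here they are random placeholders and may be negative.
weights = np.random.randn(num_segments, K + 1)

inputs = np.vstack([separated, mixture[None, :]])        # shape (K+1, num_samples)
remixed = np.zeros(num_samples)
for s in range(num_segments):
    sl = slice(s * seg_len, (s + 1) * seg_len)
    remixed[sl] = weights[s] @ inputs[:, sl]             # weighted sum per segment
```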
Koichi KITAMURA Koichi KOBAYASHI Yuh YAMASHITA
In cyber-physical systems (CPSs), in which physical and information components interact, many sensors are connected through a communication network. In such systems, reducing communication costs is important. Event-triggered control, in which the control input is updated only when the measured value changes significantly, is well known as one of the control methods for CPSs. In this paper, we propose a design method for output feedback controllers with decentralized event-triggering mechanisms, where the notion of uniformly ultimate boundedness is utilized as a control specification. Using this notion, we can guarantee that the state stays within a certain set containing the origin after a certain time, which depends on the initial state. As a result, the number of times that an event occurs can be decreased. First, the design problem is formulated. Next, this problem is reduced to a BMI (bilinear matrix inequality) optimization problem, which can be solved by solving multiple LMI (linear matrix inequality) optimization problems. Finally, the effectiveness of the proposed method is demonstrated by a numerical example.
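As a minimal illustration of the event-triggering idea (not the decentralized, BMI/LMI-based output-feedback design of the paper), the following sketch updates the control input only when the measured output has moved sufficiently far from the last transmitted value; the gain K and the threshold are assumed given.

```python
import numpy as np

def event_triggered_step(y, y_last_sent, u_last, K, threshold):
    """Generic (illustrative) event-triggering rule: transmit the measurement
    and update the control input only when the output has changed enough
    since the last transmission. The paper's decentralized, LMI-based design
    is more involved; this only conveys the basic mechanism."""
    if np.linalg.norm(y - y_last_sent) > threshold:   # event occurs
        return K @ y, y, True                         # new input, update memory
    return u_last, y_last_sent, False                 # keep previous input
```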
Aditya RAKHMADI Kazuyuki SAITO
Transcatheter renal denervation (RDN) is a novel treatment for reducing blood pressure in patients with resistant hypertension by eliminating the renal sympathetic nerves with an energy-based catheter, which mostly uses radio-frequency (RF) current. However, several inconsistent RDN treatments have been reported, mainly because of the narrow heating area of the RF current and the inability to confirm a successful nerve ablation in a deep area. We proposed microwave energy as an alternative for creating a wider ablation area. However, confirming a successful ablation remains a problem. In this paper, we design a method for predicting deep renal nerve ablation sites using hybrid numerical-calculation-driven machine learning (ML) in combination with a microwave catheter. This work is a first-step investigation of the hybrid ML prediction capability in a real-world situation. A catheter with a single-slot coaxial antenna at 2.45 GHz and a balloon catheter, combined with a thin thermometer probe on the balloon surface, is proposed. The lumen temperature measured by the probe is used as the ML input to predict the temperature rise at the ablation site. Heating experiments using phantoms with 6 and 8 mm holes at 41.3 W excitation power, and with an 8 mm hole at 36.4 W, were performed eight times each to check the feasibility and accuracy of the ML algorithm. In addition, the temperature at the ablation site was measured for reference. The prediction by the ML algorithm agrees well with the reference, with maximum differences of 6°C and 3°C for the 6 mm and 8 mm phantoms (at both power levels), respectively. Overall, the proposed ML algorithm is capable of predicting the temperature rise at the ablation site with high accuracy.
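Purely as an illustration of the prediction step, the following sketch fits a generic regressor that maps a lumen-temperature sequence measured on the balloon surface to the temperature rise at the ablation site; the data are synthetic placeholders, and the actual features, training data, and ML model of the paper may differ.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Placeholder training data standing in for the hybrid numerical calculations:
# each row holds a lumen-temperature time series from the balloon-surface probe,
# and the target is the temperature rise at the (deep) ablation site.
rng = np.random.default_rng(0)
X = rng.normal(loc=40.0, scale=2.0, size=(400, 30))   # 30 probe samples per trial
y = X.mean(axis=1) + rng.normal(scale=0.5, size=400)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X_tr, y_tr)
print("test R^2:", model.score(X_te, y_te))
```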
Non-orthogonal multiple access (NOMA), which combines multiple user signals and transmits the combined signal over one channel, can achieve high spectral efficiency for mobile communications. However, combining the multiple signals can degrade the bit error rate (BER) of NOMA under severe channel conditions. In order to improve the BER performance of NOMA, this paper proposes a new NOMA scheme based on orthogonal space-time block codes (OSTBCs). The proposed scheme transmits several multiplexed signals over their respective orthogonal time-frequency channels and can gain diversity effects owing to the orthogonality of the OSTBC. Furthermore, the new scheme can detect the user signals using low-complexity linear detection, in contrast with conventional NOMA. The paper focuses on the Alamouti code, which can be considered the simplest OSTBC, and theoretically analyzes the performance of the linear detection. Computer simulations under the condition of the same bit rate per channel show that the Alamouti-code-based scheme using two channels is superior to conventional NOMA using one channel in terms of BER performance. As shown by both the theoretical and simulation analyses, the linear detection for the proposed scheme maintains the same BER performance as the maximum likelihood detection even when the two channels have the same frequency response and thus provide no diversity effect, which can be regarded as the worst case.
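For reference, the sketch below shows the Alamouti encoding over two time slots and the corresponding low-complexity linear (matched-filter) detection for one user over a flat-fading channel; the NOMA superposition and multi-channel transmission of the proposed scheme are omitted.

```python
import numpy as np

# Illustrative Alamouti transmission of two symbols with two transmit antennas
# and one receive antenna (noise omitted for clarity).
rng = np.random.default_rng(0)
s1, s2 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)   # example QPSK symbols
h1, h2 = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)  # channel gains

# Slot 1 sends (s1, s2); slot 2 sends (-s2*, s1*).
r1 = h1 * s1 + h2 * s2
r2 = -h1 * np.conj(s2) + h2 * np.conj(s1)

# Linear (matched-filter) detection exploiting the code's orthogonality;
# in the noiseless case s1_hat and s2_hat equal s1 and s2 exactly.
g = abs(h1) ** 2 + abs(h2) ** 2
s1_hat = (np.conj(h1) * r1 + h2 * np.conj(r2)) / g
s2_hat = (np.conj(h2) * r1 - h1 * np.conj(r2)) / g
```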
Let f be a Boolean function in n variables. The Möbius transform and its converse for f describe the transformation between the truth table of f and the coefficients of the monomials in the algebraic normal form representation of f. In this letter, we develop the Möbius transform and its converse into a more generalized form, which also includes the known result given by Reed in 1954. We hope that our new result can be used in the design of decoding schemes for linear codes and in the cryptanalysis of symmetric cryptography. We also apply our new result to verify, in a very simple way, the basic idea of the cube attack, a powerful technique in the cryptanalysis of symmetric cryptography.
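For context, the classical (fast) Möbius transform, which maps the truth table of f to its ANF coefficients over GF(2), can be computed as in the sketch below; the generalized form introduced in the letter is not reproduced here.

```python
def mobius_transform(truth_table):
    """Fast Möbius transform over GF(2): maps the truth table of an
    n-variable Boolean function (length 2**n, entries 0/1) to the
    coefficients of its algebraic normal form. The transform is an
    involution, so applying it twice recovers the truth table."""
    a = list(truth_table)
    n = len(a).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for j in range(len(a)):
            if j & step:
                a[j] ^= a[j ^ step]
    return a

# Example: f(x1, x2) = x1 AND x2 has truth table [0, 0, 0, 1]
# and ANF coefficients [0, 0, 0, 1], i.e., f = x1*x2.
assert mobius_transform([0, 0, 0, 1]) == [0, 0, 0, 1]
```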
Mobile communication systems are the core not only of the Information and Communication Technology (ICT) infrastructure but also of our social infrastructure. The 5th generation mobile communication system (5G) has already started and is in use. 5G is expected to serve various use cases in industry and society. Thus, many companies and research institutes are now trying to improve the performance of 5G, that is, 5G Enhancement, and to develop the next generation of mobile communication systems (Beyond 5G (6G)). 6G is expected to meet various highly demanding requirements even compared with 5G, such as extremely high data rates, extremely large coverage, extremely low latency, extremely low energy consumption, extremely high reliability, extremely massive connectivity, and so on. Artificial intelligence (AI) and machine learning (ML), AI/ML, will play more important roles than ever in 6G wireless communications with the above extremely high requirements for a diversity of applications, including new combinations of the requirements for new use cases. We can say that AI/ML will be essential for 6G wireless communications. This paper introduces some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer.
Ryosuke KURAMOCHI Hiroki NAKAHARA
Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency CNN inference architecture for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs have several convolution layers that have no direct dependencies between them, our architecture can process them efficiently using a pipeline method. At each layer, the calculation results of multiple other layers need to be used as the input. We use an FPGA with HBM2 to enable parallel access to the input data through multiple HBM2 channels. We schedule the order of execution of the layers to improve the pipeline efficiency. We build a conflict graph using the scheduling results. Then, we allocate the calculation results of each layer to the HBM2 channels by coloring the graph. Because the pipeline execution needs to be properly controlled, we developed an automatic generation tool for the hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA. We investigated the trade-off between latency and recognition accuracy for the ImageNet classification task by comparing the inference performances for different input image sizes. We compared our accelerator with a conventional accelerator for ResNet-50. The results show that our accelerator reduces the latency by a factor of 2.21. We also obtained 12.6 and 4.93 times better efficiency than a CPU and a GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
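The channel-allocation step can be illustrated with a generic greedy graph-coloring sketch: layers that conflict in the pipeline schedule are assigned different HBM2 channels. The conflict graph below is hypothetical, and the sketch is not the tool flow of the paper.

```python
def assign_channels(conflict_graph):
    """Greedy graph coloring: layers whose results conflict in the pipeline
    schedule (an edge in the conflict graph) are assigned different HBM2
    channels. conflict_graph: dict mapping layer id -> set of conflicting ids."""
    channel = {}
    for layer in sorted(conflict_graph, key=lambda v: -len(conflict_graph[v])):
        used = {channel[nb] for nb in conflict_graph[layer] if nb in channel}
        c = 0
        while c in used:
            c += 1
        channel[layer] = c
    return channel

# Hypothetical conflict graph for five layers.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
print(assign_channels(g))   # {2: 0, 0: 1, 1: 2, 3: 1, 4: 0}
```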
To realize information-centric networking, IPFS (InterPlanetary File System) generates a unique ContentID for each content item by applying a cryptographic hash to the content itself. Although this could improve the security against attacks such as falsification, it makes it difficult to realize similarity search in the framework of IPFS, since the similarity of contents is not reflected in the proximity of ContentIDs. To overcome this issue, we propose a method that applies a locality-sensitive hash (LSH) to feature vectors extracted from contents and uses the result as the key of indexes stored in IPFS. By conducting experiments with 10,000 random points corresponding to stored contents, we found that more than half of randomly given queries return a non-empty result for the similarity search and yield an accurate result which is outside the σ confidence interval of an ordinary flooding-based method. Note that such a collection of random points corresponds to the worst-case scenario for the proposed scheme, since the performance of the similarity search could improve when points and queries follow an uneven distribution.
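A minimal sketch of the indexing idea, assuming a random-hyperplane (cosine) LSH over content feature vectors; the specific LSH family, feature extractor, and IPFS index layout used in the paper may differ.

```python
import numpy as np

class RandomHyperplaneLSH:
    """Random-hyperplane (cosine) LSH: nearby feature vectors are mapped to the
    same bit string with high probability, so the hash can serve as an index
    key under which similar contents are grouped."""

    def __init__(self, dim, num_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(num_bits, dim))

    def key(self, feature):
        bits = (self.planes @ feature) >= 0
        return "".join("1" if b else "0" for b in bits)

lsh = RandomHyperplaneLSH(dim=128, num_bits=16)
index = {}                                           # LSH key -> list of ContentIDs
vec = np.random.randn(128)
index.setdefault(lsh.key(vec), []).append("Qm...")   # store a CID under its LSH key
```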
In this paper, we present an algorithm that counts the number of empty quadrilaterals whose corners are chosen from a given set S of n points in general position. Our algorithm can separately count the number of convex or non-convex empty quadrilaterals in O(T) time, where T denotes the number of empty triangles in S. Note that T ranges between Ω(n²) and O(n³), and the expected value of T is known to be Θ(n²) when the n points in S are chosen uniformly and independently at random from a convex and bounded body in the plane. We also show how to enumerate all convex and/or non-convex empty quadrilaterals in S in time proportional to the number of reported quadrilaterals, after O(T)-time preprocessing.
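To make the counted objects concrete, the following naive sketch counts empty convex quadrilaterals by brute force for a small point set in general position; it runs in roughly O(n⁵) time and is not the O(T)-time algorithm of the paper.

```python
from itertools import combinations

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def strictly_inside(p, a, b, c):
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0) == (d2 > 0) == (d3 > 0)

def count_empty_convex_quads(points):
    """Brute-force counter of empty convex quadrilaterals for a small point
    set in general position (illustration only)."""
    count = 0
    for quad in combinations(points, 4):
        # Convex position: no corner lies inside the triangle of the others.
        if any(strictly_inside(quad[i], *(quad[:i] + quad[i + 1:])) for i in range(4)):
            continue
        # Emptiness: no remaining point lies inside the quadrilateral, i.e.,
        # inside any triangle spanned by three of its corners.
        others = [p for p in points if p not in quad]
        if all(not strictly_inside(p, *tri)
               for p in others for tri in combinations(quad, 3)):
            count += 1
    return count

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 2)]
print(count_empty_convex_quads(pts))   # 3
```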
Koichi KOBAYASHI Kyohei NAKAJIMA Yuh YAMASHITA
Event-triggered control is a method in which the control input is updated only when a certain condition is satisfied (i.e., an event occurs). In this paper, event-triggered control over a sensor network is studied based on the notion of uniformly ultimate boundedness. Since the sensors are located in a distributed way, we consider multiple event-triggering conditions. Under uniformly ultimate boundedness, it is guaranteed that once the state reaches a certain set containing the origin, it stays within this set. Using this notion, the occurrence of events in the neighborhood of the origin is inhibited. First, the simultaneous design problem of a controller and event-triggering conditions is formulated. Next, this problem is reduced to an LMI (linear matrix inequality) optimization problem. Finally, the proposed method is demonstrated by a numerical example.
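A minimal sketch of decentralized triggering (not the LMI-based co-design of the paper): each sensor compares only its own measurement with its own threshold and transmits independently of the others.

```python
import numpy as np

def decentralized_triggers(y, y_last_sent, thresholds):
    """Illustrative decentralized triggering: sensor i checks only its own
    measurement y[i] against its own threshold and transmits independently.
    The joint design of the controller and the triggering conditions in the
    paper is not reproduced here."""
    events = np.abs(y - y_last_sent) > thresholds   # one condition per sensor
    y_sent = np.where(events, y, y_last_sent)       # transmit only on events
    return y_sent, events
```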
Takuma ITO Naoyuki SHINOHARA Shigenori UCHIYAMA
Multivariate public key cryptosystem (MPKC) is one of the major post-quantum cryptosystems (PQC), and the National Institute of Standards and Technology (NIST) recently selected four MPKCs as candidates for their PQC. The security of MPKC depends on the hardness of solving systems of algebraic equations over finite fields. In particular, the multivariate quadratic (MQ) problem is that of solving such a system consisting of quadratic polynomials and is regarded as an important research subject in cryptography. In the Fukuoka MQ challenge project, the hardness of the MQ problem is discussed, and algorithms for solving the MQ problem, together with the computational results obtained by these algorithms, are reported. Algorithms for computing Gröbner bases are used as the main tools for solving the MQ problem. For example, the F4 algorithm and the M4GB algorithm have succeeded in solving many instances of the MQ problem provided by the project. In this paper, based on the F4-style algorithm, we present an efficient algorithm for solving MQ problems with dense polynomials generated in the Fukuoka MQ challenge project. We experimentally show that our algorithm requires less computational time and memory for these MQ problems than the F4 algorithm and the M4GB algorithm. Using our algorithm, we succeeded in solving Type II and Type III problems of the Fukuoka MQ challenge with 37 variables in both problems.
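To make the MQ problem concrete, the toy sketch below brute-forces a quadratic system over GF(2); this is feasible only for a handful of variables, whereas challenge instances with 37 variables require Gröbner-basis algorithms such as F4 or M4GB.

```python
from itertools import product

def eval_quad_poly(poly, x):
    """Evaluate a quadratic polynomial over GF(2).
    poly = (quadratic_terms, linear_terms, constant), where quadratic_terms
    is a list of index pairs (i, j) and linear_terms a list of indices."""
    quad, lin, const = poly
    val = const
    for i, j in quad:
        val ^= x[i] & x[j]
    for i in lin:
        val ^= x[i]
    return val

def brute_force_mq(polys, n):
    """Exhaustive search over GF(2)^n: only feasible for toy sizes."""
    return [x for x in product((0, 1), repeat=n)
            if all(eval_quad_poly(p, x) == 0 for p in polys)]

# Toy system in 3 variables: x0*x1 + x2 = 0,  x0 + x1 + 1 = 0
system = [([(0, 1)], [2], 0), ([], [0, 1], 1)]
print(brute_force_mq(system, 3))   # [(0, 1, 0), (1, 0, 0)]
```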
Tsugumichi SHIBATA Yoshito KATO
Capacitive coupling of line-coded and DC-balanced digital signals is often used to eliminate steady bias current flow between the systems or components in various communication systems. Multilayer ceramic chip capacitors are promising for very broadband signal coupling because of the high-frequency characteristics expected from the downsizing of chips in recent years. The lower limit of the coupling bandwidth is determined by the capacitance, while the upper limit is affected by the parasitic inductance associated with the chip structure. In this paper, we investigate the coupling characteristics up to millimeter-wave frequencies through measurements and simulations. We found a phenomenon in which the current distribution in the chip structure changes at high frequencies and the coupling characteristics improve compared with the prediction based on the conventional equivalent circuit model. We propose a new equivalent circuit model of the chip capacitor that can express the effect of this improvement.
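For context, the conventional series-RLC equivalent circuit mentioned above can be evaluated as in the sketch below; the component values are illustrative, not the measured values of the paper, and the refined model proposed here is not reproduced.

```python
import numpy as np

# Conventional series-RLC equivalent circuit of a chip capacitor:
# Z(f) = R_esr + j*2*pi*f*L_esl + 1/(j*2*pi*f*C).
# Component values below are illustrative placeholders.
C, L_esl, R_esr = 100e-9, 300e-12, 20e-3       # 100 nF, 300 pH, 20 mOhm

f = np.logspace(6, 11, 501)                    # 1 MHz to 100 GHz
w = 2 * np.pi * f
Z = R_esr + 1j * w * L_esl + 1 / (1j * w * C)

f_srf = 1 / (2 * np.pi * np.sqrt(L_esl * C))   # self-resonant frequency
print(f"self-resonance at {f_srf / 1e6:.1f} MHz")
# Above the self-resonance this model is dominated by the parasitic inductance,
# which is where the paper reports deviations and proposes a refined model.
```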
Hatoon S. ALSAGRI Mourad YKHLEF
Social media channels, such as Facebook, Twitter, and Instagram, have altered our world forever. People are now more connected than ever and reveal a sort of digital persona. Although social media certainly has several remarkable features, its demerits are undeniable as well. Recent studies have indicated a correlation between heavy usage of social media sites and increased depression. The present study aims to exploit machine learning techniques to detect probable depressed Twitter users based on both their network behavior and their tweets. For this purpose, we trained and tested classifiers that distinguish whether a user is depressed or not using features extracted from his/her activities in the network and tweets. The results showed that the more features are used, the higher the accuracy and F-measure scores in detecting depressed users. This method is a data-driven, predictive approach for the early detection of depression or other mental illnesses. This study's main contribution is the exploration of the features and their impact on detecting the level of depression.
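A minimal sketch of the classification setup, using placeholder features and labels; the study's actual feature set, labeling procedure, and classifiers may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical feature matrix: each row is a Twitter user described by
# tweet-based and network-based features (e.g., posting frequency, follower
# count, sentiment scores); the values and labels here are placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = rng.integers(0, 2, size=500)        # 1 = depressed, 0 = not (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))
```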
Wei JHANG Shiaw-Wu CHEN Ann-Chen CHANG
In this letter, an efficient hybrid direction-of-arrival (DOA) estimation scheme is devised for massive uniform rectangular arrays. In this scheme, a DOA estimator based on the two-dimensional (2D) discrete Fourier transform is first applied to acquire coarse initial DOA estimates from a single data snapshot. Then, the fine DOA is accurately estimated using an iterative search estimator within a very small region. Meanwhile, a Nyström-based method is utilized to correctly compute the required noise-subspace projection matrix, avoiding the direct computation of the full-dimensional sample correlation matrix and its eigenvalue decomposition. Therefore, the proposed scheme not only estimates the DOA but also saves computational cost, especially in massive antenna array scenarios. Simulation results are included to demonstrate the effectiveness of the proposed hybrid estimation scheme.
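The coarse 2D-DFT stage can be illustrated as follows for a synthetic single snapshot of a uniform rectangular array with half-wavelength spacing; the iterative fine search and the Nyström-based noise-subspace computation of the scheme are not reproduced.

```python
import numpy as np

# Coarse DOA estimation for an M x N uniform rectangular array with
# half-wavelength element spacing, from a single synthetic snapshot.
M, N = 16, 16
u_true, v_true = 0.35, -0.20                     # direction cosines of one source
m, n = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
rng = np.random.default_rng(0)
snapshot = np.exp(1j * np.pi * (m * u_true + n * v_true))
snapshot += 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

spectrum = np.abs(np.fft.fft2(snapshot))
k_u, k_v = np.unravel_index(np.argmax(spectrum), spectrum.shape)
u_hat, v_hat = 2 * np.fft.fftfreq(M)[k_u], 2 * np.fft.fftfreq(N)[k_v]
print(u_hat, v_hat)      # coarse bins closest to (0.35, -0.20): (0.375, -0.25)
```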
This paper constructs packet-oriented erasure correcting codes and their systematic forms for distributed storage systems. The proposed codes are encoded using exclusive OR and bit-level shift operations. Owing to the shift operations, the encoded packets are slightly longer than the source packets. This paper evaluates the extra length of the encoded packets, called the overhead, and shows that the proposed codes have smaller overheads than zigzag decodable codes, which are existing codes using bit-level shifts and exclusive OR.
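A minimal sketch of shift-and-XOR encoding, with packets represented as Python integers; the shift amounts and code structure of the proposed codes are not specified here, so the values below are placeholders.

```python
def shift_xor_encode(packets, shifts):
    """Encode source packets (given as Python ints holding their bit patterns)
    into one coded packet by shifting each packet by a per-packet amount and
    XOR-ing the results. The coded packet becomes slightly longer than the
    sources, which is the 'overhead' discussed in the paper."""
    coded = 0
    for pkt, s in zip(packets, shifts):
        coded ^= pkt << s
    return coded

# Two 8-bit source packets, the second one shifted by 3 bits:
p1, p2 = 0b10110001, 0b01101100
c = shift_xor_encode([p1, p2], [0, 3])
print(bin(c), c.bit_length())   # coded packet is up to 8 + 3 = 11 bits long
```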
Xuan ZHANG Xiaopeng JIAO Yu-Cheng HE Jianjun MU
Low-density parity-check (LDPC) codes can be used to improve the storage reliability of multi-level cell (MLC) flash memories because of their strong error-correcting capability. In order to improve the weighted bit-flipping (WBF) decoding of LDPC codes in MLC flash memories with cell-to-cell interference (CCI), we propose two strategies: normalizing the weights and adjusting the log-likelihood ratio (LLR) values. Simulation results show that WBF decoding under the proposed strategies is significantly advantageous over existing WBF decoding algorithms in both error and convergence performance. Based on a complexity analysis, the strategies provide WBF decoding with a good tradeoff between performance and complexity.
An adaptive bit allocation scheme for zero-forcing (ZF) Tomlinson-Harashima precoding (THP) is proposed. ZF-THP enables us to achieve feasible bit error rate (BER) performance when appropriate substream permutations are applied at the transmitter. In this study, the number of bits in each substream is adaptively allocated to minimize the average BER in fading environments. Numerical examples are provided to compare the proposed method with the eigenbeam space division multiplexing (E-SDM) method.
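As a generic illustration of adaptive bit allocation (not the exact BER-minimizing rule of the paper), the sketch below greedily assigns bits to the substreams with the largest SNR margin; the per-substream SNRs are hypothetical.

```python
import numpy as np

def greedy_bit_allocation(snr_lin, total_bits):
    """Generic greedy bit-loading sketch: repeatedly give the next bit to the
    substream whose margin snr_k / (2**(b_k+1) - 1) is largest, i.e., which can
    absorb the extra bit with the least penalty. The paper's allocation
    explicitly minimizes the average BER of ZF-THP; this is only a simplified
    stand-in for the idea."""
    bits = np.zeros(len(snr_lin), dtype=int)
    for _ in range(total_bits):
        margin = snr_lin / (2.0 ** (bits + 1) - 1)
        bits[np.argmax(margin)] += 1
    return bits

# Hypothetical per-substream SNRs (linear scale) and a 12-bit budget:
print(greedy_bit_allocation(np.array([200.0, 50.0, 12.0, 3.0]), 12))
```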
Zhiqiang YI Meilin HE Peng PAN Haiquan WANG
This paper analyzes the performance of various decoders in a two-user interference channel, and some improved decoders based on enhanced utilization of channel state information at the receiver side are presented. Further, new decoders, namely hierarchical constellation based decoders, are proposed. Simulations show that the improved decoders and the proposed decoders have much better performance than existing decoders. Moreover, the proposed decoders have lower decoding complexity than the traditional maximum likelihood decoder.
In this paper, to make asynchronous circuit design easier, we propose a method for converting synchronous Register Transfer Level (RTL) models into asynchronous RTL models with bundled-data implementation. The proposed method consists of the generation of an intermediate representation from a given synchronous RTL model and the generation of an asynchronous RTL model from the intermediate representation. This allows us to deal with different representation styles of synchronous RTL models. We use the eXtensible Markup Language (XML) as the intermediate representation. In addition to the asynchronous RTL model, the proposed method generates a simulation model when the target implementation is a Field Programmable Gate Array, and a set of non-optimization constraints for the control circuit used in logic synthesis and layout synthesis. In the experiment, we demonstrate that the proposed method can convert both manually specified synchronous RTL models and those obtained by a high-level synthesis tool into asynchronous ones.
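Purely as an illustration, the sketch below builds one hypothetical element of an XML intermediate representation for a single register transfer; the actual schema used by the proposed method is not specified in this abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical shape of an XML intermediate representation for one register
# transfer (the real schema of the proposed tool may differ): a target
# register, its next-state expression, and its update condition.
rt = ET.Element("register_transfer")
ET.SubElement(rt, "target", name="acc", width="32")
ET.SubElement(rt, "expression").text = "acc + din"
ET.SubElement(rt, "condition").text = "enable"
print(ET.tostring(rt, encoding="unicode"))
```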
Rachelle RIVERO Yuya ONUMA Tsuyoshi KATO
It has been reported repeatedly that discriminative learning of a distance metric boosts pattern recognition performance. Although ITML (Information Theoretic Metric Learning)-based methods enjoy the advantage that the Bregman projection framework can be applied to optimize the distance metric, a weak point of ITML-based methods is that the distance threshold for the similarity/dissimilarity constraints must be determined manually, and the generalization performance is sensitive to this threshold. In this paper, we present a new formulation of the metric learning algorithm in which the distance threshold is optimized jointly. Since the optimization remains within the Bregman projection framework, the Dykstra algorithm can be applied for optimization. A nonlinear equation has to be solved to project the solution onto a half-space in each iteration, and we have developed an efficient technique for this projection onto a half-space. We empirically show that although the distance threshold is automatically tuned for the proposed metric learning algorithm, the accuracy of pattern recognition for the proposed algorithm is comparable to, if not better than, that of existing metric learning methods.
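The half-space projection step can be illustrated in its simplest Euclidean form as below; the paper instead performs Bregman projections over the metric parameters, which require solving a nonlinear equation per iteration.

```python
import numpy as np

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a @ z <= b}.
    This only illustrates the half-space projection step in its simplest
    form; the Bregman projections used in the paper are more involved."""
    violation = a @ x - b
    if violation <= 0:
        return x                       # already feasible
    return x - (violation / (a @ a)) * a

x = np.array([3.0, 1.0])
a, b = np.array([1.0, 0.0]), 1.0       # half-space x1 <= 1
print(project_halfspace(x, a, b))      # [1. 1.]
```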