Yasunori ISHIHARA Takashi HAYATA Toru FUJIWARA
This paper discusses a static analysis problem, called the absolute consistency problem, for relational schema mappings. A given schema mapping is said to be absolutely consistent if every source instance has a corresponding target instance. Absolute consistency is an important property because it guarantees that data exchange never fails for any source instance. Amano et al. originally defined the absolute consistency problem for XML schema mappings and investigated its complexity. However, as far as the authors know, there are no known results for relational schema mappings. In this paper, we focus on relational schema mappings such that both the source and the target schemas have functional dependencies, under the assumption that mapping rules are defined by constant-free tuple-generating dependencies. In this setting, we show that the absolute consistency problem is in coNP. We also show that it is solvable in polynomial time if the tuple-generating dependencies are full and the size of the left-hand side of each functional dependency is bounded by some constant. Finally, we show that the absolute consistency problem is coNP-hard even if (i) the source schema has no functional dependency and the target schema has only one, or (ii) each of the source and the target schemas has only one functional dependency whose left-hand side has size at most two.
Dongliang CHEN Peng SONG Wenjing ZHANG Weijian ZHANG Bingui XU Xuan ZHOU
In this letter, we propose a novel robust transferable subspace learning (RTSL) method for cross-corpus facial expression recognition. On the one hand, we present a novel distance metric algorithm, which jointly considers the local and global distance distribution measures, to reduce the cross-corpus mismatch. On the other hand, we design a label guidance strategy to improve the discriminative ability of the subspace. Thus, RTSL is much more robust to the cross-corpus recognition problem than traditional transfer learning methods. We conduct extensive experiments on several facial expression corpora to evaluate the recognition performance of RTSL. The results demonstrate the superiority of the proposed method over some state-of-the-art methods.
Ying SUN Xiao-Yuan JING Fei WU Yanfei SUN
Cross-project defect prediction (CPDP) has recently become a hot research topic. It uses data from an existing source project to construct a prediction model and predicts the defect-proneness of software instances from a target project. However, bridging the distribution difference between different projects is challenging. To minimize the data distribution differences between projects and predict unlabeled target instances, we present a novel approach called selective pseudo-labeling based subspace learning (SPSL). SPSL learns a common subspace by using both labeled source instances and pseudo-labeled target instances. The accuracy of pseudo-labeling is improved by an iterative selective pseudo-labeling strategy: the pseudo-labeled instances from the target project are iteratively updated by selecting the instances with high confidence from two pseudo-labeling techniques. Experiments are conducted on the AEEEM dataset, and the results show that SPSL is effective for CPDP.
Kota KUDO Yuichi TAKANO Ryo NOMURA
This paper addresses the problem of selecting a significant subset of candidate features to use for multiple linear regression. Bertsimas et al. [5] recently proposed the discrete first-order (DFO) algorithm to efficiently find near-optimal solutions to this problem. However, this algorithm is unable to escape from locally optimal solutions. To resolve this, we propose a stochastic discrete first-order (SDFO) algorithm for feature subset selection. In this algorithm, random perturbations are added to a sequence of candidate solutions as a means to escape from locally optimal solutions, which broadens the range of discoverable solutions. Moreover, we derive the optimal step size in the gradient-descent direction to accelerate convergence of the algorithm. We also make effective use of the L2-regularization term to improve the predictive performance of a resultant subset regression model. The simulation results demonstrate that our algorithm substantially outperforms the original DFO algorithm. Our algorithm was superior in predictive performance to lasso and forward stepwise selection as well.
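The idea of perturbing a discrete first-order iteration can be illustrated with a minimal sketch. It is not the authors' SDFO algorithm: the fixed step size `step`, the Gaussian perturbation with scale `noise`, and all function names are assumptions made for illustration, and the optimal step size and L2-regularization term mentioned in the abstract are omitted.

```python
import random

def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]

def objective(X, y, beta):
    """Least-squares loss 0.5 * ||y - X beta||^2."""
    return 0.5 * sum(
        (sum(x * b for x, b in zip(row, beta)) - yi) ** 2
        for row, yi in zip(X, y)
    )

def gradient(X, y, beta):
    """Gradient of the least-squares loss with respect to beta."""
    residual = [
        sum(x * b for x, b in zip(row, beta)) - yi for row, yi in zip(X, y)
    ]
    return [
        sum(row[j] * r for row, r in zip(X, residual))
        for j in range(len(beta))
    ]

def perturbed_dfo(X, y, k, step, noise, iters, seed=0):
    """Projected gradient steps with random perturbations (illustrative only).

    Each iteration takes a gradient step, adds Gaussian noise (assumed form
    of the perturbation), and projects onto vectors with at most k nonzeros.
    The best solution seen so far is returned.
    """
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    best, best_val = beta, objective(X, y, beta)
    for _ in range(iters):
        g = gradient(X, y, beta)
        moved = [b - step * gj + rng.gauss(0.0, noise) for b, gj in zip(beta, g)]
        beta = hard_threshold(moved, k)
        val = objective(X, y, beta)
        if val < best_val:
            best, best_val = beta, val
    return best, best_val

# Tiny hypothetical data set: 4 samples, 3 candidate features.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
y = [3.0, 0.0, 0.1, 3.0]
beta, value = perturbed_dfo(X, y, k=1, step=0.3, noise=0.05, iters=50)
```

Tracking the best solution seen so far matters here because the perturbed iterates are not guaranteed to decrease the objective monotonically.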
Tailin NIU Xi CHEN Longjiang QU Chao LI
(m+k,m)-functions with good cryptographic properties when 1≤k
The interval in ℕ composed of finite states of the stream version of asymmetric binary systems (ABS) is irreducible if it admits an irreducible finite-state Markov chain. We say that the stream version of ABS is irreducible if its interval is irreducible. Duda gave a necessary condition for the interval to be irreducible. For a probability vector (p,1-p), we assume that p is irrational. Then, we give a necessary and sufficient condition for the interval to be irreducible. The obtained conditions imply that, for a sufficiently small ε, if p∈(1/2,1/2+ε), then the stream version of ABS could not be practically irreducible.
This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.
Yibo JIANG Hui BI Hui LI Zhihao XU Cheng SHI
In partially depleted SOI (PD-SOI) technology, SCR-based protection devices are desirable because of their relatively high robustness, but their use is restricted by their inherently low holding voltage (Vh) and high triggering voltage (Vt1). In this paper, the body-tie side triggering diode inserting silicon controlled rectifier (BSTDISCR) is proposed and verified in 180 nm PD-SOI technology. Compared with other devices in the same process and with related works, the BSTDISCR is a robust and latchup-immune PD-SOI ESD protection device, with an appropriate Vt1 of 6.3 V, a high Vh of 4.2 V, a high normalized second breakdown current (It2), which indicates the ESD protection robustness, of 13.3 mA/µm, and a low normalized parasitic capacitance of 0.74 fF/µm.
Takamaru MATSUI Shouhei KIDERA
Here, we present a novel spectroscopic imaging method based on the boundary-extraction scheme for wide-beam terahertz (THz) three-dimensional imaging. Optical-lens-focusing systems for THz subsurface imaging generally require the depth of the object from the surface to be input beforehand to achieve the desired azimuth resolution. This limitation can be alleviated by incorporating a wide-beam THz transmitter into the synthetic aperture to automatically change the focusing depth in the post-signal processing. The range point migration (RPM) method has been demonstrated to have significant advantages in terms of imaging accuracy over the synthetic-aperture method. Moreover, in the RPM scheme, spectroscopic information can be easily associated with each scattering center. Thus, we propose an RPM-based terahertz spectroscopic imaging method. The finite-difference time-domain-based numerical analysis shows that the proposed algorithm provides accurate target boundary imaging associated with each frequency-dependent characteristic.
Wei JHANG Shiaw-Wu CHEN Ann-Chen CHANG
This letter presents an improved, computationally efficient hybrid direction of arrival (DOA) estimation scheme for massive uniform linear arrays. To enhance the resolution of DOA estimation, an initial estimator based on the discrete Fourier transform is applied to obtain coarse DOA estimates through a virtual array extension for one snapshot. Then, using a first-order Taylor series approximation of the direction vector around the initial estimate within a very small region, the iterative fine estimator finds a new direction vector, which improves the search efficiency. Simulation results are provided to demonstrate the effectiveness of the proposed scheme.
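The coarse stage of such a hybrid scheme can be sketched as a DFT peak search over a single snapshot. This is only an illustration under stated assumptions: half-wavelength element spacing, one noise-free source, and the phase convention exp(jπ n sin θ). The virtual array extension and the Taylor-series fine estimator from the abstract are not implemented, and the function names are hypothetical.

```python
import cmath
import math

def ula_snapshot(theta_deg, n):
    """Noise-free single snapshot of an n-element ULA (half-wavelength spacing assumed)."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * i * s) for i in range(n)]

def coarse_doa_deg(snapshot):
    """Coarse DOA estimate from the peak of the spatial DFT of one snapshot."""
    n = len(snapshot)
    best_k, best_mag = 0, -1.0
    for k in range(n):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(snapshot)
        )
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    # Map DFT bins in the upper half to negative spatial frequencies.
    if best_k >= n / 2:
        best_k -= n
    # With half-wavelength spacing, bin k corresponds to sin(theta) = 2k / n.
    sin_theta = max(-1.0, min(1.0, 2.0 * best_k / n))
    return math.degrees(math.asin(sin_theta))

estimate = coarse_doa_deg(ula_snapshot(20.0, 64))
```

The grid spacing of the coarse estimate is 2/n in sin θ, which is why a fine estimator is needed afterwards to go below the DFT resolution.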
Fanxin ZENG Yue ZENG Lisheng ZHANG Xiping HE Guixin XUAN Zhenyu ZHANG Yanni PENG Linjie QIAN Li YAN
Sequences that attain the smallest possible absolute sidelobes (SPASs) of the periodic autocorrelation function (PACF) play important roles in the synchronization of communication systems, large-scale integrated circuit testing, and so on. This letter presents an approach to construct 16-QAM sequences of even periods, based on the known quaternary sequences. A relationship between the PACFs of 16-QAM and quaternary sequences is established, by which the proposed 16-QAM sequences have good PACF when quaternary sequences that attain the SPASs of PACF are employed.
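For readers unfamiliar with the PACF, the following minimal sketch computes it for a complex-valued sequence and reports the largest absolute sidelobe. The function names and the short example sequence are chosen for illustration; the sketch does not reproduce the 16-QAM construction of the letter.

```python
def pacf(seq, shift):
    """Periodic autocorrelation of a complex sequence at the given shift."""
    n = len(seq)
    return sum(seq[t] * seq[(t + shift) % n].conjugate() for t in range(n))

def max_abs_sidelobe(seq):
    """Largest |PACF| over all nonzero shifts."""
    return max(abs(pacf(seq, s)) for s in range(1, len(seq)))

# Length-4 sequence over {+1, -1}, a subset of the quaternary alphabet {1, j, -1, -j}.
example = [1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j]
peak = pacf(example, 0)              # equals the sequence energy, 4
sidelobe = max_abs_sidelobe(example)  # all out-of-phase values vanish for this sequence
```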
Ryohei BANNO Jingyu SUN Susumu TAKEUCHI Kazuyuki SHUDO
MQTT is one of the promising protocols for various data exchange in IoT environments. Typically, those environments have a characteristic called “edge-heavy”, which means that things at the network edge generate a massive volume of data with high locality. For handling such edge-heavy data, an architecture that places multiple MQTT brokers at the network edges and makes them cooperate with each other is quite effective. It can provide higher throughput and lower latency, as well as reduce the consumption of cloud resources. However, under this kind of architecture, heterogeneity could be a critical issue. Namely, the appropriate MQTT broker product could vary according to the environment of each network edge, yet different products can hardly cooperate because the MQTT specification provides no interoperability between brokers. In this paper, we propose the Interworking Layer of Distributed MQTT brokers (ILDM), which enables arbitrary kinds of MQTT brokers to cooperate with each other. ILDM, designed as a generic mechanism independent of any specific cooperation algorithm, provides APIs to facilitate the development of a variety of algorithms. Using the APIs, we also present two basic cooperation algorithms. To evaluate the usefulness of ILDM, we introduce a benchmark system that can be used for both a single broker and multiple brokers. Experimental results show that the throughput of five brokers running together with ILDM improves by up to 4.3 times over that of a single broker.
In this paper, we consider the clustering problem of independent general subspaces. That is, given data points lying near or on the union of independent low-dimensional linear subspaces, we aim to recover the subspaces and assign the corresponding label to each data point. To settle this problem, we take advantage of both a greedy strategy and an energy minimization strategy to propose a simple yet effective algorithm. It is based on the assumption that an m-branched (i.e., perfect m-ary) tree, which is constructed by collecting the m nearest neighbor points at each node, has a high probability of containing the near-exact subspace. Specifically, subspace candidates are first enumerated by multiple m-branched trees. Each tree starts with a data point and grows by collecting nearest neighbors in breadth-first search order. Then, subspace proposals are further selected from the enumeration to initialize the energy minimization algorithm. Eventually, both the proposals and the labeling result are finalized by iterative re-estimation and labeling. Experiments with both synthetic and real-world data show that the proposed method can outperform state-of-the-art methods and is practical in real applications.
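The tree-growing step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Euclidean distance, assumes that each node collects its m nearest points that are not yet in the tree, and uses hypothetical names throughout. Subspace fitting and energy minimization are not shown.

```python
import math
from collections import deque

def grow_tree(points, seed, m, depth):
    """Grow an m-branched tree from points[seed] in breadth-first order.

    Each node adds its m nearest points that are not yet in the tree
    (an assumption of this sketch). Returns point indices in visiting order.
    """
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue
        candidates = [i for i in range(len(points)) if i not in seen]
        candidates.sort(key=lambda i: math.dist(points[node], points[i]))
        for child in candidates[:m]:
            seen.add(child)
            visited.append(child)
            queue.append((child, level + 1))
    return visited

# Four points on the x-axis (one subspace) and two points far away from it.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 10.0), (1.0, 10.0)]
tree = grow_tree(points, seed=0, m=2, depth=1)
```

In this toy example the tree grown from the first point collects only points on the same line, which is the behavior the stated assumption relies on.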
Xiuzhen CHEN Xiaoyan ZHOU Cheng LU Yuan ZONG Wenming ZHENG Chuangao TANG
For cross-corpus speech emotion recognition (SER), a crucial issue is how to obtain an effective feature representation that eliminates the discrepancy between the feature distributions of the source and target domains. In this paper, we propose a Target-adapted Subspace Learning (TaSL) method for cross-corpus SER. The TaSL method tries to find a projection subspace in which the features regress the labels more accurately and the gap between the feature distributions of the target and source domains is bridged effectively. Then, to obtain a better projection matrix, ℓ1 norm and ℓ2,1 norm penalty terms are added to different regularization terms, respectively. Finally, we conduct extensive experiments on three public corpora: EmoDB, eNTERFACE, and AFEW 4.0. The experimental results show that our proposed method achieves better performance than state-of-the-art methods in cross-corpus SER tasks.
Bandhit SUKSIRI Masahiro FUKUMOTO
This paper presents an efficient wideband two-dimensional direction-of-arrival (DOA) estimation method for an L-shaped microphone array. We propose a way to construct a wideband sample cross-correlation matrix without any preliminary DOA estimation process, such as a beamforming technique, by exploiting sample cross-correlation matrices of two different frequencies for all frequency bins. Subsequently, wideband DOAs can be estimated by using this wideband matrix along with a DOA estimation scheme from a narrowband subspace method. Therefore, a contribution of our study is an alternative framework that allows recent narrowband subspace methods to estimate the DOA of wideband sources directly. This framework enables cutting-edge techniques in existing narrowband subspace methods to be applied to wideband direction estimation, reducing the computational complexity and simplifying the estimation algorithm. The theoretical analysis and effectiveness of the proposed method are substantiated through numerical simulations and experiments performed in reverberant environments. The results show that the proposed method performs better than others over a range of signal-to-noise ratios with just a few microphones. All these advantages make the proposed method a powerful tool for navigation systems based on acoustic signal processing.
Yancheng CHEN Ning LI Xijian ZHONG Yan GUO
Unmanned aerial vehicle mounted base stations (UAV-BSs) can provide wireless cellular service to ground users in a variety of scenarios. The efficient deployment of such UAV-BSs while optimizing the coverage area is one of the key challenges. We investigate the deployment of a UAV-BS to maximize the coverage of ground users, and further analyze the impact of the deployment on the fairness among ground users. We first calculate the location of the UAV-BS according to the QoS requirements of the ground users, and then take the fairness of ground users into account by calculating three different fairness indexes. Two genetic algorithms, namely the Standard Genetic Algorithm (SGA) and the Multi-Population Genetic Algorithm (MPGA), are compared in solving the optimization problem of UAV-BS deployment. Simulation results show the performance of the two algorithms as well as the fairness performance of the ground users.
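The abstract does not name the three fairness indexes. As one widely used example, Jain's fairness index can be computed as in the sketch below; treating it as one of the indexes used in the paper is an assumption, and the per-user rates are hypothetical values.

```python
def jain_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        raise ValueError("at least one rate must be nonzero")
    return total * total / (len(rates) * squares)

equal_share = jain_index([2.0, 2.0, 2.0, 2.0])   # perfectly fair allocation
single_user = jain_index([1.0, 0.0, 0.0, 0.0])   # one user receives everything
```

The index reaches 1 when all users receive the same rate and drops to 1/n when a single user receives all of the resource.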
Tatsuya NAGAI Masaki KAMIZONO Yoshiaki SHIRAISHI Kelin XIA Masami MOHRI Yasuhiro TAKANO Masakatu MORII
Epidemic cyber incidents are caused by malicious websites using exploit kits. Exploit kits make it easier for attackers to perform drive-by download (DBD) attacks. However, it is reported that malicious websites using an exploit kit have similarities in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack happens within a certain duration. Moreover, this paper proposes a new malicious website identification technique based on clustering the WS-trees of exploit kits. Experimental results using the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
Mizuho NAGANUMA Yuichi TAKANO Ryuhei MIYASHIRO
This paper is concerned with a mixed-integer optimization (MIO) approach to selecting a subset of relevant features from among many candidates. For ordinal classification, a sequential logit model and an ordered logit model are often employed. For feature subset selection in the sequential logit model, Sato et al. [22] recently proposed a mixed-integer linear optimization (MILO) formulation. In their MILO formulation, a univariate nonlinear function contained in the sequential logit model was represented by a tangent-line-based approximation. We extend this MILO formulation toward the ordered logit model, which is more commonly used for ordinal classification than the sequential logit model is. Making use of tangent planes to approximate a bivariate nonlinear function involved in the ordered logit model, we derive an MILO formulation for feature subset selection in the ordered logit model. Our computational results verify that the proposed method is superior to the L1-regularized ordered logit model in terms of solution quality.
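The tangent-line idea for a univariate convex function can be sketched as below. The choice of the logistic loss f(v) = log(1 + exp(-v)) and of the tangent points is an assumption made for illustration; the abstract does not specify the function, and the tangent-plane extension to the bivariate case is not shown. Because f is convex, the maximum over its tangent lines is a piecewise-linear lower bound that is exact at the tangent points, which is what makes it expressible with linear constraints.

```python
import math

def logistic_loss(v):
    """Convex univariate function f(v) = log(1 + exp(-v))."""
    return math.log1p(math.exp(-v))

def logistic_loss_slope(v):
    """Derivative f'(v) = -1 / (1 + exp(v))."""
    return -1.0 / (1.0 + math.exp(v))

def tangent_lower_bound(v, tangent_points):
    """Piecewise-linear lower bound: the maximum of the tangent lines at the given points."""
    return max(
        logistic_loss(t) + logistic_loss_slope(t) * (v - t)
        for t in tangent_points
    )

tangent_points = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical choice of tangent points
gap = logistic_loss(0.5) - tangent_lower_bound(0.5, tangent_points)
```

Adding more tangent points tightens the bound at the cost of more linear constraints in the resulting formulation.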
Yutaro ODA Yosuke TANIGAWA Hideki TODE
Network function virtualization (NFV) flexibly provides services by virtualizing network functions on general-purpose servers, and has attracted research interest in recent years. In an NFV environment, providing service chaining, which dynamically connects network functions (virtual network functions: VNFs), is a critical issue. However, as it is challenging to select the optimal sequence of VNF services in the service chain in a decentralized manner, the distances between the VNFs tend to increase, leading to longer communication and processing delays. Furthermore, the fact that the service order of certain VNFs can be exchanged with one another has never been considered. To address these problems, in this paper, we propose a distributed search method for ordered VNFs that reduces delays while considering the load on the control server, by exploiting an in-network guidance technology, called Breadcrumbs, for query messages.
Yusaku HAYAMIZU Akihisa SHIBUYA Miki YAMAMOTO
In content oriented networks (CON), routers in a network are generally equipped with local cache storage and store incoming content temporarily. Efficient utilization of the total cache storage in networks is one of the most important technical issues in CON, as it can reduce content server load, content download latency, and network traffic. The performance of networked caches is reported to depend strongly on both cache decision and content request routing. In this paper, we evaluate several combinations of these two strategies. For routing in particular, we take up off-path cache routing, Breadcrumbs, as one of the content request routing proposals. Our performance evaluation results show that off-path cache routing, Breadcrumbs, performs poorly with cache decisions that generally perform well with shortest path routing (SPR), and obtains excellent performance with TERC (Transparent En-Route Cache), which is well known to perform poorly with the widely used SPR. Our detailed evaluation results in two network environments, emerging CONs and conventional IP, show that these insights hold in both environments.