Yushi OGIWARA Ayanori YOROZU Akihisa OHYA Hideyuki KAWASHIMA
In the Robot Operating System (ROS), a major middleware for robots, the Transform Library (TF) is a mandatory package that manages transformation information between coordinate systems. It stores this information in a directed forest data structure and provides methods for registering and computing it. However, the structure has two fundamental problems. The first is poor scalability: because it uses a single giant lock for mutual exclusion, only one thread can access the tree at a time, so all accesses are serialized. The second is a lack of data freshness: when computing coordinate transformations, it composes them from data that is not the latest, because it prioritizes temporal consistency over data freshness. In this paper, we propose methods based on transactional techniques that avoid anomalies, achieve high performance, and obtain fresh data. These transactional methods achieve up to 429 times higher throughput than the conventional method on a read-only workload and up to 1276 times higher freshness on a combined read-write workload.
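The two problems can be made concrete with a toy model of the conventional structure. The sketch below is a simplification, not TF's code or API: frames form a directed forest, each edge keeps a timestamped history of a hypothetical 1-D offset, and one giant lock serializes every call. The lookup mimics a "latest common time" policy, using the newest time at which every edge on the path has data. As a result, a fresher sample on a fast-updating edge is ignored whenever a slower edge lags behind. The class and method names are hypothetical.

```python
import threading
from typing import Dict, List, Tuple


class TransformForest:
    """Toy directed forest of frames guarded by a single giant lock.

    Each child frame keeps a timestamped history of its 1-D offset from its
    parent (a simplification: real TF stores full 3-D rigid transforms).
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()  # single giant lock: every call is serialized
        self._parent: Dict[str, str] = {}
        self._history: Dict[str, List[Tuple[float, float]]] = {}  # child -> [(stamp, offset)]

    def set_transform(self, parent: str, child: str, stamp: float, offset: float) -> None:
        with self._lock:
            self._parent[child] = parent
            self._history.setdefault(child, []).append((stamp, offset))

    def _chain_to_root(self, frame: str) -> Tuple[List[str], str]:
        # Edges are identified by their child frame; follow parent links to the root.
        chain: List[str] = []
        while frame in self._parent:
            chain.append(frame)
            frame = self._parent[frame]
        return chain, frame

    def _offset_at(self, child: str, stamp: float) -> float:
        # Newest sample on this edge that is not newer than `stamp`.
        candidates = [(t, off) for t, off in self._history[child] if t <= stamp]
        if not candidates:
            raise LookupError(f"no data for {child} at {stamp}")
        return max(candidates)[1]

    def lookup_latest_common(self, target: str, source: str) -> Tuple[float, float]:
        """Return (stamp, offset of `source` expressed in `target`)."""
        with self._lock:
            src_chain, src_root = self._chain_to_root(source)
            tgt_chain, tgt_root = self._chain_to_root(target)
            if src_root != tgt_root:
                raise LookupError("frames belong to different trees")
            # Drop the edges shared above the lowest common ancestor.
            while src_chain and tgt_chain and src_chain[-1] == tgt_chain[-1]:
                src_chain.pop()
                tgt_chain.pop()
            edges = src_chain + tgt_chain
            if not edges:
                return 0.0, 0.0  # same frame; the stamp is arbitrary here
            # Temporal consistency: the newest time at which *every* edge has data,
            # even though some edges may hold fresher samples.
            common = min(max(t for t, _ in self._history[c]) for c in edges)
            src_pos = sum(self._offset_at(c, common) for c in src_chain)
            tgt_pos = sum(self._offset_at(c, common) for c in tgt_chain)
            return common, src_pos - tgt_pos


forest = TransformForest()
forest.set_transform("map", "odom", 1.0, 10.0)
forest.set_transform("map", "odom", 2.0, 11.0)     # slow localization updates
for t, off in [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]:
    forest.set_transform("odom", "base", t, off)   # fast odometry updates
print(forest.lookup_latest_common("map", "base"))  # (2.0, 12.0): the t=3.0 sample is ignored
```

Because every reader and writer takes the same lock, concurrent lookups on unrelated subtrees still wait for one another. The result stamped at 2.0 illustrates the freshness gap.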
Yasin OGE Masato YOSHIMI Takefumi MIYOSHI Hideyuki KAWASHIMA Hidetsugu IRIE Tsutomu YOSHINAGA
In this paper, we propose Configurable Query Processing Hardware (CQPH), an FPGA-based accelerator for continuous query processing over data streams. CQPH is a highly optimized, minimal-overhead execution engine designed to deliver real-time responses for high-volume data streams. Unlike most other FPGA-based approaches, CQPH provides on-the-fly configurability for multiple queries through its own dynamic configuration mechanism. With a dedicated query compiler, SQL-like queries can easily be configured into CQPH at run time. CQPH supports continuous queries including selection, group-by operations, and sliding-window aggregation with a large number of overlapping sliding windows. As a proof of concept, we implement a CQPH prototype on an FPGA platform as a case study. Evaluation results indicate that a given query can be configured within just a few microseconds, and that the prototype can process over 150 million tuples per second with a latency of less than a microsecond. Results also indicate that CQPH scales linearly as its flexibility (i.e., on-the-fly configurability) increases, without sacrificing performance (i.e., maximum allowable clock speed).
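To show what the supported operators compute, here is a software analog of one continuous query. It keeps tuples whose value exceeds 10 (selection), partitions them by key (group-by), and reports the SUM of each group's last two accepted tuples (sliding-window aggregation). This is only a sketch of the query semantics under assumed count-based windows that slide by one row. It says nothing about CQPH's hardware design or its query language, and the function name and parameters are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, Iterator, Tuple


def continuous_query(
    stream: Iterable[Tuple[str, int]],
    predicate: Callable[[str, int], bool],
    window_rows: int,
) -> Iterator[Tuple[str, int]]:
    """Selection -> group-by -> SUM over a count-based sliding window per group.

    Each accepted tuple slides its group's window by one row, so consecutive
    windows of size `window_rows` overlap in `window_rows - 1` rows.
    """
    windows: Dict[str, Deque[int]] = defaultdict(deque)
    sums: Dict[str, int] = defaultdict(int)
    for key, value in stream:
        if not predicate(key, value):   # selection
            continue
        window = windows[key]           # group-by: one window per key
        window.append(value)
        sums[key] += value              # incremental aggregate, no rescan
        if len(window) > window_rows:   # evict the oldest row
            sums[key] -= window.popleft()
        yield key, sums[key]


# Changing `predicate` or `window_rows` stands in for reconfiguring the query.
events = [("a", 5), ("a", 20), ("b", 15), ("a", 30), ("a", 40), ("b", 1), ("b", 50)]
results = list(continuous_query(events, lambda k, v: v > 10, window_rows=2))
print(results)  # [('a', 20), ('b', 15), ('a', 50), ('a', 70), ('b', 65)]
```

The running sum is updated by adding the new row and subtracting the evicted one. This keeps the per-tuple work constant however many windows overlap, a property that matters for any engine aiming at a fixed per-tuple latency.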
Yasin OGE Takefumi MIYOSHI Hideyuki KAWASHIMA Tsutomu YOSHINAGA
A novel design is proposed for implementing highly parallel stream join operators on a field-programmable gate array (FPGA), based on an examination of how the handshake join algorithm can be implemented in hardware. The proposed design is evaluated in terms of hardware resource usage, maximum clock frequency, and performance. Experimental results indicate that the proposed implementation can handle considerably high input rates, especially at low match rates. Simulations conducted to optimize the sizes of the buffers in the join and merge units offer new insight into static and adaptive buffer tuning in handshake join.
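The core idea of handshake join can be sketched as a sequential simulation. Tuples of stream R enter a chain of join cores from the left and drift right, while tuples of stream S enter from the right and drift left. A tuple is compared only with the opposite-stream tuples held by the core it is entering, so each core performs a small, local join. In this sketch, a tuple is forwarded when its core's segment overflows; the fixed per-core `capacity` and the class name are assumptions for illustration. Merge units are omitted. In concurrent hardware, tuples move simultaneously, and extra care is needed so that two tuples crossing between neighbouring cores are not missed. Processing one event at a time, as here, sidesteps that issue.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Tuple


class HandshakeJoin:
    """Sequential software simulation of a handshake-join pipeline."""

    def __init__(self, num_cores: int, capacity: int,
                 predicate: Callable[[Any, Any], bool]) -> None:
        self.k = num_cores
        self.cap = capacity  # per-core, per-stream segment size (assumed)
        self.pred = predicate
        self.r: List[Deque[Any]] = [deque() for _ in range(num_cores)]
        self.s: List[Deque[Any]] = [deque() for _ in range(num_cores)]
        self.results: List[Tuple[Any, Any]] = []

    def _enter_r(self, core: int, t: Any) -> None:
        for u in self.s[core]:                   # local join inside this core
            if self.pred(t, u):
                self.results.append((t, u))
        self.r[core].appendleft(t)               # newest on the left
        if len(self.r[core]) > self.cap:
            oldest = self.r[core].pop()
            if core + 1 < self.k:
                self._enter_r(core + 1, oldest)  # forward to the right neighbour
            # otherwise the tuple falls off the end and leaves the R window

    def _enter_s(self, core: int, t: Any) -> None:
        for u in self.r[core]:
            if self.pred(u, t):
                self.results.append((u, t))
        self.s[core].appendleft(t)
        if len(self.s[core]) > self.cap:
            oldest = self.s[core].pop()
            if core - 1 >= 0:
                self._enter_s(core - 1, oldest)  # forward to the left neighbour
            # otherwise the tuple falls off the end and leaves the S window

    def push_r(self, t: Any) -> None:
        self._enter_r(0, t)

    def push_s(self, t: Any) -> None:
        self._enter_s(self.k - 1, t)


hj = HandshakeJoin(num_cores=2, capacity=1, predicate=lambda r, s: True)
hj.push_r("r1"); hj.push_s("s1"); hj.push_r("r2"); hj.push_s("s2")
print(hj.results)  # [('r1', 's1'), ('r1', 's2'), ('r2', 's1')]
```

The pair (r2, s2) has not met yet. It would be produced once later arrivals push one of them into the other's core. Since R tuples only move right and S tuples only move left, any two tuples share a core during at most one interval, so no pair is reported twice.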
Harunobu DAIKOKU Hideyuki KAWASHIMA Osamu TATEBE
This paper proposes and examines three in-memory shuffling methods designed to address problems caused by skewed data in MapReduce shuffling. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks (the units of shuffle transfer) and meta-blocks (which contain the metadata of the corresponding blocks). Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks from that of blocks and applies a different all-to-all exchange algorithm to each, aiming to mitigate the impact of stragglers under strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations in which some worker processes could exceed the memory limit of their node. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high-performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.
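A coupled shuffle in the spirit of CSA can be pictured as one pairwise all-to-all exchange in which each meta-block travels together with its block. The abstract does not say which pairwise schedule is used. The sketch below assumes a shift-based one: in round k, worker i sends to worker (i + k) mod p. Every round is then a permutation, so each worker sends and receives exactly one message per round. The meta-block fields, the function names, and the use of in-process lists in place of InfiniBand or Omni-Path transfers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MetaBlock = Dict[str, int]          # hypothetical metadata: source, destination, size
Message = Tuple[MetaBlock, bytes]   # a meta-block travelling with its block


def pairwise_schedule(p: int) -> List[List[Tuple[int, int]]]:
    """Shift-based pairwise exchange: in round k, worker i sends to (i + k) % p.

    Each round is a permutation of the workers; round 0 is the local copy.
    """
    return [[(i, (i + k) % p) for i in range(p)] for k in range(p)]


def coupled_shuffle(outboxes: List[List[bytes]]) -> List[List[Optional[Message]]]:
    """outboxes[i][j] is the block worker i produced for worker j.

    Returns inboxes where inboxes[j][i] is the (meta-block, block) pair from i.
    """
    p = len(outboxes)
    inboxes: List[List[Optional[Message]]] = [[None] * p for _ in range(p)]
    for round_pairs in pairwise_schedule(p):
        for src, dst in round_pairs:  # conceptually concurrent within a round
            block = outboxes[src][dst]
            meta: MetaBlock = {"src": src, "dst": dst, "size": len(block)}
            inboxes[dst][src] = (meta, block)
    return inboxes


# Skewed example: worker 0 sends a much larger block to worker 1.
p = 4
outboxes = [[f"{i}->{j}".encode() for j in range(p)] for i in range(p)]
outboxes[0][1] = b"x" * 1000
inboxes = coupled_shuffle(outboxes)
print(inboxes[1][0][0])  # {'src': 0, 'dst': 1, 'size': 1000}
```

If rounds are synchronized, the round that carries the 1000-byte block is bounded by that single transfer while the other workers wait. This illustrates the kind of straggler effect that decoupling the meta-block exchange is meant to mitigate.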