Shota FUJII Shohei KAKEI Masanori HIROTOMO Makoto TAKITA Yoshiaki SHIRAISHI Masami MOHRI Hiroki KUZUNO Masakatu MORII
Haoran LUO Tengfei SHAO Tomoji KISHI Shenglei LI
Chee Siang LEOW Tomoki KITAGAWA Hideaki YAJIMA Hiromitsu NISHIZAKI
Dengtian YANG Lan CHEN Xiaoran HAO
Rong HUANG Yue XIE
Toshiki ONISHI Asahi OGUSHI Ryo ISHII Akihiro MIYATA
Meihua XUE Kazuki SUGITA Koichi OTA Wen GU Shinobu HASEGAWA
Jinyong SUN Zhiwei DONG Zhigang SUN Guoyong CAI Xiang ZHAO
Yusuke HIROTA Yuta NAKASHIMA Noa GARCIA
Yusuke HIROTA Yuta NAKASHIMA Noa GARCIA
Kosetsu TSUKUDA Tomoyasu NAKANO Masahiro HAMASAKI Masataka GOTO
ZhengYu LU PengFei XU
Binggang ZHUO Ryota HONDA Masaki MURATA
Qingqing YU Rong JIN
Huawei TAO Ziyi HU Sixian LI Chunhua ZHU Peng LI Yue XIE
Qianhang DU Zhipeng LIU Yaotong SONG Ningning WANG Zeyuan JU Shangce GAO
Ryota TOMODA Hisashi KOGA
Reina SASAKI Atsuko TAKEFUSA Hidemoto NAKADA Masato OGUCHI
So KOIDE Yoshiaki TAKATA Hiroyuki SEKI
Rong HUANG Zewen QIAN Hao MA Zhezhe HAN Yue XIE
Huu-Long PHAM Ryota MIBAYASHI Takehiro YAMAMOTO Makoto P. KATO Yusuke YAMAMOTO Yoshiyuki SHOJI Hiroaki OHSHIMA
Taku WAKUI Fumio TERAOKA Takao KONDO
Shaobao WU Zhihua WU Meixuan HUANG
Koji KAMMA Toshikazu WADA
Dingjie PENG Wataru KAMEYAMA
Zhizhong WANG Wen GU Zhaoxing LI Koichi OTA Shinobu HASEGAWA
Tomoaki YAMAZAKI Seiya ITO Kouzou OHARA
Daihei ISE Satoshi KOBAYASHI
Masanari ICHIKAWA Yugo TAKEUCHI
Shota SUZUKI Satoshi ONO
Reoma MATSUO Toru KOIZUMI Hidetsugu IRIE Shuichi SAKAI Ryota SHIOYA
Hirotaka HACHIYA Fumiya NISHIZAWA
Issa SUGIURA Shingo OKAMURA Naoto YANAI
Mudai KOBAYASHI Mohammad Mikal Bin Amrul Halim Gan Takahisa SEKI Takahiro HIROFUCHI Ryousei TAKANO Mitsuhiro KISHIMOTO
Chi ZHANG Luwei ZHANG Toshihiko YAMASAKI
Jung Min LIM Wonho LEE Jun-Hyeong CHOI Jong Wook KWAK
Zhuo ZHANG Donghui LI Kun JIANG Ya LI Junhu WANG Xiankai MENG
Takayoshi SHIKANO Shuichi ICHIKAWA
Shotaro ISHIKURA Ryosuke MINAMI Miki YAMAMOTO
Pengfei ZHANG Jinke WANG Yuanzhi CHENG Shinichi TAMURA
Fengqi GUO Qicheng LIU
Runlong HAO Hui LUO Yang LI
Rongchun XIAO Yuansheng LIU Jun ZHANG Yanliang HUANG Xi HAN
Yong JIN Kazuya IGUCHI Nariyoshi YAMAI Rei NAKAGAWA Toshio MURAKAMI
Toru HASEGAWA Yuki KOIZUMI Junji TAKEMASA Jun KURIHARA Toshiaki TANAKA Timothy WOOD K. K. RAMAKRISHNAN
Rikima MITSUHASHI Yong JIN Katsuyoshi IIDA Yoshiaki TAKAI
Zezhong LI Jianjun MA Fuji REN
Lorenzo MAMELONA TingHuai MA Jia LI Bright BEDIAKO-KYEREMEH Benjamin Kwapong OSIBO
Wonho LEE Jong Wook KWAK
Xiaoxiao ZHOU Yukinori SATO
Kento WATANABE Masataka GOTO
Kazuyo ONISHI Hiroki TANAKA Satoshi NAKAMURA
Takashi YOKOTA Kanemitsu OOTSU
Chenbo SHI Wenxin SUN Jie ZHANG Junsheng ZHANG Chun ZHANG Changsheng ZHU
Masateru TSUNODA Ryoto SHIMA Amjed TAHIR Kwabena Ebo BENNIN Akito MONDEN Koji TODA Keitaro NAKASAI
Masateru TSUNODA Takuto KUDO Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI Kenichi MATSUMOTO
Hiroaki AKUTSU Ko ARAI
Lanxi LIU Pengpeng YANG Suwen DU Sani M. ABDULLAHI
Xiaoguang TU Zhi HE Gui FU Jianhua LIU Mian ZHONG Chao ZHOU Xia LEI Juhang YIN Yi HUANG Yu WANG
Yingying LU Cheng LU Yuan ZONG Feng ZHOU Chuangao TANG
Jialong LI Takuto YAMAUCHI Takanori HIRANO Jinyu CAI Kenji TEI
Wei LEI Yue ZHANG Hanfeng XIE Zebin CHEN Zengping CHEN Weixing LI
David CLARINO Naoya ASADA Atsushi MATSUO Shigeru YAMASHITA
Takashi YOKOTA Kanemitsu OOTSU
Xiaokang JIN Benben HUANG Hao SHENG Yao WU
Tomoki MIYAMOTO
Ken WATANABE Katsuhide FUJITA
Masashi UNOKI Kai LI Anuwat CHAIWONGYEN Quoc-Huy NGUYEN Khalid ZAMAN
Takaharu TSUBOYAMA Ryota TAKAHASHI Motoi IWATA Koichi KISE
Chi ZHANG Li TAO Toshihiko YAMASAKI
Ann Jelyn TIEMPO Yong-Jin JEONG
Jiakun LI Jiajian LI Yanjun SHI Hui LIAN Haifan WU
Nikolay FEDOROV Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Yukasa MURAKAMI Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI
Akira ITO Yoshiaki TAKAHASHI
Rindo NAKANISHI Yoshiaki TAKATA Hiroyuki SEKI
Chuzo IWAMOTO Ryo TAKAISHI
Koichi FUJII Tomomi MATSUI
Kazuyuki AMANO
Takumi SHIOTA Tonan KAMATA Ryuhei UEHARA
Hitoshi MURAKAMI Yutaro YAMAGUCHI
Kento KIMURA Tomohiro HARAMIISHI Kazuyuki AMANO Shin-ichi NAKANO
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI
Tomohiro KOBAYASHI Tomomi MATSUI
Shin-ichi NAKANO
Ming PAN
M.M. Hafizur RAHMAN Susumu HORIGUCHI
Interconnection networks usually suffer from Little's Law: low cost implies low performance, and high performance is obtained at high cost. However, hierarchical interconnection networks provide high performance at low cost by exploiting the locality that exists in the communication patterns of massively parallel computers. In this paper, we propose a new hierarchical interconnection network, called the Hierarchical Torus Network (HTN). This network reduces the number of vertical links in a 3D stacked implementation while maintaining good network features. This paper addresses the architectural details of the HTN and explores aspects such as the network diameter, average distance, bisection width, peak number of vertical links, and VLSI layout area of the HTN as well as of several networks commonly used in parallel computers. It is shown that the HTN possesses several attractive features, including a small diameter, small average distance, small number of wires, a particularly small number of vertical links, and an economic layout area.
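As a concrete illustration of two of the metrics compared above, the sketch below measures the diameter and average distance of a small k-ary 2-D torus by breadth-first search. It is a generic aid to the discussion, not the HTN topology itself.

```python
# Minimal sketch: diameter and average distance of a k x k 2-D torus via BFS.
from collections import deque
from itertools import product

def torus_metrics(k):
    nodes = list(product(range(k), repeat=2))
    def nbrs(v):
        x, y = v
        return [((x + 1) % k, y), ((x - 1) % k, y),
                (x, (y + 1) % k), (x, (y - 1) % k)]
    dists = []
    for s in nodes:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in nbrs(u):
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dists.extend(d.values())        # includes the zero self-distances
    return max(dists), sum(dists) / len(dists)

print(torus_metrics(4))   # diameter 4 for a 4x4 torus
```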
Wireless networks are quickly becoming an integral part of the Internet, but TCP does not work well in wireless networks. Considerable research has tried to improve TCP performance in wireless networks, especially in the face of wireless link losses. However, TCP performance is also deeply influenced by channel conditions, and performance under variable channel conditions has not been widely studied. In this paper, we observe the behavior of standard TCP under variable channel conditions. Then, we propose a new, simple TCP flow control scheme. Standard TCP performs poorly because it does not reflect current channel conditions. Our adaptive TCP receiver window control scheme, however, performs well under variable channel conditions. It efficiently improves TCP performance with minimal modification of the TCP module on the wireless terminal, adaptively adjusting the TCP receiver window size depending on the dynamic channel conditions. Thus, our scheme maintains network conditions properly and achieves good TCP performance over all wireless channel conditions. In addition, since our scheme is simple and only the TCP receiver module on the wireless terminal needs to be changed, it is feasible. Through simulation and analysis, we found that our scheme achieves good TCP throughput and short end-to-end delay over all variable channel conditions.
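A minimal sketch of the receiver-side idea, not the authors' exact algorithm: the advertised window shrinks under poor channel conditions and is restored as conditions improve. The thresholds and the loss-rate signal are illustrative assumptions.

```python
# Hedged sketch: adaptive TCP receiver window sizing from channel condition.

def advertised_window(buffer_free: int, loss_rate: float,
                      base_window: int = 64 * 1024) -> int:
    """Return the receiver window to advertise, in bytes (illustrative thresholds)."""
    if loss_rate > 0.10:        # poor channel: keep few packets in flight
        target = base_window // 8
    elif loss_rate > 0.01:      # moderate channel
        target = base_window // 2
    else:                       # good channel: allow the full window
        target = base_window
    # Never advertise more than the free receive-buffer space.
    return min(target, buffer_free)

if __name__ == "__main__":
    for lr in (0.0, 0.05, 0.2):
        print(lr, advertised_window(buffer_free=48 * 1024, loss_rate=lr))
```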
Hosang YUN Kwangwook SHIN Hyunsoo YOON
The crucial handover elements in wireless ATM networks are handover delay and handover efficiency. Because research on handover in wireless ATM has so far focused mainly on minimizing handover delay, the resulting schemes use network resources inefficiently. In broadband wireless ATM networks, handover efficiency is critical to network capacity. In this paper, we propose a new handover scheme based on partial path rerouting, called the delay-limited best-fit backtracking scheme. The scheme searches for the crossover switch that limits handover delay and at the same time maximizes handover efficiency. It uses a new crossover switch searching method, an adaptive backtracking search in a best-fit manner, to find the optimal crossover switch that satisfies the given crossover switch condition. We evaluated the performance of the proposed handover scheme and show that it improves handover efficiency more than other handover schemes.
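A minimal sketch of the delay-limited best-fit selection idea: among candidate crossover switches whose handover delay stays within the limit, pick the one that reuses the most of the existing path (a proxy for handover efficiency). The candidate tuples are invented for illustration.

```python
# Hedged sketch: best-fit crossover-switch choice under a delay limit.

def select_crossover(candidates, delay_limit_ms):
    """candidates: list of (switch_id, handover_delay_ms, reused_hops)."""
    feasible = [c for c in candidates if c[1] <= delay_limit_ms]
    return max(feasible, key=lambda c: c[2], default=None)

print(select_crossover([("sw1", 40, 2), ("sw2", 25, 5), ("sw3", 80, 9)],
                       delay_limit_ms=50))      # -> ('sw2', 25, 5)
```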
JeHyok RYU MoonBae SONG Chong-Sun HWANG
In wireless mobile environments, data requests based on the location of mobile clients (MCs) have increased. The requested data, called location-dependent data (LDD), may be uncertain if MCs use general distance terms like "near". Fuzzy theory allows us to represent such uncertainty without sharp boundaries. In this paper we quantify the fuzziness and propose a method for constructing the data region of LDD from the degree of the distance between the LDD's and the MC's locations. In simulation studies, we evaluate the LDD regions (LDRs) in various situations: MCs' extensive and intensive query patterns in data regions for two senses of "near", and civilized regions with regional features. Our performance evaluation shows that the number of database accesses in the proposed LDRs can be reduced in each case.
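A minimal sketch of the underlying fuzzy idea, not the paper's exact construction: a trapezoidal membership function for the distance term "near", with an alpha-cut turning it into a crisp LDD region. The breakpoints (1 km fully near, 5 km not near) and the item list are illustrative assumptions.

```python
# Hedged sketch: fuzzy membership for "near" plus an alpha-cut region.

def near(distance_km, full=1.0, zero=5.0):
    """Degree to which `distance_km` counts as "near" (1.0 down to 0.0)."""
    if distance_km <= full:
        return 1.0
    if distance_km >= zero:
        return 0.0
    return (zero - distance_km) / (zero - full)

# Data region of "near" LDD for one mobile client: keep the items whose
# membership clears the alpha-cut threshold.
items = {"cafe": 0.4, "museum": 2.0, "airport": 7.5}   # distances in km
region = [name for name, d in items.items() if near(d) >= 0.5]
print(region)    # ['cafe', 'museum']
```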
In this paper, a quasi-synchronous code-division multiple-access (QS-CDMA) system is investigated for application in the reverse link of a microcellular or indoor mobile communication environment. In a QS-CDMA system, the relative time delay between the signals of different users is normally restricted to within a few chips. Generalized orthogonal (GO) codes extended with guard chips are employed as the spreading sequences, and the strict timing error restrictions for BPSK and M-QAM modulation schemes are derived based on the correlation properties of the GO code set, which determine the multiple access interference (MAI). The results reveal that a system employing a GO code set with a larger GO zone can tolerate larger timing errors, and that higher-order modulation schemes require stricter synchronization. Based on the system model presented, the BER performance for BPSK and M-QAM is evaluated by Gaussian Approximation (GA) and Monte Carlo simulation. The system capacity in terms of achievable total bit rate is also evaluated, revealing that if the system synchronization error is limited to within the GO zone, the M-QAM scheme can be utilized to improve the system capacity.
Xiaohong JIANG Hong SHEN Md. Mamun-ur-Rashid KHANDKER Susumu HORIGUCHI
Crosstalk in optical switches is an intrinsic drawback of optical networks, and avoiding it is important for making optical networks work properly. Horizontal expansion and vertical stacking are two basic techniques for creating nonblocking multistage interconnection networks (MINs). Rearrangeable nonblocking optical MINs are feasible since they have lower complexity than their strictly nonblocking counterparts. In this paper, we study crosstalk-free permutations in rearrangeable optical MINs built by combining horizontal expansion and vertical stacking of banyan networks, and provide a scheme for realizing crosstalk-free permutations in this kind of optical MIN. The basic idea of this scheme is to first decompose a permutation into multiple partial permutations using the Euler split technique, and then route and realize each of these partial permutations crosstalk-free in one plane (stacked copy) of a MIN based on both the Euler split technique and the self-routing property of a banyan network. The tradeoff between the overall time complexity and the hardware cost of this class of MINs is also explored.
Jiman HONG Sangsu KIM Yookun CHO
The forked checkpointing scheme has been proposed to achieve low checkpoint overhead. When a process wants to take a checkpoint under forked checkpointing, it creates a child process and continues its normal computation. Two recovery models can be used for forked checkpointing when the parent process fails before the child process establishes the checkpoint. One is the pessimistic recovery model, where the recovery process rolls back to the previous checkpoint state. The other is the optimistic recovery model, where the recovery process waits for the checkpoint to be established by the child process. In this paper, we present the recovery models for forked checkpointing by deriving the expected execution time of a process with and without checkpointing, and show that the expected recovery time of the optimistic recovery model is smaller than that of the pessimistic recovery model.
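A minimal sketch of the forked checkpointing mechanism on a POSIX system: the parent forks, the child (a copy-on-write snapshot of the parent's state) writes the checkpoint and exits, and the parent continues computing immediately. File names and the toy state are illustrative.

```python
# Hedged sketch: forked checkpointing with os.fork (POSIX only).
import os
import pickle

def take_checkpoint(state, path="checkpoint.pkl"):
    pid = os.fork()
    if pid == 0:                      # child: persist the snapshot, then exit
        with open(path, "wb") as f:
            pickle.dump(state, f)
        os._exit(0)
    return pid                        # parent: resume normal computation

if __name__ == "__main__":
    state = {"step": 0}
    for step in range(3):
        state["step"] = step
        child = take_checkpoint(state)
        # ... normal computation continues here, overlapping checkpoint I/O ...
        os.waitpid(child, 0)          # reap the child; the optimistic model
                                      # waits for this checkpoint only on failure
```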
Hsin-Chuan CHEN Jen-Shiun CHIANG
In the design of a set-associative cache, maintaining a low average access time and reducing the average energy dissipation are important issues. In this paper, we propose a set-associative cache that provides the flexibility to configure its associativity according to different program behaviors: the proposed cache scheme can be configured from an n-way set-associative cache down to a direct-mapped cache. In addition, the scheme can disable all tag subarrays and enable only the desired data subarray when adjacent memory references fall within the same block as the previous access. Power is thus saved when the cache is configured with associativity lower than n, because fewer subarrays of the tag and data memories are enabled, and when tag checking is eliminated for intra-block accesses, because all tag subarrays are disabled. Performance, however, is maintained at the same level as a conventional set-associative or direct-mapped cache.
Mostafa SOLIMAN Stanislav SEDUKHIN
Within a few years it will be possible to integrate a billion transistors on a single chip operating at frequencies above 10 GHz. At this integration level, we propose using a multi-level ISA to express fine-grain data parallelism to hardware instead of using a huge transistor budget to extract it dynamically. Since the fundamental data structures for a wide variety of data-parallel applications are scalars, vectors, and matrices, our proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to effectively process matrix-formulated applications. Like vector architectures, the Trident processor consists of a set of parallel lanes (each containing a set of vector pipelines and a slice of the register file) combined with a fast scalar core. However, the Trident processor can effectively process on the parallel lanes not only vector but also matrix data. One key point of our architecture is the local communication within and across lanes, which overcomes the limitations of future VLSI technology. Another key point is the effective execution of a mixture of scalar, vector, and matrix operations. This paper describes the architecture of the Trident processor and evaluates its performance on BLAS and on the standard matrix bidiagonalization algorithm, the latter as an example of an entire application based on a mixture of scalar, vector, and matrix operations. Our results show that many data-parallel applications, such as scientific, engineering, and multimedia workloads, can be sped up on the Trident processor. Moreover, scaling the Trident processor requires only the replication of parallel lanes, not additional fetch, decode, or issue bandwidth.
Dirk FIMMEL Jan MULLER Renate MERKER
We present a new approach to the loop scheduling problem, which improves on previous solutions in two important aspects: the resource constraints are formulated using flow graphs, and the initiation interval λ is treated as a rational variable. The approach supports heterogeneous processor architectures and pipelined functional units, and the Integer Linear Programming implementation produces an optimal loop schedule in which a minimum λ is achieved. Our flow graph model facilitates the cyclic binding of loop operations to functional units. Compared to previous research results, the solution can provide faster loop schedules and a significant reduction in problem complexity and solution time.
Norman SCAIFE Ryoko HAYASHI Susumu HORIGUCHI
We have constructed a parallelizing compiler for Standard ML (SML) based upon algorithmic skeletons. We present an implementation of a Parallel Molecular Dynamics (PMD) simulation in order to compare our functional approach with a traditional imperative approach. Although we present performance data, the principal benefits of our approach lie in the modularity of the code and the ease of programming. Extant FORTRAN90 code for an O(N²) algorithm is translated, first into imperative SML and then into purely functional SML, which is then parallelized. The ease of programming and the performance of the FORTRAN90 and SML code are compared. Modest parallel performance is obtained from the parallel SML, but with a much slower sequential execution time than the FORTRAN90. We then improve the implementation with a ring topology implementation whose performance is much closer to that of the FORTRAN90 implementation.
Many cooperative web cache systems and protocols have been proposed. These systems, however, require expensive resources, such as external bandwidth, CPU power, or proxy storage, while incurring hefty administrative costs to achieve adequate client population growth. Moreover, a scalability problem in cache server management still exists. This paper suggests peer-to-peer client-clustering. The client-cluster provides a proxy cache with backup storage composed of the residual resources of the clients. We use a DHT-based peer-to-peer lookup protocol to manage the client-cluster. With the natural characteristics of this protocol, the client-cluster is self-organizing, fault-tolerant, well-balanced, and scalable. Additionally, we propose Backward ICP, which is used for communication between the proxy cache and the client-cluster, to reduce the overhead of object replication and to use the resources more efficiently. We examine the performance of the client-cluster via a trace-driven simulation and demonstrate effective enhancement of proxy cache performance.
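A minimal sketch of the lookup idea underlying DHT protocols such as Chord, not the paper's exact protocol: a consistent-hashing ring maps each cached object to the client responsible for backing it up. Client names and the URL are illustrative.

```python
# Hedged sketch: consistent-hashing lookup over a client-cluster.
import hashlib
from bisect import bisect

def _h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ClientRing:
    def __init__(self, clients):
        self.ring = sorted((_h(c), c) for c in clients)

    def lookup(self, url: str) -> str:
        """Return the client responsible for caching `url`."""
        ids = [node_id for node_id, _ in self.ring]
        i = bisect(ids, _h(url)) % len(self.ring)   # first node clockwise
        return self.ring[i][1]

ring = ClientRing(["client-a", "client-b", "client-c"])
print(ring.lookup("http://example.com/index.html"))
```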
Yuh-Rau WANG Shi-Jinn HORNG Yu-Hua LEE Pei-Zong LEE
Based on the dimensionality reduction technique and the solution to the proximate points problem, we achieve optimality of the three-dimensional Euclidean distance transform (3D_EDT) computation. For an N
Bing Bing ZHOU Andrzej M. GOSCINSKI Richard P. BRENT
Applying gang scheduling can alleviate the blockade problem caused by exclusively used space-sharing strategies for parallel processing. However, the original form of gang scheduling is not practical, as several fundamental problems are associated with it. Recently many researchers have developed new strategies to alleviate some of these problems. Unfortunately, one important problem has so far not been seriously addressed: how to set the length of time slots to obtain good gang scheduling performance. In this paper we present a strategy to deal with this important issue for efficient gang scheduling.
Distributed configuration management involves maintaining a set of distributed storage and processing resources in such a way that they serve a community of users fairly, promptly, reliably, and securely. Security has recently received much attention due to general anxiety about hacking. Parallel computing systems such as clusters of workstations are no exception to this threat. This paper discusses experiments that measure the effect of employing randomisation in the scheduling of interdependent user and management tasks onto a distributed system such as a cluster of workstations. Two attributes are investigated, namely performance and security. Performance is usually the prime objective in task scheduling. In this work the scheduling problem is viewed as a multi-objective optimisation problem in which there is a subtle balance between efficient schedules and security. A schedule is secure if it is not vulnerable to malicious acts or inadvertent human errors. Further, the scheduling model should be hidden from unauthorised observers. The results of the study support the use of randomisation in the scheduling of tasks over an insecure network of processing nodes inhabited by malicious users.
Biplab KUMER SARKER Anil KUMAR TRIPATHI Deo PRAKASH VIDYARTHI Kuniaki UEHARA
A Distributed Computing System (DCS) requires proper partitioning of tasks into modules and allocation of those modules to various nodes so as to enable parallel execution across the individual processing nodes of the system. Scheduling the modules on particular processing nodes must be preceded by an appropriate allocation of the modules of the different tasks to the various processing nodes; only then can the appropriate execution characteristics be obtained. A number of algorithms have been proposed for task allocation in a DCS, most based on simplifying assumptions: first, that only a single task with its corresponding modules is considered; second, that the status of the processing nodes, in terms of previously allocated modules of other tasks, is ignored; and third, that the capacity and capability of the processing nodes are ignored. This work proposes algorithms for the realistic situation in which multiple tasks with their modules compete dynamically for execution on a DCS, taking its architectural capability into consideration. We propose two algorithms for the task allocation models, based on the well-known A* algorithm and on genetic algorithms (GA). The paper explains the algorithms in detail with illustrated examples and presents a comparative performance study of our algorithms and task allocation algorithms proposed in the literature. The results demonstrate that our GA-based task allocation algorithm achieves better performance than the other algorithms.
Most heuristics for the task scheduling problem employ a simple model of the target system, assuming fully connected processors, a dedicated communication subsystem, and no contention for communication resources. A small number of algorithms are contention-aware, using an undirected graph model of the communication network. Although many scheduling algorithms have been compared in the literature, little is known about the accuracy and appropriateness of the employed models. This article evaluates the accuracy of task scheduling algorithms on generic parallel systems. The experiments performed show a significant inaccuracy in the schedules produced. In an extensive analysis, the reasons for these results are identified and the implications for the scheduling model are discussed.
In this paper, we give an algorithm for the node-to-set disjoint paths problem in pancake graphs, together with its evaluation results. The algorithm is of polynomial order of n for an n-pancake graph. It is based on recursion and is divided into two cases according to the distribution of destination nodes among the classes into which all the nodes of a pancake graph are categorized. The sum of the lengths of the paths obtained and the time complexity of the algorithm are estimated, and the average performance is evaluated by computer simulation.
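For concreteness, the pancake graph's structure is easy to state: nodes are permutations of 1..n, and two nodes are adjacent iff one is obtained from the other by reversing a prefix, so every node has n-1 neighbors. A minimal sketch:

```python
# Sketch: adjacency in the n-pancake graph by prefix reversal.

def pancake_neighbors(perm):
    """All neighbors of a permutation (tuple) in the pancake graph."""
    return [perm[:i][::-1] + perm[i:] for i in range(2, len(perm) + 1)]

print(pancake_neighbors((1, 2, 3)))   # [(2, 1, 3), (3, 2, 1)]
```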
A minimum feedback node set in a graph is a minimum node subset whose deletion makes the graph acyclic. Its detection in a dependency graph is important for recovering from a deadlock configuration. A livelock configuration is also avoidable if a checkpoint is set in each node of the minimum feedback node set. Hence, its detection is very important for establishing dependable network systems. In this letter, we give a minimum feedback node set in a trivalent Cayley graph. Assuming that each word has n bits, we can decide for any node in constant time whether it is included in the set.
Hitoshi ISAHARA Fumito MASUI Manabu OKUMURA
This paper describes a practical Japanese natural language Question Answering system adopting effective selection of dynamic passages, Lexico-Semantic Patterns (LSP), and Predictive Answer Indexing. By analyzing previous TREC QA data, we defined a dynamic passage unit and developed a passage selection method suitable for Question Answering. Using LSP, we identify the answer type of a question and detect answer candidates without any deep linguistic analysis of the texts. To guarantee a short response time, Predictive Answer Indexing is built into our overall system architecture. As a result of these three engineering techniques, our system showed excellent performance when evaluated by mean reciprocal rank (MRR) in NTCIR-3 QAC-1.
In a question-answering (QA) task, "real" information retrieval, rather than document retrieval, is required. Effective QA thus involves complicated and time-consuming processing, such as natural language processing and named-entity processing. To reduce the amount of processing needed, the quantity of documents in a database can be narrowed down during an initial stage of the answering procedure. This paper proposes a new evaluation measure and compares the retrieval accuracy of initial-stage searching using "overall" document retrieval and "partial" passage retrieval on the TREC QA data set. The initial search and final result accuracy are evaluated for various cutoff points, defined according to the number of documents or words output. A variety of experiments demonstrate that middle-length passage retrieval is effective for QA, and that short-length passage retrieval can improve the accuracy of the final result for a specific question type.
Tatsunori MORI Tomohiro OHTA Katsuyuki FUJIHATA Ryutaro KUMON
In this paper, we propose a method that introduces A* search control into the sentential matching mechanism of question-answering systems, in order to reduce the response time while preserving the accuracy of the answers. Question answering is a new technology that retrieves not relevant documents but the answer(s) directly, by combining several methodologies including IR and IE. One of its essential processes is the sentential matching between the user's query and each sentence in the documents. In general, obtaining a precise matching score at higher resolution requires processes with higher computational costs. We therefore introduce an A* search in which both the processing cost and the resolution of the matching score are taken into account simultaneously. According to the experiments in NTCIR3 QAC1 Task 1, the system with the controlled search is 3.4-8.5 times faster than the system with no control.
This paper presents a Japanese Question Answering (QA) system based on a "Question Answering as Abduction" perspective. This perspective regards QA as the process of abductively explaining why a question is true based on the logical contents of appropriately described textual information. It is strongly inspired by Jerry Hobbs et al.'s "Interpretation as Abduction", and is also a simple conceptualization of Harabagiu et al.'s logic-based QA system. We reify this concept in our QA system, SAIQA-Is, which was designed to output only the most likely answer candidates to a question. The system participated in NTCIR QAC1 and provided very good results in Task 2 and Task 3 of the QAC experiments. These results demonstrate the strong feasibility and high potential of our Question Answering as Abduction approach.
Tetsuro TAKAHASHI Kozo NAWATA Kentaro INUI Yuji MATSUMOTO
In this paper, we propose an answer seeking algorithm for question answering that integrates structural matching and paraphrasing, and report the results of an empirical evaluation conducted to examine the effects of incorporating these two components. According to the results, the contribution of structural matching and paraphrasing was not as large as expected. Based on error analysis, we conclude that structural matching-based approaches to answer seeking require technologies for (a) coreference resolution, (b) processing of parse forests instead of parse trees, and (c) large-scale acquisition of paraphrase patterns.
Naoaki OKAZAKI Yutaka MATSUO Naohiro MATSUMURA Mitsuru ISHIZUKA
Although there has been a great deal of research on automatic summarization, most methods rely on statistics and disregard the relationships between the extracted textual segments. We propose a novel method to extract a set of comprehensible sentences centered on several key points, ensuring sentence connectivity. It builds a similarity network from the documents with the help of a lexical dictionary and uses spreading activation to rank sentences. We show evaluation results of a multi-document summarization system based on this method, which participated in a summarization competition, the TSC (Text Summarization Challenge) task, organized by the third NTCIR project.
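A minimal sketch of the spreading-activation step, under assumptions of our own (the toy similarity matrix, seed choice, and decay factor are illustrative, not the paper's parameters): activation injected at key-point sentences flows along similarity edges until it stabilizes, and sentences are extracted in order of final activation.

```python
# Hedged sketch: ranking sentences by spreading activation.
import numpy as np

def spread(sim, seeds, decay=0.7, iters=50):
    """sim: row-normalized sentence similarity matrix; seeds: initial activation."""
    a = seeds.copy()
    for _ in range(iters):
        a = seeds + decay * sim.T @ a      # keep injecting at the seeds
    return a

sim = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.3],
                [0.1, 0.3, 0.0]])
sim = sim / sim.sum(axis=1, keepdims=True)  # normalize outgoing weights
seeds = np.array([1.0, 0.0, 0.0])           # sentence 0 carries a key point
print(np.argsort(-spread(sim, seeds)))      # extraction order by activation
```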
Youngjoong KO Kono KIM Jungyun SEO
Automatic text summarization has the goal of reducing the size of a document while preserving its content. Generally, producing a summary as extracts is achieved by including only the most topic-related sentences. DOCUSUM is our summarization system based on a new topic keyword identification method. DOCUSUM proceeds as follows. First, it converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In experiments with various compression ratios (compression to 30%, compression to 10%, and extraction of a fixed number of sentences: 4 or 8), DOCUSUM showed better performance than other methods.
Tsutomu HIRAO Kazuhiro TAKEUCHI Hideki ISOZAKI Yutaka SASAKI Eisaku MAEDA
In this paper, we propose a machine learning-based method of multi-document summarization integrating sentence extraction with bunsetsu elimination. We employ Support Vector Machines for both modules. To evaluate the effect of bunsetsu elimination, we participated in the multi-document summarization task at TSC-2 with two approaches: (1) sentence extraction only, and (2) sentence extraction + bunsetsu elimination. The results of the subjective evaluation at TSC-2 show that both approaches are superior to the Lead-based method from the viewpoint of information coverage. In addition, we made extracts from the given abstracts to quantitatively examine the effectiveness of bunsetsu elimination. The experimental results showed that our bunsetsu elimination makes summaries more informative. Moreover, we found that extraction based on SVMs trained on short extracts is better than the Lead-based method, but that SVMs trained on long extracts are not.
Hiroyuki SAKAI Shigeru MASUYAMA
This paper proposes a statistical method for acquiring knowledge about which of multiple adnominal phrases may be omitted. Our method calculates weight values for multiple adnominal phrases using mutual information, based on the strength of the relation between the adnominal phrases and the modified nouns. Among the adnominal phrases, those having relatively low weight values are deleted. Evaluation of our method by experiments shows that precision reaches about 84.1% and recall about 59.2%.
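For reference, the weight in question is presumably a variant of the standard pointwise mutual information between an adnominal phrase p and the noun n it modifies; the paper's exact estimator may differ:

```latex
\[
  I(p;\, n) \;=\; \log \frac{P(p,\, n)}{P(p)\, P(n)}
\]
```

Phrases whose weight is relatively low are judged to contribute little to the modified noun and are the ones deleted.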
Tatsumi YOSHIDA Shigeru MASUYAMA
We developed a multiple-document summarization system, GOLD. The system generates a single summary from relevant newspaper articles at any summarization rate specified by the user. GOLD incorporates a number of summarization methods. In particular, some of its sentence reduction methods are useful for shortening each sentence; as a result, the number of output sentences that include important information increases. We participated in task B of NTCIR3 TSC2 to evaluate this system. GOLD exhibited good performance in the content-based evaluation, which suggests that the summarization methods it employs are promising for practical use.
Teiji FURUGORI Rihua LIN Takeshi ITO Dongli HAN
Described here is an automatic text summarization system for Japanese newspaper articles on sassho-jiken (murders and bodily harm). We extract pieces of information from a text, inter-connect them to represent the scenes and participants involved in the sassho-jiken, and finally produce a summary by generating sentences from the extracted information. An experiment and its evaluation show that, although the domain is limited, our method works well in capturing important information from the newspaper articles, and the summaries produced are better in adequacy and readability than those obtained by extracting sentences.
Akira TERADA Takenobu TOKUNAGA
Nominalization is a linguistic phenomenon in which events usually described in terms of clauses are expressed in the form of noun phrases. Extracting event structures is an important task in text mining applications. To achieve this goal, clauses are parsed and the argument structure of the main verbs is extracted from the parsed results. This kind of preprocessing has been common in past research. In order to extract event structure from nominalized phrases as well, we need a technique to transform nominalized phrases into clauses. In this paper, we propose a corpus-based method for this transformation. The proposed method first enumerates the possible predicate/argument structures for a nominalized phrase (noun phrase) and then ranks them based on the frequency of each argument in the corpus. An algorithm based on this method was evaluated using a corpus of 24,626 aviation safety reports in English and achieved 78% accuracy in transformation. The algorithm was also evaluated by applying it in a text mining application that extracts events and their cause-effect relations from texts, where it improved the application's performance.
Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. For Chinese, Japanese, and Korean text, there are no spaces to delimit words. Indexing using hybrid terms (i.e., words and bigrams) was found to be effective and efficient using the 2-Poisson model in the NTCIR-III open evaluation workshop. Here, we explore another Okapi weight based on the 2-Poisson model, BM25, and compare performance under bigram and word indexing strategies. Results show that word indexing is the most efficient in terms of indexing time and storage, but hybrid-term indexing requires the least retrieval time per query. Without pseudo-relevance feedback (PRF), our BM25 appeared to yield better retrieval effectiveness for short queries. With PRF, our implementation of the BM11 weights, a simplified version of BM25, with hybrid-term indexing remains the most effective combination for retrieval in this study.
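For reference, the sketch below shows the standard Okapi BM25 term weight and BM11 as its full-length-normalization special case (b = 1); the parameter values are common defaults, not necessarily those used in the paper.

```python
# Sketch: Okapi BM25 term weight, with BM11 as the b = 1 special case.
import math

def bm25(tf, df, doc_len, avg_len, N, k1=1.2, b=0.75):
    """tf: term freq in doc; df: doc freq; N: collection size."""
    idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def bm11(tf, df, doc_len, avg_len, N, k1=1.2):
    return bm25(tf, df, doc_len, avg_len, N, k1=k1, b=1.0)

print(bm25(tf=3, df=100, doc_len=250, avg_len=300, N=10_000))
```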
Koichi KISE Markus JUNKER Andreas DENGEL Keinosuke MATSUMOTO
Document retrieval is a fundamental and important task for intelligent access to the huge amount of information stored in documents. Although its research history is long, it remains hard, especially when lengthy documents must be retrieved with very short queries (a few keywords). For the retrieval of long documents, methods called passage-based document retrieval have proven effective. In this paper, we experimentally show that a passage-based method based on window passages is also effective for dealing with short queries, on the condition that documents are not too short. We employ a method called "density distributions" as the window-passage method, and compare it with three conventional methods: the simple vector space model, pseudo-relevance feedback, and latent semantic indexing. We also compare it with a passage-based method based on discourse passages.
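A minimal sketch of window-passage scoring, a simple stand-in for the "density distributions" method named above rather than its exact formulation: slide a fixed-size window over the document and score each window by its query-term density.

```python
# Hedged sketch: find the densest fixed-size window of query terms.

def best_window(tokens, query_terms, window=50):
    """Return (start index, score) of the densest window."""
    q = set(query_terms)
    hits = [1 if t in q else 0 for t in tokens]
    if len(tokens) <= window:
        return 0, sum(hits)
    score = sum(hits[:window])
    best_score, best_start = score, 0
    for start in range(1, len(tokens) - window + 1):
        score += hits[start + window - 1] - hits[start - 1]   # slide by one
        if score > best_score:
            best_score, best_start = score, start
    return best_start, best_score

doc = ("the retrieval of long documents with short queries is hard " * 5).split()
print(best_window(doc, ["retrieval", "queries"], window=10))
```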
In this paper, we present our investigation of Latent Semantic Indexing (LSI) on local query regions, which resolves the computational restrictions of LSI on the global information space. Experiments with different SVD dimensionalities on the local query regions show that low-dimensional local LSI can achieve much better precision than the VSM and precision similar to global LSI. Such small SVD factors indicate that there is an almost linear surface in the local query regions: the largest one or two singular vectors are able to capture this linear surface and benefit the particular query. Although local LSI analysis must perform a Singular Value Decomposition (SVD) for each query, the surprisingly small SVD dimension required resolves the computational restrictions of LSI for large-scale IR tasks. Moreover, when several relevant sample documents are available, applying low-dimensional LSI to these documents obtains precision comparable to local relevance feedback, in a different manner.
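A minimal sketch of the local LSI step under toy assumptions (random matrix, k = 2 following the observation that one or two factors suffice locally): truncate the SVD of the local term-document matrix, fold the query into the k-dimensional space, and re-rank by cosine similarity.

```python
# Hedged sketch: low-dimensional LSI over a local query region.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 30))            # terms x top-ranked local documents (toy)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                # one or two local factors suffice

q = rng.random(200)                  # query vector in term space (toy)
q_k = (U[:, :k].T @ q) / s[:k]       # fold the query into the k-dim space
docs_k = Vt[:k, :]                   # document coordinates in the k-dim space

scores = (q_k @ docs_k) / (np.linalg.norm(q_k) * np.linalg.norm(docs_k, axis=0))
print(np.argsort(-scores)[:5])       # locally re-ranked documents
```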
Mechanisms for results merging are very important for distributed search systems. Their role is to select the most relevant documents retrieved by different servers and put them at the top of the list returned to the end user. There are several approaches to the key problems of this task, such as eliminating duplicates and ranking the combined results, but it is still not clear how best to achieve this. We use a clustering technique to divide the retrieved results into several groups, and a metric based on the vector space model to arrange the items inside each group. Preliminary tests were conducted using the OASIS system and several collections of real Internet data, and showed relatively superior results compared to neural network clustering and LSI calculation. The proposed mechanisms can be applied to metasearch systems as well as to distributed search systems, because they require no special information beyond the de facto standard data received from servers.
Yoshiyuki TAKEDA Kyoji UMEMURA Eiko YAMAMOTO
Determining indexing strings is an important factor in information retrieval. Ideally, the strings should be words that represent documents or queries. Although single words may be the first candidates for indexing strings in an English corpus, they may not be ideal due to the existence of compound nouns, which are often good indexing strings and which often depend on the genre of the corpus. The situation is even worse in Japanese or Chinese, where words are not separated by spaces. In this paper, we propose a method of determining indexing strings based on statistical analysis. The novel features of our method are that it makes the most of the statistical measure called "adaptation" and that it uses no language-dependent resources such as dictionaries or stop word lists. Evaluating our method on a Japanese test collection, we found that it actually improves the precision of information retrieval systems.
Takashi YUKAWA Sen YOSHIDA Kazuhiro KUWABARA
A framework is described for a peer-to-peer information exchange system, and a collaborative information retrieval (IR) scheme for the system is proposed. The aims of the system include smooth knowledge and information management to activate organizations or communities. Conventional server-centric systems are weak because they create information-providing bottlenecks. Accordingly, the proposed framework targets the collaborative inter-working of personal repositories that accumulate per-user information and that accept and service requests. Two issues concerning the framework are addressed. One is the retrieval of information from another user's personal repository: the retrieval criteria of a system are tightly personalized for its user, and since the system is assumed to employ a vector space model with a concept base as its IR mechanism, the vector space on one system is very different from that on another. The other issue is the automated control of the information-providing criteria. This paper presents a solution to the first problem. To achieve IR that provides satisfactory results to a user requiring information from another user's personal repository, we need vector space equalization to compensate for the differences between the vector spaces of the personal repositories. The paper presents such a scheme, the automated relevance feedback scheme, implements it as a system, and evaluates its performance using documents on the Internet.
The self-organizing map (SOM) is a widely used tool in high-dimensional data visualization. However, despite its benefit of plotting very high-dimensional data on a low-dimensional grid, browsing and understanding the meaning of a trained map turns out to be a difficult task, especially when the number of nodes or the size of the data increases. Though there are some well-known techniques for visualizing SOMs, they mainly deal with cluster boundaries and fail to consider the raw information available in the original data when browsing SOMs. In this paper, we propose a Factor-controlled Hierarchical SOM, which enables us to select the amount of data used to train and label a particular map based on a predefined factor, and which provides consistent hierarchical SOM browsing.
Koji EGUCHI Keizo OYAMA Emi ISHIDA Noriko KANDO Kazuko KURIYAMA
This paper proposes evaluation methods for measuring the retrieval effectiveness of Web search engine systems, attempting to make them suitable for the real Web environment. With this objective, we constructed 100-gigabyte and 10-gigabyte document sets gathered mainly from the '.jp' domain, and conducted an evaluation workshop as part of the third NTCIR Workshop from 2001 to 2002, where we assessed the retrieval effectiveness of a number of Web search engine systems using the common data set. Conventional evaluation workshops assessed the relevance of the retrieved documents submitted by the workshop participants by considering the contents of individual pages. In contrast, we assessed the relevance of the retrieved pages by also considering the relationships between pages referenced by hyperlinks.
Jianliang XU Yong CHEN Tsunehiro YOSHINAGA Katsushi INOUE
This paper introduces the 1-inkdot two-way alternating pushdown automaton, a two-way alternating pushdown automaton (2apda) with the additional power of marking at most one tape cell on the input (with an inkdot) once. We first investigate the relationship between the accepting powers of sublogarithmically space-bounded 2apda's with and without 1 inkdot, and show, for example, that sublogarithmically space-bounded 2apda's with 1 inkdot are more powerful than those without inkdots. We next investigate an alternation hierarchy for sublogarithmically space-bounded 1-inkdot 2apda's, and show that the alternation hierarchy on the first level for 1-inkdot 2apda's holds; we also show that 1-inkdot two-way nondeterministic pushdown automata using sublogarithmic space are incomparable with 1-inkdot two-way alternating pushdown automata with only universal states using the same space.
In most distributed-memory computations, node programs are executed on processors according to the owner computes rule. However, the owner computes rule is not well suited to irregular application codes, where the use of indirection in accessing left-hand-side arrays makes it difficult to partition the loop iterations, and where, because of indirection in accessing right-hand-side elements, total communication may be reduced by using heuristics other than the owner computes rule. In this paper, we propose a communication-cost-reducing computes rule for irregular loop partitioning, called the least communication computes rule: each loop iteration is partitioned to the processor on which its execution incurs the minimal communication cost. After all iterations are partitioned among the processors, we give a global-to-local data transformation rule, indirection array remapping, and communication optimization methods. The experimental results show that, in most cases, our approach achieves better performance than other loop partitioning rules.
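A minimal sketch of the least communication computes rule under toy assumptions (the ownership map and access lists are invented): assign each iteration of an irregular loop to the processor that already owns most of the elements the iteration touches.

```python
# Hedged sketch: iteration partitioning by least communication cost.

def partition_iterations(accesses, owner, n_procs):
    """accesses[i]: array elements touched by iteration i;
       owner[e]: processor owning element e."""
    placement = []
    for elems in accesses:
        # Remote accesses incurred if the iteration ran on each processor.
        cost = [sum(owner[e] != p for e in elems) for p in range(n_procs)]
        placement.append(cost.index(min(cost)))
    return placement

owner = {0: 0, 1: 0, 2: 1, 3: 1}
accesses = [[0, 1], [1, 2], [2, 3]]
print(partition_iterations(accesses, owner, n_procs=2))
# -> [0, 0, 1]  (iteration 1 is a tie, broken toward processor 0)
```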
Manabu OHTA Atsuhiro TAKASU Jun ADACHI
Incorrect recognition by Optical Character Readers (OCR) is a serious problem when searching OCR-scanned documents in databases such as digital libraries. In order to reduce costs, this paper proposes fuzzy retrieval methods for recognized English text containing errors, without correcting the errors manually. The proposed methods generate multiple search terms for each input query term based on probabilistic automata that reflect both error-occurrence probabilities and character-connection probabilities. Experimental results of test-set retrieval indicate that one of the proposed methods improves the recall rate from 95.96% to 98.15% at the cost of a decrease in precision from 100.00% to 96.01% with 20 expanded search terms.
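A minimal sketch of the term-expansion idea (not the paper's probabilistic automata, and with invented confusion probabilities): generate likely OCR renderings of a query term from character-level substitution probabilities and keep the most probable ones as extra search terms.

```python
# Hedged sketch: expanding a query term into probable OCR variants.
import heapq
from itertools import product

# P(recognized char | true char); unlisted characters map to themselves.
confusion = {"l": {"l": 0.92, "1": 0.05, "I": 0.03},
             "o": {"o": 0.96, "0": 0.04}}

def expand(term, max_terms=5):
    """Most probable OCR renderings of `term` (exhaustive; fine for short terms)."""
    options = [list(confusion.get(c, {c: 1.0}).items()) for c in term]
    variants = []
    for combo in product(*options):
        s = "".join(ch for ch, _ in combo)
        p = 1.0
        for _, pc in combo:
            p *= pc
        variants.append((p, s))
    return heapq.nlargest(max_terms, variants)

print(expand("oil"))
# [(0.8832, 'oil'), (0.048, 'oi1'), (0.0368, '0il'), (0.0288, 'oiI'), (0.002, '0i1')]
```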
Association mining extracts common relationships among a finite number of categorical data objects in a set of transactions. However, if the data objects are not categorical and are potentially unlimited in number, the association mining approach cannot be employed. Clustering, on the other hand, is suitable for modeling a large number of non-categorical data objects as long as a distance measure exists among them. Although it has been used to classify data objects in a data set into groups of similar objects based on data similarity, it can also be used to extract the properties of similar data objects commonly appearing in a set of transactions. In this paper, a new clustering method, CLOCK, is proposed to find common knowledge, such as the frequent ranges of similar objects, in a set of transactions. The common knowledge of data objects in the transactions can be represented by the occurrence frequency of similar data objects across transactions as well as the common repetitive ratio of similar data objects within each transaction. Furthermore, the proposed method also addresses how to maintain the identified common knowledge as a summarized profile, so that any difference between a newly collected transaction and the common knowledge of past transactions can be easily identified.
Yaokai FENG Akifumi MAKINOUCHI
In light of the increasing number of computer applications that rely heavily on multimedia data, the database community has focused on the management and retrieval of multidimensional data. Nearest Neighbor (NN) queries have been widely used to perform content-based retrieval (e.g., similarity search) in multimedia applications. The incremental NN (INN) query is a kind of NN query that can be used when the number of NN objects to be retrieved is not known in advance. This paper points out the weaknesses of existing INN search algorithms and proposes a new one, called the Batch-Incremental Nearest Neighbor search algorithm (B-INN search algorithm), which processes the INN query efficiently. The B-INN search algorithm differs from existing INN search algorithms in that it does not employ the priority queue used in existing INN searches, which is very CPU- and memory-intensive for large databases in high-dimensional spaces, and in that it incrementally reports b (b > 1) objects simultaneously (Batch-Incremental), whereas existing INN search algorithms report the neighbors one by one. In order to implement the B-INN search, a new search (called k-d-NN search) with a new pruning strategy is proposed. Performance tests indicate that the B-INN search algorithm clearly outperforms the existing INN search algorithms in high-dimensional spaces.
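For contrast, the sketch below shows the classic priority-queue-driven incremental NN search that the paper improves upon (here over a flat toy "index" rather than a real multidimensional index): neighbors are produced one by one, on demand, from a single global priority queue.

```python
# Hedged sketch: the baseline priority-queue INN search (not B-INN).
import heapq

def incremental_nn(points, query):
    """Yield points in increasing distance from `query` (flat toy index)."""
    dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, query))
    heap = [(dist(p), p) for p in points]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]

it = incremental_nn([(0, 0), (3, 4), (1, 1)], query=(1, 0))
print(next(it), next(it))   # nearest, then second nearest, on demand
```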
Hiroshi YAMAGUCHI Atsushi KITAZAWA Hiroshi DOI Kaoru KUROSAWA Shigeo TSUJII
In this paper we present a new, two-centered electronic voting scheme that preserves privacy, universal verifiability, and robustness. An interesting property of our scheme is the use of double encryption with additively homomorphic encryption functions. In the two-centered scheme, the first center decrypts the ballots, checks the eligibility of the voters, and multiplies together the eligible votes, which are still encrypted in the cryptosystem of the second center. After the deadline is reached, the second center obtains the final tally by decrypting the accumulated votes. As such, neither center can learn the content of any individual vote, as each vote is hidden in the accumulated result, so the privacy of the voters is preserved. Our protocols, together with some existing protocols, allow everyone to verify that all valid votes are correctly counted. We apply the r-th residue cryptosystem as the homomorphic encryption function. Although decryption in the r-th residue cryptosystem requires an exhaustive search over all possible values, experiments show that desirable performance can be achieved for large-scale elections.
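A hedged sketch of the homomorphic-tallying idea using "exponential ElGamal" over a toy group; this is NOT the paper's r-th residue cryptosystem, and the sizes are far too small to be secure. It shows why multiplying ciphertexts accumulates votes and why decryption ends with an exhaustive search over the possible tallies, as the abstract notes.

```python
# Hedged sketch: additive homomorphism and exhaustive-search decryption.
import random

p, g = 30803, 2                      # toy prime modulus and base (insecure)
x = random.randrange(2, p - 1)       # tallying center's secret key
h = pow(g, x, p)

def enc(m):                          # encrypt a vote m in {0, 1}
    r = random.randrange(2, p - 1)
    return (pow(g, r, p), (pow(g, m, p) * pow(h, r, p)) % p)

def mul(c1, c2):                     # homomorphic accumulation of ciphertexts
    return ((c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p)

def dec(c, max_tally):
    gm = (c[1] * pow(c[0], p - 1 - x, p)) % p    # recover g^(sum of votes)
    for t in range(max_tally + 1):               # exhaustive search for tally
        if pow(g, t, p) == gm:
            return t

votes = [1, 0, 1, 1]
acc = enc(votes[0])
for v in votes[1:]:
    acc = mul(acc, enc(v))
print(dec(acc, len(votes)))          # -> 3
```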
Xicheng LIU Shin HIBINO Taizo HANAI Toshiaki IMANISHI Tatsuaki SHIRATAKI Tetsuo OGAWA Hiroyuki HONDA Takeshi KOBAYASHI
A brain-computer interface using an electroencephalogram as input to an artificial neural network is investigated as a potentially general control system applicable to all subjects and time frames. Using the intent and imagination of bending the left or right elbow, the desired left and right movements are successfully distinguished using event-related desynchronization, resolved by fast Fourier transformation of the electroencephalogram and analysis of the power spectrum by the artificial neural network. The influence of age was identified and eliminated through the use of a frequency distribution in the α band, and the recognition rate was further improved by confirmation based on forced excitement of the β band in the case of an error. The proposed system was effectively trained for general use by using the combined data of a cross-section of subjects.
Hanseok KO Ilkwang LEE Jihyo LEE David HAN
In this paper, we develop an image-based algorithm for tracking multiple vehicles that performs effective detection and tracking of moving objects under adverse environmental conditions. In particular, we employ low-cost commercial off-the-shelf IR or CCD image sensors to generate continuous images of multiple moving vehicles. Motion in the image sequences is first detected by adaptive background estimation and then tracked by Kalman filtering, with the attribute information updated by data association. After applying a modified Retinex procedure as preprocessing to reduce illumination effects, we proceed with a two-step tracking algorithm. The first step performs blob grouping and then judiciously selects the true targets for tracking, using data association through information registration. In the second step, all detected blobs go through validation for screening as well as occlusion reasoning, and those found to pertain to real objects survive to the 'Object' state for stable tracking. The results of representative tests confirm the method's effectiveness in vehicle tracking under both daylight and nighttime conditions while resolving occlusions.
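A minimal sketch of the Kalman filtering step for one tracked blob, using the textbook constant-velocity model; the noise covariances and the measurement values are illustrative assumptions, not the paper's tuning.

```python
# Hedged sketch: constant-velocity Kalman filter over blob centroids.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],          # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0],           # we observe position only
              [0, 1, 0, 0]], float)
Q, R = np.eye(4) * 1e-2, np.eye(2) * 1.0

x, P = np.zeros(4), np.eye(4) * 10.0
for z in [np.array([1.0, 1.1]), np.array([2.1, 1.9])]:   # blob centroids
    x, P = F @ x, F @ P @ F.T + Q                        # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                       # Kalman gain
    x = x + K @ (z - H @ x)                              # update
    P = (np.eye(4) - K @ H) @ P
print(x[:2], x[2:])   # estimated position and velocity
```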
In this paper, a color image compression scheme using a fuzzy Hopfield-model net based on rough-set reasoning is created to generate an optimal codebook for Vector Quantization (VQ) in the Discrete Wavelet Transform (DWT) domain. The main purpose is to embed a rough-set learning scheme into the fuzzy Hopfield network to construct a compression system named the Rough Fuzzy Hopfield Net (RFHN). First, a color image is decomposed into a 3-D pyramid structure with various frequency bands. Then the RFHN is used to create different codebooks for the various bands. The energy function of the RFHN is defined by the upper- and lower-bound fuzzy membership grades between training samples and codevectors. Finally, near-global-minimum codebooks in the frequency domain can be obtained when the energy function converges to a stable state. Therefore, only 3
Kohji INAGAKI Masahiro OKUDA Masaaki IKEHARA Shin-ichi TAKAHASHI
Due to the explosive growth of network technologies, 3D models and animations have attracted great interest in various media. In particular, 3D mesh models (3D meshes), which approximate surfaces by polygonal meshes, are widely used to model 3D objects. In 1D and 2D signals such as speech, audio, images, and video, the signal values are located on "grids"; for example, image signals are defined on pixels. Thus, the errors of such signals can be explicitly defined by differences of the values on the "grids". However, since the vertices of 3D meshes are located at arbitrary positions in 3D space and are triangulated in arbitrary ways, such grids cannot be defined, which makes it difficult to measure error on 3D meshes. In this paper, we propose a new numerical method to measure the error between two different 3D meshes.
The objective of this study was to explore suitable spatial filters for the inverse estimation of cortical potentials from the scalp electroencephalogram. The effect of incorporating noise covariance into inverse procedures was examined by computer simulation. The parametric projection filter, which allows inverse estimation in the presence of information on the noise covariance, was applied to an inhomogeneous three-concentric-sphere model under various noise conditions in order to estimate cortical potentials from scalp potentials. The simulation results suggest that incorporating information on the noise covariance allows better estimation of cortical potentials than inverse solutions without knowledge of the noise covariance, when the correlation between the signal and the noise is low. A method for determining the optimum regularization parameter, applicable to parametric inverse techniques, is also discussed.
Yoshiki KAWATA Noboru NIKI Hironobu OHMATSU Noriyuki MORIYAMA
Accurately segmenting and quantifying pulmonary nodule structure is a key issue in three-dimensional (3-D) computer-aided diagnosis (CAD) schemes. This paper presents a nodule segmentation method for 3-D thoracic CT images based on a deformable surface model. In this method, first, a statistical analysis of the observed intensity is performed to measure differences between the nodule and other regions. Based on this analysis, the boundary and region information are represented by boundary and region likelihoods, respectively. Second, an initial surface inside the nodule is set manually. Finally, the deformable surface model moves the initial surface so that it attains high boundary likelihood and high posterior segmentation probability with respect to the nodule. For this purpose, the deformable surface model integrates the boundary and region information, which makes it possible to cope with an inappropriate position or size of the initial surface. Using practical 3-D thoracic CT images, we demonstrate the effectiveness of the proposed method.
Zhe-Ming LU Hao-Tian WU Dian-Guo XU Sheng-He SUN
This paper presents an image watermarking method serving two purposes: identifying the copyright owner with a visible watermark, and protecting the copyright with an invisible watermark. The two watermarks are embedded in different blocks with different methods. Simulation results show that the visible watermark is hard to remove and the invisible watermark is robust.
This paper describes a method of analyzing musical sound using a self-organizing map. To take compound factors into account, energy spectra whose frequency ranges were based on psycho-acoustic experiments were used as input data. Results for music compact discs confirmed that our method can effectively display the positioning of and relationships among musical sounds on a map.
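A minimal sketch of the core SOM training loop applied to band-energy vectors; the grid size, schedules, and random stand-in spectra are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: standard SOM training on spectrum vectors.
import numpy as np

def train_som(data, grid=(8, 8), iters=1000, lr0=0.5, sigma0=3.0):
    rng = np.random.default_rng(0)
    w = rng.random((grid[0], grid[1], data.shape[1]))
    ii, jj = np.meshgrid(range(grid[0]), range(grid[1]), indexing="ij")
    for t in range(iters):
        x = data[rng.integers(len(data))]
        d = ((w - x) ** 2).sum(axis=2)
        bi, bj = np.unravel_index(d.argmin(), grid)     # best-matching unit
        lr = lr0 * (1 - t / iters)                      # shrinking learning rate
        sigma = sigma0 * (1 - t / iters) + 0.5          # shrinking neighborhood
        hood = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
        w += lr * hood[..., None] * (x - w)             # pull neighborhood to x
    return w

spectra = np.random.rand(100, 16)    # stand-in for band-energy spectra
som = train_som(spectra)             # each unit now holds a prototype spectrum
```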