Tomoki NAKAMURA Naoki HAYASHI Masahiro INUIGUCHI
In this paper, we consider distributed decision-making over directed time-varying multi-agent systems. We consider an adversarial bandit problem in which a group of agents chooses an option from among multiple arms to maximize the total reward. In the proposed method, each agent cooperatively searches for the optimal arm with the highest reward by a consensus-based distributed Exp3 policy. To this end, each agent exchanges the estimation of the reward of each arm and the weight for exploitation with the nearby agents on the network. To unify the explored information of arms, each agent mixes the estimation and the weight of the nearby agents with their own values by a consensus dynamics. Then, each agent updates the probability distribution of arms by combining the Hedge algorithm and the uniform search. We show that the sublinearity of a pseudo-regret can be achieved by appropriately setting the parameters of the distributed Exp3 policy.
Yihan DONG Shiyao DING Takayuki ITO
This paper presents the design and implementation of an automated multi-phase facilitation agent based on LLM to realize inclusive facilitation and efficient use of a large language model (LLM) to facilitate realistic discussions. Large-scale discussion support systems have been studied and implemented very widely since they enable a lot of people to discuss remotely and within 24 hours and 7 days. Furthermore, automated facilitation artificial intelligence (AI) agents have been realized since they can efficiently facilitate large-scale discussions. For example, D-Agree is a large-scale discussion support system where an automated facilitation AI agent facilitates discussion among people. Since the current automated facilitation agent was designed following the structure of the issue-based information system (IBIS) and the IBIS-based agent has been proven that it has superior performance. However, there are several problems that need to be addressed with the IBIS-based agent. In this paper, we focus on the following three problems: 1) The IBIS-based agent was designed to only promote other participants' posts by replying to existing posts accordingly, lacking the consideration of different behaviours taken by participants with diverse characteristics, leading to a result that sometimes the discussion is not sufficient. 2) The facilitation messages generated by the IBIS-based agent were not natural enough, leading to consequences that the participants were not sufficiently promoted and the participants did not follow the flow to discuss a topic. 3) Since the IBIS-based agent is not combined with LLM, designing the control of LLM is necessary. Thus, to solve the problems mentioned above, the design of a phase-based facilitation framework is proposed in this paper. Specifically, we propose two significant designs: One is the design for a multi-phase facilitation agent created based on the framework to address problem 1); The other one is the design for the combination with LLM to address problem 2) and 3). Particularly, the language model called “GPT-3.5” is used for the combination by using corresponding APIs from OPENAI. Furthermore, we demonstrate the improvement of our facilitation agent framework by presenting the evaluations and a case study. Besides, we present the difference between our framework and LangChain which has generic features to utilize LLMs.
Keita TERASHIMA Koichi KOBAYASHI Yuh YAMASHITA
In a multi-agent system, it is important to consider a design method of cooperative actions in order to achieve a common goal. In this paper, we propose two novel multi-agent reinforcement learning methods, where the control specification is described by linear temporal logic formulas, which represent a common goal. First, we propose a simple solution method, which is directly extended from the single-agent case. In this method, there are some technical issues caused by the increase in the number of agents. Next, to overcome these technical issues, we propose a new method in which an aggregator is introduced. Finally, these two methods are compared by numerical simulations, with a surveillance problem as an example.
Junya YOSHIDA Naoki HAYASHI Shigemasa TAKAI
This paper presents a quantized gradient descent algorithm for distributed nonconvex optimization in multiagent systems that takes into account the bandwidth limitation of communication channels. Each agent encodes its estimation variable using a zoom-in parameter and sends the quantized intermediate variable to the neighboring agents. Then, each agent updates the estimation by decoding the received information. In this paper, we show that all agents achieve consensus and their estimated variables converge to a critical point in the optimization problem. A numerical example of a nonconvex logistic regression shows that there is a trade-off between the convergence rate of the estimation and the communication bandwidth.
Takahiro OGURA Haiyan WANG Qiyao WANG Atsuki KIUCHI Chetan GUPTA Naoshi UCHIHIRA
We propose a penalty-based and constraint Bayesian optimization methods with an agent-based supply-chain (SC) simulator as a new Monte Carlo optimization approach for multi-echelon inventory management to improve key performance indicators such as inventory cost and sales opportunity loss. First, we formulate the multi-echelon inventory problem and introduce an agent-based SC simulator architecture for the optimization. Second, we define the optimization framework for the formulation. Finally, we discuss the evaluation of the effectiveness of the proposed methods by benchmarking it against the most commonly used genetic algorithm (GA) in simulation-based inventory optimization. Our results indicate that the constraint Bayesian optimization can minimize SC inventory cost with lower sales opportunity loss rates and converge to the optimal solution 22 times faster than GA in the best case.
With the high development of computation requirements in Internet of Things, resource-limited edge servers usually require to cooperate to perform the tasks. Most related studies usually assume a static cooperation approach which might not suit the dynamic environment of edge computing. In this paper, we consider a dynamic cooperation approach by guiding edge servers to form coalitions dynamically. It raises two issues: 1) how to guide them to optimally form coalitions and 2) how to cope with the dynamic feature where server statuses dynamically change as the tasks are performed. The coalitional Markov decision process (CMDP) model proposed in our previous work can handle these issues well. However, its basic solution, coalitional Q-learning, cannot handle the large scale problem when the task number is large in edge computing. Our response is to propose a novel algorithm called deep coalitional Q-learning (DCQL) to solve it. To sum up, we first formulate the dynamic cooperation problem of edge servers as a CMDP: each edge server is regarded as an agent and the dynamic process is modeled as a MDP where the agents observe the current state to formulate several coalitions. Each coalition takes an action to impact the environment which correspondingly transfers to the next state to repeat the above process. Then, we propose DCQL which includes a deep neural network and so can well cope with large scale problem. DCQL can guide the edge servers to form coalitions dynamically with the target of optimizing some goal. Furthermore, we run experiments to verify our proposed algorithm's effectiveness in different settings.
Distributed edge cloud computing is an important computation infrastructure for Internet of Things (IoT) and its task offloading problem has attracted much attention recently. Most existing work on task offloading in distributed edge cloud computing usually assumes that each self-interested user owns one edge server and chooses whether to execute its tasks locally or to offload the tasks to cloud servers. The goal of each edge server is to maximize its own interest like low delay cost, which corresponds to a non-cooperative setting. However, with the strong development of smart IoT communities such as smart hospital and smart factory, all edge and cloud servers can belong to one organization like a technology company. This corresponds to a cooperative setting where the goal of the organization is to maximize the team interest in the overall edge cloud computing system. In this paper, we consider a new problem called cooperative task offloading where all edge servers try to cooperate to make the entire edge cloud computing system achieve good performance such as low delay cost and low energy cost. However, this problem is hard to solve due to two issues: 1) each edge server status dynamically changes and task arrival is uncertain; 2) each edge server can observe only its own status, which makes it hard to optimize team interest as global information is unavailable. For solving these issues, we formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) which can well handle the dynamic features under partial observations. Then, we apply a multi-agent reinforcement learning algorithm called value decomposition network (VDN) and propose a VDN-based task offloading algorithm (VDN-TO) to solve the problem. Specifically, the motivation is that we use a team value function to evaluate the team interest, which is then divided into individual value functions for each edge server. Then, each edge server updates its individual value function in the direction that can maximize the team interest. Finally, we choose a part of a real dataset to evaluate our algorithm and the results show the effectiveness of our algorithm in a comparison with some other existing methods.
Jion HIROSE Junya NAKAMURA Fukuhito OOSHITA Michiko INOUE
We study the gathering problem requiring a team of mobile agents to gather at a single node in arbitrary networks. The team consists of k agents with unique identifiers (IDs), and f of them are weakly Byzantine agents, which behave arbitrarily except falsifying their identifiers. The agents move in synchronous rounds and cannot leave any information on nodes. If the number of nodes n is given to agents, the existing fastest algorithm tolerates any number of weakly Byzantine agents and achieves gathering with simultaneous termination in O(n4·|Λgood|·X(n)) rounds, where |Λgood| is the length of the maximum ID of non-Byzantine agents and X(n) is the number of rounds required to explore any network composed of n nodes. In this paper, we ask the question of whether we can reduce the time complexity if we have a strong team, i.e., a team with a few Byzantine agents, because not so many agents are subject to faults in practice. We give a positive answer to this question by proposing two algorithms in the case where at least 4f2+9f+4 agents exist. Both the algorithms assume that the upper bound N of n is given to agents. The first algorithm achieves gathering with non-simultaneous termination in O((f+|&Lambdagood|)·X(N)) rounds. The second algorithm achieves gathering with simultaneous termination in O((f+|&Lambdaall|)·X(N)) rounds, where |&Lambdaall| is the length of the maximum ID of all agents. The second algorithm significantly reduces the time complexity compared to the existing one if n is given to agents and |&Lambdaall|=O(|&Lambdagood|) holds.
The effect of provision of “Neither-Good-Nor-Bad” (NGNB) information on the perceived trustworthiness of agents has been investigated in previous studies. The experimental results have revealed several conditions under which the provision of NGNB information works effectively to make users perceive greater trust of agents. However, the experiments in question were carried out in a situation in which a user is able to choose, with the agent's advice, one of a limited number of options. In practical problems, we are often at a loss as to which to choose because there are too many possible options and it is not easy to narrow them down. Furthermore, in the above-mentioned previous studies, it was easy to predict the size of profits that a user would obtain because its pattern was also limited. This prompted us, in this paper, to investigate the effect of provision of NGNB information on the users' trust of agents under conditions where it appears to the users that numerous options are available. Our experimental results reveal that an agent that reliably provides NGNB information tends to gain greater user trust in a situation where it appears to the users that there are numerous options and their consequences, and it is not easy to predict the size of profits. However, in contradiction to the previous study, the results in this paper also reveal that stable provision of NGNB information in the context of numerous options is less effective in a situation where it is harder to obtain larger profits.
Takanori HARA Masahiro SASABE Shoji KASAHARA
Traffic congestion in road networks has been studied as the congestion game in game theory. In the existing work, the road usage by each agent was assumed to be static during the whole time horizon of the agent's travel, as in the classical congestion game. This assumption, however, should be reconsidered because each agent sequentially uses roads composing the route. In this paper, we propose a multi-agent distributed route selection scheme based on a gradient descent method considering the time-dependency among agents' road usage for vehicular networks. The proposed scheme first estimates the time-dependent flow on each road by considering the agents' probabilistic occupation under the first-in-first-out (FIFO) policy. Then, it calculates the optimal route choice probability of each route candidate using the gradient descent method and the estimated time-dependent flow. Each agent finally selects one route according to the optimal route choice probabilities. We first prove that the proposed scheme can exponentially converge to the steady-state at the convergence rate inversely proportional to the product of the number of agents and that of individual route candidates. Through simulations under a grid-like network and a real road network, we show that the proposed scheme can improve the actual travel time by 5.1% and 2.5% compared with the conventional static-flow based approach, respectively. In addition, we demonstrate that the proposed scheme is robust against incomplete information sharing among agents, which would be caused by its low penetration ratio or limited transmission range of wireless communications.
This paper proposes a switched pinning control method with a multi-rating mechanism for vehicle platoons. The platoons are expressed as multi-agent systems consisting of mass-damper systems in which pinning agents receive target velocities from external devices (ex. intelligent traffic signals). We construct model predictive control (MPC) algorithm that switches pinning agents via mixed-integer quadratic programmings (MIQP) problems. The optimization rate is determined according to the convergence rate to the target velocities and the inter-vehicular distances. This multi-rating mechanism can reduce the computational load caused by iterative calculation. Numerical results demonstrate that our method has a reduction effect on the string instability by selecting the pinning agents to minimize errors of the inter-vehicular distances to the target distances.
Fanying ZHENG Fu GU Yangjian JI Jianfeng GUO Xinjian GU Jin ZHANG
In the context of Web 2.0, the interaction between users and resources is more and more frequent in the process of resource sharing and consumption. However, the current research on resource pricing mainly focuses on the attributes of the resource itself, and does not weigh the interests of the resource sharing participants. In order to deal with these problems, the pricing mechanism of resource-user interaction evaluation based on multi-agent game theory is established in this paper. Moreover, the user similarity, the evaluation bias based on link analysis and punishment of academic group cheating are also included in the model. Based on the data of 181 scholars and 509 articles from the Wanfang database, this paper conducts 5483 pricing experiments for 13 months, and the results show that this model is more effective than other pricing models - the pricing accuracy of resource resources is 94.2%, and the accuracy of user value evaluation is 96.4%. Besides, this model can intuitively show the relationship within users and within resources. The case study also exhibits that the user's knowledge level is not positively correlated with his or her authority. Discovering and punishing academic group cheating is conducive to objectively evaluating researchers and resources. The pricing mechanism of scientific and technological resources and the users proposed in this paper is the premise of fair trade of scientific and technological resources.
Makoto YAMASHITA Naoki HAYASHI Takeshi HATANAKA Shigemasa TAKAI
This paper investigates a constrained distributed online optimization problem over strongly connected communication networks, where a local cost function of each agent varies in time due to environmental factors. We propose a distributed online projected subgradient method over unbalanced directed networks. The performance of the proposed method is evaluated by a regret which is defined by the error between the cumulative cost over time and the cost of the optimal strategy in hindsight. We show that a logarithmic regret bound can be achieved for strongly convex cost functions. We also demonstrate the validity of the proposed method through a numerical example on distributed estimation over a diffusion field.
Chiharu KATAOKA Osamu KUKIMOTO Yuichiro YOSHIKAWA Kohei OGAWA Hiroshi ISHIGURO
Connected services have been under development in the automotive industry. Meanwhile, the volume of predictive notifications that utilize travel-related data is increasing, and there are concerns that drivers cannot process such an amount of information or do not accept and follow such predictive instructions straightforwardly because the information provided is predicted. In this work, an interactive voice system using two agents is proposed to realize notifications that can easily be accepted by drivers and enhance the reliability of the system by adding contextual information. An experiment was performed using a driving simulator to compare the following three forms of notifications: (1) notification with no contextual information, (2) notification with contextual information using one agent, and (3) notification with contextual information using two agents. The notification content was limited to probable near-miss incidents. The results of the experiment indicate that the driver may decelerate more with the one- and two-agent notification methods than with the conventional notification method. The degree of deceleration depended the number of times the notification was provided and whether there were cars parked on the streets.
Makoto YAMASHITA Naoki HAYASHI Shigemasa TAKAI
This paper considers a distributed subgradient method for online optimization with event-triggered communication over multi-agent networks. At each step, each agent obtains a time-varying private convex cost function. To cooperatively minimize the global cost function, these agents need to communicate each other. The communication with neighbor agents is conducted by the event-triggered method that can reduce the number of communications. We demonstrate that the proposed online algorithm achieves a sublinear regret bound in a dynamic environment with slow dynamics.
Ryohei KAWATA Katsuhide FUJITA
Multi-time negotiation which repeats negotiations many times under the same conditions is an important class of automated negotiation. We propose a meta-strategy that selects an agent's individual negotiation strategy for multi-time negotiation. Because the performance of the negotiating agents depends on situational parameters, such as the negotiation domains and the opponents, a suitable and effective individual strategy should be selected according to the negotiation situation. However, most existing agents negotiate based on only one negotiation policy: one bidding strategy, one acceptance strategy, and one opponent modeling method. Although the existing agents effectively negotiate in most situations, they do not work well in particular situations and their utilities are decreased. The proposed meta-strategy provides an effective negotiation strategy for the situation at the beginning of the negotiation. We model the meta-strategy as a multi-armed bandit problem that regards an individual negotiation strategy as a slot machine and utility of the agent as a reward. We implement the meta-strategy as the negotiating agents that use existing effective agents as the individual strategies. The experimental results demonstrate the effectiveness of our meta-strategy under various negotiation conditions. Additionally, the results indicate that the individual utilities of negotiating agents are influenced by the opponents' strategies, the profiles of the opponent and its own profiles.
Yuta HOSOKAWA Katsuhide FUJITA
In recent years, agreement technologies have garnered interest among agents in the field of multi-agent systems. Automated negotiation is one of the agreement technologies, in which agents negotiate with each other to make an agreement so that they can solve conflicts between their preferences. Although most agents keep their own preferences private, it is necessary to estimate the opponent's preferences to obtain a better agreement. Therefore, opponent modeling is one of the most important elements in automated negotiating strategy. A frequency model is widely used for opponent modeling because of its robustness against various types of strategy while being easy to implement. However, existing frequency models do not consider the opponent's proposal speed and the transition of offers. This study proposes a novel frequency model that considers the opponent's behavior using two main elements: the offer ratio and the weighting function. The offer ratio stabilizes the model against changes in the opponent's offering speed, whereas the weighting function takes the opponent's concession into account. The two experiments conducted herein show that our proposed model is more accurate than other frequency models. Additionally, we find that the agent with the proposed model performs with a significantly higher utility value in negotiations.
Masashi TSUCHIDA Fukuhito OOSHITA Michiko INOUE
We propose two algorithms for the gathering of k mobile agents in asynchronous Byzantine environments. For both algorithms, we assume that graph topology is arbitrary, each node is equipped with an authenticated whiteboard, agents have unique IDs, and at most f weakly Byzantine agents exist. Here, a weakly Byzantine agent can make arbitrary behavior except falsifying its ID. Under these assumptions, the first algorithm achieves a gathering without termination detection in O(m+fn) moves per agent (m is the number of edges and n is the number of nodes). The second algorithm achieves a gathering with termination detection in O(m+fn) moves per agent by additionally assuming that agents on the same node are synchronized, $f
Takuma WAKASA Yoshiki NAGATANI Kenji SAWADA Seiichi SHIN
This paper considers a velocity control problem for merging and splitting maneuvers of vehicle platoons. In this paper, an external device sends velocity commands to some vehicles in the platoon, and the others adjust their velocities autonomously. The former is pinning control, and the latter is consensus control in multi-agent control. We propose a switched pinning control algorithm. Our algorithm consists of three sub-methods. The first is an optimal switching method of pinning agents based on an MLD (Mixed Logical Dynamical) system model and MPC (Model Predictive Control). The second is a representation method for dynamical platoon formation with merging and splitting maneuver. The platoon formation follows the positional relation between vehicles or the formation demand from the external device. The third is a switching reduction method by setting a cost function that penalizes the switching of the pinning agents in the steady-state. Our proposed algorithm enables us to improve the consensus speed. Moreover, our algorithm can regroup the platoons to the arbitrary platoons and control the velocities of the multiple vehicle platoons to each target value.
This paper presents a compromising strategy based on constraint relaxation for automated negotiating agents in the nonlinear utility domain. Automated negotiating agents have been studied widely and are one of the key technologies for a future society in which multiple heterogeneous agents act collaboratively and competitively in order to help humans perform daily activities. A pressing issue is that most of the proposed negotiating agents utilize an ad-hoc compromising process, in which they basically just adjust/reduce a threshold to forcibly accept their opponents' offers. Because the threshold is just reduced and the agent just accepts the offer since the value is more than the threshold, it is very difficult to show how and what the agent conceded even after an agreement has been reached. To address this issue, we describe an explainable concession process using a constraint relaxation process. In this process, an agent changes its belief by relaxing constraints, i.e., removing constraints, so that it can accept it is the opponent's offer. We also propose three types of compromising strategies. Experimental results demonstrate that these strategies are efficient.