IEICE globals.ieice.org Site

Author Search Result

[Author] TaeChoong CHUNG(2hit)

Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks
Ngo Anh VIEN Nguyen Hoang VIET SeungGwan LEE TaeChoong CHUNG

PAPER-Network

Vol: E92-B No:6
Page(s): 2008-2022
In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls with different values and different resource requirements. Maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov decision process (SMDP), but the resulting problem is too complex to solve exactly. We therefore propose a policy gradient algorithm, combined with a decomposition approach, to find the optimal dynamic (state-dependent) CAC and routing policy within a parameterized policy space. To implement the gradient algorithm, we approximate the gradient of the average reward. We then present a simulation-based algorithm, called GSMDP, that estimates this approximate gradient using only a single sample path of the underlying Markov chain of the SMDP for the CAC and routing problem. The algorithm improves convergence speed, rejection probability, robustness to changing arrival statistics, and overall average revenue received. Simulation experiments compare our method with existing methods and demonstrate its robustness.
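The idea of estimating an average-reward gradient from a single sample path can be illustrated with a minimal sketch. This is not the GSMDP algorithm from the paper, whose details the abstract does not give; it is a generic score-function estimator with a discounted eligibility trace, applied to a hypothetical single-link, single-class admission problem. The function name `estimate_gradient`, the sigmoid admission policy, the discount factor `beta`, and all rates are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_gradient(theta, capacity=3, arrival_rate=2.0, service_rate=1.0,
                      beta=0.9, steps=20000, seed=0):
    """Estimate d(reward per unit time)/d(theta) from one simulated path.

    Assumed toy model: Poisson arrivals, exponential holding times, and a
    policy that admits a call in occupancy n with probability sigmoid(theta[n]).
    Each admitted call earns a reward of 1.
    """
    rng = random.Random(seed)
    n = 0                              # calls currently in service
    z = [0.0] * capacity               # discounted trace of policy scores
    sum_r = sum_tau = 0.0              # total reward and total elapsed time
    sum_rz = [0.0] * capacity          # running sum of reward * trace
    sum_tz = [0.0] * capacity          # running sum of sojourn time * trace

    for _ in range(steps):
        total_rate = arrival_rate + n * service_rate
        tau = rng.expovariate(total_rate)      # time until the next event
        reward = 0.0
        score = [0.0] * capacity
        if rng.random() < arrival_rate / total_rate:   # event is an arrival
            if n < capacity:
                p = sigmoid(theta[n])
                if rng.random() < p:
                    score[n] = 1.0 - p         # d log p / d theta[n]
                    reward = 1.0
                    n += 1
                else:
                    score[n] = -p              # d log(1 - p) / d theta[n]
        else:
            n -= 1                             # event is a departure

        # Update the trace, then credit this step's reward and sojourn time.
        z = [beta * zi + si for zi, si in zip(z, score)]
        sum_r += reward
        sum_tau += tau
        for i in range(capacity):
            sum_rz[i] += reward * z[i]
            sum_tz[i] += tau * z[i]

    eta = sum_r / sum_tau                      # estimated reward per unit time
    # Quotient rule applied to (mean reward) / (mean sojourn time).
    grad = [(sum_rz[i] - eta * sum_tz[i]) / sum_tau for i in range(capacity)]
    return eta, grad
```

A caller could use the returned estimate in a plain gradient-ascent loop, for example `theta[i] += step_size * grad[i]`, re-simulating after each update. Because the trace is discounted by `beta`, the result is a biased approximation of the true gradient; values of `beta` closer to 1 reduce the bias at the cost of higher variance.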
Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
Ngo Anh VIEN SeungGwan LEE TaeChoong CHUNG

PAPER

Vol: E93-D No:2
Page(s): 271-279
In previous work, we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward and then proposed a simulation-based algorithm, called GSMDP, to estimate this approximate gradient using only a single sample path of the underlying Markov chain. GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.
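The two error terms can be written out as follows. The notation is assumed for illustration and does not come from the paper: $\nabla\eta$ is the true gradient of the average reward, $\widetilde{\nabla}\eta$ is the approximate gradient, $\Delta_T$ is the output of the algorithm after $T$ steps, and $\Delta_\infty$ is its asymptotic output. If the asymptotic output coincides with the approximate gradient, the triangle inequality splits the total error into the two parts the abstract describes.

```latex
\begin{align*}
  \text{approximation error:} \quad
    & \bigl\| \nabla\eta - \widetilde{\nabla}\eta \bigr\| \\
  \text{estimation error:} \quad
    & \bigl\| \Delta_T - \Delta_\infty \bigr\| \\
  \text{total error, assuming } \Delta_\infty = \widetilde{\nabla}\eta\text{:} \quad
    & \bigl\| \Delta_T - \nabla\eta \bigr\|
      \le \bigl\| \Delta_T - \Delta_\infty \bigr\|
        + \bigl\| \widetilde{\nabla}\eta - \nabla\eta \bigr\|
\end{align*}
```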
