This paper presents an inertial estimator learning automata scheme in which both the short-term and the long-term perspectives of the environment are incorporated in the stochastic estimator: the long-term information is crystallized in the running reward-probability estimates, and the short-term information is captured by whether the most recent response was a reward or a penalty. It is this short-term perspective that makes the estimator stochastic in the context of estimator algorithms. The proposed automata employ an inertial weight estimator as the short-term perspective to achieve rapid and accurate convergence when operating in stationary random environments. Under the proposed inertial estimator scheme, the estimates of the actions' reward probabilities are influenced by the last response from the environment. In this way, actions that have recently received positive responses have the opportunity to be estimated as “optimal”, to increase their choice probability, and consequently to be selected. The estimates thereby become more reliable, and the automaton converges rapidly and accurately to the optimal action. The asymptotic behavior of the proposed scheme is analyzed, and it is proved to be ε-optimal in every stationary random environment. Extensive simulation results indicate that the proposed algorithm converges faster than the traditional stochastic-estimator-based SERI scheme and the deterministic-estimator-based DGPA and DPRI schemes when operating in stationary random environments.
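The abstract describes the scheme only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea rather than the authors' algorithm: a pursuit-style estimator learning automaton whose long-term running reward-probability estimates are perturbed by a short-term inertial boost for the action most recently rewarded. The class name, the inertial weight w, the resolution step delta, and the exact update rules are all illustrative assumptions.

import random

class InertialEstimatorLA:
    """Illustrative estimator learning automaton (assumed, simplified design)."""

    def __init__(self, n_actions, delta=0.01, w=0.2):
        self.n = n_actions
        self.delta = delta                      # step size for probability updates
        self.w = w                              # inertial weight for the latest reward
        self.p = [1.0 / n_actions] * n_actions  # action-selection probabilities
        self.rewards = [0] * n_actions          # rewards observed per action
        self.counts = [0] * n_actions           # times each action was selected

    def select(self):
        # Sample an action according to the current probability vector.
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r <= acc:
                return i
        return self.n - 1

    def update(self, action, rewarded):
        # Long-term view: running estimate of each action's reward probability.
        self.counts[action] += 1
        self.rewards[action] += 1 if rewarded else 0
        est = [self.rewards[i] / self.counts[i] if self.counts[i] else 0.0
               for i in range(self.n)]

        # Short-term view: inertial boost for the action just rewarded.
        if rewarded:
            est[action] = min(1.0, est[action] + self.w)

        # Pursuit-style update: move probability mass toward the best estimate.
        best = max(range(self.n), key=lambda i: est[i])
        for i in range(self.n):
            if i == best:
                self.p[i] = min(1.0, self.p[i] + self.delta * (self.n - 1))
            else:
                self.p[i] = max(0.0, self.p[i] - self.delta)
        total = sum(self.p)
        self.p = [pi / total for pi in self.p]

# Toy stationary environment: action i is rewarded with probability q[i].
q = [0.65, 0.5, 0.4]
la = InertialEstimatorLA(len(q))
for _ in range(5000):
    a = la.select()
    la.update(a, random.random() < q[a])
print(la.p)  # probability mass should concentrate on action 0

In this sketch the inertial term only biases which action is currently pursued; the running estimates themselves remain unbiased, which mirrors the abstract's separation of short-term and long-term information.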
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Junqi ZHANG, Lina NI, Chen XIE, Shangce GAO, Zheng TANG, "Inertial Estimator Learning Automata," in IEICE TRANSACTIONS on Fundamentals,
vol. E95-A, no. 6, pp. 1041-1048, June 2012, doi: 10.1587/transfun.E95.A.1041.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E95.A.1041/_p
@ARTICLE{e95-a_6_1041,
author={Junqi ZHANG and Lina NI and Chen XIE and Shangce GAO and Zheng TANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Inertial Estimator Learning Automata},
year={2012},
volume={E95-A},
number={6},
pages={1041-1048},
keywords={},
doi={10.1587/transfun.E95.A.1041},
ISSN={1745-1337},
month={June},}
TY - JOUR
TI - Inertial Estimator Learning Automata
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1041
EP - 1048
AU - Junqi ZHANG
AU - Lina NI
AU - Chen XIE
AU - Shangce GAO
AU - Zheng TANG
PY - 2012
DO - 10.1587/transfun.E95.A.1041
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E95-A
IS - 6
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - June 2012
ER -