Keyword Search Result

[Keyword] POMDP (6 hits)

Results 1-6 of 6
  • A POMDP-Based Approach to Assortment Optimization Problem for Vending Machine (Open Access)

    Gaku NEMOTO  Kunihiko HIRAISHI  

     
    PAPER-Mathematical Systems Science

    Publicized: 2023/09/05
    Vol: E107-A No:6
    Page(s): 909-918

    Assortment optimization is one of the main problems for retailers and has been widely studied. In this paper, we focus on vending machines, which present many characteristic issues to be considered. We first formulate an assortment optimization problem for vending machines, then propose a model that represents consumers' decision making, and finally present a solution method based on a partially observable Markov decision process (POMDP). The problem involves incomplete state observation, stochastic consumer behavior, and policy decisions that maximize future expected rewards. Using computer simulation, we observe that sales increase compared to those obtained by heuristic methods under the same conditions. Moreover, sales approach the theoretical upper bound.
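
    The abstract does not give the model's details, but every POMDP solution method rests on the same Bayesian belief update over hidden states. Below is a minimal sketch of that update in Python; the two-state demand model, the matrices T and O, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] * b(s).

    belief: current distribution over hidden states, shape (S,)
    T[a, s, s'] = P(s' | s, a);  O[a, s', o] = P(o | s', a)
    """
    predicted = belief @ T[action]                   # predict: sum_s b(s) P(s'|s,a)
    unnormalized = O[action][:, observation] * predicted
    return unnormalized / unnormalized.sum()         # correct and renormalize

# Tiny illustrative model: two hidden demand states, two actions, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],             # action 0
              [[0.7, 0.3], [0.4, 0.6]]])            # action 1
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
print(belief_update(b, action=0, observation=1, T=T, O=O))
```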

  • Energy-Efficient and Throughput Maximization Scheme for Sensor-Aided Cognitive Radio Networks

    Hiep VU-VAN  Insoo KOO  

     
    PAPER

    Vol: E98-B No:10
    Page(s): 1996-2003

    A cognitive radio user (CU) can get assistance from sensor nodes (SNs) to perform spectrum sensing. However, the SNs are often powered by finite-capacity batteries that can sustain their operation for only a short time, so the energy efficiency of the SNs becomes a crucial problem. In this paper, each SN is considered to be a device with an energy harvester that can harvest energy from a non-radio-frequency (non-RF) energy source while performing other actions concurrently. In any one time slot, in order to maintain the required sensing accuracy of the CR network and to conserve energy in the SNs, only a small number of SNs are required to sense the primary user (PU) signal, while the other SNs are kept silent to save energy. To this end, an algorithm is proposed that divides all SNs into groups, each of which can satisfy the required sensing accuracy of the network. In a time slot, each SN group can be assigned one of two actions: stay silent, or be active and perform sensing. The problem of determining the optimal action for all SN groups so as to maximize the throughput of the CR network is formulated as a partially observable Markov decision process (POMDP), in which the effect of the current time slot's action on the throughput of future time slots is taken into account. The solution to the problem, that is, the decision mode of each SN group (active or silent), depends on the residual energy and the belief about the absence probability of the PU signal. Simulation results show that the proposed scheme can improve the energy efficiency of CR networks compared with other conventional schemes.
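
    The abstract states that each group's active/silent decision depends on residual energy and the belief that the PU is absent. As a hedged illustration only, here is a myopic threshold policy in that spirit; the field names, the uncertainty heuristic, and the energy check are assumptions, not the paper's POMDP solution.

```python
def choose_group_actions(groups, belief_pu_absent, sensing_cost, min_active=1):
    """Myopic sketch: activate more SN groups when the PU state is uncertain,
    preferring groups with the most residual battery energy.

    groups: list of dicts with hypothetical fields 'id' and 'residual_energy'.
    belief_pu_absent: current belief that the PU signal is absent, in [0, 1].
    """
    # Uncertainty peaks at belief 0.5; sense more aggressively near it.
    uncertainty = 1.0 - abs(2.0 * belief_pu_absent - 1.0)
    n_active = max(min_active, round(uncertainty * len(groups)))
    ranked = sorted(groups, key=lambda g: g['residual_energy'], reverse=True)
    return {g['id']: 'active' if i < n_active and g['residual_energy'] >= sensing_cost
            else 'silent'
            for i, g in enumerate(ranked)}

groups = [{'id': k, 'residual_energy': e} for k, e in enumerate([5.0, 0.2, 3.1])]
print(choose_group_actions(groups, belief_pu_absent=0.55, sensing_cost=0.5))
```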

  • Optimal Channel-Sensing Scheme for Cognitive Radio Systems Based on Fuzzy Q-Learning

    Fereidoun H. PANAHI  Tomoaki OHTSUKI  

     
    PAPER

    Vol: E97-B No:2
    Page(s): 283-294

    In a cognitive radio (CR) network, the channel-sensing scheme used to detect the presence of a primary user (PU) directly affects the performance of both the CR and the PU. In practical systems, however, the CR is prone to sensing errors due to the inefficiency of the sensing scheme, which may cause interference to the primary user and degrade system performance. In this paper, we present a learning-based scheme for channel sensing in CR networks. Specifically, we formulate the channel-sensing problem as a partially observable Markov decision process (POMDP), in which the most likely channel state is derived by a learning process called Fuzzy Q-Learning (FQL). The optimal policy is derived by solving this problem. Simulation results show the effectiveness and efficiency of our proposed scheme.
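
    The abstract names Fuzzy Q-Learning but not its internals. A common generic FQL construction keeps a rule base of fuzzy sets over the input and a q-value table per rule, updating each rule in proportion to its activation. The sketch below follows that generic construction; the triangular memberships, rule count, and learning constants are assumptions, not the authors' design.

```python
import numpy as np

def memberships(x, centers, width=0.5):
    """Triangular fuzzy membership of scalar input x in each rule's fuzzy set."""
    m = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return m / (m.sum() + 1e-12)                        # normalized rule activations

class FuzzyQLearner:
    """Minimal generic Fuzzy Q-Learning sketch over a 1-D input (e.g. a belief)."""
    def __init__(self, n_rules=5, n_actions=2, alpha=0.1, gamma=0.9):
        self.centers = np.linspace(0.0, 1.0, n_rules)   # fuzzy set centers per rule
        self.q = np.zeros((n_rules, n_actions))         # q-values per rule and action
        self.alpha, self.gamma = alpha, gamma

    def act(self, x):
        mu = memberships(x, self.centers)
        return int(np.argmax(mu @ self.q))              # fuzzy-weighted greedy action

    def update(self, x, a, r, x_next):
        mu = memberships(x, self.centers)
        mu_next = memberships(x_next, self.centers)
        target = r + self.gamma * np.max(mu_next @ self.q)
        td = target - mu @ self.q[:, a]
        self.q[:, a] += self.alpha * mu * td            # credit rules by activation

learner = FuzzyQLearner()
a = learner.act(0.3)
learner.update(0.3, a, r=1.0, x_next=0.4)
```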

  • A POMDP Based Distributed Adaptive Opportunistic Spectrum Access Strategy for Cognitive Ad Hoc Networks

    Yichen WANG  Pinyi REN  Zhou SU  

     
    LETTER

    Vol: E94-B No:6
    Page(s): 1621-1624

    In this letter, we propose a Partially Observable Markov Decision Process (POMDP) based Distributed Adaptive Opportunistic Spectrum Access (DA-OSA) strategy for Cognitive Ad Hoc Networks (CAHNs). In each slot, the source and destination choose a set of channels to sense and then decide the transmission channels based on the sensing results. In order to maximize the throughput of each link, we use the theories of sequential decision making and optimal stopping to determine the optimal set of channels to sense. Moreover, we establish the myopic policy and exploit the monotonicity of the reward function, which can be used to reduce the complexity of the sequential decision problem.
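
    The letter's stopping problem can be illustrated with a simple myopic rule: keep sensing while the expected throughput of probing one more channel exceeds the throughput of transmitting immediately. The sketch below is a hypothetical rendering of that trade-off; the parameters and the linear throughput model are assumptions, not the letter's actual reward function.

```python
def should_stop(channels_found, slot_remaining, sense_time,
                rate_per_channel, p_idle, max_channels):
    """Myopic stopping rule: stop sensing once the expected throughput of probing
    one more channel no longer beats transmitting right now. All parameters and
    the linear throughput model are illustrative assumptions."""
    if channels_found >= max_channels or slot_remaining <= sense_time:
        return True
    # Throughput if we transmit immediately on the idle channels already found.
    transmit_now = channels_found * rate_per_channel * slot_remaining
    # Expected throughput if we first spend sense_time probing one more channel.
    sense_more = (channels_found + p_idle) * rate_per_channel * (slot_remaining - sense_time)
    return transmit_now >= sense_more

# Keep sensing: one idle channel found, plenty of slot time left.
print(should_stop(channels_found=1, slot_remaining=10.0, sense_time=1.0,
                  rate_per_channel=1.0, p_idle=0.6, max_channels=3))   # False
```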

  • Constructing a Multilayered Boundary to Defend against Intrusive Anomalies

    Zonghua ZHANG  Hong SHEN  

     
    PAPER-Application Information Security

    Vol: E90-D No:2
    Page(s): 490-499

    We propose a model for constructing a multilayered boundary in an information system to defend against intrusive anomalies by correlating a number of parametric anomaly detectors. The formulation is based on two observations: first, anomaly detectors differ in their detection coverage, or blind spots; second, the operating environments of the anomaly detectors reveal different information about system anomalies. The correlation among observation-specific anomaly detectors is first formulated as a Partially Observable Markov Decision Process, and a policy-gradient reinforcement learning algorithm is then developed to search for an optimal cooperation, with the practical objectives of broader overall detection coverage and fewer false alerts. A host-based experimental scenario is developed to illustrate the principle of the model and to demonstrate its performance.
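
    As a rough illustration of the policy-gradient step described above, the sketch below learns Bernoulli gates over a set of detectors with a REINFORCE-style update; the toy reward in simulate_episode and all constants are invented for illustration and do not reproduce the paper's detectors or environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_episode(gates):
    """Toy stand-in reward: detectors 0 and 2 widen coverage, 1 and 3 only add
    false alerts -- purely invented numbers for illustration."""
    coverage = gates[0] + gates[2]
    false_alerts = gates[1] + gates[3]
    return coverage - 0.5 * false_alerts

def reinforce_detector_weights(episodes=2000, n_detectors=4, alpha=0.05):
    """REINFORCE-style search over which detectors to consult: each detector has
    a Bernoulli gate, and gate logits move along reward * grad log-probability."""
    theta = np.zeros(n_detectors)                    # gate logits
    for _ in range(episodes):
        p = 1.0 / (1.0 + np.exp(-theta))             # P(consult detector i)
        gates = (rng.random(n_detectors) < p).astype(float)
        reward = simulate_episode(gates)
        theta += alpha * reward * (gates - p)        # Bernoulli score function
    return theta

print(reinforce_detector_weights())                  # logits for detectors 0, 2 rise
```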

  • Labeling Q-Learning in POMDP Environments

    Haeyeon LEE  Hiroyuki KAMAYA  Kenichi ABE  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E85-D No:9
    Page(s): 1425-1432

    This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," for solving partially observable Markov decision process (POMDP) problems. Hierarchical RL methods have recently been widely studied, but they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when it is not necessary. Our LQ-learning, by contrast, has no hierarchical structure; instead, it adopts a new type of internal memory mechanism. In LQ-learning, the agent perceives the current state as a pair consisting of an observation and its label, which lets the agent distinguish states that look identical but are in fact different. That is, at each step t, we define a new type of perception of the environment, õ_t = (o_t, θ_t), where o_t is the conventional observation and θ_t is the label attached to it. The classical RL algorithm is then used as if the pair (o_t, θ_t) were a Markov state. The labeling is carried out by a Boolean variable called "CHANGE" and a hash-like or mod function called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to maze problems in Grid-Worlds, which are used in much of the literature as simulated POMDP environments. Using LQ-learning, we can solve the maze problems without initial knowledge of the environment.
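
    The abstract is concrete enough to sketch the mechanism: run tabular Q-learning on the augmented state (o_t, θ_t), bumping the label θ via a mod function whenever the CHANGE flag fires. The sketch below is one plausible reading; the CHANGE heuristic (same observation recurring) and the labeling rule are guesses, since the paper's exact Labeling Function is not given in the abstract.

```python
from collections import defaultdict
import random

class LQLearner:
    """Tabular Q-learning over labeled observations (o_t, theta_t). The CHANGE
    heuristic and the mod-based labeling rule below are guesses at the paper's
    Labeling Function, made only for illustration."""
    def __init__(self, n_actions, max_labels=4, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.max_labels = n_actions, max_labels
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.theta, self.prev_obs = 0, None

    def act(self, obs):
        # CHANGE: the same observation recurring suggests perceptual aliasing,
        # so bump the label (mod max_labels) to split the aliased states apart.
        if obs == self.prev_obs:
            self.theta = (self.theta + 1) % self.max_labels
        self.prev_obs = obs
        state = (obs, self.theta)                    # the augmented "Markov" state
        if random.random() < self.eps:
            return state, random.randrange(self.n_actions)
        return state, max(range(self.n_actions), key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state):
        td = reward + self.gamma * max(self.q[next_state]) - self.q[state][action]
        self.q[state][action] += self.alpha * td     # standard Q-learning backup
```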
