Takashi YOKOTA Kanemitsu OOTSU Shun KOJIMA
An interconnection network is an indispensable component of a parallel computer. It connects computation nodes so that the nodes can communicate with each other. Since parallel computation essentially requires inter-node communication according to a parallel algorithm, the interconnection network plays an important role in communication performance. This paper focuses on collective communication, which is frequently performed in parallel computation, and addresses the Cup-Stacking method proposed in our preceding work. The key ideas of the method are to split a large packet into slices, reshape each slice, and stack the slices in a genetic algorithm (GA) manner. This paper extends the Cup-Stacking method by introducing additional items (genes) and proposes the extended Cup-Stacking method. Furthermore, it comprehensively discusses the drawbacks of the method and its further optimization. Evaluation results reveal the effectiveness of the extended method, which improves duration time by up to seven percent over the former Cup-Stacking method.
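To make the GA framing concrete, the following minimal Python sketch encodes one candidate slicing/stacking plan as a chromosome and evolves a small population; the gene layout, fitness placeholder, and parameters are illustrative assumptions, not the paper's actual Cup-Stacking implementation.

```python
# Hypothetical GA-style encoding for slicing and stacking a large
# collective-communication packet; names and fitness are illustrative.
import random

def random_individual(num_slices):
    # Each gene describes one slice: (shape index, injection offset),
    # plus the order in which the slices are "stacked" (scheduled).
    genes = [(random.randrange(4), random.randrange(8)) for _ in range(num_slices)]
    order = random.sample(range(num_slices), num_slices)
    return {"genes": genes, "order": order}

def fitness(ind):
    # Placeholder: a real evaluation would simulate the network and
    # return the negative duration time of the collective communication.
    return -sum(offset for _, offset in ind["genes"])

def mutate(ind):
    i = random.randrange(len(ind["genes"]))
    ind["genes"][i] = (random.randrange(4), random.randrange(8))
    random.shuffle(ind["order"])

population = [random_individual(num_slices=8) for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # elitist selection
    children = [dict(genes=list(p["genes"]), order=list(p["order"]))
                for p in random.choices(survivors, k=10)]
    for child in children:
        mutate(child)
    population = survivors + children

best = max(population, key=fitness)
```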
Sungbok LEE Jaehyun PARK Jonghyeok LEE
In this paper, we consider wireless powered sensor networks, in which an energy access point (EAP) transmits energy packets to the sensor nodes and the sensor nodes then send their sensing data to an information access point (IAP) using the harvested energy. Because the sensor nodes have a limited information queue (data storage) and energy queue (battery), energy packet/data packet scheduling is important. Accordingly, the energy packet/data packet transmission periods are jointly optimized to reduce the total energy required to support the sensor network while avoiding sensing data loss. Furthermore, the optimal location of the EAP for an energy-efficient wireless powered sensor network is identified analytically. Computer simulations demonstrate the performance of the proposed packet scheduling and deployment policy.
Takashi YOKOTA Kanemitsu OOTSU Takeshi OHKAWA
The interconnection network is one of the essential components of a parallel computer, since it is responsible for the communication capability of the system. It affects the system-level performance as well as the physical and logical structure of the system. Although many studies have been reported to enhance interconnection network technology, many issues remain to be discussed. One of the most important issues is congestion management. In an interconnection network, many packets are transferred simultaneously and interfere with each other. Congestion arises as a result of this interference; it spreads quickly, seriously degrades communication performance, and persists for a long time. Thus, the network should be controlled appropriately to suppress congestion and maintain maximum performance. Many studies address the problem and present effective methods; however, the maximum performance achievable in an ideal situation has not been sufficiently clarified. Determining this ideal performance is, in general, an NP-hard problem. This paper introduces the particle swarm optimization (PSO) methodology to overcome the problem. We first formalize the optimization problem in a form suitable for the PSO method and present a simple PSO application as a naive model. Then, we discuss reducing the size of the search space and introduce three practical variations of the PSO computation model: the repetitive, expansion, and coding models. We furthermore introduce some non-PSO methods for comparison. Our evaluation results reveal the high potential of the PSO method: the repetitive and expansion models accelerate collective communication significantly, by up to 1.72 times compared with the bursty communication condition.
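As a rough illustration of how a PSO search over network-control settings could be organized, the following minimal Python loop evolves particle positions toward the best-known solution; the objective function is a placeholder and all parameters are assumptions rather than the paper's models.

```python
# Minimal particle swarm optimization (PSO) loop; the objective is a stand-in
# for evaluating a candidate network-control vector (to be minimized).
import random

DIM, SWARM, ITERS = 16, 30, 100
W, C1, C2 = 0.7, 1.5, 1.5          # inertia and acceleration coefficients

def objective(x):
    # Placeholder: a real objective would return the simulated collective
    # communication time under the control vector x.
    return sum(v * v for v in x)

particles = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(SWARM)]
velocities = [[0.0] * DIM for _ in range(SWARM)]
pbest = [list(p) for p in particles]           # per-particle best positions
gbest = min(pbest, key=objective)              # global best position

for _ in range(ITERS):
    for i, p in enumerate(particles):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (W * velocities[i][d]
                                + C1 * r1 * (pbest[i][d] - p[d])
                                + C2 * r2 * (gbest[d] - p[d]))
            p[d] += velocities[i][d]
        if objective(p) < objective(pbest[i]):
            pbest[i] = list(p)
    gbest = min(pbest, key=objective)
```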
Runzi ZHANG Jinlin WANG Yiqiang SHENG Xiao CHEN Xiaozhou YE
Cache affinity has been proved to have a great impact on the performance of packet processing applications on multi-core platforms. Flow-based packet scheduling can make the best of data cache affinity with flow-associated data and context structures. However, little work on packet scheduling algorithms has addressed instruction cache (I-Cache) affinity in the modified pipelining (MPL) architecture for multi-core systems. In this paper, we propose a protocol-aware packet scheduling (PAPS) algorithm aiming at maximizing I-Cache affinity at protocol-dependent stages in the MPL architecture for the multi-protocol processing (MPP) scenario. The characteristics of applications in MPL are analyzed and a mapping model is introduced to illustrate the procedure of MPP. In addition, a stage processing time model for MPL is presented based on an analysis of the multi-core cache hierarchy. PAPS is a flow-based packet scheduling algorithm that schedules flows considering both the application-level protocol of each flow and load balancing. Experiments demonstrate that PAPS outperforms the Round Robin algorithm and the HRW-based (HRW) algorithm for MPP applications. In particular, PAPS eliminates all I-Cache misses at the protocol-dependent stage and reduces the average CPU cycle consumption per packet by more than 10% compared with HRW.
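The following simplified Python sketch illustrates the general idea of protocol-aware flow-to-core assignment under a load-balance bound; the data structures, threshold, and migration rule are assumptions for illustration and do not reproduce the paper's PAPS algorithm.

```python
# Flows of the same application-level protocol are preferentially mapped to
# the same core (to keep its I-Cache warm) unless that core becomes much more
# loaded than the least-loaded core, in which case the flow migrates.
from collections import defaultdict

cores = [0, 1, 2, 3]
core_load = defaultdict(int)          # packets queued per core
protocol_home = {}                    # protocol -> preferred core
LOAD_IMBALANCE_LIMIT = 32             # illustrative threshold

def schedule(flow_id, protocol):
    least_loaded = min(cores, key=core_load.__getitem__)
    preferred = protocol_home.setdefault(protocol, least_loaded)
    target = preferred
    if core_load[preferred] - core_load[least_loaded] > LOAD_IMBALANCE_LIMIT:
        target = least_loaded                 # rebalance the protocol's home
        protocol_home[protocol] = least_loaded
    core_load[target] += 1
    return target
```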
Zul Azri BIN MUHAMAD NOH Takahiro SUZUKI Shuji TASAKA
This paper proposes a cross-layer packet scheduling scheme for QoS support in audio-video transmission with IEEE 802.11e HCCA and assesses the application-level QoS and QoE of the scheme under lossy channel conditions. In the proposed scheme, the access point (AP) basically allocates a transmission opportunity (TXOP) to each station in a service interval (SI), like the reference scheduler of the IEEE 802.11e standard, which is referred to as the TGe scheme in this paper. In the proposed scheme, however, the AP calculates the number of MAC service data units (MSDUs) that arrive in an SI, considering the inter-arrival times of audio samples and video frames, which are referred to as media units (MUs), at the application layer. The AP then gives additional TXOP duration in the SI to stations that had audio or video MAC protocol data units (MPDUs) in their source buffers at the end of the previous TXOP. In addition, utilizing video frame information from the application layer, we propose video frame skipping at the MAC level of a source station: if a station fails to transmit a video MPDU, it drops all the following video MPDUs in the source buffer until the next intra-coded frame comes to the head of the buffer. We compare the reference scheduler (TGe scheme) and the proposed packet scheduling scheme with and without video frame skipping at the source in terms of application-level QoS and QoE, and discuss the effectiveness of the proposed scheme from the viewpoint of QoE as well as QoS. Numerical results reveal that the proposed packet scheduling scheme achieves higher quality than the TGe scheme under lossy channel conditions. We also show that the proposed scheme can further improve QoS and QoE by using video frame skipping at the source. Furthermore, we examine the effect of the SI on the QoS and QoE of the proposed scheme and find that the appropriate value of the SI equals the inter-arrival time of video frames.
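A minimal Python sketch of the MAC-level frame-skipping rule described above is given below; the field names and queue representation are assumptions for illustration.

```python
# After a failed video MPDU transmission, queued video MPDUs are discarded
# until an MPDU carrying an intra-coded (I) frame reaches the head of the
# source buffer.
from collections import deque

def skip_until_next_i_frame(tx_queue: deque):
    """Drop video MPDUs from the head of the buffer until the next I-frame
    MPDU is found (or the queue is empty); return the number dropped."""
    dropped = 0
    while tx_queue and tx_queue[0]["media"] == "video" and tx_queue[0]["frame_type"] != "I":
        tx_queue.popleft()
        dropped += 1
    return dropped

queue = deque([
    {"media": "video", "frame_type": "P"},
    {"media": "video", "frame_type": "P"},
    {"media": "video", "frame_type": "I"},
    {"media": "video", "frame_type": "P"},
])
print(skip_until_next_i_frame(queue))   # -> 2; the I frame stays at the head
```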
Sungho HWANG Jeongsik PARK Ho-Shin CHO
In this paper, an efficient radio resource allocation scheme for OFDMA systems is proposed. It follows two steps to handle real-time traffic characterized by multi-level delay constraints: urgent packets, i.e., those with imminent deadlines, are served first in Step 1, and the remaining channel resources are then managed in Step 2 so that overall throughput is maximized. In this work, two-dimensional diversity over multiple sub-bands and multiple users is jointly exploited. The proposed scheme is compared with existing schemes designed for real-time traffic, namely the Exponential Scheduling (EXP), Modified Largest Weighted Delay First (M-LWDF), and Round Robin schemes, in terms of packet discard probability and throughput. Numerical results show that the proposed scheme performs much better than the aforementioned ones in terms of packet discard probability, and slightly better in terms of throughput.
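The two-step idea can be sketched as follows in Python, with urgent packets served first and leftover sub-bands assigned for throughput; the data structures and the one-frame urgency threshold are illustrative assumptions.

```python
# Step 1 serves packets with imminent deadlines; Step 2 gives each remaining
# sub-band to the user with the highest achievable rate on it, exploiting
# diversity over both sub-bands and users.
def allocate(resource_units, packets, rate):
    """packets: list of dicts with 'user' and 'deadline' (in frames).
    rate[(unit, user)]: achievable rate of 'user' on sub-band 'unit'."""
    allocation = []
    users = {u for (_, u) in rate}
    # Step 1: urgent packets, each on the sub-band where its user is strongest.
    for p in sorted((p for p in packets if p["deadline"] <= 1),
                    key=lambda p: p["deadline"]):
        if not resource_units:
            break
        unit = max(resource_units, key=lambda ru: rate[(ru, p["user"])])
        resource_units.remove(unit)
        allocation.append((unit, p["user"]))
    # Step 2: remaining sub-bands go to whichever user has the best rate on them.
    for unit in list(resource_units):
        best = max(users, key=lambda usr: rate[(unit, usr)])
        allocation.append((unit, best))
        resource_units.remove(unit)
    return allocation
```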
To increase both the capacity and the processing speed of input-queued (IQ) switches, we previously proposed a fair scalable scheduling architecture (FSSA). By employing an FSSA comprising several cascaded sub-schedulers, large-scale high-performance switches or routers can be realized without the capacity limitation of a monolithic device. In this paper, we present a fair scheduling algorithm named FSSA_DI based on an improved FSSA in which a distributed iteration scheme is employed; it improves the scheduler performance and reduces the processing time as well. Simulation results show that FSSA_DI achieves better average delay and throughput under heavy loads than other existing algorithms. Moreover, a practical 64 × 64 FSSA using the FSSA_DI algorithm is implemented with four Xilinx Virtex-4 FPGAs. Measurement results show that the data rate of our solution reaches up to 800 Mbps and that a good tradeoff between performance and hardware complexity is achieved.
Shawish AHMED Xiaohong JIANG Susumu HORIGUCHI
With the wide expansion of voice services over IP networks (VoIP), the volume of this delay-sensitive traffic is steadily growing. Current packet schedulers for IP networks meet the delay constraint of VoIP traffic by simply assigning its packets the highest priority. This technique is acceptable as long as the amount of VoIP traffic is small relative to other, non-voice traffic. With the notable expansion of VoIP applications, however, current packet schedulers will significantly sacrifice the fairness deserved by non-voice traffic. In this paper, we extend the conventional Deficit Round-Robin (DRR) scheduler with a packet classifier, a token bucket, and a resource reservation scheme, and propose an integrated packet scheduler architecture for the growing VoIP traffic. We demonstrate through both theoretical analysis and extensive simulation that the new architecture significantly improves fairness to non-voice traffic while still meeting the tight delay requirement of VoIP applications.
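As a sketch of how the extended components might fit together, the following Python fragment combines a token bucket gate for VoIP with a deficit round-robin loop for the remaining queues; parameters, queue layout, and the enqueue rule are assumptions rather than the paper's exact architecture.

```python
# Conforming VoIP packets are sent with strict priority; excess VoIP traffic
# and non-voice traffic share the normal DRR rounds.
import time
from collections import deque

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate, self.capacity = rate_bps / 8.0, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def conforms(self, size):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

voip_bucket = TokenBucket(rate_bps=64000, burst_bytes=1500)

def enqueue(pkt, priority_q, queues):
    if pkt["class"] == "voip" and voip_bucket.conforms(pkt["size"]):
        priority_q.append(pkt)                 # conforming VoIP: strict priority
    else:
        queues[pkt["class"]].append(pkt)       # excess VoIP and non-voice: DRR

def drr_round(queues, quantum, deficits):
    """One DRR round over the non-priority queues; returns packets to send."""
    sent = []
    for name, q in queues.items():
        deficits[name] += quantum
        while q and q[0]["size"] <= deficits[name]:
            pkt = q.popleft()
            deficits[name] -= pkt["size"]
            sent.append(pkt)
    return sent
```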
Gangming LV Shihua ZHU Zhimeng ZHONG
A delay-oriented packet scheduling scheme is proposed for downlink OFDMA networks with heterogeneous delay requirements. Using a novel packet utility concept, the proposed algorithm exploits diversity in traffic characteristics and requirements to improve the delay performance of delay-sensitive traffic. The proposed scheme also balances fairness and efficiency well. Simulation results show that it outperforms existing delay-oriented scheduling schemes in terms of both delay performance and spectrum efficiency.
Liping WANG Yusheng JI Fuqiang LIU
The integration of multihop relays with orthogonal frequency-division multiple access (OFDMA) cellular infrastructures can meet the growing demands for better coverage and higher throughput. Resource allocation in an OFDMA two-hop relay system is more complex than in a conventional single-hop OFDMA system. With time division between transmissions from the base station (BS) and those from relay stations (RSs), fixed partitioning of the BS subframe and RS subframes cannot adapt to varying traffic demands. Moreover, single-hop scheduling algorithms cannot be used directly in the two-hop system. Therefore, we propose a semi-distributed algorithm called ASP to adjust the length of every subframe adaptively, and suggest two ways to extend single-hop scheduling algorithms to multihop scenarios: a link-based and an end-to-end approach. Simulation results indicate that the ASP algorithm increases system utilization and fairness. The maximum carrier-to-interference ratio (Max C/I) and proportional fairness (PF) scheduling algorithms extended with the end-to-end approach obtain higher throughput than those using the link-based approach, but at the expense of more overhead for information exchange between the BS and RSs. The resource allocation scheme combining ASP and end-to-end PF scheduling achieves a tradeoff between system throughput maximization and fairness.
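The difference between the two extension approaches can be sketched as follows, assuming a standard PF metric (instantaneous rate divided by average throughput); variable names and normalizations are illustrative.

```python
# Link-based PF scores each hop of the BS-RS-MS path separately, while the
# end-to-end variant scores the whole path by its bottleneck rate.
def pf_metric(instantaneous_rate, average_throughput):
    return instantaneous_rate / max(average_throughput, 1e-9)

def link_based_pf(bs_rs_rate, rs_ms_rate, avg_hop1, avg_hop2):
    # Each hop is scheduled independently with its own PF score.
    return pf_metric(bs_rs_rate, avg_hop1), pf_metric(rs_ms_rate, avg_hop2)

def end_to_end_pf(bs_rs_rate, rs_ms_rate, avg_end_to_end):
    # The two-hop path is treated as one logical link limited by the slower hop.
    bottleneck = min(bs_rs_rate, rs_ms_rate)
    return pf_metric(bottleneck, avg_end_to_end)
```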
Zul Azri BIN MUHAMAD NOH Takahiro SUZUKI Shuji TASAKA
This paper studies packet scheduling schemes with QoE (user-level QoS) guarantees for audio-video transmission in a wireless LAN with HCF controlled channel access (HCCA) of the IEEE 802.11e MAC protocol. We first propose the static scheduling (SS) scheme, which grants an adjustable transmission opportunity (TXOP) duration for constant bit rate (CBR) traffic. The SS scheme can determine the minimum TXOP duration capable of guaranteeing high QoE, and it can thus maximize the number of admitted flows. Since the burstiness of variable bit rate (VBR) traffic cannot be absorbed by the SS scheme, we also propose the multimedia priority dynamic scheduling (MPDS) scheme, which absorbs the burstiness by allocating additional TXOP duration. We then compare the SS scheme, the MPDS scheme, and the reference scheduler (TGe scheme) in terms of application-level QoS and user-level QoS (QoE). Numerical results show that with the SS scheme, QoE can be kept relatively high even when the TXOP duration is reduced in the case of video with the I picture pattern; this implies that more flows can be admitted. In the case of video with the IPPPPP picture pattern, which exhibits the VBR characteristic more strongly, reducing the TXOP duration according to the SS scheme deteriorates the QoS level. In this case, the MPDS scheme performs better when the number of multimedia stations is small. The performance of the MPDS scheme deteriorates as the number of multimedia stations increases, though the results remain comparable to or even better than those of the SS and TGe schemes.
In this paper, we investigate the proportional fair scheduling (PFS) problem for multiuser OFDM systems, considering the impact of packet length. Packet length affects scheduling in that each scheduled packet must be completely transmitted within the scheduled frames. We formulate PFS as an optimization problem. Based on observations of the structure of optimal solutions, we propose a heuristic scheduling algorithm consisting of two stages. First, subcarriers are allocated among users without considering the packet length constraint. In the second stage, subcarriers are readjusted: surplus subcarriers are released from length-satisfied users and allocated to length-unsatisfied users. The objective is to provide proportional fairness among users while guaranteeing complete transmission of each scheduled packet. Simulation results show that the proposed scheme performs close to the optimal scheme in terms of the Multi-carrier Proportional Fairness Measure (MCPFM), throughput, and average packet delay.
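A compact Python sketch of such a two-stage heuristic is given below; the PF metric, bit accounting, and readjustment rule are assumptions consistent with the description above, not the paper's exact algorithm.

```python
# Stage 1 assigns each subcarrier by a PF-style score, ignoring packet length.
# Stage 2 releases surplus subcarriers from users whose scheduled packet
# already fits and reassigns them to users whose packet does not yet fit.
def allocate_subcarriers(users, subcarriers, rate, packet_bits, avg_rate):
    alloc = {u: [] for u in users}
    for sc in subcarriers:
        best = max(users, key=lambda u: rate[(u, sc)] / max(avg_rate[u], 1e-9))
        alloc[best].append(sc)

    def bits(u):
        return sum(rate[(u, sc)] for sc in alloc[u])

    for u in users:
        while alloc[u]:
            surplus = min(alloc[u], key=lambda s: rate[(u, s)])
            if bits(u) - rate[(u, surplus)] < packet_bits[u]:
                break                      # no surplus left for this user
            alloc[u].remove(surplus)
            needy = [v for v in users if bits(v) < packet_bits[v]]
            if not needy:
                return alloc
            target = max(needy, key=lambda v: rate[(v, surplus)])
            alloc[target].append(surplus)
    return alloc
```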
Fair queueing is a service scheduling discipline that pursues fairness among users in packet communication networks. Many fair queueing algorithms, however, suffer from computational overhead, since the central scheduler has to maintain a performance counter for each flow of user packets based on a global virtual time. Moreover, they are not well suited to wireless networks with a high probability of channel errors, owing to the lack, or the complexity, of a compensation mechanism for recovery from the error state. In this paper, we propose a new, computationally efficient, distributed fair queueing scheme, which we call Channel-Aware Throughput Fair Queueing (CATFQ), that is applicable to both wired and wireless packet networks. In our CATFQ scheme, each flow is equipped with a counter that measures its weighted throughput achievement while it has a backlog of packets. At the end of every packet service, the scheduler simply selects the flow with the minimum counter value as the one from which a packet is served next. We show that the difference between any two throughput counters is bounded. Our scheme significantly reduces the scheduler's computational overhead and guarantees fair throughput for all flows. For wireless networks with error-prone channels, service chances lost under bad channel conditions are compensated quickly once the channel recovers. Our scheme suppresses service to leading flows, provides short-term fairness for flows without channel errors, and achieves long-term fairness for all flows. These merits are verified by simulation.
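The core min-counter selection idea can be sketched as follows in Python; channel-state compensation and the counter bounds of the actual CATFQ scheme are omitted, and the field names are assumptions.

```python
# Each backlogged flow keeps a weighted-throughput counter; after every packet
# service the flow with the smallest counter is served next.
def select_next_flow(flows):
    """flows: dict flow_id -> {'counter': float, 'weight': float,
    'queue': list of packet sizes, 'channel_ok': bool}."""
    backlogged = {fid: f for fid, f in flows.items() if f["queue"] and f["channel_ok"]}
    if not backlogged:
        return None
    return min(backlogged, key=lambda fid: backlogged[fid]["counter"])

def serve(flows, fid):
    f = flows[fid]
    size = f["queue"].pop(0)
    # Weighted throughput achieved while backlogged drives the counter.
    f["counter"] += size / f["weight"]
    return size
```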
Mikyung KANG Dong-In KANG Jinwoo SUH Junghoon LEE
This paper proposes a low-power real-time packet scheduling scheme that reduces power consumption and network errors on wireless local area networks. The proposed scheme is based on a dynamic modulation scheme, which scales the number of bits per symbol when time slots are unused, and a reclaiming scheme, which switches the primary polling schedule when a specific station falls into a bad state. Built on top of the EDF scheduling policy, the proposed scheme improves power performance without violating the constraints of subsequent real-time streams. Simulation results show that the proposed scheme enhances the success ratio and reduces power consumption.
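A rough Python illustration of the two ingredients, EDF stream selection and bits-per-symbol scaling when slots are spare, is shown below; the scaling rule and field names are assumptions, not the paper's scheme.

```python
# EDF picks the real-time stream with the earliest deadline; when the schedule
# has unused slots, the modulation level (bits per symbol) is scaled down so
# the same data is carried over more symbols at lower transmit power.
def edf_pick(streams, now):
    ready = [s for s in streams if s["pending"] > 0 and s["deadline"] > now]
    return min(ready, key=lambda s: s["deadline"]) if ready else None

def choose_bits_per_symbol(required_slots, available_slots, max_bits=6):
    # Illustrative rule: with k times more slots than needed at the highest
    # modulation level, use roughly 1/k of the bits per symbol.
    scale = max(1, available_slots // max(required_slots, 1))
    return max(1, max_bits // scale)
```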
YoungWoo CHOI Seong-Lyun KIM Sehun KIM
This letter discusses how to enhance the capacity of uplink DS-CDMA networks that support multihop transmission. For this purpose, we derive simple theoretical conditions by which one can choose between single-hop and multihop transmission. Numerical results show that the radio network capacity can be increased significantly by adopting only a small number of multihop transmissions.
A self-organizing wireless network has to deal with reliability and congestion problems as the network size increases. To alleviate such problems, we designed and analyzed protocols and algorithms for a reliable and efficient multiple-layer self-organizing wireless network architecture. Each layer uses a high-power root node to supervise the self-organizing functions, to capture and maintain the physical topology, and to serve as the root of the layer's hierarchical routing topology. We consider the problem of adding a new root, with its own rooted spanning tree, to the network. Based on minimum-depth and minimum-load metrics, we present efficient algorithms that achieve optimum selection of the root(s). We then explore layer scheduling algorithms that adapt to network load fluctuations in order to optimize performance. For optimality, we consider a load balancing objective and a minimum delay objective: the former attempts to optimize overall network performance, while the latter strives to optimize per-message performance. Four algorithms are presented, and simulations are used to evaluate and compare their performance. We show that the presented algorithms achieve superior data throughput and/or message delay compared to a heuristic approach that does not account for network load fluctuations.
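As an illustration of root selection under the minimum-depth metric, the following Python sketch computes the BFS depth of the spanning tree rooted at each candidate and picks the shallowest; a minimum-load variant would use an analogous key based on the traffic carried by each candidate's tree. The graph representation is an assumption.

```python
# Pick the candidate root whose rooted spanning tree (built by BFS) is
# shallowest, i.e., minimizes the maximum hop distance to any node.
from collections import deque

def tree_depth(adj, root):
    """adj: dict node -> iterable of neighbouring nodes."""
    depth, seen, frontier = 0, {root}, deque([(root, 0)])
    while frontier:
        node, d = frontier.popleft()
        depth = max(depth, d)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return depth

def pick_root_min_depth(adj, candidates):
    return min(candidates, key=lambda c: tree_depth(adj, c))
```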
Teck Meng LIM Bu-Sung LEE Chai Kiat YEO
Researchers have proposed numerous approaches to providing Quality-of-Service (QoS) across the Internet. The IETF has proposed two reservation approaches: hop-by-hop bandwidth reservation (IntServ) and per-hop-behaviour bandwidth reservation (DiffServ). An edge router generates traffic, accepts per-flow reservations, and classifies them into predetermined service classes, while a core router ensures different QoS guarantees for each service class. We propose an Edge-to-Edge Quality-of-Service Domain in which packet trains with the same service requirements are aggregated using packet deadlines at the edge router. The properties of a packet train, such as the Inter-Packet Departure Time, Inter-flow Departure Time, and accumulated packet delay, are embedded and used by our quantum-based scheduler and QoS packet forwarding scheme in core routers. Thus, we are able to extract per-queue and per-flow information. Each queue is reconstructed at the core router with packets having an expected departure time that is relative to the ingress router. Useful functions such as the instantaneous service rate and a fine-granular dropping scheme can be derived by combining the embedded information with a relative virtual clock technique. The encapsulation of our packet train information converges mathematically. Through simulations, we show that our architecture can provide delay and rate guarantees and minimise jitter for QoS-sensitive flows that require LR-coupled or LR-decoupled reservations.
Junni ZOU Hongkai XIONG Rujian LIN
To simultaneously support guaranteed real-time services and best-effort service, a Priority-based Scheduling Architecture (PSA) designed for high-speed switches is proposed. PSA divides packet scheduling into a high-priority phase and a low-priority phase. In the high-priority phase, an improved sorted-priority algorithm is presented, which introduces a new constraint into the scheduling discipline to prevent bandwidth preemption. In addition, a virtual time function with a control factor α is employed. Both computer simulation results and theoretical analysis show that the PSA mechanism performs excellently in terms of implementation complexity, fairness, and delay properties.
In adaptive modulation and coding (AMC)-based orthogonal frequency division multiple access (OFDMA) systems for broadband wireless service, a large number of users with short packets causes a serious capacity mismatch problem, which leads to resource under-utilization when the data rate of a subchannel increases under better channel conditions. To handle this capacity mismatch problem, we propose an AMC-based subchannel multiplexing (ASM) scheme, which allows the same subchannel to be shared among different simultaneous flows of the same user. Along with the ASM scheme, we also consider a multi-class scheduling scheme, which employs a different packet scheduling algorithm for each service class, e.g., a packet loss rate-based (PLR) scheduling algorithm for real-time (RT) service and a modified minimum bit rate-based (MMBR) scheduling algorithm for non-real-time (NRT) service. In a typical integrated service scenario with voice, video, and data traffic, we show that the proposed schemes significantly improve the overall system capacity.
Yoshiaki OFUJI Sadayuki ABETA Mamoru SAWAHASHI
This paper proposes a unified packet scheduling method for the forward link of Orthogonal Frequency and Code Division Multiplexing (OFCDM) wireless access that considers the delay requirement of each traffic data packet, whether real-time (RT) or non-real-time (NRT), the channel conditions of each accessing user, and the packet type in hybrid automatic repeat request (ARQ), i.e., either an initially transmitted or a retransmitted packet. In the proposed packet scheduling method, the overall priority function is computed as P_Total = α_Delay P_Delay + α_Type P_Type + α_SINR P_SINR, where P_Delay, P_Type, and P_SINR are the priority functions derived from the delay requirement, the packet type, and the received signal-to-interference plus noise power ratio (SINR), respectively, and α_Delay, α_Type, and α_SINR are the corresponding weighting factors. Computer simulation results show that the weighting factors are optimized as α_Type/α_Delay = 0.6 and α_SINR/α_Delay = 0.4, assuming a linear-type function for P_Delay and a constant-type function for P_Type. Furthermore, we show that the outage probability for achieving a packet loss rate (PLR) of less than 10^-3 for non-real-time (NRT) traffic users employing the proposed packet scheduling method is reduced by approximately two orders of magnitude compared to that using the Priority Queuing (PQ) method, while maintaining the PLR of real-time (RT) traffic users at the same level as with the PQ method.
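A direct transcription of the overall priority function into Python is shown below, using a linear-type P_Delay and a constant-type P_Type as in the optimized setting reported above; the normalizations of P_Delay and P_SINR are assumptions.

```python
# Overall priority: P_Total = a_Delay*P_Delay + a_Type*P_Type + a_SINR*P_SINR.
# The weight ratios a_Type/a_Delay = 0.6 and a_SINR/a_Delay = 0.4 follow the
# abstract; the component normalizations below are illustrative assumptions.
def p_total(wait_time, max_delay, is_retransmission, sinr_db,
            a_delay=1.0, a_type=0.6, a_sinr=0.4):
    p_delay = min(wait_time / max_delay, 1.0)      # linear-type delay function
    p_type = 1.0 if is_retransmission else 0.0     # constant-type packet-type function
    p_sinr = sinr_db / 30.0                        # normalized received SINR
    return a_delay * p_delay + a_type * p_type + a_sinr * p_sinr

# The packet (user) with the largest P_Total is scheduled first.
```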