1. Introduction
The sixth generation of mobile communications (6G) networks is expected to usher in the Intelligent Internet of Things era. These networks promise to provide ultra-fast mobile broadband services, support low-latency applications like remote surgery and autonomous vehicles, and enable many connected devices to build a virtual digital world [1]. To realize these ambitions, 6G networks will need to deliver significantly faster speeds, lower latency, and wider coverage than their fifth-generation (5G) predecessors. A key aspect of 6G will be its ability to seamlessly integrate multiple heterogeneous networks, combining different technologies such as wireless, satellite, and optical networks [2]. This integration will enable faster and more reliable communications across different technologies. The potential synergy between these different network technologies could provide the best-performing infrastructure for end-users, optimally adapted to each specific environment.
Network disaggregation technology [3] emerges as a promising candidate for facilitating the integration of multiple heterogeneous networks in 6G. This technology decomposes a previously integrated network node into its individual components. In current networks, components often rely on proprietary software and hardware that lack the flexibility and cost-effectiveness needed to meet the escalating and evolving demands of carriers. With network disaggregation, carriers can overcome the limitations associated with vendor lock-in and instead choose optimal technologies from various vendors to meet service requirements. However, implementing this technology requires managing and verifying diverse network nodes and components to ensure they meet network requirements.
Hence, future network infrastructures will need to establish a validation mechanism to ensure the quality of network nodes and components, supporting the concept of network disaggregation. To tackle this challenge, we present a solution called the network digital replica (NDR) [4], [5], which serves as a digital copy of the physical network. The NDR allows for the digital verification of black-boxed network node performance. We define a “black-boxed” node as a network node whose internal implementation of packet processing mechanisms is not disclosed, although the physical hardware components and their specifications are known. This is often the case for vendor-provided routers, where the detailed routing table lookups and IP forwarding algorithms are not disclosed, while the physical hardware specifications, such as the number of CPU cores and the memory capacity, are typically published in catalog information. In such nodes, it is difficult to estimate the per-packet cost, i.e., how much bandwidth and computation time a single packet consumes. Network node modeling, which emulates the performance of actual network nodes in digital space, will be a crucial factor in achieving this. Within the framework of the NDR, network node modeling is carried out with machine learning-based techniques, utilizing actual network node metrics, such as packet loss rates, throughput, and processing delay, to construct a comprehensive representation of the network nodes. However, this application has so far been considered only for known environments; that is, it provides only interpolation capability between the training datasets. The challenge for the NDR is to extrapolate node metrics to domains that are not covered in the training dataset. Therefore, the objective of this research is to explore the potential for digitally verifying the performance of black-boxed network nodes, focusing on refining the accuracy of extrapolation for their metrics.
1.1 Contributions
The contribution of this study is the development of a novel technique that infers network node metrics through data-driven analysis while improving extrapolation capability. The main idea of our method is to combine Neural Processes (NP) [6] with an enrichment of the training datasets by adding other inferred node metrics. An NP is a kind of meta-learner that extracts object-centric representations; meta-learning is often applied to situations where each task has only a few examples. However, the application of NP to the inference of node metrics has not yet been explored. Moreover, applying only a vanilla NP to infer network node metrics not covered by the training datasets yields insufficient extrapolation accuracy. The proposed method addresses this issue by extending the training datasets with other inferred node metrics.
In summary, the contributions of this study are twofold:
- We propose a method for modeling network nodes based on a meta-learner, the NP, aimed at improving the inference accuracy of router metrics in extrapolation scenarios using actual data. Our approach is characterized by adding other inferred router metrics to the training dataset.
- We demonstrate the extrapolation capabilities of our proposed method by varying the ratio of training and testing datasets and comparing its performance with that of the conventional vanilla NP.
Building on our previous study in [4], this research addresses advanced applications of our proposed method in the context of extrapolation. We evaluate this method’s performance in terms of extrapolation, particularly using meta-learner NP algorithms. Moreover, we investigate the scalability of the proposed method for extrapolation domains to explore its potential.
The paper’s structure is as follows: Section 2 reviews digital twin technology and network node modeling related to our method. Sections 3 and 4 detail the background, architecture, use case, and assumed traffic scenario of the proposed NDR. Section 5 introduces our network node modeling approach, which is crucial for improving the inference accuracy in extrapolation. Section 6 details the results of experiments conducted to evaluate the accuracy of inferred router metrics, focusing on packet loss rate, throughput, and packet delay. Discussion and limitations of the proposed method are presented in Sect. 7. Section 8 concludes the paper.
2. Related Work
2.1 Digital Twin Technology
Digital twin technology has been revolutionary, offering new perspectives on system analysis and optimization across various industries. A digital twin is a dynamic virtual model that mirrors a physical object or system, integrating live data and analytics to emulate real-time changes [7]. This virtual representation goes beyond mere reproduction. It has predictive capabilities that forecast the future behavior of its physical counterpart. Creating a digital twin involves integrating real-time system information, such as device configuration. Thus, the state of physical objects needs to be accurately reconstructed. Underpinning the rise of digital twins are significant advances in computational resources, particularly central processing units (CPUs) and graphics processing units (GPUs). These technological developments have increased the ability to process complex real-time data, increasing the fidelity and utility of digital twins. The aerospace industry is leveraging digital twin technology to drive innovation and efficiency. For example, a digital twin blueprint can help optimize a jet engine’s combustion process, leading to increased fuel efficiency and reduced emissions. It also plays an important role in risk prediction, allowing engineers to simulate and analyze potential challenges in virtual space before they manifest in reality [8]. Another rapidly growing application area is the development of digital twins for aircraft ground steering systems. These systems, especially the nose gear, are critical to aircraft safety. Digital twins enable the simulation and testing of different scenarios and conditions, ensuring the robustness and reliability of ground steering systems under different operating conditions [9].
Recently, the digital twin network (DTN), an application of the digital twin concept in the network domain, has attracted increasing research interest [10]-[14]. First, DTN is being explored for network troubleshooting to prevent service disruptions by replicating and analyzing past network failures [10]. It is also used in network anomaly detection to identify deviations in network behavior [11]. Beyond these applications, DTN serves as an educational and training platform for network security [12]. Research on the performance of the data plane (D-plane) through DTN is also progressing [13], [14]. One DTN architecture uses long short-term memory (LSTM) models to effectively predict network traffic, significantly reducing learning times while maintaining accuracy [13]. In network planning, another approach utilizes advanced technologies such as Generative Adversarial Networks to simulate diverse industrial scenarios, improving resource allocation and predicting network performance, thereby optimizing compliance with service level agreements [14]. However, the conventional approaches used in these studies fail to consider the intricate internal structure and behavior of network nodes, including implemented software processes and hardware logic. These elements need to be accurately emulated to verify the performance of network equipment and ensure that it meets carrier requirements.
2.2 Network Node Modeling
Conventional approaches to modeling network nodes have explored methods such as network simulators [15], network emulators [16], and data-driven analysis [17]. A network simulator is a software program that abstractly models network behavior by computing interactions between different network entities, such as nodes and links. This simulation is typically based on discrete-event-driven processes and mathematical formulas. Although simulations can replicate potential scenarios that may occur in the real world, they do not reflect the behavior of actual network equipment. On the other hand, network emulators reproduce the behavior of a network to test how real programs perform under varying network conditions. However, to accurately replicate a network node’s processes and behavior, network emulators usually require software programs and hardware logic that are identical to those of the actual node.
Conversely, a data-driven approach provides an alternative, utilizing models derived from network equipment data to analyze the internal mechanisms and behaviors of network nodes. For example, one method infers the performance of actual network equipment on the basis of a neural network (NN) [17]. However, NN-based methods are not particularly effective at extrapolation [18], which is the focus of this study. Several studies have examined extrapolation performance using machine learning techniques [19], [20]. Reference [19] investigated the potential of implicitly defined NNs to extrapolate on mathematical tasks. Furthermore, to facilitate automatic pattern discovery and extrapolation on multidimensional datasets, a Gaussian Process (GP) framework has been proposed [20]. The GP models a distribution over regression functions and can extrapolate well if its kernel is designed appropriately for extrapolation; however, such a prior is difficult to design. As an alternative to the GP, the NP was introduced, which enables a stochastic process to be learned from data, leveraging the flexibility of NNs [6]. While the GP requires us to explicitly model the prior and can perform exact posterior inference, the NP is designed to learn distributions over functions from distributions over datasets. However, there are few or no reports of applying the NP to improve inference accuracy in the extrapolation domain for actual data.
2.3 Predicting Performance Metrics of the Software Router
Conventional methods for predicting performance metrics of software routers have been proposed to estimate throughput by modeling the internal processing of the router [21], [22]. One method uses a mathematical approach to account for factors such as Ethernet bandwidth, CPU speed, and cache contention [21]. Another method [22] explores high-speed packet processing frameworks such as netmap [23], PF_RING ZC [24], and Intel DPDK [25]. These frameworks provide a faster alternative for handling traffic rates of several tens of Gbps, allowing the construction of packet processing systems on commodity hardware. They are analyzed on the basis of their performance in packet forwarding scenarios, focusing on the throughput/latency tradeoff, and a model is introduced to estimate and evaluate their performance. However, these conventional methods, which are based on mathematical models, have difficulty modeling black-boxed routers with unknown implementations of packet processing mechanisms, such as those often found in vendor-provided routers. In particular, these mathematical approaches typically require detailed knowledge of the internal workings of the router: they rely on implementation details of the packet processing mechanisms, such as the average CPU cycles per packet and the number of accesses required to retrieve data from on-chip cache or off-chip memory. This detailed knowledge allows conventional models to accurately estimate performance for routers whose implementation of packet processing mechanisms is well understood. In contrast, our model assumes that the packet processing mechanisms inside the router are unknown. This assumption makes it difficult to obtain quantities such as per-packet CPU cycle consumption and memory access times.
Therefore, our method applies a data-driven approach to external factors, including measured target metrics (packet loss rates, throughput, and packet delays), node settings, and traffic conditions, to develop a performance estimation model. Furthermore, the conventional methods above primarily estimate performance in a single-input, single-output scenario. They therefore do not address performance prediction for black-boxed routers under aggregated traffic, where traffic converging from multiple input ports onto a single output port leads to packet loss and packet delay. From a carrier network perspective, such aggregated traffic conditions are critical because they degrade quality of service. In addition, these methods do not support extrapolation. On the other hand, a data-driven approach has been explored for the automated generation of Virtual Network Function (VNF) deployment rules by predicting the packet forwarding performance of VNFs using machine learning [26]. However, this approach does not focus on improving the inference accuracy of extrapolation.
Given the above, our study focuses on exploring a data-driven method to improve the accuracy of extrapolation for black-boxed network node metrics when only the measured target metrics, network node settings, and traffic conditions are available as external conditions.
To our knowledge, there is little or no research on how the addition of inferred metrics to training datasets influences the accuracy of extrapolation of target metrics. This aspect of regression modeling is a key area of our research. In addition, there are few reports on extrapolating node performance through regression prediction using actual router settings and traffic data.
3. Background
3.1 Interpolation and Extrapolation for Node Metrics
This study aims to address extrapolation for network node metrics in regression problems. In this research, the training datasets serve to formulate a function that maps a set of input variables \(X\) (node settings and input traffic) to node metrics \(Y\) (packet loss rate, throughput, and packet delay). When the input variables lie within the range of the training datasets, the procedure is referred to as interpolation; when the point of estimation lies outside this domain, it is referred to as extrapolation. In general, node performance is difficult to extrapolate because this domain is not covered by the training datasets, resulting in unpredictable behavior unless the model formulation contains implicit or explicit assumptions.
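As a toy illustration of this difficulty (entirely synthetic data, not our evaluation setup), the following sketch fits a linear model to a nonlinear "metric" on a limited domain: predictions inside the training range remain accurate, while predictions outside it degrade sharply.

```python
import numpy as np

# Toy node-metric function: "delay" grows nonlinearly with input load.
def delay(load):
    return 0.5 * load ** 2

# Training data cover only loads in [0, 1] (the interpolation domain).
x_train = np.linspace(0.0, 1.0, 50)
y_train = delay(x_train)

# Fit a simple linear model (least squares) on the covered domain.
a, b = np.polyfit(x_train, y_train, 1)

# Interpolation: a query inside the training range is predicted well.
err_interp = abs((a * 0.5 + b) - delay(0.5))

# Extrapolation: a query at load 3.0, far outside [0, 1], fails badly.
err_extrap = abs((a * 3.0 + b) - delay(3.0))
```

Here the extrapolation error exceeds the interpolation error by more than an order of magnitude, which mirrors the behavior this study seeks to mitigate for router metrics.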
In this study, we evaluate how the unprecedented increase in traffic volume toward an extrapolation domain, which is not covered in the training dataset, affects the network node metrics.
3.2 Neural Processes
The NP is a probabilistic modeling approach that leverages the flexibility of neural networks to parameterize stochastic processes [6]. The NP consists of three components: an encoding NN, an aggregation operation, and a decoding NN. Together, these components represent a stochastic process. The encoding NN captures the input data, the aggregation operation combines the encoded representations, and the decoding NN generates the output.
First, to enable the NP to learn distributions over functions, we consider a dataset \(D=\left\{(x_i, y_i)\right\}_{i=1}^n\) of \(n\) inputs \(x_i\) and outputs \(y_i\). The NP splits \(D\) into two disjoint subsets: a set of \(m\) context points \(C=\left\{(x_i, y_i)\right\}_{i=1}^m\) and a set of targets \(T=\left\{(x_j, y_j)\right\}_{j=m+1}^n\). The NP model is then presented with \(C\) to estimate the corresponding function values for \(T\). These data points are processed in the following sequence. First, the encoder calculates a representation \(r_i\) for each context pair \((x_i, y_i)\in C\) using a multi-layer perceptron (\(\mathrm{MLP}_\theta\)): \(r_i=\mathrm{MLP}_\theta(x_i, y_i)\). These \(r_i\) are then aggregated into a single conditioning representation \(r\) using a permutation-invariant operator (such as addition) that captures the information about the underlying function provided by the context points; the most straightforward aggregator is the mean function. Next, a latent variable \(Z\) is calculated from \(r\). \(Z\) is assumed to follow a normal distribution, \(Z\sim\mathcal{N}(\mu(r),\sigma^2(r))\), where \(\mu\) and \(\sigma\) are the mean and standard deviation, respectively. Intuitively, \(Z\) is designed to capture all the information about the data-generating process needed to predict the target outputs. Finally, the decoder receives the target inputs concatenated with \(Z\); this concatenated vector is passed to \(\mathrm{MLP}_\varphi\) to produce the predictions \(\hat{y}_T=\mathrm{MLP}_\varphi(T,Z)\).
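The encode-aggregate-decode pass described above can be sketched at the shape level as follows. This is an untrained illustration with randomly initialized weights (not the evaluated model); dimensions, weight initialization, and the trivial \(\mu\)/\(\sigma\) heads are our assumptions for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with tanh hidden activation.
    return np.tanh(x @ w1 + b1) @ w2 + b2

d_in, d_hid, d_r = 1, 16, 8  # input dim, hidden width, representation dim

# Randomly initialized encoder/decoder weights (untrained illustration).
enc = (rng.normal(size=(d_in + 1, d_hid)), np.zeros(d_hid),
       rng.normal(size=(d_hid, d_r)), np.zeros(d_r))
dec = (rng.normal(size=(d_in + d_r, d_hid)), np.zeros(d_hid),
       rng.normal(size=(d_hid, 1)), np.zeros(1))

# Context set C = {(x_i, y_i)} and target inputs x_T.
x_c = rng.uniform(0, 1, size=(5, 1))
y_c = np.sin(x_c)
x_t = rng.uniform(0, 1, size=(3, 1))

# Encoder: r_i = MLP_theta(x_i, y_i) for each context pair.
r_i = mlp(np.concatenate([x_c, y_c], axis=1), *enc)

# Aggregation: permutation-invariant mean over context representations.
r = r_i.mean(axis=0)

# Latent: Z ~ N(mu(r), sigma^2(r)); here mu/sigma are trivial heads.
mu, log_sigma = r, np.full_like(r, -2.0)
z = mu + np.exp(log_sigma) * rng.normal(size=r.shape)

# Decoder: y_hat_T = MLP_phi(x_T, Z), with z broadcast to every target.
z_rep = np.tile(z, (x_t.shape[0], 1))
y_hat_t = mlp(np.concatenate([x_t, z_rep], axis=1), *dec)
```

In a real NP, the encoder, the \(\mu\)/\(\sigma\) heads, and the decoder are trained jointly by maximizing a variational objective over many sampled context/target splits.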
4. Network Digital Replica
4.1 Concept
The concept of the NDR involves the evaluation of network node performance in the digital domain [4]. While similar to a digital twin, the NDR differs in its fidelity level. A digital twin involves creating an exact twin of a physical entity in the digital domain. In contrast, the NDR models network equipment at an abstract level, enabling the evaluation of network equipment performance. Unlike a digital twin, an NDR does not require an exact copy but rather aims to provide a similar entity that serves the purpose of performance evaluation. In particular, the NDR utilizes the datasets accumulated by telecommunication carriers through their network operations for modeling the network. Historically, telecommunication carriers have gathered performance data for each network equipment configuration as part of their network operations. In environments where accumulated router metric data are available, such as carrier networks, a data-driven approach is preferable to mathematical models. Hence, NDR modeling is based on a supervised learning approach.
4.2 Architecture of Network Digital Replica
The concept of the NDR is illustrated in Fig. 1. The NDR interfaces with the physical network and predicts the performance of network nodes to determine if they meet network requirements. Consequently, the NDR needs to accurately replicate the behavior of actual network nodes under various external network conditions. To achieve this, the NDR constructs network node models using data from the physical network, such as current traffic and node configurations. These models are designed to emulate the internal operations of physical network equipment in scenarios not yet encountered in the actual networks. The NDR employs a hybrid approach that combines simulator-based network modeling with actual node modeling for enhanced accuracy. In the simulator-based component, traffic from end terminals is replicated, taking into account external conditions. In contrast, the actual node modeling component leverages machine learning technologies to represent the internal structure and behavior of specific target nodes accurately.
4.3 Use Case with Network Digital Replica
The NDR is designed to facilitate long-term network planning over several months, supporting the upgrade and expansion of carrier network infrastructure. Traditionally, this type of network planning is based on the operator’s experience to ensure that the predicted traffic does not exceed the capacity of the network equipment, leading to overinvestment in network resources [27]. In response, the NDR allows operators to evaluate the performance of network nodes in advance, on the basis of node settings for expected traffic conditions, before making changes to the physical network. We assume that the expected traffic volume for several months ahead is obtained using a time-series specific machine learning algorithm such as [28]. Then, the proposed method derives node settings by comparing the predicted performance of network nodes obtained using the developed model with the given network requirements through trial-and-error procedures. For example, the NDR facilitates determining the required number of virtual CPUs on a server or the required number of wavelengths for a transmission capacity to provide the required network node performance for the assumed network conditions. As a preliminary step to deriving optimal node settings, this study concentrates on inferring the performance of software routers when provided with the router configuration and traffic information.
4.4 Assumed Traffic Scenario
The proposed node modeling is used for long-term network planning over several months for upgrading and scaling the network infrastructure, as described in Sect. 4.3. For long-term network planning, we assume that planning is based on static traffic demands, which are typically derived from either the long-term average traffic demand or a percentile of the peak traffic demand [29]. Therefore, our study utilizes a traffic scenario based on average traffic volumes rather than time-variant traffic with burstiness. In particular, we focus on aggregated traffic conditions that negatively affect the quality of service in the carrier network. This scenario includes packet loss and packet delay caused by buffer overflows resulting from traffic aggregation from multiple input ports to a single output port. In this preliminary investigation, both the data used for model training and the data used for inference were collected under the same traffic exchange condition, in which traffic is aggregated. Under these conditions, our evaluation was performed on software routers.
5. Proposed Network Node Modeling
The network node modeling in this study is defined as creating a digital entity to examine the performance of an actual network node. The main focus of this study is to enhance the accuracy of inferred node metrics for black-boxed network nodes within an extrapolation domain not covered by the training dataset. To achieve this, we present a novel approach called NP-based node modeling, aimed at enhancing the accuracy of router metric inference (packet loss rate \(P_{\text{loss}}\), throughput \(P_{\text{th}}\), and packet delay \(P_{\text{delay}}\)) in the extrapolation region. The novelty of the proposed algorithm lies in its approach of iteratively appending inferred router metrics to the training datasets based on feature importance. This differs from the original NP method, which does not incorporate such an iterative process. Our algorithm improves the accuracy of extrapolation for the router metrics by selectively incorporating the other inferred router metrics with the highest feature importance that have not yet been included in the training dataset.
Figure 2 shows the architecture of the proposed method for extrapolation inference of router metrics. The core concept of our proposed approach is to infer each router metric by the NP method, and then incorporate the inferred metrics into the training datasets for further inference of other router metrics in the extrapolation domain. This is done in a sequence that prioritizes metrics based on their contribution to each model, thereby increasing the accuracy of the router metric inference. For example, in the case of Fig. 2, the proposed method infers the router metric \(\delta\) in the extrapolation domain by adding the inferred other router metrics \(\alpha\) and \(\beta\) to the training data. Our developed algorithm conducts a feature importance analysis for each metric, which helps to determine its importance in the model. Through this feature importance analysis, our method is able to understand the relevance of each metric within the model and better evaluate how different router metrics affect performance prediction.
5.1 Algorithm of Proposed Node Modeling
The proposed method is outlined in the algorithm, which comprises three key steps:
Step 1: Acquiring training datasets from routers
The algorithm begins by collecting training datasets, which include router metrics from actual routers. A traffic generator is utilized to obtain the metrics \(M\) (\(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\)), measuring them while varying the router settings and input traffic volume. Then, the NP is employed to create inference models for these metrics. This node modeling process uses datasets including router settings \(R\), traffic load conditions \(L\), and \(M\).
Step 2: Acquisition of feature importance
Following the collection of training datasets, the algorithm evaluates the features that significantly contribute to each router metric's model. This step is crucial for identifying the specific features relevant to our proposed method. Feature importance analysis is conducted to examine the impact of predictors in the training datasets. We employ the permutation feature importance approach [30] to ascertain which additional router metrics would be beneficial to include in the training dataset to improve the inference accuracy of the router metric. This importance is measured by randomly shuffling feature values and observing the resultant decrease in inference accuracy when a single feature vector is permuted, as described in [30].
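A minimal sketch of permutation feature importance in the spirit of [30] is shown below, using a toy predictor whose true feature dependence is known; all names, data, and the choice of \(R^2\) drop as the score are illustrative, not taken from the evaluation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def r2_score(y, y_hat):
    # Coefficient of determination used as the accuracy measure.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(model, X, y, n_repeats=10):
    """Mean drop in R^2 when each feature column is shuffled."""
    base = r2_score(y, model(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
            drops.append(base - r2_score(y, model(Xp)))
        importances[j] = np.mean(drops)
    return importances

# Toy predictor: depends strongly on feature 0, weakly on feature 1,
# and ignores feature 2 entirely.
def model(X):
    return 3.0 * X[:, 0] + 0.3 * X[:, 1]

X = rng.normal(size=(200, 3))
y = model(X)

imp = permutation_importance(model, X, y)
```

As expected, the importance ranking recovers the true dependence: feature 0 scores highest and the unused feature 2 scores zero, which is exactly how the proposed method ranks candidate router metrics for inclusion.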
Step 3: Iterative improvement for the inferred accuracy in the extrapolation of router metrics using NP
Our algorithm proceeds by selecting the router metric with the highest feature importance that is not yet included in the training dataset. Each selected metric is then inferred using the NP with the training datasets, including \(L\), \(R\), and \(M\). The inferred values are input into the predictor for the target metric, such as throughput, and the accuracy of the target metric is evaluated before and after the inferred router metric is added. If the accuracy improves, the process continues by selecting and adding the next inferred router metric with high feature importance and repeating the steps. This cycle continues until no additional improvement in accuracy is achieved, at which point the training model that achieved the highest accuracy is fixed. For performance evaluation, the coefficient of determination (\(R^2\)) is employed as the performance metric. Here, \(R^2\) is defined as \(R^2=1-\sum_i\left(y_i-\hat{y}_i\right)^2/\sum_i\left(y_i-\overline{y}\right)^2\), where \(y_i\) and \(\hat{y}_i\) are the measured and inferred values of a router metric, respectively, and \(\overline{y}\) is the average of the measured values. \(R^2=1\) means a perfect fit of the model to the datasets; if the inferred values deviate too much from the measured ones, \(R^2\) can be negative.
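The greedy selection-and-stopping logic of Step 3 can be sketched as follows. The metric names and \(R^2\) values here are illustrative placeholders: a real run retrains the NP predictor of the target metric at each step, whereas this sketch looks the scores up from a mock table.

```python
# Candidate metrics ranked by permutation feature importance
# (highest first); names are illustrative placeholders.
ranked_candidates = ["P_loss", "P_delay"]

# Mock evaluation: R^2 of the target-metric model for a given set of
# appended inferred metrics. A real run would retrain the NP here.
scores = {(): 0.62, ("P_loss",): 0.78, ("P_loss", "P_delay"): 0.74}

def train_and_eval(appended):
    return scores[tuple(appended)]

appended, best = [], train_and_eval([])
for metric in ranked_candidates:
    trial = train_and_eval(appended + [metric])
    if trial > best:   # keep the metric only if accuracy improves
        appended.append(metric)
        best = trial
    else:              # stop once no further improvement is seen
        break
```

With these placeholder scores, the loop keeps only the first candidate and then fixes the model, matching the stopping rule described in Step 3.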
6. Evaluation & Results
In this study, we preliminarily evaluated a router’s performance as a part of NDR. We evaluated the inferred accuracy in the extrapolation of router metrics using the proposed node modeling, with a focus on software routers. We assumed that the implementation of packet processing mechanisms of the software routers used in this evaluation was unknown in order to evaluate our methodology based on a data-driven approach.
6.1 Methodology for Evaluation
Our evaluation of the proposed method focused on four aspects: (1) the clarification of the feature importance for each router metric, (2) the effectiveness of adding other inferred router metrics to the training dataset, (3) the scalability of the proposed method for the extrapolation domain, and (4) the computation time. First, we evaluated the feature importance for each router metric, as shown in Sect. 6.3, to identify which router metrics affect the inference accuracy of router metrics. Next, we examined the inferred accuracy in the extrapolation of router metrics by adding other inferred router metrics to the training dataset in accordance with the feature importance described in Sect. 6.4. Then, we examined the scalability of the proposed method for the extrapolation domain by varying the value of the splitting ratio (\(\gamma \in [0,1]\)), as detailed in Sect. 6.5. Finally, in Sect. 6.6, we examined the practicality of the model’s training time (\(T_{\text{Train}}\)) and the time required to infer router metrics (\(T_{\text{Infer}}\)), considering that the proposed method tends to increase the computation time due to the characteristic of appending the inferred router metrics to the training datasets.
Here we define \(\gamma\), which splits the dataset into two portions. The smallest \(100\times\gamma\)% portion of the dataset, sorted in ascending order by the router metric, is designated as the lower portion, represented as \((x_{\mathrm{low}},y_{\mathrm{low}})\). The remaining \(100\times(1-\gamma)\)% is considered the upper portion, represented as \((x_{\mathrm{up}},y_{\mathrm{up}})\). For interpolation, the lower portion \((x_{\mathrm{low}},y_{\mathrm{low}})\) is divided by random sampling into a training dataset comprising 70% of the data, represented as \((x_{\mathrm{low}}^{\text{Train}},y_{\mathrm{low}}^{\text{Train}})\), with the remaining 30% used for testing, represented as \((x_{\mathrm{low}}^{\text{Test}},y_{\mathrm{low}}^{\text{Test}})\). Conversely, for extrapolation, the dataset \((x_{\mathrm{low}},y_{\mathrm{low}})\) serves as the training dataset, while the remaining data \((x_{\mathrm{up}},y_{\mathrm{up}})\) is used for testing. Note that the smaller \(\gamma\) is, the more difficult the extrapolation domain becomes to infer, since the training data cover a smaller portion of the domain. In general, \(\gamma\) is set to 0.7 [31]; therefore, the evaluations, except for the scalability evaluation described in Sect. 6.5, were conducted with \(\gamma = 0.7\). If \(\gamma\) is set higher than 0.8, overfitting may occur because most of the data are assigned to the training dataset. Conversely, if \(\gamma\) is set to less than 0.6, the limited training data may not represent the data points, resulting in underfitting. In Sect. 6.5, we evaluate the scalability of our proposed approach in terms of its ability to handle extrapolation.
Specifically, we compare its inference accuracy with that of vanilla NP across varying values of \(\gamma\). We use \(R^2\) for interpolation (\(R_{\mathrm{in}}^2\)) and \(R^2\) for extrapolation (\(R_{\mathrm{ex}}^2\)) to assess how well each model inferred the router metrics \(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\). \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) are calculated using \((y_{\mathrm{low}}^{\text{Test}},\hat{y}_{\mathrm{low}}^{\text{Test}})\) and \((y_{\mathrm{up}},\hat{y}_{\mathrm{up}})\), respectively.
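Both scores follow the standard definition of the coefficient of determination, which can go negative when a model extrapolates worse than simply predicting the mean, as in the vanilla NP results below. A plain-Python sketch (the function name is illustrative):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    Equals 1 for a perfect fit, 0 for predicting the mean, and is
    negative when predictions are worse than the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```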
6.2 Experimental Conditions
The experimental setups are depicted in Fig. 3. The proposed method applies the NP to the measured performance of four virtual routers: Cisco Cloud Services Router 1000V [32], Juniper vMX Virtual Router [33], Vector Packet Processor [34], and the Kamuee router [35], running on an x86-based server with two 18-core Xeon E5-2697 2.30 GHz CPUs, 192 GB of RAM, and 8 SFP+ ports. First, we measured the packet forwarding performance of these routers in a laboratory environment to acquire training datasets under the conditions shown in Table 1. These conditions are designed to evaluate the relationship between the router settings and traffic conditions and to induce increases in packet processing delay and packet losses due to queue overflow. In total, 930 samples were acquired from these routers. In this scenario, we measured \(P_{\text{th}}\) (bits per second), \(P_{\text{loss}}\) (per second), and \(P_{\text{delay}}\) (maximum per second) for each \(L\) and \(R\). Next, we modeled the predictor of router metrics on the basis of each measured router metric. Training datasets consist of \(R\) (number of physical ports: \(N_{\text{port}}\), number of flow entries: \(N_{\text{entries}}\), number of allocated virtual CPU cores: \(N_{\text{CPUs}}\), and size of memory allocation: \(S_{\text{Mem}}\)) and \(L\) (Ethernet frame size: \(S_{\text{Eth}}\), number of traffic flows: \(N_{\text{Flow}}\), and average rate of input traffic: \(R_{\text{Input}}\)). The Keysight Ixia platform with 8 SFP+ ports is used to generate the traffic in the experiments. The Ixia platform sends traffic from each of the \(N_{\text{port}}\) ports to a single output port at a constant rate \(R_{\text{Input}}\) with a fixed Ethernet frame size of \(S_{\text{Eth}}\).
We conducted our work using PyTorch [36], a widely used open-source deep learning library. The encoder and decoder of the NP are both composed of 15 hidden layers, each with 128 hidden units. The dimensionality of \(Z\) and the number of context points \(C\) were set to 128 and 50, respectively. The Adam optimizer [37] is used with a learning rate of \(10^{-4}\) to train the NP. Each model was trained for 6000 epochs on an Nvidia Tesla T4 card with 16 GB of memory.
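For reference, the stated hyperparameters can be collected into a single configuration fragment (the dictionary name and keys are illustrative, not taken from the paper's codebase):

```python
# Hyperparameters reported for the NP models (illustrative key names).
NP_CONFIG = {
    "hidden_layers": 15,    # encoder and decoder each
    "hidden_units": 128,    # units per hidden layer
    "z_dim": 128,           # dimensionality of the latent variable Z
    "context_points": 50,   # number of context points C
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "epochs": 6000,
}
```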
6.3 Feature Importance for Each Router Metric Model
Figure 4 shows the feature importance of the top 5 features for the three router metric models: (a) throughput model, (b) packet loss model, and (c) packet delay model. These values have been normalized so that their sum equals 1. For each model, the top features include other router metrics, indicating the potential to improve the accuracy of each router metric model by adding those metrics to the training datasets. We found a correlation between packet loss and throughput in each model. This is because the closer a router handles traffic to its maximum capacity, the more likely packets are to be dropped due to buffer overflows caused by exceeding its processing capability. Conversely, in the packet delay model, packet loss emerged as a more influential factor. This is because router delay is primarily made up of queuing delay: an increase in queuing delay indicates that packets are accumulating in the packet buffer, leading to packet loss due to buffer overflows. Understanding these causal relationships enables the proposed method to enhance the accuracy of each model. By adding the inferred router metrics that contribute to each model, in descending order of feature importance, the proposed approach has the potential to improve the performance of the node modeling process.
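The ranking used here is permutation feature importance [30] (see Sect. 7): shuffle one feature column and measure how much the model's score drops. A generic sketch, not the paper's implementation; `predict` stands in for a trained metric model and `score` for \(R^2\):

```python
import random

def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Importance of feature j = average drop in score when column j of X
    is randomly shuffled, normalized so the importances sum to 1
    (as in Fig. 4; assumes the total drop is positive)."""
    rng = random.Random(seed)
    base = score(y, [predict(x) for x in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)                     # break feature j only
            X_perm = [list(x) for x in X]
            for i, v in enumerate(col):
                X_perm[i][j] = v
            drops.append(base - score(y, [predict(x) for x in X_perm]))
        importances.append(sum(drops) / n_repeats)
    total = sum(importances)
    return [v / total for v in importances]
```

Features the model relies on produce large score drops when shuffled; irrelevant features produce drops near zero.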
6.4 Inferred Accuracy in the Extrapolation of Router Metrics
We evaluated the inference accuracy of NN, NP, and the proposed method for interpolation and extrapolation. Table 2 shows the \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) values for \(P_{\text{th}}\) compared with the conventional methods, NN and NP, at \(\gamma=0.7\). We found that the \(R_{\mathrm{in}}^2\) value for each method was 0.97 or higher, meaning that every method inferred the interpolation domain accurately. However, NN could not extrapolate, as shown by its poor \(R_{\mathrm{ex}}^2\) value. This is because an NN fits virtually any function by adjusting its parameters to the presented training data, which gives it no basis for inference outside that data. In contrast, NP could extrapolate better than an NN by learning to map a context set of observed input-output pairs to a distribution over regression functions. However, the \(R_{\mathrm{ex}}^2\) value with NP was still low: \(-0.7\). The \(R_{\mathrm{ex}}^2\) value with the proposed method improved to 0.65 by selecting and adding the other inferred router metrics (\(P_{\text{loss}}\) and \(P_{\text{delay}}\)) to the training datasets in line with the feature importance. From the above, we found that NN is unsuitable for extrapolation, so we compare the proposed method with vanilla NP in the following evaluations.
Table 2 Comparison with conventional methods on \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) for \(P_{\text{th}}\).
We then evaluated how the proposed method’s approach of incorporating additional inferred router metrics, based on their feature importance, affects the accuracy of router metric inference. The results, detailed in Table 3, show the \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) values for these metrics within the training datasets at \(\gamma =0.7\). Training dataset (1) in Table 3 corresponds to vanilla NP, i.e., the conventional method. Our results showed improvements in the \(R_{\mathrm{ex}}^2\) values for each router metric when using the proposed method, with a notable improvement for \(P_{\text{loss}}\), whose \(R_{\mathrm{ex}}^2\) improved from \(-0.3\) to 0.65 owing to the inclusion of the highly-ranked inferred \(P_{\text{th}}\) in the training datasets. A similar improvement was seen in the \(R_{\mathrm{ex}}^2\) values for \(P_{\text{th}}\), rising from \(-0.7\) to 0.65. Despite these improvements, not every router metric’s \(R_{\mathrm{ex}}^2\) benefited from the additional inferred data. Specifically, the gains in \(R_{\mathrm{ex}}^2\) for \(P_{\text{loss}}\) plateaued with dataset (2), as shown in Table 3, because the inferred metrics carry their own inference errors into the training datasets. Consequently, the addition of inferred metrics to the training datasets must be balanced against the potential degradation from these errors, a crucial aspect of managing the trade-off between dataset enrichment and error-induced deterioration. Conversely, the \(R_{\mathrm{ex}}^2\) values for \(P_{\text{delay}}\) improved only from \(-0.8\) to 0.05, which is not as good as \(P_{\text{th}}\) and \(P_{\text{loss}}\). This may be attributed to the influence of uncertainty factors such as packet jitter; the algorithm therefore still needs improvement. Nevertheless, the \(R_{\mathrm{in}}^2\) values for \(P_{\text{delay}}\) improved from 0.89 to 0.99, which demonstrates a certain effect.
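The core loop of the proposed method, inferring an auxiliary metric and appending it as an extra feature for the target metric, can be sketched as follows. A trivial least-squares fit stands in for the NP purely for illustration, and all names (`fit_linear`, `infer_with_appended_metric`) are our own, not the paper's API:

```python
def fit_linear(xs, ys):
    """Ordinary least squares y = a*x + b for a single scalar feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def infer_with_appended_metric(x_train, aux_train, y_train, x_test):
    """Proposed-method sketch: (1) model an auxiliary metric (e.g. P_loss)
    from the base feature, (2) infer it for train and test inputs, and
    (3) use the inferred auxiliary metric as a feature for the target
    metric (here it is the sole feature, since the toy model is scalar)."""
    a1, b1 = fit_linear(x_train, aux_train)          # step 1
    aux_hat_train = [a1 * x + b1 for x in x_train]   # step 2
    aux_hat_test = [a1 * x + b1 for x in x_test]
    a2, b2 = fit_linear(aux_hat_train, y_train)      # step 3
    return [a2 * h + b2 for h in aux_hat_test]
```

The key point the sketch captures is that the test inputs never need the measured auxiliary metric: it is inferred, which is also why its errors propagate into the target model, producing the plateau observed for dataset (2).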
Table 3 \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) results for each metric model at \(\gamma=0.7\).
To analyze the overall trend of errors, we show the inference accuracy for each router metric. Figure 5 shows the inference accuracy for (a), (b) \(P_{\text{th}}\); (c), (d) \(P_{\text{loss}}\); and (e), (f) \(P_{\text{delay}}\) in the interpolation and extrapolation domains at \(\gamma=0.7\). Specifically, plots (a), (c), and (e) are the results of applying the conventional method (vanilla NP), while plots (b), (d), and (f) are the results of applying our proposed method. For each evaluation, the composition of the training datasets was as follows: datasets (a), (c), and (e) used only \(L\) and \(R\); dataset (b) used \(L\) and \(R\) with the inferred \(P_{\text{loss}}\) and \(P_{\text{delay}}\); dataset (d) used \(L\) and \(R\) with the inferred \(P_{\text{th}}\); and dataset (f) used \(L\) and \(R\) with the inferred \(P_{\text{loss}}\). The blue and orange dots represent the measured data vs. the inferred data for interpolation and extrapolation, respectively. In Fig. 5, the proposed method, whose training datasets include other inferred router metrics, shows a stronger correlation between the inferred and measured data than the conventional method. Inference errors for extrapolation remained in each inferred router metric. Nevertheless, \(R_{\mathrm{ex}}^2\) improved to a maximum of 0.65.
6.5 Scalability for Extrapolation Domain
Here, the evaluation focused on the scalability of the proposed method in terms of its ability to infer router metrics in extrapolation over different \(\gamma\) values. Table 4 shows the \(R_{\mathrm{in}}^2\) and \(R_{\mathrm{ex}}^2\) values for the inferred router metrics at these different \(\gamma\) levels. \(R_{\mathrm{ex}}^2\) for the proposed method is better than that for vanilla NP for \(\gamma\) values between 0.5 and 0.7 on the given dataset. In particular, at \(\gamma=0.6\), the proposed method improves \(R_{\mathrm{ex}}^2\) for \(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\): the \(R_{\mathrm{ex}}^2\) of \(P_{\text{th}}\) increased from \(-0.71\) to 0.62, that of \(P_{\text{loss}}\) from \(-0.36\) to 0.59, and that of \(P_{\text{delay}}\) from \(-0.85\) to 0.01. This improvement in the extrapolation domain is attributed to the synergy between the NP’s “learning how to learn” feature and the proposed method’s appending of inferred router metrics to the dataset. However, for \(\gamma\) values less than 0.5, the performance of our method decreased to the same level as that of vanilla NP, mainly because the limited data availability hindered the model’s ability to represent the data points. In addition, the accuracy of the inferred \(P_{\text{delay}}\) was notably lower than those of \(P_{\text{th}}\) and \(P_{\text{loss}}\). This may be due to fluctuations such as packet jitter, which affect the inference of \(P_{\text{delay}}\) more significantly than the other metrics.
6.6 Computation Time for Training and Inference
Table 5 shows the \(T_{\text{Train}}\) and \(T_{\text{Infer}}\) results for the \(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\) models, respectively. These results are derived from the dataset conditions under which \(R_{\mathrm{ex}}^2\) showed the most improvement in Table 3. For the vanilla NP method, \(T_{\text{Train}}\) averages approximately 74 seconds across all metrics, while \(T_{\text{Infer}}\) averages approximately 0.45 seconds. In contrast, the proposed method increases \(T_{\text{Train}}\) to an average of over 173 seconds per metric, while \(T_{\text{Infer}}\) is approximately 0.6 seconds for each metric. As these results show, the \(T_{\text{Train}}\) of the proposed method was two to three times longer than that of the conventional method. This increase is due to the inclusion of inferred router metrics in the training dataset, which lengthens the computation time in proportion to the number of inferred router metrics. Specifically, the \(P_{\text{th}}\) model included the inferred \(P_{\text{loss}}\) and \(P_{\text{delay}}\), the \(P_{\text{loss}}\) model the inferred \(P_{\text{th}}\), and the \(P_{\text{delay}}\) model the inferred \(P_{\text{loss}}\) in their training datasets. As a result, \(T_{\text{Train}}\) roughly tripled for the \(P_{\text{th}}\) model and doubled for the other two compared to the vanilla NP method. However, since these models are pre-trained before use, an increase in \(T_{\text{Train}}\) of a few minutes is not problematic. In contrast, \(T_{\text{Infer}}\) is more critical in operational contexts, especially when network control is based on inferred network performance. Throughout the experiments, \(T_{\text{Infer}}\) remained at approximately 0.5 seconds.
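Measuring \(T_{\text{Train}}\) and \(T_{\text{Infer}}\) amounts to wall-clock timing around the training and inference calls, e.g. with a small helper (a minimal sketch; `timed` is an illustrative name, not from the paper's code):

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call to fn, as used
    to report the T_Train and T_Infer measurements."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

# Usage sketch (train/infer are placeholders for the model's routines):
# _, t_train = timed(train, dataset)
# _, t_infer = timed(infer, features)
```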
This time is considered acceptable, given that infrastructure enhancements in the assumed fixed network environment are implemented on a monthly timescale.
7. Discussion
This study introduced a machine learning-based approach for network node modeling that adds other inferred router metrics to the training dataset to improve extrapolation. This method enhances the accuracy of inferring router metrics such as \(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\) in the extrapolation domain. However, it is essential to understand that inferred router metrics are subject to inference errors. Given this context, simply adding these inferred metrics to the training datasets is not straightforward. A key challenge is to balance the benefit of enriching the training dataset with other inferred router metrics against the potential deterioration due to errors in those inferred metrics. To address this issue, the proposed method decides which inferred router metrics to include on the basis of the permutation feature importance [30] of each router metric model. This iterative process continuously refines the inference accuracy of the router metrics.
On the other hand, the proposed method integrates the inferred router metrics into the training dataset, which increases the computation time. Within the fixed network configuration assumed in our study, the maximum computation time for inferring router metrics did not exceed 1 second. From a practical point of view, this slight increase is insignificant for the management and configuration tasks in the assumed fixed network.
While this study focused on inferring throughput, packet loss, and packet delay, the data-driven nature of the proposed method is potentially applicable to other performance metrics. For example, it could be adapted to estimate metrics such as packets per second (pps) and packet jitter, provided sufficient data is available for training the models. The crucial consideration for successful application to a new metric is whether the target metric correlates with other available metrics: the proposed algorithm uses the feature importance of each router metric model to find the relevance of other router metrics to the target metric, and then adds those inferred metrics to the training dataset to improve the inference accuracy of the target metric.
7.1 Limitations and Future Challenges
(1) Packet jitter-aware node modeling:
This study examined the inference accuracy of \(P_{\text{th}}\), \(P_{\text{loss}}\), and \(P_{\text{delay}}\). However, the results of \(P_{\text{delay}}\) were not as good as those of \(P_{\text{th}}\) and \(P_{\text{loss}}\). This may be attributed to the influence of uncertainty factors such as packet jitter. Therefore, future tasks include improving the algorithm by incorporating governing equations that represent packet traffic multiplexing and using machine learning for node modeling to address packet jitter.
(2) Adaptation to dynamic external environments:
The proposed node modeling, focusing on the packet routers in the fixed network, needs to be further evaluated to assess its applicability in dynamic external environments like non-terrestrial networks (NTNs). In NTN scenarios, unforeseen challenges from unaccounted environmental changes are a crucial consideration for model generalization. Addressing this challenge requires the development of robust network node modeling algorithms capable of adapting to varying weather conditions and link quality variations. Future work could focus on integrating real-time traffic data to refine prediction accuracy under dynamic network conditions.
(3) Generalization of node modeling towards short-term network resource allocation and various traffic distributions:
In this study, we focused on a node model to evaluate its performance for long-term network planning, such as a period of months, aimed at upgrading and scaling the network infrastructure. For this purpose, we focused on using average traffic volumes. However, from a network operator’s perspective for shorter-term network resource allocation, a node model is needed that adapts to bursty traffic scenarios. For example, ultra-reliable and low-latency communication services such as remote surgery and vehicle-to-X communication in 5G networks require network resource efficiency and immediate resource orchestration due to their short-term bursty traffic [38]. This approach contrasts with the long-term network planning described in Sect. 4.3, highlighting the importance of promptly responding to bursty traffic and quickly allocating network resources. Therefore, node modeling needs to be generalized to improve its applicability to the bursty traffic for this kind of short-term operation.
Furthermore, this study focused on traffic aggregation conditions. However, in real network routers, there are different patterns of traffic exchange. For example, traffic arriving at an input port may be evenly distributed to all output ports, or it may be sent to only one output port. This variation leads to inference inaccuracies if there is a mismatch between the traffic pattern used to train the model and the traffic pattern used for inference. Therefore, the applicability to different traffic patterns is closely related to the challenge of generalizing the node modeling. In light of these limitations, methodological advancements beyond the current focus on improving extrapolation accuracy are required to effectively address this issue.
A potential solution is to employ a reinforcement learning approach [39], which could enhance the generalization of node modeling by allowing the model to learn from varied traffic environments. Although Ref. [39] is specifically designed for urban traffic signal control, it utilizes the Q-value network to learn the value of actions in different environments, enabling the model to adapt to new traffic flows. Therefore, this approach has the potential to improve the generalization of node modeling for a wide range of carrier network traffic patterns.
(4) Applicability for other hardware platforms:
This study evaluated the model using data from only a software router platform (x86-based server). However, various hardware-based router platforms exist, such as P4 programmable switches [40], SmartNICs [41], and FPGA NICs [42]. Our proposed method takes a data-driven approach to infer the metrics of a black-boxed router whose packet processing mechanisms are undisclosed. Essentially, if the environment can collect the input/output data of the target router, including router settings, traffic information, and router metrics such as packet delay, packet loss, and throughput, the proposed data-driven method could be considered applicable to other router hardware platforms. However, the exploration of key feature metrics to improve extrapolation accuracy for the target router metric, which is central to our proposed method, may vary depending on the target router platform. Therefore, feature metric exploration tailored to the target router platform is a future challenge.
(5) Applicability to Stateful Data Plane:
Traditional IP-based networking may not be sufficient to support future network requirements such as ultra-fast mobile broadband services and low-latency applications. Content-Centric Networking (CCN) [43] is a candidate for supporting these requirements. The CCN is characterized by its efficient content delivery, content naming, and inherent support for mobility and in-network caching, which reduce bandwidth usage and improve the user experience. However, the CCN requires stateful data planes to perform efficient in-network caching beyond simple packet forwarding. The proposed node modeling in this study focuses on a simple scenario of aggregating the traffic. Therefore, our current model is not designed to handle stateful data planes, such as the CCN router [43] or L4 stateful network functions, for example, L4 load balancer [44], which perform different processing depending on their states. In the CCN, factors such as cache size, content popularity fluctuations, and network traffic variability could make predicting router performance challenging.
One approach to address these challenges is to incorporate time-series content caching analysis on the basis of popularity and priority using sequence-to-sequence (seq2seq) LSTM [45]. The seq2seq LSTM model captures temporal patterns and dependencies in content popularity and request sequences. This approach could potentially lead to improved accuracy in performance prediction for stateful data planes such as the CCN routers.
8. Conclusion
We proposed a Neural Process (NP)-based node modeling method that digitally evaluates the performance of actual network equipment in the extrapolation domain not covered by the training dataset. The novelty of the proposed algorithm lies in iteratively appending inferred router metrics to the training datasets on the basis of feature importance to improve extrapolation accuracy. We demonstrated the effectiveness of the proposed method for improving inference accuracy in the extrapolation of router metrics using software routers. Our method improved the coefficient of determination (\(R^2\)) for the inferred router metrics (packet loss rate \(P_{\text{loss}}\), throughput \(P_{\text{th}}\), and packet delay \(P_{\text{delay}}\)) in the extrapolation domain, reaching as high as 0.65. Furthermore, we demonstrated the scalability of the proposed method for smaller values of the splitting ratio (\(\gamma\)), which determines the size of the extrapolation domain. The results indicate that our proposed method outperforms vanilla NP for \(\gamma\) between 0.6 and 0.7 on the given dataset. In particular, at \(\gamma=0.6\), our method improves the \(R^2\) for extrapolation (\(R_{\mathrm{ex}}^2\)) of \(P_{\text{th}}\) and \(P_{\text{loss}}\): the \(R_{\mathrm{ex}}^2\) of \(P_{\text{th}}\) increased from \(-0.71\) to 0.62 and that of \(P_{\text{loss}}\) from \(-0.36\) to 0.59. The improvement of \(R_{\mathrm{ex}}^2\) for \(P_{\text{delay}}\) was from \(-0.85\) to 0.01, a certain improvement, though not as large as those of \(P_{\text{th}}\) and \(P_{\text{loss}}\) because \(P_{\text{delay}}\) is subject to uncertainty factors such as packet jitter. Additionally, we examined how incorporating other inferred router metrics into the training datasets affects computation time. The computation times for training and inference in the proposed method were approximately several minutes and less than 1 second, respectively.
This time is considered acceptable, given that infrastructure upgrades in the assumed fixed network environment are implemented on a monthly timescale. In summary, we found that the proposed method has the potential to improve the inference accuracy of extrapolated router metrics in the digital domain on the given software router dataset.
References
[1] M. Giordani, M. Polese, M. Mezzavilla, S. Rangan, and M. Zorzi, “Toward 6G networks: Use cases and technologies,” IEEE Commun. Mag., vol.58, no.3, pp.55-61, March 2020.
[2] W. Rafique, L. Qi, I. Yaqoob, M. Imran, R. ur Rasool, and W. Dou, “Complementing IoT services through software defined networking and edge computing: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol.22, no.3, pp.1761-1804, 2020.
[3] K. Ishii, R. Matsumoto, T. Inoue, and S. Namiki, “Disaggregated optical-layer switching for optically composable disaggregated computing,” J. Opt. Commun. Netw., vol.15, no.1, pp.A11-A25, 2023.
[4] K. Hattori, T. Korikawa, C. Takasaki, H. Oowada, and M. Shimizu, “Recursive router metrics prediction using ML-based node modeling for network digital replica,” Proc. IEEE GLOBECOM, 2022.
[5] K. Hattori, T. Korikawa, C. Takasaki, and H. Oowada, “Recursive router metrics prediction using machine learning-based node modeling for network digital replica,” IEEE Access, vol.11, pp.138638-138654, 2023.
[6] M. Garnelo, J. Schwarz, D. Rosenbaum, F. Viola, D.J. Rezende, S.M. Ali Eslami, and Y.W. Teh, “Neural processes,” arXiv:1807.01622, 2018.
[7] M. Grieves and J. Vickers, “Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems,” Transdisciplinary Perspectives on Complex Systems, pp.85-113, Springer, 2017.
[8] L. Li, S. Aslam, A. Wileman, and S. Perinpanayagam, “Digital twin in aerospace industry: A gentle introduction,” IEEE Access, vol.10, pp.9543-9562, 2022.
[9] M. Borgo, S. Elliott, T. Ghandchi, and I. Stothers, “Virtual sensing of wheel position in ground-steering systems for aircraft using digital twins,” Proc. 38th IMAC, a Conference and Exposition on Structural Dynamics, pp.1-12, 2020.
[10] P. Almasan, M. Ferriol-Galmés, J. Paillisse, J. Suárez-Varela, D. Perino, D. López, A.A.P. Perales, P. Harvey, L. Ciavaglia, L. Wong, V. Ram, S. Xiao, X. Shi, X. Cheng, A. Cabellos-Aparicio, and P. Barlet-Ros, “Network digital twin: Context, enabling technologies and opportunities,” arXiv:2205.14206, 2022.
[11] P. Kumar, R. Kumar, A. Kumar, A.A. Franklin, S. Garg, and S. Singh, “Blockchain and deep learning for secure communication in digital twin empowered industrial IoT network,” IEEE Trans. Netw. Sci. Eng., vol.10, no.5, pp.2802-2813, 2023.
[12] S. Vakaruk, A. Mozo, A. Pastor, and D.R. Lopez, “A digital twin network for security training in 5G industrial environments,” Proc. IEEE 1st Int. Conf. Digit. Twins Parallel Intell. (DTPI), pp.395-398, July 2021.
[13] H. Shin, S. Oh, A. Isah, I. Aliyu, J. Park, and J. Kim, “Network traffic prediction model in a data-driven digital twin network architecture,” Electronics, vol.12, no.18, 3957, 2023.
[14] B. Erman and C. Di Martino, “Generative network performance prediction with network digital twin,” IEEE Network, vol.37, no.2, pp.286-292, 2023.
[15] “ns-3,” https://www.nsnam.org (accessed April 2, 2024).
[16] J. Lai, J. Tian, K. Zhang, Z. Yang, and D. Jiang, “Network emulation as a service (NEaaS): Towards a cloud-based network emulation platform,” Mobile Networks and Applications, pp.1-15, 2020.
[17] K. Hattori, T. Korikawa, C. Takasaki, H. Oowada, M. Shimizu, and N. Takaya, “Network digital replica using neural-network-based network node modeling,” Proc. IEEE NetSoft, 2022.
[18] K. Xu, M. Zhang, J. Li, S.S. Du, K. Kawarabayashi, and S. Jegelka, “How neural networks extrapolate: From feedforward to graph neural networks,” ICLR, 2021.
[19] J. Decugis, M. Emerling, A. Ganesh, A.Y. Tsai, and L. El Ghaoui, “On the abilities of mathematical extrapolation with implicit models,” NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications.
[20] A.G. Wilson, E. Gilboa, A. Nehorai, and J.P. Cunningham, “GPatt: Fast multidimensional pattern extrapolation with Gaussian processes,” arXiv preprint arXiv:1310.5288, 148, 2013.
[21] K. Suksomboon, M. Fukushima, S. Okamoto, and M. Hayashi, “A dilated-CPU-consumption-based performance prediction for multi-core software routers,” IEEE NetSoft Conference and Workshops (NetSoft), pp.193-201, 2016.
[22] S. Gallenmüller, P. Emmerich, F. Wohlfart, D. Raumer, and G. Carle, “Comparison of frameworks for high-performance packet IO,” ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), pp.29-38, 2015.
[23] L. Rizzo, “netmap: a novel framework for fast packet I/O,” USENIX Annual Technical Conference, April 2012.
[24] “PF_RING ZC,” http://www.ntop.org/products/pf_ring/pf_ring-zc-zero-copy/ (accessed April 2, 2024).
[25] “DPDK,” http://www.dpdk.org (accessed April 2, 2024).
[26] V. Riccobene, M.J. McGrath, M.-A. Kourtis, G. Xilouris, and H. Koumaras, “Automated generation of VNF deployment rules using infrastructure affinity characterization,” IEEE NetSoft Conference and Workshops (NetSoft), pp.226-233, 2016.
[27] W. Yoo and A. Sim, “Time-series forecast modeling on high-bandwidth network measurements,” J. Grid Computing, vol.14, pp.463-476, 2016.
[28] Z. Zhang, X. Wang, J. Xie, H. Zhang, and Y. Gu, “Unlocking the potential of deep learning in peak-hour series forecasting,” Proc. 32nd ACM International Conference on Information and Knowledge Management, pp.4415-4419, 2023.
[29] P. Biswas, M.S. Akhtar, S. Saha, S. Majhi, and A. Adhya, “Q-learning-based energy-efficient network planning in IP-Over-EON,” IEEE Trans. Netw. Service Manag., vol.20, no.1, pp.3-13, 2022.
[30] C. Molnar, Interpretable Machine Learning, Lulu.com, 2019.
[31] H. Liu and M. Cocea, “Semi-random partitioning of data into training and test sets in granular computing context,” Granul. Comput., vol.2, no.4, pp.357-386, 2017.
[32] Cisco, “Cisco Cloud Services Router 1000v Series,” https://www.cisco.com/c/en/us/products/routers/cloud-services-router-1000v-series/index.html (accessed April 2, 2024).
[33] Juniper, “vMX Virtual Router,” https://www.juniper.net/gb/en/products/routers/mx-series/vmx-virtual-router-datasheet.html (accessed April 2, 2024).
[34] D. Barach, L. Linguaglossa, D. Marion, P. Pfister, S. Pontarelli, and D. Rossi, “High-speed software data plane via vectorized packet processing,” IEEE Commun. Mag., vol.56, no.12, pp.97-103, Dec. 2018.
[35] Y. Ohara, H. Shirokura, A.D. Banik, Y. Yamagishi, and K. Kyunghwan, “Kamuee: An IP packet forwarding engine for multi-hundred-gigabit software-based networks,” Proc. Internet Conference, 2018.
[36] A. Paszke, et al., “PyTorch: An imperative style, high-performance deep learning library,” Adv. Neural Inf. Process. Syst., vol.32, pp.8024-8035, 2019.
[37] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
[38] P. Yang, X. Xi, T.Q.S. Quek, J. Chen, X. Cao, and D. Wu, “How should I orchestrate resources of my slices for bursty URLLC service provision?,” IEEE Trans. Commun., vol.69, no.2, pp.1134-1146, Feb. 2021.
[39] H. Zhang, C. Liu, W. Zhang, G. Zheng, and Y. Yu, “GeneraLight: Improving environment generalization of traffic signal control via meta reinforcement learning,” Proc. Conf. Inf. Knowl. Manage., pp.1783-1792, 2020.
[40] E.F. Kfoury, J. Crichigno, and E. Bou-Harb, “An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends,” IEEE Access, vol.9, pp.87094-87155, 2021.
[41] M. Liu, T. Cui, H. Schuh, A. Krishnamurthy, S. Peter, and K. Gupta, “Offloading distributed applications onto SmartNICs using iPipe,” Proc. 2019 ACM SIGCOMM Conference, pp.318-333, 2019.
[42] M.S. Brunella, G. Belocchi, M. Bonola, S. Pontarelli, G. Siracusano, G. Bianchi, A. Cammarano, A. Palumbo, L. Petrucci, and R. Bifulco, “hXDP: Efficient software packet processing on FPGA NICs,” Commun. ACM, vol.65, no.8, pp.92-100, 2022.
[43] S. Kumar, R. Tiwari, M.S. Obaidat, N. Kumar, and K.-F. Hsiao, “CPNDD: Content placement approach in content centric networking,” Proc. IEEE ICC, 2020.
[44] D.E. Eisenbud, et al., “Maglev: A fast and reliable software network load balancer,” Proc. 13th USENIX Symp. Netw. Syst. Design Implement. (NSDI), pp.523-535, 2016.
[45] M.W. Kang and Y.W. Chung, “Content caching based on popularity and priority of content using seq2seq LSTM in ICN,” IEEE Access, vol.11, pp.16831-16842, 2023.