In order to test the concept, a simple system was simulated to represent the predictive management protocol just described. Note that none of the previous assumptions are made in the simulation. The purpose of this simulation is to determine whether the concept is feasible; a key question it attempts to answer is whether the overhead/performance ratio results in a useful system. A small closed queuing network with FCFS servers is used to represent the actual system. Figure 1 shows the real system to be managed and the predictive management model. In this initial feasibility study, the managed system and the predictive management model are both modeled with Maisie. The verification query between the real system and the management model is explicitly illustrated in Figure 1.
Figure 1: Initial Feasibility Network Model
The system consists of three switch-like entities. Each switch contains a single queue and 10 exponentially distributed servers, which must sequentially service each packet; a mean service time of 10 time units is assumed. The servers represent the link rate. Each packet is then forwarded with equal probability to another switch, including itself. Each switch is a driving process; the switches forward real and virtual messages. The state is the cumulative number of packets that have entered each switch and queue. This is similar to statistics monitored by SNMP counters, for example, the ifInOctets counter in the MIB-II interfaces group.
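The switch network described above can be sketched as a small discrete-event simulation. This is a minimal Python sketch under stated assumptions (three circulating packets, one per switch at startup; passage through the 10 sequential exponential servers modeled as an Erlang-10 total service time); it is not the Maisie model used in the study.

```python
import heapq
import random

random.seed(42)

NUM_SWITCHES = 3        # three switch-like entities (from the text)
STAGES = 10             # 10 sequential exponential servers per switch
MEAN_STAGE_TIME = 10.0  # mean service time per server, in time units
END_TIME = 10_000.0     # assumed simulation horizon

def service_time():
    # Passing sequentially through 10 exponential servers yields an
    # Erlang-10 distributed total service time.
    return sum(random.expovariate(1.0 / MEAN_STAGE_TIME) for _ in range(STAGES))

def simulate(end_time=END_TIME):
    # state[i] = cumulative number of packets that have entered switch i
    state = [0] * NUM_SWITCHES
    busy_until = [0.0] * NUM_SWITCHES  # FCFS: single queue per switch
    events = []                        # (arrival_time, switch)
    for i in range(NUM_SWITCHES):
        heapq.heappush(events, (0.0, i))  # closed network: one initial packet each
    while events:
        t, sw = heapq.heappop(events)
        if t > end_time:
            break
        state[sw] += 1
        # FCFS: service starts once the switch's servers are free
        start = max(t, busy_until[sw])
        done = start + service_time()
        busy_until[sw] = done
        # forward with equal probability to any switch, including itself
        nxt = random.randrange(NUM_SWITCHES)
        heapq.heappush(events, (done, nxt))
    return state
```

The cumulative entry counts returned here play the role of the SNMP-like counters that the predictive model must track.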
Both real and virtual messages contain the time at which service ends and a count of the number of times a packet has entered a switch. The switches are fully connected. An initial message enters each queue upon startup to associate a queue with its switch; this is the purpose of the idmsg which enters the queues in Figure 1. The predictive system parameters are more compactly identified as a triple of the form (Lookahead Window Size in seconds, Tolerance as a counter value, Verification Query Period in seconds). The effect of these parameters is examined on the system of switches previously described. The simulation was run with the following triples: (5, 10, 5), (5, 10, 1), (5, 3, 5), and (400, 5, 5). The graphs which follow show the results for each triple.
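The parameter triple can be captured in a small record together with the verification check it drives. The names below are illustrative, not taken from the original implementation; the rollback rule (a verification query fails when the counter difference exceeds the tolerance) is assumed from the description above.

```python
from dataclasses import dataclass

@dataclass
class PredictiveParams:
    lookahead_window: float    # seconds the model may run ahead of real time
    tolerance: int             # allowed |predicted - actual| counter difference
    verification_period: float # seconds between verification queries

def needs_rollback(predicted_count, actual_count, params):
    # A verification query compares the predicted cumulative packet count
    # against the real counter; exceeding the tolerance forces a rollback.
    return abs(predicted_count - actual_count) > params.tolerance

# The four runs reported in the text:
runs = [PredictiveParams(5, 10, 5), PredictiveParams(5, 10, 1),
        PredictiveParams(5, 3, 5), PredictiveParams(400, 5, 5)]
```

Lowering the tolerance (as in the (5, 3, 5) run) makes `needs_rollback` fire for smaller prediction errors, which is why that run shows more state verification rollbacks.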
The first run parameters were (5, 10, 5). There were no state verification rollbacks, although there were some causality-induced rollbacks, as shown in Figure 2. GVT increased almost instantaneously versus real time; at times the next event far exceeded the lookahead window. This is the reason for the nearly vertical jumps in the GVT versus real-time graph shown in Figure 2. The state graph for this run is shown in Figure 3.
Figure 2: Rollbacks Due to State Verification Failure (5, 10, 5)
Figure 3: State (5, 10, 5)
In the initial implementation, state verification was performed in the LP immediately after each new message was received. However, the probability was low that an LP, while processing at its LVT, had saved a future state whose save time exactly matched the arrival time of a real message. Thus, there was frequently nothing with which to compare the current state in order to perform the state verification. It was also observed that the predictive system simulated up to the lookahead window very quickly and then spent most of its time holding, during which it was doing nothing. The implementation was therefore modified so that each entity performs state verification during its hold time. This design change better utilized the processors and resulted in more accurate alignment between the actual and logical processes.
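The revised design can be sketched as follows, assuming that during its hold time an LP compares the actual counter against the most recently saved predicted state at or before the real clock, rather than requiring an exact match of state-save times. All names here are hypothetical.

```python
import bisect

class LogicalProcess:
    """Sketch of an LP that verifies state while holding (assumed design)."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.saved = []  # (virtual_time, predicted_state), appended in LVT order

    def save_state(self, vtime, state):
        self.saved.append((vtime, state))

    def verify_during_hold(self, real_time, actual_state):
        # Find the most recent saved state at or before the real clock,
        # instead of demanding an exact state-save-time match.
        times = [t for t, _ in self.saved]
        i = bisect.bisect_right(times, real_time) - 1
        if i < 0:
            return False  # nothing to compare against yet: no rollback
        _, predicted = self.saved[i]
        return abs(predicted - actual_state) > self.tolerance  # rollback needed?
```

Because verification now runs while the LP would otherwise be idle, there is almost always a saved state available for comparison, which is the alignment improvement noted above.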
The results for the (5, 10, 1) run were similar, except that the predictive and actual system comparisons were more frequent because the state verification period had been reduced from once every 5 seconds to once every second. Error was measured as the difference between the predicted LP state and the actual system state. This run showed errors greater than those in the first run, great enough to cause state verification rollbacks. The error levels for both runs are shown in Figures 4 and 5. The state graph for this run is shown in Figure 6.
Figure 4: Amount of Error (5, 10, 5)
Figure 5: Amount of Error (5, 10, 1)
Figure 6: State (5, 10, 1)
The next run used (5, 3, 5) parameters. Here we see many more state verification failure rollbacks, as shown in Figure 7. This is expected, since the tolerance has been reduced from 10 to 3. The cluster of causality rollbacks near the state verification rollbacks was also expected. These clusters of causality rollbacks do not appear to significantly reduce the feasibility of the system. The real-time versus GVT plot shown in Figure 7 shows much larger jumps as the LPs were held back due to rollbacks. The entities had a larger variance in their hold times than in the (5, 10, 5) run. The state graph for this run is shown in Figure 8.
Figure 7: Rollbacks Due to State Verification Failure (5, 3, 5)
Figure 8: State (5, 3, 5)
A (400, 5, 5) run showed the GVT jump quickly to 400 and then gradually increase as the sliding lookahead window maintained a 400 time unit lead, as shown in Figure 9. The LP hold times were shorter here than in any previous run. The state graph for this run is shown in Figure 10.
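The behavior described, GVT jumping quickly to the window size and then tracking real time at a fixed lead, follows from capping virtual time at real time plus the window. A toy sketch of this mechanic (assumed mechanics, not the actual implementation):

```python
def gvt_trace(window, event_gaps):
    """Trace (real_time, gvt) pairs as events commit under a sliding
    lookahead window: virtual time may not exceed real_time + window,
    so the LP holds until the real clock catches up."""
    real_time, gvt, trace = 0.0, 0.0, []
    for gap in event_gaps:
        next_event = gvt + gap
        if next_event > real_time + window:
            # LP holds: real time must advance before the event commits
            real_time = next_event - window
        gvt = next_event
        trace.append((real_time, gvt))
    return trace
```

With a 400-unit window and evenly spaced events, GVT races ahead to 400 at essentially zero real time, after which every further GVT advance requires a matching real-time advance, reproducing the fixed 400-unit lead seen in Figure 9.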
Figure 9: Rollbacks Due to State Verification Failure (400, 5, 5)
Figure 10: State (400, 5, 5)
This set of results is interesting because it shows the system to be stable with the introduction of state verification rollbacks. The overhead introduced by these rollbacks did not greatly impact performance because, as shown in the GVT versus real-time graphs (Figures 2, 7, and 9), the system was always able to predict up to its lookahead time very quickly.