Reviewing speed test results

Past speed tests are summarized on a bond’s Speed Tests page. Each test’s description and summary of results are shown. To view detailed results, click the image0 icon beside the summary.

The speed test details page provides the following information:

Configuration

Target

The target of the test; either a bond or leg.

Description

The configuration of the test, including the protocol, direction, and length.

Submitted time

The time the test was submitted. This may not be the time the test was actually run, since a test may need to wait for other tests to finish or for queued configuration updates to be performed.

Bond configuration

The configuration of the bond and legs when the test was registered. This is only shown for bond tests.

Aggregator

The aggregator to which the bond was assigned when the test was submitted.

Versions

The versions of Bonding on the aggregator and bonder at the time the test was registered.

Results

Test results include details about throughput, latency, send queue, and packet loss.

Throughput

Throughput indicates the rate that data was received, in Mbps and packets per second. It’s normal for this to be somewhat less than the rate limit, even if the rate limit is less than the bandwidth provided by the leg.
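As a rough illustration of how these two figures relate, throughput in Mbps and packets per second can be derived from raw receive counters. The byte and packet counts below are invented for the example:

```python
# Sketch: deriving throughput figures from raw receive counters.
# The counts and interval here are hypothetical examples.

def throughput(bytes_received: int, packets_received: int, seconds: float):
    """Return (Mbps, packets per second) for a measurement interval."""
    mbps = bytes_received * 8 / seconds / 1_000_000
    pps = packets_received / seconds
    return mbps, pps

# e.g. 12,500,000 bytes (about 8,700 packets) received over 10 seconds
mbps, pps = throughput(12_500_000, 8_700, 10.0)
print(f"{mbps:.1f} Mbps, {pps:.0f} packets/s")  # 10.0 Mbps, 870 packets/s
```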

Latency

Latency (also known as round-trip time or RTT) measures the average time taken for a packet to travel from the sending host to the receiving host and back. Relatively high latency indicates that the rate limit of the test was greater than the bandwidth of the leg, so packets were held in long queues. Relatively low latency indicates that the rate limit was less than the bandwidth of the leg, so packets were transmitted without long queuing delays.

The standard deviation of the latency measurements is also shown, along with the number of samples taken.
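The average, standard deviation, and sample count reported on the results page can be sketched from a set of RTT samples. The sample values below are invented for illustration:

```python
# Sketch: summarizing round-trip-time samples the way the results page
# reports them -- average, standard deviation, and sample count.
# The sample values are hypothetical.
from statistics import mean, stdev

rtt_samples_ms = [42.1, 40.8, 43.5, 41.2, 44.0, 42.6]  # invented RTTs

avg = mean(rtt_samples_ms)
sd = stdev(rtt_samples_ms)   # sample standard deviation
n = len(rtt_samples_ms)

print(f"latency {avg:.1f} ms +/- {sd:.1f} ms over {n} samples")
```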

Send queue

Send queue indicates the average amount of data waiting to be sent during the test. It is only measured for leg tests with a rate limit. The size of the send queue indicates where throttling is taking place: if the send queue size is 0, the rate limit is higher than the real bandwidth of the leg, but if the size is greater than 0, the rate limit is less than the bandwidth of the leg. This indicator should be considered less accurate than latency when identifying the real bandwidth provided by the leg.

A send queue is similar to a line of shoppers waiting to check out at a supermarket. New customers join the end of the queue. If shoppers enter the queue at a faster rate than the clerk can process them, the queue grows. If they enter at a lower rate than the clerk can process them, the queue shrinks. When the queue is large, customers wait a long time for service, even if they are arriving at the same rate as they are checked out. In a similar way, packets enter a queue for sending on each leg. The send queue changes size based on the rate of traffic arriving at the queue. Larger queues mean longer waits for packets that join the end of the queue, resulting in larger round-trip times.
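The checkout analogy can be sketched as a small simulation: the queue grows when packets arrive faster than the leg can send them, and stays empty otherwise. The rates here are arbitrary illustrative numbers:

```python
# Sketch of the checkout-queue analogy: a queue grows when packets
# arrive faster than the leg can send them, and drains otherwise.
# Rates are invented illustrative numbers (packets per tick).

def simulate_queue(arrival_rate: int, service_rate: int, ticks: int) -> list[int]:
    """Return the queue depth after each tick."""
    depth, history = 0, []
    for _ in range(ticks):
        depth += arrival_rate                 # new packets join the back
        depth = max(0, depth - service_rate)  # the leg sends what it can
        history.append(depth)
    return history

print(simulate_queue(arrival_rate=12, service_rate=10, ticks=5))  # grows: [2, 4, 6, 8, 10]
print(simulate_queue(arrival_rate=8, service_rate=10, ticks=5))   # stays empty: [0, 0, 0, 0, 0]
```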

Packet loss

Packet loss shows the proportion of packets sent that were never received by the receiving host. The acceptable level of packet loss is different for different types of tests:

  • For TCP tests, packet loss can be anywhere from 0% to 5%.

  • For UDP tests with a rate limit less than the bandwidth provided by the leg, packet loss should be 0.

  • For UDP tests with a rate limit greater than the bandwidth provided by the leg, packet loss can be almost any value—even over 90% in extreme cases.

If packet loss is outside these bounds, it may indicate an issue with the leg.

Testing real packet loss on a leg can only be done with a UDP test and a rate limit known to be lower than the bandwidth provided by the leg. For example, on a healthy 3 Mbps DSL leg, a UDP test with a rate limit of 0.1 Mbps and a payload size of 10 bytes should have no packet loss.
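As a rough sketch, the loss figure and the bounds described above can be expressed as follows. The thresholds mirror the text; the packet counters are hypothetical:

```python
# Sketch: computing packet loss and checking it against the rough
# bounds described above. Thresholds mirror the text; the counters
# are invented.

def packet_loss_pct(sent: int, received: int) -> float:
    """Proportion of sent packets never received, as a percentage."""
    return (sent - received) / sent * 100

def loss_acceptable(loss_pct: float, protocol: str, limit_below_bandwidth: bool) -> bool:
    if protocol == "TCP":
        return 0 <= loss_pct <= 5   # TCP: anywhere from 0 to 5%
    if limit_below_bandwidth:
        return loss_pct == 0        # UDP under the leg's bandwidth: no loss expected
    return True                     # UDP over the bandwidth: almost any value

loss = packet_loss_pct(sent=10_000, received=9_970)
print(f"{loss:.2f}% loss")
print(loss_acceptable(loss, "TCP", limit_below_bandwidth=False))
```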

Charts

Charts display the performance of the test over time.

Throughput

The throughput chart shows the speeds observed by the receiving host during the test. A good bond or leg will show little variation in speed over the duration of the test. A high level of variation in a TCP bond test indicates that leg speeds may be set too high, a different packet distribution algorithm should be used, or the reorder max-hold value may be set too low. A high level of variation in a leg test indicates that non-bonded traffic is on the leg or that the leg simply doesn’t provide stable bandwidth. A leg that cannot offer stable speeds will reduce the overall throughput in a bond.
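One simple way to quantify "little variation" is the coefficient of variation (standard deviation divided by mean) of the per-interval speed samples. The samples below, and any threshold you might apply, are illustrative assumptions rather than values Bonding itself reports:

```python
# Sketch: quantifying how stable a test's throughput was using the
# coefficient of variation (stddev / mean) of per-interval speeds.
# The sample values are invented for illustration.
from statistics import mean, stdev

def variation(speeds_mbps: list[float]) -> float:
    """Coefficient of variation: lower means more stable throughput."""
    return stdev(speeds_mbps) / mean(speeds_mbps)

stable = [9.8, 10.1, 9.9, 10.0, 10.2]    # hypothetical stable leg
unstable = [4.0, 11.0, 6.5, 12.0, 3.5]   # hypothetical unstable leg

print(f"stable:   {variation(stable):.1%}")
print(f"unstable: {variation(unstable):.1%}")
```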

Below, a test with very stable speeds.

image1

Below, a test with very unstable speeds.

image2

Latency

The latency chart shows the latency during the test.

Below, a test with relatively consistent, low latency. This indicates the speed of the test was lower than the bandwidth offered by the bond or leg.

image3

Below, a test with extremely high latency. This indicates that the speed of the test was higher than the bandwidth offered by the bond or leg.

image4

Send queue

The send queue chart shows the size of the send queue during the test.

Below, a test with a varying send queue size. This is normal when the rate limit of the test is lower than the bandwidth provided by the leg. If the send queue size is consistently 0, it indicates the rate limit of the test is greater than the bandwidth provided by the leg.

image5

CPU load

The CPU load chart shows the load on each CPU core for the duration of the test.

Below, a test on a bonder that was very CPU limited, while the aggregator had no such problems.

cpu-load-chart

Also, a warning will be shown in this case, as the results from the test can be unreliable when the CPU is the limiting factor.

cpu-warning

Issues when running at high CPU loads

When a node has high CPU usage, the way the speed tests generate test data differs from the way real data passes through the tunnel, which can produce large discrepancies between bandwidth usage and CPU load, especially in environments with low-latency links. This is generally observed as the speed tests reporting significantly higher speeds than the CPU could otherwise sustain with real traffic.

When running near CPU limits, latency control and QoS become unreliable, so it is recommended to either rate limit the legs to a lower speed to prevent the CPU from becoming a bottleneck, or to upgrade the hardware to a faster CPU.