Supported environments for nodes

Bare metal is generally recommended for nodes. The networking software is particularly sensitive to latency, which typically cannot be guaranteed in virtualized environments.

In addition to bare-metal devices, the following virtualization technologies are used during development and testing:

  • LXC

  • systemd-nspawn

  • QEMU/KVM

During the release-candidate phase, the software is also tested in a production VMware environment.

Other environments may work, but we cannot guarantee their operation in all situations.

Note

OpenVZ is known to be incompatible.

Optimizing IRQ handling

One potential bottleneck for server performance is the rate at which the server can process interrupt requests (IRQs) from the NIC. In the context of bonding, this typically appears as one or two server cores fully pinned by a ksoftirqd worker. This issue is known to occur most frequently on private WAN routers, but it can also occur on aggregators handling many connections.

Note

This issue is not strictly caused by a large volume of traffic, but rather by how many interrupts the traffic generates (a single 100 Gbps connection is much less IRQ-intensive than one thousand 100 Mbps connections).

Note

The TCP proxy directly increases the ratio of interrupts generated per bit of traffic, and as such, nodes hosting many bonds using the proxy are particularly susceptible to this bottleneck.

In general, there are two ways to improve interrupt handling. Informally, they are “distribute interrupts across cores” and “generate fewer interrupts in the first place”. Formally, these ideas are called “IRQ affinity balancing” and “IRQ coalescing”, and they are explained in more detail below.

IRQ affinity balancing

Linux uses the irqbalance daemon to manage the CPU load generated by interrupts across all CPUs. By default, irqbalance identifies the highest-frequency interrupt sources and isolates them all to a single CPU core. For servers handling a large volume of traffic, this can result in only one or two cores being allocated to handle all network-related interrupts. In these situations, manually changing the system IRQ affinity (so as to distribute the interrupts across more cores) can significantly improve overall throughput. However, this generally comes at the cost of some increased latency and jitter.
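As a sketch of manual affinity tuning (the device name "eth0", the IRQ number, and the CPU mask below are illustrative examples, not values from this document), interrupt vectors can be inspected and pinned through /proc:

```shell
# List the NIC's interrupt vectors and how often each fires per CPU
# ("eth0" is an example device name)
grep eth0 /proc/interrupts

# Stop irqbalance so it does not overwrite manual affinity settings
systemctl stop irqbalance

# Pin IRQ 64 (example number) to CPU 2; the value written is a
# hexadecimal bitmask of the CPUs allowed to service this IRQ
echo 4 > /proc/irq/64/smp_affinity
```

Note that while irqbalance is running it will periodically rewrite affinity masks, so it must be stopped (or configured to ignore the relevant IRQs) before manual settings will stick.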


IRQ coalescing

As the name implies, IRQ coalescing is the act of batching multiple packets together so that only one interrupt is generated for the batch. In general, there are two ways to do this:

  1. Only raise an interrupt after a certain number of frames have been queued.

  2. Only raise an interrupt after a certain amount of time has passed since a packet was queued.

Note that this requires a NIC that supports multiple interrupt vectors. The actual implementation and configuration details for IRQ coalescing are largely hardware-dependent.
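On Linux, coalescing parameters can typically be viewed and set with ethtool, when the driver supports them; a sketch, assuming an interface named "eth0" (both thresholds below are illustrative):

```shell
# Show the current coalescing settings for the device
ethtool -c eth0

# Raise an RX interrupt only after 100 microseconds have elapsed or
# 64 frames have been queued, whichever comes first
ethtool -C eth0 rx-usecs 100 rx-frames 64
```

Higher thresholds mean fewer interrupts but more latency, so the values should be tuned against the actual workload.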


Virtualization best practices

SD-WAN will operate in many types of virtualization for all host types (management servers, private WAN routers, aggregators, and bonders). Virtualization makes it easy to provision and manage hosts, but performance is typically negatively impacted, even in situations where the virtual machine is the only machine on a host.

The following best practices are intended for private WAN routers, aggregators, and bonders. As a core part of your customer data network, these nodes are very sensitive to resource availability and efficiency. Management servers should be configured using practices generally accepted for web and database applications; for example, management server requirements focus on memory size and storage performance rather than CPU and network device performance.

General recommendations for bonders, aggregators, and private WAN routers

CPU

Due to the critical latency demands of networking, CPUs should be dedicated to the virtual machines. Sharing CPU cores negatively affects latency, which results in lower throughput and generally unstable bandwidth.

Tip

Disabling hyperthreading can yield performance improvements on bonders that are CPU-limited.
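On recent Linux kernels, SMT (hyperthreading) can be checked and toggled at runtime through sysfs; it can also be disabled permanently in the BIOS/UEFI. A sketch:

```shell
# Prints 1 if SMT is currently active, 0 otherwise
cat /sys/devices/system/cpu/smt/active

# Disable SMT at runtime (requires root; persists until reboot)
echo off > /sys/devices/system/cpu/smt/control
```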

Memory

Memory must be reserved for the nodes. Generally, 2 GB is enough for most nodes, but this should be increased when using the TCP proxy or a larger number of private WAN spaces.

Storage

Storage is generally not as critical as other resources, but care must be taken to avoid high disk read/write latency. If disk I/O operations take too long, service failures may occur.

Also, if the amount of memory is low, the disk will be used to swap memory pages. If that occurs, disk usage increases substantially and overall system performance is negatively impacted.
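A quick way to check whether a node is swapping is to look at overall swap usage and the kernel's swap counters; a sketch:

```shell
# Show memory and swap usage; heavily used swap suggests memory pressure
free -m

# Cumulative pages swapped in/out since boot; rapidly growing values
# indicate that the system is actively swapping
grep -E 'pswpin|pswpout' /proc/vmstat
```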

Network

Most network device virtualization methods incur an overhead on network performance. A certain amount of CPU and memory is used to implement a virtual interface that copies network packets between the physical interface and the guest operating system.

Most virtualization systems have a relatively low-overhead virtual device that should be used instead of full emulation. For example, VMware offers the VMXNET3 device, while QEMU/KVM offers the VirtIO device. Container systems such as LXC and nspawn already use a reasonably efficient veth device by default. The primary advantage of these devices is that they do not have to emulate a physical device type, allowing the host and guest to pass packets relatively quickly via system memory.
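As an illustration for QEMU/KVM (the image path, memory size, and bridge name below are examples, not values from this document), a guest can be given a paravirtualized VirtIO NIC rather than an emulated hardware device:

```shell
# Boot a guest with a VirtIO NIC attached to bridge br0 instead of an
# emulated device such as e1000; the disk also uses the virtio bus
qemu-system-x86_64 \
    -enable-kvm -m 2048 \
    -drive file=node.qcow2,if=virtio \
    -netdev bridge,id=net0,br=br0 \
    -device virtio-net-pci,netdev=net0
```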

However, these virtual devices are still not as efficient as using the card directly. Most modern server network devices have advanced offloading and acceleration features that are not always exposed via virtual devices. In situations where the traffic load is very high, you may want to consider passing dedicated network devices directly into the guest operating system.

Tips for specific systems

VMWare

  • Install VMware Tools. The open-source tools are acceptable; these can be installed from the standard Debian repositories with:

    • apt-get install open-vm-tools -y

    • service bonding restart

  • If you are using Private WAN with encryption, you must disable TCP segmentation offload (TSO) on all aggregators and private WAN routers running in VMware. The VMware VMXNET3 driver has an issue with TSO in combination with IPsec that results in greatly reduced throughput.

  • You may be able to reduce idle-wakeup latencies for guests by setting the Latency Sensitivity option from Normal to High. This is found under VM Settings > Options tab > Latency Sensitivity.
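The TSO change described in the bullets above can be made inside the guest with ethtool; a sketch, assuming the VMXNET3 interface is named "ens192":

```shell
# Disable TCP segmentation offload on the VMXNET3 interface
ethtool -K ens192 tso off

# Verify the new setting
ethtool -k ens192 | grep tcp-segmentation-offload
```

Note that this setting does not persist across reboots unless it is also applied by the guest's network configuration (for example, via a post-up hook in /etc/network/interfaces on Debian).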

Amazon Web Services (AWS)

  • You may need to disable the “Source/Destination Checks” feature. Otherwise, traffic routed by nodes may be dropped by the networking infrastructure. See the documentation on Disabling Source/Destination Checks.
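With the AWS CLI, the check can be disabled per instance; a sketch (the instance ID below is an example):

```shell
# Disable the EC2 source/destination check so that traffic forwarded
# by the node is not dropped by the AWS networking infrastructure
aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --no-source-dest-check
```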