Supported environments for nodes¶
Bare metal is generally recommended for nodes. The networking software is particularly sensitive to latency, which typically cannot be guaranteed in virtualized environments.
In addition to bare-metal devices, the following virtualization technologies are used during development and testing:
LXC
systemd-nspawn
QEMU/KVM
During the release-candidate phase, the software is also tested in a production VMware environment.
There are other environments that will work, but we cannot guarantee their operation in all situations.
Note
OpenVZ is known to be incompatible.
Optimizing IRQ handling¶
One potential bottleneck for server performance is the rate at which the server can process interrupt requests (IRQs) from the NIC. In the context of bonding, this typically appears as one or two server cores fully pinned by a ksoftirqd worker. This issue is known to occur most frequently on private WAN routers, but can also occur on aggregators handling many connections.
Note
This issue is not strictly caused by a large volume of traffic, but rather how many interrupts are generated by the traffic (a single 100 Gb connection is much less IRQ-intensive than one thousand 100 Mb connections).
Note
The TCP proxy directly increases the ratio of interrupts generated per bit of traffic, and as such nodes hosting many bonds using the proxy are particularly susceptible to this bottleneck.
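One way to confirm this kind of bottleneck is to look at the per-core software-interrupt load and the interrupt counters for the NIC. The commands below are only a sketch; the interface name eth0 is an example:
# Show per-CPU utilization, including the %soft (softirq) column (mpstat is part of the sysstat package).
mpstat -P ALL 1 5
# Show how the NIC's interrupts are distributed across CPU cores.
grep eth0 /proc/interrupts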
In general, there are two ways to go about improving interrupt handling. Informally, they are “distribute interrupts across cores” and “generate fewer interrupts in the first place”. Formally, these ideas are called “IRQ affinity balancing” and “IRQ coalescing”, and they are explained in more detail below.
IRQ affinity balancing¶
Linux uses the irqbalance daemon to manage the CPU load generated by interrupts across all CPUs. By default, irqbalance identifies the highest-frequency interrupt sources and isolates them all to a single CPU core. For servers handling a large volume of traffic, this can result in only one or two cores being allocated to handle the entirety of network-related interrupts. In these situations, manually changing the system IRQ affinity (so as to distribute the interrupts across more cores) can significantly improve overall throughput. This generally comes at the cost of some increased latency and jitter, however.
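If manual tuning is warranted, IRQ affinity can be adjusted through the /proc interface. The following is only a sketch; the interface name, IRQ number, and core list are examples, and the real IRQ numbers for your NIC queues are listed in /proc/interrupts:
# Stop irqbalance so it does not override the manual settings.
systemctl stop irqbalance
# Find the IRQ numbers assigned to the NIC's queues.
grep eth0 /proc/interrupts
# Pin one NIC queue IRQ (here the example IRQ 45) to CPU cores 2-3.
echo 2-3 > /proc/irq/45/smp_affinity_list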
IRQ coalescing¶
As the name implies, IRQ coalescing is the act of batching multiple packets together so that only one interrupt is generated for the batch. In general, there are two ways to do this:
Only raise an interrupt after a certain number of frames have been queued.
Only raise an interrupt after a certain amount of time has passed since a packet was queued.
Note that this requires a NIC that supports multiple interrupt vectors. The actual implementation and configuration details for IRQ coalescing are largely hardware-dependent.
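On Linux, coalescing parameters are usually adjusted with ethtool, where the driver supports it. The following is a sketch only; the interface name and values are illustrative, and appropriate settings depend on your NIC and traffic profile:
# Show the current coalescing settings for the interface.
ethtool -c eth0
# Raise an interrupt only after 64 frames are queued or 50 microseconds have passed, whichever comes first.
ethtool -C eth0 rx-frames 64 rx-usecs 50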
Virtualization best practices¶
SD-WAN will operate in many types of virtualization for all host types: management servers, private WAN routers, aggregators, and bonders. Virtualization makes it easy to provision and manage hosts, but performance is typically degraded, even when the virtual machine is the only machine on the host.
The following best practices are intended for private WAN routers, aggregators and bonders. As a core part of your customer data network, these nodes are very sensitive to resource availability and efficiency. Management servers should be configured using practices generally accepted for web and database applications; for example, management server requirements focus on memory size and storage performance rather than CPU and network device performance.
General recommendations for bonders, aggregators, and private WAN routers¶
CPU¶
Due to the critical latency demands of networking, CPUs should be dedicated to the virtual machines. Sharing CPU cores negatively affects latency, which results in lower throughput and unstable bandwidth.
Tip
Disabling hyperthreading can yield performance improvements on bonders that are CPU-limited.
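As an illustration, on a QEMU/KVM host managed with libvirt, guest vCPUs can be pinned to dedicated host cores. This is only a sketch; the domain name bonder-vm and the core numbers are assumptions:
# Pin guest vCPU 0 to host core 2 and vCPU 1 to host core 3.
virsh vcpupin bonder-vm 0 2
virsh vcpupin bonder-vm 1 3
# Keep the QEMU emulator threads off the cores serving the guest vCPUs.
virsh emulatorpin bonder-vm 0-1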
Memory¶
Memory must be reserved for the nodes. Generally, 2 GB is enough for most nodes, but this should be increased when using the TCP proxy or larger numbers of private WAN spaces.
Storage¶
Storage is generally not as critical as other resources, but care must be taken to avoid high disk read/write latency. If disk I/O operations take too long, service failures may occur.
Also, if memory runs low, the disk will be used to swap memory pages. If that occurs, disk usage increases and overall system performance is negatively impacted.
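A quick way to check whether a node is swapping is to watch the swap-in/swap-out activity. The commands below are a sketch; the interval and count are arbitrary:
# The si/so columns should stay at or near zero on a healthy node.
vmstat 1 5
# Show total, used, and available memory and swap in mebibytes.
free -m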
Network¶
Most network device virtualization methods incur an overhead on network performance. A certain amount of CPU and memory is used to implement a virtual interface that copies network packets between the physical interface and the guest operating system.
Most virtualization systems have a relatively low-overhead virtual device that should be used instead of full emulation. For example, VMware offers a VMXNET3 device, while QEMU/KVM offers a VirtIO device. Container systems such as LXC and nspawn already use a reasonably efficient veth device by default. The primary advantage of these devices is that they do not have to emulate a physical device type, allowing the host and guest to pass packets relatively quickly via system memory.
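For instance, a QEMU/KVM guest can be started with a paravirtualized VirtIO NIC instead of an emulated device. This is only a sketch; the image path, bridge name, and sizing are assumptions:
# Boot a guest with a VirtIO disk and a VirtIO NIC attached to bridge br0.
qemu-system-x86_64 \
  -enable-kvm -m 2048 -smp 2 \
  -drive file=/var/lib/images/node.qcow2,if=virtio \
  -netdev bridge,id=net0,br=br0 \
  -device virtio-net-pci,netdev=net0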
However, these virtual devices are still not as efficient as using the card directly. Most modern server network devices have advanced offloading and acceleration features that are not always exposed via virtual devices. In situations where the traffic load is very high, you may want to consider passing dedicated network devices directly into the guest operating system.
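On QEMU/KVM, one way to do this is VFIO PCI passthrough, which hands the physical NIC to the guest. This sketch assumes the IOMMU is enabled and the device at the example PCI address has already been bound to the vfio-pci driver:
# Pass the physical NIC at PCI address 0000:03:00.0 through to the guest.
qemu-system-x86_64 \
  -enable-kvm -m 2048 -smp 2 \
  -drive file=/var/lib/images/node.qcow2,if=virtio \
  -device vfio-pci,host=0000:03:00.0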
Tips for specific systems¶
VMWare¶
Install VMware tools. The open source tools are acceptable; these can be installed from standard Debian repositories with:
apt-get install open-vm-tools -y
service bonding restart
If you are using Private WAN with encryption, you must disable TCP segmentation offload (TSO) on all of the aggregators and private WAN routers running in VMware. The VMware VMXNET3 driver has an issue with TSO in combination with IPsec that results in greatly reduced throughput.
You may be able to reduce idle-wakeup latencies for guests by setting the Latency Sensitivity option from Normal to High. This is found under VM Settings > Options tab > Latency Sensitivity.
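For the TSO requirement above, segmentation offload can typically be turned off from within the guest with ethtool. This is a sketch; the interface name ens192 is an example, and the change does not persist across reboots unless it is added to the interface configuration:
# Disable TCP segmentation offload on the VMXNET3 interface.
ethtool -K ens192 tso off
# Verify the new offload setting.
ethtool -k ens192 | grep tcp-segmentation-offload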
Amazon Web Services (AWS)¶
You may need to disable the “Source/Destination Checks” feature. Otherwise, traffic routed by nodes may be dropped by the networking infrastructure. See the documentation on Disabling Source/Destination Checks.
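If you manage instances with the AWS CLI, the check can also be disabled per instance. This is a sketch; the instance ID is a placeholder:
# Disable source/destination checking for the node's EC2 instance.
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check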
External Resources¶
Here are some external resources with a variety of useful information. If you find that any of these links are no longer active, please let us know.
VMware vSphere 5.5 Documentation Center
Performance Best Practices for VMware vSphere® 5.5 (PDF)
VMware KB: Troubleshooting ESX/ESXi virtual machine performance issues (2001003)
Common Mistake: Using CPU reservations to solve CPU Ready
The Performance Cost of SMP – The Reason for Rightsizing
VM Right Sizing – An example of the benefits
VMware: Choosing a network adapter for your virtual machine
VMware: Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters
VMware: Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs