Managing bonds

Bonds can be updated at any time, except while a bond or leg tuning job is in progress. To update a bond while it is being tested, first cancel the test.

Adding a bond

Before installing Bonding on a device, create its configuration in the web interface.

From the dashboard or bond list page, click Add bond.

image0

Provide the following information.

Details

Details fields are used for configuration of the bonder hardware.

Name

The name of the bonder. This must be unique in the bonder’s space.

The space name and bonder name are combined and used as the hostname for the bonder.

Note

A free-form field for any relevant information.

Circuit ID

The circuit ID of the bond.

Product

The bond’s product name, as sold by the partner.

Serial number

The serial number of the bonder hardware.

Asset tag

The asset tag given to the bonder hardware.

Space

The space to which the bond is assigned.

Username

The user for management via SSH; usually root.

Password

The password for the user specified above.

The username and password fields are for record-keeping only. You should change them only if you manually change the username or password on the bonder.

Advanced options

The following fields are visible after you click “Show advanced”.

Proof of concept

If checked, you will not be billed for this bond for one month. For more details, see proof of concept bonds.

Metric collection interval

How frequently to query performance metrics, in seconds.

Metric reporting interval

How frequently to report collected metrics to the management server, in seconds.

CPU governor

Select which algorithm to use for scaling CPU frequencies. If unset, the last-used governor for the CPU type is kept, or the system default is used after the system is rebooted. Selecting an alternate governor, particularly Performance, may increase throughput on certain platforms. For a detailed explanation of each governor, see the documentation.
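
For illustration, the sketch below reads and writes the standard Linux cpufreq sysfs files behind this setting. This is a sketch only, assuming a Linux host with cpufreq support; on a bonder, the management server applies the selected governor for you.

    # Inspect and set the Linux CPU frequency governor via sysfs.
    # Standard Linux cpufreq paths; writing requires root.
    import glob

    def current_governors():
        """Return {sysfs path: governor} for every CPU exposing cpufreq."""
        paths = glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor")
        return {p: open(p).read().strip() for p in paths}

    def set_governor(governor="performance"):
        """Write the chosen governor to every CPU."""
        for p in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
            with open(p, "w") as f:
                f.write(governor)

    print(current_governors())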

TCP congestion control algorithm

The TCP congestion control algorithm defines the behaviour of node-sourced TCP connections. Currently, there are 15 different algorithms to choose from when configuring a bonder on the management server. Prior to 6.4, CUBIC was the implicit choice. This option has no effect on traffic that is not originated on the bonder itself.

  1. CUBIC
  2. BIC
  3. CDG
  4. Datacenter TCP
  5. Hamilton TCP
  6. Hybla
  7. Illinois
  8. Low priority
  9. Reno
  10. Scalable TCP
  11. Vegas
  12. Veno
  13. Westwood
  14. Yeah TCP
  15. BBR

For more information on this topic and the various algorithms, refer to https://en.wikipedia.org/wiki/TCP_congestion_control

Depending on the type of traffic going over a bond, some algorithms may decrease performance while others improve it. This is most noticeable when running speed tests.
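
For illustration, the sketch below shows where a standard Linux host exposes the default algorithm and how a single socket can override it. This is a sketch only (Python on Linux; the chosen algorithm's kernel module must be loaded). On a bonder, this setting is managed through the management server.

    # Read the system default and available congestion control algorithms,
    # then override the algorithm on one socket. Linux-only interfaces.
    import socket

    with open("/proc/sys/net/ipv4/tcp_congestion_control") as f:
        print("default:", f.read().strip())

    with open("/proc/sys/net/ipv4/tcp_available_congestion_control") as f:
        print("available:", f.read().strip())

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
    print("socket:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
                                  16).split(b"\0")[0].decode())
    s.close()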

Conntrack table size

The maximum number of connections the host can track in its internal tables. If the number of tracked connections reaches this number, new connections will be dropped and an entry made in the system log file.
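
A minimal monitoring sketch, assuming the usual Linux nf_conntrack proc entries, showing how close the host is to its limit before drops begin:

    # Compare tracked connections against the configured maximum.
    def conntrack_usage():
        with open("/proc/sys/net/netfilter/nf_conntrack_count") as f:
            count = int(f.read())
        with open("/proc/sys/net/netfilter/nf_conntrack_max") as f:
            maximum = int(f.read())
        return count, maximum

    count, maximum = conntrack_usage()
    print(f"{count}/{maximum} connections tracked ({count / maximum:.0%} full)")
    if count / maximum > 0.8:
        print("warning: nearing the limit; new connections will be dropped "
              "and logged once the table is full")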

Web server

If checked, the bonder will offer a simple web service to local networks and trusted remote networks. Hosts on a connected IP network will be able to view basic configuration and status information.

Node debug

If checked, services running on the bonder log events in much more detail than normal. PPP clients also log in more detail. Debug mode is not recommended except on the recommendation of a technical support agent.

Manage process affinity

If checked, the tunnel process is bound to the second CPU core on multicore bonders. This improves performance by reducing the number of tunnel context switches.

Tunnel affinity core

The CPU core to assign to the tunnel process, with 1 being the lowest CPU core ID. Depending on the hardware, this choice can make a measurable difference to achievable throughput. When nearing the CPU limits of a bonder, test and benchmark to determine the best CPU core for the tunnel process on the specific hardware platform. Before this option was available, enabling “Manage process affinity” acted as if this option were set to 2.
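
For illustration, the sketch below uses the standard Linux affinity interface to pin a process to one core. Note the numbering difference: this field counts cores from 1, while the operating system counts from 0, so the old default of 2 corresponds to OS core 1.

    # Bind a process to a single CPU core with sched_setaffinity.
    import os

    def pin_to_core(pid, os_core):
        """Restrict pid (0 = the calling process) to one CPU core."""
        os.sched_setaffinity(pid, {os_core})
        return os.sched_getaffinity(pid)

    # Pin this process to OS core 1, i.e. this field set to 2.
    print("now running on cores:", pin_to_core(0, 1))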

Automatic source IP

If checked, traffic originating from the bonder will have a source IP selected from one of these IPs, in the following order:

If the bond is in a PWAN-enabled space:

  1. A public connected IP not included in PWAN (most preferred)
  2. A private connected IP not included in PWAN and having a CPE NAT IP
  3. A private connected IP included in PWAN
  4. A public connected IP included in PWAN
  5. Bonded tunnel IP (least preferred)

If the bond is not in a PWAN-enabled space:

  1. A public connected IP (most preferred)
  2. A private connected IP having a CPE NAT IP
  3. Bonded tunnel IP (least preferred)

If unchecked, traffic from the bonder will use the IP on its bonded tunnel interface and it will be NAT’ed to the IP of the bond’s aggregator.

Enabling this option allows an administrator to test network routing from the bonder in the same way that end-users see the network. That is, if there is a datacenter routing issue, this option makes it easier to identify that issue from the bonder, since the bonder will use the same network as the end-user. If the option is disabled, traffic from the bonder does not use a connected IP, but is NAT’ed through the aggregator, making it somewhat more difficult to test network routing from the perspective of the end user.

This option has no effect on traffic being forwarded through the bonder, only on traffic originating from the bonder itself.

Disabled by default. You can configure this default for new bonds on the bond defaults page.
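
The sketch below encodes the selection order above. It assumes a simple in-memory description of each connected IP; the field names are illustrative, not the product's actual data model.

    # Pick a source IP for bonder-originated traffic per the documented order.
    def pick_source_ip(connected_ips, tunnel_ip, pwan_enabled):
        if pwan_enabled:
            rules = [
                lambda ip: ip["public"] and not ip["in_pwan"],            # 1
                lambda ip: (not ip["public"] and not ip["in_pwan"]
                            and ip["has_cpe_nat_ip"]),                    # 2
                lambda ip: not ip["public"] and ip["in_pwan"],            # 3
                lambda ip: ip["public"] and ip["in_pwan"],                # 4
            ]
        else:
            rules = [
                lambda ip: ip["public"],                                  # 1
                lambda ip: not ip["public"] and ip["has_cpe_nat_ip"],     # 2
            ]
        for rule in rules:
            for ip in connected_ips:
                if rule(ip):
                    return ip["address"]
        return tunnel_ip  # least preferred: the bonded tunnel IP

    ips = [
        {"address": "203.0.113.5", "public": True, "in_pwan": True,
         "has_cpe_nat_ip": False},
        {"address": "192.168.1.10", "public": False, "in_pwan": False,
         "has_cpe_nat_ip": True},
    ]
    print(pick_source_ip(ips, "10.0.0.2", pwan_enabled=True))  # 192.168.1.10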

Configuration

Configuration fields control the handling of customer traffic and have a significant impact on performance and reliability. Settings changed here are sent to both the bonder and aggregator.

Aggregator

The primary aggregation server for the bond. The bonder will send customer traffic to this aggregator, and this aggregator will act as a router for the bond’s connected IPs, CPE NAT IPs, and routes.

Secondary aggregator

The secondary aggregation server for the bond. If the primary aggregation server fails, the bond will be moved to this aggregator. If this field is left blank, the bond will be left on the primary aggregator even if the aggregator fails. The secondary aggregator must be in the same space as the primary aggregator.

Aggregator failback

If checked and a secondary aggregator is selected, the bond will be moved back to the primary aggregator automatically when it recovers. If this field is not checked, the bond will be left on the secondary aggregator even when the primary recovers, and it will need to be moved back manually. This gives an administrator a chance to investigate why the primary aggregator failed before it begins hosting bonds again.

QoS profile

The QoS profile to use for the bond. If the empty option is selected, no traffic shaping will be performed and latency may increase to intolerable levels for applications such as VoIP.

Compression

When checked, traffic through the bond is compressed. This increases the effective bandwidth of the bond at the expense of greater CPU use. The effectiveness of compression depends on the type of data going through the bond; data that has already been compressed, such as video files and ZIP archives, cannot be compressed again.

Tunnel security

The security method used for customer data between the bonder and aggregator. For details, see Tunnel security and encryption.

Encryption cipher

The encryption cipher to use, applicable only when the tunnel security setting is Encryption. The two options, AES 128 and AES 256, are described in Tunnel security and encryption. Some CPUs offer hardware AES 128 acceleration, which significantly reduces the CPU usage required for encryption.

Warning

Changing the encryption cipher while the bond is in use will cause a short outage for customer traffic while the DTLS session renegotiates.

Encryption handshake interval

The length of time between session re-keying, in seconds. This is only applicable when the tunnel security setting is Encryption.

Packet distribution

The packet distribution algorithm to use for sending traffic. This value is set when automatically tuning a bond. See Manual bond tuning for tips on choosing the best packet distribution algorithm for a bond.

Weighted Round Robin is a simple algorithm that balances traffic across legs in proportion to their configured speeds. For example, in a bond with a 3 Mbps leg and a 6 Mbps leg, one third of the traffic will be sent on the 3 Mbps leg and two thirds on the 6 Mbps leg. It’s a good choice for bonds whose legs all have the same type, bandwidth, and latency, or where numerous people use the bond at the same time, such as in a large office network.

Weighted Round Robin is available on nodes running SD-WAN 2014.4 or later. It can be selected when a bond’s aggregator, secondary aggregator, and bonder have all been upgraded to 2014.4 or later.
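
A minimal sketch of the proportional behaviour described above, using the 3 Mbps and 6 Mbps example; it illustrates only the weighting, not the live packet scheduling.

    # Weighted round robin: legs appear in proportion to configured speed.
    import itertools

    def wrr_schedule(legs):
        """legs: {name: speed in Mbps}. Cycle leg names by weight."""
        return itertools.cycle(
            [name for name, speed in legs.items() for _ in range(speed)]
        )

    schedule = wrr_schedule({"dsl": 3, "cable": 6})
    print([next(schedule) for _ in range(9)])
    # "cable" appears twice for every "dsl": two thirds of the traffic.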

Flowlet is an algorithm that minimizes packet reordering by splitting bursts of traffic in a single flow into sub-flows called “flowlets”. Each flowlet is assigned to a single leg. The flow may be assigned to a different leg after an idle period (see “delta”, below).

Intelligent Delay Managed Packet Queuing (IDMPQ) is an algorithm that constantly analyzes link delay, speed, queue size, and traffic characteristics to make packet distribution decisions that minimize packet reordering and delay variation. It requires very accurate tuning of leg upload and download speeds and results in somewhat higher CPU use than Flowlet. It is an excellent choice for bonds with different types of legs, such as a bond with a DSL and a cable Internet connection.

IDMPQ causes high CPU load on aggregators with a large number of bonds. It is not recommended to use IDMPQ on any bond on an aggregator hosting more than 100 legs. This issue will be resolved in a future release.

Flowlet delta

Used only when the packet distribution algorithm is Flowlet, this is the length of time in milliseconds that must pass between bursts of traffic in a flow before the flow can be assigned to a different leg. If the delta value is 0, Flowlet balances traffic in the same way as Weighted Round Robin, but somewhat less efficiently, because it keeps track of traffic flows.

Flowlet delta should be set based on the type of traffic going through the bond. For bonds with a small number of concurrent connections (for example, bonds supporting a single web user, one VPN tunnel, or two concurrent video upload streams), the delta should be set to 0. This allows packets in a single flow to use the full bandwidth of each leg, but can cause packet disordering at the receiving host. For bonds supporting many concurrent connections (for example, those hosting web browsing traffic for an office or an ISP), flowlet delta can be set higher to give greater overall throughput. In this scenario, it should be set to the approximate difference in one-way delay between the highest-latency and lowest-latency legs. For example, if the highest-latency leg has a one-way delay of 15 ms, and the lowest-latency leg has a one-way delay of 10 ms, then delta would be 5 ms. However, this setting would result in limited throughput for a single flow, because the flow would be bound to a single leg for short periods.
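
The sketch below captures both rules from this section: the suggested delta for a bond with many concurrent connections, and the idle-gap condition under which a flow may move to a different leg. The numbers are from the example above.

    # Suggested flowlet delta and the reassignment condition.
    def suggested_delta(one_way_delays_ms):
        """Delta ~= spread between the slowest and fastest legs."""
        return max(one_way_delays_ms) - min(one_way_delays_ms)

    print(suggested_delta([15, 10]))  # 5 ms, as in the example above

    def may_reassign(now_ms, last_packet_ms, delta_ms):
        """A flow may change legs only after an idle gap of >= delta."""
        return now_ms - last_packet_ms >= delta_ms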

Advanced options

Batched leg send operations

If checked, packets are sent out as a batch instead of individually, improving CPU utilization at the cost of a small (about 1 ms) increase in latency on any sent packets. This is especially useful when trying to run on low-end devices or achieve very high rates of throughput. This option does nothing if the bonder or aggregator for the bond is running Debian 7 (Wheezy).

Clamp TCP

If checked, TCP connections are modified so their packets are small enough to fit through the bonded tunnel interface.

If TCP clamping is disabled, users may have trouble browsing the web, especially SSL-protected web sites. This should only be disabled for troubleshooting.
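
TCP clamping of this kind is conventionally implemented by rewriting the MSS option in SYN packets; the sketch below shows the arithmetic under that assumption, for IPv4 without header options.

    # Lower the advertised MSS so a full segment fits the tunnel MTU.
    IP_HEADER = 20
    TCP_HEADER = 20

    def clamped_mss(advertised_mss, tunnel_mtu):
        """Return the MSS to rewrite into the SYN, never raising it."""
        return min(advertised_mss, tunnel_mtu - IP_HEADER - TCP_HEADER)

    # A host advertising MSS 1460 behind a 1400-byte tunnel is clamped
    # to 1360, so its segments no longer need fragmentation.
    print(clamped_mss(1460, 1400))  # 1360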

Source address verification

If checked, the bonder validates outgoing traffic to ensure it comes from an expected source IP address. This reduces the chance that the bond can be used as part of a network attack using spoofed source IP addresses, as described in IETF BCP 38 Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing.

Encryption replay protection

If checked, and the tunnel security option is set to Encryption, encryption replay attacks will be prevented. A replay attack occurs when an adversary records encrypted communications and sends the encrypted data to one of the encryption peers again at a later time. This could cause corruption or duplication of data in the encrypted data stream, but can be prevented by enabling this option.

If a leg is experiencing severe packet reordering it may be interpreted as a replay attack and legitimate data may be dropped, causing packet loss and slow speeds. If a leg experiences this kind of reordering, this option may be disabled to improve performance.

Automatic ping timing

If checked, leg ping and fail times are set automatically based on the leg’s normal latency. Each leg in the bond will have its own ping and fail timing. The ping timers will be set to three times the leg latency and the failure timers to nine times the leg latency. Both timers are subject to minimum values, as described below.

If unchecked, the values in the ping and fail time fields are used for each leg.

Regular leg ping time

This field controls ping timing on regular (i.e. non-failover) legs. Ping timing is the interval between ping requests sent between the bonder and aggregator. This field is in milliseconds (ms).

If the automatic ping timing option is enabled, the value of this field is used as the minimum interval for ping timing. For example, a leg with normal latency of 10 ms would have its ping interval automatically set to 30 ms. However, if the value of this field is 100 ms, the ping interval would be set to 100 ms. This prevents pings from occurring at a very high rate on legs with very low latency.

If the automatic ping timing option is disabled, the value of this field is used as the ping interval for all regular legs in the bond. This value can be increased from the default of 100 ms to save bandwidth on legs with usage-based billing such as 3G wireless legs.

To calculate the monthly bandwidth used by a leg for ping packets in one direction, use the following equation:

b = 305,856 / t

where t is the leg ping time in milliseconds and b is the monthly amount of data taken by the ping packets in a single direction, in megabytes.

For example, with the default period t = 100 ms, ping packets take about 3.1 gigabytes of data each direction for each leg, each month.
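
A worked version of the formula; the constant implies roughly 118 bytes on the wire per ping over a 30-day month, which is an inference from the constant rather than a documented figure.

    # Monthly one-direction ping overhead per leg, in megabytes.
    def monthly_ping_megabytes(ping_time_ms):
        return 305_856 / ping_time_ms

    for t in (100, 300, 1000):
        print(f"t = {t:>4} ms -> {monthly_ping_megabytes(t):,.0f} MB per month")
    # t = 100 ms  -> 3,059 MB (the ~3.1 GB noted above)
    # t = 300 ms  -> 1,020 MB
    # t = 1000 ms -> 306 MB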

Regular leg fail time

This field controls fail timing on regular legs. A leg will be declared as failed this long after receiving the last ping reply from the remote tunnel. This field is in milliseconds.

As with the regular leg ping time field, if automatic ping timing is enabled, this field is used as the minimum fail time. If automatic ping timing is disabled, this value is used as the fail time for all regular legs in the bond.

The default is 300 ms. This value must be greater than the regular leg ping time above. The recommended value is three times the ping time to avoid declaring a leg as down during momentary periods of increased latency.
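
The sketch below restates the automatic timing rule, with the two field values acting as floors; the defaults shown are the 100 ms ping time and 300 ms fail time from this section.

    # Automatic ping/fail timing: 3x and 9x leg latency, subject to floors.
    def auto_timers(leg_latency_ms, min_ping_ms=100, min_fail_ms=300):
        ping = max(3 * leg_latency_ms, min_ping_ms)
        fail = max(9 * leg_latency_ms, min_fail_ms)
        return ping, fail

    print(auto_timers(10))  # (100, 300): the floors win on a fast leg
    print(auto_timers(50))  # (150, 450): 3x and 9x the leg latency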

Failover leg ping time

This field is the same as the regular leg ping time except it is applied to the bond’s failover leg, if applicable. The default value is 1000 ms, or one second.

Failover leg fail time

This field is the same as the regular leg fail time except it is applied to the bond’s failover leg, if applicable. The default value is 3000 ms, or three seconds.

Automatic reorder max hold

If checked, the reorder max hold value adapts automatically based on real-time leg latency.

Reorder max hold

The length of time to buffer received packets for reordering, in milliseconds. Like the Flowlet delta value, this should be set to the approximate difference in one-way delay between the highest-latency and lowest-latency legs. Larger values allow the order of packets to be restored even when the difference in delay of the legs is large, but a flow will suffer greater interruption when packets are lost.

Smaller values allow a flow to recover more quickly from lost packets, but can only reorder packets when the difference in delay is smaller.

This value has no effect when automatic reorder max hold is enabled.
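
A minimal sketch of a reordering buffer with a max hold, to make the trade-off concrete; the structure and names are invented for illustration.

    # Hold out-of-order packets up to hold_ms, then release them anyway.
    import heapq

    class ReorderBuffer:
        def __init__(self, hold_ms):
            self.hold_ms = hold_ms
            self.next_seq = 0
            self.heap = []  # (sequence number, arrival time in ms)

        def push(self, seq, arrival_ms):
            heapq.heappush(self.heap, (seq, arrival_ms))

        def pop_ready(self, now_ms):
            """Release in-order packets, plus any held longer than hold_ms."""
            out = []
            while self.heap:
                seq, arrived = self.heap[0]
                if seq <= self.next_seq or now_ms - arrived >= self.hold_ms:
                    heapq.heappop(self.heap)
                    out.append(seq)
                    self.next_seq = max(self.next_seq, seq + 1)
                else:
                    break
            return out

    buf = ReorderBuffer(hold_ms=5)
    buf.push(1, arrival_ms=0)       # packet 0 is missing
    print(buf.pop_ready(now_ms=3))  # []: still waiting for packet 0
    print(buf.pop_ready(now_ms=6))  # [1]: held 6 ms >= 5 ms, released anyway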

Packet loss detection

If enabled, which is the default, legs will determine and report the amount of packet loss that is occurring on each leg. Legs with a loss rate greater than their set threshold are removed from the bond unless no better-performing legs are available. For example, in a bond with two normal legs and one failover leg, where one of the normal legs has a packet loss rate above its removal threshold, that leg will not be used in the bond. If the loss rate of the other normal leg also goes above its threshold, it too will be removed from the bond, and the failover leg will be used. If the failover leg goes down, the two normal legs with high loss rates will then be used in the bond again.

Packet loss is calculated by the nodes in both directions by comparing the number of packets sent to the number of packets received. This method of detection captures packet loss:

  1. as the user experiences it, based on the actual packets lost in transit over the legs. Other forms of packet loss testing (e.g., testing using ICMP on an interval) can show no loss in these situations because packet loss can affect traffic differently depending on protocols, packet sizes, and bursts of traffic.
  2. at a very granular level, including small bursts of packet loss that would otherwise be statistically insignificant with a long-running packet loss test.
  3. using the exact same transport as the data packets, avoiding the possibility of QoS or prioritization giving incorrect results when testing with another transport.
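
The sketch below illustrates the counter-comparison approach: each side reports how many tunnel packets it sent, and the peer compares that with how many it received over the same window. The 5% removal threshold is an illustrative placeholder for a leg's configured threshold.

    # One-direction loss rate from sent/received counters.
    def loss_rate(peer_sent, locally_received):
        if peer_sent == 0:
            return 0.0
        return max(0.0, (peer_sent - locally_received) / peer_sent)

    def leg_usable(peer_sent, locally_received, removal_threshold=0.05):
        """A leg is removed once its loss rate exceeds its threshold."""
        return loss_rate(peer_sent, locally_received) <= removal_threshold

    print(loss_rate(10_000, 9_950))   # 0.005 -> 0.5% loss
    print(leg_usable(10_000, 9_000))  # False: 10% loss exceeds the threshold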

When packet loss detection is enabled, you can view information about current and historical packet loss in three locations:

  1. Alerts on the legs in the summary column. If packet loss is detected to be greater than 0.1% in either direction, an alert is displayed.
  2. The last reported packet loss values are displayed in the extended Leg status information found by clicking the right caret in the first column of the leg list.
  3. The historical values are recorded and visible in the Leg packet loss chart in the Performance section of the bond.

Flap detection

If enabled, which is the default, legs will delay becoming available if they have recently lost connection. The amount of time they delay will increase up to 30 seconds if the leg continues gaining and losing connectivity. For more information, see Flap detection.
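
A sketch of the backoff behaviour, assuming for illustration that the delay doubles on each successive flap until it reaches the 30-second cap described above:

    # Delay before a flapping leg is made available again.
    def flap_delay(flap_count, base_s=1.0, cap_s=30.0):
        return min(base_s * 2 ** flap_count, cap_s)

    for n in range(7):
        print(f"flap {n}: wait {flap_delay(n):.0f}s")  # 1, 2, 4, ... capped at 30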

Leg MTU detection

If enabled, which is the default, legs will attempt to detect the real MTU of the link between the bonder and the aggregator. This is particularly useful when legs are attached to CPE devices such as DSL routers that create tunnels, reducing the MTU. It is recommended to leave this option on.

Leg MTU detection time

If leg MTU detection is enabled, each leg will attempt to detect the path MTU this many hours after the last detection. The default is 24 hours.

Send jitter buffer

If enabled, packets are briefly delayed before sending to mitigate jitter. This delay reduces the amount of reordering and can have a noticeable performance benefit, improving the consistency of applications that require continuous delivery of packets, such as VoIP. However, the benefit is smaller for legs with a high amount of jitter, because the delay is applied only when sending.

Debug

If checked, the tunnel and TCP proxy (if applicable) processes for this bond running on both the bonder and aggregator log events in much more detail than normal. Debug mode is not recommended except on the recommendation of a technical support agent.

TCP Proxy

See TCP proxy for a description of TCP proxy options.

When you complete the form, click Save at the bottom of the page.

image1

The bond will be activated on the aggregation server and the configuration will be available for the bonder to download.

Updating a bond

To modify a bond’s configuration, navigate to its edit page. This page is available by clicking the image2 icon beside the bond in the bond list or by clicking the image3 icon in the list and then clicking Edit at the top-right corner of the bond details page.

image4

Modify values in each form as necessary. You can add more legs, connected IPs, CPE NAT IPs, and routes, or delete objects by clicking the delete button in the “Delete” column at the right side of the object’s form.

Click Save to apply the changes and return to the bond details page. The bonder and aggregator will immediately be updated with the new configuration.

Removing a bond

To minimize resources used on your aggregators, ensure that bonds are deleted when they are no longer needed.

To remove a bond, first take the bonder out of service. Then navigate to its details page and click the image5 icon on the Edit button, then Delete.

image6

Confirm your action on the dialog that appears.