SD-WAN 2016.1 release notes¶
May 26, 2016
We’re pleased to announce the release of SD-WAN 2016.1. Here are the highlights:
- Debian 8 (“Jessie”) is now supported on the management server and nodes. We will upgrade your management server to Debian 8 during the 2016.1 upgrade. Existing bonders, aggregators, and private WAN (PWAN) routers can continue to run Debian 6 or 7, and new nodes will be installed with Debian 8.
- The time-series database that stores charting data, InfluxDB, has been upgraded to version 0.13. This reduces disk usage by about 90%, and offers a number of reliability and performance improvements.
- Aggregator failover can be switched to a global maintenance mode, preventing failover during scheduled maintenance periods. It also offers a mode that suspends failover when a large number of aggregators appear to fail, to prevent unnecessary failover when the management server network is interrupted.
- Private WAN space gateways can be configured at a bonded site. This allows a customer to route all Internet traffic from branch offices through a gateway/firewall at the organization’s head office.
- The performance of the management server user interface and certain backend services has been improved.
Partners should do two things for the 2016.1 management server upgrade:
- Before the upgrade, please verify that the management server has at least 8 GB of memory. This is the new minimum requirement so that the time series database and other services remain responsive. If the server has 4 GB of memory, we will upgrade it to 2016.1 but will not migrate the time series data for charts, dashboard statistics, prevented downtime statistics, and leg 24-hour status indicators. If the server has less than 4 GB of memory, we will not upgrade it.
- After the upgrade, please re-image all provisioning disks with the new Debian 8 ISO. You can download the Debian 8 ISO file from the Node Setup tab of the root space details page. If you have a PXE server built using our instructions, please rebuild it with Debian 8 using our updated instructions.
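
The memory rule in the first item can be summarized as a small decision function. This is only a sketch: the function name and return strings are hypothetical, and treating servers with between 4 GB and 8 GB of memory the same as 4 GB servers is an assumption, since the notes only name the 8 GB, 4 GB, and below 4 GB cases.

```python
def upgrade_plan(memory_gb: float) -> str:
    """Return the 2016.1 upgrade outcome for a management server.

    Hypothetical helper that mirrors the memory thresholds in the notes.
    """
    if memory_gb >= 8:
        # Meets the new minimum: full upgrade, chart data is migrated.
        return "upgrade and migrate time-series data"
    if memory_gb >= 4:
        # Assumption: anything from 4 GB up to (not including) 8 GB is
        # handled like the 4 GB case described in the notes.
        return "upgrade without migrating time-series data"
    # Less than 4 GB: the server is not upgraded.
    return "no upgrade"


if __name__ == "__main__":
    for gb in (2, 4, 8, 16):
        print(gb, "GB:", upgrade_plan(gb))
```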
Other announcements:
- Debian 6 (“Squeeze”) reached end-of-life on February 29, 2016. Debian 6 will no longer receive updates, including security updates, from Debian. However, Debian 6 is still supported as a bonder or aggregator platform. We recommend you begin the process of replacing your Debian 6 nodes with Debian 8 nodes. For more information, see SB-4 2016-03-31 Debian Squeeze end-of-life.
- A few example hook scripts and API scripts have been developed. For access to these examples, contact Technical Support. These examples show how easy it is to customize SD-WAN for your own environment.
Now, the details on 2016.1:
Bonding Node¶
Additions¶
- Debian 8 is a supported operating system for bonders, aggregators, and PWAN routers. Services are supervised by and store logs through systemd. Debian 6 and 7 nodes are still supported and continue to use SysV init, daemontools supervision, and rsyslog for log storage.
- A new Linux kernel package is available. It offers Linux kernel 4.5. This is installed by default on new hosts and is available for installation on existing Debian 7 hosts by upgrading the linux-image-amd64, linux-image-686-pae, or other package depending on the system architecture. The kernel is not available for Debian 6 hosts.
- The irqbalance package is installed on all hosts. This package can increase performance on some aggregators by balancing network device interrupts between all CPU cores instead of handling them on core 0.
- The status of DNS server monitoring for a leg is shown in the output of legids -v. Previously this information was only available in the bonding log file.
- Bonding log files are compressed using gzip. To view these files, either use the bondlog command or use gzip-capable commands such as zless.
- Bonding log files are removed prior to their normal 14-day expiration period if necessary to keep disk space usage below 70%.
- The config service on nodes publishes a message for each status message it sends to the management server. This can be used by custom scripts to monitor these messages.
- The Bash shell history tracks more details about the commands run by the root user—for example, it tracks the date and time that a command is run.
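
The log retention rule above (14-day expiration, with earlier removal to keep disk usage below 70%) can be illustrated with a short sketch. The notes do not say which files are removed first when space is tight; removing the oldest files first is an assumption, and the `LogFile` type and `logs_to_remove` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE_DAYS = 14      # normal expiration period stated in the notes
MAX_DISK_USAGE = 0.70  # disk usage ceiling stated in the notes


@dataclass
class LogFile:
    name: str
    age_days: int
    size_bytes: int


def logs_to_remove(logs: List[LogFile], used_bytes: int, total_bytes: int) -> List[str]:
    """Return the names of log files a cleanup pass would delete."""
    doomed = []
    remaining = []
    # Step 1: files past the normal expiration period are always removed.
    for log in logs:
        if log.age_days > MAX_AGE_DAYS:
            doomed.append(log)
            used_bytes -= log.size_bytes
        else:
            remaining.append(log)
    # Step 2 (assumed order): remove the oldest remaining files until
    # disk usage drops below the ceiling.
    remaining.sort(key=lambda log: log.age_days, reverse=True)
    for log in remaining:
        if used_bytes / total_bytes < MAX_DISK_USAGE:
            break
        doomed.append(log)
        used_bytes -= log.size_bytes
    return [log.name for log in doomed]
```

With a 1000-byte disk that is 80% full, one expired 50-byte file and three newer 100-byte files, this sketch removes the expired file and then the oldest unexpired file, which brings usage to 65%.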
Changes¶
- The default source address for connections made from a bonder has changed. Previously all outgoing connections used the tunnel’s IP, normally a 172.30.x.y address, by default. Now outgoing connections will by default use a private connected IP included in a PWAN, a public connected IP not in a PWAN, or a private connected IP with a CPE NAT IP. This will make it easier to troubleshoot datacenter and private WAN networking issues, as connections from a bonder will normally use the same IP addresses as connections from an end-user device. Connection source addresses can still be overridden using application-specific arguments and configuration files.
- The version of the Salt configuration management application has been upgraded to a recent version of the 2015.8 branch.
- The performance of the collectd application has been improved on aggregators with many bonds.
- Bonding applications now use Python 3.5.
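
The new default source address behaviour can be sketched as a priority search over a bonder's connected IPs. This is an illustration only, not the product's implementation: the `ConnectedIP` type, its field names, and the fallback to the tunnel IP when nothing matches are assumptions, and the priority order simply follows the order in which the three cases are listed above (patch 2016.1-19 changes the selection priority, so the real order may differ).

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectedIP:
    address: str
    in_pwan: bool = False
    has_cpe_nat_ip: bool = False


def _priority(ip: ConnectedIP) -> Optional[int]:
    """Lower number means preferred; None means not eligible."""
    private = ipaddress.ip_address(ip.address).is_private
    if private and ip.in_pwan:
        return 0  # private connected IP included in a PWAN
    if not private and not ip.in_pwan:
        return 1  # public connected IP not in a PWAN
    if private and ip.has_cpe_nat_ip:
        return 2  # private connected IP with a CPE NAT IP
    return None


def pick_source_address(connected_ips: List[ConnectedIP], tunnel_ip: str) -> str:
    """Pick the default source address for outgoing connections."""
    ranked = []
    for ip in connected_ips:
        priority = _priority(ip)
        if priority is not None:
            ranked.append((priority, ip))
    if not ranked:
        # Assumption: fall back to the tunnel IP (the pre-2016.1 default).
        return tunnel_ip
    return min(ranked, key=lambda pair: pair[0])[1].address
```

Application-specific arguments and configuration files can still override whatever default is chosen, so a sketch like this only describes the case where no override is set.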
Fixes¶
- A number of memory leaks in the collectd statistics-collection service have been fixed.
- The management server VPN client no longer increases its reconnect wait time without bound. It now tries to reconnect at most once every minute.
- When starting, bonding only removes Quagga static routes that point to tun devices, rather than removing all Quagga static routes.
- The default bonder web interface now returns an error message when the configuration process times out, rather than returning a normal response.
Patches¶
| 2016.1-13: | Reduce CPU usage of collectd process on aggregators with many bonds. |
|---|---|
| 2016.1-14: | Fix various bugs in node, TCP proxy, and collectd services. |
| 2016.1-15: | Remove certain log files if disk is getting full. |
| 2016.1-16: | Improve starting process for services on nodes. |
| 2016.1-18: | Bind connections from bonder to an available connected IP to make it easier to test and troubleshoot network issues. Prevent writing blank configuration file on bonder when bonder is stopped while file is being written. Fix some issues managing mobile modems. Clean up old log files to limit disk space usage. |
| 2016.1-19: | Change priority for selecting automatic source IP. bondlog -f no longer takes 100% CPU. |
| 2016.1-20: | Fix issue that could crash tunnel process when establishing DTLS sessions on unreliable links. Fix issue related to rotating and cleaning up log files. |
| 2016.1-21: | The bonding package conflicts with the NetworkManager package. This fixes a provisioning issue occurring since July 2016 related to unreliability with Debian’s HTTP mirrors. |
| 2016.1-22: | When provisioning or updating a bonder, configuration files, keys, and certificates are written to disk in a more reliable way to avoid writing corrupt or empty files. |
| 2016.1-23: | Fix an issue related to SaltStack minion authentication that prevented nodes from connecting to the master. |
| 2016.1-25: | Avoid crash at startup due to Debian changing file permissions on a Quagga configuration file: https://www.debian.org/security/2016/dsa-3654.en.html |
| 2016.1-26: | TCP proxy no longer becomes unresponsive while taking 100% of a CPU core after the connection between the bonder and aggregator is interrupted in certain circumstances. |
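
Patches 2016.1-18 and 2016.1-22 both deal with configuration files being left blank or corrupt when a write is interrupted. The notes do not describe how the fix works. One common technique for this class of problem, shown here purely as an illustration, is to write to a temporary file in the same directory, flush it to disk, and then rename it over the target. The function name and file names below are hypothetical.

```python
import os
import tempfile


def write_file_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target
    # so that the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Clean up the temporary file; the old target is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is stopped before the rename, the previous file remains in place, so a reader sees either the old contents or the new contents and never an empty file.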
Bonding Admin¶
Additions¶
- Debian 8 is the only supported operating system for management servers. Existing management servers will be upgraded to Debian 8 during the 2016.1 upgrade.
- ISOs for provisioning nodes have been updated to Debian 8. After the 2016.1 upgrade you can download the new ISOs from the Node Setup tab of the root space details page.
- A maintenance mode has been added to aggregator failover. This is used to temporarily stop monitoring aggregators while maintenance takes place on aggregators or management servers or their networks, to prevent bonds from being moved between aggregators unnecessarily.
- A suspend mode has been added to aggregator failover. The feature is disabled by default and needs to be enabled in the Administration section of the web application. Suspend mode allows all failover to be halted when a large number of aggregators fail, which could indicate a false positive check—a problem with the management server network, not with the aggregators. Suspend mode is activated when the number of failed aggregators meets or exceeds the configured threshold (by default, 100% of aggregators). Once activated, no bonds are moved to secondary aggregators or back to their primary aggregators until an administrator manually cancels the suspend mode.
- The service that accepts status updates from nodes is now multi-threaded. This significantly increases the rate at which this service can process messages. Performance instrumentation has also been added to help guide future performance improvements in this service. By default, the number of threads matches the number of CPU cores on the management server.
- InfluxDB can be hosted on a separate server from the core management services. A separate InfluxDB server can be used to prevent load due to that service from affecting other management services, and to ensure InfluxDB is on hardware optimized for its requirements.
- Messages from the system can be shown to users in the root space via a menu in the main navigation bar. Messages can also be shown at the top of each page. No messages are shown to users in child spaces.
- On the provisioning ISO main menu, the version of SD-WAN used to generate the ISO is shown. This makes it easy to see if a provisioning disk was imaged from an old or new ISO.
- The Salt configuration management system now periodically synchronizes the configuration of 2015.4 and later nodes, rather than only syncing them when a change is made to the managed configuration or when a node is rebooted.
- A command has been added for updating apt configuration on Debian 6 nodes: upgrade-squeeze-sources. This command should be run before upgrading Squeeze bonders to 2016.1. For more information, see SB-4 2016-03-31 Debian Squeeze end-of-life.
- A command has been added for changing leg tunnel port numbers, for example to avoid port-number based rate limiting at an ISP. For details, contact technical support.
- The Bash shell history tracks more details about the commands run by the root user—for example, it tracks the date and time that a command is run.
- Automatic speed tuning parameters can be customized from a new page in the Administration section. This should be considered an advanced feature and values should not be changed except due to a recommendation from tech support.
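
The suspend mode activation rule described above (the number of failed aggregators meets or exceeds a configured threshold, 100% by default) can be expressed as a small check. This is a sketch with hypothetical names; returning False when there are no aggregators is an assumption, and the sketch ignores the fact that the feature must first be enabled in the Administration section.

```python
def should_suspend_failover(failed: int, total: int,
                            threshold_percent: float = 100.0) -> bool:
    """Return True when the share of failed aggregators meets or
    exceeds the configured threshold."""
    if total == 0:
        # Assumption: with no aggregators there is nothing to suspend.
        return False
    # Compare failed/total >= threshold/100 without dividing.
    return failed * 100 >= threshold_percent * total


if __name__ == "__main__":
    print(should_suspend_failover(10, 10))      # all aggregators failed
    print(should_suspend_failover(9, 10))       # below the 100% default
    print(should_suspend_failover(5, 10, 50))   # meets a 50% threshold
```

Once the condition is met, the notes state that failover stays suspended until an administrator manually cancels it, so a real implementation would also need to keep that state rather than re-evaluate the check on every cycle.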
Removals¶
- Debian 7 (“Wheezy”) is no longer supported for management servers.
Changes¶
- The time-series database, InfluxDB, has been upgraded to 0.13. Disk space requirements are reduced by about 90%. For example, if charting data took 100 GB in 2015.4, it will take about 10 GB in 2016.1. Startup time is significantly reduced and query performance is improved. Because the data backup and restore process can take a long time, it will be started before the main 2016.1 upgrade process begins, and will finish after the rest of the 2016.1 upgrade is complete. Up to 24 hours of data may be lost during the upgrade. (Figure: InfluxDB data usage before and after 2016.1 upgrade.)
- Performance of the web interface has been improved.
- Performance of the bond list API has been improved.
- Connected IP and route “include in private WAN” fields are now disabled by default when making new records.
- Automatic bond tuning can be scheduled immediately after a bond is created. You no longer need to provision and deploy the bonder before scheduling a tuning job.
- When an aggregator’s IP address is changed in the web application, the aggregator’s bonds are notified of the change. This simplifies the procedure for changing an aggregator’s IP address.
- The “What’s New In 2016.1” dialog shown after the upgrade is only displayed automatically to users in the root space, not to users in child spaces. Users in child spaces can still see the dialog if they open it from the menu in the main navigation bar.
- On a user’s account page Groups & Permissions tab, users aren’t shown permissions they don’t have. For example, if a user has no permissions related to PWAN, he or she is not shown any rows related to PWAN. Previously the user would be shown the PWAN rows with no checkboxes indicating which permissions he or she had.
- Validation has been added to prevent uploading navigation bar branding images that are too large.
- Unique Diffie-Hellman parameters are generated for the web server TLS configuration, improving security.
- When the database is restored from backup, HTTP session records are cleared.
- The reliable hosts used to check management server network connectivity in aggregator failover can now be a list of multiple hosts.
- Management server applications now use Python 3.4 and Django 1.9.
- The version of the Salt configuration management application has been upgraded to a recent version of the 2015.8 branch.
- The SD-WAN software repositories hosted on the management server are now mirrored from a central location every day and signed with a key, rather than pushed to management servers individually and signed by a different key on each management server.
Fixes¶
- Configuration updates to nodes no longer run out of order or multiple times in some rare circumstances.
- When resolving legs with conflicting statuses reported by the bonder and aggregator, a check is now performed to avoid requesting details about packet loss and flapping states from bonders with Bonded Internet 2015.3 or earlier, which did not track those values.
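
The version check in the last fix can be sketched as a comparison of release numbers. The helper names are hypothetical, and the version string format (year.release with an optional -patch suffix, as in 2016.1-13) is inferred from how versions are written in these notes.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Parse '2015.3' or '2016.1-13' into (year, release).

    The patch suffix after '-' is ignored for this comparison.
    """
    base = version.split("-", 1)[0]
    year, release = base.split(".")[:2]
    return int(year), int(release)


def tracks_loss_and_flapping(bonder_version: str) -> bool:
    """Bonders on 2015.3 or earlier did not track packet loss and
    flapping states, so those details should not be requested."""
    return parse_release(bonder_version) > (2015, 3)


if __name__ == "__main__":
    for v in ("2015.3", "2015.4", "2016.1-13"):
        print(v, tracks_loss_and_flapping(v))
```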
Patches¶
| 2016.1-22: | Start salt-master service on boot. Improve reliability of service that forwards performance metrics to InfluxDB. |
|---|---|
| 2016.1-23: | Use unique DH parameters for HTTPS server. Hide “Include in PWAN” column for routing objects of bonds that are not part of a PWAN space. Allow bond tuning to be scheduled even if a bonder has never been online. In account permissions page, only show object permissions if a user has at least one permission for the object. |
| 2016.1-24: | Fix issues with system charts units. Increase file descriptor limit of service that forwards performance metrics to InfluxDB. Fix some issues with URL routing in web interface that could cause blank pages to be displayed. |
| 2016.1-25: | Fix an issue in the service that forwards performance metrics to InfluxDB that could result in memory load issues due to frequent error reports. |
| 2016.1-27: | Fix a couple of bugs in space statistics charts. Fix issue in Internet Explorer preventing bonds from being moved from one space to another. |
Private WAN¶
Additions¶
- PWAN spaces can now use a gateway at a bonded site. This allows Internet traffic from branch offices to be routed to a gateway at the head office, instead of only using a gateway in the datacentre. The bonded site with the gateway must run SD-WAN 2016.1.
- Space gateways, both in a datacentre and at a bonded site, can be used by bonds on aggregators in different routing groups. For example, if a customer has bonds on aggregators in a Vancouver datacentre and a Montreal datacentre, and a gateway defined only in the Vancouver datacentre, the bonds at the Montreal datacentre will use the Vancouver gateway.
- Port ranges can be specified in port forward rules.
- Users inside the private WAN network can access resources inside the PWAN via the public IPs (the IPs defined in port forward or NAT rules), to avoid requiring specialized DNS server configuration inside the PWAN. This is known as hairpin NAT.
- PWAN routers using VMware are supported when the vSwitch port is in promiscuous mode.
- PWAN routers can accept traffic from aggregators on one interface and route traffic to customer-specific VLANs over another interface. This is required when using VMware. The interface for VLAN traffic is specified in the “VLAN trunk interface” field on the PWAN router’s details page.
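
The cross-routing-group gateway behaviour described above can be illustrated with the Vancouver and Montreal example. This is a sketch only: the `Gateway` type and `choose_gateway` function are hypothetical, and preferring a gateway in the bond's own routing group when one exists is an assumption that the notes do not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    routing_group: str


def choose_gateway(bond_routing_group: str,
                   gateways: List[Gateway]) -> Optional[Gateway]:
    """Pick the space gateway for a bond in the given routing group."""
    # Assumption: a gateway in the bond's own routing group is preferred.
    for gateway in gateways:
        if gateway.routing_group == bond_routing_group:
            return gateway
    # Otherwise use a gateway defined in another routing group, which
    # is the behaviour described in the notes.
    return gateways[0] if gateways else None


if __name__ == "__main__":
    space_gateways = [Gateway("gw-vancouver", "vancouver")]
    # A Montreal bond uses the Vancouver gateway because it is the
    # only one defined for the space.
    print(choose_gateway("montreal", space_gateways).name)
```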