SD-WAN 2014.4 release notes

November 12, 2014

SD-WAN 2014.4 improves handling of legs with varying bandwidth, significantly improves the collection and charting of performance metrics, and improves CPU performance of the main VPN tunnel process.

Management servers running SD-WAN 2014.4 no longer support nodes running 2013.5. The 2014.4 upgrade is offered only to partners whose nodes all run 2013.6 or later.

Bonding Node

Additions

  • Bonders and aggregators can automatically adjust the speed of a leg based on latency changes on that leg. This can help control latency in situations where the leg has varying bandwidth. This is a per-leg option that is disabled by default.
  • The BANDWIDTH_ADAPTATION field has been added to the leg hook environment.
  • Bonders and aggregators can automatically adjust the ping and failure times of legs based on the leg’s current latency. This is a per-bond option that is enabled by default for new bonds. It can be changed for many bonds at once using the edit multiple dialog on the bond index page.
  • The AUTOMATIC_PING_TIMING field has been added to the bond hook environment.
  • A new balancing algorithm, Weighted Round Robin, has been added. This algorithm balances packets in the same way as the Flowlet algorithm with a flowlet delta of 0 but improves CPU performance.
  • On bonders with multi-core CPUs, the tunnel process is bound to the second CPU core. This improves performance by reducing the number of context switches in the tunnel application. This is enabled by default but can be disabled by changing the “manage process affinity” option on the bond edit page.
  • Leg ping timing information is shown by the legids application.
  • Performance metrics, such as link throughput and latency, interface errors, and CPU and memory usage, are reported to the management server via an open-source application called Collectd. Counters are collected at 10 second intervals and reported at 60 second intervals, by default.
  • When installing on devices with two or more types of network controllers (such as both Intel and Marvell), the interfaces are named in the expected order. Previously, interfaces could be named with eth0 as the fourth port, for example.
  • Nodes report their ECDSA key to the management server.
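
As a rough illustration of the Weighted Round Robin idea mentioned above, the following Python sketch distributes packets across legs in proportion to per-leg weights. It is not the tunnel's implementation: the leg names, the weights, and the weighted_round_robin helper are hypothetical, and the "smooth" weighted round robin variant shown here is a swapped-in technique chosen for clarity.

```python
from collections import Counter
from itertools import islice


def weighted_round_robin(weights):
    """Yield leg names forever, in proportion to their weights.

    Smooth weighted round robin: each round, every leg's running score
    grows by its weight, the highest-scoring leg is picked, and the
    picked leg's score is reduced by the total weight.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty mapping of positive numbers")
    total = sum(weights.values())
    scores = {leg: 0 for leg in weights}
    while True:
        for leg, weight in weights.items():
            scores[leg] += weight
        chosen = max(scores, key=scores.get)
        scores[chosen] -= total
        yield chosen


# Hypothetical legs; the weights might reflect configured leg speeds (assumption).
legs = {"leg1": 5, "leg2": 3, "leg3": 2}
schedule = list(islice(weighted_round_robin(legs), 10))
counts = Counter(schedule)
```

Over any 10 consecutive picks with these weights, each leg is selected as many times as its weight, and picks for the heaviest leg are spread out rather than sent back to back.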

Changes

  • Improved VPN tunnel CPU performance by 30 to 90%, depending on the type of hardware. For example, our rating for the PC Engines APU in 2014.2 was 105 Mbps download in a lab environment. In 2014.4, the APU is rated for 138 Mbps. The rating for the Lanner 7581 increased from 451 Mbps to 881 Mbps. Complete ratings are available upon request.
  • DNS requests from bonders prefer going through DHCP or PPPoE legs rather than through static IP legs. Static legs will still be used for DNS requests if there are no available DHCP or PPPoE legs.
  • Automated testing procedures have been improved.
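
The DNS leg preference described above can be sketched in Python as follows. This is only an illustration of the stated rule, not the bonder's actual code; the dns_candidate_legs helper and the "name", "mode", and "up" keys are hypothetical.

```python
def dns_candidate_legs(legs):
    """Return the legs that DNS requests should use.

    Each leg is a dict with hypothetical keys: "name", "mode"
    ("dhcp", "pppoe", or "static"), and "up" (bool).
    """
    available = [leg for leg in legs if leg["up"]]
    dynamic = [leg for leg in available if leg["mode"] in ("dhcp", "pppoe")]
    # Static legs are a fallback, used only when no DHCP or PPPoE leg is available.
    return dynamic if dynamic else available


# Hypothetical bond with one static leg and two dynamic legs, one of them down.
example_legs = [
    {"name": "leg1", "mode": "static", "up": True},
    {"name": "leg2", "mode": "dhcp", "up": True},
    {"name": "leg3", "mode": "pppoe", "up": False},
]
preferred = [leg["name"] for leg in dns_candidate_legs(example_legs)]
```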

Removals

  • Performance metrics are no longer collected via Munin. Munin is not installed on new nodes.

Fixes

  • TCP proxy connections are now closed when the remote client closes the connection rather than after a 5 second delay.
  • Rsyslog and logrotate configurations no longer cause multiple restarts of the rsyslog process.
  • Supervision of tunnel and TCP proxy processes has been improved. Processes are no longer started if previous instances are still stopping.
  • The node service handles system signals when starting and stopping more reliably.
  • Certain iptables rules are now removed after stopping Bonding on an aggregator.
  • The nodeconfig and nodessl applications show error messages even if the management server sends back an improperly-formatted file or if the connection to the server is interrupted by a proxy.
  • Fixed an issue that could cause a tunnel using encryption to crash in a rare scenario.
  • Fixed an issue that could cause a tunnel to crash when running a speed test on a leg with extremely high latency.

Patches

2014.4-8: Fixed an issue where tunnel control packets weren’t sent with the expected DSCP or encrypted fields, and changed the speed at which legs are created in certain scenarios with bandwidth adaptation enabled.
2014.4-9: Fixed an issue setting tunnel CPU affinity, fixed issues with DTLS and performance improvements, and improved supervision of the collectd process.
2014.4-11: Improved logging and error reporting, and fixed a divide-by-zero error in the collectd process on new kernels.

Bonding Admin

Additions

  • The charting system has been completely rebuilt. Metrics are now collected every 10 seconds instead of every 5 minutes, and charts are rendered in the web browser, improving functionality while reducing load on the management server. Charts offer a variety of resolutions from 15 minutes to 1 year, the time frame can be shifted forward and back, and charts are updated automatically without refreshing the web page. For example, below is the new leg latency chart.
    [Image: new leg latency chart]
  • Performance metrics are collected over a dedicated connection between the nodes and a new service on the management server. Data is stored in the open-source database InfluxDB. Historical metrics will not be migrated from the old system to the new system. Munin charts will continue to be shown for nodes earlier than 2014.4, but this will be disabled in a future release.
  • Business statistics such as number of legs and bonds are now available from the API at /api/statistics/.
  • Information about nodes is now available from the API at /api/nodes/.
  • API requests can now be filtered and paginated.
  • The uwsgitop application is installed on management servers. This program shows real-time status and performance information for the uwsgi application server.
  • A warning label is shown on various pages for nodes using a certain faulty Debian Linux kernel. For details, see SB-2 in the Service Bulletins section of the documentation.
  • An updated 64-bit kernel is available when installing from our ISO files.
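
To illustrate how the new API resources and the filtering and pagination support might be used together, here is a minimal Python sketch that builds request URLs. Only the /api/statistics/ and /api/nodes/ paths come from these notes; the host name, the build_api_url helper, and the limit and offset parameter names are assumptions, so check the API documentation for the actual parameter names.

```python
from urllib.parse import urlencode, urljoin


def build_api_url(base_url, resource, **params):
    """Build a URL for an API resource with optional query parameters."""
    url = urljoin(base_url, f"/api/{resource}/")
    # Sort parameters so the resulting URL is stable across calls.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


# Hypothetical host and query parameter names, for illustration only.
nodes_page = build_api_url("https://management.example.com", "nodes", limit=20, offset=40)
stats = build_api_url("https://management.example.com", "statistics")
```

The helper only constructs URLs; sending the request and authenticating against the management server are left out because these notes do not describe them.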

Changes

  • New ISO files for provisioning nodes have been created. Please discard existing provisioning disks and use the new ones, available from the Help -> Node Setup page. The new ISOs will be available after the management server is upgraded.
  • Leg icons have been updated with new states for reduced bandwidth mode and high latency warnings.
  • The table of leg information on the bond details page has been reorganized to emphasize key information.
  • Charts showing growth of legs, bonds, connected IPs, etc., are available on the System Charts page.
  • Software version information available from the API has been moved to the /api/system/ URI.
  • TCP port 8080 is now included in the default TCP proxy ports list. Existing bond settings have not been changed.
  • Updates to the node software repositories are now pushed with rsync.
  • Minor updates have been made to the node provisioning preseed settings.

Removals

  • The aggregator failover service no longer monitors aggregators earlier than version 2014.1.

Fixes

  • The node VPN client indicator is no longer occasionally overwritten with incorrect values.
  • The uwsgi application server now kills requests that run for more than 5 minutes. This should eliminate an issue on busy sites where the entire uwsgi service could stop responding.
  • The huey process starts properly even when MySQL and Redis are not available.
  • Web user session information is stored in the SQL server, eliminating an issue that could require users to frequently re-authenticate.
  • Timeouts no longer occur when retrieving data for 60 second speed tests.
  • Form validation for connected IPs has been improved. It is now an error to disable a connected IP that has an active CPE NAT IP.

Patches

2014.4-7: Added a warning message on the bond details page if the bonder version is greater than the aggregator version, made minor updates to API resources, and improved various logging messages, charts, and page text.
2014.4-8: Fixed an issue where the system charts page would raise an HTTP 500 error if the InfluxDB service was unavailable.
2014.4-9: Changed the metric collection interval on the management server to 10 seconds from 1 second, improved stability of collectd and InfluxDB services and related logging, and fixed an issue where bonder versions were not validated when setting the weighted round robin balancing algorithm from the edit multiple bonds dialog.
2014.4-10: Fixed an issue reporting the version of the bonding package available in the management server software repository and further improved logging of charting-related services.
2014.4-11: Fixed an issue with bonder version validation when changing balancing algorithms and an issue with connection handling in the influxmux service.