Monitoring system performance

Current status

Information about the current state of the SD-WAN management server is shown on the System Charts page. To open it, click the gear icon near the top-right corner of any page, then click System Charts.

Charts

  1. Load average: the average system load over the past 1, 5, and 15 minutes.
  2. CPU usage (average): the percentage of CPU time used by type, such as user vs. system, averaged across all cores.
  3. CPU usage (per core): the percentage of CPU time used by each core, such as CPU 1 vs. CPU 2.
  4. Memory usage: the amount of used, buffered, cached, and free memory.
  5. Swap usage: the amount of used, cached, and free swap space.
  6. Web server (Nginx): the number of requests and active connections at any given time.
  7. Configuration updates: the rate of configuration updates to nodes.
  8. Configuration updates status: the number of configuration updates in each state, excluding completed updates.
  9. Huey (task queue): the average job latency, in seconds.
  10. OpenVPN connections: the current number of connected nodes.
  11. OpenVPN traffic: the number of bytes sent/received per second.
  12. Disk space: used, reserved, and free space (one chart for each mount point).
  13. Disk throughput: the read/write throughput of each disk.
  14. Disk operations: the rate of read/write operations on each disk.
  15. Disk response time: the read/write response time of each disk.
  16. Postgres operations: the number of database records inserted, updated, and deleted per second.
  17. Postgres IO: the number of input/output operations performed by the SQL database per second.
  18. InfluxDB database size: the total size of the performance metrics database.
  19. InfluxDB writes: the rate at which the performance metrics database writes to disk.
  20. InfluxDB points written: the number of points written per second to the performance metrics database.
  21. InfluxDB memory usage: the total and heap memory used by the performance metrics database.
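
The Disk space chart splits each mount point into used, reserved, and free space. The sketch below shows one common way to derive those three numbers on Linux with Python's os.statvfs. It assumes that "reserved" means blocks set aside for the root user (as on ext4) and that "free" means space available to unprivileged users; how the management server actually computes the chart is not documented here. The names DiskSpace, split_disk_space, and disk_space are hypothetical.

```python
import os
from typing import NamedTuple


class DiskSpace(NamedTuple):
    used: int      # bytes in use
    reserved: int  # bytes free but reserved for root (assumed meaning)
    free: int      # bytes available to unprivileged users


def split_disk_space(st) -> DiskSpace:
    """Split an os.statvfs() result into used, reserved, and free bytes."""
    block = st.f_frsize  # fundamental block size used by the f_* counters
    used = (st.f_blocks - st.f_bfree) * block
    # f_bfree includes root-reserved blocks; f_bavail excludes them.
    reserved = (st.f_bfree - st.f_bavail) * block
    free = st.f_bavail * block
    return DiskSpace(used, reserved, free)


def disk_space(mount_point: str) -> DiskSpace:
    """Hypothetical helper: report the three values for one mount point."""
    return split_disk_space(os.statvfs(mount_point))


if __name__ == "__main__":
    gib = 1024 ** 3
    # One line per mount point, mirroring one chart per mount point.
    for mount_point in ("/",):
        s = disk_space(mount_point)
        print(f"{mount_point}: used {s.used / gib:.1f} GiB, "
              f"reserved {s.reserved / gib:.1f} GiB, free {s.free / gib:.1f} GiB")
```

The three values add up to the filesystem's total size, because f_bfree counts the reserved blocks while f_bavail does not.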

Example

The following chart shows the load average on the management server over the past hour, with one-minute, five-minute, and 15-minute moving averages.

[Chart: load average on the management server over the past hour (1, 5, and 15 minute averages)]
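
On Linux, the three load-average lines are exponentially damped moving averages rather than simple means over each window, which is why they react to a spike at different speeds. The sketch below is a simplified floating-point model of that calculation. It assumes the management server runs Linux and that the chart plots the kernel's values; the real kernel uses fixed-point arithmetic and also counts tasks waiting on uninterruptible I/O. The function names and the workload are hypothetical.

```python
import math

# Assumption: mirrors the Linux kernel's load-average update, simplified to
# floating point. The kernel samples roughly every 5 seconds.
SAMPLE_INTERVAL_S = 5.0
WINDOWS_MIN = (1, 5, 15)  # the three averages shown on the chart


def update_load(loads, active_tasks):
    """Apply one exponentially damped update to the 1/5/15-minute averages."""
    updated = []
    for load, minutes in zip(loads, WINDOWS_MIN):
        # Weight kept from the previous average; longer windows decay more slowly.
        decay = math.exp(-SAMPLE_INTERVAL_S / (minutes * 60.0))
        updated.append(load * decay + active_tasks * (1.0 - decay))
    return tuple(updated)


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Feed a sequence of per-sample active task counts through update_load."""
    loads = start
    for active in samples:
        loads = update_load(loads, active)
    return loads


if __name__ == "__main__":
    samples_per_5_min = 5 * 60 // int(SAMPLE_INTERVAL_S)
    # Hypothetical workload: 2 active tasks for 5 minutes, then 5 minutes idle.
    workload = [2] * samples_per_5_min + [0] * samples_per_5_min
    one, five, fifteen = simulate(workload)
    print(f"1 min: {one:.2f}  5 min: {five:.2f}  15 min: {fifteen:.2f}")
```

After the busy period ends, the 1-minute value falls back close to zero within five minutes, while the 5-minute and 15-minute values stay well above it. On the server itself, Python's os.getloadavg() (Unix only) returns the kernel's current three values.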