Baselines

CV-CUE dynamically computes and updates a baseline for the normal performance and connectivity of the network. The baseline adjusts as network behavior changes, which avoids the false positive and false negative alerts associated with static thresholds.

Baselines versus Thresholds

A baseline is a reference point against which things are measured. Baselines have traditionally been used to determine the effect of a change. For example, if you want to optimize your wireless network, you need to take a baseline of metrics such as retry rates or average data rates so that you can measure whether your changes had a positive or negative impact.

A threshold is a level that must be exceeded to trigger an action. Thresholds are commonly used in network monitoring systems for alerts. For example, if a retry rate threshold were set at 50%, the system would trigger a warning when the retry rate exceeded 50%.

CV-CUE studies historical data from clients, APs, and applications and automatically calculates a baseline from it. The baseline is calculated at 15-minute intervals. Any behavior that deviates significantly from the baseline is considered an anomaly and is highlighted in the graph. In contrast, controller-based network monitoring systems use static thresholds, and the same value is applied globally. This creates problems for network admins because wireless network characteristics differ from one environment to another.

Thresholds work well for metrics that have a clear, non-arbitrary line between acceptable and unacceptable. However, thresholds are static and do not adjust to changing conditions, while wireless networks are dynamic and change over time. The normal retry rate today may be very different from the normal retry rate a month from now, because clients, environments, applications, and usage all change rapidly. A static threshold does not adapt to what is normal for the network, so if a metric regularly crosses its threshold, the network admin is bombarded with irrelevant warnings and must go in and reset the threshold.

The difficulty lies in choosing the correct threshold. If the threshold is set too low, there are so many alarms that they cause alarm fatigue. This is dangerous because valid alarms are lost in a sea of unimportant, false positive alarms. To counter alarm fatigue, many network admins set the threshold too high. This is also dangerous because valid problems are missed (false negatives) and do not trigger action.
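The trade-off described above can be illustrated with a minimal sketch of a static threshold check. The retry-rate samples and both threshold values are hypothetical and chosen only to show the effect; this is not how CV-CUE or any controller implements alerting.

```python
from typing import List


def threshold_alerts(values: List[float], threshold: float) -> List[int]:
    """Return the indices of samples that exceed a static threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical retry-rate samples (%) for one AP: the normal level rises
# from about 10% to about 30% after more clients join, and the last sample
# (index 9) is a genuine problem.
retry_rates = [9.0, 11.0, 10.0, 12.0, 29.0, 31.0, 30.0, 28.0, 32.0, 70.0]

low = threshold_alerts(retry_rates, 20.0)   # also flags the new normal level
high = threshold_alerts(retry_rates, 80.0)  # misses the real problem
print(low)   # [4, 5, 6, 7, 8, 9]
print(high)  # []
```

With the low threshold, five of the six alerts are false positives once the normal level shifts; with the high threshold, the real problem at index 9 is a false negative.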

How to Read a Baseline Graph

CV-CUE takes the idea of the baseline and makes it dynamic. Dynamic baselines determine what is normal for a network and adjust as network conditions change. For example, retry rates may be low when the Wi-Fi is first set up with only a few clients. Later, when many more clients are added to the Wi-Fi network, the retry rate may be very different. Dynamic baselines adjust as networks change. This avoids the problem of thresholds while allowing comparisons to the baseline to identify real problems.
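The exact algorithm CV-CUE uses to compute its dynamic baseline is not described here. As an illustration only, the following minimal sketch assumes a rolling-window baseline: the mean of the previous `window` observations, with a deviation range of `k` standard deviations around it. The names `dynamic_baseline`, `window`, `k`, and `min_spread` are hypothetical.

```python
import statistics
from typing import List, NamedTuple


class BaselinePoint(NamedTuple):
    index: int        # position of the observation in the input list
    baseline: float   # mean of the preceding `window` observations
    low: float        # lower edge of the deviation range
    high: float       # upper edge of the deviation range
    is_anomaly: bool  # True if the observation falls outside [low, high]


def dynamic_baseline(observations: List[float], window: int = 4,
                     k: float = 3.0, min_spread: float = 1.0) -> List[BaselinePoint]:
    """Hypothetical rolling baseline: compare each observation against the
    mean +/- k * standard deviation of the observations just before it."""
    points = []
    for i in range(window, len(observations)):
        history = observations[i - window:i]
        baseline = statistics.mean(history)
        # Floor the spread so a perfectly flat history does not produce a
        # zero-width deviation range.
        spread = max(statistics.pstdev(history), min_spread)
        low, high = baseline - k * spread, baseline + k * spread
        value = observations[i]
        points.append(BaselinePoint(i, baseline, low, high, not (low <= value <= high)))
    return points


# Hypothetical retry rates (%): the normal level shifts from ~10% to ~30%,
# then a real spike occurs at the end.
retry_rates = [10.0, 11.0, 9.0, 10.0, 10.0, 30.0, 31.0, 29.0, 30.0, 30.0, 31.0, 70.0]
anomalies = [p.index for p in dynamic_baseline(retry_rates) if p.is_anomaly]
print(anomalies)  # [5, 11]
```

The level shift at index 5 is flagged once, after which the baseline follows the new normal, while the spike at index 11 is still detected. A static threshold of 20% would instead flag every point from index 5 onward.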

Each baseline graph is made up of these four elements:
  • Baseline - the blue line
  • Deviation Range - the light blue shaded area around the baseline
  • Observation Points - purple dots, each an average of the data over a 15-minute interval
  • Anomalies - red dots marking observation points that are well outside the norm
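Observation points are described as averages of the data over 15-minute intervals. A minimal sketch of that aggregation is shown below, assuming raw samples arrive as hypothetical `(unix_timestamp, value)` pairs and that intervals are aligned to multiples of 15 minutes; how CV-CUE actually aligns or weights its intervals is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

INTERVAL_SECONDS = 15 * 60  # 15-minute observation interval


def observation_points(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average raw (unix_timestamp, value) samples into 15-minute buckets.

    Keys are the start of each bucket in unix seconds; values are the mean
    of all samples that fall inside that bucket.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in samples:
        bucket = ts - ts % INTERVAL_SECONDS  # start of the enclosing interval
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


points = observation_points([(0, 10.0), (300, 20.0), (899, 30.0), (900, 40.0), (1799, 60.0)])
print(points)  # {0: 20.0, 900: 50.0}
```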


You can filter the data shown on a baseline graph, and you can zoom in and out to view the data at a finer or coarser granularity. The zoom control is at the bottom of the graph.

CV-CUE Baselines

CV-CUE includes baselines for both connectivity and performance events. The table below lists the available baselines and where they can be found on the CV-CUE interface.

Note: Wherever applicable, CV-CUE shows separate baselines for IPv4 and IPv6.
 
Type | Baseline Chart | Per | Location on CV-CUE UI
Connectivity | Clients Affected by Failures | Location | DASHBOARD > Connectivity
Connectivity | Clients Affected by Failures | AP | MONITOR > Access Points > AP Drill Down
Connectivity | Baseline - AAA Latency | Location | Dashboard > Performance > Avg. Latencies Chart > AAA Drill Down
Connectivity | Baseline - DHCP Latency | Location | Dashboard > Performance > Avg. Latencies Chart > DHCP Drill Down
Connectivity | Baseline - DNS Latency | Location | Dashboard > Performance > Avg. Latencies Chart > DNS Drill Down
Performance | Data Rate | Client | MONITOR > Clients > Clients Drill Down
Performance | RSSI | Client | MONITOR > Clients > Clients Drill Down
Performance | Retry Rate % | AP | MONITOR > Access Points > AP Drill Down
Performance | Clients Affected by Poor Performance | Location | Dashboard > Performance
Performance | Clients Affected by Poor Performance | AP | MONITOR > Access Points > AP Drill Down
Performance | Clients Affected by Poor App Experience | AP | MONITOR > Access Points > AP Drill Down
Performance | Clients Affected | Location | Dashboard > Applications
Performance | % Poor Application Experience | Location | Dashboard > Applications
Performance | Baseline - Application Latency | Location | Dashboard > Performance > Avg. Latencies Chart > Application Drill Down
Note: You can filter the data on each of these widgets. For more information about filters, refer to Filters on Widgets.

Example 1: Baseline - Clients Affected by Failures (AP Based)

The chart provides a baseline for the clients affected by connection failures for the selected AP.

Each data point is determined by the total number of connected clients and the last connectivity state of each client in a 15-minute interval. Hovering over a data point displays a tooltip that shows the percentage of clients with a good and a bad experience, along with the calculated baseline for that point in time. Click a data point on the graph to view detailed information.
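A data point in this chart combines the number of clients with each client's last connectivity state in the interval. The sketch below shows one way to compute such a percentage, assuming hypothetical `(client_mac, unix_timestamp, succeeded)` connection events and assuming the denominator is the set of distinct clients seen in the interval; CV-CUE's exact counting rules may differ.

```python
from typing import Dict, Iterable, Tuple


def percent_clients_failed(events: Iterable[Tuple[str, int, bool]],
                           start: int, end: int) -> float:
    """Percentage of clients whose last connection attempt in [start, end) failed.

    events: hypothetical (client_mac, unix_timestamp, succeeded) tuples,
    in any order.
    """
    last_state: Dict[str, Tuple[int, bool]] = {}  # mac -> (timestamp, succeeded)
    for mac, ts, ok in events:
        if start <= ts < end:
            # Keep only the most recent event per client within the interval.
            if mac not in last_state or ts >= last_state[mac][0]:
                last_state[mac] = (ts, ok)
    if not last_state:
        return 0.0
    failed = sum(1 for _, ok in last_state.values() if not ok)
    return 100.0 * failed / len(last_state)


events = [
    ("aa", 10, False), ("aa", 20, True),    # recovered: last state is success
    ("bb", 30, False),                      # last state is failure
    ("cc", 40, True),
    ("dd", 50, False), ("dd", 1000, True),  # success falls in the next interval
]
print(percent_clients_failed(events, 0, 900))  # 50.0
```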

Example 2: Baseline - Data Rate

The following image displays the baseline graph for Data Rate:

The graph displays the calculated baseline of the average data rate of an individual client. Anomalies are determined by comparing the data rate against a globally configurable threshold. Data rate is a metric whose acceptable level does not vary by network or environment, so using a threshold to detect anomalies is appropriate. The baseline and deviation range are still calculated, but anomalies are determined by the data rate threshold.
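For Data Rate, the baseline is still drawn but anomalies come from a threshold instead of from the deviation range. The minimal sketch below illustrates that split. It assumes that a data rate below the threshold is the anomalous direction, uses a plain mean as a stand-in for the baseline, and uses a hypothetical threshold of 24 Mbps; none of these are CV-CUE's actual settings.

```python
import statistics
from typing import List, Tuple

# Hypothetical value; in CV-CUE the threshold is configured globally.
DATA_RATE_THRESHOLD_MBPS = 24.0


def data_rate_view(avg_rates_mbps: List[float],
                   threshold: float = DATA_RATE_THRESHOLD_MBPS) -> Tuple[float, List[int]]:
    """Return (baseline, anomaly indices) for a client's data rate series.

    The baseline here is a simple mean (an assumption for illustration);
    anomalies are observation points whose average data rate falls below
    the threshold, regardless of the baseline.
    """
    baseline = statistics.mean(avg_rates_mbps)
    anomalies = [i for i, rate in enumerate(avg_rates_mbps) if rate < threshold]
    return baseline, anomalies


baseline, anomalies = data_rate_view([54.0, 48.0, 12.0, 60.0])
print(baseline, anomalies)  # 43.5 [2]
```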

Data Reporting and Retention

APs report client connection successes and failures, along with root cause analysis, to Arista Cloud almost immediately after they occur. Performance and other data are aggregated and reported every 15 minutes.

Except for Client Application Data, which is retained for 12 hours, the last week of information is retained in the cloud and available in CV-CUE.
 
Data Type | AP Reporting Interval | Cloud Storage Duration
Client Connection Attempts | Immediately | 1 week
AAA, DHCP, DNS, and TCP Latencies | Soon after detection | 1 week
Client Application Data | 15 minutes | 12 hours
Client Performance Metrics | 15 minutes | 1 week
BSSID Performance Metrics | 15 minutes | 1 week
SSID Application Data | 15 minutes | 1 week
Baseline Data | 15 minutes | 1 week
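The storage durations in the retention table can be expressed as a lookup to check whether data of a given age is still available. This is a minimal sketch: the `RETENTION` mapping and `is_retained` function are hypothetical, and treating the boundary as inclusive is an assumption, since exact expiry behavior in Arista Cloud is not specified.

```python
from datetime import datetime, timedelta, timezone

# Cloud storage durations per data type, as listed in the retention table.
RETENTION = {
    "Client Connection Attempts": timedelta(weeks=1),
    "AAA, DHCP, DNS, and TCP Latencies": timedelta(weeks=1),
    "Client Application Data": timedelta(hours=12),
    "Client Performance Metrics": timedelta(weeks=1),
    "BSSID Performance Metrics": timedelta(weeks=1),
    "SSID Application Data": timedelta(weeks=1),
    "Baseline Data": timedelta(weeks=1),
}


def is_retained(data_type: str, recorded_at: datetime, now: datetime) -> bool:
    """True if data recorded at `recorded_at` is still within its retention window."""
    return now - recorded_at <= RETENTION[data_type]


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(is_retained("Client Application Data", now - timedelta(hours=13), now))  # False
print(is_retained("Baseline Data", now - timedelta(days=6), now))             # True
```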

Data Point Drill Down

The following table describes the client attributes shown in tabular format when you drill down into a data point on any baseline chart. Attributes that do not name a specific baseline chart are common to all charts.

 
Option | Description
Name | Name of the client.
User Name | User name of the client.
MAC Address | A unique 48-bit IEEE format address assigned to the client's network adapter by the manufacturer.
Last Failure Time (Available for Baseline - Clients Affected by Failures) | The latest date and time when the client failed to connect to the network.
Associated SSID | SSID of the WLAN to which the client is connected.
Associated Access Point | The AP with which the client is associated. The client communicates with other clients and devices on the network through this AP.
Location | Location of the client.
IP Address | IP address of the client.
Protocol | The 802.11 protocol used.
Channel | Operating channel of the AP to which the client attempted to connect.
OS | Name of the operating system running on the client.
Average RSSI (dBm) | The observed RSSI (Received Signal Strength Indicator) value for the client.
Up/Down Since | The date and time since which the client has been up or down.
Connected/Disconnected Since (Available for Baseline - Clients Affected by Poor Performance graphs) | The date and time since which the client has been connected or disconnected.
First Detected At | The date and time when the client was first detected.
Role | The role assigned to the client on associating with an SSID.
Google Authorized | A boolean value indicating whether the client is in the authorized list of clients imported through Google Integration.
Vendor Name | Name of the client's vendor.
Uplink Data (Available for Baseline - Clients Affected by Poor Performance graphs) | The amount of data sent by the client.
Downlink Data (Available for Baseline - Clients Affected by Poor Performance graphs) | The amount of data received by the client.
Retry Rate (Not available for Baseline - Clients Affected by Failures) | The retry rate, as a percentage.
Sticky (Not available for Baseline - Clients Affected by Failures) | A boolean value indicating whether the client is a "sticky client", i.e., whether it stays connected to an AP even though it sees better signal strength from a neighboring AP.
Application Name (Available for Baseline - Poor Application Experience) | Name of the application.
Application Usage Time (Available for Baseline - Poor Application Experience) | The length of time for which the client accessed the application.
Poor Application Experience (Available for Baseline - Poor Application Experience) | The poor application usage experience for a client connection.
Uplink Bitrate (Available for Baseline - Poor Application Experience) | The rate at which the client transmits data, in bits per second.
Downlink Bitrate (Available for Baseline - Poor Application Experience) | The rate at which the client receives data, in bits per second.
Downlink Jitter (Available for Baseline - Poor Application Experience) | Variation in the delay of packets received by the client. It is used to measure the quality of VoIP applications.
Uplink Jitter (Available for Baseline - Poor Application Experience) | Variation in the delay of packets sent by the client. It is used to measure the quality of VoIP applications.
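The Sticky attribute marks a client that stays on its AP even though a neighboring AP offers better signal strength. The following minimal sketch expresses that definition as a check over per-AP RSSI readings. The function name, the input shape, and the 10 dB margin are hypothetical; the criteria CV-CUE actually uses are not documented here.

```python
from typing import Dict


def is_sticky(associated_bssid: str, rssi_by_bssid: Dict[str, float],
              margin_db: float = 10.0) -> bool:
    """True if any neighboring AP is heard at least `margin_db` stronger
    than the AP the client is associated with.

    rssi_by_bssid: hypothetical map of AP identifier -> RSSI (dBm) seen by the client.
    """
    current = rssi_by_bssid[associated_bssid]
    neighbors = [rssi for bssid, rssi in rssi_by_bssid.items() if bssid != associated_bssid]
    return any(rssi - current >= margin_db for rssi in neighbors)


print(is_sticky("ap1", {"ap1": -78.0, "ap2": -60.0}))  # True: ap2 is 18 dB stronger
print(is_sticky("ap1", {"ap1": -62.0, "ap2": -58.0}))  # False: only 4 dB stronger
```

A margin is used so that small, normal fluctuations in RSSI between neighboring APs do not mark every client as sticky.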