Production Network Monitoring

This chapter describes the dashboards provided on the Production Network tab, which shows traffic and events on the production network interfaces connected to the DANZ Monitoring Fabric. This chapter includes the following sections:
  • sFlow®
  • NetFlow and IPFIX
  • NetFlow and IPFIX Flow with Application Information
  • NetFlow and sFlow Traffic Volume Upsampling
  • TCPFlow
  • Flows
  • Filters & Flows
  • ARP
  • DHCP
  • DNS
  • ICMP

sFlow®

Click the Fabric option to display the sFlow®* dashboard, which appears by default. This dashboard summarizes information from the sFlow messages sent to the Arista Analytics server by the DANZ Monitoring Fabric controller or other sFlow agents. It provides the following panels:
  • Top Sources
  • Source Port
  • Top Destinations
  • Destination Port
  • Traffic over time
  • Flow by Filter Interface
  • Flow by Device & IF
  • Count sFlow vs. Last Wk
  • Flow QoS PHB
  • Flow Source
  • Flow Destination
  • sFlow MTU Distribution
  • Flows by Time

sFlow and VXLAN

The sFlow dashboard shows both the outer and inner flows of VXLAN packets, based on the VNI number of the VXLAN packet. To view all the inner flows of a particular VXLAN packet, first filter on VXLAN in the App L4 Port window to display all VXLAN packets. Identify the VXLAN packet of interest in the Flows by Time window, expand its row, and note the packet's VNI number. Then remove the VXLAN filter and filter on the VNI number instead. The dashboard shows both the outer flow of the VXLAN packet and all the inner flows associated with it.

NetFlow and IPFIX

Click NetFlow to display the following dashboard:
Figure 1. Production Network > NetFlow Dashboard

Configure the NetFlow collector interface on the Arista Analytics Node to obtain NetFlow packets, as described in the Setting up the NetFlow Collector on the Analytics Node section.

The NetFlow dashboard summarizes information from the NetFlow messages sent to the Arista Analytics Node by the DANZ Monitoring Fabric controller or other NetFlow exporters and provides the following panels:
  • nFlow Source IP (inner) Destination IP (outer)
  • NF over Time
  • nFlow Live L4 Ports
  • nFlow by Filter Interface
  • nFlow by Production Device & IF
  • NF by QoS PHB
  • NF by DPI App Name
  • NF Top Talkers by Flow
  • NF Detail
Note: To display the fields in the nFlow by Filter Interface panel for NetFlow V5 and IPFIX generated by the DMF Service Node appliance, the records-per-interface and records-per-dmf-interface knobs must be configured on the DANZ Monitoring Fabric controller.
Starting from the BMF-7.2.1 release, the Arista Analytics Node can also handle NetFlow V5/V9 and IPFIX traffic. All of the flows are stored in a NetFlow index. On the NetFlow dashboard, apply filter rules to display specific flow information.
Figure 2. NetFlow Version 5
Figure 3. NetFlow Version 9
Figure 4. NetFlow Version 10
Note:
  1. The Arista Analytics Node cluster listens for NetFlow v9 and IPFIX traffic on UDP port 4739 and for NetFlow v5 traffic on UDP port 2055.
  2. Refer to the DANZ Monitoring Fabric 8.4 User Guide for NetFlow and IPFIX service configuration.
  3. Starting from the DMF-8.1.0 release, the Analytics Node also supports the following Arista enterprise-specific Information Element IDs:
    • 1036 - AristaBscanExportReason
    • 1038 - AristaBscanTsFlowStart
    • 1039 - AristaBscanTsFlowEnd
    • 1040 - AristaBscanTsNewLearn
    • 1042 - AristaBscanTagControl
    • 1043 - AristaBscanFlowGroupId

Consolidating NetFlow V9/IPFIX Records

You can consolidate NetFlow V9 and IPFIX records by grouping those with similar identifying characteristics within a configurable time window. This process reduces the number of documents published in Elasticsearch, decreases disk usage, and improves efficiency. This is particularly beneficial for long flows, where consolidations as high as 40:1 have been observed. However, enabling consolidation is not recommended for environments with low packet flow rates, as it may cause delays in the publication of documents.

The following configuration sets the load-balancing policy for NetFlow/IPFIX traffic among the nodes in a DMF Analytics cluster.
cluster:analytics# config
analytics(config)# analytics-service netflow-v9-ipfix
analytics(config-an-service)# load-balancing policy source-hashing
The two load-balancing policies are:
  • Source hashing: forwards packets to nodes based on a hash of their source IP address. With source hashing, consolidation operations are performed on each node independently.
  • Round-robin: distributes packets evenly across the nodes; use it if source hashing results in significantly unbalanced traffic distribution. Round-robin is the default behavior.
Note: Configure round-robin to lighten the load on the leader node when the flow rate is higher than 10,000 per second in a cluster setup.
Note: This configuration doesn't apply to single-node deployments.

Kibana Setup

To perform the Kibana configuration, select the System > Configuration tab on the Fabric page and open the Analytics Configuration > netflow_stream panel:

Figure 5. Kibana setup
To edit the netflow_stream configuration, go to the following tab:
Figure 6. Edit the netflow stream
There are three required settings:
  • enable: turns consolidation on or off.
  • window_size_ms: the consolidation window size in milliseconds. Adjust it based on the rate of NetFlow V9/IPFIX packets per second that the Analytics Node receives. The default window size is 30 seconds (30000 ms).
  • mode: there are three supported modes:
    • ip-port: consolidates records that share the same source IP address, destination IP address, and IP protocol number, and whose lower Layer 4 port number (the lesser of the source and destination port numbers) also matches; see the example after this list.
    • dmf-ip-port-switch: consolidates records from the same DMF filter switch that also meet the ip-port criteria.
    • src-dst-mac: consolidates records with the same source and destination MAC addresses.
      Note: Use this mode when NetFlow V9/IPFIX templates collect only Layer 2 fields.
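For illustration, under ip-port mode the following two records would be grouped, because they share both IP addresses, the IP protocol, and the lower Layer 4 port number (443); only the ephemeral client port differs. This is a hypothetical sketch with simplified field names, assuming the counters of grouped records are aggregated:
record 1: srcIP 10.1.1.10, dstIP 10.2.2.20, protocol TCP, srcPort 51001, dstPort 443
record 2: srcIP 10.1.1.10, dstIP 10.2.2.20, protocol TCP, srcPort 51007, dstPort 443
consolidation key: srcIP 10.1.1.10, dstIP 10.2.2.20, protocol TCP, Layer 4 port 443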
Starting in DMF-8.5.0, the configuration described above is set under a "consolidation" JSON object, as follows:
Figure 7. Consolidating Netflow
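For reference, a minimal sketch of the consolidation object within the netflow_stream configuration, assuming the setting names described above (the values shown are illustrative; 30000 ms corresponds to the default 30-second window):
"consolidation": {
    "enable": true,
    "window_size_ms": 30000,
    "mode": "ip-port"
}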

Consolidation Troubleshooting

If consolidation is enabled but does not occur, Arista Networks recommends creating a support bundle and contacting Arista TAC.

Load-balancing Troubleshooting

If there are any issues related to load-balancing, Arista Networks recommends creating a support bundle and contacting Arista TAC.

NetFlow and IPFIX Flow with Application Information

This feature of Arista Analytics combines NetFlow and IPFIX records containing application information with NetFlow and IPFIX records containing flow information.

This feature improves data visibility per application by correlating flow records with the applications identified by the flow exporter.

This release supports only applications exported from Arista Networks Service Nodes. In a multi-node cluster, you must configure load balancing using the Analytics Node CLI.

Configuration

In a multi-node Analytics cluster, set the load-balancing policy of NetFlow/IPFIX traffic to source-hashing, as the round-robin policy may cause application information to be missing from the resulting flow documents in ElasticSearch.
analytics# config
analytics(config)# analytics-service netflow-v9-ipfix
analytics(config-an-service)# load-balancing policy source-hashing
Note: This configuration doesn’t apply to single-node deployments.

Kibana Configuration

To perform the Kibana configuration, select the System > Configuration tab on the Fabric page and open the Analytics Configuration > netflow_stream visualization.
Figure 8. Dashboard - Netflow stream configuration
Add the app_id configuration object.
Figure 9. Edit - Netflow stream
The app_id configuration object requires the following setting:
  • add_to_flows: enables or disables the merging feature.
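A minimal sketch of the app_id object within the netflow_stream configuration, assuming the setting name described above:
"app_id": {
    "add_to_flows": true
}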

ElasticSearch Documents

Three fields display the application information in the final NetFlow/IPFIX document stored in ElasticSearch:

  • appScope: Name of the NetFlow/IPFIX exporter.
  • appName: Name of the application. This field is only populated if the exporter is NTOP.
  • appID: Unique application identifier assigned by the exporter.
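For illustration, the application fields in a resulting flow document might look like the following (all values are hypothetical):
{
    "appScope": "ntop",
    "appName": "tls",
    "appID": 91
}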

Troubleshooting

If merging is enabled but does not occur, Arista Networks recommends creating a support bundle and contacting Arista TAC.

Limitations

  • Some flow records may not include the expected application information when round-robin load balancing of NetFlow/IPFIX traffic is configured. Arista Networks recommends configuring the source-hashing load-balancing policy and sending all NetFlow/IPFIX traffic to the Analytics Node from the same source IP address.
  • Application information and flow records are correlated only if the application record is available before the flow record.
  • Arista Networks supports collecting application information only from the following NetFlow/IPFIX exporters: NTOP, Palo Alto Networks firewalls, and the Arista Networks Service Node.
  • This feature isn't compatible with the consolidation feature documented in the Consolidating NetFlow V9/IPFIX Records section. When merging with application information is enabled, consolidation must be disabled.

NetFlow and sFlow Traffic Volume Upsampling

Arista Analytics can upsample traffic volume sampled by NetFlow V9/IPFIX and sFlow. This feature provides better visibility of traffic volumes by approximating the number of bytes and packets from samples collected by the NetFlow V9/IPFIX or sFlow sampling protocols. The approximated statistics are provided alongside the collected statistics in ElasticSearch. The approximations are based on the flow exporter's sampling rate or on a user-provided fixed factor.

Note: When the rate of flow packets is low or for short flows, the approximations will be inaccurate.

The DMF 8.5.0 release does not support automated approximation of total bytes and packets for NetFlow V9/IPFIX. If upsampling is needed, Arista Networks recommends configuring a fixed upsampling rate.

NetFlow/IPFIX Configuration

To perform the Kibana configuration, select the System > Configuration tab on the Fabric page and open the Analytics Configuration > netflow_stream visualization.
Figure 10. Dashboard - Netflow IPFIX configuration
Figure 11. Edit - Netflow IPFIX
There is one required setting, upsample_byte_packet_factor, with two possible options:
  • Auto: the default option. DMF 8.5.0 does not support automated upsampling for NetFlow V9/IPFIX; Arista Networks recommends configuring an integer if upsampling is needed.
  • Integer: multiplies the number of bytes and packets for each collected sample by the configured number.
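A minimal sketch of the setting within the netflow_stream configuration, assuming a fixed integer factor (the value 1000 is illustrative):
"upsample_byte_packet_factor": 1000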

sFlow Configuration

To perform the Kibana configuration, select the System > Configuration tab on the Fabric page and open the Analytics Configuration > sFlow visualization.
Figure 12. Dashboard - sFlow configuration
Figure 13. Edit - sFlow
There is one required setting, upsample_byte_packet_factor, with two possible options:
  • Auto: approximates the number of bytes and packets for each collected sample based on the collector's sampling rate. Auto is the default option.
  • Integer: multiplies the number of bytes and packets for each collected sample by the configured number.
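The equivalent sketch for the sFlow configuration, shown with the default Auto option (assuming the Auto option is expressed as the string "auto"):
"upsample_byte_packet_factor": "auto"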

Dashboards

NetFlow Dashboard
The NetFlow dashboard is on the Production Network > NetFlow tab on the Fabric page. The following visualizations display upsampled statistics:
  • NF over Time
  • NF Top Talkers by Flow
Figure 14. NF Detail visualization
The DMF 8.5.0 release adds two new columns:
  • upsampledPacketCount: Approximate total count of packets for a flow.
  • upsampledByteCount: Approximate total count of bytes for a flow.
Note: In DMF 8.5.0, when upsampling is configured to Auto, upsampledByteCount and upsampledPacketCount copy the bytes and packets columns, and the graphs and tables of this dashboard display the bytes and packets values.
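As a hypothetical worked example, with a fixed upsampling factor of 1000, a flow whose collected samples total 2 packets and 3,000 bytes would appear in NF Detail approximately as follows:
packets: 2
bytes: 3000
upsampledPacketCount: 2000 (2 x 1000)
upsampledByteCount: 3000000 (3000 x 1000)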
sFlow Dashboard

The sFlow dashboard is on the Production Network > sFlow tab on the Fabric page. The Traffic over Time visualization will display upsampled statistics.

Figure 15. Flow by Time visualization

The newly added upsampledByteCount represents a flow's approximate total count of bytes.

Troubleshooting

Arista Networks recommends creating a support bundle and contacting Arista Networks TAC if upsampling isn’t working correctly.

TCPFlow

Click the TCPFlow tab to display the following dashboard.
Figure 16. Production Network > TCPFlow Dashboard

The information on the TCPFlow dashboard is based on TCP handshake signals and is deduplicated. The Filter Interface visualization indicates the filter switch port where the data is received. The switch description is specified in the Description attribute of each switch, configured on the DANZ Monitoring Fabric controller. Device & IF on this dashboard refers to the end device and depends on the LLDP packets received.

Flows

Click the Flows tab to display the following dashboard.
Figure 17. Production Network > Flows Dashboard
The Flows Dashboard summarizes information from sFlow and NetFlow messages and provides the following panels:
  • All Flows Type
  • All Flows Overtime
  • All Flows Details

Filters & Flows

Click the Filters & Flows tab to display the following dashboard.
Figure 18. Production Network > Filters & Flows Dashboard

ARP

Click the ARP tab to display the following dashboard. This data correlates with the tracked-host feature on the DANZ Monitoring Fabric controller. It shows all ARP data by filter interface and production device over time.
Figure 19. Production Network > ARP Dashboard

DHCP

Click the DHCP tab to display the following dashboard.
Figure 20. Production Network > DHCP Dashboard
Note: Information about operating systems on the network, as well as data by filter interface and production device, is available.
The DHCP Dashboard summarizes information from analyzing DHCP activity and provides the following panels:
  • DHCP OS Fingerprinted
  • DHCP Messages by Filter Interface
  • DHCP Messages by Production Switch
  • Non-whitelist DHCP Servers
  • DHCP Messages Over Time
  • DHCP Messages by Type
  • DHCP Messages

DNS

Click the DNS tab to display the following dashboard.
Figure 21. Production Network > DNS Dashboard
The DNS Dashboard summarizes information from analyzing DNS activity and provides the following panels:
  • DNS Top Servers
  • DNS Top Clients
  • DNS By Filter Interface
  • DNS by Production Device & IF
  • DNS Messages Over Time
  • Unauthorized DNS Servers
  • DNS RTT
  • DNS All Messages
  • DNS RCode Distro
  • DNS QType Description
  • DNS Top QNames
Note: The DNS RTT value is computed from the query and response packet timestamps. If a query packet is not answered by a response packet within 180 seconds, the RTT value is set to -1.

ICMP

Click the ICMP tab to display the following dashboard.
Figure 22. Production Network > ICMP Dashboard
The ICMP Dashboard summarizes information from analyzing ICMP activity and provides the following panels:
  • Top ICMP Message Source
  • ICMP by Filter Interface
  • Top ICMP Message Dest
  • ICMP by Error Description
  • ICMP by Production Switch
  • ICMP Top Err Dest IPs
  • ICMP Top Err Dest Port Apps
  • ICMP Messages Over Time
  • ICMP Table
*sFlow® is a registered trademark of InMon Corp.