Troubleshooting MSS

This section contains valuable information on troubleshooting common issues and performing routine maintenance tasks.

Show Commands

The following show commands help with troubleshooting a ZTX Monitor Node.

Check the interface status to ensure that the port-channel and member ports are connected:

ZTX# show interfaces status
Port   Name  Status     Vlan    Duplex  Speed  Type        Flags  Encapsulation
Et1/1        connected  in Po1  full    10G    10GBASE-SR
Et1/2        connected  in Po1  full    10G    10GBASE-CR
...
Po1          connected  routed  full    40G    N/A

Check each GRE tunnel interface status to ensure that the status is UP (multiple GRE tunnels may terminate on the interface):

ZTX# show interfaces tunnel 0
Tunnel0 is up, line protocol is up (connected)
Hardware is Tunnel, address is 0000.0000.0000
Tunnel source 10.10.254.1, destination 10.10.254.2
Tunnel protocol/transport GRE/IP
Hardware forwarding enabled
Tunnel transport MTU 1476 bytes (default)
Tunnel underlay VRF "default"
Up 3 days, 53 minutes, 22 seconds
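
Both connectivity checks above lend themselves to automation. The following is a minimal sketch using Arista eAPI through the pyeapi library; the hostname, credentials, and interface names are placeholders, and the JSON field names should be verified against your EOS release:

#!/usr/bin/env python3
# Minimal sketch: verify the port-channel members and GRE tunnels are up
# via Arista eAPI. Hostname, credentials, and interface names below are
# placeholders; verify the JSON field names against your EOS release.
import pyeapi

node = pyeapi.connect(
    transport="https", host="ztx1.example.com",
    username="admin", password="********", return_node=True,
)

# "show interfaces status" returns an "interfaceStatuses" map keyed by
# interface name; "linkStatus" is "connected" when the link is up.
statuses = node.enable("show interfaces status")[0]["result"]["interfaceStatuses"]
for intf in ("Port-Channel1", "Ethernet1/1", "Ethernet1/2"):
    state = statuses.get(intf, {}).get("linkStatus", "absent")
    if state != "connected":
        print(f"WARNING: {intf} is {state}")

# "show interfaces Tunnel0" reports per-interface line-protocol state.
tunnels = node.enable("show interfaces Tunnel0")[0]["result"]["interfaces"]
for name, data in tunnels.items():
    if data.get("lineProtocolStatus") != "up":
        print(f"WARNING: {name} line protocol is {data.get('lineProtocolStatus')}")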

Check that the flow tracking feature is active on all GRE monitor tunnels from the TORs:

ZTX# show flow tracking firewall distributed
Flow Tracking Status
Type: Distributed Firewall
Running: yes, enabled by the 'flow tracking firewall distributed' command
Tracker: flowtrkr
Active interval: 300000 ms
Inactive timeout: 15000 ms
Groups: IPv4
Exporter: exp
VRF: default
Local interface: Loopback0 (10.10.254.1)
Export format: IPFIX version 10, MTU 9152
DSCP: 0
Template interval: 3600000 ms
Collectors:
127.0.0.1 port 4739
Active Ingress Interfaces: 
Tu0, Tu1 
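
A scripted version of this check, again over eAPI in text mode (connection details are placeholders, as in the earlier sketch, and the expected tunnel list is an assumption for your deployment):

#!/usr/bin/env python3
# Minimal sketch: confirm the distributed firewall flow tracker is running
# and that each GRE monitor tunnel is an active ingress interface.
import re
import pyeapi

node = pyeapi.connect(transport="https", host="ztx1.example.com",
                      username="admin", password="********", return_node=True)
out = node.enable("show flow tracking firewall distributed",
                  encoding="text")[0]["result"]["output"]

if "Running: yes" not in out:
    print("WARNING: distributed firewall flow tracking is not running")

# Expected tunnel list is a placeholder; list every GRE monitor tunnel here.
for tun in ("Tu0", "Tu1"):
    if not re.search(rf"Active Ingress Interfaces:.*\b{tun}\b", out, re.S):
        print(f"WARNING: {tun} not listed as an active ingress interface")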

Check if mirrored flows are seen at the ZTX Node:

ZTX# show firewall distributed instance session-table
Legend
eph - Ephemeral port
Sessions: 5
VRF   Proto  Source/Destination  Fwd/Rev Src VTEP IP  Fwd/Rev Pkts  Fwd/Rev Bytes  Complete  Half-Open  Start Time
----- -----  ------------------  -------------------  ------------  -------------  --------  ---------  -------------------
vrf2  UDP    1.1.1.1:50004       10.10.254.2          1             428            0         1          2024-10-28 11:13:09
             1.1.1.4:1001        10.10.254.3          1             428
vrf1  UDP    1.1.1.1:eph         10.10.254.2          5             2140           5         0          2024-10-28 11:13:09
             1.1.1.3:1001        10.10.254.3          5             2140
vrf1  UDP    1.1.1.1:eph         10.10.254.2          5             2140           5         0          2024-10-28 11:13:09
             1.1.1.2:1001        10.10.254.3          5             2140

If mirrored flows are not seen at the ZTX Node, check for drops:

ZTX# show platform sfe counters | nz
Name                                   Owner            Counter Type  Unit     Count
-------------------------------------  ---------------  ------------  -------  -----
Tunnel-Global-gre_decap_drop_pkts      Ip4TunDemux      module        packets  400
Tunnel-Global-tun_decap_drop_pkts      Ip4TunDemux      module        packets  260
IpInput_Tunnel0-Stateful_drop_counter  IpInput_Tunnel0  module        packets  47

If mirrored flows are not seen at CloudVision Portal, check whether the ZTX Node has exported the flows and whether there are any failures in the IPFIX export:

ZTX# show agent sfe threads flow cache scan counters
Purged count: 501
IPFIX export count: 354
IPFIX failed export count: 0 
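
The last three checks can also be scripted together. The sketch below runs the commands in text mode and pattern-matches the counters of interest, since the structured output schemas are not assumed here; the device details are placeholders:

#!/usr/bin/env python3
# Minimal sketch: confirm mirrored flows are tracked and exported.
# Device details are placeholders; output is parsed as text because the
# structured (JSON) schemas for these commands are not assumed here.
import re
import pyeapi

node = pyeapi.connect(transport="https", host="ztx1.example.com",
                      username="admin", password="********", return_node=True)

def run_text(cmd):
    # encoding="text" returns the raw CLI output under the "output" key.
    return node.enable(cmd, encoding="text")[0]["result"]["output"]

# 1. Mirrored flows should appear in the session table ("Sessions: N").
m = re.search(r"Sessions:\s*(\d+)",
              run_text("show firewall distributed instance session-table"))
if not m or int(m.group(1)) == 0:
    print("WARNING: no sessions in the session table; check for drops")

# 2. Non-zero decap or stateful drop counters indicate dropped mirror
#    traffic. Filtering is done client-side rather than with "| nz".
for line in run_text("show platform sfe counters").splitlines():
    if "drop" in line.lower() and not line.rstrip().endswith(" 0"):
        print("drop counter:", line.strip())

# 3. IPFIX export failures mean flows may never reach CloudVision Portal.
f = re.search(r"IPFIX failed export count:\s*(\d+)",
              run_text("show agent sfe threads flow cache scan counters"))
if f and int(f.group(1)) > 0:
    print(f"WARNING: {f.group(1)} failed IPFIX exports")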

The following is a brief explanation of the show flow tracking firewall distributed output:

  • Type: Distributed Firewall type for ZTX nodes.
  • Running: “yes” indicates that the flow tracking feature is successfully running.
  • Tracker: Name of flow tracker configuration.
  • Active interval: The interval at which IPFIX data records are exported for active sessions. The active interval defaults to 1800000 ms (30 minutes) and can be modified in MSS Studio.
  • Inactive timeout: The time after which a session is considered inactive if no packets are received. The inactive timeout is not configurable and defaults to 15000 ms.
  • Groups: Currently, only IPv4 packets are supported.
  • Exporter: Name of exporter configuration.
  • VRF: VRF used for IPFIX export.
  • Local Interface: Local interface used for IPFIX export.
  • Export format: IPFIX version and IP MTU used for exported IPv4 packets.
  • DSCP: Differentiated Service Code Point value used in exported IPv4 packet header.
  • Template Interval: Time interval between successive IPFIX template exports to the collector.
  • Collectors: List of IPFIX collector IPs and ports. 127.0.0.1 indicates the local IPFIX collector running on the ZTX device; the local collector sends the exported flows to CloudVision Portal (CVP).
  • Active Ingress Interfaces: Displays all tunnel interfaces on which IPFIX flow tracking is running.

Tracing

Enabling tracing can seriously impact switch performance in some cases. Use it cautiously, and seek advice from an Arista representative before enabling it in any production environment. The following trace setting enables tracing for the IPFIX walker in the Sfe agent:

trace Sfe setting IpfixWalker*/*

Considerations

When deploying Multi-domain Segmentation Services (MSS) with the ZTX-7250S-16S Monitor Node, several crucial technical aspects must be considered to ensure optimal performance and policy enforcement.

ZTX-7250S-16S Monitor Node

Session Capacity and Traffic Throughput

The ZTX-7250S-16S Monitor Node can handle a maximum of 32 million concurrent session entries. Session entries include a mix of aggregate, short-lived ephemeral, and persistent non-ephemeral sessions. Monitoring your network's session count is vital to avoid exceeding this limit, which could impact performance or lead to dropped connections. Additionally, the node supports up to 80 Gbps of incoming monitor traffic when all 16 Ethernet ports are actively connected to upstream service Top-of-Rack (TOR) switches. Designing your network to leverage all available ports will maximize monitoring throughput.
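
For capacity planning, the headroom arithmetic is simple. The sketch below uses the 32 million session and 80 Gbps platform figures from above; the observed values are placeholders you would source from the session table and interface counters:

# Back-of-the-envelope headroom check against the ZTX-7250S-16S limits.
MAX_SESSIONS = 32_000_000    # concurrent session-table entries
MAX_MONITOR_GBPS = 80        # with all 16 ports connected to service TORs

observed_sessions = 24_500_000   # placeholder: from "Sessions: N"
observed_gbps = 62.0             # placeholder: from interface counters

session_util = observed_sessions / MAX_SESSIONS
traffic_util = observed_gbps / MAX_MONITOR_GBPS
print(f"Session table: {session_util:.0%} used, "
      f"{MAX_SESSIONS - observed_sessions:,} entries of headroom")
print(f"Monitor traffic: {traffic_util:.0%} of {MAX_MONITOR_GBPS} Gbps")

# Alert well before the hard limits; 80% is an arbitrary example threshold.
for name, util in (("session table", session_util),
                   ("monitor traffic", traffic_util)):
    if util > 0.80:
        print(f"WARNING: {name} above 80% of platform capacity")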

Self-IP Support

MSS Studio automatically applies a Self-IP rule to permit unicast traffic destined for the device. While this is usually sufficient for control plane protocols, protocols relying on multicast PDUs, such as PIM, OSPF, or IPv6 Neighbor Discovery for BGPv6, might fail to establish adjacencies if a default deny any any rule is in place. You will need to explicitly allow such multicast traffic in your policies.

Layer 2 Devices

Layer 2 devices where traffic policy enforcement is applied must still have IP routing enabled.

Network Address Translation

NAT and Policy Rule Limitations

If you enable NAT on 7050S, 720S, or 722S platforms, be aware that the number of supported traffic policy rules will be reduced. This reduction is because NAT and traffic policies share the same Ternary Content Addressable Memory (TCAM) resources. Careful planning of your NAT implementation is necessary to avoid impacting your segmentation policies.

Access Control Lists and Policy-Based Routing

Policy Overlap and Precedence

The interaction between different policy types, specifically Access Control Lists (ACLs), Policy-Based Routing (PBR), and MSS Traffic Policy rules, requires careful consideration. If these policies are configured to apply to overlapping flow attributes, their combined effect might not be as intended. For instance, ACLs can be safely applied to Self-IP traffic, but keep in mind that MSS Studio isn't aware of any ACLs that impact data packets directly. You'll need to manually account for how these different policy mechanisms might interact to avoid unintended traffic behavior or security gaps.
