BGP PIC Edge for EVPN VXLAN Routes for Remote VTEP Failures
When a remote VTEP goes down, the IGP and BGP must recompute a new best path for traffic destined to the affected BGP prefixes that were originally reachable via the failed VTEP. Currently, BGP PIC is restricted to locally identifiable failures such as link failures.
To address such VTEP failures, BFD support for EVPN-learned VTEPs improves convergence times in these scenarios by tying the liveness detection provided by BFD sessions into the existing BGP PIC support for software fast-failover. Upon detecting that a BFD session to a remote VTEP has gone down, the hardware forwarding agents update the affected adjacencies before the corresponding underlay route has been removed from the FIB, which can improve convergence times.
The diagram above outlines a scenario in which CE-2 is sending traffic bound for 10.10.10.0/24 via PE-21 to PE-11 to CE-1. When PE-11 goes down, the BFD sessions from PE-21 to the remote VTEPs of PE-11 and PE-12 allow PE-21 to detect the failure and quickly update forwarding to send the traffic along the pre-computed backup path via PE-12 to CE-1.
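The failover behavior in this scenario can be illustrated with a small model. The following Python sketch is purely conceptual and does not reflect how the forwarding agents are implemented; the class and method names (ForwardingTable, on_bfd_down, active_vtep) are hypothetical and chosen only for illustration. It shows a prefix with a pre-computed primary and backup VTEP, and the switch to the backup as soon as the BFD session to the primary is reported down.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Prefix:
    """A prefix with an ordered list of VTEP next hops (primary first, then backup)."""
    name: str
    vteps: List[str]


@dataclass
class ForwardingTable:
    """Hypothetical model of pre-computed paths plus BFD liveness state."""
    prefixes: List[Prefix]
    down_vteps: Set[str] = field(default_factory=set)

    def on_bfd_down(self, vtep: str) -> None:
        # Driven by the BFD session state, not by underlay route withdrawal.
        self.down_vteps.add(vtep)

    def active_vtep(self, prefix_name: str) -> Optional[str]:
        for prefix in self.prefixes:
            if prefix.name == prefix_name:
                # Use the first path whose VTEP is still considered alive.
                return next((v for v in prefix.vteps if v not in self.down_vteps), None)
        return None


table = ForwardingTable([Prefix("10.10.10.0/24", ["PE-11", "PE-12"])])
before = table.active_vtep("10.10.10.0/24")
table.on_bfd_down("PE-11")
after = table.active_vtep("10.10.10.0/24")
print(before, "->", after)
```

In this model no path recomputation happens at failure time: the backup is already present, so the only work on a BFD down event is selecting it.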
Configuring BGP PIC Edge for EVPN VXLAN Routes for Remote VTEP Failures
switch# config
switch(config)# interface VXLAN1
switch(config-if-Vx1)# bfd vtep evpn interval <interval> min-rx <min-rx> multiplier <multiplier>
This configuration uses the specified timer values to initiate BFD sessions for all VTEPs learned through EVPN VXLAN for this VTI.
- interval – Transmit interval in milliseconds
- min-rx – Minimum expected receive interval in milliseconds
- multiplier – BFD detection multiplier (number of consecutive missed packets before the session is declared down)
Example
switch(config-if-Vx1)# bfd vtep evpn interval 100 min-rx 100 multiplier 3
In this example (assuming symmetric configuration on other PE devices), any BFD for VXLAN session initiated on the VTI would have a detect time of 300ms (interval of 100ms multiplied by 3).
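The detect time arithmetic can be checked with a short calculation. The Python sketch below follows the asynchronous-mode detection time rule from RFC 5880 (the remote multiplier times the greater of the local min-rx and the remote transmit interval); the function name bfd_detect_time_ms and its parameter names are illustrative assumptions, not part of any EOS API.

```python
def bfd_detect_time_ms(local_min_rx_ms: int,
                       remote_tx_interval_ms: int,
                       remote_multiplier: int) -> int:
    """Asynchronous-mode BFD detection time in milliseconds (RFC 5880, section 6.8.4)."""
    # The agreed interval is the slower of what we accept and what the peer sends.
    agreed_interval_ms = max(local_min_rx_ms, remote_tx_interval_ms)
    return remote_multiplier * agreed_interval_ms


# Symmetric configuration from the example: interval 100, min-rx 100, multiplier 3.
symmetric = bfd_detect_time_ms(100, 100, 3)

# Hypothetical asymmetric case: the remote PE transmits every 300 ms.
asymmetric = bfd_detect_time_ms(100, 300, 3)

print(symmetric, asymmetric)
```

The asymmetric case shows why the symmetric-configuration assumption matters: a slower transmit interval on the remote PE lengthens the local detect time even if the local timers are unchanged.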
To utilize these BFD sessions, traffic must have an alternate path in the event that the session goes down. This would include other paths in an ECMP group or a backup path.
switch# config
switch(config)# interface VXLAN1
switch(config-if-Vx1)# bfd vtep evpn prefix-list <PREFIX-LIST>
This command uses a supplied prefix list to filter and select the candidate VTEPs. By default, an empty prefix list will act as a deny-all and not initiate BFD sessions with any learned VTEPs.
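The filtering behavior can be pictured with a simplified model. The Python sketch below assumes a prefix list made only of permit entries and treats a VTEP as selected when its address falls inside any permitted prefix; real prefix-list semantics (sequence numbers, deny entries, prefix-length matching) are not modeled, and the function name select_bfd_candidates is hypothetical.

```python
import ipaddress
from typing import List


def select_bfd_candidates(learned_vteps: List[str],
                          permitted_prefixes: List[str]) -> List[str]:
    """Return the learned VTEPs that fall inside at least one permitted prefix."""
    networks = [ipaddress.ip_network(p) for p in permitted_prefixes]
    # With no permitted prefixes, any() is False for every VTEP: deny-all.
    return [
        vtep for vtep in learned_vteps
        if any(ipaddress.ip_address(vtep) in net for net in networks)
    ]


learned = ["10.1.1.2", "10.1.1.3", "10.2.2.2"]
selected = select_bfd_candidates(learned, ["10.1.1.0/24"])
none_selected = select_bfd_candidates(learned, [])
print(selected, none_selected)
```

The empty-list case mirrors the deny-all behavior described above: no BFD sessions are initiated until the prefix list permits at least one learned VTEP.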
Show Commands
switch# show interface VXLAN1
VXLAN1 is up, line protocol is up (connected)
Hardware is VXLAN
Source interface is Loopback0 and is active with 10.1.1.1
Replication/Flood Mode is headend with Flood List Source: CLI
Remote MAC learning is disabled
VNI mapping to VLANs
Static VLAN to VNI mapping is
Dynamic VLAN to VNI mapping for 'evpn' is
[4092, 30000] [4093, 20000]
Dynamic VLAN to VNI mapping for 'vccbfd' is
[4091, 0]
Note: All Dynamic VLANs used by VCS are internal VLANs.
Use 'show VXLAN vni' for details.
Static VRF to VNI mapping is
[vrf0, 20000]
MLAG Shared Router MAC is 0000.0000.0000
BFD is enabled with transmit interval 50, receive interval 50, multiplier 3, VTEP prefix list pl-example
switch# show bfd peers
VRF name: default
-----------------
DstAddr MyDisc YourDisc Interface/Transport Type LastUp LastDown LastDiag State
-------- ---------- ------------- ---------------------- --------- -------------- ------------ -------------- -----------
10.1.1.2 1965370229 3607849318 NA VXLAN 01/12/21 10:45 NA No Diagnostic Up
10.1.1.3 1355343148 2407539267 NA VXLAN 01/12/21 10:45 NA No Diagnostic Up
switch# show ip route vrf example-vrf
VRF: example-vrf
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
RC - Route Cache Route
Gateway of last resort is not set
C 20.0.2.0/24 is directly connected, Ethernet14/1
B E 99.99.0.0/24 [200/0] via VTEP 10.1.1.2 VNI 30000 router-mac fc:bd:67:3d:21:fd
via VTEP 10.1.1.3 VNI 30000 router-mac ba:ed:43:3f:ca:8e backup
In the above example, prefix 99.99.0.0/24 has a primary path using a VXLAN tunnel to VTEP 10.1.1.2 and a backup path using a VXLAN tunnel to VTEP 10.1.1.3. Both paths are monitored via BFD.
switch# show ip route vrf example-vrf
VRF: example-vrf
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
RC - Route Cache Route
Gateway of last resort is not set
C 20.0.2.0/24 is directly connected, Ethernet14/1
B E 99.99.0.0/24 [200/0] via VTEP 10.1.1.2 VNI 30000 router-mac fc:bd:67:3d:24:fe
via VTEP 10.1.1.3 VNI 30000 router-mac ba:ed:43:3f:ca:8e
MLAG
This feature applies only to the scenario in which a remote prefix is known via two different MLAG VTEP pairs.
Prior to 4.26.0F, this feature is supported only on the primary switch of an MLAG pair, because the shared VTEP IP within the MLAG pair is used as the VXLAN tunnel source/destination. If BFD for VXLAN packets are received on the secondary MLAG switch, they are forwarded to the primary MLAG switch for processing. Because only the primary MLAG switch has BFD state for remote VTEPs, if a BFD session to a remote VTEP goes down, only the primary MLAG switch performs the fast-failover, while the secondary MLAG switch retains its existing behavior. Therefore, it is not recommended to use this feature in conjunction with MLAG.
As of 4.26.0F, the primary MLAG switch syncs its BFD for VXLAN state to the secondary MLAG switch, allowing the secondary to fail over to an alternate path as well. To view the synced state, a new show command has been added.
switch# show bfd peers protocol VXLAN mlag primary
Remote VTEPS for VXLAN1 on MLAG primary:
VTEP BFD Status
---------- ----------
10.1.1.2 up
10.1.1.3 up
However, because this state is synced across devices, the secondary MLAG switch will not be as performant in reacting to the BFD state transitions as the primary MLAG switch, which is natively responding to the BFD session.
Another exception is the multi-VTEP MLAG feature, which allows BFD for VXLAN to run on the secondary MLAG switch. When running with multi-VTEP MLAG both the primary and secondary switches will run independent BFD sessions to remote VTEPs and react to BFD state transitions separately. Each switch will use the local VTEP IP of the VTI as the source IP address for the BFD sessions, which must differ from the MLAG VTEP IP.
In summary, using MLAG with this feature is recommended only when it is configured with the multi-VTEP MLAG feature referenced above.
Troubleshooting
- Ensure that the BFD configuration is present on the relevant VTI and that the VTI status shows BFD as enabled, using the show interface <VTI> command described above.
- As mentioned in the prior section, BFD state transitions are logged via Syslog, which will show whether a BFD session to a remote VTEP has gone down.
- After a fast-failover to an alternate path, show ip route still displays FIB state, which may show the original path. To view the post-failover prefix state, use show ip hardware ale vrf <VRF> <prefix> instead.
Limitations
- Support is limited to EVPN VXLAN.
- An IPv6 VXLAN underlay is not supported with this feature.