EVPN IGP Cost for VTEP Reachability

In an EVPN deployment with a VXLAN underlay, when an EVPN Type-5 prefix is imported into an IP VRF, the IGP cost of the underlay path to the VTEP is not considered in BGP best-path selection after the import. Therefore, if such a prefix is reachable through more than one VTEP, the IGP metric step of the BGP best-path selection algorithm does not filter out any paths, irrespective of the underlay IGP metric for VTEP reachability. If ECMP is enabled in the overlay and multiple paths are otherwise equivalent, those paths form ECMP regardless of the IGP metric. This is the default behavior.

This default behavior can be overridden with the encapsulation vxlan layer-3 set next-hop igp-cost command in config-router-bgp-af mode for address-family evpn, as shown:
switch(config-router-bgp)# address-family evpn
switch(config-router-bgp-af)# [no | default] encapsulation vxlan layer-3 set next-hop igp-cost

The encapsulation vxlan layer-3 set next-hop igp-cost command causes the underlay IGP metric for VTEP reachability to be considered in BGP best-path selection in the IP VRF that imports the EVPN route. The metric can come from an IGP such as OSPF or IS-IS, or from static configuration.

Note: This feature is available only with the multi-agent routing protocol model.
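
The effect of the command on the IGP metric step can be sketched in a few lines of Python. This is only an illustrative model, not EOS code: the names ImportedPath, igp_metric_step, and use_igp_cost are hypothetical, only the IGP metric step is modeled (all earlier best-path steps are assumed to have tied), and the metric values 340 and 350 are taken from the configuration example later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportedPath:
    """One EVPN Type-5 path after import into the IP VRF (illustrative model)."""
    vtep: str        # next-hop VTEP address
    igp_metric: int  # underlay IGP metric to reach that VTEP


def igp_metric_step(paths, use_igp_cost):
    """Apply only the IGP metric step of best-path selection.

    use_igp_cost=False models the default: the step removes no paths.
    use_igp_cost=True models the behavior described for
    'encapsulation vxlan layer-3 set next-hop igp-cost': only the paths
    with the lowest underlay metric survive the step.
    """
    if not use_igp_cost or not paths:
        return list(paths)
    lowest = min(p.igp_metric for p in paths)
    return [p for p in paths if p.igp_metric == lowest]


# Hypothetical input: the two VTEPs and metrics from the example in this section.
paths = [
    ImportedPath(vtep="11.0.1.1", igp_metric=340),
    ImportedPath(vtep="11.0.2.1", igp_metric=350),
]

default_survivors = igp_metric_step(paths, use_igp_cost=False)
igp_cost_survivors = igp_metric_step(paths, use_igp_cost=True)

print([p.vtep for p in default_survivors])   # both VTEPs remain ECMP candidates
print([p.vtep for p in igp_cost_survivors])  # only the lower-metric VTEP remains
```

With the flag off, both paths pass through the step unchanged and remain eligible for ECMP. With the flag on, only the path through the VTEP with the lower underlay metric continues to the later steps.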

Configuration Example

Consider a topology of four routers, leaf1, leaf2, leaf3, and rtr1, as shown in the figure.
Figure 1. EVPN IGP Cost for VTEP


An IGP could run among the four routers, but for simplicity this example uses static routes with a metric on leaf1 to Loopback0 of the border leaves leaf2 and leaf3. The corresponding routes for reverse reachability on the respective routers are not shown here:
leaf1#
ip route 11.0.1.1/32 10.0.0.2 metric 340
ip route 11.0.2.1/32 10.0.0.2 metric 350
The following are the IP routes in the default VRF on leaf1:
leaf1# show ip route

VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

Gateway of last resort:
 S        0.0.0.0/0 [1/0] via 10.0.0.2, Ethernet2

 C        10.0.0.0/24 is directly connected, Ethernet2
 C        11.0.0.1/32 is directly connected, Loopback0
 S        11.0.1.1/32 [1/340] via 10.0.0.2, Ethernet2
 S        11.0.2.1/32 [1/350] via 10.0.0.2, Ethernet2

The following are the eBGP multihop EVPN neighbor pairs with VXLAN as the underlay:

leaf1 (ASN-300) ↔ leaf2 (ASN-301)

leaf1 (ASN-300) ↔ leaf3 (ASN-302)

Consider an example where the prefix 20.0.100.1/32 is reachable behind two VTEPs, leaf2 and leaf3, as learned on leaf1 through eBGP EVPN Type-5 routes.

The following EVPN paths are displayed for 20.0.100.1/32 on leaf1:
leaf1(config)# show bgp evpn detail

BGP routing table entry for ip-prefix 20.0.100.1/32, Route Distinguisher: 11.0.1.1:0
 Paths: 1 available
  301
    11.0.1.1 from 10.0.1.1 (0.0.2.1)
      Origin INCOMPLETE, metric -, localpref 100, weight 0, valid, external, best
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:03:00:00
      VNI: 20000
BGP routing table entry for ip-prefix 20.0.100.1/32, Route Distinguisher: 11.0.2.1:0
 Paths: 1 available
  302
    11.0.2.1 from 10.0.2.1 (0.0.3.1)
      Origin INCOMPLETE, metric -, localpref 100, weight 0, valid, external, best
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:04:00:00
      VNI: 20000

Show Commands

By default, because the underlay IGP cost for VTEP reachability is not used in best-path selection for imported EVPN Type-5 routes, either of the two paths can be selected as the best path. If ECMP is configured in the importing VRF, the two paths form ECMP. The following example shows ECMP being formed in the importing VRF vrf1:
leaf1# show ip route vrf vrf1

VRF: vrf1
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

 B E      20.0.100.1/32 [200/0] via VTEP 11.0.1.1 VNI 20000 router-mac 00:00:78:02:00:00
                                via VTEP 11.0.2.1 VNI 20000 router-mac 00:00:78:03:00:00

leaf1# show ip bgp 20.0.100.1/32 vrf vrf1
BGP routing table information for VRF vrf1
Router identifier 11.0.0.1, local AS number 300
BGP routing table entry for 20.0.100.1/32
 Paths: 2 available
  302
    11.0.2.1 from 10.0.2.1 (0.0.3.1), imported EVPN route, RD 11.0.2.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 350, weight 0, tag 0
      Received 01:11:00 ago, valid, external, ECMP head, ECMP, best, ECMP contributor
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:04:00:00
      Remote VNI: 20000
      Rx SAFI: Unicast
  301
    11.0.1.1 from 10.0.1.1 (0.0.2.1), imported EVPN route, RD 11.0.1.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 340, weight 0, tag 0
      Received 01:11:00 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:03:00:00
      Remote VNI: 20000
      Rx SAFI: Unicast
Configuring encapsulation vxlan layer-3 set next-hop igp-cost under address-family evpn causes the underlay IGP cost to be taken into account, so only VTEP 11.0.1.1, which has the lower IGP cost (340 versus 350), is selected, as shown below:
leaf1# show ip route vrf vrf1

VRF: vrf1
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

 B E      20.0.100.1/32 [200/0] via VTEP 11.0.1.1 VNI 20000 router-mac 00:00:78:02:00:00

leaf1(config)# show ip bgp 20.0.100.1/32 vrf vrf1
BGP routing table information for VRF vrf1
Router identifier 11.0.0.1, local AS number 300
BGP routing table entry for 20.0.100.1/32
 Paths: 2 available
  301
    11.0.1.1 from 10.0.1.1 (0.0.2.1), imported EVPN route, RD 11.0.1.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 340, weight 0, tag 0
      Received 00:23:35 ago, valid, external, best
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:03:00:00
      Remote VNI: 20000
      Rx SAFI: Unicast
  302
    11.0.2.1 from 10.0.2.1 (0.0.3.1), imported EVPN route, RD 11.0.2.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 350, weight 0, tag 0
      Received 00:23:35 ago, valid, external
      Not best: IGP cost
      Extended Community: Route-Target-AS:64500:20000 TunnelEncap:tunnelTypeVxlan 
      EvpnRouterMac:00:00:78:04:00:00
      Remote VNI: 20000
      Rx SAFI: Unicast
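
When checking the result on many prefixes, it can help to reduce the show ip bgp text to the few fields that matter here: the next-hop VTEP, the IGP metric, the best flag, and the "Not best" reason. The following is a minimal sketch that parses an excerpt of the output shown above. The function name summarize_paths and the regular expressions are assumptions written against this excerpt only; the exact text layout may differ between releases, so treat it as a starting point rather than a supported parser.

```python
import re

# Excerpt of the 'show ip bgp 20.0.100.1/32 vrf vrf1' output from this section.
SAMPLE = """\
  301
    11.0.1.1 from 10.0.1.1 (0.0.2.1), imported EVPN route, RD 11.0.1.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 340, weight 0, tag 0
      Received 00:23:35 ago, valid, external, best
  302
    11.0.2.1 from 10.0.2.1 (0.0.3.1), imported EVPN route, RD 11.0.2.1:0
      Origin INCOMPLETE, metric 0, localpref 100, IGP metric 350, weight 0, tag 0
      Received 00:23:35 ago, valid, external
      Not best: IGP cost
"""

NEXT_HOP = re.compile(r"^\s+(\d+\.\d+\.\d+\.\d+) from ")
IGP_METRIC = re.compile(r"IGP metric (\d+)")
NOT_BEST = re.compile(r"^\s+Not best: (.+)$")


def summarize_paths(text):
    """Return one dict per path: VTEP, IGP metric, best flag, not-best reason."""
    paths = []
    for line in text.splitlines():
        match = NEXT_HOP.match(line)
        if match:
            # A "<next hop> from <peer>" line starts a new path entry.
            paths.append({"vtep": match.group(1), "igp_metric": None,
                          "best": False, "not_best": None})
            continue
        if not paths:
            continue  # skip lines that appear before the first path
        current = paths[-1]
        match = IGP_METRIC.search(line)
        if match:
            current["igp_metric"] = int(match.group(1))
        match = NOT_BEST.match(line)
        if match:
            current["not_best"] = match.group(1).strip()
        if line.strip().startswith("Received"):
            # The comma-separated flags on this line include "best" for the best path.
            flags = [flag.strip() for flag in line.split(",")]
            current["best"] = "best" in flags
    return paths


summary = summarize_paths(SAMPLE)
for path in summary:
    print(path)
```

For the excerpt above, the path through 11.0.1.1 is reported as best with IGP metric 340, and the path through 11.0.2.1 is reported with IGP metric 350 and the reason "IGP cost".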