FlexRoute Engine IP Forwarding - Network Efficiency

 

 
Arista Networks White Paper
Arista FlexRoute™ Engine

Introduction

Arista Networks’ award-winning Arista 7500 Series was introduced in April 2010 as a revolutionary switching platform that maximized datacenter performance, efficiency and overall network reliability. It raised the bar for switching performance: five times faster, with one-tenth the power draw and one-half the footprint of other modular datacenter switches.

In 2013, the Arista 7500E Series delivered a three-fold increase in density and performance, with no sacrifices on features and functionality and with complete investment protection. Just three years later, the Arista 7500R Universal Spine platform delivers more than a 3.8X increase in performance and density, with significant increases in features and functionality including support for full internet table routing capacity.

FlexRoute™ is the technology that enables IP forwarding capacity in excess of 1M prefixes in hardware on the Arista 7500R Universal Spine and Arista 7280R Universal Leaf platforms. This white paper details the FlexRoute Engine.

Arista 7500R Universal Spine and Arista 7280R Universal Leaf platforms

Arista FlexRoute Engine

The Arista FlexRoute Engine provides support for the full internet routing table, in hardware, with IP forwarding at Layer 3 and with sufficient headroom for future growth in both IPv4 and IPv6 route scale to more than 1 million routes. The innovative FlexRoute Engine with its patented algorithmic approach to building layer 3 forwarding tables on Arista 7500R and 7280R Universal Spine and Leaf platforms is unique to Arista and a key enabler in calling these platforms routers.

On the hardware side, FlexRoute performs a longest-prefix-match (LPM) Layer 3 lookup for IPv4 and IPv6 as part of ingress packet processing on the distributed packet processor(s) of every linecard or system (Figure 1).

Figure 1: Arista FlexRoute Engine Within the Packet Processor on Linecards

Internally FlexRoute uses an algorithmic approach to performing lookups. When compared to legacy LPM approaches, FlexRoute uses less active silicon (lower activity factor) combined with a more efficient use of the transistors (denser storage) to hold the LPM forwarding tables. The result is dramatically lower power, a higher number of ports and greater throughput when compared to alternate approaches on the same process node.

Figure 2: Arista FlexRoute Engine for longest-prefix-match lookups compared to alternatives

The algorithms used to perform the LPM lookup are optimized based on the historic growth of the internet routing table and known trends in how the routing table is expected to evolve. For example, FlexRoute is optimized for the continued and expected acceleration of de-aggregation of the IPv4 prefix space (e.g. /23 prefixes de-aggregating into 2 x /24s). It is also optimized around an aggressive expansion of IPv6 announcements (most prefix announcements are /32 and /48). Legacy ways of increasing LPM table capacity involve either increasing the size of tables and memories (more transistors, more power/heat, lower port density) or increasing the depth of lookups in a tree structure (lower performance). In comparison, the algorithmic approach used in FlexRoute becomes more efficient with these trends and the evolution of the internet routing table.
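To make the lookup semantics concrete, the following is a minimal sketch of what a longest-prefix match returns, including the /23 versus /24 de-aggregation case mentioned above. It is a plain linear scan written for readability and is not the FlexRoute algorithm, which this paper does not disclose. The prefixes, the next-hop labels and the function name `longest_prefix_match` are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return (prefix, next_hop) for the most specific matching route, or None.

    Illustrative linear scan only; hardware uses far more efficient structures.
    """
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table.items():
        # Only compare routes of the same address family as the destination.
        if prefix.version == addr.version and addr in prefix:
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, next_hop)
    return best


# Hypothetical routes: a /23 aggregate plus one de-aggregated /24 half,
# and an IPv6 /32 with a more specific /48 inside it.
aggregate = ipaddress.ip_network("10.10.0.0/23")
halves = list(aggregate.subnets(prefixlen_diff=1))  # the two /24s in the /23

table = {
    aggregate: "next-hop-A",
    halves[1]: "next-hop-B",
    ipaddress.ip_network("2001:db8::/32"): "next-hop-C",
    ipaddress.ip_network("2001:db8:1::/48"): "next-hop-D",
}

for destination in ("10.10.0.5", "10.10.1.5", "2001:db8:1::1", "2001:db8:2::1"):
    print(destination, "->", longest_prefix_match(table, destination))
```

Traffic to 10.10.1.5 follows the more specific /24 while 10.10.0.5 still follows the /23, which is why each de-aggregated announcement adds a forwarding entry rather than replacing one.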

Paths, Prefixes and Internet Growth

At Arista, we’re confident the algorithmic techniques used to build the LPM in FlexRoute will provide many years of headroom for continued growth of the internet routing table. Without stating how far beyond 1M prefixes it can scale or how the efficiency evolves over time, let's look back at how the internet routing table has evolved to its current size (May 2016: ~649K prefixes [610K IPv4, ~29K IPv6]) and how it is expected to evolve in future.

Past, Present and Future Internet Growth

Geoff Huston, Chief Scientist at APNIC, the Asia Pacific Regional Internet Registry, has been providing research, analysis and commentary on the global internet routing table for close to a decade. In January 2016, Geoff, as part of APNIC Labs, published an analysis of the Internet routing table in 2015 [1], building upon previous years’ analysis and commentary on the topic.

The exact number of IPv4 and IPv6 prefixes that make up the internet varies depending on location and localized summarization. However, the approximate number of prefixes is quite clear, and so are the trends. Using the passive measurement point of the global routing table from AS131072, with data from the perspective of Australia and Japan in the APNIC region, the data collected shows IPv4 and IPv6 prefix space expansion as follows:

Table 1: Historic growth of IPv4 and IPv6 announcements (source: Geoff Huston / APNIC Labs Table 1 from [1])

| Metric | Jan-2013 | Jan-2014 | Jan-2015 | Jan-2016 |
|---|---|---|---|---|
| IPv4 prefixes | 441,000 | 488,000 (+10%) | 530,000 (+9%) | 587,000 (+11%) |
| IPv6 prefixes | 11,900 | 16,700 (+40%) | 21,000 (+26%) | 27,200 (+30%) |
| Total (IPv4+IPv6) | 452,000 | 504,700 (+11%) | 551,000 (+9%) | 614,200 (+11%) |

The same report also predicts future growth. Its predictions take into account Regional Internet Registry prefix allocations, actual prefix route announcements (e.g. more specific prefixes advertised) and how these have increased over time, and project the likely future announcements, updates and de-aggregation from those historic trends. IPv6 is a little harder to predict, so the report provides predictions based on both linear growth (L) and exponential growth (E), with the reality most likely somewhere between the two:

Table 2: Predicted growth of IPv4 and IPv6 announcements (source: Geoff Huston / APNIC Labs Table 2 from [1])

| Metric | Jan-2016 (actual) | Jan-2017 (prediction) | Jan-2018 (prediction) | Jan-2019 (prediction) | Jan-2020 (prediction) | Jan-2021 (prediction) |
|---|---|---|---|---|---|---|
| IPv4 prefixes | 586,879 | 629,000 (+7%) | 675,000 (+7%) | 722,000 (+7%) | 769,000 (+7%) | 816,000 (+6%) |
| IPv6 prefixes (L) | 27,241 | 30,421 (+12%) | 35,113 (+15%) | 39,806 (+13%) | 44,498 (+12%) | 49,203 (+11%) |
| IPv6 prefixes (E) | 27,241 | 37,968 (+39%) | 51,303 (+35%) | 69,322 (+35%) | 93,669 (+35%) | 126,671 (+35%) |
| Total (linear IPv6) | 614,120 | 659,421 (+7%) | 710,113 (+8%) | 761,806 (+7%) | 813,498 (+7%) | 865,203 (+6%) |
| Total (exponential IPv6) | 614,120 | 666,968 (+9%) | 726,303 (+9%) | 791,322 (+9%) | 862,669 (+9%) | 942,671 (+9%) |

While the figures from [1] summarized in Table 2 are only predictions, the underlying data clearly shows there is more than 5 years of headroom before the combined total of IPv4 and IPv6 prefix announcements exceeds 1 million entries, even with an aggressive expansion rate.
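The headroom claim can be checked with simple arithmetic on the Table 2 figures. The sketch below copies the per-year values from Table 2, adds IPv4 and IPv6 together, and reports how far each total sits below 1 million entries. The variable and function names are illustrative, and the 1,000,000 limit is used here only as the round number quoted in the text, not as a statement of the actual hardware capacity.

```python
# Jan-2016 (actual) through Jan-2021 (prediction), copied from Table 2.
years = [2016, 2017, 2018, 2019, 2020, 2021]
ipv4 = [586_879, 629_000, 675_000, 722_000, 769_000, 816_000]
ipv6_linear = [27_241, 30_421, 35_113, 39_806, 44_498, 49_203]
ipv6_exponential = [27_241, 37_968, 51_303, 69_322, 93_669, 126_671]

LIMIT = 1_000_000  # the "1 million entries" threshold discussed in the text


def totals(v4, v6):
    """Combined IPv4 + IPv6 prefix count for each year."""
    return [a + b for a, b in zip(v4, v6)]


def headroom(total):
    """Entries remaining below the 1 million threshold."""
    return LIMIT - total


total_linear = totals(ipv4, ipv6_linear)
total_exponential = totals(ipv4, ipv6_exponential)

for year, lin, exp in zip(years, total_linear, total_exponential):
    print(f"Jan-{year}: linear {lin:,} ({headroom(lin):,} spare), "
          f"exponential {exp:,} ({headroom(exp):,} spare)")
```

Even in the exponential IPv6 case, the Jan-2021 total of 942,671 remains below 1 million, which matches the "more than 5 years" statement measured from Jan-2016.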

BGP Paths, Routes And Forwarding Entries

There are often misconceptions about how prefixes and paths in BGP relate to entries stored in forwarding tables. For example, if you receive transit capacity from three upstream providers (BGP neighbors), each sending 600K prefixes in BGP, there are 1.8 million paths (600K x 3 neighbors), but this is still 600K unique prefixes, not 1.8 million prefixes. Whether a prefix is preferred via one neighbor or another is resolved at the BGP level. If there are multiple equal-cost paths for a prefix, the route is installed using equal-cost multi-pathing (ECMP). Either way, the result is still only 600K prefixes: some point at one next-hop or another, and others point at a group of next-hop entries in the ECMP case.

The relationship between prefixes received in BGP and how they are stored in the routing table (RIB) and forwarding table (FIB) is shown in Figure 3.

Figure 3: Prefixes received in BGP and their resolution from BGP to RIB to FIB

Regardless of the number of full tables received from transit providers, the number of peers, or even someone inadvertently announcing prefixes they aren’t meant to, the number of prefixes does not increase as a result of having multiple transit or peering providers.
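The distinction between paths and prefixes can be sketched in a few lines. The example below assumes a deliberately simplified selection rule (shortest AS-path length only, with ties installed as an ECMP group); real BGP best-path selection has many more steps, and ECMP normally requires multipath to be configured. The neighbor names, prefixes and the function `build_tables` are hypothetical.

```python
from collections import defaultdict


def build_tables(paths):
    """Collapse BGP paths into per-prefix state.

    paths: iterable of (neighbor, prefix, as_path_length) tuples.
    Returns (rib, fib): rib maps prefix -> all candidate paths,
    fib maps prefix -> sorted list of selected next hops.
    Simplification: the only tie-breaker is AS-path length.
    """
    rib = defaultdict(list)
    for neighbor, prefix, as_path_length in paths:
        rib[prefix].append((as_path_length, neighbor))

    fib = {}
    for prefix, candidates in rib.items():
        shortest = min(length for length, _ in candidates)
        # One next hop for a single best path, several for ECMP.
        fib[prefix] = sorted(n for length, n in candidates if length == shortest)
    return rib, fib


prefixes = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "2001:db8::/32"]
as_path_lengths = {
    "transit-1": [2, 3, 3, 4],
    "transit-2": [3, 3, 2, 4],
    "transit-3": [4, 3, 5, 4],
}

# Every neighbor announces every prefix: 3 neighbors x 4 prefixes = 12 paths.
paths = [
    (neighbor, prefix, length)
    for neighbor, lengths in as_path_lengths.items()
    for prefix, length in zip(prefixes, lengths)
]

rib, fib = build_tables(paths)
print(f"{len(paths)} paths -> {len(fib)} forwarding entries")
for prefix, next_hops in fib.items():
    print(prefix, "->", next_hops)
```

Adding a fourth neighbor would add four more paths but leave the number of forwarding entries at four; only the next-hop sets could change.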

Real World FlexRoute Resource Utilization

Arista’s innovative FlexRoute Engine is designed and built around the internet routing table and prefix distribution with capacity of over 1 million prefixes for IPv4 and IPv6 combined. FlexRoute is enabled via a FlexRoute license and the following CLI commands:
arista(config)# ip hardware fib optimize prefixes profile internet
arista(config)# ipv6 hardware fib optimize prefixes profile internet
Real world examples of the hardware capacity and resources utilized in multiple deployments are shown below.

Real World Example 1: Internet2 Edge Router (IPv4 Only)

In this IPv4 deployment (an Internet2 edge router), there are ~595K paths received from two BGP neighbors, resulting in ~579K unique prefixes in the routing table (RIB). The highest-utilized hardware resource in this case is at 62% usage. The “show hardware capacity” EOS command shows the resource utilization:
Figure 4: A Router Connected to Internet2

Figure 5: A large hosting provider with both IPv4 & IPv6

Real World Example 2: Cloud Titan Full IPv4/IPv6 Internet Edge Router

In this deployment, a cloud titan is using the device as an edge router, with both IPv4 and IPv6 via multiple transit providers. In this case there are four full feeds for both IPv4 and IPv6, with ~2.3M IPv4 and ~140K IPv6 paths that result in ~575K IPv4 and ~35K IPv6 prefixes in the routing table (RIB). The highest-utilized hardware resource in this case is at 88% usage:

Figure 6: A Cloud Titan Provider with Four Full Feeds (4 transit providers) for both IPv4 & IPv6

Hardware Resource Summary

Due to the algorithmic approach, exactly which resources are used varies across deployments. In the examples provided there is more than sufficient capacity to forward using the full internet routing table, with forwarding resource headroom for many years of future growth:
Figure 7: Summary of hardware resource utilization across the examples

Arista’s work on the algorithms and techniques around FlexRoute will continue, with additional capacity enhancements planned.

Arista EOS, SysDB and NetDB

At the core of the Arista 7500R and 7280R Universal Spine and Leaf platforms is Arista EOS® (Extensible Operating System). EOS is built on the strong foundations of a multi-process state-sharing architecture with modularity, programmability, fault containment and resiliency as the core software building blocks.

System state is stored in a highly efficient, centralized System Database (SysDB) and accessed using an automated publish/subscribe/notify model. Internally, NetDB is used to scale the routing stack to millions of routes and hundreds of neighbors, with faster convergence than traditional routers and legacy approaches to control-plane state can achieve.
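As a rough illustration of the publish/subscribe/notify pattern in general, the sketch below shows a central state store in which one agent publishes state and another is notified of changes. It is a generic toy model and not the SysDB or NetDB API; the class `StateStore`, its methods and the path strings are all hypothetical.

```python
from collections import defaultdict


class StateStore:
    """Toy central state store with change notification (not the SysDB API)."""

    def __init__(self):
        self._state = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, path, callback):
        """Register callback(path, value) to run when the value at path changes."""
        self._subscribers[path].append(callback)

    def publish(self, path, value):
        """Store value at path and notify subscribers only if it changed."""
        if path in self._state and self._state[path] == value:
            return
        self._state[path] = value
        for callback in self._subscribers.get(path, []):
            callback(path, value)

    def get(self, path):
        return self._state.get(path)


store = StateStore()
programmed = []  # stands in for a forwarding agent programming hardware

store.subscribe("routes/10.0.0.0/8",
                lambda path, value: programmed.append((path, value)))

# A routing agent publishes state; the subscriber is notified once per change.
store.publish("routes/10.0.0.0/8", "next-hop-A")
store.publish("routes/10.0.0.0/8", "next-hop-A")  # unchanged, no notification
store.publish("routes/10.0.0.0/8", "next-hop-B")
print(programmed)
```

The point of the pattern is that the publishing and subscribing agents never call each other directly; each interacts only with the shared state.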

While many network vendors claim they have a fast, scalable and robust control-plane, the fine print is that it can take seconds to react to failures and minutes for routes to be programmed in hardware. Arista EOS scales with industry-leading convergence and route programming, with sub-second (typically milliseconds) reaction times to disruptions. In contrast to legacy approaches, a key consideration of FlexRoute has been the ability to support fast prefix programming in the dataplane and make-before-break programming of the forwarding tables in hardware that doesn’t disrupt adjacent entries.
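The general idea behind make-before-break programming can be sketched as follows: the new forwarding state is installed first, the route is repointed to it in a single step, and only then is the old state released. This is a generic illustration under assumed data structures (a route table pointing at next-hop groups), not a description of how FlexRoute programs hardware. The class `ForwardingTable` and its methods are hypothetical.

```python
class ForwardingTable:
    """Toy forwarding table: prefixes point at next-hop groups."""

    def __init__(self):
        self.groups = {}   # group id -> tuple of next hops
        self.routes = {}   # prefix -> group id
        self._next_id = 0

    def _add_group(self, next_hops):
        group_id = self._next_id
        self._next_id += 1
        self.groups[group_id] = tuple(next_hops)
        return group_id

    def add_route(self, prefix, next_hops):
        """Install a new prefix (assumes the prefix is not already present)."""
        self.routes[prefix] = self._add_group(next_hops)

    def update_route(self, prefix, next_hops):
        """Change next hops without a window in which the prefix is unresolved."""
        old_id = self.routes[prefix]
        new_id = self._add_group(next_hops)    # make: install the new group
        self.routes[prefix] = new_id           # switch: repoint in one step
        if old_id not in self.routes.values():
            del self.groups[old_id]            # break: release the old group

    def resolve(self, prefix):
        return self.groups[self.routes[prefix]]


fib = ForwardingTable()
fib.add_route("10.0.0.0/8", ["next-hop-A"])
fib.add_route("172.16.0.0/12", ["next-hop-A", "next-hop-B"])

fib.update_route("10.0.0.0/8", ["next-hop-C"])
print(fib.resolve("10.0.0.0/8"))      # updated entry
print(fib.resolve("172.16.0.0/12"))   # adjacent entry is untouched
```

At every point during `update_route`, a lookup for the prefix resolves to either the old or the new next hops, and other entries are never modified.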

Summary

Arista’s FlexRoute Engine provides support for the full internet routing table in hardware, with IP forwarding at Layer 3 and with sufficient headroom for future growth in both IPv4 and IPv6 route scale to more than 1 million routes. The innovative FlexRoute Engine with its patented algorithmic approach to building Layer 3 forwarding tables on Arista 7500R and 7280R Universal Spine and Leaf platforms is unique to Arista and a key enabler in calling these platforms routers.

References: [1] Analysis of the Internet Routing table in 2015, Geoff Huston (APNIC): https://labs.apnic.net/?p=767
Copyright © 2017 Arista Networks, Inc. All rights reserved. CloudVision, and EOS are registered trademarks and Arista Networks is a trademark of Arista Networks, Inc. All other company names are trademarks of their respective holders. Information in this document is subject to change without notice. Certain features may not yet be available. Arista Networks, Inc. assumes no responsibility for any errors that may appear in this document. Ver: 02-0072-02