DPDK Mode

This chapter focuses on the new DPDK-based CloudEOS and highlights changes from the existing kernel-based CloudEOS where needed.

The Arista CloudEOS Router is a cloud-grade, feature-rich, multi-cloud and multi-hypervisor virtual router that empowers enterprises and cloud providers to build consistent, highly secure and scalable multi-cloud networks.

The CloudEOS Router can run in two modes: DPDK (high performance) and kernel, each with its own set of supported features. Starting with CloudEOS-Router-4.23.0FX, new installations of the CloudEOS Router from the hypervisor image or on public cloud run in high-performance DPDK mode by default. Going forward, all new features and development will be in DPDK mode.

Both flavors of CloudEOS perform all packet forwarding operations in software, but they use different software components to do so. DPDK-based CloudEOS, which uses DPDK to perform packet forwarding, is much more efficient. Note that while DPDK mode supports more features than kernel mode, certain features are available only in kernel mode, such as Zone Based Segmentation (ZSS) and sflow.

Note: Both 64-bit and 32-bit images are published; the 64-bit mode is recommended. Customers who upgrade from 32-bit to 64-bit mode need to be aware that 64-bit mode requires 30 percent more memory than 32-bit mode. In general, 64-bit mode scales better and has higher datapath performance. The public cloud images are available only in 64-bit mode.

Platform Compatibility

 

The following platforms support CloudEOS-DPDK mode.

  • VMware ESXi 6.0+
    • Supported NICs
      • VMware vmxnet3 (para-virtualized)
      • Intel x520/82599 in PCIe passthrough and SR-IOV modes
  • Linux / KVM
    • RHEL/CentOS 7.0+
    • Ubuntu 18.04+
    • Supported NICs
      • Virtio-net (para-virtualized)
      • Intel x520/82599 in PCIe passthrough and SR-IOV modes

Hardware Resource Requirements

 

The CloudEOS Router DPDK mode depends on features available in modern CPUs, so CloudEOS VMs must be deployed only on certain types of server platforms in order to meet performance benchmarks. The ideal server configuration is similar to the following.

  • Intel E5-26XX v4 or later class of CPU
    • > 2.2 GHz core frequency for best performance
    • Minimum 2 CPU cores reserved for CloudEOS
      • At least 2 additional cores are used by the hypervisor software
    • Hyper-threading disabled for best performance
    • Non-NUMA system, or all CloudEOS resources allocated from a single NUMA node
    • Power saving and frequency scaling features disabled in BIOS/Firmware/Hypervisor
  • High-speed hard drive (~7200 RPM) or SSD with a minimum of 8 GB of free disk space.
  • At least 4 GB of RAM reserved for CloudEOS.
    • 4 GB of RAM supports only 3 VRFs.
    • 8 GB is required to support up to 8 VRFs.
    • Server should have more memory for use by hypervisor software.
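
The figures in the list above can be turned into a quick pre-deployment check. The following is a minimal sketch in Python, not an Arista tool: the names VmSizing and sizing_problems are hypothetical, and the only thresholds used are the ones stated above (2 cores, 8 GB of disk, 4 GB of RAM for 3 VRFs, 8 GB of RAM for up to 8 VRFs). Anything beyond 8 VRFs is reported as not covered by the documented figures rather than guessed.

```python
from dataclasses import dataclass
from typing import List

# Thresholds copied from the requirements list; names are illustrative only.
MIN_CORES = 2
MIN_DISK_GB = 8
MIN_RAM_GB = 4
VRFS_AT_4GB = 3
VRFS_AT_8GB = 8


@dataclass
class VmSizing:
    cores: int      # vCPU cores reserved for the CloudEOS VM
    ram_gb: float   # RAM reserved for the CloudEOS VM
    disk_gb: float  # free disk space available to the VM
    vrfs: int       # number of VRFs planned


def sizing_problems(vm: VmSizing) -> List[str]:
    """Return a list of problems; an empty list means the plan fits."""
    problems = []
    if vm.cores < MIN_CORES:
        problems.append(f"need at least {MIN_CORES} cores, got {vm.cores}")
    if vm.disk_gb < MIN_DISK_GB:
        problems.append(f"need at least {MIN_DISK_GB} GB disk, got {vm.disk_gb}")
    if vm.ram_gb < MIN_RAM_GB:
        problems.append(f"need at least {MIN_RAM_GB} GB RAM, got {vm.ram_gb}")
    elif vm.vrfs > VRFS_AT_4GB and vm.ram_gb < 8:
        problems.append(f"{vm.vrfs} VRFs need 8 GB RAM, got {vm.ram_gb}")
    if vm.vrfs > VRFS_AT_8GB:
        problems.append(f"{vm.vrfs} VRFs exceeds the documented figure of {VRFS_AT_8GB}")
    return problems


if __name__ == "__main__":
    plan = VmSizing(cores=2, ram_gb=4, disk_gb=8, vrfs=5)
    for problem in sizing_problems(plan):
        print(problem)
```

The check covers only the CloudEOS reservation; the extra cores and memory needed by the hypervisor itself still have to be budgeted separately.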

In addition to meeting the server requirements, configure the hypervisor to reserve memory and CPU for the CloudEOS VMs so that optimum performance is achieved, and map each vCPU used by a CloudEOS VM to a unique physical CPU core. These settings depend on the type of hypervisor used and are usually documented in the configuration guides provided by the hypervisor vendor.

Switching to DPDK Mode

 

Starting with CloudEOS-Router-4.23.0FX, all new installations of the CloudEOS Router from the hypervisor image or on public cloud run in high-performance DPDK mode by default.

If you have an existing instance that runs in the kernel mode that you want to switch to DPDK mode, please consider the following:

  • Make sure you have not enabled features that DPDK mode does not support.
  • DPDK mode requires more memory than kernel mode. The memory requirements are listed above under “Hardware Resource Requirements”.
  • When upgrading an existing instance, make sure the instance has sufficient memory to run in DPDK mode.
  • DPDK mode runs the datapath cores in poll mode, so those cores always run at 100% CPU.

Before performing the following steps, ensure that the system is configured to allow bash access, at least temporarily, and then run the commands below from the CloudEOS CLI prompt.

switch#conf t
switch(config)#bash sudo su -
Arista Networks EOS shell
-bash-4.3# cat /mnt/flash/veos-config
# Use 'MODE' to set the forwarding plane for vEOS. If 'MODE' is set multiple times
# the last configuration takes effect.
# 'MODE=linux' runs vEOS with linux forwarding plane
MODE=linux
# 'MODE=sfe' runs vEOS with DPDK forwarding plane
#MODE=sfe
Use a text editor to modify this file: comment out MODE=linux and uncomment MODE=sfe. Save the file and verify the changes. The file should then look as follows.

-bash-4.3# cat /mnt/flash/veos-config
# Use 'MODE' to set the forwarding plane for vEOS. If 'MODE' is set multiple times
# the last configuration takes effect.
# 'MODE=linux' runs vEOS with linux forwarding plane
#MODE=linux
# 'MODE=sfe' runs vEOS with DPDK forwarding plane
MODE=sfe
-bash-4.3# exit
logout
switch(config)# reload
After the reload, the vEOS Router boots up in DPDK mode.
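
The manual edit above can also be expressed as a small text transformation. The following Python sketch is illustrative only and is not part of EOS: the function names effective_mode and switch_mode are hypothetical, and the code works on the file contents as a string instead of touching /mnt/flash/veos-config directly. It assumes the file format shown above, including the rule that the last uncommented MODE line takes effect.

```python
import re

# Matches an active or commented-out MODE line, e.g. "MODE=linux" or "#MODE=sfe".
# Descriptive comments such as "# 'MODE=sfe' runs vEOS ..." do not match.
_MODE_LINE = re.compile(r"^\s*(#?)\s*MODE=(\w+)\s*$")


def effective_mode(config_text):
    """Return the last uncommented MODE value, or None if none is set."""
    mode = None
    for line in config_text.splitlines():
        m = _MODE_LINE.match(line)
        if m and not m.group(1):
            mode = m.group(2)
    return mode


def switch_mode(config_text, target="sfe"):
    """Uncomment MODE=<target> and comment out every other MODE line."""
    out = []
    found = False
    for line in config_text.splitlines():
        m = _MODE_LINE.match(line)
        if not m:
            out.append(line)                    # keep comments and other text
        elif m.group(2) == target:
            out.append(f"MODE={target}")        # activate the wanted mode
            found = True
        else:
            out.append(f"#MODE={m.group(2)}")   # deactivate any other mode
    if not found:
        out.append(f"MODE={target}")
    return "\n".join(out) + "\n"


SAMPLE = """# 'MODE=linux' runs vEOS with linux forwarding plane
MODE=linux
# 'MODE=sfe' runs vEOS with DPDK forwarding plane
#MODE=sfe
"""

if __name__ == "__main__":
    updated = switch_mode(SAMPLE)
    print(effective_mode(SAMPLE), "->", effective_mode(updated))
```

Checking effective_mode on the result before reloading is a cheap way to confirm that the edit produced the intended configuration. A reload is still required for the change to take effect.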

Upgrading CloudEOS-Kernel to CloudEOS-DPDK

To upgrade an existing CloudEOS installation from kernel mode to DPDK mode, copy EOS.swi to /mnt/flash and then follow the procedure described above in Switching to DPDK Mode. Note that this process requires a system reload.

CloudEOS-DPDK Mode Verification

 

To check whether CloudEOS is running in DPDK mode, verify that the “sfe” agent is running by using the following command.

switch#show agent sfe ping
Agent Name   Last Ping   Max Ping      Max Ping Response Seen   Last Ping Response Seen
----------   ---------   -----------   ----------------------   -----------------------
Sfe          1.571 ms    2209.819 ms   2019-11-15 11:14:05      2019-12-12 15:02:48

A system in DPDK mode uses 100% of CPU cycles for each datapath vCPU. This is normal and expected. To ensure that packet forwarding tasks, which are CPU intensive, do not starve control plane and management operations, EOS dedicates CPU cores for control/management functions.

To get detailed per-CPU utilization, run the Linux “top” command and press “1” while it is running. The output below shows “top” results for a CloudEOS instance with 2 cores. Depending on the version, either “Sfe” or “bessd” is shown using 100% of the datapath core.


vEOS-CLI(config)#bash top -n 1
Tasks: 236 total,   1 running, 235 sleeping,   0 stopped,   0 zombie
%Cpu0  :   1.6 us,  0.7 sy,  0.0 ni, 95.1 id,  0.0 wa,  2.6 hi,  0.0 si,  0.0 st
%Cpu1  : 100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8122156 total,  4642632 used,  3479524 free,   255624 buffers
KiB Swap:        0 total,        0 used,        0 free,  1857744 cached

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 3355 root  20   0 2186m 239m 201m S 100.1  3.0  39262:38 Sfe [or bessd]
 2544 root  20   0  375m  58m  38m S   0.3  0.7 192:36.08 ProcMgr-worker
 2705 root  20   0  403m 180m 142m S   0.3  2.3 135:53.49 Sysdb
 3102 root  20   0  379m 111m  95m S   0.3  1.4   8:06.90 Ira
 3119 root  20   0  373m  86m  70m S   0.3  1.1  24:34.26 StpTxRx
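
When this check is scripted, the per-CPU summary lines are the part worth parsing. The following Python sketch is an assumption-laden illustration, not an EOS feature: the function names cpu_usage and polling_cores and the 99% threshold are hypothetical, and the regular expression only targets the “%CpuN: … us, … id” layout shown in the output above.

```python
import re

# Captures the CPU index, user time, and idle time from a per-CPU summary line.
# Tolerates both compact ("%Cpu0:1.6 us,") and padded ("%Cpu0  :   1.6 us,") forms.
_CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:\s*([\d.]+)\s*us\b.*?([\d.]+)\s*id\b")


def cpu_usage(top_output):
    """Map CPU index -> (user %, idle %) for each per-CPU summary line."""
    usage = {}
    for line in top_output.splitlines():
        m = _CPU_LINE.match(line.strip())
        if m:
            usage[int(m.group(1))] = (float(m.group(2)), float(m.group(3)))
    return usage


def polling_cores(top_output, threshold=99.0):
    """Return CPUs whose user time is at or above threshold (likely datapath cores)."""
    usage = cpu_usage(top_output)
    return sorted(cpu for cpu, (user, _idle) in usage.items() if user >= threshold)


SAMPLE = """%Cpu0:1.6 us,0.7 sy,0.0 ni, 95.1 id,0.0 wa,2.6 hi,0.0 si,0.0 st
%Cpu1:100.0 us,0.0 sy,0.0 ni,0.0 id,0.0 wa,0.0 hi,0.0 si,0.0 st
"""

if __name__ == "__main__":
    print(polling_cores(SAMPLE))
```

A core reported here is busy polling, which says nothing about how much traffic it is actually forwarding.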

CloudEOS vCPU Core Allocation

The CloudEOS-DPDK VM is supported only on a few standard CPU configurations:

  • 2-core version
  • 4-core version
  • 8-core version

Depending on the CPU core count, a certain number of CPU cores are reserved for control and management operations, and the remaining are reserved for packet forwarding and data-plane operations.

  • On 2-core and 4-core VMs, one core is reserved for the control and management plane, and the remaining core or cores are reserved for packet forwarding (data-plane operations).
  • On 8-core VMs, two cores are reserved for control/management functions and the remaining are reserved for data-plane functions.
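
The allocation rule above is simple enough to express as a lookup. The following Python sketch only restates the rule for the three supported configurations; the function name core_split is hypothetical, and unsupported core counts are rejected rather than extrapolated.

```python
def core_split(total_cores):
    """Return (control_cores, datapath_cores) for a supported VM size."""
    if total_cores in (2, 4):
        control = 1   # one core for control and management
    elif total_cores == 8:
        control = 2   # two cores for control and management
    else:
        raise ValueError(f"unsupported core count: {total_cores}")
    return control, total_cores - control


if __name__ == "__main__":
    for cores in (2, 4, 8):
        control, datapath = core_split(cores)
        print(f"{cores}-core VM: {control} control, {datapath} datapath")
```

Because every datapath core polls continuously, the second value is also the number of cores that a hypervisor would be expected to show at 100% utilization.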

Monitoring Datapath CPU utilization in DPDK Mode

Because DPDK runs in poll mode, the datapath CPUs always appear 100% utilized. To see what percentage of CPU time is actually spent on packet processing, use the platform sfe command shown below.

switch#platform sfe bessctl
Type "help" for more information.
localhost:10514 $ show busy
 Worker ID  Busy  PPS
 0          0     0
 1          0     0
 2          0     0
localhost:10514 $ show busy
 Worker ID  Busy  PPS
 0          0     0
 1          0     0
 2          0     0
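
If this output is collected for capacity planning, each data row can be read as three numeric fields. The following Python sketch is a hedged illustration: the function name parse_show_busy is hypothetical, and it assumes every data row consists of exactly a worker ID, a busy figure, and a packets-per-second count, as in the sample above. The units of the Busy column are not stated here, so the value is kept as a plain number.

```python
def parse_show_busy(output):
    """Return one dict per worker row; headers and prompt lines are skipped."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                  # prompt lines and padded headers
        try:
            worker = int(fields[0])
            busy = float(fields[1])
            pps = int(fields[2])
        except ValueError:
            continue                  # non-numeric line such as a header
        rows.append({"worker": worker, "busy": busy, "pps": pps})
    return rows


SAMPLE = """localhost:10514 $ show busy
 Worker IDBusy PPS
 0 0 0
 1 0 0
 2 0 0
"""

if __name__ == "__main__":
    for row in parse_show_busy(SAMPLE):
        print(row)
```

Sampling the command periodically and keeping the parsed rows gives a history of useful work per worker, which the always-100% hypervisor view cannot provide.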

In addition, a syslog message is logged when the datapath has been more than 80% busy with useful packet processing over a 60-second period.


2019-10-28 23:44:11.776909 8908 Log 0 %SFE-4-CORE_SUSTAINED_BUSY: Software Data Plane Forwarding service is experiencing heavy load as it has been 80 percent busy over last 60 seconds
2019-10-29 15:54:53.394284 8908 Log 0 %SFE-4-CORE_SUSTAINED_BUSY: Software Data Plane Forwarding service is experiencing heavy load as it has been 80 percent busy over last 60 seconds
Note: These features are intended for capacity planning and are meant to be used after high-CPU alarms have been turned off in the hypervisor. Because Sfe/DPDK always runs at 100%, they are the way to distinguish CPU usage caused by useful packet processing from idle spinning once those alarms are off.
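
For capacity planning it can help to extract these messages from collected logs. The following Python sketch is illustrative and makes assumptions: the function name sustained_busy_events is hypothetical, the message text is matched exactly as it appears in the sample above, and each message is paired with the nearest preceding timestamp of the form YYYY-MM-DD HH:MM:SS, whether or not it sits on the same line.

```python
import re

# Message body as shown in the sample log; captures the percentage and window.
_MSG = re.compile(
    r"%SFE-4-CORE_SUSTAINED_BUSY:[^\n]*?(\d+)\s*percent busy over last (\d+) seconds"
)
# Date and time to the second; fractional seconds are ignored.
_STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def sustained_busy_events(log_text):
    """Return (timestamp, percent, seconds) for each sustained-busy message."""
    events = []
    for msg in _MSG.finditer(log_text):
        # Nearest timestamp that appears before this message, if any.
        stamps = [t.group(0) for t in _STAMP.finditer(log_text, 0, msg.start())]
        stamp = stamps[-1] if stamps else None
        events.append((stamp, int(msg.group(1)), int(msg.group(2))))
    return events


SAMPLE = (
    "2019-10-28 23:44:11.776909 8908 Log 0 %SFE-4-CORE_SUSTAINED_BUSY: "
    "Software Data Plane Forwarding service is experiencing heavy load "
    "as it has been 80 percent busy over last 60 seconds\n"
)

if __name__ == "__main__":
    for event in sustained_busy_events(SAMPLE):
        print(event)
```

Counting events per day or per hour gives a rough view of how often the datapath approaches its limit, which is the question the hypervisor CPU graphs cannot answer in poll mode.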

General Troubleshooting

 

To find information about the packet forwarding agent, check the following log file.

vEOS-CLI(config)# bash cat /var/log/agents/bessd.INFO
If errors occur, the system generates an additional log file, which can be viewed with the following command.
vEOS-CLI(config)# bash cat /var/log/agents/bessd.FATAL

In addition to the log files mentioned above, syslog and the EOS show tech-support output are valuable sources of troubleshooting information.