Leaf Smart System Upgrade (Leaf SSU)

The Smart System Upgrade (SSU) process includes the core functionality of Accelerated Software Upgrade, plus additional optimizations that permit a hitless restart of several features. SSU leverages protocols capable of graceful restart to minimize traffic loss during upgrade. For protocols not capable of graceful restart, SSU generates control plane messages and buffers them in hardware to be slowly released when the control plane is offline. Additionally, under SSU, the forwarding ASIC does not get reset and ports do not flap.

Features capable of hitless restart under SSU include:

  • QinQ
  • 802.3ad Link Aggregation/LACP
  • 802.3x flow control
  • BGP (BGP graceful restart must be enabled: see Configuring BGP)
  • MP-BGP (BGP graceful restart must be enabled: see Configuring BGP)
  • 128-way Equal Cost Multipath Routing (ECMP)
  • VRF
  • route maps
  • L2 MTU
  • QoS
Note: SSU is not compatible with VRRP. If VRRP is configured on the switch, another upgrade method must be used.

Upgrading the EOS image with Smart System Upgrade

Using SSU to upgrade the active EOS image is a five-step process:

  1. Prepare the switch for upgrade (Prepare the Switch for SSU).
  2. Transfer the image file to the switch (Transfer the Image File for SSU). This step is not required if the desired file is already on the switch.
  3. Modify boot-config file to point to the desired image file (Modify boot-config).
  4. Start the SSU process (Start the SSU Process).
  5. Verify that the upgrade was successful (Verify Success of the Upgrade).

Prepare the Switch for SSU

Backing Up Critical Software

Before upgrading the EOS image, ensure that copies of the currently running EOS version and the running-config file are available in case of corruption during the upgrade process. To copy the running-config file, use the copy running-config command. In this example, running-config is copied to a file on the switch's flash drive.

switch#copy running-config flash:/cfg_06162014
Copy completed successfully.
switch#

Making Room on the Flash Drive

Determine the size of the new EOS image. Then verify that there is enough space available on the flash drive for two copies of this image, plus a recommended 240MB (if available) for diagnostic information in case of a fatal error. Use the dir command to check the "bytes free" figure.

switch#dir flash:
Directory of flash:/
-rwx   293168526   Nov 4 22:17  eos4.11.0.swi
-rwx          36   Nov 8 10:24  boot-config
-rwx       37339  Jun 16 14:18  cfg_06162014

606638080 bytes total (602841088 bytes free)
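
The space check described above can be sketched in Python. This is a minimal illustration, not part of the switch software: the function names are hypothetical, the dir output is assumed to have been captured as text, "240MB" is read as 240 MiB, and the new image is assumed (for illustration only) to be the same size as the image already on flash.

```python
import re

# Assumption: the recommended "240MB" reserve is interpreted as 240 MiB.
DIAG_RESERVE_BYTES = 240 * 1024 * 1024


def flash_bytes_free(dir_output: str) -> int:
    """Return the 'bytes free' figure from captured `dir flash:` output."""
    match = re.search(r"\((\d+) bytes free\)", dir_output)
    if match is None:
        raise ValueError("no 'bytes free' figure found")
    return int(match.group(1))


def space_check(dir_output: str, image_bytes: int) -> tuple[bool, bool]:
    """Return (room for two image copies, room for two copies plus diagnostics)."""
    free = flash_bytes_free(dir_output)
    needed = 2 * image_bytes
    return free >= needed, free >= needed + DIAG_RESERVE_BYTES


# Summary line taken from the dir example; the image size is hypothetical.
sample = "606638080 bytes total (602841088 bytes free)"
print(space_check(sample, 293168526))
```

With these figures there is room for two copies of the image but not for the additional diagnostic reserve, so freeing space before the upgrade would be advisable.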

Verifying Connectivity

Ensure that the switch has a management interface configured with an IP address and default gateway (see Assigning a Virtual IP Address to Access the Active Ethernet Management Port and Configuring a Default Route to the Gateway). Confirm that the switch can be reached through the network by using the show interfaces status command and pinging the default gateway.

switch#show interfaces status
Port    Name    Status       Vlan     Duplex  Speed  Type
Et3/1           notconnect   1        auto    auto   1000BASE-T

<-------OUTPUT OMITTED FROM EXAMPLE-------->
Ma1/1           connected    routed   unconf  unconf Unknown

switch#ping 1.1.1.10
PING 1.1.1.10 (1.1.1.10) 72(100) bytes of data.
80 bytes from 1.1.1.10: icmp_seq=1 ttl=64 time=0.180 ms
80 bytes from 1.1.1.10: icmp_seq=2 ttl=64 time=0.076 ms
80 bytes from 1.1.1.10: icmp_seq=3 ttl=64 time=0.084 ms
80 bytes from 1.1.1.10: icmp_seq=4 ttl=64 time=0.073 ms
80 bytes from 1.1.1.10: icmp_seq=5 ttl=64 time=0.071 ms
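
If the ping output is captured as text, the reachability check can be automated. The sketch below is a hypothetical helper (the names ping_replies and gateway_reachable are not EOS features) and assumes reply lines follow the format shown in the example above.

```python
import re

# Matches reply lines such as:
#   80 bytes from 1.1.1.10: icmp_seq=1 ttl=64 time=0.180 ms
REPLY = re.compile(r"bytes from (\S+): icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def ping_replies(output: str) -> list[tuple[str, int, float]]:
    """Return (responder, sequence number, round-trip ms) for each reply line."""
    replies = []
    for line in output.splitlines():
        match = REPLY.search(line)
        if match:
            replies.append((match.group(1), int(match.group(2)), float(match.group(3))))
    return replies


def gateway_reachable(output: str, gateway: str, minimum: int = 1) -> bool:
    """True if at least `minimum` replies came from the given gateway address."""
    answered = sum(1 for host, _, _ in ping_replies(output) if host == gateway)
    return answered >= minimum


ping_sample = """\
80 bytes from 1.1.1.10: icmp_seq=1 ttl=64 time=0.180 ms
80 bytes from 1.1.1.10: icmp_seq=2 ttl=64 time=0.076 ms
"""
print(gateway_reachable(ping_sample, "1.1.1.10"))
```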

Verifying Configuration

Verify that the switch configuration is valid for SSU by using the show reload hitless command. If parts of the configuration are blocking execution of SSU, an error message will be displayed explaining what they are. For SSU to proceed, the configuration conflicts must be corrected before issuing the reload hitless command.

switch#show reload hitless
'reload hitless' cannot proceed due to the following:
Spanning-tree portfast is not enabled for one or more ports
Spanning-tree BPDU guard is not enabled for one or more ports
switch#
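
When this check is scripted, the blocking conditions can be pulled out of the captured output. The following is a rough sketch under the assumption that the output matches the example above: a header line containing "cannot proceed due to the following:" followed by one reason per line. The function name is hypothetical.

```python
def reload_hitless_blockers(output: str) -> list[str]:
    """Return the blocking reasons listed in captured `show reload hitless` output.

    An empty list means no blocking message was found in the text.
    """
    blockers = []
    collecting = False
    for raw in output.splitlines():
        line = raw.strip()
        if "cannot proceed due to the following:" in line:
            collecting = True
            continue
        if collecting:
            if not line or line.endswith("#"):
                break  # a blank line or a returning prompt ends the list
            blockers.append(line)
    return blockers


hitless_sample = """\
switch#show reload hitless
'reload hitless' cannot proceed due to the following:
Spanning-tree portfast is not enabled for one or more ports
Spanning-tree BPDU guard is not enabled for one or more ports
switch#
"""
for reason in reload_hitless_blockers(hitless_sample):
    print(reason)
```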

Configuring BGP

For hitless restart of BGP and MP-BGP, BGP graceful restart must first be enabled using the graceful-restart command. The default restart time value (300 seconds) is appropriate for most configurations.

The BGP configuration mode in which the graceful-restart command is issued determines which BGP connections will restart gracefully.

  • For all BGP connections, use the graceful-restart command in BGP configuration mode:
    switch#config
    switch(config)#router bgp 64496
    switch(config-router-bgp)#graceful-restart
    switch(config-router-bgp)#
  • For all BGP connections in a specific VRF, use the graceful-restart command in BGP VRF configuration mode:
    switch#config
    switch(config)#router bgp 64496
    switch(config-router-bgp)#vrf purple
    switch(config-router-bgp-vrf-purple)#graceful-restart
    switch(config-router-bgp-vrf-purple)#exit
    switch(config-router-bgp)#
  • For all BGP connections in a specific BGP address family, use the graceful-restart command in BGP address-family configuration mode:
    switch#config
    switch(config)#router bgp 64496
    switch(config-router-bgp)#address-family ipv6
    switch(config-router-bgp-af)#graceful-restart
    switch(config-router-bgp-af)#exit
    switch(config-router-bgp)#
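
For configuration that is generated by a script, the three cases above can be expressed as one small helper that emits the command sequence. This is an illustrative sketch only: the function name and parameters are hypothetical, and it covers only the three modes shown above (combining a VRF with an address family is not described in this section, so the sketch rejects it).

```python
from typing import Optional


def graceful_restart_commands(
    asn: int,
    vrf: Optional[str] = None,
    address_family: Optional[str] = None,
) -> list[str]:
    """Build the CLI lines that enable BGP graceful restart in one mode."""
    if vrf and address_family:
        # Not covered by the examples above, so this sketch does not guess.
        raise ValueError("choose either a VRF or an address family, not both")
    commands = ["config", f"router bgp {asn}"]
    if vrf:
        commands.append(f"vrf {vrf}")
    elif address_family:
        commands.append(f"address-family {address_family}")
    commands.append("graceful-restart")
    return commands


print(graceful_restart_commands(64496))
print(graceful_restart_commands(64496, vrf="purple"))
print(graceful_restart_commands(64496, address_family="ipv6"))
```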

BGP graceful restart can also be configured for a specific interface.

Transfer the Image File for SSU

The target image must be copied to the file system on the switch, typically onto the flash drive. After verifying that there is space for two copies of the image plus an optional 240MB for diagnostic information, use the copy command to copy the image to the flash drive, then confirm that the new image file has been correctly transferred.

These command examples transfer an image file to the flash drive from various locations.

USB Memory

Command

copy usb1:/sourcefile flash:/destfile

Example

sch#copy usb1:/eos-4.14.4.swi flash:/eos-4.14.4.swi

FTP Server

Command

copy ftp://ftp-source/sourcefile flash:/destfile

Example

sch#copy ftp://user:password@10.0.0.3/eos-4.14.4.swi flash:/eos-4.14.4.swi

SCP

Command

copy scp://scp-source/sourcefile flash:/destfile

Example

sch#copy scp://user:password@10.1.1.8/user/eos-4.14.4.swi flash:/eos-4.14.4.swi

HTTP

Command

copy http://http-source/sourcefile flash:/destfile

Example

sch#copy http://10.0.0.10/eos-4.14.4.swi flash:/eos-4.14.4.swi

Once the file has been transferred, verify that it is present in the directory, then confirm the MD5 checksum using the verify command. The MD5 checksum is available from the EOS download page of the Arista website.

switch#dir flash:
Directory of flash:/
-rwx   293168526   Nov 4 22:17  eos4.14.2.swi
-rwx          36   Nov 8 10:24  boot-config
-rwx       37339  Jun 16 14:18  cfg_06162014
-rwx   394559902  May 30 02:57  eos4.13.1.swi

606638080 bytes total (208281186 bytes free)
switch#verify /md5 flash:eos-4.14.4.swi
verify /md5 (flash:eos-4.14.4.swi) = c277a965d0ed48534de6647b12a86991
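
The same comparison can be made on the workstation that holds the downloaded image, before or after the transfer. The sketch below uses Python's standard hashlib module; the helper names are hypothetical, and the parser assumes the verify output ends with "=" followed by a 32-digit hexadecimal checksum, as in the example above.

```python
import hashlib
import re


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_from_verify_output(output: str) -> str:
    """Extract the checksum from captured `verify /md5` output."""
    match = re.search(r"=\s*([0-9a-fA-F]{32})\b", output)
    if match is None:
        raise ValueError("no MD5 checksum found in output")
    return match.group(1).lower()


def checksums_match(verify_output: str, published_md5: str) -> bool:
    """Compare the switch-reported checksum with the published one."""
    return md5_from_verify_output(verify_output) == published_md5.strip().lower()


verify_sample = "verify /md5 (flash:eos-4.14.4.swi) = c277a965d0ed48534de6647b12a86991"
print(md5_from_verify_output(verify_sample))
```

A mismatch usually means the transfer was incomplete or corrupted, in which case the image should be copied again before boot-config is changed.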

Modify boot-config

After transferring and confirming the desired image file, use the boot system command to update the boot-config file to point to the new EOS image.

This command changes the boot-config file to point to the image file located in flash memory at eos-4.14.4.swi.

switch#configure terminal
switch(config)#boot system flash:/eos-4.14.4.swi

Use the show boot-config command to verify that the boot-config file is correct:

switch(config)#show boot-config
Software image: flash:/eos-4.14.4.swi
Console speed: (not set)
Aboot password (encrypted): $1$ap1QMbmz$DTqsFYeauuMSa7/Qxbi2l1
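
In an automated workflow, the boot-config check amounts to reading the "Software image:" line and comparing it with the intended path. A minimal sketch, assuming the output format shown above and a hypothetical function name:

```python
from typing import Optional


def configured_boot_image(show_boot_config_output: str) -> Optional[str]:
    """Return the path on the 'Software image:' line, or None if absent."""
    for raw in show_boot_config_output.splitlines():
        line = raw.strip()
        if line.startswith("Software image:"):
            # Split on the first colon only; the path itself contains one.
            return line.partition(":")[2].strip()
    return None


boot_config_sample = """\
Software image: flash:/eos-4.14.4.swi
Console speed: (not set)
"""
print(configured_boot_image(boot_config_sample) == "flash:/eos-4.14.4.swi")
```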

Save the configuration to the startup-config file with the write command.

switch#write

Start the SSU Process

After updating the boot-config file, verify that your configuration supports SSU (if you have not already done so) by using the show reload hitless command. If parts of the configuration are blocking execution of SSU, an error message will be displayed explaining what they are.

switch#show reload hitless
'reload hitless' cannot proceed due to the following:
Spanning-tree portfast is not enabled for one or more ports
Spanning-tree BPDU guard is not enabled for one or more ports

Then start the SSU process using the reload hitless command to reload the switch and activate the new image. The CLI will identify any changes that must be made to the configuration before starting SSU, prompt to save any modifications to the system configuration, and request confirmation before reloading.

switch#reload hitless
System configuration has been modified. Save? [yes/no/cancel/diff]:y
Copy completed successfully.
Proceed with reload? [confirm]y

Verify Success of the Upgrade

Before making any configuration changes to the switch after reload, verify that the SSU process is complete using the command show boot stages log. If the process is complete, the last message should be Asu Hitless boot stages complete.

switch#show boot stages log
Timestamp Delta Begin Msg
2015-03-28 15:18:30 000.000000 Asu Hitless boot stages started
2015-03-28 15:18:30 000.069732 stage CriticalAgent started
2015-03-28 15:18:30 000.069811 event CriticalAgent:SuperServer completed

2015-03-28 15:20:20 110.224504 stage BootSanityCheck is complete
2015-03-28 15:20:20 110.225439 Asu Hitless boot stages complete
switch#
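
The completion test described above (the last log message must be Asu Hitless boot stages complete) can be sketched as follows. The function name is hypothetical, and the sketch assumes the log has been captured as text in the layout shown in the example.

```python
COMPLETE_MESSAGE = "Asu Hitless boot stages complete"


def ssu_boot_stages_complete(log_output: str) -> bool:
    """True if the last log entry reports that the SSU boot stages completed."""
    entries = [
        line.strip()
        for line in log_output.splitlines()
        # Skip blank lines and CLI prompts such as "switch#".
        if line.strip() and not line.strip().endswith("#")
    ]
    return bool(entries) and entries[-1].endswith(COMPLETE_MESSAGE)


finished_log = """\
2015-03-28 15:18:30 000.000000 Asu Hitless boot stages started
2015-03-28 15:20:20 110.224504 stage BootSanityCheck is complete
2015-03-28 15:20:20 110.225439 Asu Hitless boot stages complete
switch#
"""
running_log = """\
2015-03-28 15:18:30 000.000000 Asu Hitless boot stages started
2015-03-28 15:18:30 000.069732 stage CriticalAgent started
switch#
"""
print(ssu_boot_stages_complete(finished_log), ssu_boot_stages_complete(running_log))
```

A script could poll with this check and postpone any configuration changes until it returns True.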

Completion of the SSU process may also be verified by checking the syslog for the following message:

LAUNCHER-6-BOOT_STATUS: 'reload hitless' reconciliation complete

To verify whether the SSU upgrade was successful, use the show reload cause command. If a fatal error occurred during the upgrade process, the switch will have completely rebooted and the fatal error will be displayed along with the directory in which diagnostic information can be found. If the SSU upgrade succeeded, it will read Hitless reload requested by the user.

Fatal Error Display

switch#show reload cause
Reload Cause 1:
-------------------
Reload requested by the user.

Reload Time:
------------
Reload occurred at Sat Feb 28 02:34:26 2015 PST.

Recommended Action:
-------------------
No action necessary.

Debugging Information:
----------------------
None available.

Reload Cause 2:
-------------------
Fatal error during 'reload hitless'. (stageMgr - LinkStatusUpdate timed out)

Reload Time:
------------
Reload occurred at Sat Feb 28 02:33:54 2015 PST.

Recommended Action:
-------------------
A fatal error occurred during hitless reload.
If the problem persists, contact your customer support representative.

Debugging Information:
----------------------
/mnt/flash/persist/fatalError-2015-02-28_023355
switch#

Successful Upgrade Display

switch#show reload cause
Reload Cause 1:
-------------------
Hitless reload requested by the user.

Reload Time:
------------
Reload occurred at Wed Mar 25 14:49:04 2015 PDT.

Recommended Action:
-------------------
No action necessary.

Debugging Information:
----------------------
None available.
switch#
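
The two displays above can be told apart programmatically by reading the text under each Reload Cause heading. This is a sketch under the assumption that the output follows the layout shown in the examples; the function names and the "success", "fatal", and "unknown" labels are hypothetical.

```python
import re


def reload_causes(output: str) -> list[str]:
    """Return the cause text that follows each 'Reload Cause N:' heading."""
    lines = [line.strip() for line in output.splitlines()]
    causes = []
    for index, line in enumerate(lines):
        if re.fullmatch(r"Reload Cause \d+:", line):
            for following in lines[index + 1:]:
                # Skip blank lines and the dashed underline beneath the heading.
                if following and set(following) != {"-"}:
                    causes.append(following)
                    break
    return causes


def ssu_result(output: str) -> str:
    """Classify captured `show reload cause` output."""
    causes = reload_causes(output)
    if any("Fatal error during 'reload hitless'" in cause for cause in causes):
        return "fatal"
    if causes and causes[0].startswith("Hitless reload requested by the user"):
        return "success"
    return "unknown"


success_sample = """\
Reload Cause 1:
-------------------
Hitless reload requested by the user.
"""
fatal_sample = """\
Reload Cause 1:
-------------------
Reload requested by the user.

Reload Cause 2:
-------------------
Fatal error during 'reload hitless'. (stageMgr - LinkStatusUpdate timed out)
"""
print(ssu_result(success_sample), ssu_result(fatal_sample))
```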

The show version command will confirm whether the correct image is loaded. The Software image version line displays the version of the active image file.

switch#show version
Arista DCS-7050QX-32-F
Hardware version:    02.00
Serial number:       JPE14071098
System MAC address:  001c.7355.556f

Software image version: 4.14.5F-2353054.eos4145F
Architecture:           i386
Internal build version: 4.14.5F-2353054.eos4145F
Internal build ID:      e8748ea7-916d-4217-878f-4bfe2adc7122

Uptime:                 4 minutes
Total memory:           3981328 kB
Free memory:            1342408 kB

switch#
Note: If a fatal error occurs during the SSU process, the new EOS image will still be loaded and booted.