Upgrading CloudVision Portal (CVP)

Note: While upgrading CVP, refer to the latest release notes and upgrade procedures available on the Arista Software Download page.

Devices under management must:

  • be running a supported EOS version
  • have a supported TerminAttr version installed
  • have the TerminAttr agent enabled and successfully streaming telemetry to CVP.

The following steps can be taken at any point on an existing cluster as part of preparing for an upgrade to the current version:

  1. Upgrade existing CVP clusters to the latest CVP release.
  2. Upgrade all EOS devices under management to the supported release train.
  3. For devices running EOS releases prior to 4.20, ensure that the eAPI unix domain socket is enabled with the following configuration:
    management api http-commands
     protocol unix-socket
  4. Install supported TerminAttr on all EOS devices under management.
  5. Enable state streaming from all EOS devices under management by applying the SYS_StreamingTelemetry configlet and pushing the required configuration to all devices.
  6. Ensure that all devices are successfully streaming to the CVP cluster.
  7. Ensure that all devices are in image and config compliance.
  8. Complete regular backups. Complete a final backup prior to upgrade.
  9. Ensure that all tasks are in a terminal state (Success, Failed, or Canceled).
  10. Ensure that all Change Controls are in a terminal state.
    Note: After the cluster is upgraded to the latest CVP release, systems running unsupported TerminAttr versions fail to connect to the CVP cluster. These devices must first be upgraded to a supported TerminAttr version by re-onboarding them from the CloudVision UI. You cannot roll back a device to a point in time before it was running a supported TerminAttr version.

    The upgrade from the previous CVP release train to the current CVP release train includes data migrations that can take several hours on larger-scale systems.

Upgrades

Upgrades do not require that the VMs be redeployed and do not result in the loss of logs.

The CVP cluster must be functional and running to successfully complete an upgrade. As a precaution against the loss of CVP data, it is recommended that you back up the CVP data before performing an upgrade. To upgrade CVP to the current release, you must first upgrade CVP to a supported release that supports an upgrade to the current release. For more information, refer to the CVP release notes on the Arista Software Download page.
Note: CentOS updates (yum update commands) outside of CVP upgrades are not supported.
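
As a reference for the backup recommended above, the following is a minimal sketch run from the primary node. It assumes the cvpi backup subcommand and the default /data/cvpbackup destination used by recent CVP releases; confirm the exact procedure for your release in the CVP backup and restore documentation.

    # Assumed commands; verify against your release before relying on them
    su - cvp                   # run the backup as the cvp user
    cvpi backup cvp            # assumption: writes backup archives to /data/cvpbackup/
    ls -lh /data/cvpbackup/    # confirm the new backup files exist

Copying the resulting archives off the cluster before the upgrade protects them against any node-level failure.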

Verifying the Health of CVP before Performing Upgrades

Upgrades should be performed only on healthy and fully functional CVP systems. Before performing the upgrade, verify that the CVP system is healthy.

Complete the following steps to verify the health of CVP.

  1. Log in to the Linux shell of the primary node as the cvp user.
  2. Execute the cvpi status all command on your CVP:

    This shows the status of all CVP components.

  3. Confirm that all CVP components are running.
  4. Log in to the CVP system to check functionality.

    Once you have verified the health of your CVP installation, you can begin the upgrade process.
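
    The check in steps 2 and 3 can be filtered to make problems easier to spot on a large cluster. A minimal sketch, run as the cvp user; the grep pattern assumes the status strings shown in the sample outputs later in this guide:

    # List only components that cvpi does not report as running
    cvpi status all 2>&1 | grep -i "not running"
    # Apart from the per-node summary lines, an empty result means all components are up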

Upgrading from version 2018.1.2 (or later)

Use this procedure to complete the fast upgrade of CVP to the current version.

Prerequisites:

Before you begin the upgrade procedure, make sure that you have:
  • Verified the health of your CVP installation (see Verifying the Health of CVP before Performing Upgrades).
  • Verified that you are running version 2018.1.2 or later.

Complete the following steps to perform the upgrade.

  1. SSH as root into the primary node.
  2. Run these commands:
    1. rm -rf /tmp/upgrade (to remove data from old upgrades, if present)
    2. mkdir /data/upgrade
    3. ln -s /data/upgrade /tmp/upgrade
    4. Use scp or wget to copy cvp-upgrade-<version>.tgz to the /data/upgrade directory.
  3. Run the su cvpadmin command to trigger the shell.
  4. Select the upgrade option from the shell.
    Note: On a multi-node cluster, the upgrade can be performed only on the primary node. Upgrading to the current version may take up to 30 minutes.
    Note: If an issue occurs during an upgrade, you will be prompted to continue the upgrade once the issue is resolved.
    Note: Upgrading to 2021.1.0 and newer requires the configuration of a kubernetes cluster network. You will be prompted during the upgrade to enter the private IP range for the kubernetes cluster network. For this reason, a separate, unused network range should be provided when configuring CVP.

    Users will see this prompt while running the upgrade:

    This upgrade requires to configure kubernetes cluster network. 
    Please enter private ip range for kubernetes cluster network : 
    

    The cvpi env command shows the kubernetes cluster-related parameters. KUBE_POD_NETWORK and KUBE_SERVICE_NETWORK are the two subnetworks derived from KUBE_CLUSTER_NETWORK. KUBE_CLUSTER_DNS is the second IP address from KUBE_SERVICE_NETWORK.

    Note: KUBE_CLUSTER_NETWORK is the kubernetes private IP range, and it must not conflict with CVP node IPs, device interface IPs, cluster interface IPs, or switch IPs. In addition, do not use link-local addresses, the subnet reserved for loopback purposes, or any multicast IP addresses. The subnet length for KUBE_CLUSTER_NETWORK must be less than or equal to 20.
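
    After the upgrade, the parameters mentioned above can be read back to confirm the values that were configured. A minimal sketch; the variable names are taken from the note above:

    # Show only the kubernetes cluster network parameters from the cvpi environment
    cvpi env | grep -E "KUBE_CLUSTER_NETWORK|KUBE_POD_NETWORK|KUBE_SERVICE_NETWORK|KUBE_CLUSTER_DNS"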

CVP Node RMA

Use this procedure to replace any node of a multi-node cluster. Replacing a node of a multi-node cluster involves removing the node you want to replace, waiting for the remaining cluster nodes to recover, powering on the replacement node, and applying the cluster configuration to the new node.

When you replace cluster nodes, you must replace only one node at a time. If you plan to replace more than one node of a cluster, you must complete the entire procedure for each node to be replaced.

When replacing a node, the CloudVision VM that comes with the new CVA might not be the same version as the one running on the other nodes. For more information on redeploying with the correct version, refer to: https://www.arista.com/en/qsg-cva-200cv-250cv/cva-200cv-250cv-redeploy-cvp-vm-tool

Check that the XML definition of the VM is similar to the one on the other appliances. This can be checked using the virsh dumpxml cvp command.
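
One way to compare the definitions is sketched below; it assumes root SSH access between the appliances, and the file names and the <healthy-appliance> placeholder are illustrative:

    # On each appliance, dump the VM definition to a file
    virsh dumpxml cvp > /tmp/cvp-vm.xml
    # From the replacement appliance, fetch the dump from a known-good appliance and compare
    scp root@<healthy-appliance>:/tmp/cvp-vm.xml /tmp/cvp-vm-reference.xml
    diff /tmp/cvp-vm-reference.xml /tmp/cvp-vm.xml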

Note: It is recommended that you save the CVP cluster configuration to a temporary file, or write down the configuration on a worksheet. The configuration can be found in /cvpi/cvp-config.yaml.
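
For example, the file named in the note can be copied from one of the healthy nodes to your workstation before starting; the <healthy-node> placeholder and destination file name are illustrative:

    # Save a reference copy of the cluster configuration off the cluster
    scp root@<healthy-node>:/cvpi/cvp-config.yaml ./cvp-config.yaml.saved
    cat ./cvp-config.yaml.saved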
  1. Power off the node you want to replace (primary, secondary, or tertiary).
  2. Remove the node to be replaced.
  3. Allow all components of the remaining nodes to recover.

    The remaining nodes need to be up and settled before continuing to step 4.

  4. Use the cvpi status all command to ensure that the remaining nodes are healthy. Some services will be reported as “NOT RUNNING” because not all pods for those services are online. This is expected while a node is offline.
    [root@node2 ~]# cvpi status all

    Executing command. This may take some time...
    Completed 227/227 discovered actions

    secondary   components total:147 running:108 disabled:12 not running:27
    tertiary    components total:112 running:103 disabled:9
    primary     NODE DOWN

    Action Output
    -------------

    COMPONENT                  ACTION   NODE        STATUS        ERROR
    aaa                        status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    ambassador                 status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    apiserver                  status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    audit                      status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    clickhouse                 status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    cloudmanager               status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    coredns                    status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    device-interaction         status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    elasticsearch-recorder     status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    elasticsearch-server       status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    enroll                     status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    flannel                    status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    ingest                     status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    inventory                  status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    kafka                      status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    label                      status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    local-provider             status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    nginx-app                  status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    prometheus-node-exporter   status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    prometheus-server          status   secondary   NOT RUNNING   Only 0/1 pod(s) ready
    radius-provider            status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    script-executor            status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    script-executor-v2         status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    service-clover             status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    snapshot                   status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    tacacs-provider            status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    task                       status   secondary   NOT RUNNING   Only 2/3 pod(s) ready
    
  5. Power on the replacement node.
  6. Log in as cvpadmin.
  7. Enter the cvp cluster configuration.
    CentOS Linux 7 (Core)
    Kernel 3.10.0-957.1.3.el7.x86_64 on an x86_64
    
    localhost login: cvpadmin
    Last login: Fri Mar 15 12:24:45 on ttyS0
    Changing password for user root.
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
    Enter a command
    [q]uit [p]rint [s]inglenode [m]ultinode [r]eplace [u]pgrade
    >r
    Please enter minimum configuration to connect to the other peers
    *Ethernet interface for the cluster network: eth0
    *IP address of eth0: 172.31.0.216
    *Netmask of eth0: 255.255.0.0
    *Default route: 172.31.0.1
    *IP address of one of the two active cluster nodes: 172.31.0.161
     Root password of 172.31.0.161:
  8. Wait for the RMA process to complete. No action is required.
    Root password of 172.31.0.161: 
    External interfaces, ['eth1'], are discovered under /etc/sysconfig/network-scripts
    These interfaces are not managed by CVP.
    Please ensure that the configurations for these interfaces are correct.
    Otherwise, actions from the CVP shell may fail.
    Running : /bin/sudo /sbin/service network restart
    [334.001886] vmxnet3 0000:0b:00.0 eth0: intr type 3, mode 0, 9 vectors allocated
    [334.004577] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
    [334.006315] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
    [334.267535] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [348.252323] vmxnet3 0000:13:00.0 eth1: intr type 3, mode 0, 9 vectors allocated
    [348.254925] vmxnet3 0000:13:00.0 eth1: NIC Link is Up 10000 Mbps
    [348.256504] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
    [348.258035] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
    Fetching version information
    Run cmd: sudo -u cvp -- ssh 172.31.0.156 cat /cvpi/property/version.txt 0.18
    Fetching version information
    Run cmd: sudo -u cvp -- ssh 172.31.0.216 cat /cvpi/property/version.txt 10.19
    Fetching version information
    Run cmd: sudo -u cvp -- ssh 172.31.0.161 cat /cvpi/property/version.txt 0.16
    Running : cvpConfig.py tool...
    [392.941983] vmxnet3 0000:0b:00.0 eth0: intr type 3, mode 0, 9 vectors allocated
    [392.944739] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
    [392.946388] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
    [393.169460] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [407.229180] vmxnet3 0000:13:00.0 eth1: intr type 3, mode 0, 9 vectors allocated
    [407.232306] vmxnet3 0000:13:00.0 eth1: NIC Link is Up 10000 Mbps
    [407.233940] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
    [407.235728] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
    [408.447642] Ebtables v2.0 unregistered
    [408.935626] ip_tables: (C) 2000-2006 Netfilter Core Team
    [408.956578] ip6_tables: (C) 2000-2006 Netfilter Core Team
    [408.982927] Ebtables v2.0 registered
    [409.029603] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
    Stopping: ntpd
    Running : /bin/sudo /sbin/service ntpd stop
    Running : /bin/sudo /bin/systemctl is-active ntpd
    Starting: ntpd
    Running : /bin/sudo /bin/systemctl start ntpd.service
    Waiting for all components to start. This may take few minutes.
    Run cmd: su - cvp -c '/cvpi/bin/cvpi -v=3 status zookeeper' 0.45
    Run cmd: su - cvp -c '/cvpi/bin/cvpi -v=3 status zookeeper' 0.33
    Checking if third party applications exist
    Run cmd: su - cvp -c '/cvpi/zookeeper/bin/zkCli.sh ls /apps | tail -1' 0.72
    Running : cvpConfig.py tool...
    Stopping: cvpi-check
    Running : /bin/sudo /sbin/service cvpi-check stop
    Running : /bin/sudo /bin/systemctl is-active cvpi-check
    Starting: cvpi-check
    Running : /bin/sudo /bin/systemctl start cvpi-check.service
  9. Continue waiting for the RMA process to complete. No action is required.
    [Fri Mar 15 20:26:28 UTC 2019] :
    Executing command. This may take some time...
    
    (E) => Enabled
    (D) => Disabled
    (?) => Zookeeper Down
    
    Action Output
    -------------
    COMPONENT   ACTION    NODE       STATUS     ERROR
    hadoop      cluster   tertiary   (E) DONE
    hbase       cluster   tertiary   (E) DONE
    Executing command. This may take some time...
    
    (E) => Enabled
    (D) => Disabled
    (?) => Zookeeper Down
    
    Action Output
    -------------
    COMPONENT          ACTION   NODE        STATUS     ERROR
    aerisdiskmonitor   config   primary     (E) DONE
    aerisdiskmonitor   config   secondary   (E) DONE
    aerisdiskmonitor   config   tertiary    (E) DONE
    apiserver          config   primary     (E) DONE
    apiserver          config   secondary   (E) DONE
    apiserver          config   tertiary    (E) DONE
    cvp-backend        config   primary     (E) DONE
    cvp-backend        config   secondary   (E) DONE
    cvp-backend        config   tertiary    (E) DONE
    cvp-frontend       config   primary     (E) DONE
    cvp-frontend       config   secondary   (E) DONE
    cvp-frontend       config   tertiary    (E) DONE
    geiger             config   primary     (E) DONE
    geiger             config   secondary   (E) DONE
    geiger             config   tertiary    (E) DONE
    hadoop             config   primary     (E) DONE
    hadoop             config   secondary   (E) DONE
    hadoop             config   tertiary    (E) DONE
    hbase              config   primary     (E) DONE
    hbase              config   secondary   (E) DONE
    hbase              config   tertiary    (E) DONE
    kafka              config   primary     (E) DONE
    kafka              config   secondary   (E) DONE
    kafka              config   tertiary    (E) DONE
    zookeeper          config   primary     (E) DONE
    zookeeper          config   secondary   (E) DONE
    zookeeper          config   tertiary    (E) DONE
    Executing command. This may take some time...
    secondary 89/89 components running
    primary 78/78 components running
    Executing command. This may take some time...
    COMPONENT   ACTION   NODE   STATUS   ERROR
    Including: /cvpi/tls/certs/cvp.crt
    Including: /cvpi/tls/certs/cvp.key
    Including: /etc/cvpi/cvpi.key
    Including: /cvpi/tls/certs/kube-cert.pem
    Including: /data/journalnode/mycluster/current/VERSION
    Including: /data/journalnode/mycluster/current/last-writer-epoch
    Including: /data/journalnode/mycluster/current/last-promised-epoch
    Including: /data/journalnode/mycluster/current/paxos
    Including: /cvpi/tls/certs/ca.crt
    Including: /cvpi/tls/certs/ca.key
    Including: /cvpi/tls/certs/server.crt
    Including: /cvpi/tls/certs/server.key
    mkdir -p /cvpi/tls/certs
    mkdir -p /data/journalnode/mycluster/current
    mkdir -p /cvpi/tls/certs
    mkdir -p /etc/cvpi
    mkdir -p /cvpi/tls/certs
    mkdir -p /cvpi/tls/certs
    mkdir -p /cvpi/tls/certs
    mkdir -p /data/journalnode/mycluster/current
    mkdir -p /cvpi/tls/certs
    mkdir -p /data/journalnode/mycluster/current
    mkdir -p /data/journalnode/mycluster/current
    mkdir -p /cvpi/tls/certs
    Copying: /etc/cvpi/cvpi.key from secondary
    rsync -rtvp 172.31.0.161:/etc/cvpi/cvpi.key /etc/cvpi
    Copying: /cvpi/tls/certs/cvp.crt from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/cvp.crt /cvpi/tls/certs
    Copying: /cvpi/tls/certs/server.key from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/server.key /cvpi/tls/certs
    Copying: /cvpi/tls/certs/ca.crt from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/ca.crt /cvpi/tls/certs
    Copying: /cvpi/tls/certs/cvp.key from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/cvp.key /cvpi/tls/certs
    Copying: /cvpi/tls/certs/ca.key from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/ca.key /cvpi/tls/certs
    Copying: /data/journalnode/mycluster/current/last-writer-epoch from secondary
    rsync -rtvp 172.31.0.161:/data/journalnode/mycluster/current/last-writer-epoch /data/journalnode/mycluster/current
    Copying: /cvpi/tls/certs/kube-cert.pem from secondary
    Copying: /cvpi/tls/certs/server.crt from secondary
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/server.crt /cvpi/tls/certs
    Copying: /data/journalnode/mycluster/current/VERSION from secondary
    rsync -rtvp 172.31.0.161:/data/journalnode/mycluster/current/VERSION /data/journalnode/mycluster/current
    Copying: /data/journalnode/mycluster/current/paxos from secondary
    rsync -rtvp 172.31.0.161:/data/journalnode/mycluster/current/paxos /data/journalnode/mycluster/current
    Copying: /data/journalnode/mycluster/current/last-promised-epoch from secondary
    rsync -rtvp 172.31.0.161:/data/journalnode/mycluster/current/last-promised-epoch /data/journalnode/mycluster/current
    rsync -rtvp 172.31.0.161:/cvpi/tls/certs/kube-cert.pem /cvpi/tls/certs
    Starting: cvpi-config
    Running : /bin/sudo /bin/systemctl start cvpi-config.service
    Starting: cvpi
    Running : /bin/sudo /bin/systemctl start cvpi.service
    Running : /bin/sudo /bin/systemctl start cvpi-watchdog.timer
    Running : /bin/sudo /bin/systemctl enable docker
    Running : /bin/sudo /bin/systemctl start docker
    Running : /bin/sudo /bin/systemctl enable kube-cluster.path
  10. Enter "q" to quit the process after the "RMA process is complete!" message is displayed.
    Waiting for all components to start. This may take few minutes.
    [560.918749] FS-Cache: Loaded
    [560.978183] FS-Cache: Netfs 'nfs' registered for caching
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 48.20
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.73
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 7.77
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.55
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.23
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.64
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.59
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.07
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.70
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.51
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.57
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.40
    Run cmd: su - cvp -c '/cvpi/bin/cvpi status all --cluster' 2.24
    Waiting for all components to start. This may take few minutes.
    Run cmd: su - cvp -c '/cvpi/bin/cvpi -v=3 status all' 9.68
    RMA process is complete!
    [q]uit [p]rint [e]dit [v]erify [s]ave [a]pply [h]elp ve[r]bose
    >q
  11. Use the cvpi status all command to ensure that the cluster is healthy.
    [cvp@cvp87 ~]$ cvpi status all
    
    
    Executing command. This may take some time...
    Completed 215/215 discovered actions
    primary 	components total:112 running:104 disabled:8
    secondary 	components total:122 running:114 disabled:8
    tertiary 	components total:97 running:91 disabled:6

    When a node is RMA'd, the other nodes will replicate their state via HDFS to the new node. You can track this in real time by issuing the following command:

    watch -n 30 "hdfs dfsadmin -report | grep 'Under replicated'"

    Once the count of "Under replicated" blocks hits 0, data synchronization to the new node is complete.

    The disk usage on the new node will also grow as the blocks are replicated, and the RMA'd node will have disk space utilization similar to that of the other nodes once the operation has finished successfully.
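
    A simple way to watch utilization converge across the nodes is sketched below; it assumes root SSH access between the nodes, and the node addresses are placeholders:

    # Check /data usage on each cluster node (replace the placeholders with your node IPs)
    for node in <primary-ip> <secondary-ip> <tertiary-ip>; do
        echo "== $node =="
        ssh root@$node df -h /data
    done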

CVP / EOS Dependencies

To ensure that CVP can provide a base level of management, all EOS devices must be running EOS version 4.17.3F or later. To ensure device compatibility, seek advice on supported EOS versions from your Arista account team.

CVP should not require any additional EOS upgrades to support the standard features and functions in later versions of the appliance. Newer features and enhancements to CVP may not be available for devices on older code versions.

Refer to the latest Release Notes for additional upgrade/downgrade guidance.

Upgrade CV-CUE As Part of a CV Upgrade

During a CV upgrade, services go through the following steps:

  1. Services or service containers (such as CV-CUE) are stopped.
  2. Existing container images are deleted.
  3. New component RPMs are installed.
  4. The server is rebooted and all services are started again.

    A service on CV is upgraded only if its version is different from the pre-upgrade version (CV stores its pre-upgrade state to decide this). The wifimanager component follows a similar process. When CV boots up after an upgrade, wifimanager starts and upgrades only if the CV upgrade has resulted in a new wifimanager version. The following actions precede every wifimanager start operation:

    1. load: Loads the wifimanager container image into docker when CV boots up for the first time after an upgrade.
    2. init: Initializes wifimanager before the start. The wifimanager init is versioned; for example, init-8.8.0-01. The init-<version> handler initiates a wifimanager upgrade if needed. Thus, if the wifimanager version has not changed after the CV upgrade, the wifimanager upgrade is not invoked. If the wifimanager version has changed, a wifimanager upgrade is run before its start.
    Note: Load and init are internal actions to the wifimanager start operation; they are not run separately. The CV-CUE service might take longer to start than other CV services.
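
    Since wifimanager runs as a container image loaded into docker (as described above), one way to confirm that it started after the upgrade is to query docker directly on the CV node. The filter string is an assumption; image and container names can vary by release:

    # List the wifimanager image and any running container for it
    docker images | grep -i wifimanager
    docker ps | grep -i wifimanager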