
Appendix

Operator-Level Orchestrator Alerts and Events

Summarizes the alerts and events generated within the Arista Edge Cloud Orchestrator at the Operator level.

This document describes all Operator-level Orchestrator events. Although these events are stored in the Orchestrator and displayed on the Orchestrator UI, most of them are generated by a Gateway or one of its running components (MGD, PROCMON, and so on); only a few are generated by the Orchestrator itself. You can configure notifications and alerts for events only in the Orchestrator.

The following table provides an explanation for each of the columns in the Operator-level Orchestrator Events table:

Table 1. Operator-level Orchestrator Events
| Column name | Details |
|---|---|
| EVENT | Unique name of the event. |
| DISPLAYED ON ORCHESTRATOR UI AS | Specifies how the event is displayed on the Orchestrator UI. |
| SEVERITY | The severity with which this event is usually generated. |
| GENERATED BY | The VeloCloud SD-WAN component that generates the notification. It can be one of the following: Orchestrator, Gateway. |
| GENERATED WHEN | Technical reasons and circumstances under which this event is generated. |
| RELEASE ADDED IN | The release in which this event was first added. If not specified, the event existed prior to release 2.5. |
| DEPRECATED | Specifies whether the event is deprecated as of a specific release. |

Operator-level Orchestrator Events

 
| EVENT | DISPLAYED ON ORCHESTRATOR UI AS | SEVERITY | GENERATED BY | GENERATED WHEN | RELEASE ADDED IN | DEPRECATED |
|---|---|---|---|---|---|---|
| GATEWAY_UP | Gateway up | INFO | Orchestrator | A Gateway restores connectivity with the Orchestrator after losing it. | | |
| GATEWAY_DOWN | Gateway down | INFO | Orchestrator | A Gateway fails to communicate after losing connectivity with the Orchestrator. | | |
| GATEWAY_LARGE_PKT_SIZE | Packet size limit exceeded | INFO | Gateway | A packet incoming from a Gateway's peer exceeds the packet size limit. | | |
| GATEWAY_SERVICE_FAILED | Gateway service failed | ERROR | Gateway | The GWD service on the Gateway fails. | | |
| GATEWAY_BFD_NEIGHBOR_UP | BFD session established to Gateway neighbor | INFO | Gateway | A Gateway BFD neighbor comes back up. | | |
| GATEWAY_BFD_NEIGHBOR_DOWN | Gateway BFD neighbor unavailable | INFO | Gateway | A Gateway BFD neighbor goes down. | | |
| GATEWAY_ICMP_PROBE_UNSTABLE | Gateway: ICMP probe unstable | ALERT | Gateway | The ICMP probe goes down on a Partner Gateway. | | |
| GATEWAY_REBALANCE | Gateways rebalanced | INFO | Orchestrator | | | |
| PROXY_ENABLE_OPERATOR_ACCESS | Partner access delegated to operator | INFO | Orchestrator | | | |
| PROXY_DISABLE_OPERATOR_ACCESS | Partner access revoked to operator | INFO | Orchestrator | | | |
| VRF_ROUTEMAP_RULE_MAX_LIMIT_HIT | VRF route map rules limit exceeded | WARNING | Gateway | The number of VRF inbound/outbound route map rules exceeds the maximum limit (32). | | |
| VRF_LIMIT_EXCEEDED | VRF limit exceeded | ALERT | Gateway | The number of configured VRF entries exceeds the maximum limit (1000). | | |
| ENABLE_EXTERNAL_CA | External CA Enabled | CRITICAL | Orchestrator | The ca.external.enable property is set to true. | 4.3.0 | |
| DISABLE_EXTERNAL_CA | External CA Disabled | CRITICAL | Orchestrator | The ca.external.enable property is set to false. | 4.3.0 | |
| INSERT_EXTERNAL_CA | External CA Inserted | CRITICAL | Orchestrator | An external CA is inserted into the VELOCLOUD_CERTIFICATE_AUTHORITY table and becomes a trusted issuer. | 4.3.0 | |
| CREATE_COMPOSITE_ROLE | Composite Role Created | INFO | Orchestrator | A composite role is created by an Enterprise, Partner, or Operator. | 4.5.0 | |
| UPDATE_COMPOSITE_ROLE | Composite Role Updated | INFO | Orchestrator | A composite role is updated by an Enterprise, Partner, or Operator. | 4.5.0 | |
| DELETE_COMPOSITE_ROLE | Composite Role Deleted | INFO | Orchestrator | A composite role is deleted by an Enterprise, Partner, or Operator. | 4.5.0 | |
| CA_VALIDATION | CA validation failure | ALERT | Orchestrator | The CA certificate attributes are rejected. | 5.0.0 | |
| EI_ACTIVATION_CONFIG_SENT | EI activation config sent | INFO | Orchestrator | The activation config has been successfully sent to the EI server. | 5.0.0 | |
| Auto_Rate_Limit_Enabled | Auto Rate-Limit Enabled | WARNING | Gateway | The auto rate-limit capability is activated on a Gateway when the Gateway detects that certain Edges are sending a large amount of traffic that might cause the Gateway to become unstable and drop packets. The event message lists the Edges (Enterprise, Rate Limit Percentage) on which auto rate-limit is activated. | 5.2.0 | |
| Auto_Rate_Limit_Disabled | Auto Rate-Limit Disabled | WARNING | Gateway | The Gateway auto rate-limit condition is restored. | 5.2.0 | |
| SELF_HEALING_REPORT | anomaly_type: <Remote Route inconsistency>, num_routes_recovered: <number>, shr state: <DONE> | ALERT | Gateway | Routes are detected as missing from a customer enterprise connected to a Gateway, and the Gateway uses the Self-Healing Routing feature to recover the missing routes. | 5.2.0 | |
| POLL_IDPS_SIGNATURE_FAIL | Failure in poll job that queries and downloads signature file from GSM | ERROR | Orchestrator | The Orchestrator backend poll job fails to retrieve or download the Suricata signature from GSM and update profiles with the new signature metadata. | 5.2.0 | |
| IDPS_SIGNATURE_VCO_VERSION_CHECK_FAIL | Querying existing signature version from local DB failed | ERROR | Orchestrator | The Orchestrator backend poll job fails to retrieve the existing Suricata signature version from the Orchestrator's local database. | 5.2.0 | |
| IDPS_SIGNATURE_GSM_VERSION_CHECK_FAIL | Querying signature metadata from GSM failed | ERROR | Orchestrator | The Orchestrator backend poll job fails to retrieve the existing Suricata signature metadata (which includes the signature version) from GSM. | 5.2.0 | |
| IDPS_SIGNATURE_SKIP_DOWNLOAD_NO_UPDATE | Skipping signature download due to no change in signature version | INFO | Orchestrator | The Orchestrator backend poll job skips downloading the Suricata signature file because the signature file version has not changed. | 5.2.0 | |
| IDPS_SIGNATURE_STORE_FAILURE_NO_PATH | Filestore path not set to store signature file | ERROR | Orchestrator | The Orchestrator backend poll job fails to store the Suricata signature file because the filestore path is not set. | 5.2.0 | |
| IDPS_SIGNATURE_DOWNLOAD_SUCCESS | Successfully downloaded signature file from GSM | INFO | Orchestrator | The Orchestrator backend poll job successfully downloads the Suricata signature file from GSM. | 5.2.0 | |
| IDPS_SIGNATURE_DOWNLOAD_FAILURE | Failed to download signature file from GSM | ERROR | Orchestrator | The Orchestrator backend poll job fails to download the Suricata signature file from GSM. | 5.2.0 | |
| IDPS_SIGNATURE_STORE_SUCCESS | Successfully stored the signature file in filestore | INFO | Orchestrator | The Orchestrator backend poll job successfully stores the Suricata signature file in the local file store. | 5.2.0 | |
| IDPS_SIGNATURE_STORE_SIGNATURE_FAILURE | Failed to store the signature file in filestore | ERROR | Orchestrator | The Orchestrator backend poll job fails to store the Suricata signature file in the local file store. | 5.2.0 | |
| IDPS_SIGNATURE_METADATA_INSERT_SUCCESS | Successfully added metadata of the signature file to local DB | INFO | Orchestrator | The Orchestrator backend poll job successfully adds metadata of the Suricata signature file to the local DB. | 5.2.0 | |
| IDPS_SIGNATURE_METADATA_INSERT_FAILURE | Failure to add metadata of the signature file to local DB | ERROR | Orchestrator | The Orchestrator backend poll job fails to add metadata of the Suricata signature file to the local DB. | 5.2.0 | |
| POLL_URL_CATEGORIES_FAIL | POLL_URL_CATEGORIES_FAIL | ERROR | Orchestrator | The Orchestrator URL categories poll job fails. | 6.0.0 | |
| URL_CATEGORIES_STORE_SUCCESS | URL_CATEGORIES_STORE_SUCCESS | INFO | Orchestrator | The Orchestrator URL categories are stored successfully. | 6.0.0 | |
| URL_CATEGORIES_STORE_FAILURE | URL_CATEGORIES_STORE_FAILURE | ERROR | Orchestrator | The Orchestrator URL categories storage job fails. | 6.0.0 | |
| VCO_ENTERPRISE_NTICS_LICENSE_REQUEST_FAILED | VCO_ENTERPRISE_NTICS_LICENSE_REQUEST_FAILED | ERROR | Orchestrator | The Orchestrator Enterprise NTICS license request fails. | 6.0.0 | |
| VCO_ENTERPRISE_NTICS_LICENSE_REQUEST_SUCCEEDED | NTICS License request succeeded | INFO | Orchestrator | The Orchestrator Enterprise NTICS license request succeeds. | 6.0.0 | |
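
When deciding which events to configure notifications for, it can help to treat the table columns as a record and filter on them. The following is a minimal sketch, not an Orchestrator API: the `OperatorEvent` class, the helper functions, and the choice of which severities are worth watching are all assumptions made for illustration. The sample rows are copied from the events table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatorEvent:
    """Hypothetical record mirroring the columns of the events table."""
    event: str
    displayed_as: str
    severity: str
    generated_by: str
    release_added: Optional[str] = None  # None: existed prior to release 2.5


# A handful of rows copied from the events table.
EVENTS = [
    OperatorEvent("GATEWAY_UP", "Gateway up", "INFO", "Orchestrator"),
    OperatorEvent("GATEWAY_SERVICE_FAILED", "Gateway service failed", "ERROR", "Gateway"),
    OperatorEvent("VRF_LIMIT_EXCEEDED", "VRF limit exceeded", "ALERT", "Gateway"),
    OperatorEvent("ENABLE_EXTERNAL_CA", "External CA Enabled", "CRITICAL", "Orchestrator", "4.3.0"),
    OperatorEvent("CA_VALIDATION", "CA validation failure", "ALERT", "Orchestrator", "5.0.0"),
]


def _version_key(release: str) -> tuple:
    """Turn '5.2.0' into (5, 2, 0) so releases compare numerically."""
    return tuple(int(part) for part in release.split("."))


def events_to_watch(events, severities=frozenset({"ERROR", "ALERT", "CRITICAL"})):
    """Return event names whose severity is in the (assumed) watch set."""
    return [e.event for e in events if e.severity in severities]


def added_in_or_after(events, release):
    """Return event names first added in the given release or later."""
    target = _version_key(release)
    return [
        e.event
        for e in events
        if e.release_added is not None and _version_key(e.release_added) >= target
    ]
```

For example, `events_to_watch(EVENTS)` selects the four non-INFO rows, and `added_in_or_after(EVENTS, "5.0.0")` selects only `CA_VALIDATION`.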

Gateway Capacity Events

Currently, Gateway assignment is based on geo-proximity and does not take Gateway capacity health metrics into account. To improve Edge-to-Gateway assignment, the capacity health metrics (Edge Count, Tunnel Count, PKI Activated Tunnel Count, Flow Count, NAT Count, Packet Queue Watermark, and Packet Drops) are monitored periodically against warning and critical thresholds. When any metric rises above its warning or critical threshold, a Gateway capacity event is generated and reported to the Orchestrator. These events give Operators and Partners clear visibility into Gateway health so that they can make informed Gateway assignments.
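
The threshold arithmetic can be illustrated with a short sketch. It assumes the supported values and percentages listed in Table 2 (for example, 2000 Edges on a 4 CPU Gateway with 90% and 95% thresholds); the function names are hypothetical, and the code only shows how a sampled count maps to an event name. It is not the Gateway's actual implementation.

```python
def thresholds(supported: int, warning_pct: int, critical_pct: int) -> tuple:
    """Derive (warning, critical) thresholds as percentages of a supported value."""
    return supported * warning_pct // 100, supported * critical_pct // 100


def classify(value: int, warning: int, critical: int) -> str:
    """Map a sampled metric value to the capacity event name for its state.

    The table describes triggers as "above" a threshold, so the
    comparisons here are strict.
    """
    if value > critical:
        return "GATEWAY_CRITICAL"
    if value > warning:
        return "GATEWAY_DEGRADED"
    return "GATEWAY_STABLE"


# Edge Count on a 4 CPU, 32G MEM Gateway: supported value 2000, 90% / 95%.
edge_warning, edge_critical = thresholds(2000, 90, 95)
print(edge_warning, edge_critical)                  # 1800 1900
print(classify(1850, edge_warning, edge_critical))  # GATEWAY_DEGRADED
```

The same helper reproduces the other count-based thresholds in Table 2, such as 50% and 75% of 1920000 for Flow Count and NAT Entries Count.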

The following are the Gateway capacity events generated based on the capacity threshold limits.

Table 2. Gateway Capacity Events
| Metric | Trigger | Event | Severity | Message | Event Detail |
|---|---|---|---|---|---|
| Edge Count | Above Warning Threshold. The Warning threshold value is 90% of the following Supported values: 4 CPU, 32G MEM - 2000; 8 CPU, 32G MEM - 4000. | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high number of connected Edges as Gateway has crossed warning threshold. | 4 CPU: The number of connected Edges is above the warning threshold (1800). 8 CPU: The number of connected Edges is above the warning threshold (3600). |
| Edge Count | Above Critical Threshold. The Critical threshold value is 95% of the following Supported values: 4 CPU, 32G MEM - 2000; 8 CPU, 32G MEM - 4000. | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high number of connected Edges as Gateway has crossed critical threshold. | 4 CPU: The number of connected Edges is above the critical threshold (1900). 8 CPU: The number of connected Edges is above the critical threshold (3800). |
| Edge Count | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high number of connected Edges restored. | The number of connected Edges is within the acceptable threshold. |
| Tunnel Count | Above Warning Threshold. The Warning threshold value is 90% of the following Supported values: 4 CPU, 32G MEM - 3000; 8 CPU, 32G MEM - 6000. | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high number of tunnels. | 4 CPU: The number of tunnels is above the warning threshold (2700). 8 CPU: The number of tunnels is above the warning threshold (5400). |
| Tunnel Count | Above Critical Threshold. The Critical threshold value is 95% of the following Supported values: 4 CPU, 32G MEM - 3000; 8 CPU, 32G MEM - 6000. | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high number of tunnels. | 4 CPU: The number of tunnels is above the critical threshold (2850). 8 CPU: The number of tunnels is above the critical threshold (5700). |
| Tunnel Count | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high number of tunnels restored. | The number of tunnels is within the acceptable threshold. |
| Flow Count | Above Warning Threshold. The Warning threshold value is 50% of Supported value 1920000. | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high number of flows. | The number of flows is above the warning threshold (960000). |
| Flow Count | Above Critical Threshold. The Critical threshold value is 75% of Supported value 1920000. | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high number of flows. | The number of flows is above the critical threshold (1440000). |
| Flow Count | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high number of flows restored. | The number of flows is within the acceptable threshold. |
| NAT Entries Count | Above Warning Threshold. The Warning threshold value is 50% of Supported value 1920000. | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high number of NAT entries. | The number of NAT entries is above the warning threshold (960000). |
| NAT Entries Count | Above Critical Threshold. The Critical threshold value is 75% of Supported value 1920000. | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high number of NAT entries. | The number of NAT entries is above the critical threshold (1440000). |
| NAT Entries Count | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high number of NAT entries restored. | The number of NAT entries is within the acceptable threshold. |
| Packet Queue Watermark | Above Critical Threshold | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high packet queue watermark. | The packet queue watermark is above the critical threshold (6000) for 5 consecutive seconds. |
| Packet Queue Watermark | Above Warning Threshold | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high packet queue watermark. | The packet queue watermark is above the warning threshold (2000) for 10 consecutive seconds. |
| Packet Queue Watermark | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high packet queue watermark restored. | The packet queue watermark is within the acceptable threshold. |
| Packet Drop Count | Above Critical Threshold | GATEWAY_CRITICAL | NOTICE | Over capacity alert due to high number of packet drops. | The number of packet drops is above the critical threshold (2000) for 5 consecutive seconds. |
| Packet Drop Count | Above Warning Threshold | GATEWAY_DEGRADED | NOTICE | Over capacity alert due to high number of packet drops. | The number of packet drops is above the warning threshold (500) for 10 consecutive seconds. |
| Packet Drop Count | Below Warning Threshold | GATEWAY_STABLE | INFO | Over capacity condition due to high number of packet drops restored. | The number of packet drops is within the acceptable threshold. |
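
The Packet Queue Watermark and Packet Drop Count triggers depend on how long a metric stays above a threshold, not on a single reading. The sketch below illustrates that "N consecutive seconds" condition. It assumes one sample per second and uses the Packet Queue Watermark limits from the table as defaults; the function name and the sampling model are assumptions, since the source does not describe how the Gateway samples these metrics.

```python
def sustained_level(samples, warning=2000, critical=6000,
                    warning_secs=10, critical_secs=5):
    """Classify the most recent readings of a per-second metric.

    samples: readings ordered oldest to newest, one per second (assumed).
    A level applies only if the newest readings have been strictly above
    the limit for the required number of consecutive seconds.
    """
    def trailing_run(limit):
        # Count how many of the newest readings are consecutively above limit.
        run = 0
        for value in reversed(samples):
            if value > limit:
                run += 1
            else:
                break
        return run

    if trailing_run(critical) >= critical_secs:
        return "GATEWAY_CRITICAL"
    if trailing_run(warning) >= warning_secs:
        return "GATEWAY_DEGRADED"
    return "GATEWAY_STABLE"


print(sustained_level([7000] * 5))   # GATEWAY_CRITICAL
print(sustained_level([2500] * 10))  # GATEWAY_DEGRADED
print(sustained_level([7000] * 4))   # GATEWAY_STABLE
```

For Packet Drop Count, the same function can be called with `warning=500` and `critical=2000`, the limits given in the table.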