Simplifying Network Operations through Data Center Automation

 

 
Arista Networks White Paper

It’s simply not good enough to have a great and scalable network alone. A data center can have tens of thousands of compute, storage and network devices, presenting a large operational challenge to IT. In addition, as the network is scaling, IT is being asked to reduce operational expenses and increase responsiveness to changing business needs.

Automation is the key for simplifying network operations from provisioning to day-to-day management. Where manual processes require resources to scale linearly with the network, automation tools amplify the work of each network operations engineer. Simultaneously, the programmatic operation of the network means that it is faster to provision new policies and services in the network. Arista delivers automation with the Arista Extensible Operating System, EOS®—from provisioning and monitoring to troubleshooting for:
  • “Day one” provisioning of the network
  • Day-to-day management of the network
  • Virtualization management for both networks and workloads.
Arista EOS is open and programmable, providing management and provisioning capabilities that work at scale. Through its programmability, EOS enables a set of software applications that deliver network provisioning, workload automation, unprecedented network and workflow visibility as well as rapid integration with a wide range of third party applications for virtualization, management, automation and orchestration services.

There is a growing need for a fundamental change in how networks are provisioned, just as server provisioning has evolved over the years by leveraging automation tools such as Puppet and Chef. The demand for agility and deployment at scale in provisioning and network operations requires a new level of automation and integration with current data center infrastructure. The underlying design of the network operating system provides the architectural foundation to meet these requirements.

Arista EOS: Foundation for Programmability and Automation

Arista EOS is the industry’s most advanced, open and extensible network operating system. EOS combines modern-day software and operating system (O/S) concepts including transparently restartable processes, open platform development, an unmodified Linux kernel, and a stateful, programmable publish/subscribe database model for switching state. The Arista EOS software framework guarantees consistent operations, workflow automation and high availability.

Figure 1: Arista EOS Architecture

Key advantages of using an unmodified Linux kernel include the following:
  • Retaining benefits from Linux community development including bug fixes, feature updates and security updates
  • Full Linux capabilities such as using standard tools right out of the box, installing additional tools through RPM packages, running third party Linux applications, and creating custom tools with Bash, Perl and Python
  • The ability to use the same Linux-based toolsets to manage network nodes as for server and compute nodes.
Arista EOS has a unique multi-process, state-sharing architecture that separates state information and packet forwarding from protocol processing and application logic. This modular architecture enables stateful fault isolation, stateful fault repair, security exploit containment as well as in-service software updates.

Arista EOS offers the following features that support automation:
  • Modular, state-sharing architecture that enables stateful fault isolation and fault repair
  • Single binary EOS image that can be deployed across any family of products. This improves the testing depth on each platform, reduces time-to-deployment, and keeps features and bug fixes consistent across all platforms.
  • Programmable at all layers: Linux kernel, hardware forwarding tables, virtual machine orchestration, switch configuration, provisioning automation, and advanced monitoring
  • Open Linux and EOS access with the flexibility and choice to provide authorized and secure access through TACACS+ & RADIUS AAA features

Provision a “Day One” Network

Scaling provisioning as the network grows is a challenge. Often manual configuration is used to provision the network. However, as the network grows, an increasing number of individuals are involved, and errors are often introduced in the coordination and communication of the process. Simultaneously, businesses are more reliant than ever on data and services being delivered from their data center, so data center outages have an even larger impact today. Automation of initial and ongoing provisioning and network monitoring are key strategies for reducing the human error component.

Arista Zero Touch Provisioning (ZTP)

A first step in automating the data center is the ability to provision an existing or new greenfield network quickly and programmatically. Arista EOS Zero Touch Provisioning (ZTP) automates the configuration of a new or replacement switch without user intervention and without requiring a network engineer with a serial console cable.

With ZTP, a switch loads its image and configuration from a centralized location within the network. Using standards based protocols (e.g. DHCP, T/FTP, HTTP), the network can be rapidly provisioned. Administrators can programmatically tailor boot configurations based on a variety of parameters, meeting the needs of even the most complex data center deployments.
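As an illustration of tailoring boot configurations by parameter, the following minimal sketch shows how a provisioning server might pick an image and configuration file for a booting switch. It is not Arista's implementation: the `SwitchFacts` fields, the rule format, and all file names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchFacts:
    """Attributes a provisioning server might learn about a booting switch (hypothetical)."""
    model: str
    serial: str
    rack: str


def select_boot_files(facts, rules, default):
    """Return the (image, config) pair of the first rule whose fields all match."""
    for match, files in rules:
        if all(getattr(facts, key) == value for key, value in match.items()):
            return files
    return default


# Hypothetical rules, ordered from most to least specific.
RULES = [
    ({"serial": "SN0001"}, ("EOS-a.swi", "spine1.cfg")),
    ({"model": "leaf-48", "rack": "r12"}, ("EOS-a.swi", "leaf-r12.cfg")),
]
DEFAULT = ("EOS-a.swi", "base.cfg")
```

Ordering the rules from most to least specific keeps the lookup easy to reason about: a switch with a known serial number gets its dedicated configuration, and anything unrecognized falls back to a base configuration.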

Figure 2: Zero Touch Provisioning for a new or replacement switch

ZTP automates the deployment of network switches such that it is simply a case of racking the switches, cabling them and powering them on. ZTP eliminates manual configuration for provisioning changes and operating system upgrades. Combined with other Arista capabilities, such as Arista EOS VM Tracer and automatic VLAN configuration, ZTP lets data center managers fully automate the bring-up of network elements and virtual servers.

Operational savings moving from manual to automated, ZTP-based provisioning for 10K ports

Operational Measures     Manual              Automated with ZTP
Time-to-Provision        2 to 3 days         15 minutes
Engineering Resources    2 to 3 engineers    1 engineer
% Errors                 10 to 20%           0%

Table #1

With ZTP, a single engineer can program the configuration updates. With manual configuration, several network engineers are required to roll out the changes within an acceptable time frame, and each manual change creates an opportunity to introduce an error. Automated provisioning reduces the staffing required, the time to deploy the change and the likelihood of mistakes.

Arista Zero Touch Replacement (ZTR)

An extension to ZTP, Zero Touch Replacement (ZTR) enables switches to be physically replaced, with the replacement switch picking up the same image and configuration as the switch it replaced. Switch identity and configuration are not tied to the switch MAC address but instead to the location in the network where the device is attached, based on LLDP information from neighboring devices. ZTR reduces time-to-restoration of service to the time it takes to rack a new switch, cable it and power it on, without any dependency on a network engineer’s availability to physically attach a serial console cable and configure the switch.
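The idea of tying identity to location rather than to a MAC address can be sketched as a lookup keyed on LLDP neighbor information. This is a simplified illustration, not the ZTR implementation: the function name, the `(neighbor, port)` key format, and the rule that all recognized neighbors must agree are assumptions made for the example.

```python
def identify_by_location(observed, location_map):
    """Resolve a switch identity from LLDP neighbors (hypothetical sketch).

    observed: iterable of (neighbor_hostname, neighbor_port) tuples that the
        replacement switch reports seeing over LLDP.
    location_map: dict mapping (neighbor_hostname, neighbor_port) to the
        identity of the switch expected at that attachment point.
    Returns the identity if every recognized neighbor agrees, else None.
    """
    candidates = {location_map[n] for n in observed if n in location_map}
    return candidates.pop() if len(candidates) == 1 else None
```

For instance, a replacement switch that sees `("spine1", "Ethernet3")` and `("spine2", "Ethernet3")` would resolve to the same identity as the unit it replaced, whatever its own MAC address or serial number. Returning `None` on disagreement is a conservative choice that guards against applying a configuration to a miscabled switch.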

Automate Daily Operations

Ongoing management of the data center network is the second area to focus on automating. With hundreds or thousands of compute, storage and network elements requiring maintenance and support, automation is the key to reducing ongoing operating expenses while enabling changes to be made quickly.

Arista EOS integrates with popular Linux-based tools for configuration and monitoring. Arista EOS has built-in tracer tools for monitoring and troubleshooting all aspects of the network, showing key linkages to the application layers. Arista EOS offers an API to the full CLI, Arista eAPI, that can be used to create custom tools and scripts. Lastly, the Smart System Upgrade (SSU) feature automates switch configuration changes and software updates.

Arista EOS DevOps Integration

Consistent Toolsets for Compute and Network Elements


In the modern data center, the compute infrastructure is often provisioned and managed with DevOps tools like Puppet and Chef. Data center IT teams want to simplify their operations by using the same Linux-based toolsets to manage network, compute and storage elements. With its unmodified Linux kernel, Arista EOS integrates with the rich ecosystem of Linux DevOps tools for management and workflow orchestration, including Puppet, Chef, Ansible, Splunk, Nagios and Ganglia.

Figure 3: Automation with Puppet and EOS

Traditionally, provisioning a new server meant waiting on a change ticket for a network administrator to add a VLAN at the Top-of-Rack (TOR) switch. With EOS’ DevOps integration, one combined network-server administrator can now use Puppet to make configuration changes on the network devices while the server is being provisioned.

Monitoring and Troubleshooting Automation

Arista EOS Network Tracers


Arista EOS tracer tools provide a new model for faster troubleshooting from fault detection to fault isolation. The tracers provide critical, real-time information from the network to the application to network operations. The tracers enable the network system to:
  • Proactively detect network issues
  • Automatically react to coordinated actions or take direction from other applications/infrastructures
  • Notify other elements or operations teams of changing conditions

Figure 4: Arista EOS - Network Tracers

Arista EOS provides network tracers for end-to-end visibility

Health Tracer – This is a suite of EOS agents that automatically and continuously monitor the health of the switch. Each agent proactively monitors the health status of a field-replaceable unit (e.g. fan, power, supervisor), automatically takes corrective action, and sends out appropriate alerts to ensure overall system visibility.

Path Tracer – This is a protocol-independent network monitoring and analysis tool that continuously and actively probes the network for packets that are lost, disordered or duplicated. Using this feature, proactive alerts can send notifications to network operations, initiate the execution of remedial scripts or even notify external controllers.
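To make the three conditions concrete, the sketch below classifies the results of a probe run from the sequence numbers that arrive at the receiver. It illustrates the general technique only and is not Path Tracer's algorithm; the function name and the assumption that probes carry sequence numbers `0` to `sent_count - 1` are hypothetical.

```python
def classify_probes(sent_count, received):
    """Count lost, reordered and duplicated probes (illustrative sketch).

    sent_count: number of probes sent, numbered 0 .. sent_count - 1.
    received: sequence numbers in the order they arrived.
    """
    seen = set()
    duplicated = 0
    reordered = 0
    highest = -1
    for seq in received:
        if seq in seen:
            duplicated += 1      # same probe arrived more than once
            continue
        seen.add(seq)
        if seq < highest:
            reordered += 1       # arrived after a later-numbered probe
        else:
            highest = seq
    lost = sent_count - len(seen)  # probes that never arrived
    return {"lost": lost, "reordered": reordered, "duplicated": duplicated}
```

With five probes sent and arrivals `[0, 1, 3, 2, 3]`, probe 2 is counted as reordered, the second copy of probe 3 as a duplicate, and probe 4 as lost. A monitoring script could raise an alert or start a remedial action when any counter is nonzero.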

VM Tracer – As virtualized data centers have grown in size, the physical and virtual networks that support them have also grown in size and complexity. Virtual machines connect through virtual switches and then to the physical infrastructure, adding a layer of abstraction and complexity. Server-side tools have emerged to help VMware administrators manage virtual machines and networks. However, equivalent tools to help the network administrator resolve conflicts between physical and virtual networks have not been available until now.

Arista VM Tracer provides this bridge by automatically discovering which physical servers are virtualized and their associated VLANs, through VMware vCenter APIs, and then automatically applying physical switch port configurations in real time with vMotion events.

This results in automated port configuration and VLAN database membership, and in the dynamic addition and removal of VLANs on trunk ports. VM Tracer extends to VXLAN architectures.
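The dynamic addition and removal of VLANs can be thought of as a set difference between what a trunk port currently carries and what the virtual machines behind it need. The following sketch shows that calculation only; the function name and data shapes are hypothetical, and it does not query vCenter or configure a switch.

```python
def trunk_vlan_changes(configured, vm_vlans):
    """Compute VLANs to add to and remove from a trunk port (illustrative sketch).

    configured: set of VLAN IDs currently allowed on the trunk port.
    vm_vlans: dict mapping each VM name on the attached host to the set of
        VLAN IDs that VM uses.
    Returns (to_add, to_remove) as sorted lists.
    """
    required = set().union(*vm_vlans.values())
    return sorted(required - configured), sorted(configured - required)
```

When a vMotion event moves a VM onto the host, recomputing with the updated `vm_vlans` yields the VLANs to add; when the last VM using a VLAN leaves, that VLAN appears in the removal list.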

Map Reduce Tracer – The Map Reduce tracer tracks Hadoop nodes and collects their activity statistics. The goal is to correlate congestion events with jobs running on the servers. The end result is to automatically trigger packet capture and proactively notify on a failed Hadoop node.

LANZ Tracer – Arista Latency Analyzer (LANZ) enables tracking of network congestion in real time before congestion causes performance issues. Today’s systems often detect congestion when someone complains, “The network seems slow.” The network team gets a trouble ticket, and upon inspection can see packet loss on critical interfaces. The best solution historically available to the network team has been to mirror the problematic port to a packet capture device and hope the congestion problem repeats itself.

Now, with LANZ’s proactive congestion detection and alerting capability, both human administrators and integrated applications can:
  • Preempt network conditions that induce latency or packet loss
  • Adapt application behavior based on prevailing conditions
  • Isolate potential bottlenecks early, enabling proactive capacity planning
  • Maintain forensic data for post-process correlation and back testing
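
One way an integrated application might consume congestion data is to turn queue-depth samples into start and end events using a high and a low threshold. The sketch below is an assumption-laden illustration of that threshold technique, not LANZ's internal logic: the function name, the sample format, and the threshold values in the usage note are all hypothetical.

```python
def congestion_events(samples, high, low):
    """Derive congestion start/end events from queue-depth samples (sketch).

    samples: iterable of (timestamp, queue_depth) pairs in time order.
    high: depth at or above which congestion is considered to start.
    low: depth at or below which congestion is considered to end.
    """
    events = []
    congested = False
    for timestamp, depth in samples:
        if not congested and depth >= high:
            congested = True
            events.append((timestamp, "start"))
        elif congested and depth <= low:
            congested = False
            events.append((timestamp, "end"))
    return events
```

For samples `[(0, 10), (1, 600), (2, 700), (3, 300), (4, 50)]` with `high=512` and `low=100`, the function reports a start at time 1 and an end at time 4. Using two thresholds avoids a flood of events when the queue depth hovers around a single limit.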

Custom Tools Through Arista EOS External API (eAPI)

Arista EOS programmatic interface eAPI allows applications and scripts to have complete programmatic control over EOS, with a stable and easy to use syntax. Once the API is enabled, the switch accepts commands using Arista CLI syntax and responds with machine-readable output and errors serialized in JSON, served over HTTP. The EOS eAPI has three major advantages:
  • Comprehensiveness: Arista eAPI gives access to the state and the ability to configure any property on the switch that is accessible with the CLI.
  • Ease-of-use and flexibility: The simplicity of this protocol and the availability of third party JSON clients means that eAPI is language agnostic and can be easily integrated into any existing infrastructure and workflows. Additionally, on-box, interactive documentation for the API and return values makes writing new programs simple.
  • Stability: Arista maintains API compatibility across multiple EOS versions. This allows end users to confidently develop critical applications without compromising their ability to upgrade to newer EOS releases and access new features, or to run data centers with multiple versions of EOS.
Figure 5: EOS eAPI – Network Automation & Programmability
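
As a rough sketch of what a client script might look like, the example below builds a request carrying CLI commands and unpacks a JSON reply using only the Python standard library. The request shape (a JSON-RPC 2.0 call to a `runCmds` method with `version`, `cmds` and `format` parameters) reflects how eAPI is commonly documented, but treat the field names as assumptions to verify against the on-box documentation; the helper names are hypothetical and no network connection is made here.

```python
import json


def build_eapi_request(cmds, request_id="1", fmt="json"):
    """Serialize a list of CLI commands into a JSON-RPC 2.0 request body.

    Field names are assumed from commonly published eAPI examples.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(cmds), "format": fmt},
        "id": request_id,
    })


def parse_eapi_response(body):
    """Return the list of per-command results, or raise on an error reply."""
    reply = json.loads(body)
    if "error" in reply:
        err = reply["error"]
        raise RuntimeError(f"eAPI error {err.get('code')}: {err.get('message')}")
    return reply["result"]
```

A script would send the request body in an HTTP POST to the switch's command API URL with appropriate credentials, then pass the response body to `parse_eapi_response`. Because the payload is plain JSON, the same exchange can be written in any language with a JSON library.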

Network Upgrade Automation with Smart System Upgrade

Deploying and taking advantage of new technology is top of mind for most organizations. Balancing the business benefits of adopting a rapid pace of innovation with the associated risks is a constant struggle. A major inhibitor to technology adoption is the ability to transparently insert new technologies into existing facilities without adversely impacting critical applications. Smart System Upgrade (SSU) is a network application designed to address data center network maintenance—software upgrades and configuration changes—with minimal service disruption.

Figure 6: Arista Spine/Leaf SSU – Hitless Upgrade

The intent of SSU is to allow maintenance to be performed on any infrastructure element, without adversely impacting application traffic flow. Combining native Arista EOS functionality and direct integration with other applications and infrastructure components, SSU allows a network element to be transparently removed or added. Designed to be a complete solution for data center infrastructure maintenance, Arista’s SSU provides the following key benefits:
  • Intelligent insertion and removal of network elements
  • Programmatic upgrade to new software releases without causing systemic outages
  • Open integration with all application and infrastructure elements
Data center operations teams need more intelligent tools and extensible feature sets to manage today’s “always on” data center infrastructures. Arista EOS provides the foundation for innovation, driving down operational cost while simultaneously increasing operating uptime.

Automation for Network Virtualization

In addition to automated day-to-day management and monitoring, some companies are automating the entire workflow process for dynamic placement of workloads with end-to-end network virtualization. Arista EOS open architecture integrates with any virtualization and orchestration system, including VMware NSX, OpenStack Neutron and Microsoft SCVMM. With Arista EOS-based virtualization, workloads can be portable while preserving their addressing and policies, simplifying scale-out and workload placement within the data center.

Arista EOS provides automated provisioning of, and visibility into, both virtual and physical cloud networks through open controller integration and hardware-based VXLAN support on Arista platforms. Provisioning and orchestration with Arista EOS works with any native hypervisor and addresses the challenge of workflow integration with capabilities such as DANZ for real-time congestion management, VM Tracer to expose virtual and physical connectivity, and sFlow to gather traffic statistics for the VXLAN overlay. Monitoring and visibility into these workflows play a pivotal role in simplifying cloud network automation and operations.

Figure 7: Arista Network Application - OpenWorkload

Benefits of workload mobility with Arista EOS include:
  • Seamless Scaling – full support for network virtualization, connecting to major Software Defined Networking (SDN) controllers
  • Integrated Orchestration – interfaces to VMware, OpenStack, Plumgrid for provisioning
  • Workflow Visibility – visibility to the VM-level with VM Tracer, enabling portable policies, persistent monitoring, and rapid troubleshooting of cloud networks.
  • Combined Infrastructure and Application Visibility—Data about network state, including underlay and overlay network statistics can be sent to third party monitoring applications such as Splunk, ExtraHop, Corvil and Riverbed. With critical infrastructure information exposed to the application layer, issues can be proactively avoided.

Summary

Shifting spending from IT operations to innovation and increasing responsiveness to changing business needs are key goals for every CIO. The only way for enterprises and service providers to obtain a substantial reduction in operational costs is to automate their network environments. Traditionally, enterprises have been shackled to closed or hybrid open network operating systems with little to no capability for automating provisioning and day-to-day operations. Arista EOS changes the equation.

Arista EOS is truly an open, programmable, next generation network operating system. With Arista EOS, a data center can be fully automated for provisioning and for managing ongoing changes and troubleshooting day-to-day issues. Through automation, IT can achieve operational savings while increasing its agility, even as the network scales.

Copyright © 2017 Arista Networks, Inc. All rights reserved. CloudVision, and EOS are registered trademarks and Arista Networks is a trademark of Arista Networks, Inc. All other company names are trademarks of their respective holders. Information in this document is subject to change without notice. Certain features may not yet be available. Arista Networks, Inc. assumes no responsibility for any errors that may appear in this document.   02-0031-01