Evolutionary Designs for Cloud Networking

Jayshree Ullal

As 2009 begins, I have been thinking about how stale and stagnant networking designs have been for the past decade, built on the classic three-layer topology of core, distribution/aggregation and access. However, I do sense an architectural shift in the data center, where cloud container environments are changing the traditional constructs. Guiding this new cloud transformation is one of my major New Year’s resolutions. I think it may be more achievable than my usual and predictable weight-loss goal!

Access Layer is Changing

The access layer of the enterprise network is being subsumed into the hosts, whether in the form of virtual switches for virtual machines, wireless access points for untethered communities, or blade switches inside servers. Network performance in a cloud requires non-blocking bandwidth and predictable latency, and designs are emerging that are aware of application flows rather than relying on static addressing of devices. Cloud Networking™ is coming of age and can be built in an evolutionary manner that coexists with today’s networks. It promises greater scalability and optimization, avoids application silos, and brings closer ties to compute and storage elements.

Cloud Performance Uniformity is Key

Unlike past client-server designs predicated on well-defined web (256 KB), mail (1 MB) or file transfers (10 MB), the new clouds in the data center require uniform performance metrics. Modern applications such as market data feeds, high-definition video and content delivery, the movement of large numbers of virtual machines, storage and file-based systems, web, and large-scale data analytics demand this. A key aspect of a Cloud Networking™ platform is uniformity of performance, which enables applications to scale out across physical and virtual machines. There must be equal amounts of non-blocking bandwidth and predictable latency between all nodes. This is clearly a radical departure from today’s oversubscribed networks, in which queuing delays and high transit latency are inherent.

Multi-core processors are demanding ever more bandwidth from the network. Uniform performance, balanced across terabit scalability, predictable low latency, non-blocking throughput, and high-speed interconnects driving multiple 10GE links (40/100GE in the future), is essential in cloud environments. Cycles are wasted while an application waits for data to show up. Increased latency can come directly from an oversubscribed network, or indirectly from waiting on another cloud or application process. Legacy three-tier, oversubscribed designs most certainly contribute to performance degradation and inefficient use of compute resources.
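
To make the cost of oversubscription concrete, here is a minimal Python sketch. The port counts, tier layout and function names are all hypothetical, chosen only for illustration. It assumes the worst case, in which every server sends across the core at once, so the per-tier ratios multiply.

```python
from fractions import Fraction


def oversubscription(downlink_gbps, uplink_gbps):
    """Server-facing capacity divided by uplink capacity for one tier (1 = non-blocking)."""
    return Fraction(downlink_gbps, uplink_gbps)


def end_to_end(*tier_ratios):
    """Worst-case ratio across tiers: the per-tier ratios multiply."""
    total = Fraction(1)
    for ratio in tier_ratios:
        total *= ratio
    return total


def worst_case_gbps(nic_gbps, ratio):
    """Bandwidth left per server when every server transmits at once."""
    return float(Fraction(nic_gbps) / ratio)


# Hypothetical three-tier design (numbers are assumptions, not from any product):
access = oversubscription(48 * 1, 2 * 10)        # 48 x 1GE down, 2 x 10GE up
aggregation = oversubscription(24 * 10, 6 * 10)  # 24 x 10GE down, 6 x 10GE up
three_tier = end_to_end(access, aggregation)

# Hypothetical non-blocking leaf: as much uplink as server-facing capacity.
two_tier = end_to_end(oversubscription(24 * 10, 24 * 10))

print(f"three-tier: {float(three_tier):.1f}:1, "
      f"{worst_case_gbps(1, three_tier):.3f} Gbps per 1GE server")
print(f"two-tier:   {float(two_tier):.1f}:1, "
      f"{worst_case_gbps(10, two_tier):.1f} Gbps per 10GE server")
```

With these assumed numbers, the three-tier design compounds to 9.6:1 (2.4:1 at access times 4:1 at aggregation), while the non-blocking design stays at 1:1 and each server keeps its full line rate.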

Introducing Cloud Leaf and Spine

In the data center, there is an architectural migration to two-tiered cloud networking designs. The main building blocks are Cloud Leaves (CL) and Cloud Spines (CS). Cloud Spines forward traffic along optimal paths between nodes at Layer 2 or Layer 3, while Cloud Leaves control the flow of traffic between servers. In a data center, servers are centralized and connected to switches for uncompromised performance and high resilience. Cross-sectional interconnect bandwidth can be improved through link aggregation groups (LAG) of 10GE, as well as through multi-pathing between leaves and spines using L2 methods or L3 equal-cost multipath (ECMP).
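
As a rough illustration of L3 ECMP between leaves and spines, the sketch below hashes each flow's 5-tuple to choose one of several equal-cost spines. It is a simplified model, not how any particular switch implements it: the spine names and synthetic flows are hypothetical, and CRC32 stands in for whatever hash function real hardware uses.

```python
import zlib
from collections import Counter


def ecmp_next_hop(flow, spines):
    """Pick a spine by hashing the flow's 5-tuple, so one flow always takes one path."""
    key = "|".join(str(field) for field in flow).encode()
    return spines[zlib.crc32(key) % len(spines)]


spines = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical spine names

# Synthetic flows: (src_ip, dst_ip, protocol, src_port, dst_port)
flows = [
    (f"10.0.0.{i % 250 + 1}", f"10.0.1.{i % 200 + 1}", 6, 1024 + i, 80)
    for i in range(4000)
]

# Count how many flows land on each spine.
load = Counter(ecmp_next_hop(flow, spines) for flow in flows)
for spine in spines:
    print(spine, load[spine])
```

Hashing per flow rather than per packet keeps the packets of a flow in order on a single path, while many flows together spread across all spines.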

This two-tiered Cloud Leaf and Spine architecture scales from hundreds to more than 10,000 servers with heavy traffic and application workflows. At the spine, routing between the nodes that exchange the most traffic is desired. At the leaf, line-rate performance that enables scale-out application deployments is highly desirable. The notion of equal access to bandwidth and resources, without complex provisioning tools or processes, is key.
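
The scaling arithmetic behind a two-tier design can be sketched as follows. This is a back-of-the-envelope model under stated assumptions: every leaf has one uplink to every spine, all ports run at the same speed, and the port counts are hypothetical rather than taken from any product.

```python
def size_fabric(leaf_ports, spine_ports, uplinks_per_leaf):
    """Size a two-tier fabric in which each leaf has one uplink to every spine.

    Assumes all ports run at the same speed.
    """
    if not 0 < uplinks_per_leaf < leaf_ports:
        raise ValueError("uplinks_per_leaf must leave at least one server port")
    server_ports = leaf_ports - uplinks_per_leaf
    return {
        "spines": uplinks_per_leaf,             # one spine per leaf uplink
        "leaves": spine_ports,                  # one spine port per leaf
        "servers": spine_ports * server_ports,  # server ports across all leaves
        "oversubscription": server_ports / uplinks_per_leaf,
    }


# Hypothetical port counts, for illustration only.
small = size_fabric(leaf_ports=48, spine_ports=48, uplinks_per_leaf=24)
large = size_fabric(leaf_ports=48, spine_ports=384, uplinks_per_leaf=24)
print(small)
print(large)
```

With these assumed port counts, the same non-blocking leaf design grows from 1,152 to 9,216 servers simply by using larger spines, with the oversubscription ratio staying at 1:1.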

It is a myth that oversubscription reduces price per port, since modern advances in silicon make it possible to offer the same price/performance for wire-speed ports. In current oversubscribed topologies, one has to pay excessive operational attention to how servers and switches are connected, sometimes even engineering a specific application flow, to avoid transit delays. By building uniform network performance with Arista 71XX switches, one creates seamless any-to-any port access without tedious attention to switch hops and network traffic optimizations.

Virtualization Considerations in a Cloud Network

The virtualization considerations in a cloud network center around policy and control. Virtualization is being reassessed in terms of its role in the cloud itself. With deployments of virtual machines, the network’s awareness of virtual hosts and virtual-physical portability across servers ensures consistent policy and performance regardless of network state or location.

The ability of network devices to adapt to changes in the virtual environment through open, direct access is a dramatic departure from today’s closed networks. Cloud applications will be written using fine-grained modularization and distributed intelligence, with common sets of APIs for instrumentation and management. Arista’s EOS (Extensible Operating System) provides smooth integration into this strategy. Stay tuned for more on extensibility in my future blogs.

2009 is definitely going to be an exciting year for cloud paradigm shifts! I am reminded of a modification of an old saying: “Route at the spine and switch at the leaf,” which is precisely what the industry did with routers and hubs two decades ago! I am excited by the evolution ahead and hope it makes you all rethink your classical networking designs.
I welcome your comments at feedback@arista.com.

Check out these interesting new sites:

Facebook Cloud Group by Nick Lippis
New White Papers on Cloud Fundamentals and Cloud Case Studies