Internet Engineering Task Force                                T. Narten
Internet-Draft                                                       IBM
Intended status: Informational                             June 03, 2013
Expires: December 05, 2013


              An Architecture for Overlay Networks (NVO3)
                       draft-narten-nvo3-arch-00

Abstract

   This document presents a high-level overview of a possible
   architecture for building overlay networks in NVO3.  The
   architecture is given at a high level, showing the major components
   of an overall system.  An important goal is to divide the space into
   smaller, individual components that can be implemented independently
   and with clear interfaces and interactions with other components.
   It should be possible to build and implement individual components
   in isolation and have them interoperate with other components
   without requiring changes to those components.  That way,
   implementers have flexibility in implementing individual components
   and can optimize and innovate within their respective components
   without necessarily requiring changes to other components.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 05, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
     3.1.  VM Orchestration Systems
   4.  Network Virtualization Edge (NVE)
     4.1.  NVE Co-located With Server Hypervisor
     4.2.  Overlay-Aware Data Appliances
     4.3.  Bare Metal Servers
     4.4.  Hardware Gateways
     4.5.  Split-NVE
   5.  Network Virtualization Authority
     5.1.  How an NVA Obtains Information
     5.2.  Intra-NVA Control Protocol
   6.  NVE-to-NVA Protocol
     6.1.  NVE-NVA Interaction Models
     6.2.  Direct NVE-NVA Protocol
     6.3.  Push vs. Pull Model
   7.  Federated NVAs
     7.1.  Inter-NVA Peering
   8.  Control Protocol Summary
   9.  NVO3 Data Plane Encapsulation
   10. Summary
   11. IANA Considerations
   12. Security Considerations
   13. Informative References
   Author's Address

1.  Introduction

   This document presents a high-level overview of a possible
   architecture for building overlay networks in NVO3.  The
   architecture is given at a high level, showing the major components
   of an overall system.  An important goal is to divide the space into
   smaller, individual components that can be implemented independently
   and with clear interfaces and interactions with other components.
   It should be possible to build and implement individual components
   in isolation and have them interoperate with other components
   without requiring changes to those components.  That way,
   implementers have flexibility in implementing individual components
   and can optimize and innovate within their respective components
   without necessarily requiring changes to other components.

   The motivation for overlay networks is given in
   [I-D.ietf-nvo3-overlay-problem-statement].  "Framework for DC
   Network Virtualization" [I-D.ietf-nvo3-framework] provides a
   framework for discussing overlay networks generally and the various
   components that must work together in building such systems.  This
   document differs from the framework document in that it doesn't
   attempt to cover all possible approaches within the general design
   space.  Rather, it describes one particular approach.  This document
   is intended to be a concrete strawman that can be used for
   discussion within the IETF NVO3 WG on what the NVO3 architecture
   should look like.

2.  Terminology

   This document uses the same terminology as
   [I-D.ietf-nvo3-framework].  In addition, the following terms are
   used:

   NVA Domain  A Network Virtualization Authority Domain is an
      administrative construct that defines an NVA and the set of NVEs
      that are associated with it.  NVEs are associated with a single
      NVA.

   NV Domain  A Network Virtualization Domain is an administrative
      construct that defines a set of virtual networks that a given NVA
      manages.  An NVE associated with a specific NVA domain supports
      connectivity to any virtual network within that NVA's NV Domain.

   NV Region  A set of two or more NVA domains that share all or part
      of an NV Domain.  Two NVAs can share information about particular
      virtual networks for the purpose of supporting connectivity
      between tenants located at different NVAs.  NVAs can share
      information about an entire NV Domain, or just individual virtual
      networks.
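
   To make the relationships among these constructs concrete, the
   following sketch models them as simple data structures.  This is
   purely illustrative: the Python class and field names are
   hypothetical and are not defined by this architecture.

   # Illustrative only: a hypothetical data model of the
   # administrative constructs defined above.
   from dataclasses import dataclass, field
   from typing import Set

   @dataclass
   class NVDomain:
       # The set of virtual networks (identified here simply by name)
       # that a single NVA manages.
       virtual_networks: Set[str] = field(default_factory=set)

   @dataclass
   class NVADomain:
       # One NVA and the NVEs associated with it; each NVE is
       # associated with exactly one NVA.
       nva_name: str
       nves: Set[str] = field(default_factory=set)
       nv_domain: NVDomain = field(default_factory=NVDomain)

   @dataclass
   class NVRegion:
       # Two or more NVA Domains sharing all or part of an NV Domain.
       members: Set[str] = field(default_factory=set)     # NVA Domain names
       shared_vns: Set[str] = field(default_factory=set)  # shared virtual networks
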
3.  Background

   Overlay networks provide networking service to a set of Tenant
   Systems (TSs) [I-D.ietf-nvo3-framework].  Tenant Systems connect to
   Virtual Networks (VNs), with the VN's attributes defining aspects of
   the network, including the set of members belonging to that specific
   virtual network.  Tenant Systems connected to a virtual network
   communicate freely with other Tenant Systems on the same VN, but
   communication between Tenant Systems on one VN and those on another
   VN, or with systems not connected to any VN, is carefully restricted
   and governed by policy.

   A Virtual Network provides either L2 or L3 service to connected
   tenants.  For L2 service, VNs transport Ethernet frames, and a
   Tenant System is provided with a service that is analogous to being
   connected to a specific L2 C-VLAN.  L2 broadcast frames are
   delivered to all (and multicast frames delivered to a subset of) the
   other Tenant Systems on the VN.  To a Tenant System, it appears as
   if it were connected to a regular L2 Ethernet link.  Within NVO3,
   tenant frames are tunneled to remote NVEs based on the MAC addresses
   of the frame headers as originated by the Tenant System.  On the
   underlay, NVO3 packets are forwarded between NVEs based on the outer
   addresses of tunneled packets.

   For L3 service, a Tenant System still connects to the network via an
   L2 Ethernet link, but all traffic to and from the Tenant System is
   assumed to be IP.  The L2 headers are only used to provide backwards
   compatibility, so that unmodified Tenant Systems can operate
   unchanged when using NVO3.  Within NVO3, tenant frames are tunneled
   to remote NVEs based on the IP addresses of the packet originated by
   the Tenant System; the L2 destination addresses provided by Tenant
   Systems are effectively ignored.

   It is important to note that whether NVO3 provides L2 or L3 service
   to a Tenant System, the Tenant end point does not need to be aware
   of the distinction.  The Tenant System still connects to an NVO3
   network via an L2 link.  L2 service is intended for systems that
   need native L2 Ethernet service and the ability to run protocols
   directly over Ethernet (i.e., not based on IP).  L3 service is
   intended for systems in which all the traffic can safely be assumed
   to be IP.

3.1.  VM Orchestration Systems

   VM Orchestration systems manage server virtualization across a set
   of servers.  When a new VM image is started, the VM Orchestration
   system determines where the VM should be placed, interacts with the
   hypervisor on the target server to load and start the VM, and
   controls when a VM should be shut down or migrated elsewhere.

   VM orchestration systems have global knowledge of the domain they
   manage.  They know on what servers a VM is running and what
   addresses (MAC and IP) the VMs are using, as well as other meta-data
   associated with a particular VM image.  VM orchestration systems run
   a protocol with an agent running on the hypervisor of the servers
   they manage.  Example VM orchestration systems in use today include
   VMware's vCenter and Microsoft's System Center.  The protocols used
   between the VM orchestration system and hypervisors are proprietary.

4.  Network Virtualization Edge (NVE)

   As described in [I-D.ietf-nvo3-framework], a Network Virtualization
   Edge (NVE) is the entity that resides at the boundary between a
   Tenant System and the overlay network and implements the overlay
   functionality.  Towards the Tenant System, the NVE provides L2 (or
   L3) service.  Towards the data center network, the NVE sends and
   receives native IP traffic.

   When ingressing traffic from a Tenant System, the NVE identifies the
   egress NVE to which the packet should be sent, adds an overlay
   encapsulation header, and sends the packet on the underlay network.
   When egressing traffic, an NVE receives an encapsulated packet from
   a remote NVE via the underlay network, strips off the encapsulation
   header, and delivers the (original) packet to the appropriate Tenant
   System.
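
   The ingress behavior described above, including the difference in
   lookup key between L2 and L3 service (Section 3), can be sketched as
   follows.  This is a minimal illustration only; the frame fields,
   mapping table layout, and helper functions are hypothetical, and how
   the table is populated is the subject of Sections 5 and 6.

   # Minimal illustrative sketch of the NVE ingress path described
   # above.  Nothing here is a defined NVO3 interface.
   def nve_ingress(frame, vn_name, service_type, mapping_table,
                   encapsulate, send_underlay):
       # For L2 service the tunnel lookup key is the tenant frame's
       # destination MAC; for L3 service it is the inner destination
       # IP (the tenant-supplied L2 destination is effectively
       # ignored).
       key = frame.dst_mac if service_type == "L2" else frame.dst_ip

       entry = mapping_table.get((vn_name, key))
       if entry is None:
           # What happens on a miss depends on the push vs. pull model
           # discussed in Section 6.3 (e.g., query the NVA, or drop).
           return None

       egress_nve_ip, vn_context = entry
       packet = encapsulate(frame, vn_context, outer_dst=egress_nve_ip)
       send_underlay(packet)
       return packet
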
   Conceptually, the NVE is a single entity implementing the NVO3
   functionality.  An NVE will have two external interfaces:

   Tenant Facing:  On the tenant-facing side, an NVE interacts with the
      Tenant System to provide the NVO3 service.  An NVE will need to
      learn when a Tenant System "attaches" to a virtual network (so it
      can validate the request and set up any state needed to send and
      receive traffic on behalf of the Tenant System on that VN).
      Likewise, an NVE will need to be informed when the Tenant System
      "detaches" from the virtual network so that it can reclaim state
      and resources appropriately.

   DCN Facing:  On the data center network facing side, an NVE
      interfaces with the data center underlay network, sending and
      receiving IP packets to and from the underlay.

4.1.  NVE Co-located With Server Hypervisor

   When server virtualization is used, the entire NVE functionality
   will typically be implemented as part of the hypervisor and/or
   vSwitch on the server.  In such cases, the Tenant System interacts
   with the hypervisor and the hypervisor interacts with the NVE.
   Because the hypervisor-NVE interaction is implemented entirely in
   software on the server, there is no "on-the-wire" protocol between
   Tenant Systems (or the hypervisor) and the NVE that needs to be
   standardized.  While there may be APIs between the NVE and
   hypervisor to support necessary interaction, the details of such an
   API are not in scope for the IETF to work on.

   Implementing NVE functionality entirely on a server has the
   disadvantage that precious server CPU resources must be spent
   implementing the NVO3 functionality.  Experimentation with overlay
   approaches suggests that offloading at least the encapsulation and
   decapsulation operations an NVE implements can produce significant
   performance improvements.  As has been done with checksum and/or TCP
   offload and other optimization approaches, there may be benefits to
   offloading common operations onto adaptors where possible.  For
   server systems, such offloading is an implementation matter between
   server and adaptor vendors and does not require any IETF
   standardization.

4.2.  Overlay-Aware Data Appliances

   Some data appliances (virtual or physical) provide tenant-aware
   services.  That is, the specific service they provide depends on the
   identity of the tenant making use of the service.  For example,
   firewalls are now becoming available that support multi-tenancy,
   where a single firewall provides virtual firewall service on a per-
   tenant basis, using per-tenant configuration rules.  Such appliances
   will be aware of the VN an activity corresponds to while processing
   requests.  Unlike server virtualization, which shields VMs from
   needing to know about multi-tenancy, a data appliance may be aware
   of and exploit multi-tenancy.  In such cases, the NVE is implemented
   within the data appliance.  Unlike server virtualization, however,
   the data appliance will not be running a traditional hypervisor and
   the VM Orchestration system will not be able to interact with the
   data appliance.
   The NVE on such data appliances will need to support a control plane
   to obtain the information needed to fully participate in an NVO3
   Domain.

4.3.  Bare Metal Servers

   Many data centers will continue to have at least some servers
   operating as non-virtualized (or "bare metal") machines running a
   traditional operating system and workload.  In such systems, there
   will be no NVE functionality on the server, and the server will have
   no knowledge of NVO3 (including whether overlays are even in use).
   In such environments, the NVE functionality can reside on the
   first-hop physical switch that understands NVO3.  In such a case,
   the network administrator would (manually) configure the switch to
   enable the appropriate NVO3 functionality on the network port that
   connects to the server.  Such configuration would typically be
   static, since the server is not virtualized and, once configured, is
   unlikely to change frequently.  Consequently, this scenario does not
   require any protocol or standards work.

4.4.  Hardware Gateways

   Gateways on VNs relay traffic onto and off of a virtual network.
   Tenant Systems use gateways to reach destinations outside of the VN.
   Gateways receive encapsulated traffic from one VN, remove the
   encapsulation header, and send the native packet out onto the data
   center network for delivery.  Outside traffic enters a VN in the
   reverse manner.

   For performance reasons, standalone hardware gateways may be
   required.  Such gateways could consist of a simple switch forwarding
   traffic from a VN onto the local data center network, or may embed
   router functionality.  Such gateways will support an embedded NVE
   associated with the interfaces connected to VNs.  As in the case of
   data appliances, gateways will not support a hypervisor and will
   need an appropriate control plane to obtain the information needed
   to provide NVO3 service.

4.5.  Split-NVE

   One final possible scenario leads to the need for a split-NVE
   implementation.  A hypervisor running on a server could be aware
   that NVO3 is in use, but have some of the actual NVO3 functionality
   implemented on an adjacent switch to which the server is attached.
   While one could imagine a number of link types between a server and
   the NVE, the simplest deployment scenario would have the server and
   NVE separated by a simple L2 Ethernet link, across which LLDP runs.
   A more complicated scenario would have the server and NVE separated
   by a bridged access network, such as when the NVE resides on a ToR
   switch but an embedded switch resides between the server and the
   ToR; such a scenario should be considered only if a compelling use
   case emerges.

   For the split-NVE case, protocols will be needed that allow the
   hypervisor and NVE to negotiate and set up the necessary state so
   that traffic sent across the access link between the server and the
   NVE can be associated with the correct virtual network instance.
   Specifically, on the access link, traffic belonging to a specific
   Tenant System would be tagged with a specific VLAN C-TAG that
   identifies which specific NVO3 virtual network instance it belongs
   to.  The hypervisor-NVE protocol would negotiate which VLAN C-TAG to
   use for a particular virtual network instance.  More details of the
   protocol requirements for this functionality can be found in
   [I-D.kreeger-nvo3-hypervisor-nve-cp].
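
   The state that both ends of the access link must share can be
   pictured as a simple mapping table.  The sketch below is
   hypothetical (the class and method names are not defined by any
   protocol); it only illustrates the C-TAG-to-VN-instance association
   that the hypervisor-NVE protocol would negotiate.

   # Hypothetical sketch of the per-virtual-network VLAN C-TAG mapping
   # a hypervisor and an external NVE would need to agree on in the
   # split-NVE case.  The negotiation protocol itself is out of scope
   # here (see [I-D.kreeger-nvo3-hypervisor-nve-cp]).
   class SplitNveAccessMap:
       def __init__(self):
           self._vid_by_vn = {}  # VN instance -> C-TAG VID on the access link
           self._vn_by_vid = {}  # C-TAG VID -> VN instance

       def assign(self, vn_instance, vid):
           # Record the agreed C-TAG (1-4094) for a VN instance.
           if not 1 <= vid <= 4094:
               raise ValueError("invalid C-TAG VID")
           self._vid_by_vn[vn_instance] = vid
           self._vn_by_vid[vid] = vn_instance

       def vn_for_frame(self, ctag_vid):
           # On the NVE side, map a tagged frame back to its VN instance.
           return self._vn_by_vid.get(ctag_vid)
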
5.  Network Virtualization Authority

   Address dissemination refers to the process of learning, building,
   and distributing the mapping/forwarding information that NVEs need
   in order to tunnel traffic to each other on behalf of communicating
   Tenant Systems.  Before sending traffic to and receiving traffic
   from a virtual network, the NVE must obtain the information needed
   to build its internal forwarding tables and state.

   For unicast traffic, such information includes knowing the location
   of the NVE where the target VM currently resides.  For tenant
   multicast traffic, the information needed depends on how multicast
   traffic is delivered.  If the underlay network supports IP
   multicast, tenant multicast traffic could be mapped into an underlay
   multicast address for delivery using the native multicast delivery
   capabilities of the underlay network.  In such cases, the NVE would
   need to know which underlay multicast address to associate with a
   particular VN and tenant multicast address.  For small groups, or on
   underlay networks that do not support IP multicast, an NVE could use
   serial unicast to deliver traffic.  In such cases, the NVE needs to
   know which destination NVEs have listeners for a particular tenant
   multicast address.

   In addition to mapping information, the NVE will also need to know
   what encapsulation header to use (in the case that there are
   choices), what VN Context to associate with a given VN, possibly
   quality-of-service (QoS) settings, etc.  An NVE obtains such
   information from a Network Virtualization Authority.

   The Network Virtualization Authority (NVA) is the entity that
   provides address mapping and other information to NVEs.  NVEs
   interact with an NVA to obtain any required address mapping
   information they need in order to properly forward traffic on behalf
   of tenants.  The term NVA refers to the overall system, without
   regard to its scope or how it is implemented.  NVAs provide a
   service, and NVEs access that service via an NVE-to-NVA protocol.

5.1.  How an NVA Obtains Information

   There are two primary ways in which an NVA can obtain the address
   dissemination information it manages.  On virtualized systems, the
   NVA can obtain the information associated with VMs from the VM
   orchestration system itself.  Since the VM orchestration system is
   effectively a master database for all the virtualization
   information, having the NVA obtain information directly from the
   orchestration system would be a natural approach.  Indeed, the NVA
   could effectively be co-located with the VM orchestration system
   itself.

   However, not all NVEs are associated with hypervisors.  NVAs will
   also need to peer directly with NVEs not associated with virtualized
   systems in order to obtain information about the TSes connected to
   that NVE and to distribute information about the VNs those TSes are
   associated with.  For example, whenever a Tenant System connects to
   an NVE, that NVE would notify the NVA that the TS is now associated
   with that NVE.  Likewise, when a TS detaches from an NVE, that NVE
   would inform the NVA.  By communicating directly with NVEs, both the
   NVA and the NVE are able to maintain up-to-date information about
   all active tenants and the NVEs to which they are attached.
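
   The attach/detach bookkeeping described above amounts to the NVA
   maintaining a table of tenant-to-NVE bindings.  The following is a
   minimal, purely illustrative sketch of that state; the class and
   method names are hypothetical, and the actual NVE-to-NVA protocol is
   discussed in Section 6.

   # Illustrative sketch of NVA-side bookkeeping driven by NVE
   # attach/detach reports.
   class NvaDirectory:
       def __init__(self):
           # (vn, tenant address) -> NVE underlay address
           self._bindings = {}

       def ts_attached(self, vn, tenant_addr, nve_addr):
           # Called when an NVE reports that a Tenant System attached.
           self._bindings[(vn, tenant_addr)] = nve_addr

       def ts_detached(self, vn, tenant_addr):
           # Called when an NVE reports that a Tenant System detached.
           self._bindings.pop((vn, tenant_addr), None)

       def lookup(self, vn, tenant_addr):
           # Used to answer NVE mapping queries (pull model) or to
           # drive proactive updates (push model); see Section 6.3.
           return self._bindings.get((vn, tenant_addr))
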
5.2.  Intra-NVA Control Protocol

   To avoid single points of failure, an NVA would be implemented in a
   distributed or replicated manner, but the internal details of an NVA
   implementation are not visible to NVEs.  How the NVA is implemented
   is not important to an NVE so long as it provides a consistent and
   standard interface to the NVE.  For example, an NVA could be
   implemented via database techniques whereby a server stores address
   mapping information in a traditional database.  Alternatively, an
   NVA could be implemented in a distributed fashion using an existing
   (or modified) routing protocol to distribute mappings.  So long as
   there is a clear interface between the NVE and NVA, how the NVA is
   architected and implemented is not important to an NVE.

   A number of architectural approaches could be used to implement
   local NVAs themselves.  NVAs manage address bindings and distribute
   them to where they need to go.  One approach would be to use BGP
   (possibly with extensions) and route reflectors.  Another approach
   could use a transaction-based database model with replicated
   servers.  Because the implementation details are local to an NVA,
   there is no need to pick exactly one solution technology, so long as
   the external interfaces to the NVEs (and remote NVAs) are
   sufficiently well defined to achieve interoperability.

6.  NVE-to-NVA Protocol

6.1.  NVE-NVA Interaction Models

   An NVE interacts with an NVA in at least two (quite different) ways:

   o  NVEs supporting VMs and hypervisors can obtain necessary
      information entirely through the hypervisor-facing side of the
      NVE.  Such an approach is a natural extension to existing VM
      orchestration systems supporting server virtualization because an
      existing protocol between the hypervisor and VM Orchestration
      system already exists and can be leveraged to obtain any needed
      information.  Specifically, VM orchestration systems used to
      create, terminate, and migrate VMs already have well-defined
      (though typically proprietary) protocols to handle the
      interactions between the hypervisor and VM orchestration system.
      For such systems, it would be a natural extension to leverage the
      existing orchestration protocol as a sort of proxy protocol for
      handling the interactions between an NVE and the NVA.

   o  Alternatively, an NVE can obtain needed information by
      interacting with the NVA directly via a protocol operating over
      the data center underlay network.  Such an approach is needed for
      the case where the NVE is not associated with server
      virtualization (e.g., is a standalone gateway) or where the NVE
      needs to communicate directly with the NVA for other reasons.

   The NVO3 architecture should support both of the above models, and
   indeed, it is possible that both models could be used
   simultaneously.  Existing virtualization environments will use (and
   already are using) the first model.  But they are not sufficient to
   cover the case of standalone gateways -- such gateways do not
   support virtualization and do not interface with existing VM
   orchestration systems.  Also, a hybrid approach might be desirable
   in some cases, where the first model is used to obtain the
   information, but the latter approach is used to validate and further
   authenticate the information before using it.

6.2.  Direct NVE-NVA Protocol

   An NVE can interact directly with an NVA via a dedicated NVE-to-NVA
   protocol.  Using a dedicated protocol provides architectural
   separation and independence between the NVE and NVA.
   The NVE and NVA interact in a well-defined way, and changes in the
   NVA (or NVE) do not need to impact the other.  Using a dedicated
   protocol also ensures that both NVE and NVA implementations can
   evolve independently and without dependencies on each other.  Such
   independence is important because the upgrade path for NVEs and NVAs
   is quite different.  Upgrading all the NVEs at a site will likely be
   more difficult in practice than upgrading NVAs because of their
   large number -- one on each end device.  In practice, it is assumed
   that an NVE will be implemented once, and then (hopefully) not
   again, whereas an NVA (and its associated protocols) is more likely
   to evolve over time as experience is gained from usage.

   Requirements for a direct NVE-NVA protocol can be found in
   [I-D.kreeger-nvo3-overlay-cp].

6.3.  Push vs. Pull Model

   There has been discussion within NVO3 about a "push vs. pull"
   approach for NVE-to-NVA interaction.  In the push model, the NVA
   would push address binding information to the NVE.  Since the NVA
   has current knowledge of which NVE each Tenant System is connected
   to, the NVA can proactively push updates out to the NVEs when they
   occur.  With a push model, the NVE can be more passive, relying on
   the NVA to ensure that an NVE always has the most current
   information.  The push model has the benefit that NVEs will always
   have the mapping information they need and do not need to query the
   NVA on a cache miss.  Note that in the push model, it is not
   required that an NVE maintain information about all virtual networks
   in the entire NV Domain; an NVE only needs to maintain information
   about the VNs associated with the TSs attached to that NVE.

   In the pull model, an NVE may not have all the mappings it needs
   when it attempts to forward tenant traffic.  If an NVE attempts to
   send traffic to a destination for which it has no forwarding entry,
   the NVE queries the NVA to get the needed information or to
   definitively determine that no such entry exists.  While the pull
   model has the advantage that an NVE doesn't need table entries for
   destinations it is not forwarding traffic to, it has the
   disadvantage of delaying the sending of traffic on a cache miss.

   The NVO3 architecture should support both models, or even a
   combination model that supports elements of both push and pull.  In
   the case that the NVA has updated information to push to the NVEs,
   there is no reason to prohibit such a model.  Likewise, when the NVA
   is willing to answer queries for missing information on demand,
   there is no reason to have the architecture prevent such a model.
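
   A combination model can be pictured as an NVE-side mapping cache
   that accepts pushed updates but can also pull on a miss.  The sketch
   below is illustrative only; the names are hypothetical, and
   query_nva() stands in for whatever NVE-to-NVA protocol is eventually
   defined.

   # Sketch of an NVE-side cache combining push and pull, as discussed
   # above.  Purely illustrative; not a defined NVO3 interface.
   class NveMappingCache:
       def __init__(self, query_nva):
           self._cache = {}          # (vn, tenant address) -> mapping entry
           self._query_nva = query_nva

       def push_update(self, vn, tenant_addr, entry):
           # Push model: the NVA proactively installs or refreshes a
           # mapping.
           self._cache[(vn, tenant_addr)] = entry

       def resolve(self, vn, tenant_addr):
           # Pull model: on a miss, ask the NVA, which may also answer
           # definitively that no such binding exists (None here).
           entry = self._cache.get((vn, tenant_addr))
           if entry is None:
               entry = self._query_nva(vn, tenant_addr)
               if entry is not None:
                   self._cache[(vn, tenant_addr)] = entry
           return entry
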
7.  Federated NVAs

   An NVA provides service to the set of NVEs in its NVA Domain.  Each
   NVA manages network virtualization information for the virtual
   networks within its NV Domain.  An NV Domain is administered by a
   single entity.  In some cases, it is desirable to expand the scope
   of a specific VN or even an entire NV Domain beyond a single NVA.
   Such cases are handled by having different NVAs peer with each other
   to exchange mapping information about specific VNs.  NVAs operate in
   a federated manner, with a set of NVAs operating as a loosely
   coupled federation of individual NVAs.  If a virtual network spans
   multiple NVAs (e.g., located at different data centers), and an NVE
   needs to deliver tenant traffic to an NVE at a remote NVA, it still
   interacts only with its local NVA, even when obtaining mappings for
   NVEs associated with domains at a remote NVA.

   NVAs at one site share information and interact with NVAs at other
   sites, but only in a controlled manner.  It is expected that policy
   and access control will be applied at the boundaries between
   different sites (and NVAs) so as to minimize dependencies on
   external NVAs that could negatively impact the operation within a
   site.  It is an architectural principle that operations involving
   NVAs at one site not be immediately impacted by failures or errors
   at another site.  (Of course, communication between NVEs in
   different NVO3 domains may be impacted by such failures or errors.)
   It is a strong requirement that a local NVA continue to operate
   properly for local NVEs even if external communication is
   interrupted (e.g., should communication between a local and remote
   NVA fail).

   At a high level, a federation of interconnected NVAs has some
   analogies to BGP and Autonomous Systems.  Like an Autonomous System,
   NVAs at one site are managed by a single administrative entity and
   do not interact with external NVAs except as allowed by policy.
   Likewise, the interface between NVAs at different sites is well
   defined, so that the internal details of operations at one site are
   invisible to another site.  Finally, an NVA only peers with other
   NVAs that it has a trusted relationship with, i.e., where a virtual
   network needs to span multiple NVAs.

   Reasons for using a federated model include:

   o  Provide isolation between NVAs operating at sites in different
      geographic locations.

   o  Control the quantity and rate of information updates that flow
      (and must be processed) between different NVAs in different data
      centers.

   o  Control the set of external NVAs (and external sites) a site
      peers with.  A site will only peer with other sites that are
      cooperating in providing an overlay service.

   o  Allow policy to be applied between sites.  A site will want to
      carefully control what information it exports (and to whom) as
      well as what information it is willing to import (and from whom).

   o  Allow different protocols and architectures to be used for
      intra- vs. inter-NVA communication.  For example, within a single
      data center, a replicated transaction server using database
      techniques might be an attractive implementation option for an
      NVA, and protocols optimized for intra-NVA communication would
      likely be different from protocols involving inter-NVA
      communication between different sites.

   o  Allow for optimized protocols, rather than using a
      one-size-fits-all approach.  Within a data center, networks tend
      to have lower latency, higher speed, and higher redundancy when
      compared with WAN links interconnecting data centers.  The design
      constraints and tradeoffs for a protocol operating within a data
      center network are different from those operating over WAN links.
      While a single protocol could be used for both cases, there could
      be advantages to using different and more specialized protocols
      for the intra- and inter-NVA case.

7.1.  Inter-NVA Peering

   To support peering between different NVAs, an inter-NVA protocol is
   needed.  The inter-NVA protocol defines what information is
   exchanged between NVAs.  It is assumed that the protocol will be
   used to share addressing information between data centers and must
   scale well over WAN links.
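
   As a purely hypothetical illustration of the policy control
   described in this section, the sketch below filters locally known
   bindings against a per-peer export policy before they are shared
   with a remote NVA.  The data layout and function name are invented
   for illustration; no inter-NVA protocol is being specified here.

   # Illustrative per-peer export filtering between federated NVAs:
   # only bindings for virtual networks that policy allows are shared
   # with a given peer.
   def export_to_peer(peer, bindings, exported_vns_by_peer):
       # bindings: iterable of (vn, tenant_addr, nve_addr) tuples known
       # locally.
       # exported_vns_by_peer: peer -> set of VNs that policy allows us
       # to share with that peer.
       allowed = exported_vns_by_peer.get(peer, set())
       return [b for b in bindings if b[0] in allowed]
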
8.  Control Protocol Summary

   The NVO3 address dissemination architecture consists of two major
   distinct components: NVEs and NVAs.  In order to provide isolation
   and independence for these entities, the NVO3 architecture calls for
   well-defined protocols for interfacing between them.  For an
   individual NVA, the architecture calls for a single conceptual
   entity that could be implemented in a distributed or replicated
   fashion.  While the IETF may choose to define one or more specific
   approaches to the local NVA, there is little need for it to pick
   exactly one to the exclusion of others.  An NVA for a single domain
   will likely be deployed as a single vendor product, and thus the
   internal structure does not need to be standardized.

   NVAs peer with each other in a federated manner.  The NVO3
   architecture calls for a well-defined interface between NVAs.

   For the inter-NVA protocol, a protocol similar to BGP might work
   well.  A profile would be needed to define the specific set of
   features and extensions needed to support NVO3.  For the NVE-to-NVA
   protocol, a purpose-specific protocol may be needed.  In both cases,
   a gap analysis of proposed solutions against a list of requirements
   will be needed to inform the discussion.

9.  NVO3 Data Plane Encapsulation

   A key requirement for the NVO3 encapsulation protocol is support for
   a VN Context of sufficient size.  A number of encapsulations already
   exist that provide a VN Context of sufficient size for NVO3.  For
   example, VXLAN [I-D.mahalingam-dutt-dcops-vxlan] has a 24-bit VXLAN
   Network Identifier (VNI).  NVGRE
   [I-D.sridharan-virtualization-nvgre] has a 24-bit Tenant Network ID
   (TNI).  MPLS-over-GRE provides a 20-bit label field.  While there is
   widespread recognition that a 12-bit VN Context would be too small
   (only 4096 distinct values), it is generally agreed that 20 bits (1
   million distinct values) and 24 bits (16.8 million distinct values)
   are sufficient for a wide variety of deployment scenarios.

   While one might argue that a new encapsulation should be defined
   just for NVO3, no compelling requirements for doing so have been
   identified yet.  Moreover, optimized implementations of existing
   encapsulations are already starting to become available on the
   market (i.e., in silicon).  If the IETF were to define a new
   encapsulation format, it would take at least two (and likely more)
   years before optimized implementations of the new format would
   become available in products.  In addition, a new encapsulation
   format would not likely displace existing formats, at least not for
   years.  Thus, there seems little reason to define a new
   encapsulation.

   However, it does make sense for NVO3 to support multiple
   encapsulation formats, so as to allow NVEs to use their preferred
   encapsulations when possible.  This implies that the address
   dissemination protocols must also include an indication of supported
   encapsulations along with the address mapping details.
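
   As a quick sanity check of the VN Context sizes mentioned above, the
   following sketch computes the number of distinct values each width
   provides and shows one generic way to carry a 24-bit context in
   three bytes.  It is not tied to any particular encapsulation format;
   VXLAN, NVGRE, and MPLS each define their own header layouts.

   # Check the VN Context sizes discussed above and pack a generic
   # 24-bit context into three network-order bytes.
   import struct

   for bits in (12, 20, 24):
       print(f"{bits}-bit VN Context: {2**bits:,} distinct values")
   # 12-bit:      4,096
   # 20-bit:  1,048,576  (~1 million)
   # 24-bit: 16,777,216  (~16.8 million)

   def pack_vn_context_24(vn_context):
       # Pack a 24-bit VN Context into 3 bytes, network byte order.
       if not 0 <= vn_context < 2**24:
           raise ValueError("VN Context must fit in 24 bits")
       return struct.pack("!I", vn_context)[1:]
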
Napierala, "Problem Statement: Overlays for Network Virtualization", draft-ietf-nvo3-overlay-problem- statement-03 (work in progress), May 2013. [I-D.kreeger-nvo3-hypervisor-nve-cp] Narten Expires December 05, 2013 [Page 14] Internet-Draft Overlays for Network Virtualization June 2013 Kreeger, L., Narten, T., and D. Black, "Network Virtualization Hypervisor-to-NVE Overlay Control Protocol Requirements", draft-kreeger-nvo3-hypervisor-nve-cp-01 (work in progress), February 2013. [I-D.kreeger-nvo3-overlay-cp] Kreeger, L., Dutt, D., Narten, T., Black, D., and M. Sridharan, "Network Virtualization Overlay Control Protocol Requirements", draft-kreeger-nvo3-overlay-cp-03 (work in progress), May 2013. [I-D.mahalingam-dutt-dcops-vxlan] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-04 (work in progress), May 2013. [I-D.sridharan-virtualization-nvgre] Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang, Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P., and C. Tumuluri, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan- virtualization-nvgre-02 (work in progress), February 2013. Author's Address Thomas Narten IBM Email: narten@us.ibm.com Narten Expires December 05, 2013 [Page 15]