Knowledge
- Identify VMware NIC Teaming policies
- Identify common network protocols
Skills and Abilities
- Understand the NIC Teaming failover types and related physical network settings
- Determine and apply Failover settings
- Configure explicit failover to conform with VMware best practices
- Configure port groups to properly isolate network traffic
Tools
- ESX Configuration Guide
- ESXi Configuration Guide
- vSphere Command-Line Interface Installation and Scripting Guide
- Product Documentation
- vSphere Client
- vSphere CLI
- vicfg-*
Notes
Identify VMware NIC Teaming policies
There are five NIC teaming policies:
- Route based on the originating virtual port ID
- Route based on IP hash
- Route based on source MAC hash
- Route based on physical NIC load (new in vSphere 4.1; requires a vNetwork Distributed Switch)
- Use explicit failover order
Route based on the originating virtual port ID
Choose an uplink based on the virtual port where the traffic entered the virtual switch. This is the default configuration and the one most commonly deployed. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team. Replies are received on the same physical adapter as the physical switch learns the port association. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it has multiple virtual adapters. This setting places slightly less load on the ESX Server host than the MAC hash setting.
Note: If you select either srcPortID or srcMAC hash, you should not configure the physical switch ports as any type of team or bonded group.
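For reference, here is a quick PowerCLI sketch showing how to check which policy a standard vSwitch is using (the vCenter, host and vSwitch names are hypothetical; later sketches assume the same open Connect-VIServer session):

    # Hypothetical vCenter/host/vSwitch names - adjust for your environment.
    Connect-VIServer -Server vcenter.example.com

    # On a fresh install LoadBalancingPolicy reports LoadBalanceSrcId (port ID).
    # Other possible values: LoadBalanceIP, LoadBalanceSrcMac, ExplicitFailover.
    Get-VMHost esx01.example.com |
        Get-VirtualSwitch -Name vSwitch0 |
        Get-NicTeamingPolicy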
Route based on IP hash
Choose an uplink based on a hash of the source and destination IP addresses of each packet. (For non-IP packets, whatever is at those offsets is used to compute the hash.) Evenness of traffic distribution depends on the number of TCP/IP sessions to unique destinations. There is no benefit for bulk transfer between a single pair of hosts. You can use link aggregation — grouping multiple physical adapters to create a fast network pipe for a single virtual adapter in a virtual machine. When you configure the system to use link aggregation, packet reflections are prevented because aggregated ports do not retransmit broadcast or multicast traffic. The physical switch sees the client MAC address on multiple ports. There is no way to predict which physical Ethernet adapter will receive inbound traffic. All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP). All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch.
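If you do go down the IP hash route, a minimal PowerCLI sketch (same hypothetical names as above) would set the policy at the vSwitch level so the port groups inherit it, per the guidance above:

    # The physical switch ports behind vSwitch0 must already be a static
    # EtherChannel (e.g. "channel-group 1 mode on" on a Cisco switch - no LACP).
    Get-VMHost esx01.example.com |
        Get-VirtualSwitch -Name vSwitch0 |
        Get-NicTeamingPolicy |
        Set-NicTeamingPolicy -LoadBalancingPolicy LoadBalanceIP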
Route based on source MAC hash
Choose an uplink based on a hash of the source Ethernet MAC address. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team. Replies are received on the same physical adapter as the physical switch learns the port association. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it uses multiple source MAC addresses for traffic it sends.
Route based on physical NIC load
Source: Frank Denneman's blog post at http://frankdenneman.nl/2010/07/load-based-teaming/
The option “Route based on physical NIC load” takes the virtual machine network I/O load into account and tries to avoid congestion by dynamically reassigning and balancing the virtual switch port to physical NIC mappings. The three existing load-balancing policies (port ID, MAC-based and IP hash) use a static mapping between virtual switch ports and the connected uplinks. The VMkernel assigns a virtual switch port during the power-on of a virtual machine; this virtual switch port gets assigned to a physical NIC based on either a round-robin or hashing algorithm, but none of these algorithms takes the overall utilization of the pNIC into account. This can lead to a scenario where several virtual machines mapped to the same physical adapter saturate the physical NIC and fight for bandwidth while the other adapters are underutilized. LBT solves this by remapping the virtual switch ports to a physical NIC when congestion is detected. After the initial virtual switch port to physical port assignment is completed, load-based teaming checks the load on the dvUplinks at a 30-second interval and dynamically reassigns port bindings based on the current network load and the level of saturation of the dvUplinks. The VMkernel flags the network I/O load as congested if transmit (Tx) or receive (Rx) traffic exceeds a 75% mean over a 30-second period (the mean being the sum of the observations divided by the number of observations). The 30-second interval is used to avoid MAC address flapping issues with the physical switches. Even so, it is recommended to enable PortFast (trunk fast) on the physical switches, and all switches must be part of the same layer 2 domain.
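To put some made-up numbers on the congestion test (this is purely illustrative arithmetic, not VMkernel code): on a 1 GbE dvUplink the 75% threshold works out at a 750 Mbps mean over the 30-second window:

    # Purely illustrative - hypothetical Tx readings on a 1 GbE dvUplink.
    $linkSpeedMbps = 1000
    $txSamplesMbps = @(820, 790, 760, 810)
    $meanMbps = ($txSamplesMbps | Measure-Object -Average).Average
    $congested = $meanMbps -gt (0.75 * $linkSpeedMbps)
    "Mean = $meanMbps Mbps; congested = $congested"   # 795 Mbps > 750, so LBT would remap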
Use explicit failover order
This allows you to override the default failover order of the uplinks. The only time I can see this being useful is if the uplinks are connected to multiple physical switches and you want to use them in a particular order, or if you suspect a pNIC in the ESX(i) host is not working correctly. If you use this setting it is best to configure the remaining vmnics as standby adapters, as active adapters are used from the highest in the order downwards.
For more information see Simon Greaves Blog at: http://simongreaves.co.uk/drupal/NIC_Teaming_Design
Identify common network protocols
On Wikipedia there is a complete list of network protocols. See http://en.wikipedia.org/wiki/List_of_network_protocols
Understand the NIC Teaming failover types and related physical network settings
The five available policies are:
- Route based on virtual port ID (default)
- Route based on IP hash (MUST be used with static EtherChannel – no LACP). Do not use beacon probing with IP hash.
- Route based on source MAC address
- Route based on physical NIC load (new in vSphere 4.1; requires a vNetwork Distributed Switch)
- Explicit failover
NOTE: These only affect outbound traffic. Inbound load balancing is controlled by the physical switch.
Failover types and related physical network settings
Failover types:
- Cable pull/failure
- Switch failure
- Upstream switch failure
- Change NIC teaming for FT logging (use IP hash) – VMware KB 1011966
Use uplink failure detection (also known as link state tracking) to handle physical network failures outside direct visibility of the host.
With blades you typically don't use NIC teaming, as each blade has a 1:1 mapping from its multiple pNICs to the blade chassis switch. That switch in turn may use an EtherChannel to an upstream switch, but from the blade's (and hence ESX's) perspective it simply has multiple independent NICs (hence route based on originating virtual port ID is the right choice).
Determine and apply Failover settings
Configurable from the NIC teaming tab of the vSwitch.
See page 46 of the ESXi Configuration Guide or page 48 of the ESX Configuration Guide.
Load Balancing Settings
- Route based on the originating port ID. Choose an uplink based on the virtual port where the traffic entered the virtual switch.
- Route based on IP hash. Choose an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash.
- Route based on source MAC hash. Choose an uplink based on a hash of the source Ethernet MAC address.
- Use explicit failover order. Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.
NOTE IP-based teaming requires that the physical switch be configured with EtherChannel. For all other options, EtherChannel should be disabled.
Network Failover Detection
- Link Status only. Relies solely on the link status that the network adapter provides. This option detects failures such as cable pulls and physical switch power failures, but not configuration errors such as a physical switch port blocked by spanning tree or misconfigured with the wrong VLAN, nor cable pulls on the far side of a physical switch.
- Beacon Probing. Sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many of the failures previously mentioned that are not detected by link status alone.
Notify Switches
Select Yes or No to notify switches in the case of failover. If you select Yes, whenever a virtual NIC is connected to the vSwitch or whenever that virtual NIC’s traffic would be routed over a different physical NIC in the team because of a failover event, a notification is sent out over the network to update the lookup tables on physical switches. In almost all cases, this process is desirable for the lowest latency of failover occurrences and migrations with VMotion.
NOTE Do not use this option when the virtual machines using the port group are using Microsoft Network Load Balancing in unicast mode. No such issue exists with NLB running in multicast mode.
Failback
Select Yes or No to disable or enable failback.
This option determines how a physical adapter is returned to active duty after recovering from a failure. If failback is set to Yes (default), the adapter is returned to active duty immediately upon recovery, displacing the standby adapter that took over its slot, if any. If failback is set to No, a failed adapter is left inactive even after recovery until another currently active adapter fails, requiring its replacement.
Failover Order
Specify how to distribute the workload for uplinks. If you want to use some uplinks but reserve others for emergencies in case the uplinks in use fail, set this condition by moving them into different groups:
- Active Uplinks. Continue to use the uplink when the network adapter connectivity is up and active.
- Standby Uplinks. Use this uplink if one of the active adapters' connectivity is down.
- Unused Uplinks. Do not use this uplink.
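As a worked example, here is a PowerCLI sketch (hypothetical host, vSwitch and vmnic names again) that applies all four of the settings described above in one call:

    # NetworkFailoverDetectionPolicy accepts LinkStatus or BeaconProbing;
    # NotifySwitches and FailbackEnabled map to the Yes/No options above.
    Get-VMHost esx01.example.com |
        Get-VirtualSwitch -Name vSwitch0 |
        Get-NicTeamingPolicy |
        Set-NicTeamingPolicy -NetworkFailoverDetectionPolicy LinkStatus `
            -NotifySwitches $true -FailbackEnabled $true `
            -MakeNicActive vmnic0,vmnic1 -MakeNicStandby vmnic2 -MakeNicUnused vmnic3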
Configure explicit failover to conform with VMware best practices
To configure explicit failover, go to the NIC Teaming tab of the vSwitch properties. Set Load Balancing to 'Use explicit failover order' and configure the appropriate order for the NICs in your environment.
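A minimal PowerCLI equivalent (hypothetical names; vmnic0 active, vmnic1 standby, following the standby-rather-than-multiple-active advice above) would look like this:

    Get-VMHost esx01.example.com |
        Get-VirtualSwitch -Name vSwitch0 |
        Get-NicTeamingPolicy |
        Set-NicTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
            -MakeNicActive vmnic0 -MakeNicStandby vmnic1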
Configure port groups to properly isolate network traffic
The following are generally accepted best practices (source: the vexperienced.co.uk blog); a PowerCLI sketch follows the list.
- Separate VM traffic and infrastructure traffic (vMotion, NFS, iSCSI, FT)
- Use separate pNICs and vSwitches where possible
- VLANs can be used to isolate traffic (both from a broadcast and a security perspective)
- When using NIC teams, use pNICs from separate buses (i.e. don't have a team comprising two pNICs on the same PCI card – use one onboard adapter and one from an expansion card)
- Keep FT logging on a separate pNIC and vSwitch (ideally 10 GbE)
- Use dedicated network infrastructure (physical switches etc.) for storage (iSCSI and NFS)
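As a sketch of the VLAN isolation idea (the VLAN IDs and port group names here are made up for illustration; the physical switch ports must trunk the corresponding VLANs):

    # Hypothetical VLAN IDs and names - adjust to your design.
    $vs = Get-VMHost esx01.example.com | Get-VirtualSwitch -Name vSwitch1
    New-VirtualPortGroup -VirtualSwitch $vs -Name "vMotion"    -VLanId 20
    New-VirtualPortGroup -VirtualSwitch $vs -Name "FT-Logging" -VLanId 30
    New-VirtualPortGroup -VirtualSwitch $vs -Name "iSCSI"      -VLanId 40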
When you move to 10 GbE networks isolation is implemented differently (often using some form of I/O virtualisation like FlexConnect, Xsigo, or UCS) but the principles are the same. VMworld 2010 session TA8440 covers the move to 10 GbE and FCoE.
Links
Documents and manuals
ESX Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esx_server_config.pdf
ESXi Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esxi_server_config.pdf
Source
- http://frankdenneman.nl/2010/07/load-based-teaming/
- http://simongreaves.co.uk/drupal/NIC_Teaming_Design
- http://en.wikipedia.org/wiki/List_of_network_protocols
- http://www.vexperienced.co.uk/2011/04/01/vcap-dca-study-notes-2-3-deploy-and-maintain-scalable-virtual-networks/
Disclaimer.
The information in this article is provided “AS IS” with no warranties, and confers no rights. This article does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion.