Knowledge
- Identify RAID levels
- Identify supported HBA types
- Identify virtual disk format types
Skills and Abilities
- Determine use cases for and configure VMware DirectPath I/O
- Determine requirements for and configure NPIV
- Determine appropriate RAID level for various Virtual Machine workloads
- Apply VMware storage best practices
- Understand use cases for Raw Device Mapping
- Configure vCenter Server storage filters
- Understand and apply VMFS resignaturing
- Understand and apply LUN masking using PSA-related commands
- Analyze I/O workloads to determine storage performance requirements
Tools
- Fibre Channel SAN Configuration Guide
- iSCSI SAN Configuration Guide
- ESX Configuration Guide
- ESXi Configuration Guide
- vSphere Command-Line Interface Installation and Scripting Guide
- Product Documentation
- vSphere Client
- vscsiStats
- vSphere CLI
- vicfg-*
- vifs
- vmkfstools
- esxtop/resxtop
Notes.
Identify RAID levels.
Following is a brief textual summary of the most commonly used RAID levels. (Source: http://en.wikipedia.org/wiki/RAID)
- RAID 0 (block-level striping without parity or mirroring) provides improved performance and additional storage but no redundancy or fault tolerance; simple stripe sets are therefore normally referred to as RAID 0. When data is written to a RAID 0 volume, it is broken into fragments called blocks; the number of blocks is dictated by the stripe size, which is a configuration parameter of the array. The blocks are written to their respective disks simultaneously on the same sector, which allows smaller sections of the entire chunk of data to be read off the drives in parallel, increasing bandwidth. Because there is no redundancy, any disk failure destroys the array, and the likelihood of failure increases with more disks in the array (at a minimum, catastrophic data loss is twice as likely compared to single drives without RAID). RAID 0 does not implement error checking, so any error is uncorrectable. More disks in the array means higher bandwidth, but greater risk of data loss.
- In RAID 1 (mirroring without parity or striping), data is written identically to multiple disks (a “mirrored set”). Although many implementations create sets of two disks, sets may contain three or more disks. The array provides fault tolerance from disk errors or failures and continues to operate as long as at least one drive in the mirrored set is functioning. With appropriate operating system support, there can be increased read performance and only a minimal write performance reduction. Using RAID 1 with a separate controller for each disk is sometimes called duplexing.
- In RAID 2 (bit-level striping with dedicated Hamming-code parity), all disk spindle rotation is synchronized, and data is striped such that each sequential bit is on a different disk. Hamming-code parity is calculated across corresponding bits on disks and stored on one or more parity disks. Extremely high data transfer rates are possible.
- In RAID 3 (byte-level striping with dedicated parity), all disk spindle rotation is synchronized, and data is striped such that each sequential byte is on a different disk. Parity is calculated across corresponding bytes on disks and stored on a dedicated parity disk. Very high data transfer rates are possible.
- RAID 4 (block-level striping with dedicated parity) is identical to RAID 5 (see below), but confines all parity data to a single disk, which can create a performance bottleneck. In this setup, files can be distributed between multiple disks. Each disk operates independently which allows I/O requests to be performed in parallel, though data transfer speeds can suffer due to the type of parity. The error detection is achieved through dedicated parity and is stored in a separate, single disk unit.
- RAID 5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. A single drive failure in the set will result in reduced performance of the entire set until the failed drive has been replaced and rebuilt.
- RAID 6 (block-level striping with double distributed parity) provides fault tolerance from two drive failures; array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems. This becomes increasingly important as large-capacity drives lengthen the time needed to recover from the failure of a single drive. Single-parity RAID levels are as vulnerable to data loss as a RAID 0 array until the failed drive is replaced and its data rebuilt; the larger the drive, the longer the rebuild will take. Double parity gives time to rebuild the array without the data being at risk if a single additional drive fails before the rebuild is complete.
Identify supported HBA types
There is a VMware document about all the supported HBA types. See the Storage/SAN Compatibility Guide on the VMware website. http://www.vmware.com/resources/compatibility/pdf/vi_san_guide.pdf
Identify virtual disk format types
- The supported disk formats in ESX and ESXi are as follows (Source: VMware KB article 1022242):
- zeroedthick (default) – Space required for the virtual disk is allocated during the creation of the disk file. Any data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time on first write from the virtual machine. The virtual machine does not read stale data from disk.
- eagerzeroedthick – Space required for the virtual disk is allocated at creation time. In contrast to zeroedthick format, the data remaining on the physical device is zeroed out during creation. It might take much longer to create disks in this format than to create other types of disks.
- thick – Space required for the virtual disk is allocated during creation. This type of formatting does not zero out any old data that might be present on this allocated space. A non-root user cannot create disks of this format.
- thin – Space required for the virtual disk is not allocated during creation, but is supplied and zeroed out on demand at a later time.
- rdm – Virtual compatibility mode for raw disk mapping.
- rdmp – Physical compatibility mode (pass-through) for raw disk mapping.
- raw – Raw device.
- 2gbsparse – A sparse disk with 2GB maximum extent size. You can use disks in this format with other VMware products; however, you cannot power on a sparse disk on an ESX host until you reimport it with vmkfstools in a compatible format, such as thick or thin (see the vmkfstools example after this list).
- monosparse – A monolithic sparse disk. You can use disks in this format with other VMware products.
- monoflat – A monolithic flat disk. You can use disks in this format with other VMware products.
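To see these formats in practice, vmkfstools can create or convert virtual disks from the command line; a minimal sketch (the datastore and file names below are placeholders):
vmkfstools -c 10G -d thin /vmfs/volumes/datastore1/testvm/testvm_1.vmdk
vmkfstools -i /vmfs/volumes/datastore1/testvm/testvm_1.vmdk -d eagerzeroedthick /vmfs/volumes/datastore1/testvm/testvm_1_ezt.vmdk
The -c option creates a new disk of the given size, -i clones (imports) an existing disk, and -d specifies the target format (for example zeroedthick, eagerzeroedthick or thin).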
Determine use cases for and configure VMware DirectPath I/O
VMDirectPath allows guest operating systems to directly access an I/O device, bypassing the virtualization layer. This direct path, or passthrough, can improve performance for VMware ESX and VMware ESXi systems that utilize high-speed I/O devices, such as 10 Gigabit Ethernet.
ESX Host Requirements
VMDirectPath supports a direct device connection for virtual machines running on Intel Xeon 5500 systems, which feature an implementation of the I/O memory management unit (IOMMU) called Intel Virtualization Technology for Directed I/O (VT-d). VMDirectPath can work on AMD platforms with I/O Virtualization Technology (AMD IOMMU), but this configuration is offered as experimental support. Some machines might not have this technology enabled in the BIOS by default. Refer to your hardware documentation to learn how to enable this technology in the BIOS.
Enable or Disable VMDirectPath
Enable or disable VMDirectPath through the hardware advanced settings page of the vSphere Client. Reboot the ESX host after enabling or disabling VMDirectPath.
Disable VMDirectPath and reboot the ESX host before removing physical devices.
To find the VMDirectPath Configuration page in the vSphere Client
1. Select the ESX host from Inventory.
2. Select the Configuration tab.
3. Select Advanced Settings under Hardware.
To disable and disconnect the PCI Device
1. Use the vSphere Client to disable or remove the VMDirectPath configuration.
2. Reboot the ESX host.
3. Physically remove the device from the ESX host.
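After the host reboot, the passthrough device can be added to a virtual machine via Edit Settings > Add > PCI Device. For illustration only, the VM's .vmx file then typically gains entries along these lines; I am writing these from memory rather than from the documents below, so treat the entry names and values as an assumption and placeholders:
pciPassthru0.present = "TRUE"
pciPassthru0.deviceId = "10fb"
pciPassthru0.vendorId = "8086"
pciPassthru0.id = "07:00.0"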
For more information, see the VMware document Configuration Examples and Troubleshooting for VMDirectPath.
See also VMware KB article 1010789, Configuring VMDirectPath I/O pass-through devices on an ESX host.
There is also an article at the Petri IT Knowledgebase: http://www.petri.co.il/vmware-esxi4-vmdirectpath.htm
Determine requirements for and configure NPIV
This information is from a blog article written by Simon Long. See http://www.simonlong.co.uk/blog/2009/07/27/npiv-support-in-vmware-esx4/
What does NPIV do? NPIV (N_Port ID Virtualization) is a useful Fibre Channel feature which allows a physical HBA (Host Bus Adapter) to have multiple N_Port IDs. Normally, a physical HBA has only one N_Port ID; NPIV enables you to have multiple unique N_Port IDs per physical HBA. ESX4 can use NPIV to allow more Fibre Channel connections than the physical maximum, which is currently 8 HBAs per host or 16 HBA ports per host.
What are the Advantages of using NPIV?
- Standard storage management methodology across physical and virtual servers.
- Portability of access privileges during VM migration.
- Fabric performance, as NPIV provides quality of service (QoS) and prioritization for ensured VM-level bandwidth assignment.
- Auditable data security due to zoning (one server, one zone).
Can NPIV be used with VMware ESX4? Yes! But NPIV can only be used with RDM disks and will not work for virtual disks. VMs with regular virtual disks use the WWNs of the host's physical HBAs. To use NPIV with ESX4 you need the following:
- FC Switches that are used to access storage must be NPIV-Aware.
- The ESX Server host's physical HBAs must support NPIV.
How does NPIV work with VMware ESX4? When NPIV is enabled on a virtual machine, 8 WWN (World Wide Name) pairs (WWPN (port) and WWNN (node)) are specified for that VM at creation. Once the VM has been powered on, the VMkernel instantiates a VPORT (virtual port) on the physical HBA, which is used to access the Fibre Channel network. Once the VPORT is ready, the VM tries each of these WWN pairs in sequence to discover an access path to the Fibre Channel network.
A VPORT appears to the FC network as a physical HBA because of its unique WWNs, but an assigned VPORT is removed from the ESX host when the VM is powered off.
How is NPIV configured in VMware ESX4?
Before you try to enable NPIV on a VM, the VM must have an RDM added. If it does not, the NPIV options are greyed out and you will see a warning.
For a document with some screenshots taken on an ESX4 server, see the Brocade Technical Brief: How to Configure NPIV on VMware vSphere 4.0.
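Once NPIV is enabled, the assigned WWNs are stored in the VM's .vmx configuration file. If I recall correctly (the entry names and values below are from memory, not from the sources above, so treat them as an assumption and placeholders), the entries look roughly like this:
wwn.node = "28fa000c29000001"
wwn.port = "28fa000c29000002,28fa000c29000003"
wwn.type = "vc"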
Determine appropriate RAID level for various Virtual Machine workloads
There are some very interesting sites and documents to check out.
There is a VMware document about the best practices for VMFS, http://www.vmware.com/pdf/vmfs-best-practices-wp.pdf
The Yellow Bricks Blog article IOps?, http://www.yellow-bricks.com/2009/12/23/iops/
The VMToday blog article about Storage Basics – Part VI: Storage Workload Characterization, http://vmtoday.com/2010/04/storage-basics-part-vi-storage-workload-characterization/
All of these provide information about how to select the best RAID level for virtual machine workloads.
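As a rough worked example (my own numbers, using the common rule-of-thumb formula functional IOPS = (raw IOPS * read%) + (raw IOPS * write%) / write penalty, with the usually quoted penalties of 2 for RAID 10 and 4 for RAID 5): take 8 x 15k disks at roughly 180 IOPS each, so about 1440 raw IOPS, and a workload that is 70% read / 30% write.
RAID 10: (1440 * 0.7) + (1440 * 0.3) / 2 = 1008 + 216 = 1224 usable IOPS
RAID 5: (1440 * 0.7) + (1440 * 0.3) / 4 = 1008 + 108 = 1116 usable IOPS
The more write-heavy the workload, the bigger the gap becomes, which is why write-intensive VMs are often placed on RAID 10 while read-mostly VMs can live happily on RAID 5.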
Apply VMware storage best practices
Best Practices for Configuring Virtual Storage. Source http://www.vmware.com/technical-resources/virtual-storage/best-practices.html
Many of the best practices for physical storage environments also apply to virtual storage environments. It is best to keep in mind the following rules of thumb when configuring your virtual storage infrastructure:
Configure and size storage resources for optimal I/O performance first, then for storage capacity.
This means that you should consider throughput capability and not just capacity. Imagine a very large parking lot with only one lane of traffic for an exit. Regardless of capacity, throughput is affected. It’s critical to take into consideration the size and storage resources necessary to handle your volume of traffic—as well as the total capacity.
Aggregate application I/O requirements for the environment and size them accordingly. As you consolidate multiple workloads onto a set of ESX servers that have a shared pool of storage, don't exceed the total throughput capacity of that storage resource. Looking at the throughput characterization of the physical environment prior to virtualization can help you predict what throughput each workload will generate in the virtual environment.
Base your storage choices on your I/O workload. Use an aggregation of the measured workload to determine what protocol, redundancy protection and array features to use, rather than using an estimate. The best results come from measuring your applications' I/O throughput and capacity for a period of several days prior to moving them to a virtualized environment.
Remember that pooling storage resources increases utilization and simplifies management, but can lead to contention. There are significant benefits to pooling storage resources, including increased storage resource utilization and ease of management. However, at times, heavy workloads can have an impact on performance. It's a good idea to use a shared VMFS volume for most virtual disks, but consider placing heavy I/O virtual disks on a dedicated VMFS volume or an RDM to reduce the effects of contention.
There is also a VMware document named Performance Best Practices for VMware vSphere 4.1 (http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf); page 11 describes some storage considerations.
Understand use cases for Raw Device Mapping
The source for this article is http://www.douglaspsmith.com/home/2010/7/18/understand-use-cases-for-raw-device-mapping.html
Raw device mapping (RDM) is a method for a VM to have direct access to a LUN on a Fibre Channel or iSCSI system. RDM is a mapping file in a separate VMFS volume that acts as a proxy for a raw physical storage device. The RDM allows a virtual machine to directly access and use the storage device. The RDM contains metadata for managing and redirecting disk access to the physical device.
RDM offers several benefits:
- User-Friendly Persistent Names
- Dynamic Name Resolution
- Distributed File Locking
- File Permissions
- File System Operations
- Snapshots
- vMotion
- SAN Management Agents
- N-Port ID Virtualization
Certain limitations exist when you use RDMs:
- Not available for block devices or certain RAID devices
- Available with VMFS-2 and VMFS-3 volumes only
- No snapshots in physical compatibility mode
- No partition mapping
You need to use raw LUNs with RDMs in the following situations:
- When SAN snapshot or other layered applications are run in the virtual machine. The RDM better enables scalable backup offloading systems by using features inherent to the SAN.
- In any MSCS clustering scenario that spans physical hosts — virtual-to-virtual clusters as well as physical-to-virtual clusters. In this case, cluster data and quorum disks should be configured as RDMs rather than as files on a shared VMFS.
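For reference, an RDM mapping file can also be created from the command line with vmkfstools; a minimal sketch (the device name and paths are placeholders):
vmkfstools -r /vmfs/devices/disks/naa.<id> /vmfs/volumes/datastore1/testvm/testvm_rdm.vmdk
vmkfstools -z /vmfs/devices/disks/naa.<id> /vmfs/volumes/datastore1/testvm/testvm_rdmp.vmdk
The -r option creates a virtual compatibility mode RDM, while -z creates a physical compatibility (pass-through) RDM, matching the rdm and rdmp disk formats listed earlier.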
Configure vCenter Server storage filters
The information was gathered from the ESX Configuration Guide.
vCenter Server provides storage filters to help you avoid storage device corruption or performance degradation that can be caused by an unsupported use of LUNs. These filters are available by default:
- VMFS Filter – Filters out storage devices, or LUNs, that are already used by a VMFS datastore on any host managed by vCenter Server.
- RDM Filter – Filters out LUNs that are already referenced by an RDM on any host managed by vCenter Server.
- Same Host and Transports Filter – Filters out LUNs ineligible for use as VMFS datastore extents because of host or storage type incompatibility.
- Host Rescan Filter – Automatically rescans and updates VMFS datastores after you perform datastore management operations.
Procedure
1. In the vSphere Client, select Administration > vCenter Server Settings.
2. In the settings list, select Advanced Settings.
3. In the Key text box, type a key.
- config.vpxd.filter.vmfsFilter -> VMFS Filter
- config.vpxd.filter.rdmFilter -> RDM Filter
- config.vpxd.filter.SameHostAndTransportsFilter -> Same Host and Transports Filter
- config.vpxd.filter.hostRescanFilter -> Host Rescan Filter
4. In the Value text box, type False for the specified key.
5. Click Add.
6. Click OK.
For more background info, see the Yellow Bricks blog article Storage Filters: http://www.yellow-bricks.com/2010/08/11/storage-filters/
Currently 4 filters have been made public:
- config.vpxd.filter.hostRescanFilter
- config.vpxd.filter.vmfsFilter
- config.vpxd.filter.rdmFilter
- config.vpxd.filter.SameHostAndTransportsFilter
The “Host Rescan Filter” makes it possible to disable the automatic storage rescan that occurs on all hosts after a VMFS volume has been created. The reason you might want to disable this is when you are adding multiple volumes and want to avoid multiple rescans, initiating just a single rescan after you create your final volume. By setting “config.vpxd.filter.hostRescanFilter” to false, the automatic rescan is disabled. In short, the steps needed:
1. Open up the vSphere Client
2. Go to Administration -> vCenter Server
3. Go to Settings -> Advanced Settings
4. If the key “config.vpxd.filter.hostRescanFilter” is not available add it and set it to false
To be honest, this is the only storage filter I would personally recommend using. For instance, “config.vpxd.filter.rdmFilter”, when set to “false”, will enable you to add a LUN as an RDM to a VM while this LUN is already used as an RDM by a different VM. That can be useful in very specific situations, like when MSCS is used, but in general it should be avoided, as data could be corrupted when the wrong LUN is selected.
The filter “config.vpxd.filter.vmfsFilter” can be compared to the RDM filter: when set to false, it enables you to overwrite a VMFS volume with VMFS or re-use it as an RDM. Again, not something I would recommend enabling, as it could lead to loss of data, which has a serious impact on any organization.
The same goes for “config.vpxd.filter.SameHostAndTransportsFilter”. When it is set to “False” you can actually add an “incompatible LUN” as an extent to an existing volume. An example of an incompatible LUN would be a LUN that is not presented to all hosts that have access to the VMFS volume it will be added to. I can't really think of a single reason to change the default of this setting besides troubleshooting, but it is good to know the filters are there.
Each of the storage filters has its specific use cases. In general, disabling the storage filters should be avoided, except for “config.vpxd.filter.hostRescanFilter”, which has proven to be useful in specific situations.
Understand and apply VMFS resignaturing
The information was gathered from the ESX Configuration Guide.
Resignaturing VMFS Copies.
Use datastore resignaturing to retain the data stored on the VMFS datastore copy. When resignaturing a VMFS copy, ESX assigns a new UUID and a new label to the copy, and mounts the copy as a datastore distinct from the original.
The default format of the new label assigned to the datastore is snap-snapID-oldLabel, where snapID is an integer and oldLabel is the label of the original datastore.
When you perform datastore resignaturing, consider the following points:
- Datastore resignaturing is irreversible.
- The LUN copy that contains the VMFS datastore that you resignature is no longer treated as a LUN copy.
- A spanned datastore can be resignatured only if all its extents are online.
- The resignaturing process is crash and fault tolerant. If the process is interrupted, you can resume it later.
- You can mount the new VMFS datastore without a risk of its UUID colliding with UUIDs of any other datastore, such as an ancestor or child in a hierarchy of LUN snapshots.
Resignature a VMFS Datastore Copy.
Use datastore resignaturing if you want to retain the data stored on the VMFS datastore copy.
To resignature a mounted datastore copy, first unmount it. Before you resignature a VMFS datastore, perform a storage rescan on your host so that the host updates its view of LUNs presented to it and discovers any LUN copies.
Procedure
1. Log in to the vSphere Client and select the server from the inventory panel.
2. Click the Configuration tab and click Storage in the Hardware panel.
3. Click Add Storage.
4. Select the Disk/LUN storage type and click Next.
5. From the list of LUNs, select the LUN that has a datastore name displayed in the VMFS Label column and click Next. The name present in the VMFS Label column indicates that the LUN is a copy that contains a copy of an existing VMFS datastore.
6. Under Mount Options, select Assign a New Signature and click Next.
7. In the Ready to Complete page, review the datastore configuration information and click Finish.
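Resignaturing can also be done from the command line with esxcfg-volume (or vicfg-volume via the vSphere CLI); a minimal sketch:
esxcfg-volume -l
esxcfg-volume -r <VMFS UUID|label>
The -l option lists volumes detected as snapshots or replicas, and -r resignatures the specified copy.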
Understand and apply LUN masking using PSA-related commands
What is LUN Masking?
LUN (Logical Unit Number) Masking is an authorization process that makes a LUN available to some hosts and unavailable to other hosts.
LUN Masking is implemented primarily at the HBA (Host Bus Adapter) level. LUN Masking implemented at this level is vulnerable to any attack that compromises the HBA.
Some storage controllers also support LUN Masking.
LUN Masking is important because Windows-based servers attempt to write volume labels to all available LUNs. This can render the LUNs unusable by other operating systems and can result in data loss.
A blog post from Jason Langer describes this. See http://virtuallanger.wordpress.com/2010/10/08/understand-and-apply-lun-masking-using-psa-related-commands/
Masking paths allows you to prevent an ESX/ESXi host from accessing storage devices or LUNs, or from using individual paths to a LUN. When you mask paths, you create claim rules that assign the MASK_PATH plug-in to the specified paths. Use the vSphere CLI commands to mask the paths.
Look at the multipath plug-ins currently installed on your ESX/ESXi host:
esxcfg-mpath -G
The output indicates that there are, at a minimum, two plug-ins: the VMware Native Multipath Plug-in (NMP) and the MASK_PATH plug-in, which is used for masking LUNs.
List all the claimrules currently on the ESX/ESXi host:
esxcli corestorage claimrule list
There are two MASK_PATH entries: one of class runtime and the other of class file. The runtime class shows the rules currently running in the PSA; the file class is a reference to the rules defined in /etc/vmware/esx.conf. These are normally identical, but they can differ if you are in the process of modifying /etc/vmware/esx.conf.
Add a rule to hide the LUN with the command
esxcli corestorage claimrule add --rule <number> -t location -A <hba_adapter> -C <channel> -T <target> -L <lun> -P MASK_PATH
Note: Use the esxcfg-mpath -b and esxcfg-scsidevs -l commands to identify disk and LUN information.
Verify that the rule has been added with the command:
esxcli corestorage claimrule list
The new rule initially shows up with the file class only. Load it into the runtime with:
esxcli corestorage claimrule load
Re-examine your claim rules and verify that you can now see both the file and runtime class:
esxcli corestorage claimrule list
Unclaim all paths to a device and then run the loaded claimrules on each of the paths to reclaim them:
esxcli corestorage claiming reclaim -d <naa.id>
Verify that the masked device is no longer used by the ESX/ESXi host:
esxcfg-scsidevs -m
The masked datastore does not appear in the list.
To verify that a masked LUN is no longer an active device, run:
esxcfg-mpath -L | grep <naa.id>
Empty output indicates that the LUN is not active.
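Putting this together, a complete hypothetical sequence (the rule number and path below are placeholders I made up, not values from the KB articles):
esxcli corestorage claimrule add --rule 200 -t location -A vmhba1 -C 0 -T 1 -L 20 -P MASK_PATH
esxcli corestorage claimrule load
esxcli corestorage claimrule list
esxcli corestorage claiming reclaim -d <naa.id>
esxcfg-scsidevs -m
This adds a MASK_PATH rule for LUN 20 on vmhba1, loads it into the runtime, and reclaims the paths so the device disappears from the host's datastore list.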
Source: VMware KB 1009449 and KB 1014953
For an overview of PSA and commands, see VMware vSphere 4.1 PSA at http://geeksilver.wordpress.com/2010/08/17/vmware-vsphere-4-1-psa-pluggable-storage-architecture-understanding/
Analyze I/O workloads to determine storage performance requirements
There is a document from VMware called Storage Workload Characterization and Consolidation in Virtualized Environments, with a lot of info; a short vscsiStats example follows the list below.
Josh Townsend created a series of blog posts on his blog VMToday about everything you need to know about storage. See:
- Storage Basics – Part I: An Introduction
- Storage Basics – Part II: IOPS
- Storage Basics – Part III: RAID
- Storage Basics – Part IV: Interface
- Storage Basics – Part V: Controllers, Cache and Coalescing
- Storage Basics – Part VI: Storage Workload Characterization
- Storage Basics – Part VII: Storage Alignment
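To actually capture these characteristics on a running host, vscsiStats (listed under Tools) can be used; a minimal sketch:
vscsiStats -l
vscsiStats -s -w <worldGroupID>
vscsiStats -p ioLength -w <worldGroupID>
vscsiStats -x -w <worldGroupID>
The -l option lists running VMs and their worldGroupIDs, -s starts collection for the given worldGroupID, -p prints a histogram (ioLength, seekDistance, interarrival, latency, and so on), and -x stops collection. esxtop/resxtop (the d, u and v views) give a quicker real-time picture of adapter, device and per-VM disk activity.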
Links
Documents and manuals
Fibre Channel SAN Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_san_cfg.pdf
iSCSI SAN Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_iscsi_san_cfg.pdf
ESX Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esx_server_config.pdf
ESXi Configuration Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esxi_server_config.pdf
vSphere Command-Line Interface Installation and Scripting Guide: http://www.vmware.com/pdf/vsphere4/r41/vsp4_41_vcli_inst_script.pdf
Sources
- http://en.wikipedia.org/wiki/RAID
- http://www.vmware.com/resources/compatibility/pdf/vi_san_guide.pdf
- http://kb.vmware.com/kb/1022242
- http://vmware.com/files/pdf/techpaper/VMW-vSphere4-directpath-host.pdf
- http://kb.vmware.com/kb/1010789
- http://www.petri.co.il/vmware-esxi4-vmdirectpath.htm
- http://www.simonlong.co.uk/blog/2009/07/27/npiv-support-in-vmware-esx4
- http://www.brocade.com/downloads/documents/white_papers/white_papers_partners/NPIV_ESX4_0_GA-TB-145-01.pdf
- http://www.vmware.com/pdf/vmfs-best-practices-wp.pdf
- http://www.yellow-bricks.com/2009/12/23/iops/
- http://vmtoday.com/2010/04/storage-basics-part-vi-storage-workload-characterization/
- http://www.vmware.com/technical-resources/virtual-storage/best-practices.html
- http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf
- http://www.douglaspsmith.com/home/2010/7/18/understand-use-cases-for-raw-device-mapping.html
- http://www.yellow-bricks.com/2010/08/11/storage-filters/
- http://virtuallanger.wordpress.com/2010/10/08/understand-and-apply-lun-masking-using-psa-related-commands/
- http://geeksilver.wordpress.com/2010/08/17/vmware-vsphere-4-1-psa-pluggable-storage-architecture-understanding/
- http://www.vmware.com/files/pdf/partners/academic/vpact-workloads.pdf
- http://vmtoday.com/2009/12/storage-basics-part-i-intro/
- http://vmtoday.com/2009/12/storage-basics-part-ii-iops/
- http://vmtoday.com/2010/01/storage-basics-part-iii-raid/
- http://vmtoday.com/2010/01/storage-basics-part-iv-interface/
- http://vmtoday.com/2010/03/storage-basics-part-v-controllers-cache-and-coalescing/
- http://vmtoday.com/2010/04/storage-basics-part-vi-storage-workload-characterization/
- http://vmtoday.com/2010/06/storage-basics-part-vii-storage-alignment/
If there are things missing or incorrect please let me know.
Disclaimer.
The information in this article is provided “AS IS” with no warranties, and confers no rights. This article does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion.