High write latency on HP StoreVirtual environment with VMware vSphere 5.1

Last week I started a new project with six new HP StoreVirtual storage nodes (HP P4330G3) and four VMware vSphere 5.1U1 ESXi servers. After successfully installing VMware vCenter 5.1U1, I noticed a big write latency issue on my freshly created storage LUNs. Latencies ran up to 200 ms and sometimes even higher. This was not normal, so I started searching.

First I thought it was the networking. We are using two brand new Cisco 3750x switches and two existing ones. After a couple of tests, and after asking the network guys whether anything was wrong, we concluded that network-wise everything was OK. So why the terrible performance on the StoreVirtual cluster?

I changed some networking configurations on my VMware ESXi hosts, but the latencies stayed just as high. The same thing happened when I copied files from one datastore to another.

To rule out VMware, I created a separate test LUN and connected a physical Windows 2008R2 server to it. I started copying some big files. Again the latency was very high, up to 400 ms.

See the screenshot of the write latency when copying a big file from Windows 2008R2 to the test LUN. I concluded that this was an HP StoreVirtual problem.
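If you want a number instead of a screenshot, the copy test can be approximated with a small script. This is only a rough sketch and not what I actually ran: the function name `measure_write_latency`, the block size and the block count are my own assumptions. It measures latency as seen by the operating system, which will not match the counters on the storage side exactly.

```python
import os
import time


def measure_write_latency(path, block_size=1024 * 1024, blocks=64):
    """Write `blocks` blocks to `path`, forcing each one to disk.

    Returns a list with the time per block in milliseconds.
    Hypothetical helper: names and defaults are assumptions.
    """
    payload = os.urandom(block_size)  # random data, so nothing compresses away
    latencies_ms = []
    with open(path, "wb") as f:
        for _ in range(blocks):
            start = time.perf_counter()
            f.write(payload)
            f.flush()              # empty Python's own buffer
            os.fsync(f.fileno())   # ask the OS to push the data to the device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

Point `path` at a file on the test LUN and look at `max(...)` and the average of the returned list. Run the same test against a local disk for comparison.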


I contacted HP Support and explained the problem. They asked me to create a support bundle and send it to them. The support engineer thought it was a RAID cache problem. He told me that some units had been shipped with the wrong cache settings configured.

So I created the support bundle and sent it to HP. I am a very curious guy and wanted to check whether this was really the problem, so I started searching in the support bundle.

I opened the support bundle and located a file called ADUReport.htm, which sits deep inside the bundle in a vendorlogs.tar file.

Here is where to find it:

  1. Open the file [IP_of_storagenode.tar.gz]
  2. Open mnt\logs\vendorlogs.tar
  3. Open hpadu\ADUReport.htm
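With six nodes, clicking through every bundle by hand gets tedious. The three steps above can be sketched in Python with the standard `tarfile` module. This is a minimal sketch: the helper names `read_adu_report` and `_find_member` are made up for illustration, and I am assuming the paths inside the archives match the ones listed above, possibly with an extra leading directory.

```python
import io
import tarfile


def _find_member(archive, wanted):
    """Find a file in the archive by path suffix (case-insensitive).

    Suffix matching is an assumption: archives often prefix names
    with './' or a top-level directory.
    """
    wanted = wanted.replace("\\", "/").lower()
    for member in archive.getmembers():
        name = member.name.replace("\\", "/").lower()
        if member.isfile() and (name == wanted or name.endswith("/" + wanted)):
            return member
    raise FileNotFoundError(wanted)


def read_adu_report(bundle_path,
                    inner_tar="mnt/logs/vendorlogs.tar",
                    report="hpadu/ADUReport.htm"):
    """Return the raw bytes of ADUReport.htm from a support bundle."""
    # Step 1: open the outer .tar.gz bundle
    with tarfile.open(bundle_path, "r:gz") as outer:
        # Step 2: pull vendorlogs.tar into memory
        inner_bytes = outer.extractfile(_find_member(outer, inner_tar)).read()
    # Step 3: open the nested tar and read the report
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r") as inner:
        return inner.extractfile(_find_member(inner, report)).read()
```

Call `read_adu_report()` with the path to each node's bundle and write the result to a file, or search it directly.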

This file contains the complete configuration of the StoreVirtual server. Look for the Smart Array P420i section and check the cache size settings.
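To avoid scrolling through the whole report, you can filter it down to the lines that mention the cache. The sketch below is an assumption on my side: I do not rely on the exact layout of ADUReport.htm, so it simply strips the HTML tags and keeps every line containing the word "cache". The function name `cache_lines` is hypothetical.

```python
import html
import re


def cache_lines(report_html):
    """Return the text lines of an HTML report that mention 'cache'."""
    # Treat the end of a row, paragraph or similar block as a line break
    text = re.sub(r"(?i)</(tr|p|div|li|h\d)>|<br\s*/?>", "\n", report_html)
    # Replace every remaining tag with a space so cell values stay on one line
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    # Collapse runs of whitespace inside each line
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if "cache" in line.lower()]
```

Running this against the report of every node and comparing the output side by side makes a node with different cache settings stand out quickly.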

Screenshot of the wrong settings.


In my case this was not set correctly. I also checked the other nodes, and some of them were correctly configured.

Screenshot of the correct settings.


In total, three nodes were wrong and three were correct.

After changing the settings in the RAID controller, the high latencies disappeared and everything performed normally again. See screenshot.

Conclusion: when you see high write latency on all your LUNs, create a support bundle and check the cache settings of the RAID controller. This could be the problem.

Thanks HP for your support and the fix…

Disclaimer.
The information in this article is provided “AS IS” with no warranties, and confers no rights. This article does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion.

Marco

Marco works for ViaData as a Senior Technical Consultant. He has over 15 years of experience as a system engineer and consultant, specialized in virtualization. VMware VCP4, VCP5-DC & VCP5-DT. VMware vExpert 2013, 2014, 2015 & 2016. Microsoft MCSE & MCITP Enterprise Administrator. Veeam VMSP, VMTSP & VMCE.