High write latency on HP StoreVirtual environment with VMware vSphere 5.1

Last week I started a new project with six new HP StoreVirtual storage nodes (HP P4330G3) and four VMware vSphere 5.1U1 ESXi servers. After successfully installing VMware vCenter 5.1U1, I noticed severe write latency on my freshly created storage LUNs. Latencies ran up to 200 ms and sometimes even higher. This was not normal, so I started searching.

At first I suspected the network. We are using two brand-new Cisco 3750X switches and two existing ones. After a couple of tests, and after asking the network team whether anything was wrong, we concluded that everything was fine on the network side. So why the terrible performance on the StoreVirtual cluster?

I changed some networking configuration on my VMware ESXi hosts, but the latencies stayed just as high. The same thing happened when I copied files from one datastore to another.

To rule out VMware, I created a separate test LUN and connected a physical Windows 2008R2 server to it. I started copying some big files. Again the latency was very high, up to 400 ms.

See the screenshot of the write latency when copying a big file from Windows 2008R2 to the test LUN. I concluded that this was an HP StoreVirtual problem.
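If you want a rough number from the host side while copying files, a small script can time synchronous writes to a file on the LUN under test. This is only a minimal sketch: the function names, block size and block count are my own assumptions, and it measures latency as seen by the host (including the OS and the iSCSI path), which is not the same counter as the write latency reported by the storage nodes.

```python
import os
import time


def measure_write_latency(path, block_size=1024 * 1024, blocks=64):
    """Write `blocks` blocks of `block_size` bytes to `path`, forcing each
    block to stable storage, and return the per-write latencies in ms."""
    payload = os.urandom(block_size)
    latencies_ms = []
    with open(path, "wb") as handle:
        for _ in range(blocks):
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to flush to the device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return the average and maximum latency in milliseconds."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

Point `path` at a file on the volume backed by the test LUN and compare the result of `summarize(measure_write_latency(path))` before and after any change. The test file is left in place, so remove it afterwards.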


I contacted HP Support and explained the problem. They asked me to create a support bundle and send it to them. The support engineer suspected a RAID cache problem: he told me that some units had shipped with the wrong cache settings configured.

So I created the support bundle and sent it to HP. I am a very curious guy and wanted to check whether this was really the problem, so I started digging through the support bundle myself.

I opened the support bundle and located a file called ADUReport.htm, which sits deep inside the bundle in a vendorlogs.tar file.

Here is where to find it:

  1. Open the file [IP_of_storagenode.tar.gz]
  2. Open mnt\logs\vendorlogs.tar
  3. Open hpadu\ADUReport.htm

This file contains the complete configuration of the StoreVirtual server. Look for the Smart Array P420i section and check the cache size settings.
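The three steps above can also be scripted. The following is a minimal sketch using Python's standard tarfile module; the function names are my own, and I am assuming that the report is plain HTML in which the relevant settings sit on lines containing the word "cache". It matches on the file names only, so it does not depend on the exact folder layout inside the bundle.

```python
import io
import tarfile


def find_member(tar, suffix):
    """Return the first archive member whose path ends with `suffix`, else None."""
    for member in tar.getmembers():
        if member.name.replace("\\", "/").endswith(suffix):
            return member
    return None


def read_adu_report(bundle_path):
    """Return the text of ADUReport.htm from a support bundle, or None if
    either vendorlogs.tar or the report itself is missing."""
    with tarfile.open(bundle_path, "r:*") as outer:
        inner_member = find_member(outer, "vendorlogs.tar")
        if inner_member is None:
            return None
        inner_bytes = outer.extractfile(inner_member).read()
    # vendorlogs.tar is itself an archive, so open it from memory.
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:*") as inner:
        report = find_member(inner, "ADUReport.htm")
        if report is None:
            return None
        raw = inner.extractfile(report).read()
    return raw.decode("utf-8", errors="replace")


def cache_lines(report_text):
    """Return the report lines that mention 'cache' (case-insensitive)."""
    return [line.strip() for line in report_text.splitlines()
            if "cache" in line.lower()]
```

Call `read_adu_report()` with the path to the bundle of one storage node and pass the result to `cache_lines()`. A return value of `None` means the bundle does not contain the report, in which case you will have to get the controller configuration some other way.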

Screenshot of the wrong settings.


In my case this was not set correctly. I also checked the other nodes, and some of them were configured correctly.

Screenshot of the correct settings.


In total, three nodes were wrong and three were correct.
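With six nodes it is easy to lose track of which ones differ. As a rough sketch, you can group the nodes by the cache-related lines you collected from each report; nodes that end up in different groups have different controller settings. The node names and setting strings below are hypothetical placeholders, not values from my environment.

```python
from collections import defaultdict


def group_by_settings(settings_by_node):
    """Group node names by their set of cache-related lines.

    The order of the lines within a node does not matter."""
    groups = defaultdict(list)
    for node, lines in settings_by_node.items():
        groups[tuple(sorted(lines))].append(node)
    return {key: sorted(nodes) for key, nodes in groups.items()}


if __name__ == "__main__":
    # Hypothetical input: one list of cache-related lines per storage node.
    collected = {
        "node-1": ["cache setting: value A"],
        "node-2": ["cache setting: value B"],
        "node-3": ["cache setting: value A"],
    }
    for settings, nodes in group_by_settings(collected).items():
        print(nodes, "->", list(settings))
```

This only tells you which nodes differ from each other. Which group is the correct one is something to confirm with HP Support.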

After changing the settings in the RAID controller, the high latencies disappeared and everything performed normally again. See screenshot.

Conclusion: when you see high write latency on all your LUNs, create a support bundle and check the cache settings of the RAID controller. This could be the problem.

Thanks HP for your support and the fix…

Disclaimer.
The information in this article is provided “AS IS” with no warranties, and confers no rights. This article does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion.

Marco

Marco works for ViaData as a Senior Technical Consultant. He has over 15 years of experience as a system engineer and consultant, specializing in virtualization. VMware VCP4, VCP5-DC & VCP5-DT. VMware vExpert 2013, 2014, 2015 & 2016. Microsoft MCSE & MCITP Enterprise Administrator. Veeam VMSP, VMTSP & VMCE.

6 Comments

  1. jbrunner007

    Just an FYI: you want to use an Arista or HP ProCurve switch. The 3750X are not good iSCSI switches. They don't have good ASICs or large enough buffers for storage traffic, even with jumbo frames turned on.

  2. jan maas

    "After changing the settings in the RAID controller my latencies disappeared"

    How did you change the settings?

  3. Start the HP ACU from the HP Support Pack DVD. If you are using an HP Gen8 server, you can also boot the server into the ACU tool. I think you do that by pressing the F5 key while the RAID controller is initializing. Good luck.

  4. jan maas

    Marco,

    Thanks for the fast reply.
    The settings have to be changed on the StoreVirtual 4730.
    I thought there might be some setting or extra software to manage the cache of the RAID set.
    Bringing down the StoreVirtual is not an option at the moment because it is in use.
    If you have any tips I would be thankful.

    Kind regards,

    Jan

  5. Jan, I don’t think there is any other solution. The StoreVirtual must be powered down to change the cache setting.

  6. ced.syn

    Hello,
    The mnt\logs\vendorlogs.tar does not contain the HTML file.
    Did you generate the support bundle from the Management Group?

    About changing the settings in the ACU: are you talking about the 10% (read) / 90% (write) default settings?

    We have SSDs on our HP P840 and poor performance. We have disabled SSD Smart Path and enabled the RAID controller cache.

    Thanks
