Wednesday, December 3, 2014

VNXe- proof of how to do storage configuration wrong... very wrong.

~This article was originally to be published in February of 2014~


For the last few weeks, I've been dealing with an environment where storage has been massively misconfigured. Dual VNXe 3100s, only one in use. Storage network running on a single, incredibly slow HP 1410 switch, single NIC, single path... pretty much everything you can do wrong and still have it work.

So I've been trying to fix this in order to get backups to run in a reasonable amount of time- added dedicated switching, trying to get the second VNXe available for access to the local vmware cluster, etc. The first thing I run into is how incredibly slow this system is- anything I do literally takes 160 seconds to complete. How do I know? I sat there with a stopwatch and clocked several of my actions, because I couldn't believe how slow it seemed- I must have been spoiled! I found to my horror that it's literally 2 minutes, 40 seconds per action. Dear god I hate this thing.

Going over the vmhosts, I come to find that there aren't consistent datastore mappings- each host has most datastores in common, but there were several that were only on one host, or two hosts. So I spend some time correcting this when I come across another horrible discovery- there were duplicate entries for hosts, and some were flat out wrong (or in one case, completely identical!).

This is where I made the critical mistake- I had gotten the switching in place, I had gotten the SANs moved over, but I hadn't cleaned up the access lists. I started on that, with the rational thought that it would work in a sensible manner. In my mind, I would simply change access over from IP allow lists to IQNs (simpler to manage, right?)- oh so wrong.

This thing had been set up to talk to vCenter, and so talk it did. For some ungodly reason, it decided that since vCenter knew about the datastore associated with the LUN, it could not change the access method (IQN vs IP)- and threw up this beautiful error:

"The changes could not be applied the following error was encountered:
 The datastore name is already in use on the ESX server
error code: 0x600d50"

WTF? Proper MPIO would allow multiple connections to the same LUN from the same initiator without error, so why on earth would this matter in the least? So, being the trusting, happy-go-lucky admin that I am, I click OK.

And developed an ulcer. That instant.

Two of the three hosts were kicked off the VNXe immediately.

Ok I think to myself, easy fix. I'll just go back in and give myself permission again, undoing my changes.

Oh no. No no no- it's not that easy. It's nowhere near that easy. Nothing I do is letting me reattach these LUNs. _Nothing_. Now I'm getting spooked- is the data even still there? Did I somehow just obliterate the customer's data? I dig through emails and documents, finding the credentials I need to get on EMC's support site, where I run into the first hurdle.

Error code 0x600d50 is apparently not something customers need to know about, so if you get that error, you're pretty well screwed. It also doesn't help that every reference to it is in regards to renaming datastores on the vmware side without doing it on the storage side, which apparently does bad things. But this doesn't concern me, right? I made no such change!

There's apparently a lot more to this error than one condition- but it's so piss poorly documented that one will never find out. Now I'm really panicked, so I click on the chat with support button. I fill in all the details, and even manage to find the serial number for the device I'm having a headache with- and then click "submit".

And promptly get told support's not available via web chat.

ok...

Calling in to EMC's support line, I get told immediately by a very friendly recording that I'll get quicker support... if I use the chat client. Yep, this was going to be one of those calls. After navigating the menu system, I get to a young man who is completely lost by the gibberish coming out of my mouth- but he does get me in touch with a woman who understands me perfectly (I wish I could say the same about what she was saying)- I managed to secure a call back promise from her.

I wish I could say the nightmare ends here. The customer is down, and I've notified them that I'm working on it. They're mostly ok with it, as I'm working on it. At this point I'm pretty freaked, as I'm waiting on a call back that I'm not even sure is coming. 30 minutes later, the call back finally happens. And I walk the tech through what's going on, and he immediately starts trying to do all the things I had done.

Which is about when I really start to shiver- the tech was actually expecting it to work too. He gets the senior tech involved, whom I overhear in the background saying that I need to remove all the datastores from the VM hosts.

Can't do it- vmware refuses to unmount the datastores as long as there's a VM on them. So after arguing with support and realizing I'm not going to get anywhere otherwise, I power off every VM and rescan the HBAs...

Which changes nothing.
At all.

And now I'm freaked because the support techs want me to remove the VMs from inventory, at a point where I can't see the datastores to record which VMs go where (why is this a problem, you ask? Because I just took over the environment!).

I flat out refuse- and we go onto the next step. Which involves resetting the SAN. Needless to say, I refused to do that too.

What did eventually work, you ask? Going into the VNXe where the problem originally existed and removing every host entry, then letting the SAN sit and fiddle with itself for a while, while rescanning the HBA on the hosts. This at least cleared out the datastore lists- not a reassuring thing in the least, mind you. Then I re-added the hosts one by one until I had them back in using the access methods I wanted them to use.

And I attached the first LUN.

And waited forever, or so it seemed. This was when I decided to use a stopwatch to figure out how long this was taking. Each LUN I reattach takes ~3 minutes. 44 LUNs.

44 LUNs at 3 minutes per. 132 minutes to reattach these datastores. 132 minutes before I can even attempt to get the customer back online. 132 minutes of mind numbing, nail biting, customer frustrating hell.
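For the record, the arithmetic is easy to check in a shell. This is just a sketch using the numbers quoted in this post (44 LUNs, roughly 3 minutes each, 160 seconds on the stopwatch); the variable names are made up for illustration.

```shell
# Figures quoted in the post; variable names are only for illustration.
luns=44
minutes_per_lun=3      # the rounded "~3 minutes per LUN"
seconds_per_action=160 # the stopwatch figure (2 minutes, 40 seconds)

total_min=$((luns * minutes_per_lun))
measured_s=$((luns * seconds_per_action))

echo "rounded:  $total_min minutes ($((total_min / 60)) h $((total_min % 60)) min)"
echo "measured: $((measured_s / 60)) min $((measured_s % 60)) s"
```

Either way it works out to about two hours of clicking and waiting.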

So, I ask... Is there some way I can do these in bulk? "Nope"

I finally managed to thank the tech for his time and get off the phone. Whereupon I've been stewing for over 2 hours, thinking about how utterly stupid this is. Fuming that I never had issues like this with EqualLogic, HUS, or even open source Linux iSCSI targets.

Why had I never run into these problems? Because none of those care one iota about what's accessing the volume. None of them try to do any screwed up LUN per datastore mapping, nor try to enforce single host access or otherwise- why? Because they assume the storage admin knows what he's doing.

Thursday, March 6, 2014

The horrid beast known as "vCloud"

So once again, I've been tasked with doing something that normally can be done just by going into the viclient or the webclient- namely, modifying disks for a vm. Only... this is in vcloud. The horrid nightmare that it is.

So we start off by logging in and getting auth. To do this, I've been using curl. I'm sure there's another way, possibly a way to do this using IIS/apache, what have you, but it's far outside the scope of what I'm doing at the moment.

curl -i -k -H "Accept:application/*+xml;version=1.5" -u <user>@system:<password> -X POST https://<vcloudHost>/api/sessions

This will return a string you will need for the rest of your transactions:

HTTP/1.1 200 OK
Date: Thu, 06 Mar 2014 16:32:03 GMT
x-vcloud-authorization: HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=
Set-Cookie: vcloud-token=HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=; Secure; Path=/
Content-Type: application/vnd.vmware.vcloud.session+xml;version=1.5
Date: Thu, 06 Mar 2014 16:32:03 GMT
Content-Length: 980

<?xml version="1.0" encoding="UTF-8"?>
<Session xmlns="http://www.vmware.com/vcloud/v1.5" user="<user>" org="System" type="application/vnd.vmware.vcloud.session+xml" href="https://<vcloudHost>/api/session/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://<vcloudHost>/api/v1.5/schema/master.xsd">
    <Link rel="down" type="application/vnd.vmware.vcloud.orgList+xml" href="https://<vcloudHost>/api/org/"/>
    <Link rel="down" type="application/vnd.vmware.admin.vcloud+xml" href="https://<vcloudHost>/api/admin/"/>
    <Link rel="down" type="application/vnd.vmware.admin.vmwExtension+xml" href="https://<vcloudHost>/api/admin/extension"/>
    <Link rel="down" type="application/vnd.vmware.vcloud.query.queryList+xml" href="https://<vcloudHost>/api/query"/>
    <Link rel="entityResolver" type="application/vnd.vmware.vcloud.entity+xml" href="https://<vcloudHost>/api/entity/"/>
</Session>

Notice the line that starts off with "x-vcloud-authorization"? You need the string following it.
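If you'd rather not copy and paste the token by hand, something like the following can pull it out of the `curl -i` output. It's only a sketch: `extract_vcloud_token` is a name I made up, and it assumes the header looks exactly like the one in the response above.

```shell
# Read "curl -i" output on stdin and print the x-vcloud-authorization value.
# extract_vcloud_token is a hypothetical helper name, not part of curl or vCloud.
extract_vcloud_token() {
  # Strip carriage returns (HTTP headers end in CRLF), then match the header
  # name case-insensitively and print the field that follows it.
  tr -d '\r' | awk 'tolower($1) == "x-vcloud-authorization:" { print $2; exit }'
}

# Possible usage, with the same placeholders as the login command above:
#   TOKEN=$(curl -s -i -k -H "Accept:application/*+xml;version=1.5" \
#     -u '<user>@system:<password>' -X POST 'https://<vcloudHost>/api/sessions' \
#     | extract_vcloud_token)
```

The later commands could then pass `-H "x-vcloud-authorization: $TOKEN"` instead of the literal string.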

Now to get a list of VMs:

 curl -i -k -H "Accept:application/*+xml;version=1.5" -H "x-vcloud-authorization: HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=" -X GET 'https://<vcloudHost>/api/query?type=adminVM&fields=name,datastoreName'

This will return a list of machines:

HTTP/1.1 200 OK
Date: Thu, 06 Mar 2014 16:32:26 GMT
Content-Type: application/*+xml;version=1.5
Date: Thu, 06 Mar 2014 16:32:26 GMT
Content-Length: 1911

<?xml version="1.0" encoding="UTF-8"?>
<QueryResultRecords xmlns="http://www.vmware.com/vcloud/v1.5" total="7" pageSize="25" page="1" name="adminVM" type="application/vnd.vmware.vcloud.query.records+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=records&amp;fields=name,datastoreName" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://<vcloudHost>/api/v1.5/schema/master.xsd">
    <Link rel="alternate" type="application/vnd.vmware.vcloud.query.references+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=references&amp;fields=name,datastoreName"/>
    <Link rel="alternate" type="application/vnd.vmware.vcloud.query.idrecords+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=idrecords&amp;fields=name,datastoreName"/>
    <AdminVMRecord name="ubieFS3" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-0ccbd815-101c-4f3f-bc53-a482dd977e57"/>
    <AdminVMRecord name="ubieTS1" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-1dac5547-1764-4fff-a2f9-feca10629d3b"/>
    <AdminVMRecord name="ubieAPP1" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-2f1de9cf-1d77-4c6c-b454-58fdce96ceed"/>
    <AdminVMRecord name="ubieSQL1" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-58f1c7a3-7bd2-45e5-80c8-fb84674aabe4"/>
    <AdminVMRecord name="ubieDC11" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-83cdd93a-ee48-4846-a27c-1919ade3bf9c"/>
    <AdminVMRecord name="ubieEMAIL1" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4"/>
    <AdminVMRecord name="ubieDC12" datastoreName="HUS-3" href="https://<vcloudHost>/api/vApp/vm-dc5f8f09-072e-4717-94ed-d459ec566992"/>
</QueryResultRecords>

Now, we want the specifics on one VM:

curl -i -k -H "Accept:application/*+xml;version=1.5" -H "x-vcloud-authorization: HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=" -X GET 'https://<vcloudHost>/api/query?type=adminVM&filter=(name==ubieEMAIL1)'

HTTP/1.1 200 OK
Date: Thu, 06 Mar 2014 16:47:03 GMT
Content-Type: application/*+xml;version=1.5
Date: Thu, 06 Mar 2014 16:47:03 GMT
Content-Length: 1833

<?xml version="1.0" encoding="UTF-8"?>
<QueryResultRecords xmlns="http://www.vmware.com/vcloud/v1.5" total="1" pageSize="25" page="1" name="adminVM" type="application/vnd.vmware.vcloud.query.records+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=records&amp;filter=(name==ubieEMAIL1)" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://<vcloudHost>/api/v1.5/schema/master.xsd">
    <Link rel="alternate" type="application/vnd.vmware.vcloud.query.references+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=references&amp;filter=(name==ubieEMAIL1)"/>
    <Link rel="alternate" type="application/vnd.vmware.vcloud.query.idrecords+xml" href="https://<vcloudHost>/api/query?type=adminVM&amp;page=1&amp;pageSize=25&amp;format=idrecords&amp;filter=(name==ubieEMAIL1)"/>
    <AdminVMRecord vmToolsVersion="8389" vdc="https://<vcloudHost>/api/vdc/099a3580-ee14-4262-8eb5-eb0586786b58" vc="https://<vcloudHost>/api/admin/extension/vimServer/7c443115-8d45-42f0-b2f0-86d255d0e552" status="POWERED_ON" org="https://<vcloudHost>/api/org/dcd46410-3dee-47e8-a47f-e2bb99eb6cc7" numberOfCpus="2" networkName="ubie Org Ext" name="ubieEMAIL1" moref="vm-161" memoryMB="3072" isVdcEnabled="true" isVAppTemplate="false" isPublished="false" isDeployed="true" isDeleted="false" hostName="<clusterMember>" hardwareVersion="8" guestOs="Microsoft Windows Server 2008 R2 (64-bit)" datastoreName="HUS-3" containerName="ubieEMAIL1" container="https://<vcloudHost>/api/vApp/vapp-03219d73-4fe2-406b-8d32-85121f773a6a" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4" pvdcHighestSupportedHardwareVersion="8" containerStatus="RESOLVED"/>
</QueryResultRecords>

Now that last section is what we need, specifically the "https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4". We're going to use that to pull data on the disk layout.
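Picking that href out by eye gets old fast, so here is a rough sketch of doing it with standard text tools. `vm_href` is a made-up helper name, and it is plain line matching rather than a real XML parser: it assumes each AdminVMRecord sits on a single line, as in the output above.

```shell
# Read query output on stdin and print the href of the VM named in $1.
# vm_href is a hypothetical helper; it relies on one AdminVMRecord per line.
vm_href() {
  grep '<AdminVMRecord ' |
    # The leading space keeps containerName= and networkName= from matching.
    grep -F " name=\"$1\"" |
    # href is the only attribute literally called href on the record line.
    sed -n 's/.* href="\([^"]*\)".*/\1/p' |
    head -n 1
}

# Possible usage, assuming the query output was saved to a file called vms.xml:
#   VM_URL=$(vm_href ubieEMAIL1 < vms.xml)
```

If the API ever wraps the record across several lines, this breaks, and a proper XML tool would be the safer choice.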

curl -i -k -H "Accept:application/*+xml;version=1.5" -H "x-vcloud-authorization: HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=" -X GET 'https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks'

HTTP/1.1 200 OK
Date: Thu, 06 Mar 2014 16:49:28 GMT
Content-Type: application/vnd.vmware.vcloud.rasditemslist+xml;version=1.5
Date: Thu, 06 Mar 2014 16:49:28 GMT
Content-Length: 2018

<?xml version="1.0" encoding="UTF-8"?>
<RasdItemsList xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" type="application/vnd.vmware.vcloud.rasdItemsList+xml" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://<vcloudHost>/api/v1.5/schema/master.xsd http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2.22.0/CIM_ResourceAllocationSettingData.xsd">
    <Link rel="edit" type="application/vnd.vmware.vcloud.rasdItemsList+xml" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks"/>
    <Item>
        <rasd:Address>0</rasd:Address>
        <rasd:Description>SCSI Controller</rasd:Description>
        <rasd:ElementName>SCSI Controller 0</rasd:ElementName>
        <rasd:InstanceID>2</rasd:InstanceID>
        <rasd:ResourceSubType>lsilogicsas</rasd:ResourceSubType>
        <rasd:ResourceType>6</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:AddressOnParent>0</rasd:AddressOnParent>
        <rasd:Description>Hard disk</rasd:Description>
        <rasd:ElementName>Hard disk 1</rasd:ElementName>
        <rasd:HostResource xmlns:vcloud="http://www.vmware.com/vcloud/v1.5" vcloud:capacity="40960" vcloud:busSubType="lsilogicsas" vcloud:busType="6"></rasd:HostResource>
        <rasd:InstanceID>2000</rasd:InstanceID>
        <rasd:Parent>2</rasd:Parent>
        <rasd:ResourceType>17</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:Address>0</rasd:Address>
        <rasd:Description>IDE Controller</rasd:Description>
        <rasd:ElementName>IDE Controller 0</rasd:ElementName>
        <rasd:InstanceID>3</rasd:InstanceID>
        <rasd:ResourceType>5</rasd:ResourceType>
    </Item>
</RasdItemsList>
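That listing is a lot of XML for "one 40 GB disk on a SAS controller". A small sketch like this boils it down to disk names and sizes; `list_disks` is a made-up name, and it assumes the layout shown above, where ElementName comes before HostResource inside each Item and vcloud:capacity is in MB.

```shell
# Read a RasdItemsList on stdin and print one line per hard disk.
# list_disks is a hypothetical helper; controllers have no capacity, so they are skipped.
list_disks() {
  awk '
    /<rasd:ElementName>/ {
      name = $0
      sub(/.*<rasd:ElementName>/, "", name)
      sub(/<\/rasd:ElementName>.*/, "", name)
    }
    /vcloud:capacity="/ {
      cap = $0
      sub(/.*vcloud:capacity="/, "", cap)
      sub(/".*/, "", cap)
      printf "%s: %d MB\n", name, cap
    }
  '
}
```

Run against the output above, this should report a single "Hard disk 1" of 40960 MB.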

Now we're going to add a couple of disks- this is where it gets weird. You're going to need to create an XML file from the above configuration info to send back, and because the PUT replaces the whole disk list, you'll need to keep all the current info as well. Failure to do so can completely destroy your VM! You've been warned.

Create your file, adding one Item per new disk (saved here as create-disk, the name the PUT command further down expects):

<?xml version="1.0" encoding="UTF-8"?>
<RasdItemsList xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" type="application/vnd.vmware.vcloud.rasdItemsList+xml" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://<vcloudHost>/api/v1.5/schema/master.xsd http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2.22.0/CIM_ResourceAllocationSettingData.xsd">
    <Link rel="edit" type="application/vnd.vmware.vcloud.rasdItemsList+xml" href="https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks"/>
    <Item>
        <rasd:Address>0</rasd:Address>
        <rasd:Description>SCSI Controller</rasd:Description>
        <rasd:ElementName>SCSI Controller 0</rasd:ElementName>
        <rasd:InstanceID>2</rasd:InstanceID>
        <rasd:ResourceSubType>lsilogicsas</rasd:ResourceSubType>
        <rasd:ResourceType>6</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:AddressOnParent>0</rasd:AddressOnParent>
        <rasd:Description>Hard disk</rasd:Description>
        <rasd:ElementName>Hard disk 1</rasd:ElementName>
        <rasd:HostResource xmlns:vcloud="http://www.vmware.com/vcloud/v1.5" vcloud:capacity="40960" vcloud:busSubType="lsilogicsas" vcloud:busType="6"></rasd:HostResource>
        <rasd:InstanceID>2000</rasd:InstanceID>
        <rasd:Parent>2</rasd:Parent>
        <rasd:ResourceType>17</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:AddressOnParent>1</rasd:AddressOnParent>
        <rasd:Description>Hard disk</rasd:Description>
        <rasd:ElementName>Hard disk 2</rasd:ElementName>
        <rasd:HostResource xmlns:vcloud="http://www.vmware.com/vcloud/v1.5" vcloud:capacity="153600" vcloud:busSubType="lsilogicsas" vcloud:busType="6"></rasd:HostResource>
        <rasd:InstanceID>2001</rasd:InstanceID>
        <rasd:Parent>2</rasd:Parent>
        <rasd:ResourceType>17</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:AddressOnParent>2</rasd:AddressOnParent>
        <rasd:Description>Hard disk</rasd:Description>
        <rasd:ElementName>Hard disk 3</rasd:ElementName>
        <rasd:HostResource xmlns:vcloud="http://www.vmware.com/vcloud/v1.5" vcloud:capacity="102400" vcloud:busSubType="lsilogicsas" vcloud:busType="6"></rasd:HostResource>
        <rasd:InstanceID>2002</rasd:InstanceID>
        <rasd:Parent>2</rasd:Parent>
        <rasd:ResourceType>17</rasd:ResourceType>
    </Item>
    <Item>
        <rasd:Address>0</rasd:Address>
        <rasd:Description>IDE Controller</rasd:Description>
        <rasd:ElementName>IDE Controller 0</rasd:ElementName>
        <rasd:InstanceID>3</rasd:InstanceID>
        <rasd:ResourceType>5</rasd:ResourceType>
    </Item>
</RasdItemsList>
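Typing those Item blocks by hand is an easy way to fat-finger a capacity, so here is a sketch of generating one. `disk_item` is a made-up helper. It assumes the pattern visible in the file above: capacity in MB, InstanceID equal to 2000 plus the AddressOnParent, and the lsilogicsas controller with InstanceID 2 as the parent. Check those against your own GET output before trusting it.

```shell
# Print one hard-disk <Item> in the same shape as the file above.
# Arguments: address-on-parent, disk number, size in GB, parent controller InstanceID.
# disk_item is a hypothetical helper; the InstanceID rule is inferred from the example.
disk_item() {
  addr=$1 num=$2 size_gb=$3 parent=$4
  cat <<EOF
    <Item>
        <rasd:AddressOnParent>${addr}</rasd:AddressOnParent>
        <rasd:Description>Hard disk</rasd:Description>
        <rasd:ElementName>Hard disk ${num}</rasd:ElementName>
        <rasd:HostResource xmlns:vcloud="http://www.vmware.com/vcloud/v1.5" vcloud:capacity="$((size_gb * 1024))" vcloud:busSubType="lsilogicsas" vcloud:busType="6"></rasd:HostResource>
        <rasd:InstanceID>$((2000 + addr))</rasd:InstanceID>
        <rasd:Parent>${parent}</rasd:Parent>
        <rasd:ResourceType>17</rasd:ResourceType>
    </Item>
EOF
}
```

For example, `disk_item 1 2 150 2` and `disk_item 2 3 100 2` should produce the two new entries above (153600 MB and 102400 MB), ready to paste in after the existing "Hard disk 1" Item.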

Now, provided you have enough room, you'll run the following:


curl -i -k -H "Accept:application/*+xml;version=1.5" -H "x-vcloud-authorization: HeJTbAEggL97iqdVsqMiINFhu4ZIRsYlPRd96dipjvc=" -X PUT 'https://<vcloudHost>/api/vApp/vm-af966c82-e9e9-4f39-a7f3-21fbf9560ed4/virtualHardwareSection/disks' -H "Content-Type: application/vnd.vmware.vcloud.rasdItemsList+xml" -d @create-disk
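Given the warning about destroying the VM, it may be worth a sanity check before sending that PUT. This is only a sketch: `instance_ids` and `keeps_all_items` are made-up helper names, and it assumes the earlier GET output was saved to a file (called disks-before.xml here purely for illustration). It checks that every InstanceID from the original listing still appears in the edited file, nothing more.

```shell
# Print the InstanceID values found in the file named by $1, one per line.
instance_ids() {
  sed -n 's/.*<rasd:InstanceID>\([0-9][0-9]*\)<\/rasd:InstanceID>.*/\1/p' "$1"
}

# Succeed only if every InstanceID in $1 (original listing) is also in $2 (edited file).
# Both function names are hypothetical helpers, not part of any vCloud tooling.
keeps_all_items() {
  for id in $(instance_ids "$1"); do
    grep -q "<rasd:InstanceID>$id</rasd:InstanceID>" "$2" || {
      echo "missing InstanceID $id in $2" >&2
      return 1
    }
  done
}

# Possible usage before the PUT:
#   keeps_all_items disks-before.xml create-disk && echo "nothing dropped"
```

It won't catch a wrong capacity or a mangled controller entry, so it's a seatbelt, not a guarantee.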

New disks created. Why on earth this is so miserable I have no idea, but it really shouldn't be.


Tuesday, January 28, 2014

Holy crap true believers!

And yes, Stan the Man would probably frown on that- but still.

So it's been the usual 4-5 months (or more) since my last update... talk about the more things change...

I've taken a job in Houston, Texas to work for an MSP that I've known for a while. In the two months that I've been here, I've learned quite a bit about tech that I'd only wished I could work with. For instance- the HUS series of SANs from Hitachi. Beautiful boxes, easy to physically install... and incredibly easy to configure. I never thought I'd find something as simple as iscsi-target, but I was definitely wrong. Add on top of that a crash course in hacking vCloud. That's right, today I get to migrate VMs under vCloud using the REST APIs. Tons more details at virtuallyghetto, but the highlight is- make sure your datastores are visible to vCloud.

Inside vCloud, you have to do all your management using vCloud Director- a web based headache, but it has its purposes. To add your new datastores, first you have to add them the usual way using vmware's viclient or via the web console- either will work fine. Next, you have to log in via the web to your vCloud Director- not fun if you've not done this before.

You'll have to log in as a system administrator for your vCloud, navigate to "Manage & Monitor", find "Provider vDCs" and then select the right provider. In here, you'll add your datastore.

Not quite brain surgery, but nerve-wracking when you're convinced it's going to explode on you at a moment's notice.