Virtualization: best practices for real life.

October 30th, 2008 by Josh

My whole career has been about having things thrown at me and learning on the fly. This works out pretty well most of the time, especially when it’s a new-to-us technology that’s been around for a while already like Active Directory (which we didn’t start using until Server 2003 was out). With all of the published best-practices and configuration guides that were available by the time we implemented AD, it was practically a walk in the park.

There are times, however, when you’re actually keeping up with the trends in technology and have to deploy something that’s fairly new. I first deployed VMware ESX Server a few years ago, when there weren’t as many configuration options as there are now. It was a standalone server with a lot of local storage, and all I had to do was manage the resources. Easy! So easy that we got another one to run some automation systems for the Engineering Department. Another success! After successfully reducing our hardware inventory without lowering our availability or system performance index, I convinced the powers-that-be that a full-blown virtualization platform was clearly the next step in our technological evolution.

What a day I had when the new toys were delivered. 4 ESX3 host servers connected to a 7TB EqualLogic iSCSI array, unleashing 64GHz of multi-core processing power and 64 Gigabytes of RAM into our available resource pool. The EqualLogic promotional material (and sales reps) touted that it would take longer to unpack than it would take to configure, and they were right. Within 2 hours of UPS dropping it off I had the array installed and configured, ESX installed on all 4 hosts, VirtualCenter Server installed on the management server, and several guest systems migrated from one of the standalone ESX servers. It was perfect!

But was it really perfect? Had I implemented this solution in the smartest way possible? No. No I hadn’t. There weren’t any show-stopping mistakes but there were a few things, which I will address in a moment, that I ended up going back and changing well after the initial installation.

iSCSI Network
Your switch should support flow control and jumbo frames at the same time. 10GbE uplink ports are great for future-proofing your switching platform.

I had a spare ProCurve 2810-48 gigabit switch, so I was good to go! Perhaps in a small deployment, but the 2810 doesn’t support jumbo frames AND flow control at the same time; it’s one or the other. You can’t adjust the MTU on the EQL interfaces, so there will be additional overhead on the switch as it negotiates the MTU size with the EQL interfaces. While VMware does not officially support jumbo frames on the software iSCSI initiator, several reputable sources recommended that I enable them anyway. I did so, replacing the 2810 with a 2900, which also has 10G uplink capability, making it a perfect fit for the two-switch configuration recommended by Dell/EQL.
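To put a rough number on why jumbo frames are worth chasing, here is a back-of-the-envelope sketch in Python. It assumes plain IPv4 and TCP headers with no options, no VLAN tag, and it ignores iSCSI PDU headers entirely, so treat the figures as an approximation and not a measurement of this SAN. The function names are made up for illustration.

```python
# Rough per-frame efficiency of TCP/IP over Ethernet at two MTU sizes.
# Assumptions: no VLAN tag, IPv4 + TCP without options, iSCSI headers ignored.
ETHERNET_OVERHEAD = 38  # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # IPv4 header 20 + TCP header 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that carry TCP payload in a full-size frame."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-size frames needed to move payload_bytes of TCP payload."""
    per_frame = mtu - IP_TCP_HEADERS
    return -(-payload_bytes // per_frame)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
            f"{frames_needed(1_000_000, mtu)} frames per 1,000,000 bytes"
        )
```

The efficiency gain is only a few percent. The bigger difference is the frame count, since every frame costs the switch and the software initiator some work.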

NIC Teaming
You want load balancing and failover. Yes, you do.

Without knowing exactly how it would work, I used the default NIC teaming configuration in ESX for failover and load balancing. That default offers failover but no load balancing. Eventually I discovered this and found documentation for the proper configuration for load balancing and failover. One setting in ESX was changed (Route based on IP hash) and then port trunking was enabled on the network switch. Load balancing and failover. Yay!
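For anyone wondering what hashing on IP addresses buys you, here is a deliberately simplified Python sketch of the idea: combine the source and destination addresses and take the result modulo the number of uplinks. This is an illustration of the concept only, not VMware’s actual implementation, and the addresses and the `pick_uplink` name are hypothetical.

```python
import ipaddress


def pick_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Simplified IP-hash policy: XOR the two addresses, modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


if __name__ == "__main__":
    host = "10.0.0.11"  # hypothetical ESX host address
    for target in ("10.0.0.50", "10.0.0.51"):  # hypothetical destinations
        print(f"{host} -> {target}: uplink {pick_uplink(host, target, 2)}")
```

Two things fall out of this. A given source and destination pair always lands on the same uplink, so a single conversation never gets more than one link’s worth of bandwidth. The spread only happens across different address pairs, which is also why the switch side needs a matching trunk.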

iSCSI Volumes
Plan for volume portability and scalability using reasonably-sized volumes with a smart naming convention.

I had no idea what would be best when it came to the volumes on the iSCSI array. No idea whatsoever, and I couldn’t find any best-practices guides for it either. I came up with something that made sense at the time, but in the end it wasn’t a good idea and needed to be changed. We have a few primary virtual server types here (Domino, File, and Application), so I thought that creating volumes for each was appropriate. I started with volumes like F01, A01, and D01, but in the end I realized that this wasn’t good, as it kept me from placing virtual disks based solely on available space. I’ve got 100G free on D01 and I need to put up a new app server, but I can’t put it on D01 because it’s not a Domino server. Yes, I’m anal and probably suffer from a bit of OCD.

What I have now is generic SAN volumes: Vol0-0, Vol0-1, Vol1-0, and so on. My naming convention is Vol (Volume, duh) 0 (disk 0) -0 (extent 0, used for adding extents to datastores in ESX, where required). This allows me to allocate space based solely on the availability of the space required, without triggering any of the many adverse effects of my OCD. I have allocated 512G of space for each volume on the SAN*, which I think is a good compromise between usable space and volume portability, which will likely be important in the future when we add more EQL arrays to this platform. Here’s a ‘map’ if you will, of the SAN volumes and the ESX datastores that they correspond to.

*Three of the SAN volumes are 1TB each, but it was too late to fix that once it was in place. Vol4 is Vol4-0 with Vol4-1 and Vol4-2 added as datastore extents in ESX.

SAN:       ESX:
Vol0-0 ->  Vol0 (512G)
Vol1-0 ->  Vol1 (512G)
Vol2-0 ->  Vol2 (512G)
Vol3-0 ->  Vol3 (512G)
Vol4-0 ->  Vol4 (3TB)
Vol4-1 ->  Vol4
Vol4-2 ->  Vol4

This approach provides service-agnostic provisioning of disk space, lets datastores scale via extents, and keeps volumes portable at the SAN level. Volume portability is especially handy with the EQL arrays if you plan to run more than one array within a storage group. In that configuration, multiple arrays begin to function as one and automatically load balance volumes between them.

pCPU vs vCPU
Multi-vCPU = more processing power, and more host overhead.

Is it better to get a dual-core processor at 3GHz, or a quad-core at 2GHz? In the beginning, I thought that ESX could schedule a single-vCPU virtual machine across multiple physical processors or processor cores. Eventually I discovered that a single-vCPU virtual machine running on a host with dual 3GHz dual-core processors will only have 3GHz available to it. The simple solution for adding more CPU is to add a second vCPU (and of course change the HAL to ACPI Multiprocessor PC). The downside of multi-vCPU virtual servers is the additional overhead on the host server. As a rule of thumb, all new VM deployments are done as a single vCPU, and if the needs change we add more processors.

In light of this, I’m tempted to say that core GHz is more important than the number of cores, but that comes with caveats aplenty. In the end, I think it’s best to assess your needs and then choose the right CPU platform. Now that 3GHz quad-core processors are around, the issue is almost moot.
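The trade-off is easy to see with a little arithmetic. This Python sketch assumes the simple model described above, where each vCPU runs on at most one physical core at a time, and it ignores scheduling overhead and contention from other guests. The function names are made up for illustration.

```python
def vm_ceiling_ghz(vcpus: int, core_ghz: float, host_cores: int) -> float:
    """Upper bound on clock speed a VM can use: one core's worth per vCPU."""
    return min(vcpus, host_cores) * core_ghz


def host_total_ghz(core_ghz: float, host_cores: int) -> float:
    """Aggregate clock speed across every core in the host."""
    return core_ghz * host_cores


if __name__ == "__main__":
    # The two single-socket options from the opening question.
    for label, ghz, cores in (("dual-core 3GHz", 3.0, 2), ("quad-core 2GHz", 2.0, 4)):
        print(
            f"{label}: 1 vCPU VM tops out at {vm_ceiling_ghz(1, ghz, cores)} GHz, "
            f"host total {host_total_ghz(ghz, cores)} GHz"
        )
```

The faster dual-core gives each single-vCPU guest a higher ceiling, while the slower quad-core gives the host more total capacity to share out. Which one matters more depends on whether you have a few hungry guests or a lot of modest ones.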

That’s it for the big issues I’ve faced with virtualization, except for the issues I discuss in this blog, which I am still trying to solve. I will write up any relevant tips if and when I get to the end of that ordeal.


2 comments

  1. Rob D. says:

    Thanks for the post. I have had some of these questions on my mind recently. Yes, EqualLogic is easy to set up, but in my mind, that also means easy to screw up. One question I have: are you running your file servers using VMware virtual disks, or dedicated volumes on the array? Secondly, did you use thin provisioning for any of your volumes? It seems to me thin provisioning could help quite a bit in the early stages of SAN deployment.

  2. Josh says:

    Hi Rob,
    Thanks for your comment! I was sold on EQL immediately due to its ease of setup, but this tricked me into not putting any real thought into the SAN configuration. I thought “yay it’s easy so I don’t have to think!”. Not exactly, as it turned out. Since posting this article I’ve changed the configuration again and so far it’s working out quite well: 1.2TB thin-provisioned volumes with 1TB VMware datastores (because a little birdy at Dell/EQL told me that the PS array volumes run poorly, on purpose, when they’re low on space).

    You’re right, thin provisioning is the way to go, especially when getting started with these arrays. I thought that, since I was using VMware datastores, creating a 1TB datastore on a SAN volume would immediately take up all 1TB, which isn’t the case. Once I realized that, I immediately converted all of our volumes to thin provisioning and it has freed up significant space on the SAN, giving me better flexibility to move things around when needed.

    Now I just have to get another array or two (or five!) so we can take advantage of the volume load balancing and increased iSCSI throughput. :)


Josh Currier - Blogged