Had an interesting request to share a folder within an individual's Exchange account. What I found out is that if you add permissions to the parent folder, they do not propagate to the child sub-folders. I had to use PowerShell to define recursive sub-folder permissions. The other interesting note is that I could not run this under my own account (Exchange admin) on the individual's behalf; the individual had to log in via PowerShell to apply the recursive permissions:
You need to log in to PowerShell as the user who is attempting to share the folder.
Get-MailboxFolder -Identity <owner username>:\<folder name> -Recurse | Add-MailboxFolderPermission -User <username of person who will have access> -AccessRights <permission level: owner/reviewer/etc.>
Fill in all of the <fields> with the proper input. See the following page for an explanation of the permission levels:
http://technet.microsoft.com/en-us/library/ff522363%28v=exchg.150%29.aspx
Ex:
John wanted to give Jane rights to view items and add items, but not to delete existing items, in a folder at the root level (the same level as the Inbox):
Get-MailboxFolder -Identity "john@vmngo.com:\shareme" -Recurse | Add-MailboxFolderPermission -User jane@vmngo.com -AccessRights NonEditingAuthor
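To verify the result afterwards, the read-only counterpart can be piped the same way (a quick sketch reusing the example identities above, assuming Get-MailboxFolderPermission accepts the same piped folder identities as Add-MailboxFolderPermission does):
Get-MailboxFolder -Identity "john@vmngo.com:\shareme" -Recurse | Get-MailboxFolderPermission
Each sub-folder should now list jane@vmngo.com with the NonEditingAuthor access right.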
Reference:
http://community.office365.com/en-us/forums/158/p/43423/146798.aspx
Tuesday, June 24, 2014
RDM Luns on a MSCS SQL Cluster
I had an outage on my production MSCS SQL cluster last year which resulted in the primary node in the cluster losing its storage. Following some simple troubleshooting steps, we shut down the passive node (it was not able to start the cluster), then the primary node, and restarted them each individually. This brought the cluster back. While working on a root cause analysis (RCA), I perused the VMware logs and noticed several issues with the RDMs:
Sep 10 12:52:11 vmkernel: 47:02:52:19.382 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
I was seeing this message on all hosts that had visibility to the RDM LUN. During a maintenance window, I decided to perform driver and firmware updates and noticed that the hosts were taking an extremely long time to boot back into ESXi after a reboot.
Investigating this further, I looked at the logs during the restart and saw messages similar to:
Sep 13 22:25:56 p-esx-01 vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.########################
which led me to believe that it was an issue with the RDM, based on the naa.####### identifier. Looking through the VMware KB, I found this article, which was relevant to my situation since I had originally built the cluster on ESXi 4.1, upgraded to 5.0, and eventually to 5.1 U1:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016106
To fix this issue, I had to run the following command on every host that had visibility to the RDM LUN (replace naa.########### with your specific LUN's naa number):
esxcli storage core device setconfig -d naa.################# --perennially-reserved=true
Restarting the hosts now resulted in quick restarts rather than 5-10 minute waits.
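To confirm the flag took effect on each host, the device configuration can be listed (replace the naa number with your LUN; on ESXi 5.x the output includes an "Is Perennially Reserved" field, which should now read true):
esxcli storage core device list -d naa.#################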
Updating drivers for VMware ESX/ESXi hosts through VMware Update Manager or manually with the CLI
To update drivers through VMware Update Manager:
- Find potential driver update information through the VMware HCL: http://www.vmware.com/resources/compatibility/search.php
- Download the driver update from VMware Downloads: https://my.vmware.com/web/vmware/info/slug/datacenter_cloud_infrastructure/vmware_vsphere/5_5#drivers_tools
- Import driver package into VMware Update Manager (VUM)
- Create new baseline with specific driver update package
- Attach the host to the baseline
- Check host compliance
- Enter Maintenance Mode (if DRS is set to fully automated the VMs will evacuate automatically; otherwise evacuate the VMs on the host via vMotion) and run VUM to install the driver through the baseline
- Host should automatically restart
- Check whether the driver is installed by connecting to the host via SSH and running: esxcli software vib list
- Exit Maintenance Mode
To update drivers manually through CLI:
- Find potential driver update information through the VMware HCL: http://www.vmware.com/resources/compatibility/search.php
- Download the driver update from VMware Downloads: https://my.vmware.com/web/vmware/info/slug/datacenter_cloud_infrastructure/vmware_vsphere/5_5#drivers_tools
- Unzip the download and upload the [driver_name]-offline_bundle-[build#].zip to a datastore that your ESX/ESXi host can access
- Enter Maintenance Mode (if DRS is set to fully automated the VMs will evacuate automatically; otherwise evacuate the VMs on the host via vMotion)
- Connect to host via SSH (with root access) and run the following command:
esxcli software vib install -d [Path to offline zip bundle]
ex:
esxcli software vib install -d /vmfs/volumes/TMPL/Drivers/03-03-14/igb-4.2.16.3-offline_bundle-1138313.zip
Installation Result
   Message: Operation finished successfully. Reboot Required: true
   VIBs Installed: igb-4.2.16.3-esx_4.2.16.3-3.0.1
   VIBs Removed:
   VIBs Skipped:
- The host will not restart automatically after the install, so you will have to restart it manually (depending on the driver install/update)
- Check whether the driver is installed by connecting to the host via SSH and running: esxcli software vib list (see the example below)
- Exit Maintenance Mode
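After the reboot, a quick way to confirm the new driver version is to list the installed VIBs and filter for the driver name (using the igb driver from the example above; grep is available in the ESXi shell):
esxcli software vib list | grep igb
The version shown should match the offline bundle you installed.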
Thursday, June 19, 2014
Microsoft SQL Cluster Server 2008 R2 on ESXi 5.1 using iSCSI
ESXi 5.1 only supports in-guest iSCSI for MSCS shared storage, so you cannot use RDMs backed by iSCSI LUNs presented directly to the ESXi hosts.
Use of software iSCSI initiators within guest operating systems configured with MSCS, in any configuration supported by Microsoft, is transparent to ESXi hosts and there is no need for explicit support statements from VMware.
In my test environment, I tried presenting iSCSI RDMs to the Server 2008 R2 machines. The cluster validation tests were failing sporadically during the storage reservation checks: ESXi 5.1 only supports SCSI-3 reservations and has difficulty passing persistent reservations through RDMs, which prevents the cluster from reserving the volume so that a particular node can read from and write to it.
My workaround was to present the iSCSI volumes directly to the guests' iSCSI initiators. Since I rebuilt the cluster this way in my production environment, it has had zero issues (knock on wood).
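For reference, attaching the volumes inside the guests looks roughly like the following (a minimal sketch using the built-in iscsicli tool on Server 2008 R2; the portal address and target IQN are placeholders for your SAN's values, and persistent logins/MPIO can be set up afterwards through the iSCSI Initiator control panel):
iscsicli QAddTargetPortal <SAN group IP>
iscsicli ListTargets
iscsicli QLoginTarget <target IQN>
Once logged in, the volumes appear in Disk Management on each node and can be brought online and formatted for the cluster.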
Update: I saw that ESXi 5.5 now supports RDMs over iSCSI, but I have not tested this yet. It would be interesting to re-design the cluster for Server 2012 if I can find the time.
References:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1037959
http://pubs.vmware.com/vsphere-51/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-51-setup-mscs.pdf
Designing and Implementing Microsoft Clustering SQL Server with VMware
In today's software market, companies demand production environments that are always on, 24/7, along with a disaster recovery (DR) plan that has minimal recovery time/point objectives. However, these requirements are expensive, and most businesses that cannot afford them have to accept some downtime for maintenance and DR. One technology that has been part of Microsoft Windows Server since the NT 4.0 era is Microsoft Clustering. Clustering allows a service, such as SQL Server, to run on multiple nodes. This allows for failover within a data center if one node goes down, and it keeps downtime during maintenance periods to a minimum. In addition to clustering services, virtualization has become an integral part of IT infrastructure. Virtualization grants the ability to run multiple instances of most operating systems within one physical machine; these physical machines are referred to as hypervisors, and the multiple instances running on them are called virtual machines. VMware in particular provides many features that allow for higher availability of production systems in a fashion similar to clustering, letting you move virtual machines between hypervisors with similar CPU architecture (i.e. Intel vs. AMD) through a process called vMotion.
I will be going over the design and implementation of Microsoft Clustering for SQL Server using VMware technology, and specifically the issues that I ran into with my production and DR sites.
1. Introduction
   a. Data center Design
      i. Storage
         1. Fiber Channel (FC)
            a. Expensive to implement
               i. Fabric switches
               ii. Fiber cables
               iii. SFP+ transceivers
               iv. HBAs (up to 16 Gbps)
         2. iSCSI
            a. Gigabit Ethernet
               i. Inexpensive, as it utilizes standard gigabit infrastructure
                  1. Cat5/6 cables
                  2. 1 Gbps NICs
            b. 10 Gigabit Ethernet (GbE)
               i. Expensive
                  1. Requires additional infrastructure
                     a. 10 GbE switches, cables, and transceivers (XFP)
                     b. Can utilize Fibre Channel over Ethernet (FCoE) if migrating from FC back to a copper medium
            c. iSCSI initiators
               i. Hardware based
                  1. Increased performance, as the NIC HBA uses a TCP offload engine to move packet processing away from the CPU
                  2. Typically want to stay with homogeneous NICs
               ii. Software based
                  1. Can utilize heterogeneous NICs
                  2. Potentially cost saving if the network is not the bottleneck for IO
                  3. Recommended by VMware for most basic iSCSI setups
            d. Physical or Virtual Raw Device Mappings (RDMs)
               i. Physical
                  1. Pass-through storage directly to the guest
                     a. Able to utilize SAN management software to maximize performance
                  2. Unable to create VMware snapshots
                  3. Higher performance (on paper)
               ii. Virtual
                  1. VMkernel only sends read/write commands to the presented storage
                  2. Able to utilize VMFS features such as file locking and VMware snapshots; easier to migrate
            e. Requirements
               i. IOPs
                  1. Number of disks
                  2. Types of disks
               ii. Redundancy
                  1. Number of controllers (Active/Active [expensive] or Active/Passive [cheaper])
               iii. Load balancing
                  1. Paths to storage
      ii. Servers
         1. CPUs, RAM, HBAs
      iii. WAN
         1. Considerations
            a. Throughput
            b. Synchronous or Asynchronous
            c. Homogeneous or heterogeneous production and DR sites
2. Actual Implementation
   a. Production Environment
      i. Hardware
         1. SAN - Compellent (active/passive)
            a. Two-tier storage
            b. 15k SAS and 7.2k SAS-class drives
         2. Servers - 2x Dell R810
            a. Intel E7-4830 - 2.13 GHz octa-core w/ hyperthreading (16 physical cores, 32 logical cores total)
            b. 64 GB RAM
         3. HBA - QLogic 8 Gbps
         4. 2x 24-port Brocade 300 SAN switches (FC)
      ii. Requirements
         1. Current IOPs on the leased hardware was ~1500 IOPs
         2. Multipathing - redundancy
      iii. Setup
         1. Two hypervisors with a dedicated NIC for the heartbeat channel
         2. Two-port HBA per host
         3. Physical RDM
            a. No requirement for VMware snapshots and a preference for increased performance
            b. Block size format set to 64 KB for presented storage dedicated to SQL DB files, logs, etc.
         4. Straightforward install and setup
      iv. Caveats
         1. Expanding drive space on a physical RDM requires downtime
         2. VMware Fault Tolerance and vMotion not supported
         3. Round robin not supported with the Native Multipathing Plugin; Dell/Compellent provides a multipathing plugin for ESXi
   b. Disaster Recovery
      i. Hardware
         1. SAN - Dell EqualLogic iSCSI (active/passive)
            a. 10k SAS and 7.2k SAS-class drives
         2. Servers - 2x Dell R810
            a. Intel E7-4830 - 2.13 GHz octa-core w/ hyperthreading (16 physical cores, 32 logical cores total)
            b. 64 GB RAM
         3. NICs - 4-port Intel and 4-port Broadcom
            a. Broadcom supports hardware iSCSI, but there are some known issues with the TCP offload engine on our specific model
            b. Utilizing software iSCSI
         4. 2x 48-port Dell PowerConnect switches
      ii. Requirements
         1. Handle a full disaster recovery situation from the production environment
         2. Multipathing - redundancy
         3. Ability to test disaster recovery at least 4 times a year
         4. Less than half the budget of the production environment
      iii. Setup
         1. Two ESXi hypervisors with a dedicated NIC for the heartbeat channel
         2. Two NICs dedicated to iSCSI traffic to fulfill the redundancy/multipathing requirement
         3. VMware Site Recovery Manager (SRM) as the DR solution
            a. vSphere Replication for heterogeneous SAN solutions
         4. iSCSI storage
            a. Block size format set to 64 KB for presented storage dedicated to SQL DB files, logs, etc.
         5. Initial issues
            a. Cannot use iSCSI storage presented to the host and then mapped using RDMs to the MSCS nodes
               i. Not supported because VMware cannot pass SCSI-3 reservation codes to the SAN
                  1. This will not pass the initial tests for creating an MSCS cluster through the Microsoft wizards
            b. Had to add virtualized NICs to the MSCS nodes with access to the iSCSI network
            c. Presented storage directly to the guests, bypassing ESXi
            d. With the presented NICs (either Intel or Broadcom), had to modify settings within the Microsoft Server 2008 R2 OS
               i. The issue presented was that the nodes would randomly lose access to storage, destroying the cluster altogether
               ii. After a week of working with VMware, we identified that the issue was Microsoft attempting to force the TCP offload engine and segmentation offload on the virtualized NIC (see the commands after this outline):
                  1. netsh int tcp set global chimney=disabled
                  2. HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters - Value (DWORD): DisableTaskOffload = 1
            e. Jumbo Frames
               i. DO NOT FORGET TO ENABLE ON EACH NIC AND SWITCH PORT (server end to end to SAN); see the verification example after this outline
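For completeness, the guest-side offload fix listed in the outline boils down to two commands run on each MSCS node (the netsh line is the one from the outline; the reg add line is simply the listed registry value applied from the command line), and the jumbo frame item can be sanity-checked from each ESXi host with an oversized, don't-fragment ping (the SAN portal IP is a placeholder):
netsh int tcp set global chimney=disabled
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v DisableTaskOffload /t REG_DWORD /d 1 /f
vmkping -d -s 8972 <SAN portal IP>
If the 8972-byte vmkping succeeds from the host's iSCSI vmkernel interface to the SAN, jumbo frames are enabled end to end.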
References:
http://en.community.dell.com/dell-groups/dtcmedia/m/mediagallery/20094620/download.aspx
Labels: cluster, clustering, data center design, esx, esxi, mscs, networking, sql, storage, vmware, vsphere