Monday, January 31, 2011

VMware cloning of Linux guests using LVM

In the past, my attempts to use cloning with customization specifications in combination with Linux guests using LVM would fail. A search of the VMware KB led me to these two articles, both of which say LVM is unsupported:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1488
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=5195811
In testing this again on vSphere 4.1 with SLES 11 SP1, I discovered that the combination now works as long as a recent enough version of VMware Tools is installed.
I did get this revealing error during an attempt on a guest that did not have an appropriate version of VMware Tools:
“Customization of the guest operating system 'sles11_64Guest' is not supported in this configuration. Microsoft Vista (TM) and Linux guests with Logical Volume Manager are supported only for recent ESX host and VMware Tools versions. Refer to vCenter documentation for supported configurations.”
I could not find any references in the vCenter documentation, but I am checking with VMware to determine if the KBs are still correct.

Thursday, January 27, 2011

Things to Think About Before Virtualizing SQL Server’s Underlying OS, or Adopting a Pit Bull – Part Two

Note: This is an old article, and while Pit Bulls, in general, have remained largely the same in the meantime (although Gerald in particular has a very distinguished grey face now), SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

Part One – Know if you Can, and if You Can, Be Patient

Part Two:  The Good:

There will be people who have done research on Virtualization or Pit Bulls and who have legitimate concerns and questions for which you should have answers before “going live.”

These are awesome people.  They know enough to know that there are things that can go very wrong with any given virtualization implementation or individual pit bull, but if you provide them the information about your particular virtual infrastructure, or your big goofy dog, they will look at the evidence with a fair mind, and often ask questions that lead to a better infrastructure going forward.  In the case of pit bulls, it’s perfectly sensible to want to know if your pit bull is dog aggressive (and you should be sure to know this -- if he is, you need to know how you will handle things like walks, vet visits, and other multiple dog situations).  It’s sensible, not insulting, to want to know if your pit bull can be trusted around small animals (Pit Bull Terriers are Terriers after all).  Personally, I’d also want to know how he got so handsome, but that’s just me.  You put a lot of work into training such a well-mannered gentleman, so take the time to show off what a great example of a sweet, stable dog he is, and show he’s a great ambassador for the breed.


Fig 1.  This is what happens when technical people attempt to draw  their own cartoons –
Also, that’s a pit bull not a kangaroo… no disrespect to kangaroos who probably also have a lot of good ideas.
 
With respect to virtualization, it’s smart of a DBA or application owner to ask questions about your virtual infrastructure, and you should have visibility into the environment, have tested it, and be able to answer those questions.  Below are some examples of questions I get; I find it comforting to know other people are thinking about these things before jumping into virtualization.
  • Storage:
    • What does the back end storage look like?
    • What do you consider acceptable latency?
    • How many IOPS, and at what I/O size, can it sustain without generating significant latency?
    • If the disks are shared, how much overhead does it have for my I/O?
    • What is the storage connectivity?
    • If I need to physically separate logs from data files for DR or performance (the tide seems to be turning on the performance question regarding logs and data, but it’s still a deal breaker for many DBAs, and a serious concern for DR), how will that be done?
    • Will the VM think that storage is local to the VM, or will it be directly mapped? 
      • If it’s directly mapped, what technology is used to map it?
      • If it appears to be local storage, how is it presented to the Host server?
      • If it will be presented to the VM as local, can I do live storage migration?
      • If it appears to be local to the VM is there still fault tolerance on the storage side?
      • If storage is directly mapped, will a host failover cause any loss of storage connectivity?
  • High Availability
    • Is the cluster configured for HA?
    • Has HA been tested properly?
    • If only some of the machines in the cluster are fully licensed for SQL Server, has DRS (or the Hyper-V equivalent) been turned off, so that servers can’t migrate automatically (which would result in being out of licensing compliance)?
    • If my SQL Server has mirrored databases, where will the mirrors run? 
      • Will the mirrors be kept on physically separate Host servers and storage from the primary?
  • Resources:
    • Do I get a dedicated OS per instance?
    • Is network failover configured and tested?
    • How many CPUs and how much memory can be dedicated per VM?
    • Does the host server have at least as many cores per socket as the number of vCPUs to be provisioned per virtual machine?  You can comfortably over-provision the total number of vCPUs allocated on the host relative to the number of physical cores.  However, if your physical machine has 4 sockets, each with 4 cores, and you allocate more than 4 vCPUs to any individual VM, that VM has to span more than one physical socket, which may reduce performance by reducing the opportunities to schedule its threads.
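
The cores-per-socket question above comes down to simple arithmetic, sketched below in Python. This is only an illustration: the function names and VM names are hypothetical, and the check covers nothing beyond the "more vCPUs than cores in one socket" rule of thumb described in the list.

```python
from typing import Dict, List


def vms_spanning_sockets(cores_per_socket: int, vm_vcpus: Dict[str, int]) -> List[str]:
    """Return the VMs whose vCPU count exceeds the cores available in one socket."""
    return [name for name, vcpus in vm_vcpus.items() if vcpus > cores_per_socket]


def overcommit_ratio(sockets: int, cores_per_socket: int, vm_vcpus: Dict[str, int]) -> float:
    """Total vCPUs allocated divided by total physical cores on the host."""
    return sum(vm_vcpus.values()) / (sockets * cores_per_socket)


# Hypothetical host matching the example: 4 sockets with 4 cores each.
vms = {"sql-small-01": 2, "sql-small-02": 4, "sql-big-01": 8}
print(vms_spanning_sockets(4, vms))   # only sql-big-01 needs more than one socket
print(overcommit_ratio(4, 4, vms))    # 14 vCPUs on 16 cores
```

A ratio above 1.0 is not a problem in itself (over-provisioning the host as a whole is fine); the list of socket-spanning VMs is the part worth raising with the virtualization team.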
It may go without saying (but I’m going to say it anyway, and I’ll be posting more on the subject later), you need a good dialogue with the people who run your storage and virtualization tiers.   If you’re the kind of DBA who believes that DBAs are from Mars, and Storage Guys are from a completely different solar system (you know who you are), think really hard about whether you’ll be able to pull this off.   Ideally, you’re able to at least see eye to eye with the guys who keep the rug under you.

Sunday, January 23, 2011

Things to Think About Before Virtualizing SQL Server’s Underlying OS, or Adopting a Pit Bull – Part One

Note: This is an old article, and while Pit Bulls, in general, have remained largely the same in the meantime (although Gerald in particular has a very distinguished grey face now), SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

These days, I think a lot about both SQL Server Virtualization and Pit Bulls.  They’re partly on my mind because they’re two of my favorite things, and partly because both get a lot of unjustified bad press.  I'll admit that in both cases, I can be a little biased.  I may have already tipped my hand by playing the "favorite" card.  You can probably hear the pit bull snoring on my feet from wherever you are, and my constant posts on Virtualization probably don’t help hide my feelings about virtualizing SQL – so it’s pretty obvious which side of the debate I’m on.  In my environment, we've gone so far down the virtualization road, and gotten so much out of it, that it's sometimes hard to understand why opportunities for virtualization would be passed up.  However, a couple of months ago at SQLPASS, I had a bunch of conversations with people who had thought a whole lot about virtualizing SQL and had decided, for very good reasons, that it wasn't an appropriate solution.  This got me thinking about…

Things everyone should put thought into before Virtualizing SQL Server’s underlying OS, or adopting a Pit Bull -- a Several Part Series (now with 100% more comics – thanks Lisa!).

1.  Find out if you're even allowed to do it, and if you are, license properly:
Pit Bull:  Some home insurance carriers won't insure your home if you adopt a pit bull.  Some cities have Breed Specific Legislation outlawing pit bull ownership.  If you rent, some landlords don't allow it.  Some states and/or Insurance companies require additional liability insurance if you own a pit bull type dog.  You need to know if any of these apply to you.

Virtualized SQL Server:  Before Virtualizing SQL, it's very important to find out what licensing you'll need, whether consolidating onto a virtual platform will be cost effective, and whether your management will be amenable to the idea.  To be allowed unlimited virtual machines on a fully socket licensed box you need to be licensed first for Windows (Datacenter), and second for SQL Server.  For SQL 2005 through 2008 R0, unlimited virtualization requires SQL Server Enterprise socket licensing.  For SQL 2008 R2 and up, Datacenter is required (in some cases, Enterprise is grandfathered under an ELA, but your Microsoft Rep will have to explicitly tell you if you’re grandfathered).  If your virtualization platform is not Hyper-V, you also need to be fully socket licensed for your virtualization platform.  Note:  As of the release of 2012, SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.


While Microsoft fully supports SQL on any virtualization platform that is validated through SVVP, not all application vendors support their application back end being on virtualized SQL, although such restrictions are becoming less and less common.  As a side note, Oracle does support virtualizing the operating system on Oracle Enterprise Linux, but no other virtualization platforms are supported.  This post doesn’t address Oracle, but if you’re looking at virtualizing Oracle, read their support statements carefully.

2.  Realize that adoption will take longer
Pit Bull:  I'll call our example pit bull Gerald the Gentleman.  It took Gerald six months to get adopted.  Some great folks at a shelter saw what a special fella he was, and worked with him to find a home.  Everyone who met him agreed he was a great dog, but that didn’t get him adopted.  Gerald started to get discouraged, and he started to look sad (see fig. 1).  The folks at the shelter knew he was too good for the right home to pass up, and that once the right home came along it would be Gerald’s time to shine.  Now he's a rockstar; he can't help but win over everyone he meets (he had surgery recently and it was touch and go whether the vet’s office would give him back, given that he is such a charmer).  It took time to find him a home, and he had to win over the people who would take him home, but these days, Gerald's a big goofy social butterfly, and has a dorky dance to show you just how happy he is.  A few years later now, and he's not allowed to do the Gerald dance anymore; he's an old man and threw his back out getting over-enthusiastic to Brit-Pop (why Brit-Pop?  I guess we'll never know).

fig. 1.  Gerald at the Shelter.
Virtualized SQL Server:  I don’t have a nickname for our example SQL Server environment.  In the beginning no one wanted anything to do with Virtualized SQL, but as Microsoft came out with a support statement for SQL Server on a virtual platform, and application groups were staring down expensive hardware refreshes, it began to look more attractive.  It took that first brave application group who were willing to look at the numbers, look at the testing, performance, and stability data, and jump in.  Virtualized SQL Server got a big boost from the success of the first trailblazer application group who were willing to move on to a virtual platform.  Other groups began to consider virtualization as an option to avoid an expensive hardware refresh and to share the cost of licensing.  Those application group customers were happy, and they told other customers until everyone wanted virtualized SQL (and why wouldn’t they?  It’s quick, it’s cheap, and it’s really, really stable).  It's a long way from the first implementation to being the default choice.  Establishing a track record or reputation takes time.  Customers (internal or external) naturally want the best for their environments, and all the documentation in the world isn't as good as happy customers and months of uptime stats.  These days SQL on (in our case) VMware is almost as popular as Gerald the Gentleman.  I haven’t provisioned a physical SQL Server in over two years.  Small SQL Servers serving a single application get their own dedicated OS, and application custodians can schedule downtimes without having to coordinate with the six other small apps that would otherwise live on a single SQL Server.  Speaking of those small SQL Servers, we’re getting consolidation ratios of up to 50:1 (which makes the pricing attractive).
Even SQL servers requiring dedicated hardware live on top of a virtualized OS to take advantage of inherent HA, abstraction from the hardware, storage agnosticism, and live storage migration.  It’s hard to look back and imagine how we could have scaled out our environment without virtualizing the SQL Server OS.  It’s been a stunning success  -- full adoption just took time.

Thursday, January 20, 2011

SharePoint 2010 Content Database Migration, Re-Deploy Results in Foreign Key Constraint Error

Short Description: In an upgrade from WSS 3.0 to SharePoint 2010 via the content database attach/re-attach with PowerShell upgrade, a content database can be attached and upgraded only once.  A second attempt results in the following errors:

SharePoint log:

[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Skip upgrading [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Disposing SPSite Url=http://sitename/site.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: CanUpgrade [SPSite Url=http://sitename/site] failed.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Exception: Object reference not set to an instance of an object.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]:    at Microsoft.SharePoint.Upgrade.SPSiteSequence.get_CanUpgrade()
   at Microsoft.SharePoint.Upgrade.SPUpgradeSession.CanUpgrade(Object o)
Windows Event Log
Date  PowerShell.exe (0x0938)                  0x164C SharePoint Foundation          Database                       5586 Critical Unknown SQL Exception 547 occurred. Additional error information from SQL Server is included below.  The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
 The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d

Problem:
I reported this to Microsoft and they confirmed they can reproduce the issue.  More information will be forthcoming.

Steps to Reproduce:

WSS Environment:
2 Web Front Ends, 1 Application Server -- WSS 3.0 12.0.0.6535
Database Server -- SQL 2005 sp3 9.0.4053

SharePoint 2010 Environment:
2 Web Front Ends, 2 Application Servers -- 14.0.5128.5000
Database Servers (mirrored in this case, but can be reproduced on standalone dbs) -- SQL 2008 R2 10.5.1746

First upgrade:
  1. Take a backup of the WSS 3.0 content database in SQL 2005 SP3
  2. Restore backup of the WSS 3.0 content database to the SQL 2008 R2 server associated with SharePoint 2010 environment (you can change the compatibility level without affecting the reproducibility of the problem).
  3. From the SharePoint 2010 application server in the SharePoint Management Shell (PowerShell) run the following command to upgrade the database:
    • Mount-SPContentDatabase -Name -DatabaseServer -WebApplication -Updateuserexperience
  4. Upgrade completes successfully with errors – site collections render in SP2010 look and feel and are fully working
Second upgrade (this is the one with the issue):
  1. In SharePoint 2010 environment, remove content database from web application in Central Admin
  2. In SQL 2008 R2, drop content database
  3. Take another backup of the original WSS 3.0 database in SQL 2005 SP3
  4. Restore that backup of the WSS 3.0 content database to SQL 2008 R2 associated with the SharePoint 2010 environment.
  5. From the SharePoint 2010 application server in the SharePoint Management Shell (PowerShell) run the following command to upgrade the database:
    • Mount-SPContentDatabase -Name -DatabaseServer -WebApplication -Updateuserexperience
  6. Upgrade completes successfully with errors.   Site collections now render in the WSS 3.0 look and feel or will not render at all and throw errors when working with the site.
SharePoint Upgrade log contains the errors:
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Skip upgrading [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Disposing SPSite Url=http://sitename/site.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: CanUpgrade [SPSite Url=http://sitename/site] failed.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Exception: Object reference not set to an instance of an object.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]:    at Microsoft.SharePoint.Upgrade.SPSiteSequence.get_CanUpgrade()
   at Microsoft.SharePoint.Upgrade.SPUpgradeSession.CanUpgrade(Object o)
Windows Event Log contains the error:
Date  PowerShell.exe (0x0938)                  0x164C SharePoint Foundation          Database                       5586 Critical Unknown SQL Exception 547 occurred. Additional error information from SQL Server is included below.  The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
 The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
When looking at the SharePoint_Config database, it is clear that the GUID referenced appears in the Objects and Dependencies tables only when the site collection is attached.  When it is removed, the GUID no longer appears in either table, so it's not immediately clear why a DELETE statement would cause a foreign key constraint violation on a GUID that should no longer exist.
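
When repeating test upgrades, it helps to pull just the ERROR entries out of the upgrade log rather than reading it line by line. Below is a minimal Python sketch, assuming the log lines follow the bracketed layout shown in the excerpts above; the function name and the sample text are illustrative only, and real upgrade logs may contain other line shapes this does not handle.

```python
import re
from typing import Dict, List

# Matches lines such as:
# [powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade ...
LINE = re.compile(
    r"^\s*\[(?P<process>[^\]]+)\] \[(?P<source>[^\]]+)\] "
    r"\[(?P<level>[A-Z]+)\] \[(?P<timestamp>[^\]]+)\]:\s*(?P<message>.*)$"
)


def upgrade_errors(log_text: str) -> List[Dict[str, str]]:
    """Return the ERROR entries, folding stack-trace continuation lines into them."""
    entries: List[Dict[str, str]] = []
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if match:
            entries.append(match.groupdict())
        elif entries and raw.strip():
            # A line without the bracketed prefix continues the previous entry.
            entries[-1]["message"] += "\n" + raw.strip()
    return [entry for entry in entries if entry["level"] == "ERROR"]


sample = """[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Skip upgrading [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]:    at Microsoft.SharePoint.Upgrade.SPSiteSequence.get_CanUpgrade()
   at Microsoft.SharePoint.Upgrade.SPUpgradeSession.CanUpgrade(Object o)"""

for entry in upgrade_errors(sample):
    print(entry["timestamp"], entry["message"])
```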

Microsoft have confirmed that this is an issue with the PowerShell method of upgrading to SharePoint 2010, and that the workaround is to use stsadm.  However, PowerShell is Microsoft's recommended tool for administering SharePoint 2010; according to this TechNet article, Microsoft's position is:
“We recommend that you use Windows PowerShell when performing command-line administrative tasks. The Stsadm command-line tool has been deprecated, but is included to support compatibility with previous product versions.”
I therefore requested a fix to the PowerShell method of upgrading a WSS 3.0 content database to SharePoint 2010, and I look forward to hearing of one soon!


Microsoft's workaround, which I haven't tried but am providing in case you're really stuck and can't wait for a fix, is below.
Instead of the PowerShell command, run
stsadm -o addcontentdb -url yoursiteurl -databasename yourdatabasename -preserveolduserexperience false

My Workaround:
For us, this is a problem primarily with the prep for a production rollout, and not with the production rollout itself (unless something goes horribly wrong and you have to retract and re-upgrade the same site collection twice).  To work around it (actually, "avoid" is more appropriate), you can take advantage of technologies like cloning and snapshots.
Option 1:  Prior to each test run, capture your environment with either a clone or a snapshot (this is only really feasible if you're using virtual servers), and reset the environment before each test upgrade.
Option 2:  Create a gold point in time "image" of your environment and revert all servers back to that point in time after each test run.

A couple of RMAN errors that took me a few minutes to figure out

Problem:  When restoring a database to a new host with RMAN, I ran into the following errors.

Error 1:
On restoring a control file to a database started in nomount, and listener up:
ORA-19870 error reading backup piece
ORA-19504 failed to create file
ORA-27040 file create error, unable to create file.
This error is pretty cut and dried.  Either the file path does not exist, or the oracle user doesn't own/have permissions to create the file.
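
Both causes can be checked before re-running the restore. The following Python sketch is one way to do it, assuming it is run as the oracle OS user; the function name and the control file name are hypothetical, and the example uses a temporary directory only so that it runs anywhere.

```python
import os
import tempfile


def check_restore_target(file_path: str) -> str:
    """Report whether the directory for file_path exists and is writable by the current user."""
    directory = os.path.dirname(os.path.abspath(file_path))
    if not os.path.isdir(directory):
        return "missing directory: " + directory
    # Creating a file needs write and search (execute) permission on the directory.
    if not os.access(directory, os.W_OK | os.X_OK):
        return "not writable by current user: " + directory
    return "ok"


# Demonstration against a temporary directory; substitute the real restore path.
with tempfile.TemporaryDirectory() as base:
    print(check_restore_target(os.path.join(base, "control01.ctl")))
    print(check_restore_target(os.path.join(base, "no_such_dir", "control01.ctl")))
```

Note that `os.access` checks the user running the script, so running it as root or as your own account will not tell you what the oracle user can do.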

Error 2:
On restoring a database to a new host with the control file restored, database in nomount, and listener up:
Failure of recover command at
ORA-01013 user requested cancel of current operation

This error could be much clearer.  It can be caused by an open connection to the database.  Since you're already restoring the database, the easiest remedy is to shut down the database, then startup nomount, and make sure you exit your session before re-running from RMAN.

Installing Reporting Services on SQL 2005 on Windows 2008

Quick Description:  On Windows 2008 R2, even after installing the web server role and registering the .NET Framework with IIS, Reporting Services is greyed out as an option when attempting to install it to an existing instance.

Problem:
A while back I wrote an article on installing Reporting Services on an existing SQL 2005 instance.  These steps work for Windows 2003.  However, if you should for some reason want to install SQL 2005 SP3 on Windows 2008 R2 (it's not the most common configuration, but maybe your licensing only allows up to 2005), these steps won't get you to the point where SQL 2005 will allow you to install Reporting Services.  In my case, after following the steps in the original article (adding the role in Server Manager instead of adding IIS in Add/Remove Windows Components), and registering .NET with IIS, Reporting Services was still greyed out as an option in setup.

Solution:
I fumbled my way through this one before finding this article that Microsoft published on the subject.   It's messy, but it works.  It also probably demonstrates that SQL 2005 isn't really meant to be installed on Windows 2008 R2.

Wednesday, January 19, 2011

Office Search Server Indexing Problems

Quick Description:  An existing, working installation of Office Search Server indexing a WSS farm stops updating the index and the crawl appears to be hung for an extended period of time.

Problem:  When the crawl hangs, new items and documents added to the WSS farm will not show up in searches.  The crawl never finishes.

I'm temporarily doing a bit more SharePoint administration, which includes the care and feeding of our old WSS 3.0 farm.  It's not as exciting as the shiny new 2010 architecture, but it's our day to day workhorse and I'm learning little tips and tricks for coaxing WSS 3.0 into lasting until you can fully migrate to 2010.

In these cases, we've traditionally been told to rebuild the index.  This fixes the problem, but effectively kills search for the time it takes to get items into the index.  If you have a large farm like us, it could be several hours before you have all items indexed.

Solution: On the application server running Office Search Server, go to Start, Programs, Office Search Server, Office Search Server Administration.

In the admin console go to Crawling, and click Content Sources.  Check the dates of the last crawl, and the duration of the current crawl.  If the current crawl has been going for an unreasonably long amount of time, stop the crawl by right-clicking on the content source and selecting Stop Crawl.

To start a full crawl without destroying the current index, select the content source, and on the Edit Content Source page, scroll to the bottom and check "start full crawl of this content source."  The index will now rebuild without resetting and destroying the existing index.

If this doesn't solve the problem, you may need to reset all crawled content which will destroy the index.  If you reset all crawled content, searches will not return results until the content being sought is crawled.

Monday, January 17, 2011

How to Move a Server to a Different Group in DPM 2010

Quick Description:  DPM doesn't provide an obvious mechanism for moving an object from one protection group to another.


Problem:  There are many reasons you might want to move an object into a different protection group in DPM.  For example, you have a database in a protection group that allows for 5 days of online retention, but the requirement changes to 10 days for that database only.  Or an application changes groups, and the objects protected by DPM need to be grouped differently to reflect that change (even just for the sake of clarity).  Or you've moved a protection group from its original DPM server to the Disaster Protection DPM server by switching disaster protection, and now want to put it in its own group with its own retention and backup policy.

Solution:  To do this you need to stop protecting the member, and then create a new protection group (or modify an existing one), and re-import the inactive protection into the new group.

Remove the items from the existing group
  • Select the item to be moved in the protection console and right click it.
  • Select "Stop Protection of Member" from the drop down menu.
  • In the Remove from Group screen, verify that Delete Replica on Disk is NOT checked
    • Verify it again, it's important.  If you leave it checked you will lose your existing data.
  • Select OK
  • Rinse and repeat as necessary for the items you want to remove.
Re-Add or Create a New Protection Group
  • In the Protection console, either select the group into which you want to move the inactive items, or select Create Protection Group.
  • Select your protection group type
  • In the Select Group Members screen, expand and select the items you want to protect, then click next
  • In the Select Data Protection Method screen, provide a group name, and select the type of protection you want (in this case, disk only).
  • In the Specify Short Term Goals screen, provide values for Retention Range, Synchronization Frequency, and a time for the Express Full Backup
  • In the Review Disk Allocation screen, review the suggested data size, and make any adjustments necessary.  Decide whether storage should be co-located, and whether the volumes should automatically grow.
  • In the Choose Replication Method section select whether DPM should replicate now, later, or manually.
  • In the Consistency Check Options screen select your preferences
  • In the Summary screen, verify the details and select Create Group
  • You may get a message that the action will initiate a consistency check.  This is expected.

Thursday, January 6, 2011

SQL 2008 R2 Replication Config Errors

Quick Description:  SQL 2008 R2 installed on Windows 2008 R2, replication configures and starts, then fails with an Access Denied error.

Problem: 
Transactional Replication is configured on a SQL 2008 R2 server that is installed on a Windows 2008 R2 operating system.  When you run the agent, it fails with the following error:
Replication-Replication Distribution Subsystem: agent DistributionName failed. The distribution agent failed to create temporary files in 'c:\Program Files\Microsoft SQL Server\100\COM' directory. System returned errorcode 5.
This Microsoft article addresses this error if you are running your replication agent as the SQL Server Agent account, but in my case, where the subscriber, publisher, and replication agent all use different domain service accounts, it did not fix the error.

Solution:
The following fixes the problem.  It may be overkill; I wasn't able to isolate which of the two additional permissions is responsible for fixing it.

On the subscriber grant write permissions for the directory 'c:\Program Files\Microsoft SQL Server\100\COM' to the following accounts:
  • SQL agent account for both subscriber and publisher
  • Replication agent account
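
If you would rather script the grants than click through the Security tab, the Windows `icacls` utility can add them. The Python sketch below only builds the command lines so they can be reviewed first. The account names are placeholders (substitute your own domain service accounts), and the choice of inheritable write access, `(OI)(CI)W`, is my assumption about what "write permissions for the directory" should mean here; I have not verified this exact command against the replication error.

```python
from typing import List

# Directory named in the error message.
COM_DIR = r"c:\Program Files\Microsoft SQL Server\100\COM"


def grant_write_commands(directory: str, accounts: List[str]) -> List[str]:
    """Build one icacls command per account granting inheritable write access."""
    return [
        'icacls "{}" /grant "{}:(OI)(CI)W"'.format(directory, account)
        for account in accounts
    ]


# Placeholder account names, not taken from any real environment.
accounts = [
    r"DOMAIN\sqlagent-subscriber",
    r"DOMAIN\sqlagent-publisher",
    r"DOMAIN\replication-agent",
]

for command in grant_write_commands(COM_DIR, accounts):
    print(command)
```

Run the printed commands from an elevated prompt on the subscriber once you are satisfied with them.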