Wednesday, December 7, 2011

SSIS Derived Column Expression for Converting Unix int Epoch Dates to datetime

For source systems that use milliseconds since the beginning of the “Unix Epoch” for dates (some Unix, Java etc.), I was having some trouble converting them in SQL Server Integration Services with a Derived Column.  The goal, as always, is to end up with a usable datetime data type rather than a (still useful for humans but a pain for systems), string.
The expression I came up with is below, there may be a much more elegant way to do this, please feel free to correct me:

SSIS Expression:
DATEADD("DD", ((DT_I8)(<YOURUNIXDATECOLUMN>/1000/60/60/24)), (DT_DBTIMESTAMP)((DT_WSTR, 25)("1970-01-01 00:00:00.000")))

For good measure, here it is in PL/SQL and T-SQL

        TO_DATE ('1970-01-01 00:00','YYYY-MM-DD HH24:MI') + 

T-SQL's DATEADD makes it easy to work with the date without having to convert the number.

Add Milliseconds directly

If you convert a number you know should be relatively recent and it doesn't budge from 1970, it's probably in seconds.

Mapping SSIS Data Conversion Transform Data Types to SQL Data Types


SQL Server Integration Services Data Conversion Transform Data Types don’t match up to SQL Server Data Types, which makes sense since SSIS isn’t just SQL to SQL.

I found this MSDN article really helpful, albeit, noting the caveat stated on the MSDN page: “These mappings are not intended to represent strict equivalency”

Pulled from the above article, my top 8 are below.

  • int – DT_I4
  • datetime – DT_DBTIMESTAMP (I don’t get it either)
  • nvarchar – DT_WSTR (give length separately)
  • varchar – DT_STR (give length separately)
  • datetime2 – DT_DBTIMESTAMP2 (use for Oracle DATE/TIMESTAMP compatibility, dates less than 1753 etc)
  • bit – DT_BOOL
  • float – DT_R8
  • bigint – DT_I8

Monday, December 5, 2011

Can’t Believe I Didn’t Know This One

I’m endlessly frustrated by working with date and datetime types in T-SQL and PL/SQL, so I often end up complicating myself into a corner working with them. 

I’m running the risk of another “duh” comment by posting something so simple, but the syntax in T-SQL and PL/SQL to get the date of a year, or any number of days from today’s date is just:



Thursday, November 17, 2011

Quick Query to Look for Columns in a DB


Nothing fancy, just a quick query for ‘stab in the dark’ hunts for data.   I get that when you’re at the point that you’re hoping someone named a column helpfully, you’ve got bigger problems;  nevertheless I needed to throw it together.

Find a string in the list of column names in a database and return table and datatype information.



With only user Data Types:


Friday, November 11, 2011

SharePoint 2010 Application Crawl Store DB Grows Out of Control

Note:*  I've been out of the SharePoint game a while, and not keeping up with its advances, there is definitely a way to manually clear the log from the GUI now.  I'm not sure if there is also a way to schedule grooming, other blogs would have more up to date info. 

Problem:  On SharePoint 2010, the Search Service Application Store DB grows out of control.  One table MSSCrawlURLLog is giant, and compared to the other tables, I mean really huge -- a couple of orders of magnitude bigger (MSSCrawlhostsLog may also be very large).

Reason:  Crawl logs are maintained for a very long time by default (it appears to be 30 days), there isn’t a way to configure this in the GUI.  Only errors are logged, and they’re logged with the frequency of the indexing crawl schedule.  This should be fine, but if you, for instance, have a pretty large farm and are migrating from 2007 to 2010, and you, say, installed the Fab40 templates, or have users who hardcoded URLs, these logs will get very large, very quickly.  There are a lot of postings about this issue, and the most recommended way to fix it is to reset the crawled content and re-crawl. With a large farm this is a search outage (search may be up, but you won’t get results until content is indexed), and takes a long time. I’m not crazy about this solution, but I haven’t found anything explicitly saying my fix is supported, so if in doubt, go with the reset.

One Fix:  In the Search Service Application Store DB (default name Search_Service_Application_StoreDB_funGUID), there is a procedure proc_MSS_CrawlReportCleanup.   It’s not super documented, and  I can’t find any way to call it through the GUI or PowerShell.  But it does exactly what you think it would do, and cleans up the offending tables.
Our environment is mirrored, so I wrapped calling the proc in the proc below that checks for mirroring primary first.  This shouldn’t be a problem on a standalone DB, but you could take it out for good measure if it doesn’t apply.  Pass the name of your crawl store database and the number of days you want to retain to the procedure.  It will generate a lot of IO if your table is growing a lot.  We run it off-peak.  Also, it’s not a bad idea to take a look at the error log and see if there are any errors that could be cleaned up since errors that don’t get logged, don’t have to be cleaned up.


/****** Object:  StoredProcedure [dbo].[PRUNE__SEARCH_CRAWL_STORE]    Script Date: 11/11/2011 16:29:13 ******/


--Pass the name of your crawlstore database as CrawlstoreDBName default is Search_Service_Application_CrawlStoreDB_somecrazyguid
--@pruntime sets the number of days of error history to retain


declare @cmd nvarchar

set @cmd = @CrawlStoreDBName +'.dbo.proc_MSS_CrawlReportCleanup '

-- For mirrored databases, the proc/sqlAgent job should live on both servers, and only execute on the primary
--Check the current status of the database because the command will fail if executed on a mirror database
if(select m.mirroring_rolefromsys.databases djoinsys.database_mirroring mon d.database_id = m.database_idwhere = @CrawlStoreDBName) = 1
--If the database is primary, run the log prune job
exec @cmd @prunetime


SQL 2008 R2 Mirroring. Add a Witness After the Fact

Just a quick note:
We ran into a situation with a database already running mirroring without witness/failover to which we wanted to add a witness.  To add a witness server after the fact, we ran the mirroring wizard which appeared to be successful.  Failover, however, didn’t work, and the log on the secondary was full of permissions errors.
If you run all SQL services as the same user across your primary/secondary/witness, you won’t see this error…I’m not suggesting that’s a good idea.  In our case all the SQL services run as dedicated domain accts per server, and running the mirroring wizard doesn’t add the witness server account to the secondary if mirroring without witness is already configured.

There are two ways to fix this:

  • The right way:  Figure out how to assign permissions to the mirroring endpoint for the witness server service account on the secondary (I gave up on this because cheating is so much easier, and it’s not something we have to repeat).
  • The Kind of Cheating Way I Fixed It:  Create a dummy DB on the primary and secondary (don’t worry about the real mirroring setup, you just need the permission wizard to run).  On the dummy DB, run the mirroring setup as you would normally for synchronous mirroring with witness.  Drop the dummy DB.  Re-run the mirroring setup to add a witness on the primary database.  Test failover.

Monday, October 3, 2011

Updating Dynamic DNS registration after security group membership change

Updating a dynamic DNS record after a computer is added to a group providing the necessary permission usually requires a restart.  An alternative is to clear the Network Service Kerberos ticket cache and restart the DNS Client service.  This procedure was tested on Windows Server 2008 R2:
Clear the current list of tickets for the Network Service account
klist -lh 0 -li 0x3e4 purge
Restart the DNS Client service
sc stop dnscache
sc start dnscache

How to refresh Windows 2008 R2 computer group membership without reboot using klist

There are many cases where it is useful to force a server to refresh its group membership without restarting the server.  This is a simple task using the klist.exe utility on Windows 2008 R2.  In an elevated command prompt:
Retrieve the current list of tickets for the computer account
klist -lh 0 -li 0x3e7
Clear the current list of tickets for the computer account
klist -lh 0 -li 0x3e7 purge
Note that the syntax of this command is different than reported in many posts on the internet that were created prior to the release of Windows 2008 R2.  In Windows 2008 R2 the lh parameter is now required.  If the lh parameter is not specified, klist will return the usage.

Thursday, September 29, 2011

How to Enable Powershell Remoting using VMware invoke-vmscript

Enabling Powershell remoting using scripting can be tricky due to permissions issues. One work-around is to use the task scheduler to invoke the script locally. Another method I have used is to enable remoting using VMware’s PowerCLI—obviously this will only work for Windows guests running on VMware with functioning tools.
The script is very simple:
$script = 'Enable-PsRemoting –F'
$vms = get-vm | where {$_.PowerState -eq "PoweredOn" -and $_.Guest.OSFullName -match "Microsoft Windows*" } | Sort Name
foreach($vm in $vms){invoke-vmscript -scripttext $script -vm $vm  -guestcredential $guestcred -hostcredential $hostcred}
The first two lines get the guest and host credentials that will be used to launch the script. The host credentials are usually the actual root account on the host—not the credentials for vCenter. The script that needs to be executed in each guest is stored in the $script variable.
The VMs to run the script on are then collected. I am limiting it to only Windows guests that are in a powered on state.
Finally I run the script inside each guest using a foreach loop. It is possible to pass in an array of VMs, but I have found that if you have a heavy consolidation ratio it can cause dramatic CPU spikes.
You can then test your connections using enter-pssession or invoke-command.

Wednesday, September 21, 2011

Powershell PSComputerName is empty or blank when selected after invoke-command

When using PowerShell remoting a NoteProperty object is attached to each result to indicate the computer the result was generate on.  This property is empty or blank when a select object is done further down the pipeline, but inspecting the object shows the data is present.  Here is an example:
invoke-command -computername MyServer -scriptblock {gwmi win32_operatingsystem}
SystemDirectory : C:\WINDOWS\system32
Organization    : IT
BuildNumber     : 3790
RegisteredUser  : User
SerialNumber    : 69713-640-2887872-45140
Version         : 5.2.3790
PSComputerName  : MyServer

invoke-command -computername MyServer -scriptblock {gwmi win32_operatingsystem} | Select PSComputerName, Version

A simple way to work around this problem is to use the format-table cmdlet:
invoke-command -computername MyServer -scriptblock {gwmi win32_operatingsystem} | Format-Table PSComputerName, Version
PSComputerName                                                                                      Version
--------------                                                                                      -------
MyServer                                                                                            5.2.3790

Saturday, September 17, 2011

Enabling Powershell Remoting on Remote Computers

Powershell remoting is a powerful tool that allows you to run a powershell session on a remote computer. Enabling remoting on a single computer is simple:

Start a powershell.exe prompt (Run as administrator after XP/2003)
Type: Enable-PsRemoting -F

If you have hundreds of computers you want to enable it on it is more difficult. Psexec will hang when executing the command which prevents the standard administrator method. There are some complex scripts floating around the internet. I found the following method to be very simple:

schtasks /create /s servername /rl highest /ru SYSTEM /sc once /st 18:00 /TN EnableRemoting /TR "powershell -noprofile -command Enable-PsRemoting -F" /F /Z /V1

Change the /s servername to the actual server name
Change /st time to a time in the future (be careful if you have servers in different time zones)

After the task runs, the /Z parameter will cause it to be deleted automatically.

If you do not use the /V1 parameter when scheduling a task against a 2003 server you will receive the following error:

ERROR: The task XML contains a value which is incorrectly formatted or out of range.

To execute the schtasks command against a large number of servers, you can use a text file with the server names and a for loop:

for /f %i in (textfile.txt) do schtasks /create /s %i /rl highest /ru SYSTEM /sc once /st 18:00 /TN EnableRemoting /TR "powershell -noprofile -command Enable-PsRemoting -F" /F /Z /V1

To test your work start powershell:
invoke-command -computername remotecomputername -scriptblock {hostname}

Thursday, July 21, 2011

Windows 7 Explorer Hangs Interminably On Network Copy to 2008 R2 Server

Quick Description:  Windows explorer hangs copying large files at “discovering” and never returns when copying from Windows 7 to a Windows 2008 R2 network location.  Attempting to cancel, or close the dialog will switch the dialog to say it is copying with n minutes left, but no progress is actually made.
Solution:  As referenced in KB951037 disabling TCP Chimney Offload instantly solved the problem for me.
From an administrator command prompt type:
netsh int tcp set global chimney=disabled

Update:  This can also a problem with Symantec Enterprise Protection 11.  My feelings about the product notwithstanding, many company domain policies will allow you temporarily kill the service (services.msc) while you perform a specific task.

Friday, June 24, 2011

Can't consume Oracle Synonyms with a SQL Server linked server endpoint

Generally, I'd prefer to post how you *can* do something, but to save anyone having the same dilemma some time -- SQL Server linked server objects can't consume Oracle Synonyms.  This is due to the same reason you can send a "select top 10 *" query through a linked server even though that's just gibberish to Oracle.  Commands directly to a linked server object rely on SQL Server objects and syntax regardless of the fact that the back end is Oracle (although still magically persist case sensitivity).

If you need to use a synonym in a query against an Oracle linked server, you will have to use openquery.  This isn't pretty for more complex queries/loads.  The syntax for using openquery for this is
  • select * from openquery(LinkedServerName, 'select object from synonym.object')

It is also possible (but bad behavior) to look directly at the Synonym and find the referenced object (easiest in a product like toad, but if you use sqlplus the command is DESC synonymname).  Again, the developer who created the synonym probably created it because he wants you to use it, so unless you are the developer for the target server, you should really check this is okay.

Tuesday, June 14, 2011

The prize for weirdest DPM error goes to...

Problem: In DPM 2010 (and DPM 2007),
1.  An existing SQL server or SharePoint farm begins failing recovery points and throwing VSS errors.   A new attempt to create a DPM protection group for SQL server fails.
2.  DPM does not find a SQL server on a server that is running SQL Server.
3.  DPM cannot create a SharePoint replica because of a VSS error on the SharePoint database server.

The DPM server will register the error "The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID: 30111)" when attempting to create a recovery point, or when attempting to manually rebuild the replica.  This error can be caused by a lot of different problems including just needing to restart the VSS services (for SQL server, that's the VSS service itself, the SQL VSS service, and then the DPMRA service).

A check of the VSS writers on the SQL server will not show the SQL Writer in the list.  From an administrative command prompt type "vssadmin list writer" and check for a sql writer.  If it's missing, and you know the agent was installed correctly, you might be running into the same problem I did.


I have a hard time believing the reason for this problem, but...this problem is caused by a trailing space at the end of a database name on the SQL server.  It sounds like a joke, but it's not.  This might not be an easy problem to fix (if applications are using that database), so you may need to start old-school SQL Agent backups until the problem can be fixed.  The problem is documented for SQL 2005/DPM 2007, but I "accidentally confirmed" that it is also an issue on DPM 2010 and SQL 2008 R2.  It also affects upstream SharePoint farm replicas.  From what I can find removing the trailing space, is the only solution to the problem.

It's bad behavior to have a trailing space on a database name.  It's certainly not something you would intentionally put into production, but--hypothetically speaking-- if you were, say, doing a test restore of a database, weren't being careful, and accidentally got whitespace in the name, you could take out all other backups for the whole SQL Server (and SharePoint Farm, if SharePoint databases are hosted on the SQL Server).  This seems a bit like getting grounded for life, as a consequence of being 3 minutes past curfew.  Nobody's saying trailing spaces in database names are a good idea, but to essentially give the whole server the silent treatment (including not logging any useful errors) seems a little harsh.

Here's a query to quickly check for trailing spaces in database names on a SQL server.

select name from sys.databases where name like '% '

Thursday, June 9, 2011

Query to get non-relative navigation links in a wss 3.0 database

Quick Description:  If non-relative  links are hard-coded in SharePoint navigation, problems will occur if the site is moved (for instance in a migration to the new version).  It is hard to identify and locate those links before they are found when they break.

One Solution:

The query below finds non-relative links embedded in the Quick Launch and Top Bar on WSS 3.0 sites.

Ideally this should be run on a non-production system with a restored copy of the content database, it's very bad behavior to query the live SharePoint database directly.  Replace topsite% with the name of the site under which you want to search.

use [ContentDBofSiteInQuestion]

select dbo.webs.title as SiteName, dbo.navnodes.url as NonRelativeURL, dbo.webs.fullurl as location
from dbo.navnodes
join dbo.webs on dbo.navnodes.webid =
where dbo.webs.fullurl like '/topsite%'
and dbo.navnodes.url like '%http%'

Stop 0x0000005D on VMware when using WinPE or WAIK

Problem: WinPE images crash with a stop 0x0000005D blue screen error on some VMware virtual machines.
Solution: 64-bit WinPE images cannot execute on VMs defined with a 32-bit operating system within VMware.  To temporarily work around the issue, change the Guest Operating System to a 64-bit version.  Remember to change it back when finished using WinPE on the guest.

Wednesday, April 13, 2011

DPM SharePoint Item Level Restore Problems at Site Collection Level

Short Description:  When using DPM Item Level Restore to restore a (SharePoint 2010) site collection that has been deleted, the DPM recovery job succeeds, but the site is not restored.
Problem:  When a site collection has been deleted, it shows as restorable from the last recovery point (preceding deletion) in Data Protection Manager.  The restore completes with no indication of a problem.  However, when you attempt to access the site collection, the top level URL results in an error stating the IIS version, and Connection Closed.  Attempts to hit a known page further down the hierarchy will result in a 404 error.
It appears that the DPM restore does not correctly re-create the site collection if it has been deleted. 
Work Around: 
  1. In Central Admin create a new site collection with the same name and URL as the old Site Collection
    • don't worry about permissions and templates, these will be restored
    • if you don’t have the URL, it’s available in the DPM recovery console for the restore point you’re using
  2. Re-run the recovery job (make sure to choose the option to apply the recovery point permissions instead of the destination permissions). 
  3. Profit!  (actually, VoilĂ )

Friday, March 4, 2011

Data Protection Manager Continually Crashes after Update KB2250444

Short Description:  After applying KB2250444 the DPM service continuously crashes with error 945 “Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.”
Problem:  In an experience that has probably shaved years off my life, one of our DPM servers crashed horribly and repeatedly after applying the latest Cumulative Update for Microsoft Data Protection Manager KB2250444.  Our DPM servers are multi-terabyte and are backing up our SQL and SharePoint infrastructure, the thought of having to recover from snapshot and possibly lose several hours of SQL recovery points was starting to look terrifyingly likely. 
The update appeared to apply successfully, but DPM subsequently crashed repeatedly with the error below.
Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.
Problem Details:
<FatalServiceError><__System><ID>19</ID><Seq>44</Seq><TimeCreated>3/1/2011 9:45:21 PM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>SqlException</ExceptionType><ExceptionMessage>Could not find stored procedure 'dbo.prc_RM_ReplicaTrail_GetActiveTaskId'.</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException: Could not find stored procedure 'dbo.prc_RM_ReplicaTrail_GetActiveTaskId'.
I snipped out the stack trace…you’re welcome
The update could not be removed, and a re-apply did nothing to help.  It was obvious from looking at the database that the procedure referenced in the error was in fact missing, and when compared with a healthy, updated, DPM server that had the procedure, the structure of the table updated by the procedure was also not the same.  It’s really not best practices to directly edit the DPM database, or to try to parse through what an update is doing and try to make your own update, so I called Microsoft Support. 
At this point I should mention that many people have solved errors with applying the patch, by following this thread, unfortunately it didn’t work for me.  It’s also necessary to  check are that there is plenty of disk space on the drive hosting the dpmdb as the patch creates a backup of the database prior to applying.  It seems a bit like adding insult to injury that a patch attempts a backup, and if that backup fails, it can cause your database to get into a state that can neither be backed out, nor recovered by re-applying the patch (on Windows 2008R2, there is no uninstall for KB2250444). 
After several hours on the phone with Microsoft support, it was decided that it is supported to manually create the procedure and alter the table directly.  Below are the steps I had to take.  This fixed the problem, the replicas recovered after consistency checks, and test restores completed successfully, still, I really feel this is a last resort kind of fix. 
Alter the Table first, the procedure will not create without the new column “lastbackuptime” :
USE yourdpmdb

ALTER TABLE tbl_PRM_LogicalReplica
ADD LastBackupTime DateTime NULL;

Then Create the Procedures:

USE yourdpmdb

IF EXISTS (SELECT * FROM dbo.sysobjects    
WHERE id = OBJECT_ID(N'prc_PRM_LogicalReplica_UpdateLastBackupTime') 
AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime

CREATE PROCEDURE dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime
@ReplicaId GUID,     
@LastBackupTime DateTime

DECLARE @error int
SET @error = 0

UPDATE tbl_PRM_LogicalReplica
SET LastBackupTime = @LastBackupTime
WHERE ReplicaId = @ReplicaId
AND IsGCed = 0

SELECT @error = dbo.udf_DPS_CheckRowCount(1)
RETURN @error

GRANT EXECUTE ON  dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime TO MSDPMExecRole

USE yourdpmdb

IF EXISTS (SELECT * FROM dbo.sysobjects    
WHERE id = OBJECT_ID(N'prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId') 
AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId

CREATE PROCEDURE dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId
@ProtectedGroupId GUID

DECLARE @error int
SET @error = 0

LastBackupTime = CASE
WHEN LogicalReplica.LastBackupTime IS NOT NULL              -- Can be null if no backups have happened
THEN LogicalReplica.LastBackupTime
ELSE DS.LastAddedToCurrentPG                                -- In that case, use the time when this client was protected.
JOIN tbl_PRM_LogicalReplica LogicalReplica WITH (NOLOCK)
ON DS.DataSourceId = LogicalReplica.DataSourceId
WHERE DS.ProtectedGroupId = @ProtectedGroupId
AND LogicalReplica.Validity <> 6
AND DS.IsGCed = 0
AND LogicalReplica.IsGCed = 0

SELECT @error = @@ERROR
RETURN @error

GRANT EXECUTE ON  dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId TO MSDPMExecRole


Thursday, March 3, 2011

Oracle RMAN: Unexpiring Database Backups in Bulk

Quick Description:  On a recent Disaster Recovery drill, we ended up with a disparity between the date of the RMAN catalog database and the backup files it keeps track of. 

Normally, the backup catalog and raw backup files would be in sync.  In our recent DR drill, we had a problem with the catalog on tape and had to restore a more recent copy of the catalog database.  The  database was several days newer than the raw files, and it expected that the files in question were no longer online.  RMAN came up after restoring the catalog.  A restore set to an 'until time' which was  available in the raw backup files started, but RMAN immediately failed to find a copy of any of the datafiles in the database.  Running a 'list backup' showed that the backups for the target date were now expired.

There is a very tedious process for unexpiring backups individually.  This involves checking the backup list for the files from the target time (or just the available files), determining their BS key, listing the backupset for that BS Key, and then setting the backupset to available for each individual BS key (change backupset available).  If you have only one expired backupset this may be feasible.  We had thousands of raw files.

If you just want to force the RMAN catalog to set all backups which are available on disk to available in the catalog, the command is much more simple.   In the RMAN shell, while connected to the rman catalog, and the target, send the command:
crosscheck backup;

This will check all backups in the catalog to see if they are available at the location stored in the catalog.  It will set available backups to available, and expire backups which are not on disk.

This was a lifesaver for our DR drill.

Tuesday, February 1, 2011

Things to Think About Before Virtualizing SQL Server’s Underlying OS, or Adopting a Pit Bull – Part Three


Note:  This is an old article, and while Pit Bulls, in general, have remained largely the same in the meantime (although Gerald in particular has a very distinguished grey face now), SQL Server licensing for virtualized platforms has changed dramatically.  For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

Part One – Know if you Can, and if You Can, Be Patient
Part Two – Know your environment and be excited to answer questions
Part Three:  You Will be Held to a Higher Standard – It’s Not Fair, but You’ll Need to Live up to it.

We’ve already discussed the fact that there are plenty of people who have put a lot of thought into both SQL virtualization and pit bull ownership, and who have legitimate concerns.  Unfortunately, there is also a lot of misinformation out there, and there are people who have already made up their minds about  SQL virtualization and pit bulls.  This will put the burden of proof on you, to show that your dog or infrastructure is not a problem before alternatives are considered.
Your pit bull and your virtual infrastructure will be judged much more harshly than other, possibly less stable, dogs and infrastructure.   While most people are sensible, kind,  and thoughtful, you will almost certainly encounter people who have read one article, heard a horror story, or worse, tried it and made a mess of it.  Often these people subsequently believe anything that goes wrong in the world is the fault of either pit bulls or virtualization. 

Even if your pit bull is stable, courteous, and just an all around wonderful guy, people may still cross the street to avoid walking past your perfectly well mannered fella who is heeling next to you on his leash.  By the same token, your stable, cost effective, virtual infrastructure may be passed up for physical hardware that is less fault tolerant and less performant, simply because it’s perceived to be impossible to design a stable, performant, virtual infrastructure.  There will always be the guy with a notoriously badly behaved application, who believes that all of the application’s problems are caused by the fact that somewhere down the chain, there is a virtualized SQL server which couldn’t possibly be as stable as the performance and uptime history you’ve provided on said server indicates.   In both of these situations, only time and track record will make a difference, and even then, some people may never be convinced.  All you can do is make your pit bull and your virtual SQL servers, shining examples of the breed or architecture they represent.

PrintFig 1. One of these tricks is made up:   Gerald doesn’t really fetch slippers, he’s far too smart to ever put fabulous shoes in his mouth
Note the sudden improvement in comics, thanks Lisa!
When you’re held to a higher standard of proof, be able to point to resources and data (nicely) to respond to the statements below: 
  • “Microsoft doesn’t support it”
  • “I tried it and it was a disaster, it can’t be done”
    • This one is hard to address tactfully.  My love for all things Dorothy Parker results in the need to bite my tongue to quell the urge to “get quippy” with this one.   I’ll stick to noting that I, personally, can’t run a marathon, but plenty of people do it, and do it really well.
    • In a real life situation, it’s prudent to answer this question in the nicest way possible.  It’s true that virtualizing SQL server used to be a disaster, and wasn’t supported, but recently there have been major advances in best practices for VMware and Hyper-V, stability, performance, and support.  With all the new best practices available, it could be worth giving it another look.
  • “Storage is a disaster on virtual platforms”
    • Storage certainly can be a disaster on virtual platforms, but virtualization can also drastically simplify storage and make it much more flexible
    • If you have good storage administrators, you can abstract yourself from the physical storage layer almost completely.  For me, it’s great to think in terms of IOPS and latency, rather than what raid level is being used, how the disks are attached/presented, and even what storage you’re on.
      • I know it’s very important to a lot of people who are much smarter than me, that you look at dedicated spindles, raid configuration, and attach/presentation methods.  With very high performance systems  it’s important to know more than IOPS and latency, but when we’re talking about vitualization, we’re talking about much smaller machines.  With the current vcpu max at 8vcpus and 256g memory, very high performance machines aren’t candidates for virtualization.  It seems to me that with smaller systems, all of the physical concerns boil down to guaranteeing high IOPS and low latency.  For someone like me who is not a storage admin, I like to be comfortable that a good storage admin will give me the information I can work with which is, for example, “Will this configuration support 10,000 8K IOPs at <7ms latency?”
      • If storage looks local to a virtual machine and you are running vSphere (as opposed to Hyper-V), you can transparently migrate to a different storage system with no downtime – there’s overhead for this, it shouldn’t be done at peak times.
      • You can expand a virtual machine’s disks on the fly, it will depend on the backend storage’s file system whether this is asking for fragmentation.

Monday, January 31, 2011

VMware cloning of Linux guests using LVM

In the past, my attempts to use cloning with customization specifications in combination with Linux guests using LVM would fail. A search of the VMware KB lead me to these two articles both of which say LVM is unsupported:
In testing this again on vSphere 4.1 with SLES 11 SP1, I discovered that the combination now works as long as the appropriate VMware tools are installed.
I did get this revealing error during an attempt that did not have appropriate tools:
“Customization of the guest operating system 'sles11_64Guest' is not supported in this configuration. Microsoft Vista (TM) and Linux guests with Logical Volume Manager are supported only for recent ESX host and VMware Tools versions. Refer to vCenter documentation for supported configurations.”
I could not find any references in the vCenter documentation, but I am checking with VMware to determine if the KBs are still correct.

Thursday, January 27, 2011

Things to Think About Before Virtualizing SQL Server’s Underlying OS, or Adopting a Pit Bull – Part Two

Note: This is an old article, and while Pit Bulls, in general, have remained largely the same in the meantime (although Gerald in particular has a very distinguished grey face now), SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

Part One – Know if you Can, and if You Can, Be Patient

Part Two:  The Good:

There will be people who have done research on Virtualization or Pit Bulls and who have legitimate concerns and questions for which you should have answers before “going live.”

These are awesome people.  They know enough to know that there are things that can go very wrong with any given virtualization implementation or individual pit bull, but if you provide them the information about your particular virtual infrastructure, or your big goofy dog, they will look at the evidence with a fair mind, and often ask questions  that lead to a better infrastructure going forward.  In the case of pit bulls, It’s perfectly sensible to want to know if your pit bull is dog aggressive (and you should be sure to know this -- if he is, you need to know how you will handle things like walks, vet visits, and other multiple dog situations).   It’s sensible, not insulting, to want to know if your pit bull can be trusted around small animals (Pit Bull Terriers are Terriers after all).   Personally, I’d also want to know how he got so handsome, but that’s just me.  You put a lot of work into training such a well-mannered gentleman, take the time to show off what a great example of a sweet, stable, dog he is, and show he’s a great ambassador for the breed.


Fig 1.  This is what happens when technical people attempt to draw  their own cartoons –
Also, that’s a pit bull not a kangaroo… no disrespect to kangaroos who probably also have a lot of good ideas.
With respect to virtualization, it’s smart of a DBA, or application owner, to ask questions about your virtual infrastructure, and you should have visibility into, have tested, and be able to answer those questions.   Below are some examples of questions I get that I find comforting to know other people are thinking about before jumping into virtualization.
  • Storage:
    • What does the back end storage look like?
    • What do you consider acceptable latency?
    • How many IOPs at what size, can it do without generating big latency?
    • If the disks are shared, how much overhead does it have for my I/O?
    • What is the storage connectivity?
    • If I need to physically separate logs from data files for DR or performance (the tide seems to be turning on the performance question regarding logs and data, but it’s still a deal breaker for many DBAs, and a serious concern for DR), how will that be done?
    • Will the VM think that storage is local to the VM, or will it be directly mapped? 
      • If it’s directly mapped, what technology is used to map it?
      • If it appears to be local storage, how is it presented to the Host server
      • If it will be presented to the VM as local, can I do live storage migration?
      • If it appears to be local to the VM is there still fault tolerance on the storage side?
      • If storage is directly mapped, will a host failover cause any loss of storage connectivity?
  • High Availability
    • Is the cluster configured for HA?
    • Has HA been tested properly?
    • If only some of the machines in the cluster are fully licensed for SQL server, has DRS (or the Hyper-V equivalent) been turned off, so that servers can’t migrate automatically (which would result in being out of licensing compliance)?
    • If my SQL server has mirrored databases, where will the mirrors run? 
      • Will the mirrors be kept on physically separate Host servers and storage from the primary?
  • Resources:
    • Do I get a dedicated OS per instance?
    • Is network failover configured and tested?
    • How many CPUs/how much memory can be dedicated per VM.
    • Does the host server have at least as many cores per socket as the number of VCPUs to be provisioned per virtual machine?  You can comfortably over-provision the total number of vcpus allocated on the host with respect to the number of physical cores.  However, if your physical machine has 4 sockets, each with 4 cores, if you allocate more than 4 vcpus to any individual VM, spanning more than one physical socket may reduce performance by reducing the number of opportunities to schedule threads across sockets.
It may go without saying (but I’m going to say it anyway, and I’ll be posting more on the subject later), you need a good dialogue with the people who run your storage and virtualization tiers.   If you’re the kind of DBA who believes that DBAs are from Mars, and Storage Guys are from a completely different solar system (you know who you are), think really hard about whether you’ll be able to pull this off.   Ideally, you’re able to at least see eye to eye with the guys who keep the rug under you.

Sunday, January 23, 2011

Things to Think About Before Virtualizing SQL Server’s Underlying OS, or Adopting a Pit Bull – Part One

Note: This is an old article, and while Pit Bulls, in general, have remained largely the same in the meantime (although Gerald in particular has a very distinguished grey face now), SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

These days, I think a lot about both SQL Server Virtualization and Pit Bulls.  They’re partly on my mind because they‘re two of my favorite things, and partly because both get a lot of unjustified bad press.  I'll admit that in both cases, I can be a little biased.  I may have already tipped my hand by playing the "favorite" card.  You can probably hear the pit bull snoring on my feet from wherever you are, and my constant posts on Virtualization probably don‘t help hide my feelings about virtualizing SQL – so  it’s pretty obvious which side of the debate I’m on.  In my environment, we've gone so far down the virtualization road, and gotten so much out of it, that it's sometimes hard to understand why opportunities for virtualization would be passed up.  However, a couple of  months ago at SQLPASS, I had a bunch of conversations with people who had thought a whole lot about virtualizing SQL and had decided for very good reasons, that it wasn't an appropriate solution.  This got me thinking about…

Things everyone should put thought into before Virtualizing SQL Server’s underlying OS, or adopting a Pit Bull -- a Several Part Series (now with 100% more comics – thanks Lisa!).

1.  Find out if you're even allowed to do it, and if you are, license properly:
Pit Bull:  Some home insurance carriers won't insure your home if you adopt a pit bull.  Some cities have Breed Specific Legislation outlawing pit bull ownership.  If you rent, some landlords don't allow it.  Some states and/or Insurance companies require additional liability insurance if you own a pit bull type dog.  You need to know if any of these apply to you.

Virtualized SQL Server:  Before Virtualing SQL, it's very important to find out what licensing you'll need, whether consolidating onto a virtual platform will be cost effective, and whether your management will be amenable to the idea.   To be allowed unlimited virtual machines on a fully socket licensed box you need to be licensed first for Windows (Datacenter), and second for SQL Server.  For SQL 2005 through 2008 R0, unlimited virtualization requires SQL Server Enterprise socket licensing.  For SQL 2008 R2 and up, Datacenter is required (in some cases, Enterprise is grandfathered under an ELA, but your Microsoft Rep will have to explicitly tell you if you’re grandfathered).   If your virtualization platform is not Hyper-V, you also need to be fully socket licensed for your virtualization platform.    Note:  As of the release of 2012, SQL Server licensing for virtualized platforms has changed dramatically. For more information on the current licensing requirements see the SQL 2012 Licensing Guide, or consult your Microsoft Rep.

While Microsoft fully supports SQL on any virtualization platform that is validated through SVVP, not all application vendors support their application back end being on virtualized SQL -- this is becoming less and less common.  As a side note, Oracle does support virtualizing the operating system on Oracle Enterprise Linux, but no other virtualization platforms are supported.  This post doesn’t address Oracle, but if you’re looking at virtualizing Oracle, read their support statements carefully.

2.  Realize that adoption will take longer
Pit Bull:  I'll call our example pit bull Gerald the Gentleman.  It took Gerald six months to get adopted.  Some great folks at a shelter saw what a special fella he was, and worked with him to find a home.  Everyone who met him agreed he was a great dog, but that didn’t get him adopted.  Gerald started to get discouraged, and he started to look sad (see fig. 1).  The folks at the shelter knew he was too good for the right home to pass up, and that once the right home came along it would be Gerald‘s time to shine.  Now he's a rockstar, he can't help but win over everyone he meets (he had surgery recently and it was touch and go whether the Vet’s office would give him back, given that he is such a charmer).  It took time to find him a home, and he had to win over the people who would take him home, but these days, Gerald's a big goofy social butterfly, and has a dorky dance to show you just how happy he is.  A few years later now, and he's not allowed to do the Gerald dance anymore, he's an old man and threw his back out getting over-enthusiastic to Brit-Pop (why Brit-Pop?  I guess we'll never know).

fig. 1.  Gerald at the Shelter.
Virtualized SQL Server:  I don’t have a nickname for our example SQL Server environment.  In the beginning no-one wanted anything to do with Virtualized SQL, but as Microsoft came out with a support statement for SQL Server on a virtual platform, and application groups were staring down expensive hardware refreshes, it began to look more attractive.  It took that first brave application group who were willing to look at the numbers, look at the testing, performance, and stability data, and jump in.  Virtualized SQL server got a big boost from the success of the first trailblazer application group who were willing to move on to a virtual platform.   Other groups began to consider virtualization as an option to avoid an expensive hardware refresh and to share the cost of licensing.   Those application group customers were happy, and they told other customers until everyone wanted virtualized SQL (and why wouldn’t they?   It‘s quick, it‘s cheap, and it‘s really, really, stable).   It's a long way from the first implementation, to being the default choice.  Establishing a track record or reputation takes time.  Customers (internal or external) naturally want the best for their environments, and all the documentation in the world, isn't as good happy customers and months of uptime stats.  These days SQL on (in our case) VMware, is almost as popular as Gerald the Gentleman.  I haven’t provisioned a physical SQL server in over two years.  Small SQL servers, serving a single application get their own, dedicated, OS and application custodians can schedule downtimes without having to coordinate with the six other small apps that would otherwise live on a single SQL Server.  Speaking of those small SQL servers, we’re getting consolidation ratios of up to 50:1 (which makes the pricing attractive).  Even SQL servers requiring dedicated hardware live on top of a virtualized OS to take advantage of inherent HA, abstraction from the hardware, storage agnosticism, and live storage migration.  It’s hard to look back and imagine how we could have scaled out our environment without virtualizing the SQL Server OS.  It’s been a stunning success  -- full adoption just took time.

Thursday, January 20, 2011

SharePoint 2010 Content Database Migration, Re-Deploy Results in Foreign Key Constraint Error

Short Description: In an upgrade from WSS 3.0 to SharePoint 2010 via the content database attach/re-attach with PowerShell upgrade, a content database can be attached and upgraded only once.  A second attempt results in the following errors:

SharePoint log:

[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Skip upgrading [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Disposing SPSite Url=http://sitename/site.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: CanUpgrade [SPSite Url=http://sitename/site] failed.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Exception: Object reference not set to an instance of an object.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]:    at Microsoft.SharePoint.Upgrade.SPSiteSequence.get_CanUpgrade()
   at Microsoft.SharePoint.Upgrade.SPUpgradeSession.CanUpgrade(Object o)
Windows Event Log
Date  PowerShell.exe (0x0938)                  0x164C SharePoint Foundation          Database                       5586 Critical Unknown SQL Exception 547 occurred. Additional error information from SQL Server is included below.  The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
 The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d

I reported this to Microsoft and they confirmed they can re-produce the issue.  More information will be forthcoming.

Steps to Reproduce:

WSS Environment:
2 Web Front Ends, 1 Application Server -- WSS 3.0
Database Server -- SQL 2005 sp3 9.0.4053

SharePoint 2010 Environment:
2 Web Front Ends, 2 Application Servers -- 14.0.5128.5000
Database Servers (mirrored in this case, but can be re-produced on standalone dbs) -- SQL 2008 R2 10.5.1746

First upgrade:
  1. Take a backup of the WSS 3.0 content database in SQL 2005 SP3
  2. Restore backup of the WSS 3.0 content database to the SQL 2008 R2 server associated with SharePoint 2010 environment (you can change the compatibility level without affecting the reproduceability of the problem).
  3. From the SharePoint 2010 application server in the SharePoint Management Shell (Powershell) run the following command to upgrade the database:
    • Mount-SPContentDatabase -Name -DatabaseServer -WebApplication -Updateuserexperience
  4. Upgrade completes successfully with errors – site collections render in SP2010 look and feel and are fully working
 Second upgrade (this is the one with the issue):
  1. In SharePoint 2010 environment, remove content database from web application in Central Admin
  2. In SQL 2008 R2, drop content database
  3. Take another backup of the original WSS 3.0 database in SQL 2005 SP3
  4. Restore that backup of the WSS 3.0 content database to SQL 2008 R2 associated with the SharePoint 2010 environment.
  5. From the SharePoint 2010 application server in the SharePoint Management Shell (Powershell) run the following command to upgrade the database:
    • Mount-SPContentDatabase -Name -DatabaseServer -WebApplication -Updateuserexperience
  6. Upgrade completes successfully with errors.   Site collections now render in the WSS 3.0 look and feel or will not render at all and throw errors when working with the site.
SharePoint Upgrade log contains the errors
 [powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Cannot upgrade [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Skip upgrading [SPSite Url=http://sitename/site].
[powershell] [SPUpgradeSession] [DEBUG] [12/9/2010 11:49:06 AM]: Disposing SPSite Url=http://sitename/site.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: CanUpgrade [SPSite Url=http://sitename/site] failed.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]: Exception: Object reference not set to an instance of an object.
[powershell] [SPUpgradeSession] [ERROR] [12/9/2010 11:49:06 AM]:    at Microsoft.SharePoint.Upgrade.SPSiteSequence.get_CanUpgrade()
   at Microsoft.SharePoint.Upgrade.SPUpgradeSession.CanUpgrade(Object o)
 Windows Event Log contains the error
Date  PowerShell.exe (0x0938)                  0x164C SharePoint Foundation          Database                       5586 Critical Unknown SQL Exception 547 occurred. Additional error information from SQL Server is included below.  The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
 The DELETE statement conflicted with the REFERENCE constraint "FK_Dependencies1_Objects". The conflict occurred in database "SharePoint_Config", table "dbo.Dependencies", column 'ObjectId'.  The statement has been terminated. 695f52b2-6e6d-4b5e-97d9-906e71cedd0d
When looking at the database it is clear that the GUID referenced appears in the objects and dependencies table only when the site collection is attached.  When it is removed the GUID no longer appears in either table, so it's not immediately clear why a delete statement would cause a foreign key constraint violation on a guid that should no longer exist.

Microsoft have confirmed that this is an issue with the PowerShell method of upgrading to SharePoint 2010, and that the workaround is to use stsadm.  Since PowerShell is Microsoft's recommended tool for administering SharePoint 2010 and according to this technet article Microsoft's position is
We recommend that you use Windows PowerShell when performing command-line administrative tasks. The Stsadm command-line tool has been deprecated, but is included to support compatibility with previous product versions.
I requested that there be a fix to the PowerShell method of upgrading a WSS 3.0 content database to SharePoint 2010, and I'm excited to hear of a fix soon!

Microsoft's workaround (which I haven't tried but which I'm providing in case you're really stuck and can't wait for a fix -- is below).
Instead of the PowerShell command, run
stsadm -o addcontentdb -url yoursiteurl -databasename yourdatabaseame -preserveolduserexperience false

My Workaround:
For us, this is a problem primarily with the prep for a production rollout, and not with the production rollout itself (unless something goes horribly wrong and you have to retract and re-upgrade the same site collection twice).  To workaround (actually avoid is more appropriate), you can take advantage of technologies like cloning and snapshots.
Option 1:  Prior to each test run, clone your environment (this is only really feasible if you're using virtual servers) either with a clone or a snapshot, and reset the environment before each test upgrade.
Option 2:  Create a gold point in time "image" of your environment and revert all servers back to that point in time after each test run.

A couple of RMAN errors that took me a few minutes to figure out

Problem:  When restoring a database to a new host with RMAN, I ran into the following errors.

Error 1:
On restoring a control file to a database started in nomount, and listener up:
ORA-19870 error reading backup piece
ORA-19504 failed to create file
ORA-27040 file create error, unable to create file.
This error is pretty cut and dried.  Either the file path does not exist, or the oracle user doesn't own/have permissions to create the file.

Error 2:
on restoring a database to a new host with the control file restored, database in nomount, and listener up:
Failure of recover command at
ORA-01013 user requested cancel of current operation

This error could be much more clear.  It can be caused by an open connection to the database.  Since you're already restoring the database, the easiest way to remedy is to shutdown the database, then startup nomount, and make sure you exit your session before re-running from RMAN.

Installing Reporting Services on SQL 2005 on Windows 2008

Quick Description:  On Windows 2008 R2, even after installing the web server role and registering the .net framework with IIS, reporting services is greyed out as an option when attempting to install it to an existing instance.

A while back I wrote an article on installing Reporting Services on an existing SQL 2005 instance.  These steps work for Windows 2003.  However, if you should for some reason want to install SQL 2005 SP3 on Windows 2008 R2 (it's not the most common configuration, but maybe your licensing only allows up to 2005), these steps won't get you to the point where SQL 2005 will allow you to install reporting services.  In my case, after following the steps in the original article (substituting adding the role in Server Manager, instead of adding IIS in add/remove windows components), and registering .net with IIS, Reporting Services was still greyed out as an option in setup.

I fumbled my way through this one before finding this article that Microsoft published on the subject.   It's messy, but it works.  It also probably demonstrates that SQL 2005 isn't really meant to be installed on Windows 2008 R2.

Wednesday, January 19, 2011

Office Search Server Indexing Problems

Quick Description:  An existing, working installation of Office Search Server protecting a WSS farm stops updating the index and the crawl appears to be hung for an extended period of time.

Problem:  When the crawl hangs, new items and documents added to the WSS farm will not show up in searches.  The crawl never finishes.

I'm temporarily doing a bit more SharePoint administration, which includes the care and feeding of our old WSS 3.0 farm.  It's not as exciting as the shiny new 2010 architecture, but it's our day to day workhorse and I'm learning little tips and tricks for coaxing WSS 3.0 into lasting until you can fully migrate to 2010.

In these cases, we've traditionally been told to rebuild the index.  This fixes the problem, but effectively kills search for the time it takes to get items into the index.  If you have a large farm like us, it could be several hours before you have all items indexed.

Solution: On the application running Office Search Server, go to Start, Programs, Office Search Server, Office Search Server Administration.

In the admin console go to crawling, and click content sources.  Check the dates of the last crawl, and the duration of the current crawl.  If the current crawl has been going for an unreasonably long amount of time, stop the crawl by right clicking on the content source and selecting stop crawl.

To start a full crawl without destroying the current index, select the content source, and on the Edit Content Source page, scroll to the bottom and check "start full crawl of this content source."  The index will now rebuild without resetting and destroying the existing index.

If this doesn't solve the problem, you may need to reset all crawled content which will destroy the index.  If you reset all crawled content, searches will not return results until the content being sought is crawled.

Monday, January 17, 2011

How to Move a Server to a Different Group in DPM 2010

Quick Description:  DPM doesn't provide an obvious mechanism for moving an object from one protection group to another.

Problem:  There are many reasons you might want to move an object into a different protection group in DPM.  For example, if you have a database in a protection group that allows for 5 days online retention, but the requirement changes to 10 days for that database only.  An application changes groups and the objects protected by DPM need to be grouped differently to reflect that change (even just for the sake of clarity).  You've moved a protection group from its original DPM server to the Disaster Protection DPM server by switching disaster protection, and now want to put it in its own group with it's own retention and backup policy.

Solution:  To do this you need to stop protecting the member, and then create a new protection group (or modify an existing one), and re-import the inactive protection into the new group.

Remove the items from the existing group
  • Select the item to be moved in the protection console and right click it.
  • Select "Stop Protection of Member" from the drop down menu.
  • In the Remove from Group screen, verify that Delete Replica on Disk is NOT checked
    • Verify it again, it's important.  If you leave it checked you will lose your existing data
  • Select Okay
  • rinse and repeat as necessary for the items you want to remove.
Re-Add or Create a New Protection Group
  • In the Protection console, either select the group into which you want to move the inactive items, or select Create Protection Group.
  • Select your protection group type
  • In the Select Group Members screen, expand and select the items you want to protect, then click next
  • In the Select Data Protection Method screen, provide a group name, and select the type of protection you want (in this case, disk only).
  • In the Specify Short Term Goals screen, provide values for Retention Range, Synchronization Frequency, and a time for the Express Full Backup
  • In the Review Disk Allocation screen, review the suggested data size, and make any adjustments necessary.  Decide whether storage should be co-located, and whether the volumes should automatically grow.
  • In the Choose Replication Method section select whether DPM should replicate now, later, or manually.
  • In the Consistency Check Options screen select you preferences
  • In the Summary screen, verify the details and select Create Group
  • You may get a message that the action will initiate a consistency check.  This is expected.

Thursday, January 6, 2011

SQL 2008 R2 Replication Config Errors

Quick Description:  SQL 2008 R2 installed on Windows 2008 R2, replication configures and starts, then fails with an Access Denied error.

Transactional Replication is configured on a SQL 2008 R2 server that is installed on a Windows 2008 R2 operating system.  When you run the agent, it fails with error.
Replication-Replication Distribution Subsystem: agent DistributionName failed. The distribution agent failed to create temporary files in 'c:\Program Files\Microsoft SQL Server\100\COM' directory. System returned errorcode 5.
 This Microsoft article addresses this error if you are running your replication agent as the SQL Server Agent account, but in the case where the subscriber, publisher, and replication agent all use different domain service accounts, this did not fix the error.

The following fixes the problem, it may be overkill, I wasn't able to isolate which of the two additional permissions is responsible for fixing it.

On the subscriber grant write permissions for the directory 'c:\Program Files\Microsoft SQL Server\100\COM' to the following accounts:
SQL agent account for both subscriber and publisher
Replication agent account