Tuesday, June 14, 2011

The prize for weirdest DPM error goes to...

Problem: In DPM 2010 (and DPM 2007), you see one or more of the following:
1.  An existing SQL Server or SharePoint farm begins failing recovery points and throwing VSS errors, and a new attempt to create a DPM protection group for SQL Server fails.
2.  DPM does not find a SQL Server instance on a server that is running SQL Server.
3.  DPM cannot create a SharePoint replica because of a VSS error on the SharePoint database server.

The DPM server will register the error "The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID: 30111)" when attempting to create a recovery point, or when attempting to manually rebuild the replica.  This error has many possible causes, and sometimes the fix is as simple as restarting the VSS-related services (for SQL Server, that's the VSS service itself, the SQL VSS writer service, and then the DPMRA service).


A check of the VSS writers on the SQL Server will not show the SQL writer in the list.  From an administrative command prompt, type "vssadmin list writers" and look for a writer named SqlServerWriter.  If it's missing, and you know the agent was installed correctly, you might be running into the same problem I did.
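If you have more than one SQL Server to check, the writer check can be scripted. Here's a minimal Python sketch, assuming the usual vssadmin output format where each writer appears on a line like "Writer name: 'SqlServerWriter'". The function names are my own, not part of any DPM or VSS tooling.

```python
import re
import subprocess

# Matches lines such as:  Writer name: 'SqlServerWriter'
# (assumed output format of "vssadmin list writers")
WRITER_LINE = re.compile(r"^\s*Writer name:\s*'(?P<name>[^']+)'", re.MULTILINE)


def list_writer_names(vssadmin_output: str) -> list[str]:
    """Return every writer name found in the vssadmin output text."""
    return [m.group("name") for m in WRITER_LINE.finditer(vssadmin_output)]


def has_sql_writer(vssadmin_output: str) -> bool:
    """True if SqlServerWriter is listed (case-insensitive comparison)."""
    return any(
        name.lower() == "sqlserverwriter"
        for name in list_writer_names(vssadmin_output)
    )


def run_vssadmin() -> str:
    """Run vssadmin on the local machine; needs an elevated prompt on Windows."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the SQL Server itself, has_sql_writer(run_vssadmin()) returning False would match the symptom described above.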

Solution:

I have a hard time believing the reason for this problem, but...it is caused by a trailing space at the end of a database name on the SQL Server.  It sounds like a joke, but it's not.  This might not be an easy problem to fix (if applications are using that database), so you may need to fall back on old-school SQL Agent backups until it can be fixed.  The problem is documented for SQL 2005/DPM 2007, but I "accidentally confirmed" that it is also an issue on DPM 2010 and SQL 2008 R2.  It also affects upstream SharePoint farm replicas.  From what I can find, removing the trailing space is the only solution to the problem.

It's bad behavior to have a trailing space on a database name.  It's certainly not something you would intentionally put into production, but, hypothetically speaking, if you were, say, doing a test restore of a database, weren't being careful, and accidentally got whitespace in the name, you could take out all other backups for the whole SQL Server (and SharePoint farm, if SharePoint databases are hosted on the SQL Server).  This seems a bit like getting grounded for life as a consequence of being 3 minutes past curfew.  Nobody's saying trailing spaces in database names are a good idea, but to essentially give the whole server the silent treatment (including not logging any useful errors) seems a little harsh.

Here's a query to quickly check for trailing spaces in database names on a SQL Server.

select name from sys.databases where name like '% '
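If you already have a list of database names exported from several servers, the same check can be done outside SQL. This is a rough Python sketch, not anything DPM provides. The function name is made up for illustration, and unlike the query above it flags any trailing whitespace (tabs included), not just a space.

```python
def names_with_trailing_whitespace(names):
    """Return the names that end in whitespace, in their original order."""
    return [name for name in names if name != name.rstrip()]


# Hypothetical database names, for illustration only.
databases = ["master", "WSS_Content", "RestoreTest ", "Sales\t"]

for name in names_with_trailing_whitespace(databases):
    # repr() puts quotes around the name so the trailing whitespace is visible.
    print(repr(name))
```

Printing with repr() is the useful part: a trailing space is invisible in most tools, which is probably how it ends up in a database name in the first place.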
 

2 comments:

  1. Hi, I think I share the same problems you've had getting SP/SQL/DPM to work together... Is there an email address where I could contact you to share some thoughts? You can drop me a line at roberto dot md at gmail dot com.

    Regards
    Roberto

  2. Hello,

    I had the same issue protecting a SQL 2005 SP4 server using DPM 2010.
    When I tried to create the protection group, DPM was not seeing the instance.

    I removed the space at the end and "voila!"

    Thank You
    Ray
