Friday, March 4, 2011

Data Protection Manager Continually Crashes after Update KB2250444

Short Description:  After applying KB2250444 the DPM service continuously crashes with error 945 “Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.”
Problem:  In an experience that has probably shaved years off my life, one of our DPM servers crashed horribly and repeatedly after applying the latest Cumulative Update for Microsoft Data Protection Manager KB2250444.  Our DPM servers are multi-terabyte and are backing up our SQL and SharePoint infrastructure, the thought of having to recover from snapshot and possibly lose several hours of SQL recovery points was starting to look terrifyingly likely. 
The update appeared to apply successfully, but DPM subsequently crashed repeatedly with the error below.
Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.
Problem Details:
<FatalServiceError><__System><ID>19</ID><Seq>44</Seq><TimeCreated>3/1/2011 9:45:21 PM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>SqlException</ExceptionType><ExceptionMessage>Could not find stored procedure 'dbo.prc_RM_ReplicaTrail_GetActiveTaskId'.</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException: Could not find stored procedure 'dbo.prc_RM_ReplicaTrail_GetActiveTaskId'.
I snipped out the stack trace…you’re welcome
The update could not be removed, and a re-apply did nothing to help.  It was obvious from looking at the database that the procedure referenced in the error was in fact missing, and when compared with a healthy, updated, DPM server that had the procedure, the structure of the table updated by the procedure was also not the same.  It’s really not best practices to directly edit the DPM database, or to try to parse through what an update is doing and try to make your own update, so I called Microsoft Support. 
At this point I should mention that many people have solved errors with applying the patch, by following this thread, unfortunately it didn’t work for me.  It’s also necessary to  check are that there is plenty of disk space on the drive hosting the dpmdb as the patch creates a backup of the database prior to applying.  It seems a bit like adding insult to injury that a patch attempts a backup, and if that backup fails, it can cause your database to get into a state that can neither be backed out, nor recovered by re-applying the patch (on Windows 2008R2, there is no uninstall for KB2250444). 
After several hours on the phone with Microsoft support, it was decided that it is supported to manually create the procedure and alter the table directly.  Below are the steps I had to take.  This fixed the problem, the replicas recovered after consistency checks, and test restores completed successfully, still, I really feel this is a last resort kind of fix. 
Alter the Table first, the procedure will not create without the new column “lastbackuptime” :
USE yourdpmdb


ALTER TABLE tbl_PRM_LogicalReplica
ADD LastBackupTime DateTime NULL;




Then Create the Procedures:



------------------------------------------------------------------------------------
USE yourdpmdb


IF EXISTS (SELECT * FROM dbo.sysobjects    
WHERE id = OBJECT_ID(N'prc_PRM_LogicalReplica_UpdateLastBackupTime') 
AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime
GO

CREATE PROCEDURE dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime
(   
@ReplicaId GUID,     
@LastBackupTime DateTime
)
AS

DECLARE @error int
SET @error = 0
SET NOCOUNT ON

UPDATE tbl_PRM_LogicalReplica
SET LastBackupTime = @LastBackupTime
WHERE ReplicaId = @ReplicaId
AND IsGCed = 0

SELECT @error = dbo.udf_DPS_CheckRowCount(1)
SET NOCOUNT OFF
RETURN @error
GO

GRANT EXECUTE ON  dbo.prc_PRM_LogicalReplica_UpdateLastBackupTime TO MSDPMExecRole
GO


------------------------------------------------------------------------------------
USE yourdpmdb


IF EXISTS (SELECT * FROM dbo.sysobjects    
WHERE id = OBJECT_ID(N'prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId') 
AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId
GO

CREATE PROCEDURE dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId
(   
@ProtectedGroupId GUID
)
AS

DECLARE @error int
SET @error = 0
SET NOCOUNT ON

SELECT 
DS.DataSourceId,
LastBackupTime = CASE
WHEN LogicalReplica.LastBackupTime IS NOT NULL              -- Can be null if no backups have happened
THEN LogicalReplica.LastBackupTime
ELSE DS.LastAddedToCurrentPG                                -- In that case, use the time when this client was protected.
END
FROM tbl_IM_DataSource DS WITH (NOLOCK)
JOIN tbl_PRM_LogicalReplica LogicalReplica WITH (NOLOCK)
ON DS.DataSourceId = LogicalReplica.DataSourceId
WHERE DS.ProtectedGroupId = @ProtectedGroupId
AND LogicalReplica.Validity <> 6
AND DS.IsGCed = 0
AND LogicalReplica.IsGCed = 0

SELECT @error = @@ERROR
SET NOCOUNT OFF
RETURN @error
GO

GRANT EXECUTE ON  dbo.prc_PRM_LogicalReplica_GetLastBackupTimeOnProtectedGroupId TO MSDPMExecRole
GO


------------------------------------------------------------------------------------


No comments:

Post a Comment