Friday, November 11, 2011

SharePoint 2010 Application Crawl Store DB Grows Out of Control

Note: I've been out of the SharePoint game for a while and haven't kept up with its advances; there is definitely a way to manually clear the log from the GUI now. I'm not sure whether there is also a way to schedule grooming; other blogs will have more up-to-date info.

Problem:  On SharePoint 2010, the Search Service Application Crawl Store DB grows out of control. One table, MSSCrawlUrlLog, is huge compared to the other tables -- a couple of orders of magnitude bigger (MSSCrawlHostsLog may also be very large).
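
If you want to confirm which tables are taking the space, a quick size report against the crawl store does the trick. This is a generic sketch using sys.dm_db_partition_stats; the database name is an example, substitute your own:

USE [Search_Service_Application_CrawlStoreDB_funGUID]
GO
-- Rough per-table size report: row counts and reserved space, biggest first
SELECT t.name AS table_name,
       SUM(ps.row_count) AS row_count,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
GROUP BY t.name
ORDER BY reserved_mb DESC
GO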

Reason:  Crawl logs are retained for a long time by default (it appears to be 30 days), and there isn't a way to configure this in the GUI. Only errors are logged, and they're logged at the frequency of the indexing crawl schedule. This should be fine, but if, for instance, you have a pretty large farm and are migrating from 2007 to 2010, and you installed the Fab 40 templates or have users who hardcoded URLs, these logs will get very large, very quickly. There are a lot of postings about this issue, and the most recommended fix is to reset the crawled content and re-crawl. On a large farm that amounts to a search outage (search may be up, but you won't get results until content is re-indexed), and it takes a long time. I'm not crazy about this solution, but I haven't found anything explicitly saying my fix is supported, so if in doubt, go with the reset.

One Fix:  In the Search Service Application Crawl Store DB (default name Search_Service_Application_CrawlStoreDB_funGUID), there is a stored procedure, proc_MSS_CrawlReportCleanup. It isn't well documented, and I can't find any way to call it through the GUI or PowerShell, but it does exactly what you would think: it cleans up the offending tables.
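
Judging by how the wrapper below calls it, the procedure takes a single parameter: the number of days of history to retain. So a direct call would look something like this (the database name is an example; substitute your own GUID):

-- Sketch: prune crawl log entries older than 3 days by calling the proc directly
EXEC [Search_Service_Application_CrawlStoreDB_funGUID].dbo.proc_MSS_CrawlReportCleanup 3
GO
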
Our environment is mirrored, so I wrapped the call to the proc in the procedure below, which first checks that the database is the mirroring principal. The check treats an unmirrored database as safe, so it also works on a standalone DB, but you could take it out for good measure if mirroring doesn't apply. Pass the procedure the name of your crawl store database and the number of days you want to retain. It will generate a lot of IO if your table has grown a lot, so we run it off-peak. Also, it's not a bad idea to take a look at the crawl error log and see whether any of the errors can be fixed at the source: errors that don't get logged don't have to be cleaned up.
USE [YOURUTILITYDB]
GO

/****** Object:  StoredProcedure [dbo].[PRUNE_SEARCH_CRAWL_STORE]    Script Date: 11/11/2011 16:29:13 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[PRUNE_SEARCH_CRAWL_STORE]
(
    -- Pass the name of your crawl store database as @CrawlStoreDBName;
    -- the default is Search_Service_Application_CrawlStoreDB_somecrazyguid
    @CrawlStoreDBName nvarchar(1000),

    -- @prunetime sets the number of days of error history to retain
    @prunetime int
)
AS
BEGIN

    declare @cmd nvarchar(2000)

    set @cmd = @CrawlStoreDBName + '.dbo.proc_MSS_CrawlReportCleanup'

    -- For mirrored databases, the proc/SQL Agent job should live on both servers
    -- and execute only on the principal. Check the database's mirroring role
    -- first, because the command will fail if executed against a mirror.
    -- mirroring_role is 1 on the principal and NULL when the database isn't
    -- mirrored at all, so NULL is treated as safe to run.
    if isnull((select m.mirroring_role
               from sys.databases d
               join sys.database_mirroring m on d.database_id = m.database_id
               where d.name = @CrawlStoreDBName), 1) = 1
    begin
        -- If the database is the principal (or not mirrored), run the log prune
        exec @cmd @prunetime
    end

END
GO
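
To use it, pass your crawl store name and the retention days. A sketch of the call, assuming the wrapper lives in your utility DB and using the example GUID name from above; we run something like this from an off-peak SQL Agent job:

-- Keep 3 days of crawl error history; schedule off-peak (lots of IO on big tables)
EXEC [YOURUTILITYDB].dbo.PRUNE_SEARCH_CRAWL_STORE
    @CrawlStoreDBName = N'Search_Service_Application_CrawlStoreDB_funGUID',
    @prunetime = 3
GO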



3 comments:

  1. Hi Caroline,

    it seems to me that what you suggest is actually what is done by the "Crawl Log Cleanup for xxx" job, xxx being your search service application name. You might want to check whether this job is enabled in your installation.

    Best regards
    Niels

  2. I believe this is not supported, since modifying the SharePoint SQL databases beyond properties and permissions is itself unsupported.
