Often when I am requested to look at an issue with SQL Server I have to spend some time in the error log.
On a side note, I think the log itself is a commonly overlooked tool for troubleshooting. Often it can be a bit cumbersome to work with if the default settings are used.
When SQL Server is installed, the error log is configured to start a new log when the service is restarted and only a handful of logs are retained long term. In some cases where troubleshooting has lead people to restart the service, or server a number of times valuable information can be lost simply because of the number of error logs that are retained.
On the other side of the spectrum if the server has been on line for quite some time, the error log can grow to extremely large sizes. Depending on a few settings that may have been made elsewhere many of these log entries are not really much assistance in working though issues. Imagine having an error log with half a million rows, and even though these half a million rows are sorted by date time, you have to scan the log to see when problems started. While you are considering how you may filter the log, or copy the log to a format that is easier for you to manage and review. Don’t forget that a log that size will need to be copied or loaded and sometimes both. Many times this has to be done while there is an issue of some sort, that very well may be causing production issues, and time is critical (nothing like having management wanting a problem solved and you have to wait for a huge error log file to load). The good news is there are easier ways to do this. Here are a few tips I like to stick to with my error logs.
- Cycle your error log – Large error logs can be avoided by taking a couple steps, one of these can be simply cycling the error log. When executed sp_cycle_errorlog will simply start a new log, I like to schedule this to happen on a regular basis for example every Sunday night. On servers that have a lot error log entries, because of security logging or frequent transaction log backups then I may even consider cycling the error log more often.
- Change the retention – By default you are going to have 7 error logs, if you are considering cycling your error log, (I believe you should) I would recommend also changing the retention time of your error log, often I will keep 60 days’ worth of logs, but your requirements may be different. The easiest way to do this is to Right click on the Error Logs folder in Management Studio, select configure and increase the Max number of error logs before they are recycled.
- Table Filtered Logs – One thing I like to do, purely for troubleshooting on super busy systems is to use the stored procedure xp_readerrorlog to insert the rows of data into a table. When I do this, I can then filter out the items that I don’t need to see. This is also an easy way to compile logs from multiple servers into one table, and used in combination with a mail profile send your self the results. This will keep you from having to review many error logs on many servers. Often what I do is use the stored procedure to insert into a temp table, and since this is a filtered error log I will remove known entries from that table that I am aware of, items like transaction log backups successful, log on attempts (I often count those before I remove them to make sure the failed ones are not indicating another issue) and other repeated entries. Once I filter the temp table of everything I am aware of, I then insert the log into a permanent table for safekeeping. The permanent table then is managed to a retention time as well. Remember this table is filtered view, I use it in place of the error log for quick access so I can filter it as I would any other table, but it does not replace having the detailed error log. It just gives me more filtering options (Like joining to job history when a job keeps failing).
The end result is that for my use when the log grows to a size like that, it is frustrating to use the log to find the information that is critical. Managing your error log is important and the end result is that it will make your systems easier to manage and troubleshoot.