Tuesday, April 8, 2014

The system_health Extended Event Session

When I first started poking around in SQL Server 2012, I noticed an extended event session called “system_health” was created by default. It took me a few months before I really dug into the session details to see what it was capturing. But once I did, I was pretty amazed.

The system_health session starts up by default and collects various data points that can be used to help troubleshoot issues on the server. To look at the details, just right click on the event session and choose Properties. This will open the Extended Events UI. Select the Events page, and from there you will see each of the events that will be captured. You can see more details about each event by clicking the Configure button.

For example, select the error_reported event and then click Configure. This will show you which columns will be captured for the event and whether any predicates are defined.


On the Global Fields tab, you’ll see the actions being performed by the session. These are additional collection items that are captured when the event fires. In this case, the session is getting the callstack, database_id, session_id, sql_text, and the tsql_stack fields.


On the Filter tab, you’ll see each of the predicates that are defined. This event will fire for error numbers 17803, 701, 802, 8645, 8651, 8657, and 8902, as well as for any error with a severity greater than or equal to 20. Most of these error numbers are related to out-of-memory issues, while the severity 20+ errors are all critical system issues.


On the Event Fields tab, you’ll see each of the fields that are automatically returned by the selected event, such as the error_number, severity, and so on. The fields in this list will change based on which event is selected, because each event is designed to capture different information.


Now click on the Data Storage page. From here you will see each of the targets defined for the session. Targets are nothing more than places where the session stores the data it collects. The system_health session has two targets defined: a ring_buffer and an event_file. The ring_buffer target is an in-memory target designed to hold the collected data for only a short period of time. The event_file target is a file located on disk that holds the collected data.
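If you'd rather skip the UI, you can also peek at the ring_buffer target with a quick query against the extended events DMVs. This is just a minimal sketch; the target_data comes back as XML that you can shred further if needed.

SELECT CAST(st.target_data AS XML) AS ring_buffer_data
FROM sys.dm_xe_sessions s
INNER JOIN sys.dm_xe_session_targets st
    ON s.address = st.event_session_address
WHERE s.name = 'system_health'
    AND st.target_name = 'ring_buffer';
GO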


The event_file will be used to retain the collected data for a longer period of time; however, it’s not indefinite. The default file size is only 5MB and the session only keeps 4 rollover files. This means the files could be overwritten within a few short days.

Click on the Advanced page. This section defines some of the global properties of the event session. For example, the event retention mode is set to “Single event loss”. This means if the event buffers fill up before SQL Server can write the events to the targets, it will drop a single event instead of causing significant performance degradation on the server. You can also define the maximum dispatch latency, which forces events to be flushed from memory to the targets at a defined interval. The maximum memory size defines how much memory is used to store the events before flushing them to the targets. Keep in mind that these two properties work in tandem: the events are flushed to the targets when either the max dispatch latency or the max memory size is reached, whichever comes first.


Now that we’ve covered what is being monitored, let’s look at how we can use the collected data.
Let’s say we encounter some data corruption within a database. From the errorlog, we can see the error message, database ID, and the file and page numbers.

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x419b8f57; actual: 0x419bea57). It occurred during a read of page (1:230) in database ID 9 at offset 0x000000001cc000 in file 'D:\MSSQL11.TEST\MSSQL\DATA\CorruptDB.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

To view the captured event session data, we can either right click on the session and click “watch live data” or we can open the event file, which is located in the Log folder under the install directory of the instance. The event_file target will always append _0_ and then a long number to the end of each file name. The _0_ has no significance, but the number following it is the number of milliseconds since January 1, 1600. This is done to help guarantee filename uniqueness. Once you have opened the file, scroll down to the timestamp of the error, and you should see an event called error_reported. Select that event and you’ll see the event details in the lower window.
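As an alternative to opening the file in the UI, you can read it with TSQL. Here's a minimal sketch using sys.fn_xe_file_target_read_file; the wildcard assumes the files are still in the instance's Log folder, so you may need to supply the full path for your environment.

SELECT
    object_name AS event_name
   ,CAST(event_data AS XML) AS event_data_xml
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL);
GO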


One of the big advantages you’ll notice is the additional data being captured, such as the session_id, sql_text, and tsql_stack. These are some of the actions that were defined (above) for the session, so SQL Server collected this additional data at the time of the error. While this example may not require you to know the TSQL that was running, this extra info will be very helpful if you run into one of the out-of-memory errors.

As exciting as the system_health session is, there are a couple of issues I’d like to point out. Anyone with the right permissions can easily alter or delete the system_health session, whether intentionally or by mistake. If that happens, the entire session can be recreated using the u_tables.sql script. The script is located in the Install folder, which is under the default instance folder for SQL Server.


Just open it, scroll down to the bottom of the script, and you will see the CREATE EVENT SESSION statement. Run that section and it will recreate the system_health event session. Reading through the script is also a good way to see what each event will capture. The Microsoft product team did a really good job of documenting this script.
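One thing to watch for: the script defines the session, but the session also needs to be running. Here's a quick check, assuming the session was just recreated, that starts it if it's not already active.

IF NOT EXISTS (SELECT 1 FROM sys.dm_xe_sessions WHERE name = 'system_health')
    ALTER EVENT SESSION [system_health] ON SERVER
    STATE = START;
GO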

The other issue is the settings for the event_file target. The max file size is only 5MB with only 4 rollover files. For the files to retain data for a longer period of time, I would suggest changing the max file size to something like 25MB and keeping 10 rollover files.
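If you want to make that change with TSQL instead of the UI, the target options can't be altered in place; you drop the event_file target and re-add it with the new settings. A sketch of that change (the 25MB/10 file values are just my suggestion from above):

ALTER EVENT SESSION [system_health] ON SERVER
DROP TARGET package0.event_file;
GO

ALTER EVENT SESSION [system_health] ON SERVER
ADD TARGET package0.event_file
    (SET filename = N'system_health.xel'
        ,max_file_size = (25)          -- size in MB
        ,max_rollover_files = (10));
GO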


If you’re looking for more info on extended events, be sure to check out Jonathan Kehayias’s blog series “An XEvent a Day”. It’s packed full of goodies about extended events.

Tuesday, March 4, 2014

How Long is that SQL Command Going to Take?

Have you ever needed to restore a large database while someone is standing over your shoulder asking “How long is that going to take?” If that hasn't happened to you yet, then it’s only a matter of time.

Let’s throw out all the reasons why you need to do the restore and just discuss the technical part. Obviously the easiest way to know how long the restore will take is to use the “WITH STATS” option in the restore database command. But let’s say in the heat of the moment you forgot that little piece of the statement. Now what?

In older versions of SQL Server, there was really no way to tell how long a database restore operation would take to complete. You could make a rough guesstimate that if it took SQL Server one hour to back up the database, then it’s likely the restore would take the same amount of time. But in reality, that’s just a guess, and the person standing over your shoulder probably wants a more accurate estimation.

First introduced in SQL Server 2005, DMVs give us a wealth of information on the internals of SQL Server, and for our dilemma above, we can use sys.dm_exec_requests. This DMV returns one row for each session that is actively executing a command. One of the columns returned by the DMV is percent_complete, which returns the percent complete for the currently executing command.

USE master;
GO
SELECT
   session_id
  ,start_time
  ,status
  ,command
  ,percent_complete
FROM sys.dm_exec_requests
WHERE session_id = 56;
GO


It looks like the database restore is about 33% complete. Now you have a more accurate idea of how long it will take to complete.
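If you want to go one step further, sys.dm_exec_requests also exposes an estimated_completion_time column (in milliseconds). It's only populated for the same limited set of commands as percent_complete, but it lets you hand over an actual finish time instead of just a percentage. A rough sketch, using the same example session 56 from above:

SELECT
   session_id
  ,command
  ,percent_complete
  ,estimated_completion_time / 60000.0 AS est_minutes_remaining
  ,DATEADD(SECOND, estimated_completion_time / 1000, CURRENT_TIMESTAMP) AS est_finish_time
FROM sys.dm_exec_requests
WHERE session_id = 56;
GO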

This may seem like a useless tidbit of information, since you can use the “WITH STATS” option in the restore database command to get the same information, but what happens when the command you're running doesn't have that option? For example, DBCC SHRINKFILE.

On the rare occasion when you need to shrink a database to free up disk space, SQL Server needs to physically move data pages from one part of the file to another. Depending on how much data needs to be moved, this could take a long time. Using the same TSQL statement from above, we can query the sys.dm_exec_requests DMV.
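For example, you could kick off the shrink in one session and watch it from another. This is just a sketch; the file name and target size are hypothetical, and in my experience shrink operations typically show up in the DMV with a command of 'DbccFilesCompact', which is why the filter below matches on the Dbcc prefix.

-- Session 1: shrink the data file (hypothetical file name; target size in MB)
DBCC SHRINKFILE (N'MyDatabase_Data', 1024);
GO

-- Session 2: check on the progress
SELECT session_id, command, percent_complete
FROM sys.dm_exec_requests
WHERE command LIKE 'Dbcc%';
GO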


This is awesome! A percent complete value for every command executing on SQL Server? Not so fast. The percent_complete column in sys.dm_exec_requests only works for a few commands.

From Books Online:
  • ALTER INDEX REORGANIZE
  • AUTO_SHRINK option with ALTER DATABASE
  • BACKUP DATABASE
  • CREATE INDEX
  • DBCC CHECKDB
  • DBCC CHECKFILEGROUP
  • DBCC CHECKTABLE
  • DBCC INDEXDEFRAG
  • DBCC SHRINKDATABASE
  • DBCC SHRINKFILE
  • KILL (Transact-SQL)
  • RESTORE DATABASE
  • UPDATE STATISTICS
What a bummer that it doesn't work for every command. But from a DBA’s point of view, this list comprises quite a few of the administrative commands you’d use on a regular basis. These are commands that you, rather than an end user, would typically run, and being able to relay a “percent complete” value back to them will assure them you are taking good care of their database.

For more details on this DMV, check out Books Online.

Tuesday, February 11, 2014

My Experience Aboard SQL Cruise 2014

Where do I begin? First let me say, WOW what an experience!

How it All Began
When I first heard about SQL Cruise way back in 2012, I thought the idea of hosting training sessions aboard a cruise ship was a swell idea. However, talking my wife into going with me on the cruise or even letting me go on my own was next to impossible. Don’t get me wrong, my wife and I love cruising and we even took a cruise to our destination wedding in Bermuda. But no matter how I argued, my wife would not budge. We have a 2 year old daughter and it’s pretty hard for my wife to leave her for more than five minutes let alone leave her behind for an entire week. So I figured the SQL Cruise idea would have to be shelved for a couple of years until our daughter got a bit older.

Fast forward to November 2013. At this point in time, I was much more plugged into the SQL Server community: reading blogs, attending user groups, and even attending the PASS Summit in Charlotte, NC. From my many readings, I ran across a webinar that was being hosted by MSSQLTips and sponsored by Dell Software. The speaker was Derek Colley and his topic was “Are Your SQL Servers Healthy?”, which covered various aspects of proper SQL Server configuration, setup, and maintenance. But the biggest thing that caught my eye was that Dell Software would be giving away one ticket to SQL Cruise just for attending. I figured I’d never win the raffle, but at least I’d get to hear more about SQL Server. Well, a few days later, I got an email from a marketing representative at Dell Software congratulating me on winning the SQL Cruise raffle. My response to him was “Are you being serious?” As it turned out, he wasn't joking. Now all I had to do was convince my wife to go. It took a few weeks to get all of the logistics in place, but we were finally able to make it happen and accept the award from Dell Software. Just FYI, grandparents make the perfect babysitters for a week-long trip away from your child.

The Cruise
The cruise was aboard the Norwegian Epic, which sailed out of Miami, FL, and made four stops in the Caribbean: St Maarten, St Kitts, US Virgin Islands, and the Bahamas. This was the perfect itinerary because my wife and I had yet to go to any of those locations. For those of you who have never been on a cruise, you should try it at least once, even if it’s not SQL Cruise. That’s all it took for me to convince my wife to go on her first cruise, and that’s all it took for her parents to fall in love with it when they came along for our wedding cruise. Although I will tell you that cruising with a group is a lot more fun, especially when the group is a bunch of SQL nerds.

 

The Training
The training sessions were all scheduled for sea days. This means for the days the ship was sailing between ports, the SQL Cruisers were in class from about 8AM until about 5PM. This allowed everyone to have fun while in port and not miss any of the fun on the islands. I mean really, how could anyone not want to spend time at these types of destinations? The speakers for this cruise were some of the best in the industry. Kevin Kline (b|t), Grant Fritchey (b|t), Stacia Misner (b|t), Andrew Kelly (b|t), and Richard Douglas (b|t) covered various topics such as query optimization, performance monitoring, Power BI, backup strategies, and even Azure. The sessions ranged from about 30 minutes to 2 hours. There was always plenty of interaction between the cruisers and the speakers. This is a complete reversal of how things happened for me at the PASS Summit. At the Summit, I had to compete with several thousand other attendees for face-time with the speakers. It's never easy to ask a speaker a question when there are 50 other people in line ahead of you. On board SQL Cruise, it was never like that. If I had a question for one of them, I could easily just ask them if they would go grab a beer and talk SQL. Trust me, it really was that easy. And it wasn't just the speakers; I was able to do this with the other cruisers as well. I made a concerted effort to hang out with different cruisers each day to get to know them and how they use SQL Server in their environments.

 

 

The Networking and Friendships
I have to say, networking is the number one reason you should attend SQL Cruise. I had met most of the speakers before at other events, but I never had this much time to really get to know them. Aside from the training, there are other events like cocktail parties, group dinners, and office hours that give the cruisers every opportunity to talk about SQL. These extra events are where I got the most benefit. I not only got insight and advice from the trainers, but also from the other cruisers. You might not expect it, but several of the cruisers attending the training are certified masters in SQL Server. It's kind of hard to come up with a question that can't be answered by someone with those types of credentials. I just attended my first PASS Summit a few months ago, and even though there were thousands more people at that event, I found myself enjoying the confines of SQL Cruise better. There were only about 20 cruisers, which made it extremely easy to get to know each one of them. Outside of class, we didn’t talk about SQL Server 24/7. I remember while at one of the beaches, I found myself in an hour-long conversation with a group of cruisers about the video games we used to play 10 years ago. At the beginning of the cruise, I didn't know much about anyone other than their names, but after spending a week with them, I know we'll be friends for a long time. Or if I'm really lucky, maybe I'll get a chance to work with one of them on a future project.

The Sponsors
Nothing like this would be possible without sponsors such as Dell Software, SQLSentry, Red Gate, New Belgium Brewing, and B-Side Consulting. I have to give a big thank you to Dell Software since they were generous enough to give away a ticket. All the sponsors played a big part in the event, and they provide DBAs with top-notch tools for making our jobs a lot easier. Yes, even New Belgium Brewing. They provided some really good wheat beer, Snapshot, for the bon voyage party on the night before departure from Miami.

The Mastermind
Tim Ford (b|t) is the mastermind behind SQL Cruise. I’m not sure when he first came up with the idea of SQL Server training aboard a cruise ship, but it definitely was the work of a pure genius. I must have thanked him a dozen times over the week for creating this event. And I’m pretty sure my wife thanked him just as much. On some days, I think she had more fun than I did, and she’s not even into technology. One other thing I noticed is that Tim doesn't do all of this work by himself. His wife, Amy, seems to play a big part in the planning and coordination.

Final Thoughts
SQL Cruise really is the epitome of the "SQL Family". It has given me the opportunity to build friendships with the trainers and cruisers that I don’t think would have happened any other way. Because of that, its value is well beyond any dollar figure attached to it. I think I would have eventually convinced my wife to go on SQL Cruise, but thanks to Dell Software, I was able to make it happen a bit sooner. Now all I have to do is convince my wife to go again next year. Fingers crossed!


Tuesday, December 17, 2013

Collecting Historical IO File Statistics

In a previous post, Collecting Historical Wait Statistics, I discussed how you can easily collect historical wait stats by using the DMV sys.dm_os_wait_stats. Well today, I'd like to cover the same concept, but this time collect historical IO file stats from the DMV sys.dm_io_virtual_file_stats. However, I wanted to improve on the code to make it even easier to implement.

The data collection process is still implemented the same way. First, we'll need to create a history table to store the data. The data is stored in time slices, with the cumulative values as well as the difference (TimeDiff_ms, NumOfReadsDiff, NumOfWritesDiff, etc.) in those values since the last collection time.

CREATE TABLE dbo.IoVirtualFileStatsHistory(
  SqlServerStartTime DATETIME NOT NULL
 ,CollectionTime DATETIME NOT NULL
 ,TimeDiff_ms BIGINT NOT NULL
 ,DatabaseName NVARCHAR(128) NOT NULL
 ,DatabaseId SMALLINT NOT NULL
 ,FileId SMALLINT NOT NULL
 ,SampleMs INT NOT NULL
 ,SampleMsDiff INT NOT NULL
 ,NumOfReads BIGINT NOT NULL
 ,NumOfReadsDiff BIGINT NOT NULL
 ,NumOfBytesRead BIGINT NOT NULL
 ,NumOfBytesReadDiff BIGINT NOT NULL
 ,IoStallReadMs BIGINT NOT NULL
 ,IoStallReadMsDiff BIGINT NOT NULL
 ,NumOfWrites BIGINT NOT NULL
 ,NumOfWritesDiff BIGINT NOT NULL
 ,NumOfBytesWritten BIGINT NOT NULL
 ,NumOfBytesWrittenDiff BIGINT NOT NULL
 ,IoStallWriteMs BIGINT NOT NULL
 ,IoStallWriteMsDiff BIGINT NOT NULL
 ,IoStall BIGINT NOT NULL
 ,IoStallDiff BIGINT NOT NULL
 ,SizeOnDiskBytes BIGINT NOT NULL
 ,SizeOnDiskBytesDiff BIGINT NOT NULL
 ,FileHandle VARBINARY(8) NOT NULL
 ,CONSTRAINT PK_IoVirtualFileStatsHistory PRIMARY KEY CLUSTERED
  (CollectionTime,DatabaseName,DatabaseId,FileId)
 )WITH (DATA_COMPRESSION = PAGE);
GO

The next step will be to get the start time of SQL Server so we can compare it to the previous collection. If the dates are different, then we must take that into account when calculating the diff values, because a restart resets all values in the DMV back to zero. In that case, we know the diff values are actually the same as the current counter values, because this is the first collection after a restart.

IF @CurrentSqlServerStartTime <> ISNULL(@PreviousSqlServerStartTime,0)
BEGIN
 -- If SQL started since the last collection, then insert starter values
 -- Must do DATEDIFF using seconds instead of milliseconds to avoid arithmetic overflow.
 INSERT INTO dbo.IoVirtualFileStatsHistory
 SELECT
   @CurrentSqlServerStartTime
  ,CURRENT_TIMESTAMP
  ,CONVERT(BIGINT,DATEDIFF(SS,@CurrentSqlServerStartTime,CURRENT_TIMESTAMP))*1000
  ,@DatabaseName
  ,@DatabaseId
  ,file_id
  ,sample_ms
  ,sample_ms
  ,num_of_reads
  ,num_of_reads
  ,num_of_bytes_read
  ,num_of_bytes_read
  ,io_stall_read_ms
  ,io_stall_read_ms
  ,num_of_writes
  ,num_of_writes
  ,num_of_bytes_written
  ,num_of_bytes_written
  ,io_stall_write_ms
  ,io_stall_write_ms
  ,io_stall
  ,io_stall
  ,size_on_disk_bytes
  ,size_on_disk_bytes
  ,file_handle
 FROM sys.dm_io_virtual_file_stats(@DatabaseId,NULL);
END
GO

You may notice the DATEDIFF is using "seconds" instead of "milliseconds". This is because DATEDIFF only returns an INT value, and a difference of more than about 24 days in milliseconds will hit an arithmetic overflow error. By taking the difference in seconds and then converting, we can avoid that error. All of the subsequent data collections will do a DATEDIFF using milliseconds.
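If you want to see the overflow for yourself, here's a quick illustration. Roughly 31 days apart is already more milliseconds than an INT can hold, while the same difference taken in seconds converts cleanly.

-- Fails with an arithmetic overflow error
SELECT DATEDIFF(MS,'2013-01-01','2013-02-01');
GO

-- Works: take the difference in seconds, then scale back up to milliseconds
SELECT CONVERT(BIGINT,DATEDIFF(SS,'2013-01-01','2013-02-01'))*1000;
GO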

If the current start time is the same as the previous collection, then we'll grab the difference in values and insert those into the history table.

WITH CurrentIoVirtualFileStats AS
(
 SELECT
   CURRENT_TIMESTAMP AS 'CollectionTime'
  ,@DatabaseName AS 'DatabaseName'
  ,*
 FROM sys.dm_io_virtual_file_stats(@DatabaseId,NULL)
)
INSERT INTO dbo.IoVirtualFileStatsHistory
SELECT
  @CurrentSqlServerStartTime
 ,CURRENT_TIMESTAMP
 ,CONVERT(BIGINT,DATEDIFF(MS,@PreviousCollectionTime,curr.CollectionTime))
 ,@DatabaseName
 ,@DatabaseId
 ,file_id
 ,sample_ms
 ,curr.sample_ms - hist.SampleMs
 ,num_of_reads
 ,curr.num_of_reads - hist.NumOfReads
 ,num_of_bytes_read
 ,curr.num_of_bytes_read - hist.NumOfBytesRead
 ,io_stall_read_ms
 ,curr.io_stall_read_ms - hist.IoStallReadMs
 ,num_of_writes
 ,curr.num_of_writes - hist.NumOfWrites
 ,num_of_bytes_written
 ,curr.num_of_bytes_written - hist.NumOfBytesWritten
 ,io_stall_write_ms
 ,curr.io_stall_write_ms - hist.IoStallWriteMs
 ,io_stall
 ,curr.io_stall - hist.IoStall
 ,size_on_disk_bytes
 ,curr.size_on_disk_bytes - hist.SizeOnDiskBytes
 ,file_handle
FROM CurrentIoVirtualFileStats curr INNER JOIN dbo.IoVirtualFileStatsHistory hist
 ON (curr.DatabaseName = hist.DatabaseName
  AND curr.database_id = hist.DatabaseId
  AND curr.file_id = hist.FileId)
  AND hist.CollectionTime = @PreviousCollectionTime;
GO

At this point, we're through collecting the raw data. However, as I mentioned earlier, I added a lot of functionality into this script. The script is actually a stored procedure that can run all of this code for you; including creation of the history table, data collection, historical data purging and finally reporting.

The stored procedure has 5 input parameters.

@Database - This is used to specify a single database, a list of databases, or a wildcard.
  • '*' is the default value, which selects all databases
  • 'MyDatabase1' will process only that single database
  • 'MyDatabase1,MyDatabase2,MyDatabase3' will process a comma-delimited list of databases
  • 'USER_DATABASES' will process all user databases
  • 'SYSTEM_DATABASES' will process only the system databases
@GenerateReport - Flag. When off, the stored procedure will collect data. When on, it will generate an aggregated report of the historical data.

@HistoryRetention - This is the number of days of data to keep in the history table. Default is 365.

@ExcludedDBs - A comma-delimited list of databases to exclude from processing. It should be used when @Database is set to '*'.

@Debug - Flag. When on, it will output the TSQL commands that are being executed.

Examples:

1. Collect data for all databases.

EXEC dbo.sp_CollectIoVirtualFileStats
 @Database = '*';
GO

2. Collect data for all databases except for AdventureWorks, and output all debug commands.

EXEC dbo.sp_CollectIoVirtualFileStats
  @Database = '*'
 ,@ExcludedDBs = 'AdventureWorks'
 ,@Debug = 1;
GO

3. Output an aggregated report of data collected so far for tempdb.

EXEC dbo.sp_CollectIoVirtualFileStats
  @Database = 'tempdb'
 ,@GenerateReport = 1;
GO

The report would look like this.


Finally, you can copy this report data into Excel and generate some easy to read charts.


From the chart, we can see there was a spike in write latency between 3PM and 4PM for tempdb. If we collect this data over time and identify a similar spike each day then we'd want to investigate further to find out what is causing it. But that can only be done if you're collecting these metrics and storing them for historical analysis. Hopefully, this stored procedure will help you be more proactive in collecting performance metrics for each of your servers.
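If you'd rather calculate those latency numbers in TSQL before ever touching Excel, here's a rough sketch against the history table created above. It derives an average read and write latency (ms per IO) for each collection interval; the tempdb filter is just for this example.

SELECT
     CollectionTime
    ,FileId
    ,CASE WHEN NumOfReadsDiff = 0 THEN 0
          ELSE IoStallReadMsDiff * 1.0 / NumOfReadsDiff END AS AvgReadLatencyMs
    ,CASE WHEN NumOfWritesDiff = 0 THEN 0
          ELSE IoStallWriteMsDiff * 1.0 / NumOfWritesDiff END AS AvgWriteLatencyMs
FROM dbo.IoVirtualFileStatsHistory
WHERE DatabaseName = 'tempdb'
ORDER BY CollectionTime, FileId;
GO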

The entire script is available on the downloads page.

Tuesday, December 10, 2013

Collecting Historical Wait Statistics

As a DBA, I'm sure you've heard many times to always check the sys.dm_os_wait_stats DMV to help diagnose performance issues on your server. The DMV returns information about specific resources SQL Server had to wait for while processing queries. The counters in the DMV are cumulative since the last time SQL Server was started and the counters can only be reset by a service restart or by using a DBCC command. Since DMVs don't persist their data beyond a service restart, we need to come up with a way to collect this data and be able to run trending reports over time.
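For reference, the DBCC command in question is DBCC SQLPERF. Use it with care, since clearing the counters resets the baseline for anything else that samples this DMV.

-- Resets the cumulative counters in sys.dm_os_wait_stats to zero
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
GO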

Collecting the data seems easy enough: simply select all rows into a permanent table. However, that raw data won't help us determine the time at which a particular wait type occurred. Think about it for a minute. If the raw data for the counters is cumulative, then how can you tell whether a bunch of waits occurred within a span of a few minutes or whether they accumulated slowly over the past 6 months that SQL Server has been running? This is where we need to collect the data in increments.

First, we need to create a history table to store the data. The table will store the wait stat values as well as the difference (TimeDiff_ms, WaitingTasksCountDiff, WaitTimeDiff_ms, SignalWaitTimeDiff_ms) in those values between collection times.

CREATE TABLE dbo.WaitStatsHistory
(
     SqlServerStartTime DATETIME NOT NULL
    ,CollectionTime DATETIME NOT NULL
    ,TimeDiff_ms INT NOT NULL
    ,WaitType NVARCHAR(60) NOT NULL
    ,WaitingTasksCountCumulative BIGINT NOT NULL
    ,WaitingTasksCountDiff INT NOT NULL
    ,WaitTimeCumulative_ms BIGINT NOT NULL
    ,WaitTimeDiff_ms INT NOT NULL
    ,MaxWaitTime_ms BIGINT NOT NULL
    ,SignalWaitTimeCumulative_ms BIGINT NOT NULL
    ,SignalWaitTimeDiff_ms INT NOT NULL
    ,CONSTRAINT PK_WaitStatsHistory PRIMARY KEY CLUSTERED (CollectionTime, WaitType)
)WITH (DATA_COMPRESSION = PAGE);
GO

Next, we need to get a couple of timestamps when we collect each sample. The first will be the SQL Server start time. We need the SQL Server start time, so we can identify when the service was restarted.

SELECT @CurrentSqlServerStartTime = sqlserver_start_time FROM sys.dm_os_sys_info;
GO

The second set is the previous start time and previous collection time, if they exist in the history table.

SELECT
     @PreviousSqlServerStartTime = MAX(SqlServerStartTime)
    ,@PreviousCollectionTime = MAX(CollectionTime)
FROM msdb.dbo.WaitStatsHistory;
GO

The last timestamp is the collection time. We’ll also use this timestamp to calculate the difference in wait stat values between each collection.

SELECT GETDATE() AS 'CollectionTime',* FROM sys.dm_os_wait_stats;
GO

We need to compare the current SQL Server start time to the previous start time from the history table. If they aren't equal, then we assume the server was restarted and insert "starter" values. I call them starter values because we just collect the current wait stat values and insert 0 for each of the diff columns.

IF @CurrentSqlServerStartTime <> ISNULL(@PreviousSqlServerStartTime,0)
BEGIN
    -- Insert starter values if SQL Server has been recently restarted
    INSERT INTO dbo.WaitStatsHistory
    SELECT
         @CurrentSqlServerStartTime
        ,GETDATE()
        ,DATEDIFF(MS,@CurrentSqlServerStartTime,GETDATE())
        ,wait_type
        ,waiting_tasks_count
        ,0
        ,wait_time_ms
        ,0
        ,max_wait_time_ms
        ,signal_wait_time_ms
        ,0
    FROM sys.dm_os_wait_stats;
END
GO

If the timestamps are the same, we will collect the current wait stats and calculate the difference (in milliseconds) in collection time as well as the difference in values.

INSERT msdb.dbo.WaitStatsHistory
SELECT
     @CurrentSqlServerStartTime
    ,cws.CollectionTime
    ,DATEDIFF(MS,@PreviousCollectionTime,cws.CollectionTime)
    ,cws.wait_type
    ,cws.waiting_tasks_count
    ,cws.waiting_tasks_count - hist.WaitingTasksCountCumulative
    ,cws.wait_time_ms
    ,cws.wait_time_ms - hist.WaitTimeCumulative_ms
    ,cws.max_wait_time_ms
    ,cws.signal_wait_time_ms
    ,cws.signal_wait_time_ms - hist.SignalWaitTimeCumulative_ms
FROM CurrentWaitStats cws INNER JOIN msdb.dbo.WaitStatsHistory hist
    ON cws.wait_type = hist.WaitType
    AND hist.CollectionTime = @PreviousCollectionTime;
GO

You could filter the collection to only the specific wait stat counters you want to track with just a WHERE clause, but I prefer to collect them all and then filter at the reporting end.

At this point, we’re ready to schedule the job. The script could be run at any interval, but I usually leave it to collect data once a day. If I notice a spike in a specific wait stat counter, then I could easily increase the job frequency to once every few hours or even once an hour. Having those smaller, more granular data samples will allow us to isolate which time frame we need to concentrate on.
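Here's one way to schedule it as a SQL Agent job. This sketch assumes you've wrapped the collection section of the script in a stored procedure named dbo.sp_CollectWaitStatsHistory; that name is hypothetical, so substitute whatever you actually call it.

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Collect Wait Stats';
EXEC dbo.sp_add_jobstep
     @job_name = N'Collect Wait Stats'
    ,@step_name = N'Collect'
    ,@subsystem = N'TSQL'
    ,@command = N'EXEC msdb.dbo.sp_CollectWaitStatsHistory;';  -- hypothetical proc name
EXEC dbo.sp_add_jobschedule
     @job_name = N'Collect Wait Stats'
    ,@name = N'Daily Collection'
    ,@freq_type = 4            -- daily
    ,@freq_interval = 1        -- every 1 day
    ,@active_start_time = 0;   -- midnight
EXEC dbo.sp_add_jobserver @job_name = N'Collect Wait Stats';
GO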

For example, if we notice the CXPACKET wait suddenly spikes when collecting the data each day, then we could schedule the collection every hour to see if it’s happening during a specific window.

SELECT * FROM msdb.dbo.WaitStatsHistory
WHERE WaitType = 'CXPACKET';
GO
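You can also lean on the diff columns to put a spike in context. A sketch that derives an average wait per task (in ms) for each collection:

SELECT
     CollectionTime
    ,WaitingTasksCountDiff
    ,WaitTimeDiff_ms
    ,SignalWaitTimeDiff_ms
    ,CASE WHEN WaitingTasksCountDiff = 0 THEN 0
          ELSE WaitTimeDiff_ms * 1.0 / WaitingTasksCountDiff END AS AvgWaitPerTaskMs
FROM msdb.dbo.WaitStatsHistory
WHERE WaitType = 'CXPACKET'
ORDER BY CollectionTime;
GO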


Finally, we can use Excel to format this raw data into an easy to read chart.


From this chart, we can see that at 5PM there was a spike in CXPACKET waits, but a low number of tasks that waited. In this case, I would assume there is a single process running in parallel that caused these waits, and from there I could dig further into finding the individual query.

Data compression is enabled on this table to help keep it small. You can easily turn it off by removing WITH (DATA_COMPRESSION = PAGE) from the CREATE TABLE statement. However, with page compression enabled, 24 collections (one per hour) take up only 775KB of space. Without compression, the same sample of data consumes about 2.2MB. If you plan to keep a lot of history, then it's best to leave page compression enabled.
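If you want to check the table size in your own environment, sp_spaceused will do the trick:

USE msdb;
GO
EXEC sp_spaceused 'dbo.WaitStatsHistory';
GO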

Hopefully, this script will help you keep track of your historical wait statistics, so you can have better knowledge of what has happened to your environment over time. The entire script is posted below. If you want to read further into what each wait statistics means, then check out Paul Randal’s article about wait stats. Additionally, if you want more info on using DMVs, check out Glenn Berry’s diagnostic queries.

/***********************************************
    Create the historical table
***********************************************/

USE msdb;
GO

-- Create the history table if it does not exist
IF OBJECT_ID('dbo.WaitStatsHistory') IS NULL
BEGIN
    CREATE TABLE dbo.WaitStatsHistory
    (
         SqlServerStartTime DATETIME NOT NULL
        ,CollectionTime DATETIME NOT NULL
        ,TimeDiff_ms INT NOT NULL
        ,WaitType NVARCHAR(60) NOT NULL
        ,WaitingTasksCountCumulative BIGINT NOT NULL
        ,WaitingTasksCountDiff INT NOT NULL
        ,WaitTimeCumulative_ms BIGINT NOT NULL
        ,WaitTimeDiff_ms INT NOT NULL
        ,MaxWaitTime_ms BIGINT NOT NULL
        ,SignalWaitTimeCumulative_ms BIGINT NOT NULL
        ,SignalWaitTimeDiff_ms INT NOT NULL
        ,CONSTRAINT PK_WaitStatsHistory PRIMARY KEY CLUSTERED (CollectionTime, WaitType)
    )WITH (DATA_COMPRESSION = PAGE);
END
GO

/***********************************************
    Schedule this section as an on-going job
***********************************************/

DECLARE
     @CurrentSqlServerStartTime DATETIME
    ,@PreviousSqlServerStartTime DATETIME
    ,@PreviousCollectionTime DATETIME;

SELECT @CurrentSqlServerStartTime = sqlserver_start_time FROM sys.dm_os_sys_info;

-- Get the last collection time
SELECT
     @PreviousSqlServerStartTime = MAX(SqlServerStartTime)
    ,@PreviousCollectionTime = MAX(CollectionTime)
FROM msdb.dbo.WaitStatsHistory;

IF @CurrentSqlServerStartTime <> ISNULL(@PreviousSqlServerStartTime,0)
BEGIN
    -- Insert starter values if SQL Server has been recently restarted
    INSERT INTO dbo.WaitStatsHistory
    SELECT
         @CurrentSqlServerStartTime
        ,GETDATE()
        ,DATEDIFF(MS,@CurrentSqlServerStartTime,GETDATE())
        ,wait_type
        ,waiting_tasks_count
        ,0
        ,wait_time_ms
        ,0
        ,max_wait_time_ms
        ,signal_wait_time_ms
        ,0
    FROM sys.dm_os_wait_stats;
END
ELSE
BEGIN
    -- Get the current wait stats
    WITH CurrentWaitStats AS
    (
        SELECT GETDATE() AS 'CollectionTime',* FROM sys.dm_os_wait_stats
    )
    -- Insert the diff values into the history table
    INSERT msdb.dbo.WaitStatsHistory
    SELECT
         @CurrentSqlServerStartTime
        ,cws.CollectionTime
        ,DATEDIFF(MS,@PreviousCollectionTime,cws.CollectionTime)
        ,cws.wait_type
        ,cws.waiting_tasks_count
        ,cws.waiting_tasks_count - hist.WaitingTasksCountCumulative
        ,cws.wait_time_ms
        ,cws.wait_time_ms - hist.WaitTimeCumulative_ms
        ,cws.max_wait_time_ms
        ,cws.signal_wait_time_ms
        ,cws.signal_wait_time_ms - hist.SignalWaitTimeCumulative_ms
    FROM CurrentWaitStats cws INNER JOIN dbo.WaitStatsHistory hist
        ON cws.wait_type = hist.WaitType
        AND hist.CollectionTime = @PreviousCollectionTime;
END
GO