Jan 222013
 

During your career as a DBA, you’ll run across articles by SQL experts or other DBAs that just give you an idea.  A few years ago I found an article, or a poster to be more precise, that explains SQL Server performance counters.  It was written by Kevin Kline, Brent Ozar, Christian Bolton, Bob Ward, Rod Colledge, and Raoul Illyaos.

I was thinking this would be a nice poster for any DBA to hang up in their cubical.  But as I read down near the bottom, I saw they also mentioned getting performance counters from within SQL Server via sys.dm_os_performance_counters.  Ah ha!  My light bulb just turned on!
In an earlier post, I had discussed how to get performance counters from within SQL Server, so for this post I want to share a stored procedure that pulls the counters as described in the poster as well a few more.
The procedure is sp_PerformanceCounters, and it will aggregate overall performance data that is available in the sys.dm_os_performance_countersDMV.  The metrics are gathered for various counters and objects and displayed in 7 columns.
  1. PerformanceObject – The counter category.
  2. CounterName – Name of the counter.
  3. InstanceName – The specific instance of the counter; often the database name.
  4. TimeFrame – The timeframe of the counter.
  5. ActualValue – The value of the counter.
  6. IdealValue – A generalized optimal value for the counter.
  7. Description – A brief description of the counter.

The most important thing to understand from the output is to understand the timeframe of the metrics, and how that impacts the actual value.  There are three possible TimeFrames: Current, Total since SQL startup, and Avg since SQL startup.

For the TimeFrames that are “Current”, those are counter type 65792 and are the absolute values that do not need any special definition.  For example, you can select the number of database pages in the buffer pool, and the value returned is the current value.

SELECT * FROM sys.dm_os_performance_counters
WHERE object_name = ‘MSSQL$TEST1:Buffer Manager’
ANDcounter_name = ‘Database Pages’;

The second TimeFrame “Total since SQL startup” is also easy to interpret.  This counter is also of type 65792, and it’s just the accumulated total of a counter since SQL was started up.  For example, you can select the number of log growths from the databases counter which would tell you the total number of log growths for all databases since SQL Server was last started.

SELECT * FROM sys.dm_os_performance_counters
WHERE object_name = ‘MSSQL$TEST1:Databases’
ANDcounter_name = ‘Log Growths’
ORDER BY instance_name;
The last and most common TimeFrame is “Avg since SQL startup”, which is for counters of type 272696576.  The value of these counters get incremented every time that event is fired and are mostly related to “per second” counters.  To figure out the per second value we have to divide by the total uptime (in seconds) of SQL Server.  This will give us the output displayed in the ActualValue column.  This is also one reason why the SQL Server startup time is the first row displayed in the output.  The SQL Server startup time is taken from the sys.dm_os_sys_infoDMV.

SELECT * FROM sys.dm_os_performance_counters

WHERE object_name = ‘MSSQL$TEST1:SQL Statistics’
ANDcounter_name = ‘Batch Requests/sec’;


Most counter names will match exactly what you see in sys.dm_os_performance_counters; however, there will be a few that are calculated as a ratio. One example is the Page Lookup / Batch Request.  This counter will show you the average number page lookups that occurred per batch request.  Both of these individual counters are of type 272696576 which means they are “per second counters”.  However, for this ratio counter we don’t care about the per second value, we just want total of Page Lookups divided by the total of Batch Requests.

DECLARE@TempValue1 DECIMAL(25,5), @TempValue2 DECIMAL(25,5)
SELECT@TempValue1 = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name = ‘MSSQL$TEST1:Buffer Manager’
ANDcounter_name = ‘Page lookups/sec’;
SELECT@TempValue2 = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name = ‘MSSQL$TEST1:SQL Statistics’
ANDcounter_name = ‘Batch Requests/sec’;
— This is to avoid divide by zero.
IF@TempValue2 <> 0
    SELECT@TempValue1/@TempValue2 AS ‘Page lookups/Batch Requests’;
ELSE
    SELECT 0;

All aggregated data is stored in a temporary table #PerformanceCounters and then displayed at the end, and is best viewed using the “Results to Grid” output in SSMS.

EXEC master.dbo.sp_PerformanceCounters;
GO

Most of the ideal values and descriptions were taken from the poster mentioned above.  My disclaimer with that is the real ideal value will always depend on your specific application and setup.  This stored procedure was created to be used an overall health check for a server.  It’s especially useful if another DBA just handed you a new SQL Server to support, and you need a quick way to see a brief history of its performance.
The stored procedure was written for SQL Server 2005 and 2008.  It will work on SQL Server 2012, but there were a lot of changes with the counters so I will have an updated officially supported version for 2012 soon.
Jan 072013
 

Have you seen this type of an informational message in your SQL errorlog before?  If so, then I hope the timestamp associated with it was not during the middle of your peak processing time.  This is a good indication that someone has been tinkering with some of the SQL configuration settings.

SQL Server offers a lot of wonderful configuration options; many of which can be changed dynamically using sp_configure without the need of restarting the SQL Server service.  The downside to the dynamic changes is sometimes they can have a detrimental effect on performance, and that’s what this “informational” message is tell you.  Let’s look at an example.
First, let’s get our starting numbers for the cachestores.

SELECT
      COUNT(*) AS ‘Total Cached Plans’
FROM sys.dm_exec_cached_plans;
GO
SELECT
       name AS ‘Cache Name’
      ,single_pages_kb
      ,multi_pages_kb
      ,entries_count
FROM sys.dm_os_memory_cache_counters
WHERE name IN (‘Object Plans’,‘SQL Plans’,‘Bound Trees’);
GO
As you can see, we have 160 total plans in cache using up the amount of single and multi-page allocations for each entry.
Now let’s say your system administrator came to you today and said he hot-added extra ram to the SQL box because of how much you had been complaining about memory shortages.  You are so excited about this, you login to the server and execute the following query to bump SQL’s maximum memory up to 4GB.

EXEC sys.sp_configure N’max server memory (MB)’,N’4000′;
GO
RECONFIGURE;
GO
Now you look in the SQL errorlog to valid the changes.
You are excited to see this message.
Configuration option ‘max server memory (MB)’ changed from 2000 to 4000. Run the RECONFIGURE statement to install.
And not so excited about these.
SQL Server has encountered 1 occurrence(s) of cachestore flush for the ‘Object Plans’ cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
SQL Server has encountered 1 occurrence(s) of cachestore flush for the ‘SQL Plans’ cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
SQL Server has encountered 1 occurrence(s) of cachestore flush for the ‘Bound Trees’ cachestore (part of plan cache) due to some database maintenance or reconfigure operations.

Let’s run our cachestore query again and look at the results.
Ah proof that we just cleared the entire procedure cache.  Having these cachestores cleared out means you just lost all compiled plans for stored procedures, functions, triggers, views, adhoc plans, prepared plans, etc.  The next time any query runs, its plan will have to be compiled costing extra CPU cycles and potentially degraded performance.
As it turns out, for certain configuration settings when you execute the RECONFIGURE command it essentially does what amounts to DBCC FREEPROCCACHE.

The whole procedure cache is cleared if one of the following server options is changed by the RECONFIGURE statement:

  • cross db ownership chaining
  • index create memory (KB)
  • remote query timeout (s)
  • user options
  • max text repl size (B)
  • cost threshold for parallelism
  • max degree of parallelism
  • min memory per query (KB)
  • query wait (s)
  • min server memory (MB)
  • max server memory (MB)
  • query governor cost limit

This is a very good example of why you should NOT make configuration changes during peak processing times.  You wouldn’t run DBCC FREEPROCCACHE during the middle of your peak processing, so why would you play around with configuration settings during that same time?
Dec 042012
 

I think most DBAs know that you can use the Windows Performance Monitor to capture performance metrics for your SQL Server.  But did you know that you also have access to all the SQL Server performance counters from within SQL Server?


By using using the DMV sys.dm_os_performance_counters, you can view that performance data.  As with all DMVs, there are a few drawbacks.  The data collected is only valid since the last time SQL Server was started, and this DMV does not allow access to any of the Window OS performance counters.

For example, we can query the DMV to find the Page Life Expectancy.
SELECT * FROM sys.dm_os_performance_counters
WHERE  object_name= ‘MSSQL$TEST1:Buffer Manager’
ANDcounter_name = ‘Page life expectancy’;

The DMV returns five columns:
  1. object_name – Category of the counter.
  2. counter_name – Name of the counter.
  3. instance_name – Name of a specific instance of the counter. For example, the database name.
  4. cntr_value – The current value of the counter.
  5. cntr_type – The type of the counter and will vary depending on the Windows architecture.


For this example, the counter value is 3074 which means a data page will stay in the buffer pool for 3074 seconds before being removed.  The cntr_type is very important for each of the counters.  A type of 65792 is PERF_COUNTER_LARGE_ROWCOUNT which means the cntr_value is the absolute meaning and no more calculations are needed.

Now let’s looks at another one that does require extra calculations.
SELECT * FROM sys.dm_os_performance_counters
WHERE  object_name= ‘MSSQL$TEST1:Latches’
ANDcounter_name LIKE ‘Average Latch Wait Time%’;

The Average Latch Wait Time (ms) is type 1073874176 – PERF_AVERAGE_BULK.  This is essentially the same as the total wait time for all latches.  The Average Latch Wait Time Base is type 1073939712 – PERF_LARGE_ROW_BASE.  It’s the number of processes that had to wait for a latch.  In order to find the true “average” wait time, you will need to divide the “Average Latch Wait Time (ms)” by its base value.  In this case, 16296/378 = 43.1ms. 

For more information about all the counter type definitions, check out MSDN.

As you can see, there is wealth of information within this DMV, but you need to understand the types of each counter and don’t just look at the overall value.
Oct 232012
 

Starting with SQL Server 2005, Microsoft introduced Dynamic Management Views to help DBAs see what’s going inside of SQL Server.  One of my favorite is the sys.dm_db_index_usage_stats.  It collects statistics for different types of index operations and the timestamp at which it was last performed.   All the DMVs are great tools; however, they all have the same drawback.  They are memory resident, which means the statistics are reset any time the SQL Server service is restarted, the database is attached/detached, or it’s restored.


The best way to avoid losing this valuable data is to store it in a persistent table.  I typically store data like this within the msdb database, but you can store in any database.  Below we have created a table DBA.IndexUsageStats to hold each snapshot of the data.

USE msdb;
GO
CREATE SCHEMA dba AUTHORIZATIONdbo;
GO
CREATE TABLE dba.IndexUsageStats
  (
     SQLStartTime       DATETIMENULL,
     CaptureTime        DATETIMENULL,
     DatabaseName       NVARCHAR(128) NULL,
     ObjectName         NVARCHAR(128) NULL,
     index_name         NVARCHAR(128) NULL,
     index_type_desc    NVARCHAR(60) NULL,
     index_is_unique    BIT NULL,
     database_id        SMALLINTNULL,
     object_id          INT NULL,
     index_id           INT NULL,
     user_seeks         BIGINTNULL,
     user_scans         BIGINTNULL,
     user_lookups       BIGINT NULL,
     user_updates       BIGINT NULL,
     last_user_seek     DATETIME NULL,
     last_user_scan     DATETIME NULL,
     last_user_lookup   DATETIME NULL,
     last_user_update   DATETIME NULL,
     system_seeks       BIGINT NULL,
     system_scans       BIGINT NULL,
     system_lookups     BIGINT NULL,
     system_updates     BIGINT NULL,
     last_system_seek   DATETIME NULL,
     last_system_scan   DATETIME NULL,
     last_system_lookup DATETIME NULL,
     last_system_update DATETIME NULL
  );
GO

Next we need to create the collector.  This query will pull the current data from the DMV and store it in our table.

DECLARE@CurrentStartTime  AS DATETIME,
        @PreviousStartTime AS DATETIME;
SELECT@CurrentStartTime = sqlserver_start_time
FROM   master.sys.dm_os_sys_info;
USETestDatabase;
INSERT msdb.DBA.IndexUsageStats
SELECT@CurrentStartTime          AS SQLStartTime,
       CURRENT_TIMESTAMP          ASCaptureTime,
       Db_name()                  AS DatabaseName,
       Object_name(ius.object_id) AS ObjectName,
       i.name                     ASIndexName,
       i.type_desc                ASIndexTypeDesc,
       i.is_unique                ASIndexIsUnique,
       ius.*
FROM   sys.dm_db_index_usage_statsius
       INNER JOIN sys.indexes i
               ONius.object_id= i.object_id
                  ANDius.index_id =i.index_id
WHERE  ius.database_id = Db_id()
ORDER  BY ObjectName,
          ius.index_id;
GO
The script can be scheduled with SQL Agent to run on a regular basis.  I prefer a weekly schedule, but any recurring schedule is fine.  If you have more than one user database, then you’ll need to run this script for each one.

In the script we capture a bit more than just the DMV data. For example:

  1. SQLStartTime – The time when the SQL Server service started.
  2. CaptureTime – The time when the script captured a snapshot of the DMV data.
  3. DatabaseName, OjbectName, IndexName, IndexTypeDesc, IndexIsUnique are pretty self-explanatory.

Once you have let this job execute over a few weeks, you can then revisit the data and start seeing how your indexes are being utilized over a longer period of time.  This query will aggregate the index statistics using the captured data.


USE msdb;
SELECT Min(SQLStartTime) AS SQLStartTime,
       Max(CaptureTime)  ASCaptureTime,
       ObjectName,
       IndexName,
       IndexTypeDesc,
       index_id,
       Sum(user_seeks)   ASuser_seeks,
       Sum(user_scans)   ASuser_scans,
       Sum(user_lookups) AS user_lookups,
       Sum(user_updates) AS user_updates
FROM   DBA.IndexUsageStats
WHERE  DatabaseName =‘TestDatabase’
GROUP  BY ObjectName,
          index_name,
          index_type_desc,
          index_id
ORDER  BY ObjectName,
          index_id;
GO

These samples below were taken over a four month period and clearly show how the indexes have been utilized.  It even points out some indexes (IX_ItemCodeIDand IX_PurchasedBy) that may be nothing more than overhead and could be candidates for removal.  These two indexes have not been used for any seeks or scans, but SQL still needs to update them every time DML code is applied to the Purchases table.

As you can see, the more data you have to draw a conclusion about your index usage, the better you can make tuning recommendations for each of them.
Oct 162012
 

You have 5 instances running.  How do you know which one is killing your CPU?


As you can see from Windows Task Manager, you have several instances of SQL installed and one of them is using nearly half of the CPU.  To figure this out, you’ll need to know the PID (Process ID) number.  

To view the PID number in Windows Task Manager, from the menu select View, then Select Columns, and then check the box for PID.  For our example, it’s Process ID 1212 that’s using nearly 50% of our CPU.


So how do you correlate that PID to the SQL instance, so you can start the performance investigation? There’s a couple of places you can find this information.

The first place is to look at the most recent SQL Error Log for each of your instances.  One of the very first entries will be the PID value.
Server process ID is 1212.
The other place is to open the SQL Server Configuration Manager, select SQL Server Services from the left hand side, then right click on an instance and click Properties.  Click the Service tab and you’ll see the PID value for that instance.


Both of these methods give show you which instance is our culprit.  Now you can drill into that instance to find out why it’s killing your CPU.

I prefer to look for this value in the SQL Error Log because I already have Management Studio open ready to investigate the issue.  Keep in mind the PID value will change every time your SQL instance is started.