Tuesday, May 7, 2013

An Alternative to SELECT COUNT(*) for Better Performance

Sometimes rapid code development doesn't always produce the most efficient code.  Take the age old line of code SELECT COUNT(*) FROM MyTable.  Obviously this will give you the row count for a table, but at what cost? Doing any SELECT * from a table will ultimately result in a table or clustered index scan.

USE AdventureWorksDW2012;
SELECT COUNT(*) FROM dbo.FactProductInventory;
GO


Turning on STATISTICS IO on reveals 5753 logical reads just to return the row count of 776286.

Table 'FactProductInventory'. Scan count 1, logical reads 5753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Starting with SQL Server 2005, Microsoft introduced a DMV, sys.dm_db_partition_stats, that provides you with the same information at a fraction of the cost.  It requires a little more coding, but once you turn on STATISTICS IO, you will see the performance benefit.

USE AdventureWorksDW2012;
SELECT
        s.name AS 'SchemaName'
       ,o.name AS 'TableName'
       ,SUM(p.row_count) AS 'RowCount'
FROM sys.dm_db_partition_stats p
       JOIN sys.objects o ON o.object_id = p.object_id
       JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE p.index_id < 2 AND o.type = 'U'
       AND s.name = 'dbo'
       AND o.name = 'FactProductInventory'
GROUP BY s.name,o.name
ORDER BY s.name,o.name;
GO


Since we're querying a DMV, we never touch the base table.  We can see here we only need 16 logical reads to return the same row count of 776286, and the FactProductInventory table is nowhere in our execution plan.

Table 'sysidxstats'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysclsobjs'. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

By using the DMV, we have improved the query performance and reduced the total I/O count by nearly 100%. Another added benefit of using the DMV, is we won't need locks on the base table and therefore will avoid the possibility of blocking other queries hitting that table.

This is just one simple example of how you can easily improve the performance of an application.

Tuesday, April 2, 2013

Merging SQL Server and Softball Just for Fun

With opening day of Major League Baseball season finally here, I thought I’d take the time to cover two of my favorite topics…SQL Server and softball.  Have you ever thought about how you can use SQL Server in conjunction with softball? Ok, so maybe you haven’t, but I have.  I have been managing a slow-pitch softball team, the Sons of Pitches, for the past 5 years.  Yes, I did say “slow-pitch”.  My friends and I have already passed the peak of our physical ability, so we use it as an excuse to get together and have a little fun.

As a DBA, I’m big on keeping track of metrics, so naturally this spills over into my extracurricular activities.  For each softball game, the league requires us to keep score, but in addition to that I like to keep individual player stats.  Why would I want to do this?  For benchmarking, of course!  How will I ever tell if my players are getting better or worse without it?  So to save myself a lot of time each week, I created a SQL Server database to keep track of all the stats.  Each week, all I have to do is enter the stats and then generate a report by running a few TSQL scripts.  Below is a sample of the score book that I keep for each game.  This is the only manual piece.  Everything else is automated by SQL Server.


Let’s start by creating our database and then a Games table to hold all of the data.

CREATE DATABASE Softball;
GO
USE Softball;
GO
CREATE TABLE dbo.Games(
    [Id] INT IDENTITY(1,1) NOT NULL,
    [GameDate] DATE DEFAULT(GETDATE()) NOT NULL,
    [SeasonId] INT NOT NULL,
    [SeasonName] VARCHAR(50) NOT NULL,
    [GameNumber] SMALLINT NOT NULL,
    [BattingOrder] SMALLINT NOT NULL,
    [Roster] SMALLINT NOT NULL,
    [Name] VARCHAR(50) NOT NULL,
    [PA] FLOAT NOT NULL,
    [AB]  AS ([PA]-([SACf]+[BB])),
    [H]  AS ((([1B]+[2B])+[3B])+[HR]),
    [1B] FLOAT NOT NULL,
    [2B] FLOAT NOT NULL,
    [3B] FLOAT NOT NULL,
    [HR] FLOAT NOT NULL,
    [ROE] FLOAT NOT NULL,
    [SACf] FLOAT NOT NULL,
    [BB] FLOAT NOT NULL,
    [K] FLOAT NOT NULL,
    [RBI] FLOAT NOT NULL,
    [RUNS] FLOAT NOT NULL
);
GO
ALTER TABLE dbo.Games
    ADD CONSTRAINT PK_Games PRIMARY KEY CLUSTERED (Id,GameDate);
GO

Let's backup for one minute.  For those of you who are not familiar with baseball (or softball) scoring; here’s a quick break down of what each abbreviation means.  These are the definitions for my softball scoring.  Some baseball statistics are omitted, because they are not relevant in our league.

PA = Plate Appearance: number of times that player appeared in the batter’s box.
AB = At Bat: plate appearances, not including walks or sacrifices.
H = Hit: number of times a batter safely reached a base w/o an error by the defense.
1B = Single: number of times a batter safely reached first base w/o an error by the defense.
2B = Double: number of times a batter safely reached second base w/o an error by the defense.
3B = Triple: number of times a batter safely reached third base w/o an error by the defense.
HR = Home Run: number of times a batter safely reached all four bases w/o an error by the defense.
ROE = Reached on Error: number of times a batter safely reached a base with an error by the defense.
SACf = Sacrifice Fly: Fly ball hit the outfield that was caught for an out, but allowed a base runner to advance.
BB = Base on Balls (aka Walk): number of times a batter did not swing at four pitches outside the strike zone, and was awarded first base by the umpire.
K = Strike Out: number of times a third strike is called or swung at and missed, or hit foul when the batter already had two strikes.
RBI = Run Batted In: number of runners who scored as a result of the batters’ action.
RUNS = Runs Scored: number of times a runner crossed home plate.
BA = Batting Average: Hits (H) divided by At Bats (AB).
OB = On Base Percentage: number of times a batter reached base (H + AB) divided by (AB + BB + SACf).
SLUG = Slugging Average: number of bases achieved (1B+2B*2+3B*3+HR*4) divided by At Bats (AB). One base is for each 1B, two bases for each 2B, three bases for each 3B, and four bases for each HR.
OPS = On Base Percentage Plus Slugging: sum of batter’s OB + SLUG.

The Games table will hold one row for each player and his stats for that game.  All columns will need to be manually entered, except for H and AB.  The hits and at bats are two of the various computed columns.  What we have now is the raw data stored in the database.

SELECT * FROM Games;
GO


Next, need to build some views to add the other calculated fields: BA, OB, SLUG, and OPS, as well as present the data in a more readable format.  Since Hits (H) and At Bats (AB) are already computed columns, we need to use a view in order to use them in another computed column.

CREATE VIEW dbo.PlayerStats
AS
SELECT
     [SeasonId]
    ,[SeasonName]
    ,[GameNumber]
    ,[BattingOrder]
    ,[Roster]
    ,[Name]
    ,[PA]
    ,[AB]
    ,[H]
    ,[1B]
    ,[2B]
    ,[3B]
    ,[HR]
    ,[ROE]
    ,[SACf]
    ,[BB]
    ,[K]
    ,[RBI]
    ,[RUNS]
    ,CONVERT(DECIMAL(5,3),(ISNULL(([H]/NULLIF([AB],0)),0))) AS [BA]
    ,CONVERT(DECIMAL(5,3),(ISNULL((([H]+[BB])/(NULLIF([AB]+[BB]+[SACf],0))),0))) AS [OB]
    ,CONVERT(DECIMAL(5,3),(ISNULL((([1B]+([2B]*2)
        +([3B]*3)+([HR]*4))/NULLIF([AB],0)),0))) AS [SLUG]
    ,CONVERT(DECIMAL(5,3),(ISNULL((([H]+[BB])/NULLIF([AB]+[BB]+[SACf],0)),0)
        +ISNULL((([1B]+([2B]*2)+([3B]*3)+([HR]*4))/NULLIF([AB],0)),0))) AS [OPS]
FROM dbo.Games;
GO

SELECT * FROM PlayerStats;
GO


Next, we can create a view for the season stats.

CREATE VIEW dbo.SeasonStats
AS
SELECT
     [SeasonId]
    ,[SeasonName]
    ,[Roster]
    ,[Name]
    ,COUNT(GameNumber) as [Games]
    ,SUM([PA]) AS [PA]
    ,SUM([AB]) AS [AB]
    ,SUM([H]) AS [H]
    ,SUM([1B]) AS [1B]
    ,SUM([2B]) AS [2B]
    ,SUM([3B]) AS [3B]
    ,SUM([HR]) AS [HR]
    ,SUM([ROE]) AS [ROE]
    ,SUM([SACf]) AS [SACf]
    ,SUM([BB]) AS [BB]
    ,SUM([K]) AS [K]
    ,SUM([RBI]) AS [RBI]
    ,SUM([RUNS]) AS [RUNS]
    ,ISNULL(CONVERT(DECIMAL(5,3),(SUM([H])/NULLIF(SUM([AB]),0))),0) AS [BA]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)))),0) AS [OB]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([1B])+(SUM([2B])*2)
        +(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0))),0) AS [SLUG]
    ,CONVERT(DECIMAL(5,3),ISNULL((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)),0)+ISNULL((SUM([1B])
        +(SUM([2B])*2)+(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0),0)) AS [OPS]
FROM dbo.PlayerStats
GROUP BY [SeasonId],[SeasonName],[Roster],[Name];
GO

This will allow us to view each player’s stats for a given season.

SELECT * FROM SeasonStats
WHERE SeasonId = 12
ORDER BY [OB] DESC, [BA] DESC;
GO


In addition to season stats, we also need a view for career stats.

CREATE VIEW dbo.CareerStats
AS
SELECT
     [Roster]
    ,[Name]
    ,COUNT(GameNumber) as [Games]
    ,SUM([PA]) AS [PA]
    ,SUM([AB]) AS [AB]
    ,SUM([H]) AS [H]
    ,SUM([1B]) AS [1B]
    ,SUM([2B]) AS [2B]
    ,SUM([3B]) AS [3B]
    ,SUM([HR]) AS [HR]
    ,SUM([ROE]) AS [ROE]
    ,SUM([SACf]) AS [SACf]
    ,SUM([BB]) AS [BB]
    ,SUM([K]) AS [K]
    ,SUM([RBI]) AS [RBI]
    ,SUM([RUNS]) AS [RUNS]
    ,ISNULL(CONVERT(DECIMAL(5,3),(SUM([H])/NULLIF(SUM([AB]),0))),0) AS [BA]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)))),0) AS [OB]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([1B])+(SUM([2B])*2)
        +(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0))),0) AS [SLUG]
    ,CONVERT(DECIMAL(5,3),ISNULL((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)),0)+ISNULL((SUM([1B])
        +(SUM([2B])*2)+(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0),0)) AS [OPS]
FROM dbo.PlayerStats
GROUP BY [Roster],[Name];
GO

SELECT * FROM CareerStats
ORDER BY [OB] DESC, [BA] DESC;
GO


Next, we can create a view for the individual player stats.

CREATE VIEW dbo.IndividualStats
AS
SELECT
     CONVERT(VARCHAR,[SeasonId]) AS [SeasonId]
    ,[SeasonName]
    ,[Roster]
    ,[Name]
    ,COUNT(GameNumber) AS [Games]
    ,SUM([PA]) AS [PA]
    ,SUM([AB]) AS [AB]
    ,SUM([H]) AS [H]
    ,SUM([1B]) AS [1B]
    ,SUM([2B]) AS [2B]
    ,SUM([3B]) AS [3B]
    ,SUM([HR]) AS [HR]
    ,SUM([ROE]) AS [ROE]
    ,SUM([SACf]) AS [SACf]
    ,SUM([BB]) AS [BB]
    ,SUM([K]) AS [K]
    ,SUM([RBI]) AS [RBI]
    ,SUM([RUNS]) AS [RUNS]
    ,ISNULL(CONVERT(DECIMAL(5,3),(SUM([H])/NULLIF(SUM([AB]),0))),0) AS [BA]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)))),0) AS [OB]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([1B])+(SUM([2B])*2)
        +(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0))),0) AS [SLUG]
    ,CONVERT(DECIMAL(5,3),ISNULL((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)),0)+ISNULL((SUM([1B])
        +(SUM([2B])*2)+(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0),0)) AS [OPS]
FROM dbo.PlayerStats
GROUP BY [SeasonId],[SeasonName],[Roster],[Name];
GO

This will allow us to view the stats from an individual player as well as union all of the careers stats.

SELECT * FROM IndividualStats
WHERE Roster = 4
UNION ALL
SELECT 'CAREER STATS','',* FROM CareerStats
WHERE Roster = 4;
GO


Finally, we can create one last view for the individual player stats by season.

CREATE VIEW dbo.IndividualTeamStats
AS
SELECT
     CONVERT(varchar,[SeasonId]) AS [SeasonId]
    ,[SeasonName]
    ,[Roster]
    ,[Name]
    ,CONVERT(varchar,[GameNumber]) AS [Games]
    ,SUM([PA]) AS [PA]
    ,SUM([AB]) AS [AB]
    ,SUM([H]) AS [H]
    ,SUM([1B]) AS [1B]
    ,SUM([2B]) AS [2B]
    ,SUM([3B]) AS [3B]
    ,SUM([HR]) AS [HR]
    ,SUM([ROE]) AS [ROE]
    ,SUM([SACf]) AS [SACf]
    ,SUM([BB]) AS [BB]
    ,SUM([K]) AS [K]
    ,SUM([RBI]) AS [RBI]
    ,SUM([RUNS]) AS [RUNS]
    ,ISNULL(CONVERT(DECIMAL(5,3),(SUM([H])/NULLIF(SUM([AB]),0))),0) AS [BA]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)))),0) AS [OB]
    ,ISNULL(CONVERT(DECIMAL(5,3),((SUM([1B])+(SUM([2B])*2)
        +(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0))),0) AS [SLUG]
    ,CONVERT(DECIMAL(5,3),ISNULL((SUM([H])+SUM([BB]))
        /(NULLIF(SUM([AB])+SUM([BB])+SUM([SACf]),0)),0)+ISNULL((SUM([1B])
        +(SUM([2B])*2)+(SUM([3B])*3)+(SUM([HR])*4))/NULLIF(SUM([AB]),0),0)) AS [OPS]
FROM dbo.Games
GROUP BY [SeasonId],[SeasonName],[Name],[Roster],[GameNumber]
UNION
SELECT
     'SEASON STATS'
    ,[SeasonName]
    ,[Roster]
    ,[Name]
    ,''
    ,[PA]
    ,[AB]
    ,[H]
    ,[1B]
    ,[2B]
    ,[3B]
    ,[HR]
    ,[ROE]
    ,[SACf]
    ,[BB]
    ,[K]
    ,[RBI]
    ,[RUNS]
    ,[BA]
    ,[OB]
    ,[SLUG]
    ,[OPS]
FROM dbo.SeasonStats;
GO

This allows us to get a line by line view of what each player did in each game throughout the season as well as have his aggregated season stats.

SELECT * FROM IndividualStatsTeam
WHERE SeasonName = 'Spring 2012'
ORDER BY Roster;
GO


As you can see, two of the things I'm passionate about, SQL Server and softball, work very well together.  Although this was done as a hobby, I use it every single week.  What kind of hobbies can you use SQL Server for?

And yes, these are the actual stats of my team.  We play just about every Tuesday night during the spring, summer, and fall for the SportsLink league in Charlotte, NC.  We’re not the best team, but we’re also nowhere near the worst, and we’ve managed to win the league championship twice.  If you happen to be in Charlotte for the SQLPass Summit in October, maybe you can find some time of come watch my team play that Tuesday night.  I can't guarantee that it will be good or bad, but it’s definitely entertaining.