Tuesday, March 10, 2015

SQL Nexus Might Just Save Your Bacon

The year is 2015, and I’m still surprised by how many people have never heard of SQL Nexus, even though it has been available since approximately 2008. SQL Nexus is a tool designed to collect and analyze data to help you troubleshoot SQL Server performance issues. There are plenty of tools on the market that can do this, but this one is free and can be used to monitor SQL Server 2005 through SQL Server 2014. The tool was originally created by the SQL Server product support team, and it is still used today as the primary tool for collecting and analyzing performance issues.

SQL Nexus does not require you to install anything on the server that you intend to monitor, but you will need to copy some data collector files to it. The collector files just run SQLDIAG and some other TSQL scripts to collect data from your monitored server. Below is a list of some of the data that is collected.
  1. Windows Performance Monitor counters
  2. Windows event logs (system, application, and security logs)
  3. Windows configuration settings
  4. SQL Server trace events
  5. SQL Server wait stats
  6. SQL Server blocking info
  7. SQL Server configuration settings
  8. SQL Server dump files
Once the data is collected from the monitored server, you copy all the data files back to another machine for analysis within SQL Nexus.
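
Much of that data comes from TSQL scripts that simply poll the dynamic management views on a schedule. As a rough illustration only (this is not the actual PerfStats collector code, just a simplified sketch of the kind of blocking snapshot it gathers), a blocking query would look something like this:

-- Simplified sketch of a blocking snapshot; the real PerfStats scripts
-- collect far more detail than this.
SELECT
     r.session_id
    ,r.blocking_session_id
    ,r.wait_type
    ,r.wait_time
    ,r.status
    ,t.text AS batch_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0;
GO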

Installation & Setup
There are a few prerequisites for setting up SQL Nexus. First, make sure you have the .NET Framework 4.0 installed. Next, SQL Nexus requires an instance of SQL Server to store and analyze the imported data. Ideally, you would put it on a TEST server, but you can also use SQL Server Express Edition running on your workstation. If the workstation that will do the analysis is running SQL Server 2008 or SQL Server 2008 R2, then download and install the Microsoft Report Viewer 2010 control. If your workstation is running SQL Server 2012 or higher, then you can skip this step. Regardless of the version, download and install the Report Viewer hotfix. The hotfix addresses some bugs that would prevent some of the SQL Nexus reports from working properly.


Next, download and install the RML Utilities.


Now that you have all the prerequisites installed, download the latest version of SQL Nexus from the CodePlex website. Just uncompress the Release4.0.0.64.ZIP file and place it on your desktop. Within the Release4.0.0.64 folder, you should see a subfolder called DataCollectors and another compressed file called SqlNexus4.0.0.64.ZIP. Uncompress that zip file and place it on your workstation's C drive. The SqlNexus4.0.0.64 folder will contain the SQL Nexus executable and all the other files needed to analyze the collected data. From there, you can create a shortcut on your desktop or your start menu for sqlnexus.exe.


The DataCollectors subfolder contains the data collector files. There is one ZIP file for each version of SQL Server. In our example, we want to monitor a SQL Server 2012 instance, so we need to uncompress the PerfStatsScript2012.zip file. Once uncompressed, you will see 3 CMD files and their accompanying XML configuration files.
  1. StartSQLDiagTrace2012.cmd & SQLDiagPerfStats_Trace2012.XML
  2. StartSQLDiagDetailed_Trace2012.cmd & SQLDiagPerfStats_Detailed_Trace2012.XML
  3. StartSQLDiagForReplay2012.cmd & SQLDiagReplay2012.XML
StartSQLDiagTrace2012.cmd will likely be the batch file you run most often. By default, it will collect the most common Performance Monitor counters and run a SQL trace to capture batch-level commands. The StartSQLDiagDetailed_Trace2012.cmd batch file will capture the same counters, but it will also capture statement-level commands and the Showplan Statistics Profile. The StartSQLDiagForReplay2012.cmd batch file captures the trace events needed to replay the workload against another server.

Collecting Data
To monitor our server, copy the PerfStatsScript2012 subfolder to the SQL Server. This folder cannot be on a network drive; it must be local to that server. It’s best if you do not place it on the C drive or on a drive that hosts your transaction log (LDF) files. The data collector can quickly capture a lot of data, so you want to avoid filling up the C drive or impacting the write performance of your transaction log files. Open a command prompt to that folder location and run StartSQLDiagTrace2012.cmd. SQLDiag will start up the collectors based on what’s in the XML configuration file. Once you see the green line that says “SQLDIAG Collection started”, then you can try to reproduce your performance problem on the server.


Once you are done, just press Ctrl+C to shut down the data collector. Sometimes it will take several minutes for the collector to fully shut down, so be patient and do not press Ctrl+C more than once.

You can leave the data collector running for a period of time, but please keep in mind that it can collect a lot of data and the SQL trace can potentially impact your performance. It’s usually best to start it, reproduce your performance issue, and then shut down the data collector.

After you stop the data collector, you will see a subfolder called SQLDiagOutput. Copy that entire folder back to your workstation to begin analyzing the data.

Analysis of the Data
Run sqlnexus.exe using the shortcut you placed on your desktop. The first step is to specify which SQL Server will be used to perform the analysis. Enter the name of your SQL Server and click Connect.


The tool will automatically create a sqlnexus database on that SQL Server.


Now we need to import the data we collected. In the bottom left-hand pane, click Import.


Choose the location of the SQLDiagOutput folder and then click the green Import button. If the collector captured data for more than one instance on the monitored server, then you will get a popup window asking which instance you want to analyze. Choose your instance and click OK.


Depending on how much data was collected, the import process could take a while, maybe even several hours. Be patient and just give it time to complete. While it’s running, the Import button will change to a red Stop button. Once it’s complete, it will turn back into a green Import button.


Now just close the Data Import window. In the SQL Nexus window, click SQL Perf Main in the upper left-hand pane. This is the main page from which we will drill into the various reports.


Each report name that appears in blue is a hyperlink to that report. Report names that are in black did not have any data to analyze. Most reports have a summary of what they display, but some, like the Bottleneck Analysis report, do not. That report shows CPU utilization by SQL Server (yellow) and everything else (purple). If CPU utilization is higher for purple, then you can assume the CPU is being used by a process other than SQL Server.


If SQL Server is the higher consumer of CPU (yellow), then look further down on that page to see the wait stats analysis. These are the waits that accumulated while the data collector was running, and they will help guide you toward what SQL Server was waiting on while executing the queries. You can click on each wait type to see more details about it. If you need the definition of a wait type, then look at Books Online.
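
If you want to sanity-check that report against the server itself, you can also query the wait stats directly. Keep in mind this is just a quick sketch: sys.dm_os_wait_stats is cumulative since the last restart, whereas SQL Nexus reports the delta captured while the collector was running.

-- Cumulative waits since the last restart; SQL Nexus shows only the delta
-- captured during the collection window. The excluded wait types are just
-- a few common benign ones, not a complete list.
SELECT TOP (10)
     wait_type
    ,waiting_tasks_count
    ,wait_time_ms
    ,signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;
GO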

To go back to the previous page in a report, just click the blue left arrow in the toolbar.


The Perfmon Summary report shows an aggregation of the counters collected from Performance Monitor. They are broken out by CPU, IO, Memory, and Network.


You can click on each blue link to get the graphical view. For example, in the IO report details you can see the Avg Disk Sec/Read and the Avg Disk Sec/Write which will show you how fast your IO subsystem is responding to requests.
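
As a cross-check against those Perfmon numbers, you can also pull per-file latency straight from the server. This is only a rough sketch; like the wait stats, sys.dm_io_virtual_file_stats is cumulative since the last restart rather than scoped to the collection window.

-- Average read/write latency per database file, cumulative since restart.
SELECT
     DB_NAME(vfs.database_id) AS database_name
    ,mf.physical_name
    ,1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms
    ,1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
INNER JOIN sys.master_files mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
ORDER BY avg_read_ms DESC;
GO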


The ReadTrace Report is really my favorite group of reports. The main report gives you a graphical view of the overall utilization of resources.


From here, you can drill into the data from various points, such as application name, database id, or login name. The Unique Batches report will show you the top TSQL batches by CPU, reads, writes, and duration. Each graph is inverted, so the one on the bottom is the largest consumer of that resource.


On the Y axis of each graph you will see a number. That number corresponds to the TSQL batch command that is listed further down on the page. For example, Query #1 was the largest consumer of reads. From that table, I can see that it was a stored procedure named “SLEV”.
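
The data behind these reports lives in tables that the RML utilities create inside the sqlnexus database, so you can also query it directly. The sketch below assumes the ReadTrace schema and the tblBatches/tblUniqueBatches table names that my version of RML created; check the table and column names in your own sqlnexus database before relying on it.

-- Top normalized batches by total reads, pulled straight from the imported data.
-- Assumes the ReadTrace.* tables created by the RML utilities; verify the
-- table and column names in your sqlnexus database first.
SELECT TOP (10)
     ub.NormText
    ,SUM(b.Reads)    AS total_reads
    ,SUM(b.CPU)      AS total_cpu
    ,SUM(b.Duration) AS total_duration
FROM ReadTrace.tblBatches b
INNER JOIN ReadTrace.tblUniqueBatches ub
    ON b.HashID = ub.HashID
GROUP BY ub.NormText
ORDER BY total_reads DESC;
GO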


You can click on each batch to get even more detail about it in a graphical view over time.


Below the chart, you can click the + button next to “View Chart Details” to get the detailed info about that TSQL batch. This will display the number of executions, CPU, reads, and writes, broken out by interval.


There are many more reports, so be sure to click through each one to view the information they provide.

Customize the Data Collector
By default, the data collectors will capture only a subset of perfmon counters and SQL trace events, and they will do that for ALL instances on the server being monitored. If you need to add or remove certain counters or monitor only a specific SQL Server instance, then you will need to edit the SQLDiagPerfStats_Trace2012.XML configuration file. You can open the file in Notepad, but opening it in SQL Server Management Studio color codes it and makes everything easier to read.

The top section of the XML file is for the perfmon counters, which takes up about 75% of the file. The bottom section is for the SQL Server trace events. All of the possible options are listed for every counter and event; however, the majority are disabled. To add additional counters or events, just search for the name and set the “enabled” value to “true”. For example, if you wanted to add the TSQL:StmtStarting and TSQL:StmtCompleted events, then just scroll down to about line 2775, change the enabled values to “true”, and save the XML file. Now when you run the collector, it will also collect those two trace events.
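
You cannot edit the XML with TSQL, of course, but once the collector is running you can confirm from the server side which trace events it is actually capturing. A quick check using the built-in trace catalog views might look like this (the filter on is_default just hides the default trace):

-- List the events captured by any non-default server-side trace,
-- which should include the trace started by SQLDiag.
SELECT
     t.id    AS trace_id
    ,t.path
    ,te.name AS event_name
FROM sys.traces t
CROSS APPLY sys.fn_trace_geteventinfo(t.id) ei
INNER JOIN sys.trace_events te
    ON ei.eventid = te.trace_event_id
WHERE t.is_default = 0
GROUP BY t.id, t.path, te.name
ORDER BY t.id, te.name;
GO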

If you need to set the name of the server, then at about line 7 you will see Machine name=".". Change the "." to the actual server name. If the server being monitored is a Failover Cluster Instance, then you must enter the SQL Server Virtual Server Name.


If you need to set the SQL Server instance name, then at about line 2564 you will see Instance name="*". Change the "*" to the instance name. Only type the instance name here, not ServerName\InstanceName. If the instance you want to monitor is the default instance, then enter "MSSQLSERVER".


Also on the same line, the default authentication used to connect to that instance of SQL Server is Windows Authentication. If you need to use SQL Server Authentication, then set windowsauth to “false” and user to the SQL Server login name. When you run the StartSQLDiagTrace2012.cmd batch file, it will prompt you to enter the password for that account.

Other Hints and Tricks
Before collecting the data, you need to make sure the Windows account is a member of the local Administrators group and a member of the sysadmin server role on the server you want to monitor. If you use a SQL Server login to collect the SQL trace events, then it also has to be a member of the sysadmin server role.
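
Granting the SQL Server side of those permissions is just standard sysadmin membership. For example, on SQL Server 2012 and later (the login name below is only a placeholder):

-- Grant sysadmin to the account that will run the data collector.
-- Replace DOMAIN\CollectorAccount with your own Windows or SQL Server login.
ALTER SERVER ROLE sysadmin ADD MEMBER [DOMAIN\CollectorAccount];
GO

-- On SQL Server 2005/2008/2008 R2, use the older procedure instead:
-- EXEC sp_addsrvrolemember N'DOMAIN\CollectorAccount', N'sysadmin';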

Once the data collector is started on the monitored server, don’t log off. That will shut down the collectors, and you will have to restart them.

If you cannot stay logged into the server, then you’ll need to schedule the collection by modifying the StartSQLDiagTrace2012.cmd batch file using the /B and /E parameters for SQLDIAG or register it as a Windows service. Books Online has more information about those options.

When you import data into SQL Nexus, it populates the tables in the sqlnexus database. If you already have data in the sqlnexus database, you’ll get the following error when attempting to import new data.


At this point, you will need to do one of two things: overwrite the existing data or create another sqlnexus database. To overwrite the existing database, click the Import link and then click Options. In the dropdown menu, just select “Drop Current DB Before Importing”. This will drop and recreate the sqlnexus database.


Keep in mind, sqlnexus is just another database on a SQL Server. If you want to keep that data for later analysis, you CAN back up that database before overwriting it, or you can create a new database. Before clicking the Import link, click the database dropdown list on the toolbar. From there, you can select “<New Database>”. This just creates another database on your local SQL Server, which will be the target of the imported data.
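
For example, a plain full backup of the sqlnexus database before you overwrite it is all it takes. The path below is only a placeholder, so point it at a folder that exists on your analysis server.

-- Preserve the previously imported data before overwriting the database.
BACKUP DATABASE sqlnexus
TO DISK = N'C:\Backups\sqlnexus_PriorImport.bak'
WITH INIT, STATS = 10;
GO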


Creating a separate database is the way to go if you need to analyze data from multiple servers, because you can use the database dropdown list to switch to a different sqlnexus database and view its reports.

Report Parameters is another feature that people tend to overlook. When viewing a report, such as the IO Metrics, click the Parameters button on the toolbar. In the popup window, you can change the start and end of the timeframe that you want to view. This is extremely helpful if your data spans hours or even days.


Finally, you can export the reports to an Excel file by clicking the Save button on the toolbar.


For example, if your analysis turns up an offending query, you can export the batch details report to an Excel file and share it with the development team or other DBA team members.


Conclusion
As you can see, SQL Nexus can be a very powerful tool to help troubleshoot performance issues within SQL Server. This is why the tool is widely used internally by various SQL Server support teams. The best way to get comfortable with it is to practice collecting and analyzing data from your servers. However, be sure that you practice on a TEST server and not in your production environment. Soon you’ll be able to quickly diagnose and resolve performance issues in your environment. And one day, it might just save your bacon!

Tuesday, February 24, 2015

SQL Cruise Caribbean 2015 Recap


UPDATED: Don't just take my word for it; read the feedback from others.



I wish I could just re-post my article from last year about SQL Cruise and change the dates and pictures. The event is just that good.

I still believe this is one of the premier training events offered for SQL Server. The amount of 1-on-1 time you get with the technical leads is absurd. You can get classroom time at any training conference, but what happens when the session is over? Most instructors head off to their next engagement and you never get a chance to speak with them. On SQL Cruise, that interaction continues well beyond the classroom session, beyond the next session, beyond the day, and even well beyond the week.

Last year I met Andrew Kelly (b|t) and Argenis Fernandez (b|t) aboard SQL Cruise. Andrew was one of the technical leads and Argenis was an attendee. To this day I correspond frequently with both of them about SQL Server as well as other stuff. Andrew has become a great mentor to me and has helped me develop my presentation skills. Argenis was instrumental in helping me decide whether joining Microsoft was the right fit for my career goals. I will say that both of those individuals have had a lasting impact on my life. How many times has that happened to you at the PASS Summit? I’m guessing not too many.

This year I got to spend time with a few of the same technical leads from 2014, Grant Fritchey (b|t) and Kevin Kline (b|t), but I also got to meet and learn a lot from the new ones, Jes Borland (b|t), David Klee (b|t), and Jeff Lehmann (w). This year’s topics covered things such as query performance tuning, infrastructure optimization, virtualization, data recovery, PowerShell, and even cloud solutions. Every one of the sessions provided me with valuable information that I can use immediately upon return to work.


But I also got a chance to meet and hang out with a lot of the attendees. As it turns out, they are just like me. They are passionate about SQL Server, they want to learn as much as possible, and they want to succeed in their careers. That’s the cool thing about SQL Cruise; the attendees are dedicated to SQL Server and want to learn. This is not a vacation for them; it’s an educational experience above all else. The other great thing is that the attendees have a wide range of backgrounds, and of course issues to resolve. That’s a great combination for learning new tricks from each other.

 

Now of course the cruise and exotic islands are quite a lot of fun, but they are not a distraction from class. They actually help everyone relax and become more comfortable around each other, which leads to more open discussions in and out of the classroom. There really is no stress when this is the backdrop of a conversation.


Once again, a great conference always has great sponsors. This year the sponsors were Amazon Web Services, Red-Gate, SQL Sentry, and B-Side Consulting. All of these companies use SQL Server and make products that enhance it. Being a sponsor of the conference shows their commitment to the product and the community. Of course, a great conference always has great organizers, so I have to thank Tim Ford (b|t) and his wife Amy for putting together another great conference. Their hard work and dedication has been a huge help to me and, I know, to a lot of others.

In the group picture below, almost half of the people are referred to as the "Plus Ones". They are the family members and/or friends of the attendees and technical leads. SQL Cruise encourages the attendees to bring their family members, so they can enjoy the adventures on the ship as well as on the islands. The Plus Ones are invited to every group event and group dinner, and they are even allowed to sit in on the classroom sessions. My wife absolutely loves being a Plus One. She has a bubbly personality and can quickly make friends, which turns her week of vacation into quite the adventure. So much so, that she has already made plans for SQL Cruise 2016 without even asking me if I plan to sign up.

This was my second SQL Cruise, and it sure won’t be the last. Now it’s your chance to become part of this elite group within the SQL Family. SQL Cruise 2016 will be setting sail in January 2016, so head on over to the site to learn more or sign up. If you need help convincing the boss, just let me know and I’ll put in a good word for you. And if you need help convincing your own Plus One, let me know, because my wife would be happy to explain all the advantages of attending.

Tuesday, February 3, 2015

Get Out There and Challenge Yourself

It’s been almost five months since I posted my last article, and so much has changed since then. I have a new job, a new house, a new address, and of course a new mortgage.

I had been working as a DBA for Wells Fargo Securities for nearly 15 years when I decided that I could no longer grow in that role. I had a lot of good years at the bank, and I will definitely miss all of my friends. But in order for me to be happy, I really needed a change. My decision to move was more about needing to break loose from the daily routine of checking error logs, backup jobs, and dashboards. I really needed a job that challenged me, so that I could gain even more knowledge about SQL Server. There was only one place I felt I could get that type of challenge in a job. So I decided to apply at Microsoft. I figured since I loved SQL Server so much, there was no better place to learn more about it than from the company that developed it.

After talking to a number of friends who already worked at Microsoft, I decided to focus on the Premier Field Engineer positions for SQL Server. A Premier Field Engineer, or PFE for short, is focused on directly supporting a customer, whether that’s over the phone or onsite with them. The type of tasks performed by a PFE can vary greatly. It could be simple performance troubleshooting, delivering training, or something more in-depth such as debugging minidumps. The thought of not knowing what problems lay ahead sounded like so much fun.

The entire process took about six months, from the day I first submitted my resume to the day I was offered a job. This may sound strange, but I really enjoyed the interview questions, because I felt like they helped validate my knowledge of SQL Server.

I have already been working onsite with customers, and so far, I have no plans to go back to a normal DBA job. I’m glad that I had the courage to leave such a good job and to challenge myself in my career goals. Most of all, I’m glad that I have a wife who is very supportive of such a gamble.

For those of you out there who might feel stuck in a boring DBA job, find a way to take a gamble. You don’t have to leave a great company, but you might be able to find a way to make your job more interesting. Start simple: learn more about SQL Server and then teach your coworkers. That’s how it all started for me. I started reading more and more about SQL Server, then I tried to pass that knowledge on to the junior DBAs on my team. From there, I got into public speaking at user groups and SQL Saturdays. I started small, and with every win I kept challenging myself to do more. So get out there and start challenging yourself, just as I did.

Monday, February 2, 2015

PowerPoint Slides Available for Download

I promised that I would post the slide decks for my presentations, and now I have finally followed through on that promise. I have added a new Resources page that will have downloadable content available from blog articles and presentations.

Thursday, August 21, 2014

Monitor the Number of Deleted Rows in a Clustered Columnstore Index

In some of my previous posts, I have talked about how to create Columnstore indexes. Now I’d like to discuss one maintenance detail that you need to keep an eye on. I’m talking specifically about the number of “deleted rows” in a clustered Columnstore index.

One of the great benefits of clustered Columnstore indexes in SQL Server 2014 is they are writeable, which means you can insert, update and delete data. This is all well and good, but I think there really should be an asterisk beside update* and delete* with a disclaimer that says something like “Deletes aren’t really deletes and updates aren’t really updates”.

While on the surface, SQL Server appears to be processing the deletes and updates normally, under the hood it’s really just modifying the existing rows and/or adding new ones.

First, let’s go over a quick summary of the anatomy of a clustered Columnstore index. The data in a clustered Columnstore index is stored in two parts: a columnstore section and a rowstore section, which is referred to as the Deltastore.


What SQL Server does under the covers is create a Columnstore index for the data that already resides in the table and mark it as read-only. The Deltastore is used to handle all of the inserts and updates that are made to the clustered Columnstore index. The Delete Bitmap is used to mark rows as deleted.

So if you’re keeping score at home, this is how it works:

For INSERTS:

  • Inserts are initially added to the Deltastore section.
  • A background process (the tuple mover) eventually compresses those rows and adds them to the Columnstore section as a new row group.

For DELETES:

  • For deleted rows in the Columnstore section, SQL Server only marks them as “deleted” but doesn’t actually remove the row. Those deleted rows will remain around until the clustered Columnstore index is rebuilt.
  • Deleted rows that are in the Deltastore are actually deleted.

For UPDATES:

  • Updates are treated as a delete + insert. Which means the original version of the updated row is marked as deleted in the Delete Bitmap, and then the new row with the updated values is inserted into the Deltastore.
  • Updated rows that are in the Deltastore are actually updated.

To fully understand how that Delete Bitmap can affect your performance, let’s look at an example.
I am using the dbo.FactInternetSales table in the AdventureWorksDW2012 database. I used a script by Kalen Delaney to blow up the size of the table to just over 247 million rows, which takes up about 2.6GB of disk space.


Using the script below, I created a clustered Columnstore index on the table.

CREATE CLUSTERED COLUMNSTORE INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales;
GO

You can use the system catalog view sys.column_store_row_groups to monitor the number of “deleted” rows in a clustered Columnstore index. In our example, you can run the following query to see the total number of rows and the number of deleted rows.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO
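
If you want the breakdown per row group rather than just the totals, the same catalog view exposes the state of each row group. COMPRESSED row groups live in the columnstore section, while OPEN and CLOSED row groups are still sitting in the Deltastore.

SELECT
     row_group_id
    ,state_description   -- OPEN/CLOSED = Deltastore, COMPRESSED = columnstore
    ,total_rows
    ,deleted_rows
    ,size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales')
ORDER BY row_group_id;
GO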


Let’s now “delete” a bunch of rows. I used a script by Pinal Dave to randomly delete several million rows from the table. What’s left over is a table whose footprint hasn’t changed at all. Even though I just deleted about 65 million rows, the physical size is still the same: 2.6GB.


If we look at sys.column_store_row_groups, we can see that we’re now carrying those 65 million rows around as extra baggage, and this explains why the table size never changed.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO


Let’s run a couple of sample queries and see what our performance is like.
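
The elapsed times and logical read counts quoted below were presumably captured with the STATISTICS IO and STATISTICS TIME session options; if you want to reproduce the numbers on your own copy of the table, turn them on first.

-- Report logical reads and elapsed time for the queries that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO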

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


The results are not too bad. The count hit all rows in the table and still returned in 12 seconds, needing only 1709 logical reads. Let’s run another query.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM dbo.FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar results. This aggregation query took 2.3 seconds and had 4604 logical reads against FactInternetSales.

Although the performance of those queries was good, it is far from perfect. All of those deleted rows are carried around with the table until the clustered Columnstore index is rebuilt. During that process, all of the “deleted” rows are deleted for real, any rows in the Deltastore are moved into the Columnstore, and the row groups within the Columnstore are recompressed to be as small as possible.

Let’s see what happens to our index after rebuilding it.

ALTER INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales REBUILD;
GO

Looking at sys.column_store_row_groups confirms that we have completely removed all of the deleted rows.


We can check the size of the table and see that the storage space needed is considerably smaller, by about 682MB.
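
One simple way to compare the before and after footprint is sp_spaceused, which reports the reserved, data, and index sizes for the table.

EXEC sp_spaceused N'dbo.FactInternetSales';
GO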


Now let’s rerun our two queries and see if we get any better performance.

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


Wow! With only a few hundred fewer logical IOs, SQL Server was able to return the result in only 424ms. Compared to our earlier run of over 12 seconds, that’s a huge improvement.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM dbo.FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar performance gains. We have a few hundred fewer logical reads, and our runtime is down to just 905ms.

Clustered Columnstore indexes in SQL Server 2014 are a huge win for any application that makes use of them, but as you can see, that win comes at a cost if you routinely delete lots of rows and fail to monitor the deleted_rows value in sys.column_store_row_groups.
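
A simple way to keep an eye on this across a database is to flag any Columnstore index whose row groups are carrying more than some threshold of deleted rows. The 10 percent cutoff below is just an arbitrary example; pick a value that fits your own workload.

-- Flag columnstore indexes carrying more than 10% deleted rows.
SELECT
     OBJECT_NAME(object_id) AS table_name
    ,SUM(total_rows)        AS total_rows
    ,SUM(deleted_rows)      AS deleted_rows
    ,100.0 * SUM(deleted_rows) / NULLIF(SUM(total_rows), 0) AS pct_deleted
FROM sys.column_store_row_groups
GROUP BY object_id
HAVING 100.0 * SUM(deleted_rows) / NULLIF(SUM(total_rows), 0) > 10.0
ORDER BY pct_deleted DESC;
GO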

Check out Books Online for further reading about clustered Columnstore indexes.