Tuesday, February 24, 2015

SQL Cruise Caribbean 2015 Recap


UPDATED: Don't just take my word for it. Read the feedback from others.



I wish I could just re-post my article from last year about SQL Cruise and change the dates and pictures. The event is just that good.

I still believe this is one of the premier training events offered for SQL Server. The amount of 1-on-1 time you get with the technical leads is absurd. You can get classroom time at any training conference, but what happens when the session is over? Most instructors head off to their next engagement and you never get a chance to speak with them. On SQL Cruise, that interaction continues well beyond the classroom session, beyond the next session, beyond the day, and even well beyond the week.

Last year I met Andrew Kelly (b|t) and Argenis Fernandez (b|t) aboard SQL Cruise. Andrew was one of the technical leads and Argenis was an attendee. To this day I correspond frequently with both of them about SQL Server as well as other stuff. Andrew has become a great mentor to me and has helped me develop my presentation skills. Argenis was instrumental in helping me decide if joining Microsoft was the right fit for my career goals. Both of those individuals have made a lasting impression on my life. How many times has that happened to you at the PASS Summit? I'm guessing not too many.

This year I got to spend time with a few of the same technical leads from 2014, Grant Fritchey (b|t) and Kevin Kline (b|t), but I also got to meet and learn a lot from the new ones, Jes Borland (b|t), David Klee (b|t), and Jeff Lehmann (w). This year’s topics covered things such as query performance tuning, infrastructure optimization, virtualization, data recovery, PowerShell, and even cloud solutions. Every one of the sessions provided me with valuable information that I can use immediately upon return to work.


But I also got a chance to meet and hang out with a lot of the attendees. As it turns out, they are just like me. They are passionate about SQL Server, they want to learn as much as possible, and they want to succeed in their careers. That's the cool thing about SQL Cruise; the attendees are dedicated to SQL Server and want to learn. This is not a vacation for them; it's an educational experience above all else. The other great thing is that the attendees come from a wide range of backgrounds and, of course, bring a wide range of issues to resolve. That's a great combination for learning new tricks from each other.

 

Now of course the cruise and exotic islands are quite a lot of fun, but they are not a distraction from class. They actually help everyone relax and become more comfortable around each other, which leads to more open discussions in and out of the classroom. There really is no stress when this is the backdrop of a conversation.


Once again, a great conference always has great sponsors. This year the sponsors were Amazon Web Services, Red-Gate, SQL Sentry, and B-Side Consulting. All of these companies make or use products that enhance SQL Server. Being a sponsor of the conference shows their commitment to the product and the community. Of course, a great conference always has great organizers, so I have to thank Tim Ford (b|t) and his wife Amy for putting together another great conference. Their hard work and dedication have been a huge help to me and, I know, to a lot of others.

In the group picture below, almost half of the people are referred to as the "Plus Ones". They are the family members and/or friends of the attendees and technical leads. SQL Cruise encourages attendees to bring their family members, so they can enjoy the adventures on the ship as well as on the islands. The Plus Ones are invited to every group event and group dinner, and are even allowed to sit in on the classroom sessions. My wife absolutely loves being a Plus One. She has a bubbly personality and can quickly make friends, which turns her week of vacation into quite the adventure. So much so, that she has already made plans for SQL Cruise 2016 without even asking me if I plan to sign up.

This was my second SQL Cruise, and it sure won’t be the last. Now it’s your chance to become part of this elite group with the SQL Family. SQL Cruise 2016 will be setting sail in January 2016, so head on over to the site to learn more or sign up. If you need help convincing the boss, just let me know and I’ll put in a good word for you. And if you need help convincing your own Plus One, let me know, because my wife would be happy to explain all the advantages of attending.

Tuesday, February 3, 2015

Get Out There and Challenge Yourself

It’s been almost five months since I posted my last article, and so much has changed since then. I have a new job, a new house, a new address, and of course a new mortgage.

I had been working as a DBA for Wells Fargo Securities for nearly 15 years when I decided that I could no longer grow in that role. I had a lot of good years at the bank, and I will definitely miss all of my friends. But in order for me to be happy, I really needed a change. My decision to move was more about needing to break loose from the daily routine of checking error logs, backup jobs, and dashboards. I really needed a job that challenged me, so that I could gain even more knowledge about SQL Server. There was only one place I felt I could get that type of challenge in a job. So I decided to apply at Microsoft. I figured since I loved SQL Server so much, there was no better place to learn more about it than from the company that developed it.

After talking to a number of friends who already worked at Microsoft, I decided to focus on the Premier Field Engineer positions for SQL Server. A Premier Field Engineer, or PFE for short, is focused on directly supporting a customer, whether that's over the phone or onsite with them. The tasks performed by a PFE can vary greatly. It could be simple performance troubleshooting, delivering training, or something more in-depth such as debugging minidumps. Just the thought of not knowing what problems lay ahead sounded like so much fun.

The entire process took about six months, from the first day I submitted my resume to the day I was offered a job. This may sound strange, but I really enjoyed the interview questions, because I felt like they helped validate my knowledge of SQL Server.

I have already been working onsite with customers, and so far, I have no plans to go back to a normal DBA job. I’m glad that I had the courage to leave such a good job and to challenge myself in my career goals. Most of all, I’m glad that I have a wife who is very supportive of such a gamble.

For those of you out there who might feel stuck in a boring DBA job, find a way to take a gamble. You don't have to leave a great company, but you might be able to find a way to make your job more interesting. Start simple, such as learning more about SQL Server and then teaching your coworkers. That's how it all started for me. I started reading more and more about SQL Server, then I tried to pass that knowledge on to the junior DBAs on my team. From there, I got into public speaking at user groups and SQL Saturdays. I started small, and with every win I kept challenging myself to do more. So get out there and start challenging yourself, just as I did.

Monday, February 2, 2015

PowerPoint Slides Available for Download

I promised that I would post the slide decks for my presentations, and now I have finally followed through on that promise. I have added a new Resources page that will have downloadable content available from blog articles and presentations.

Thursday, August 21, 2014

Monitor the Number of Deleted Rows in a Clustered Columnstore Index

In some of my previous posts, I have talked about how to create Columnstore indexes. Now I’d like to discuss one maintenance detail that you need to keep an eye on. I’m talking specifically about the number of “deleted rows” in a clustered Columnstore index.

One of the great benefits of clustered Columnstore indexes in SQL Server 2014 is they are writeable, which means you can insert, update and delete data. This is all well and good, but I think there really should be an asterisk beside update* and delete* with a disclaimer that says something like “Deletes aren’t really deletes and updates aren’t really updates”.

While on the surface, SQL Server appears to be processing the deletes and updates normally, under the hood it’s really just modifying the existing rows and/or adding new ones.

First, let's go over a quick summary of the anatomy of a clustered Columnstore index. The data in a clustered Columnstore index is stored in two parts: a columnstore section and a rowstore section, which is referred to as the Deltastore.


What SQL Server does under the covers is compress the data that already resides in the table into the Columnstore section and mark it as read-only. The Deltastore is used to handle all of the inserts and updates that are made to the clustered Columnstore index. The Delete Bitmap is used to mark rows as deleted.

So if you're keeping score at home, this is how it works (a quick sketch for observing all of this follows the lists):

For INSERTS:

  • Inserts are initially added to the Deltastore section.
  • A background process (the tuple mover) eventually compresses and adds those rows to the Columnstore section as a new row group.

For DELETES:

  • For deleted rows in the Columnstore section, SQL Server only marks them as “deleted” but doesn’t actually remove the row. Those deleted rows will remain around until the clustered Columnstore index is rebuilt.
  • Deleted rows that are in the Deltastore are actually deleted.

For UPDATES:

  • Updates are treated as a delete plus an insert, which means the original version of the updated row is marked as deleted in the Delete Bitmap, and then the new row with the updated values is inserted into the Deltastore.
  • Updated rows that are in the Deltastore are actually updated.
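
If you want to see these pieces for yourself, the same catalog view used later in this post exposes them. Here is a minimal sketch (the table name is just a placeholder for whatever table you are checking); each row group's state_description shows whether it belongs to the compressed Columnstore section (COMPRESSED) or the Deltastore (OPEN or CLOSED), along with its deleted row count.

SELECT
   row_group_id
  ,state_description  -- COMPRESSED = columnstore section; OPEN/CLOSED = Deltastore
  ,total_rows
  ,deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.YourClusteredColumnstoreTable')  -- placeholder table name
ORDER BY row_group_id;
GO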

To fully understand how that Delete Bitmap can affect your performance, let's look at an example. I am using the dbo.FactInternetSales table in the AdventureWorksDW2012 database. I used a script by Kalen Delaney to blow up the size of the table to just over 247 million rows, which takes up about 2.6GB of disk space.


Using the script below, I created a clustered Columnstore index on the table.

CREATE CLUSTERED COLUMNSTORE INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales;
GO

You can use the system catalog view sys.column_store_row_groups to monitor the number of "deleted" rows in a clustered Columnstore index. In our example, you can run the following query to see the total number of rows and the number of deleted rows.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO


Let's now "delete" a bunch of rows. I used a script by Pinal Dave to randomly delete several million rows from the table. What's left over is a table that, on disk, hasn't changed at all. Even though I just deleted about 65 million rows, the physical size is still the same: 2.6GB.
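
If you want to confirm that yourself, sp_spaceused is a quick way to check the table's size before and after the deletes (this is just a convenience check, not part of the original demo scripts).

EXEC sp_spaceused 'dbo.FactInternetSales';
GO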


If we look at sys.column_store_row_groups, we can see that we’re now carrying those 65 million rows around as extra baggage, and this explains why the table size never changed.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO


Let’s run a couple of sample queries and see what our performance is like.

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


The results are not too bad. The count hit all rows in the table and still returned in 12 seconds, needing only 1709 logical reads. Let's run another query.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM [dbo].FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar results. This aggregation query took 2.3 seconds and had 4604 logical reads against FactInternetSales.

Although the performance of those queries was good, it is far from optimal. All of those deleted rows are carried around with the table until the clustered Columnstore index is rebuilt. During the rebuild, all of the "deleted" rows are removed for real, any rows in the Deltastore are moved into the Columnstore, and the row groups within the Columnstore are recompressed to be as small as possible.

Let’s see what happens to our index after rebuilding it.

ALTER INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales REBUILD;
GO

Looking at sys.column_store_row_groups confirms that we have completely removed all of the deleted rows.


We can check the size of the table and see the storage space needed is considerably smaller, by about 682MB.


Now let's rerun our two queries and see if we get any better performance.

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


Wow! With only a few hundred fewer logical IOs, SQL Server was able to return the result in only 424ms. Compared to our earlier run of over 12 seconds, that’s a huge improvement.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM dbo.FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar performance gains. We have a few hundred fewer logical reads, and our runtime is down to just 905ms.

Clustered Columnstore indexes in SQL Server 2014 are a huge win for any application that makes use of them, but as you can see, they do come at a cost if you routinely delete lots of rows and fail to monitor the deleted_rows value in sys.column_store_row_groups.
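
If you want to automate that monitoring, here is a minimal sketch of the kind of check you could schedule (the 10% threshold is an arbitrary example, not an official recommendation).

SELECT
   OBJECT_NAME(object_id) AS table_name
  ,SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
  ,CAST(100.0 * SUM(deleted_rows) / NULLIF(SUM(total_rows), 0) AS DECIMAL(5,2)) AS deleted_pct
FROM sys.column_store_row_groups
GROUP BY object_id
HAVING 100.0 * SUM(deleted_rows) / NULLIF(SUM(total_rows), 0) > 10.0  -- arbitrary threshold
ORDER BY deleted_pct DESC;
GO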

Check out Books Online for further reading about clustered Columnstore indexes.

Tuesday, July 29, 2014

How to Edit Read-Only Non-clustered Columnstore Data

As I've discussed in some of my previous posts, creating a non-clustered Columnstore index makes the index as well as the base table read-only, which means you can't insert, update, or delete any data until you drop the index. This may seem like a huge issue, but in reality it's not that much of a problem. Keep in mind the Columnstore index feature is targeted at data warehouses that modify data infrequently. In the examples below, I go through two methods you can use to edit your read-only data.

To get started, we need to create a test table and insert a few rows.

USE master;
GO

CREATE DATABASE TEST;
GO

USE TEST;
GO

CREATE TABLE dbo.Table1
(
   col1 INT
  ,col2 VARCHAR(20)
);
GO

INSERT INTO dbo.Table1 (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.Table1 (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.Table1 (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.Table1 (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.Table1 (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.Table1 (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.Table1 (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.Table1 (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.Table1 (col1, col2) VALUES (9, 'Test Value 9');
GO

Next, we'll add a non-clustered Columnstore index to the table.

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_Table1 
  ON dbo.Table1(col1,col2);
GO

At this point, we have effectively made this table read-only. We can read from it all day long, but if we attempt to update a value, we will get an error.

SELECT * FROM dbo.Table1;
GO


UPDATE dbo.Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

Msg 35330, Level 15, State 1, Line 41
UPDATE statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, and then rebuilding the columnstore index after UPDATE has completed.

Once again, I love Microsoft’s error message. It gives you very good information on one method to use for changing data in a non-clustered Columnstore index.

Disabling the non-clustered Columnstore index gives you the ability to make changes to the data; however, the index can't be used for any queries while it's disabled. I like to wrap all the commands into a single transaction using the XACT_ABORT option. This guarantees that any error will roll back the entire set and not just one single statement.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

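-- Take the columnstore index offline so the table will accept data changes.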
ALTER INDEX nci_Table1 ON dbo.Table1 DISABLE;
GO

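-- Make the data change while the index is disabled.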
UPDATE Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

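-- Rebuild the index to bring it back online; this rebuilds it for the entire table.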
ALTER INDEX nci_Table1 ON dbo.Table1 REBUILD;
GO

COMMIT TRANSACTION;
GO


Looking at the table again, we can see the first row was definitely changed.

SELECT * FROM dbo.Table1;
GO


This is probably the easiest method to change your data; however, it's also the most resource intensive. SQL Server will need to rebuild the index for the entire table, not just for the one row that we changed. So if your table has millions or even billions of rows, it could take a lot of time to rebuild and utilize a lot of resources. This is probably something you don't want to do in the middle of your business day.

The second method we'll cover involves using partition switching. First, we'll create the same table but partition it into three parts.

USE TEST;
GO

CREATE PARTITION FUNCTION myRangePF1 (INT) 
  AS RANGE LEFT FOR VALUES (3,6);
GO 

CREATE PARTITION SCHEME myRangePS1 
  AS PARTITION myRangePF1 
  ALL TO ([PRIMARY]);
GO 

CREATE TABLE dbo.PartitionedTable (col1 INT, col2 VARCHAR(20)) 
  ON myRangePS1 (col1);
GO

INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (9, 'Test Value 9');
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_PartitionedTable 
  ON dbo.PartitionedTable(col1,col2);
GO

SELECT * FROM dbo.PartitionedTable;
GO


This table has the same data as before, but internally it's partitioned. Using sys.partitions to get the details, you can see there are a total of 3 partitions, each with 3 rows.

SELECT
     SCHEMA_NAME(t.schema_id) AS SchemaName
    ,OBJECT_NAME(i.object_id) AS ObjectName
    ,p.partition_number AS PartitionNumber
    ,fg.name AS FilegroupName
    ,rows AS 'Rows'
    ,au.total_pages AS 'TotalDataPages'
    ,CASE boundary_value_on_right
        WHEN 1 THEN 'less than'
        ELSE 'less than or equal to'
     END AS 'Comparison'
    ,value AS 'ComparisonValue'
    ,p.data_compression_desc AS 'DataCompression'
    ,p.partition_id
FROM sys.partitions p
    JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
    JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions f ON f.function_id = ps.function_id
    LEFT JOIN sys.partition_range_values rv ON f.function_id = rv.function_id AND p.partition_number = rv.boundary_id
    JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
    JOIN sys.filegroups fg ON dds.data_space_id = fg.data_space_id
    JOIN (SELECT container_id, sum(total_pages) as total_pages
            FROM sys.allocation_units
            GROUP BY container_id) AS au ON au.container_id = p.partition_id 
    JOIN sys.tables t ON p.object_id = t.object_id
WHERE i.index_id < 2
ORDER BY ObjectName,p.partition_number;
GO


To use partition switching, you have to create a non-partitioned table that matches your partitioned table in every way, including all the indexes and constraints. For this example, we need to edit a row that resides in the first partition, so we need to create a non-partitioned table that has a constraint that mimics the first partition: col1 <= 3.

CREATE TABLE dbo.NonPartitionedTable
(
   col1 INT CHECK (col1 <= 3)
  ,col2 VARCHAR(20)
);
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_NonPartitionedTable
  ON dbo.NonPartitionedTable(col1,col2);
GO

Once we have the table created, we can perform the switch.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

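-- Move partition 1 out of the partitioned table into the (empty) staging table.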
ALTER TABLE dbo.PartitionedTable SWITCH PARTITION 1 TO dbo.NonPartitionedTable; 
GO

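-- Disable the staging table's columnstore index so it will accept data changes.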
ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable DISABLE;
GO

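-- Make the data change against just this slice of the data.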
UPDATE NonPartitionedTable SET col2 = 'changed a value' WHERE col1 = 2;
GO

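-- Rebuild only the staging table's index, which covers a fraction of the full table.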
ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable REBUILD;
GO

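-- Switch the edited rows back into partition 1 (empty, since we switched it out above).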
ALTER TABLE dbo.NonPartitionedTable SWITCH TO dbo.PartitionedTable PARTITION 1; 
GO

COMMIT TRANSACTION;
GO

The last step of the transaction switched the data back into the partitioned table, so all that's left is to verify the change.

SELECT * FROM dbo.PartitionedTable;
GO


I know this may look exactly like what we did in the first method, and it is. However, having our initial table partitioned gives us the ability to rebuild the non-clustered Columnstore index on only a smaller subset of data instead of the entire table. If this partitioned table had millions of rows, then that ALTER INDEX REBUILD command might only need to run against a fraction of those rows and therefore complete much quicker and utilize far fewer resources. Not only can both of these methods be used to edit existing data in a table that has a non-clustered Columnstore index, but they can also be used to insert or delete rows.

On a final note, I recommend that you always partition any table that has a Columnstore index. This feature is designed for very large tables, and partitioning gives you much more flexibility than leaving the table unpartitioned, not just for working with Columnstore indexes but for other maintenance tasks as well.

You can read more about Columnstore indexes and partition switching from Books Online.