Thursday, August 21, 2014

Monitor the Number of Deleted Rows in a Clustered Columnstore Index

In some of my previous posts, I have talked about how to create Columnstore indexes. Now I’d like to discuss one maintenance detail that you need to keep an eye on. I’m talking specifically about the number of “deleted rows” in a clustered Columnstore index.

One of the great benefits of clustered Columnstore indexes in SQL Server 2014 is that they are writable, which means you can insert, update, and delete data. This is all well and good, but I think there really should be an asterisk beside update* and delete* with a disclaimer that says something like “Deletes aren’t really deletes and updates aren’t really updates”.

On the surface, SQL Server appears to process the deletes and updates normally, but under the hood it’s really just marking existing rows as deleted and/or adding new ones.

First, let’s go over a quick summary of the anatomy of a clustered Columnstore index. The data is stored in two parts: a columnstore section and a rowstore section, which is referred to as the Deltastore.


Under the covers, SQL Server creates a Columnstore index for the data that already resides in the table and marks it as read-only. The Deltastore handles all of the inserts and updates that are made to the clustered Columnstore index, and the Delete Bitmap is used to mark rows as deleted.

So if you’re keeping score at home, this is how it works:

For INSERTS:

  • Inserts are initially added to the Deltastore section.
  • A background process (the tuple mover) eventually compresses those rows and adds them to the Columnstore section as a new row group.

For DELETES:

  • When rows are deleted from the Columnstore section, SQL Server only marks them as “deleted” but doesn’t actually remove them. Those deleted rows remain until the clustered Columnstore index is rebuilt.
  • Deleted rows that are in the Deltastore are actually deleted.

For UPDATES:

  • Updates are treated as a delete + insert, which means the original version of the updated row is marked as deleted in the Delete Bitmap, and then the new row with the updated values is inserted into the Deltastore.
  • Updated rows that are in the Deltastore are actually updated.
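
You can watch this bookkeeping happen for yourself. Once a table has a clustered Columnstore index, the sys.column_store_row_groups catalog view exposes each row group's state: rows still sitting in the Deltastore show up as OPEN or CLOSED row groups, while compressed row groups carry a deleted_rows count. Here is a quick sketch (substitute your own table name):

SELECT
   row_group_id
  ,state_description   -- OPEN/CLOSED = Deltastore, COMPRESSED = Columnstore
  ,total_rows
  ,deleted_rows        -- rows marked in the Delete Bitmap
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.YourTable');  -- hypothetical table name
GO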

To fully understand how that Delete Bitmap can affect your performance, let’s look at an example.
I am using the dbo.FactInternetSales table in the AdventureWorksDW2012 database. I used a script by Kalen Delaney to blow up the table to just over 247 million rows, taking up about 2.6GB of disk space.
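
Kalen's script isn't reproduced here, but a rough sketch of the approach is simple: keep inserting the table into itself until it's big enough. This is only an approximation, and it assumes the table's primary key and any other indexes have been dropped so duplicate rows are allowed; twelve doublings of the original ~60,000 rows lands just over 247 million.

USE AdventureWorksDW2012;
GO
-- Rough sketch only (not Kalen Delaney's actual script).
-- Assumes the primary key and any other indexes have been dropped.
DECLARE @pass INT = 1;
WHILE @pass <= 12
BEGIN
    INSERT INTO dbo.FactInternetSales
    SELECT * FROM dbo.FactInternetSales;  -- each pass doubles the row count
    SET @pass += 1;
END
GO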


Using the script below, I created a clustered Columnstore index on the table.

CREATE CLUSTERED COLUMNSTORE INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales;
GO

You can use the catalog view sys.column_store_row_groups to monitor the number of “deleted” rows in a clustered Columnstore index. In our example, you can run the following query to see the total number of rows and the number of deleted rows.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO


Let’s now “delete” a bunch of rows. I used a script by Pinal Dave to randomly delete several million rows from the table. What’s left over is a table that hasn’t changed at all: even though I just deleted about 65 million rows, the physical size is still the same 2.6GB.
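
Pinal's script isn't reproduced here either, but a crude stand-in that removes roughly a quarter of the rows at random (about 65 million of our 247 million) could look like this; on a table this size you would normally delete in batches:

DELETE FROM dbo.FactInternetSales
WHERE ABS(CHECKSUM(NEWID())) % 4 = 0;  -- pseudo-randomly picks ~25% of rows
GO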


If we look at sys.column_store_row_groups, we can see that we’re now carrying those 65 million rows around as extra baggage, and this explains why the table size never changed.

SELECT
   SUM(total_rows) AS total_rows
  ,SUM(deleted_rows) AS deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('FactInternetSales');
GO


Let’s run a couple of sample queries and see what our performance is like.

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


The results are not too bad. The count hit all rows in the table and still returned in 12 seconds with only 1709 logical reads. Let’s run another query.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM [dbo].FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar results. This aggregation query took 2.3 seconds and had 4604 logical reads against FactInternetSales.

Although the performance of those queries was good, it’s far from optimal. All of those deleted rows are carried around with the table until the clustered Columnstore index is rebuilt. During a rebuild, all of the “deleted” rows are removed for real, any rows in the Deltastore are moved into the Columnstore, and the row groups within the Columnstore are recompressed to be as small as possible.

Let’s see what happens to our index after rebuilding it.

ALTER INDEX CluCSidx_FactInternetSales ON dbo.FactInternetSales REBUILD;
GO

Looking at sys.column_store_row_groups confirms that we have completely removed all of the deleted rows.


We can check the size of the table and see that the storage space needed is considerably smaller, by about 682MB.
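
One quick way to check the size yourself is sp_spaceused:

EXEC sp_spaceused 'dbo.FactInternetSales';
GO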


Now let’s rerun our two queries and see if we have any better performance.

SELECT COUNT(*) FROM dbo.FactInternetSales;
GO


Wow! With only a few hundred fewer logical IOs, SQL Server was able to return the result in only 424ms. Compared to our earlier run of over 12 seconds, that’s a huge improvement.

SELECT
   d1.SalesTerritoryRegion
  ,COUNT(*) AS InternetOrderQuantity
  ,ROUND(SUM(f1.SalesAmount),2) AS InternetSalesAmount
FROM dbo.FactInternetSales f1
INNER JOIN dbo.DimSalesTerritory d1
  ON f1.SalesTerritoryKey=d1.SalesTerritoryKey
GROUP BY d1.SalesTerritoryRegion
ORDER BY InternetSalesAmount DESC;
GO


Here we see similar performance gains. We have a few hundred fewer logical reads, and our runtime is down to just 905ms.

Clustered Columnstore indexes in SQL Server 2014 are a huge win for any application that makes use of them, but as you can see they do come at a cost if you routinely delete lots of rows and fail to monitor the deleted_rows value in sys.column_store_row_groups.
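
As a starting point for that monitoring, here is a sketch of a query you could schedule. The 10 percent threshold is just an arbitrary example, not a documented recommendation:

SELECT
   OBJECT_NAME(object_id) AS table_name
  ,SUM(total_rows) AS total_rows
  ,SUM(ISNULL(deleted_rows,0)) AS deleted_rows
  ,CAST(100.0 * SUM(ISNULL(deleted_rows,0)) / NULLIF(SUM(total_rows),0) AS DECIMAL(5,2)) AS percent_deleted
FROM sys.column_store_row_groups
GROUP BY object_id
HAVING 100.0 * SUM(ISNULL(deleted_rows,0)) / NULLIF(SUM(total_rows),0) > 10  -- flag tables past 10%
ORDER BY percent_deleted DESC;
GO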

Check out Books Online for further reading about clustered Columnstore indexes.

Tuesday, July 29, 2014

How to Edit Read-Only Non-clustered Columnstore Data

As I've discussed in some of my previous posts, creating a non-clustered Columnstore index makes the index as well as the base table read-only, which means you can’t insert, update, or delete any data until you drop the index. This may seem like a huge issue, but in reality it’s not that much of a problem. Keep in mind the Columnstore index feature is targeted at data warehouses that modify data infrequently. In the examples below, I go through two methods you can use to edit your read-only data.

To get started, we need to create a test table and insert a few rows.

USE master;
GO

CREATE DATABASE TEST;
GO

USE TEST;
GO

CREATE TABLE dbo.Table1
(
   col1 INT
  ,col2 VARCHAR(20)
);
GO

INSERT INTO dbo.Table1 (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.Table1 (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.Table1 (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.Table1 (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.Table1 (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.Table1 (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.Table1 (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.Table1 (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.Table1 (col1, col2) VALUES (9, 'Test Value 9');
GO

Next, we'll add a non-clustered Columnstore index to the table.

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_Table1 
  ON dbo.Table1(col1,col2);
GO

At this point, we have effectively made this table read-only. We can read from it all day long, but if we attempt to update a value, we will get an error.

SELECT * FROM dbo.Table1;
GO


UPDATE dbo.Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

Msg 35330, Level 15, State 1, Line 41 UPDATE statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, and then rebuilding the columnstore index after UPDATE has completed.

Once again, I love Microsoft’s error message. It gives you very good information on one method to use for changing data in a non-clustered Columnstore index.

Disabling the non-clustered Columnstore index gives you the ability to make changes to the data; however, the index can't be used by any queries while it's disabled. I like to wrap all the commands in a single transaction using the XACT_ABORT option, which guarantees that any error will roll back the entire set and not just a single statement.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

ALTER INDEX nci_Table1 ON dbo.Table1 DISABLE;
GO

UPDATE Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

ALTER INDEX nci_Table1 ON dbo.Table1 REBUILD;
GO

COMMIT TRANSACTION;
GO


Looking at the table again, we can see the first row was definitely changed.

SELECT * FROM dbo.Table1;
GO


This is probably the easiest method to change your data; however, it’s also the most resource intensive. SQL Server will need to rebuild the index for the entire table, not just for the one row we changed. So if your table has millions or even billions of rows, the rebuild could take a lot of time and utilize a lot of resources. This is probably not something you want to do in the middle of your business day.

The second method we’ll cover involves partition switching. First, we’ll create the same table but partition it into 3 parts.

USE TEST;
GO

CREATE PARTITION FUNCTION myRangePF1 (INT) 
  AS RANGE LEFT FOR VALUES (3,6);
GO 

CREATE PARTITION SCHEME myRangePS1 
  AS PARTITION myRangePF1 
  ALL TO ([PRIMARY]);
GO 

CREATE TABLE dbo.PartitionedTable (col1 INT, col2 VARCHAR(20)) 
  ON myRangePS1 (col1);
GO

INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (9, 'Test Value 9');
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_PartitionedTable 
  ON dbo.PartitionedTable(col1,col2);
GO

SELECT * FROM dbo.PartitionedTable;
GO


This table has the same data as before, but internally it's partitioned. Using sys.partitions to get the details, you can see there are a total of 3 partitions, each with 3 rows.

SELECT
     SCHEMA_NAME(t.schema_id) AS SchemaName
    ,OBJECT_NAME(i.object_id) AS ObjectName
    ,p.partition_number AS PartitionNumber
    ,fg.name AS FilegroupName
    ,rows AS 'Rows'
    ,au.total_pages AS 'TotalDataPages'
    ,CASE boundary_value_on_right
        WHEN 1 THEN 'less than'
        ELSE 'less than or equal to'
     END AS 'Comparison'
    ,value AS 'ComparisonValue'
    ,p.data_compression_desc AS 'DataCompression'
    ,p.partition_id
FROM sys.partitions p
    JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
    JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions f ON f.function_id = ps.function_id
    LEFT JOIN sys.partition_range_values rv ON f.function_id = rv.function_id AND p.partition_number = rv.boundary_id
    JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
    JOIN sys.filegroups fg ON dds.data_space_id = fg.data_space_id
    JOIN (SELECT container_id, sum(total_pages) as total_pages
            FROM sys.allocation_units
            GROUP BY container_id) AS au ON au.container_id = p.partition_id 
    JOIN sys.tables t ON p.object_id = t.object_id
WHERE i.index_id < 2
ORDER BY ObjectName,p.partition_number;
GO


To use partition switching, you have to create a non-partitioned table that matches your partitioned table in every way, including all the indexes and constraints. For this example, we need to edit a row that resides in the first partition, so we need to create a non-partitioned table with a check constraint that mimics the first partition: col1 <= 3.

CREATE TABLE dbo.NonPartitionedTable
(
   col1 INT CHECK (col1 <= 3)
  ,col2 VARCHAR(20)
);
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_NonPartitionedTable
  ON dbo.NonPartitionedTable(col1,col2);
GO

Once we have the table created, we can perform the switch.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

ALTER TABLE dbo.PartitionedTable SWITCH PARTITION 1 TO dbo.NonPartitionedTable; 
GO

ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable DISABLE;
GO

UPDATE NonPartitionedTable SET col2 = 'changed a value' WHERE col1 = 2;
GO

ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable REBUILD;
GO

ALTER TABLE dbo.NonPartitionedTable SWITCH TO dbo.PartitionedTable PARTITION 1; 
GO

COMMIT TRANSACTION;
GO

The last ALTER TABLE statement in the batch above switched the edited data back into the partitioned table. Looking at the table confirms the change.

SELECT * FROM dbo.PartitionedTable;
GO


I know this may look exactly like what we did in the first method, and it is. However, by having our initial table partitioned, we gain the ability to rebuild the non-clustered Columnstore index on only a small subset of the data instead of the entire table. If this partitioned table had millions of rows, the ALTER INDEX REBUILD command might only need to run against a fraction of those rows, and therefore complete much quicker and utilize far fewer resources. Not only can both of these methods be used to edit existing data in a table that has a non-clustered Columnstore index, they can also be used to insert or delete rows, as the sketch below shows.
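
For instance, a delete against the first example table uses the exact same disable/rebuild pattern (a sketch):

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

ALTER INDEX nci_Table1 ON dbo.Table1 DISABLE;
GO

DELETE FROM dbo.Table1 WHERE col1 = 9;  -- any DML works while the index is disabled
GO

ALTER INDEX nci_Table1 ON dbo.Table1 REBUILD;
GO

COMMIT TRANSACTION;
GO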

On a final note, I recommend that you always partition any table that has a Columnstore index. The feature is designed for very large tables, and partitioning gives you much more flexibility, not just for working with Columnstore indexes but for other maintenance tasks as well.

You can read more about Columnstore indexes and partition switching from Books Online.

Tuesday, July 8, 2014

Columnstore Table Analyzer

As I’ve discussed in some of my previous posts, there are quite a few data types that cannot be part of a Columnstore index. While there are fewer restrictions in SQL Server 2014, they still exist. I find myself constantly looking back at Books Online to make sure my tables don’t contain any of those restricted data types. It would be much easier to know from day one which tables I need to redesign, or at least which columns I need to exclude from a non-clustered Columnstore index. That’s why I created the following script.

-- Find columns in user tables that cannot be included in a columnstore index.
-- These restrictions apply to both clustered and non-clustered columnstore indexes.
-- SQL Server 2014: http://msdn.microsoft.com/en-us/library/gg492153(v=sql.120).aspx
-- SQL Server 2012: http://msdn.microsoft.com/en-us/library/gg492153(v=sql.110).aspx


-- Get the version number of SQL Server
DECLARE @ServerVersion TINYINT = CONVERT(INT,SUBSTRING(CONVERT(VARCHAR,SERVERPROPERTY('ProductVersion')),1,(CHARINDEX('.',(CONVERT(VARCHAR,SERVERPROPERTY('ProductVersion'))))-1)))

IF @ServerVersion = 11
-- This section is only for SQL Server 2012
BEGIN
  SELECT 
     s.name AS 'SchemaName'
    ,o.name AS 'TableName'
    ,c.name AS 'ColumnName'
    ,'ColumnType' = CASE t.name
      WHEN 'decimal' THEN t.name + '(' + CONVERT(VARCHAR,c.precision) + ',' + CONVERT(VARCHAR,c.scale) + ')'
      WHEN 'numeric' THEN t.name + '(' + CONVERT(VARCHAR,c.precision) + ',' + CONVERT(VARCHAR,c.scale) + ')'
      WHEN 'varchar' THEN
        CASE c.max_length
          WHEN -1 THEN 'varchar(max)'
          ELSE 'varchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      WHEN 'nvarchar' THEN 
        CASE c.max_length
          WHEN -1 THEN 'nvarchar(max)'
          ELSE 'nvarchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      WHEN 'datetimeoffset' THEN t.name + '(' + CONVERT(VARCHAR,c.scale) + ')'
      ELSE t.name
     END
    ,'ColumnAttribute' = CASE 
      WHEN (c.is_filestream = 1) THEN 'Filestream'
      WHEN (c.is_sparse = 1) THEN 'Sparse'
      ELSE ''
    END
  FROM sys.columns c 
  JOIN sys.objects o ON c.object_id = o.object_id
  JOIN sys.types t ON c.user_type_id = t.user_type_id
  JOIN sys.schemas s ON o.schema_id = s.schema_id
  WHERE o.is_ms_shipped <> 1
  -- These types cannot be part of a SQL Server 2012 columnstore index
  AND (
    t.name IN 
      ('binary'
      ,'varbinary'
      ,'ntext'
      ,'text'
      ,'image'
      ,'uniqueidentifier'
      ,'rowversion'
      ,'timestamp'
      ,'sql_variant'
      ,'hierarchyid'
      ,'geography'
      ,'geometry'
      ,'xml')
  OR (
      -- Decimal & numeric cannot have a precision over 18
      t.name IN ('decimal','numeric') 
      AND c.precision > 18)
  OR (
      -- Varchar(max) and nvarchar(max)
      t.name = 'datetimeoffset'
      AND c.scale > 2)
  OR (
      -- Varchar(max) and nvarchar(max)
      t.name IN ('varchar','nvarchar') 
      AND c.max_length = -1)
  OR (
      -- Filestream
      c.is_filestream = 1)
  OR (
      -- Sparse
      c.is_sparse = 1)
  )
  ORDER BY s.name,o.name,c.column_id
END
ELSE IF @ServerVersion = 12
-- This section is only for SQL Server 2014
BEGIN
  SELECT 
     s.name AS 'SchemaName'
    ,o.name AS 'TableName'
    ,c.name AS 'ColumnName'
    ,'ColumnType' = CASE t.name
      WHEN 'varchar' THEN
        CASE c.max_length
          WHEN -1 THEN 'varchar(max)'
          ELSE 'varchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      WHEN 'nvarchar' THEN 
        CASE c.max_length
          WHEN -1 THEN 'nvarchar(max)'
          ELSE 'nvarchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      ELSE t.name
     END
    ,'ColumnAttribute' = CASE 
      WHEN (c.is_filestream = 1) THEN 'Filestream'
      WHEN (c.is_sparse = 1) THEN 'Sparse'
      ELSE ''
    END
  FROM sys.columns c 
  JOIN sys.objects o ON c.object_id = o.object_id
  JOIN sys.types t ON c.user_type_id = t.user_type_id
  JOIN sys.schemas s ON o.schema_id = s.schema_id
  WHERE o.is_ms_shipped <> 1
  -- These types cannot be part of a SQL Server 2014 columnstore index
  AND (
    t.name IN 
      ('ntext'
      ,'text'
      ,'image'
      ,'rowversion'
      ,'timestamp'
      ,'sql_variant'
      ,'hierarchyid'
      ,'geography'
      ,'geometry'
      ,'xml')
  OR (
      -- Varchar(max) and nvarchar(max)
      t.name IN ('varchar','nvarchar') 
      AND c.max_length = -1)
  OR (
      -- Filestream
      c.is_filestream = 1)
  OR (
      -- Sparse
      c.is_sparse = 1)
  )
  ORDER BY s.name,o.name,c.column_id
END
ELSE
BEGIN
  RAISERROR ('This script only works on SQL Server 2012 and SQL Server 2014.',16,1);
END
GO

Running this against your database will output an organized list of tables along with the column names and data types that cannot be used within a Columnstore index.


The script can be used whether you plan to create a clustered or a non-clustered Columnstore index, since the data type restrictions apply to both. It can also be used to analyze databases on either SQL Server 2012 or SQL Server 2014.

You can read more about the limitations and restrictions of Columnstore indexes in Books Online.

Tuesday, June 3, 2014

Columnstore Memory Grant Issue

In a previous post about non-clustered columnstore indexes, I mentioned that creating the index is a very memory-intensive operation. Sometimes the memory grant needed exceeds what is currently available on your server. So what do you do about it?

SQL Server requires a minimal amount of memory in order to create a columnstore index. This can be estimated with the following formula:

Memory Grant Request in MB = ((4.2 * number of columns in the columnstore index) + 68) * Degree of Parallelism + (number of string columns * 34)

If there is not enough physical memory available to create the columnstore index, SQL Server will throw an error.

The test server I’m using for the examples below has 2 CPUs and 4GB of memory.


The Max Degree of Parallelism is set to 0 and Max Server Memory is set to 4095MB.

EXEC sp_configure 'max degree of parallelism';
GO
EXEC sp_configure 'max server memory (MB)';
GO


I have enlarged a table in the AdventureWorksDW2012 database using Kalen Delaney’s script, producing dbo.FactInternetSalesBig. The table has 247 million rows and contains 26 columns, three of which are string columns. If we want to create a non-clustered columnstore index on the entire table, then using the formula above we can estimate that a memory grant of 456MB would be needed to build the index.
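
Plugging those numbers in (26 columns, 3 string columns, and a DOP of 2 from the server's two CPUs) is easy to verify:

DECLARE @columns INT = 26   -- columns in the index
       ,@strings INT = 3    -- string columns
       ,@dop     INT = 2;   -- degree of parallelism (2 CPUs, MAXDOP 0)

SELECT ((4.2 * @columns) + 68) * @dop + (@strings * 34) AS EstimatedGrantMB;  -- returns 456.4
GO
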
To create the index we’ll use this query.

USE AdventureWorksDW2012;
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactInternetSalesBig ON dbo.FactInternetSalesBig
(
    ProductKey,
    OrderDateKey,
    DueDateKey,
    ShipDateKey,
    CustomerKey,
    PromotionKey,
    CurrencyKey,
    SalesTerritoryKey,
    SalesOrderNumber,
    SalesOrderLineNumber,
    RevisionNumber,
    OrderQuantity,
    UnitPrice,
    ExtendedAmount,
    UnitPriceDiscountPct,
    DiscountAmount,
    ProductStandardCost,
    TotalProductCost,
    SalesAmount,
    TaxAmt,
    Freight,
    CarrierTrackingNumber,
    CustomerPONumber,
    OrderDate,
    DueDate,
    ShipDate
);
GO

While the create index statement is running, we can view its memory grant from another session using this query.

SELECT
     dop
    ,requested_memory_kb
    ,required_memory_kb
    ,ideal_memory_kb
    ,granted_memory_kb
FROM sys.dm_exec_query_memory_grants
WHERE session_id = 55;
GO


The requested memory grant was actually 525MB, which isn’t too far from our estimate. What would happen to the create index query if we changed max server memory to 1024MB? Let’s find out.

EXEC sp_configure 'max server memory (MB)',1024;
GO
RECONFIGURE;
GO

Oops, we got an error.

The statement has been terminated.
Msg 8658, Level 17, State 1, Line 1
Cannot start the columnstore index build because it requires at least 341424 KB, while the maximum memory grant is limited to 189696 KB per query in workload group 'default' (2) and resource pool 'default' (2). Retry after modifying columnstore index to contain fewer columns, or after increasing the maximum memory grant limit with Resource Governor.

This is telling us we need an absolute minimum of 341MB of memory to create the columnstore index. That seems odd, since we have 1024MB of memory configured for the server. Well, not really. The Resource Governor is always running in the background on every SQL Server instance, and every query executes inside the default workload group. That default workload group will not grant more than 25% of available memory to any single query. This can be verified by running the following query.

SELECT
     name
    ,request_max_memory_grant_percent
FROM sys.dm_resource_governor_workload_groups
WHERE name = 'default';
GO


There are a couple of workarounds to this problem.

First, we can try reducing the degree of parallelism when creating the index. Using the formula above with a DOP of 1 gives an estimated memory grant of 279MB. That’s still more than the roughly 185MB maximum grant from the error message, but let’s try it anyway. All we have to do is add the WITH (MAXDOP=1) hint to the end of the create index statement.

USE AdventureWorksDW2012;
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactInternetSalesBig ON dbo.FactInternetSalesBig
(
    ProductKey,
    OrderDateKey,
    DueDateKey,
    ShipDateKey,
    CustomerKey,
    PromotionKey,
    CurrencyKey,
    SalesTerritoryKey,
    SalesOrderNumber,
    SalesOrderLineNumber,
    RevisionNumber,
    OrderQuantity,
    UnitPrice,
    ExtendedAmount,
    UnitPriceDiscountPct,
    DiscountAmount,
    ProductStandardCost,
    TotalProductCost,
    SalesAmount,
    TaxAmt,
    Freight,
    CarrierTrackingNumber,
    CustomerPONumber,
    OrderDate,
    DueDate,
    ShipDate
) WITH (MAXDOP=1);
GO

Oops, we got the same error again.

The statement has been terminated.
Msg 8658, Level 17, State 1, Line 1
Cannot start the columnstore index build because it requires at least 341424 KB, while the maximum memory grant is limited to 189696 KB per query in workload group 'default' (2) and resource pool 'default' (2). Retry after modifying columnstore index to contain fewer columns, or after increasing the maximum memory grant limit with Resource Governor.

Now it’s on to our next workaround: changing the Resource Governor settings. As I mentioned above, the Resource Governor limits each query to a memory grant of no more than 25% of the total available. This can easily be changed using the following query.

ALTER WORKLOAD GROUP [default] WITH (request_max_memory_grant_percent = 50);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

The setting change is dynamic, but let’s double check it.

SELECT
     name
    ,request_max_memory_grant_percent
FROM sys.dm_resource_governor_workload_groups
WHERE name = 'default';
GO


Now let’s try to create the columnstore index again.

Ahh success! We can check the memory grant while it’s running.

SELECT
     dop
    ,requested_memory_kb
    ,required_memory_kb
    ,ideal_memory_kb
    ,granted_memory_kb
FROM sys.dm_exec_query_memory_grants
WHERE session_id = 55;
GO


Once the columnstore index has been created, you should change the default workload group back to its default of 25 percent, so it doesn’t adversely affect other queries running on your server.

ALTER WORKLOAD GROUP [default] WITH (request_max_memory_grant_percent = 25);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

Another workaround is to exclude some of the 26 columns from our create index statement. If we removed the last six columns of the table, we’d be left with 20 columns, only one of which is a string column, and the formula (still with MAXDOP 1) works out to ((4.2 * 20) + 68) * 1 + (1 * 34) = 186MB. However, that may not be an option, because our user queries may need those columns in the columnstore index to get maximum performance.

Finally, as a last workaround, you could add more memory to the server, but unless your server is a VM, that might not be an option in most situations.

For more info on columnstore indexes, check out the Columnstore Index FAQ on TechNet, Books Online, and my other blog posts.

Tuesday, May 20, 2014

Comparison of Columnstore Compression

SQL Server 2012 introduced non-clustered columnstore indexes, and SQL Server 2014 gave us clustered columnstore indexes. Both share the same underlying technology for performance boosts and the same compression algorithms. How well the data compresses, however, depends on the data you are storing.

SQL Server uses a mechanism of row groups, segments, encoding, and compression to store the data for columnstore indexes. First, the data is horizontally divided into row groups, where each group contains approximately one million rows. Next, the row groups are vertically partitioned into segments, where each column of the table becomes its own segment. Those segments are then encoded and compressed before being stored on disk. So if you create a columnstore index on a table that has five columns and 100,000 rows, it would result in one row group with five segments.
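
You can see this layout in the sys.column_store_segments catalog view. For the hypothetical five-column, 100,000-row table just described, a query like the following (the table name is a stand-in) would return five rows, one segment per column, all belonging to the single row group:

SELECT
     css.column_id
    ,css.segment_id    -- a single row group means segment_id 0 for every column
    ,css.row_count
    ,css.on_disk_size
FROM sys.column_store_segments css
JOIN sys.partitions p ON p.hobt_id = css.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.FiveColumnTable');  -- hypothetical table
GO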

Compression of any type will vary depending on the data that needs to be compressed. Columnstore indexes use RLE (run-length encoding) compression, which works best when you have a lot of repetitive values. To see how this plays out, let’s compare the compression of random data against the compression of highly redundant data.
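
Just to build some intuition for what RLE does (a toy illustration, not how SQL Server physically stores segments), you can compute value runs in plain T-SQL. RLE would store each (value, run length) pair instead of every individual value:

DECLARE @t TABLE (pos INT IDENTITY(1,1), val CHAR(1));
INSERT INTO @t (val)
VALUES ('a'),('a'),('a'),('b'),('b'),('a'),('c'),('c'),('c'),('c');

WITH runs AS
(
    -- classic gaps-and-islands: rows in the same run share a grp value
    SELECT pos, val,
           pos - ROW_NUMBER() OVER (PARTITION BY val ORDER BY pos) AS grp
    FROM @t
)
SELECT val, COUNT(*) AS run_length
FROM runs
GROUP BY val, grp
ORDER BY MIN(pos);
-- Returns (a,3),(b,2),(a,1),(c,4): four pairs instead of ten raw values.
GO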

First, we’ll create the following tables that contain only a single column using the CHAR data type. Each table will store 1 million rows. One table will store random values for each row, and the other will store identical values for each row.

CREATE TABLE dbo.IdenticalCharacter(Col1 CHAR(50) NULL);
GO
CREATE TABLE dbo.RandomCharacter(Col1 CHAR(50) NULL);
GO

Now we'll populate the tables with artificial data.

DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO IdenticalCharacter VALUES (REPLICATE('a',50));
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO
DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO RandomCharacter
    SELECT CAST(NEWID() AS CHAR(50));
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO

Next, we’ll create a non-clustered columnstore index on each table.

CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.IdenticalCharacter(Col1);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.RandomCharacter(Col1);
GO

Now we can calculate the size of each columnstore index using the script below. This is a modified version of the script from Books Online.

SELECT
     SchemaName
    ,TableName
    ,IndexName
    ,SUM(on_disk_size_MB) AS TotalSizeInMB
FROM
(
    SELECT
        OBJECT_SCHEMA_NAME(i.OBJECT_ID) AS SchemaName,
        OBJECT_NAME(i.OBJECT_ID ) AS TableName,
        i.name AS IndexName
        ,SUM(css.on_disk_size)/(1024.0*1024.0) AS on_disk_size_MB
    FROM sys.indexes i
    INNER JOIN sys.partitions p
        ON i.object_id = p.object_id
    INNER JOIN sys.column_store_segments css
        ON css.hobt_id = p.hobt_id
    WHERE i.type_desc = 'NONCLUSTERED COLUMNSTORE'
        AND i.index_id NOT IN (0,1)
        AND p.index_id NOT IN (0,1)
    GROUP BY OBJECT_SCHEMA_NAME(i.OBJECT_ID) ,OBJECT_NAME(i.OBJECT_ID ),i.name

    UNION ALL

    SELECT
         OBJECT_SCHEMA_NAME(i.OBJECT_ID) AS SchemaName
        ,OBJECT_NAME(i.OBJECT_ID ) AS TableName
        ,i.name AS IndexName
        ,SUM(csd.on_disk_size)/(1024.0*1024.0) AS on_disk_size_MB
    FROM sys.indexes i
    INNER JOIN sys.partitions p
        ON i.object_id = p.object_id
    INNER JOIN sys.column_store_dictionaries csd
        ON csd.hobt_id = p.hobt_id
    WHERE i.type_desc = 'NONCLUSTERED COLUMNSTORE'
        AND i.index_id NOT IN (0,1)
        AND p.index_id NOT IN (0,1)
    GROUP BY OBJECT_SCHEMA_NAME(i.OBJECT_ID) ,OBJECT_NAME(i.OBJECT_ID ),i.name
) AS SegmentsPlusDictionary
GROUP BY SchemaName,TableName,IndexName
ORDER BY SchemaName,TableName,IndexName;
GO


As you can see, the random data values did not compress nearly as much: 30MB of the total 41MB was just for the dictionaries, while only 11MB was needed for the segments. Graphing the data in Excel makes it a little bit easier to see the difference in size.


We can also compare a few other data types (integer and decimal), and we’ll see similar differences, just not as drastic.

CREATE TABLE dbo.IdenticalInteger(Col1 INT NULL);
GO
CREATE TABLE dbo.RandomInteger(Col1 INT NULL);
GO
CREATE TABLE dbo.IdenticalDecimal(Col1 DECIMAL(18,8) NULL);
GO
CREATE TABLE dbo.RandomDecimal(Col1 DECIMAL(18,8) NULL);
GO
DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO IdenticalInteger VALUES (1);
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO
DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO RandomInteger VALUES (@Counter);
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO
DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO IdenticalDecimal VALUES (1.0);
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO
DECLARE @Counter int = 0;
WHILE @Counter < 1000000
BEGIN
    INSERT INTO RandomDecimal VALUES (CAST((RAND() * @Counter) AS DECIMAL(18,8)));
    SET @Counter = @Counter+1;
END
CHECKPOINT;
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.IdenticalInteger(Col1);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.RandomInteger(Col1);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.IdenticalDecimal(Col1);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csi1 ON dbo.RandomDecimal(Col1);
GO

Now run the same query from above to see the size difference of the indexes.


Here we see similar results as before: the highly redundant data gets compressed far more than the random data. Again, graphing the data makes it a little easier to visualize the difference.


The compression algorithm used for columnstore indexes is different from the row and page data compression originally released in SQL Server 2008; however, it’s interesting to compare the differences.

For this example, we’ll use actual data from a trading application. The fact table contains over 106 million rows and 26 columns, mostly float data types, and is just over 21GB in size. Let’s compare the various types of compression: row, page, a SQL Server 2012 non-clustered columnstore index, a SQL Server 2014 non-clustered columnstore index, a SQL Server 2014 clustered columnstore index, and lastly a SQL Server 2014 clustered columnstore index with archival compression.
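
For reference, the statements behind each variant look roughly like this (a sketch; dbo.FactTrades and its columns are stand-in names for the real trading table):

-- Row and page compression:
ALTER TABLE dbo.FactTrades REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER TABLE dbo.FactTrades REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

-- Non-clustered columnstore (2012 and 2014); column list abbreviated:
CREATE NONCLUSTERED COLUMNSTORE INDEX ncsi_FactTrades
  ON dbo.FactTrades (TradeDateKey, SymbolKey, Price, Quantity);
GO

-- Clustered columnstore (2014 only), default COLUMNSTORE compression:
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactTrades ON dbo.FactTrades;
GO

-- Clustered columnstore rebuilt with archival compression (2014 only):
ALTER INDEX ccsi_FactTrades ON dbo.FactTrades
  REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
GO

Each variant was applied and measured separately; in 2014 a clustered columnstore index can’t coexist with other indexes on the same table, so you would drop one before creating the next.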


The graph above shows that row and page compression already shrink the data significantly, but the various columnstore indexes compress it even further. There are a couple of things to keep in mind. First, a non-clustered columnstore index is a separate copy of the base table, so the 7199MB 2012 columnstore index is in addition to the 21GB of raw data, for a total of just over 28GB. The two clustered columnstore indexes need only 3.4GB and 2.6GB to store the entire dataset. That’s a 6X and 8X compression difference, respectively.

You may wonder why the 2012 non-clustered columnstore index (7199MB) is larger than the ordinary row- or page-compressed data, but there really is nothing wrong; it’s just a good example of the differences between the compression algorithms. Although this example doesn’t show it, there are even cases where the columnstore compressed data can end up larger than the original data.

As you can see, columnstore indexes have the potential to highly compress your data, but it really depends on the type of data you’re storing and how redundant the data values are. The only way to know for sure is to test it with your own data.

For more info on columnstore indexes, check out the Columnstore Index FAQ on TechNet, Books Online, and my other blog posts.