Rolling a SQL Server partition function is no big deal.
First time I had to implement it, I found it complicated because I was frightened by the
SWITCH / SPLIT / MERGE / NEXT USED keywords I did not know, but as soon as I understood what was hidden behind, it became all clear.
Let's hope this post will clarify this amazing feature !
OK, just to make it clear from the beginning, I will not explain what is SQL Server partitioning... It has been widely covered on web and I have nothing to add !
Neither will I explain the best filegroup and partition strategies, which is a looong story and really depends on your specific context !
The Chuck Norris example
We need a working example. As usual when I feel lonely about finding inspiration, my friend Chuck Norris rescued me.
Suppose you are Chuck Norris and you want to store the name of all the people you saved life, and the date at which it happened. You will create an MSSQL table with 3 columns: date of intervention, first name, last name.
You save lot of lives (remember who you are), then the table will become huge in a few days. You have limited storage, the only choice you have is to keep only 5 days of data. Therefore, every night, you will run a job which will remove all the rows which are outdated.
They are many ways to implement this job:
- Simple answer is to run a DELETE query. Hum... good idea if your table has only few records. If your table is big, DELETE operation may be very slow, fill your transactional log, and finally fail.
- Another answer is to use a WHILE (do not mention cursor please) and DELETE the rows per block. Nice... depending of the indexes of your table, it may solve the "transactional log full" issue, but it will still be slow.
- Another way is to create several tables, one per day, and use a UNION ALL view to "merge" all the dates. In this design, deleting outdated rows means DROP the oldest table. Do not laugh please, this is very serious. It is called partitioned views in SQL Server 7.0 and 2000, and it was, in my opinion, the best option when table partitioning did not exist.
- Last option is to use a sliding window partitioning. If your design is perfect, partition rolling will be quick and will not overload your transactional log. Partitioning is good, but not in every case. I recommend you to read this post from Brent Ozar PLF which gives some alerts about partitioning usage (for instance you need the Enterprise version of MSSQL).
Rolling a sliding window partition
Overview
You can consider a sliding window partition as a wood box with a fixed number of slots you can use. But, in this wood box, there are two slots more you should never use : the
-infinite slot and the
+infinite slot (I will explain why later).
|
Sliding window partition : usable slots and forbidden slots |
The picture above describes the Chuck Norris table. Suppose we are on 2012/01/05, meaning we want to keep people saved between 2012/01/01 and 2012/01/05 (included). Table is partitioned in 7 slots :
- The slot which contains saved people between ages and 2012/01/01. It is red because it shall always be empty.
- The slot which contains saved people between 2012/01/01 and 2012/01/02. It is blue because it may contain data.
- The slot for people saved on 2012/01/02.
- The slot for people saved on 2012/01/03.
- The slot for people saved on 2012/01/04.
- The slot for people saved on 2012/01/05.
- The slot which contains saved people after 2012/01/06. Again, it is red because it is future and shall always be empty.
When you want to operate a roll on this box, you have to free the oldest slot (not the -infinite one, but the very next one in blue), then merge the it with the -infinite one (it is quick because both are empty), and finally split the +infinite one in two empty slots.
I guess this is pretty clear, and I swear the next sections are nothing more complicated than this !
Step 1 : Create the Chuck scheme, table, partition function and scheme
Here we will create the big partitioned table in which Chuck will store the guys he saved life. But, since this table is partitioned, we have to create the corresponding partition function and partition scheme.
The following SQL is a bit long but does nothing complicated. It creates the objects I just mentioned, and fill the Chuck table with a few rows. Additionally, I recommend to create constraints on partitioned table to make sure no data is inserted out-of-bounds. This could lead to big slowness's in partition roll.
-- Create a partition function to store data for 5 days
CREATE PARTITION FUNCTION ChuckPartitionFunction(DATE) AS RANGE RIGHT FOR VALUES
(
CAST('2012/01/01' AS DATE), CAST('2012/01/02' AS DATE),
CAST('2012/01/03' AS DATE), CAST('2012/01/04' AS DATE),
CAST('2012/01/05' AS DATE), CAST('2012/01/06' AS DATE)
)
GO
-- Create a partition scheme to map previous function on filegroups
CREATE PARTITION SCHEME ChuckPartitionScheme AS PARTITION ChuckPartitionFunction TO
(
[PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]
)
GO
CREATE SCHEMA chuck
GO
-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" : table is partitioned
CREATE TABLE chuck.[SavedPeople]
(
SaveDate DATE,
FirstName VARCHAR(256),
LastName VARCHAR(256)
) ON ChuckPartitionScheme(SaveDate)
GO
-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" :
-- clustered index is partitioned too
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople
( SaveDate ASC )
ON ChuckPartitionScheme(SaveDate)
GO
-- Create a constraint to ensure no data in inserted out-of partitioned dates
ALTER TABLE chuck.SavedPeople
WITH NOCHECK ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
CHECK ( SaveDate >= CAST('2012/01/01' AS DATE)
AND SaveDate < CAST('2012/01/06' AS DATE) )
GO
-- Insert a few rows in original table
INSERT INTO chuck.SavedPeople (SaveDate, FirstName, LastName)
VALUES
(CAST('2012/01/01' AS DATE), 'Attila', 'Le Hun'),
(CAST('2012/01/01' AS DATE), 'Napoleon', 'Bonaparte'),
(CAST('2012/01/02' AS DATE), 'Alexandre', 'Dumas'),
(CAST('2012/01/02' AS DATE), 'Emile', 'Zola'),
(CAST('2012/01/03' AS DATE), 'Edgar Allan', 'Poe'),
(CAST('2012/01/03' AS DATE), 'Francoise', 'Chandernagor'),
(CAST('2012/01/04' AS DATE), 'Honore', 'De Balzac'),
(CAST('2012/01/04' AS DATE), 'Robert', 'Merle'),
(CAST('2012/01/05' AS DATE), 'Marcel', 'Proust'),
(CAST('2012/01/05' AS DATE), 'Jean', 'Giono')
GO
-- Insert more and more rows in original table. This may be time consuming.
DECLARE @COUNTER INT = 0
WHILE @COUNTER < 18
BEGIN
INSERT INTO chuck.SavedPeople
SELECT * FROM chuck.SavedPeople
SET @COUNTER = @COUNTER + 1
END
-- Ensure the original data has enough rows
SELECT COUNT(*) FROM chuck.SavedPeople
Step 2 : Create a temporary box in which you will free the out-of-date slot
When you want to roll a partition, you have to switch the slot you want to free into a new table. This later table must have the exact same structure (including clustered index) as the table you want to roll, else SQL engine will not be happy. Furthermore, this new table may be partitioned the same way the original table is, or may not be... the only restriction is that it must reside in the same filegroup (and this table must be empty). In my example it is easy because I am using
PRIMARY (bad bad bad !) Hence you cannot use a temp table because it would be created in tempdb database, which is not the same filegroup !
-- Create a table with the exact same structure as the original one
CREATE TABLE chuck.SavedPeople_TobeDropped
(
SaveDate DATE,
FirstName VARCHAR(256),
LastName VARCHAR(256)
)
GO
-- Clustered index has to be created too !!!
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople_TobeDropped
( SaveDate ASC )
Step 3 : Free the out-of-date slot
Once new table is created, you have to use
SWITCH keyword to move data from slot of original table (purple one) to new table (green one). After this operation, your original table does not contain anymore the data you wanted to free, they have been instantly moved to the new table. You just need to
DROP this new table to get rid of the data. No big
DELETE need.
|
To switch a partition with lot of data into an empty table is efficient |
-- Move data from slot 2 of original table to the new table
ALTER TABLE chuck.SavedPeople SWITCH PARTITION 2 TO chuck.SavedPeople_TobeDropped
GO
-- Ensure that data have been moved
select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01
select COUNT(*) from chuck.SavedPeople_TobeDropped -- Only 2012/01/01
-- Destroy the temporary table in which we did operate the switch
DROP TABLE chuck.SavedPeople_TobeDropped
GO
select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01
Step 4 : Merge and split partition function
Until now, we just destroyed the data we did not want to store anymore. We did not slide (or roll) the window partition. This is done using the keywords
MERGE and
SPLIT (with
NEXT USED).
|
Split and merge of partitions must be performed on EMPTY slots |
MERGE deletes a bound from partition function, which is equivalent to regroup 2 slots into 1 single slot. This operation is efficient only if the slots you try to merge are empty, else it may consume time and space (transactional log). That is why I said that the first slot must always be empty (remember the constraint we added on Chuck table).
SPLIT creates 2 slots from 1 single slot (which must be empty). But, you will ask : where is the new partition created ? Good question ! Before execution
SPLIT statement, you shall execute
NEXT USED statement to tell SQL engine which filegroup to use on the new slot. Note that
NEXT USED statement has to be called for every partition scheme using the partition function, since the filegroups are attached to scheme, not to functions. In the Chuck example, this is not important because there is only one partition scheme.
-- Merge the 2 first slots which should be empty
ALTER PARTITION FUNCTION ChuckPartitionFunction()
MERGE RANGE (CAST('2012/01/01' AS DATE))
-- Prepare the SPLIT operation by telling the SQL
-- engine we will create a new slot in PRIMARY filegroup
ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY];
-- Split the last slot which should be empty :
-- create a new bound in partition function
ALTER PARTITION FUNCTION ChuckPartitionFunction()
SPLIT RANGE (CAST('2012/01/07' AS DATE))
Step 5 : Update the constraints
It is almost over ! One last thing before leaving : we shall update the constraint we defined on Chuck table. Currently, it is checking that any data in the table has a SaveDate between 1st of January and 5th of January. Now, since we rolled the partition, it should be between 2nd of January and 6th of January.
-- Update the previous constraint on dates
ALTER TABLE chuck.SavedPeople
DROP CONSTRAINT ChuckSavedPeople_PartitionDateCheck
ALTER TABLE chuck.SavedPeople WITH NOCHECK
ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
CHECK ( SaveDate >= CAST('2012/01/02' AS DATE)
AND SaveDate < CAST('2012/01/07' AS DATE) )
Before running in production...
Wait a minute before releasing this in production :)
First of all, I put
GO everywhere, which is not a good idea except for demo purpose.
Then, I highly recommend to put the steps 2 to 5 in a transaction, because if something goes wrong during these steps I do not guarantee stability of your partition functions and table. The step 1 is not part the partition roll (it may be an ETL process which feeds the big table), that is why I do not include it in transaction.
What is more, even though the partition roll is simple enough, it may keep some surprises, so you should stress test your code before running it in a sensitive environment. I encountered following issues which took some time to solve:
- locking issue : perform the roll while tables are being fed is a bad idea.
- bulk insert may disable constraints on table while insertion, meaning that data may be inserted in an out-of date slot ! This is dramatic because SPLIT and MERGE need empty slots in order to be efficient !
- if you have several partition scheme based on the same partition function, the above code becomes a bit more complex.
Finally, you noticed I hard-coded dates in the SQL script. Once again, this is only for demo purpose. In real production context you will have to dynamically compute the new dates (or bounds, more generally).
It is possible to write a facility (in C# for instance) using SMO and system tables to generate the roll script with the proper bounds every day (or week or whatever). But, this is very context-dependent and is hard to generalize... so my opinion is to understand the partition rolling and have fun writing your own one !
References
Partitioning within MSSQL 2005
http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx
CREATE PARTITION FUNCTION by MSDN
http://msdn.microsoft.com/en-us/library/ms187802%28v=sql.100%29.aspx
ALTER TABLE by MSDN
http://msdn.microsoft.com/en-us/library/ms190273%28v=sql.100%29.aspx
ALTER PARTITION by MSDN
http://msdn.microsoft.com/en-us/library/ms186307(v=sql.100).aspx
SQL Server Partitioning: Not the Best Practices for Everything
http://www.brentozar.com/archive/2008/06/sql-server-partitioning-not-the-answer-to-everything/