First time I had to implement it, I found it complicated because I was frightened by the SWITCH / SPLIT / MERGE / NEXT USED keywords I did not know, but as soon as I understood what was hidden behind, it became all clear.
Let's hope this post will clarify this amazing feature !
OK, just to make it clear from the beginning, I will not explain what is SQL Server partitioning... It has been widely covered on web and I have nothing to add !
Neither will I explain the best filegroup and partition strategies, which is a looong story and really depends on your specific context !
The Chuck Norris example
We need a working example. As usual when I feel lonely about finding inspiration, my friend Chuck Norris rescued me.Suppose you are Chuck Norris and you want to store the name of all the people you saved life, and the date at which it happened. You will create an MSSQL table with 3 columns: date of intervention, first name, last name.
You save lot of lives (remember who you are), then the table will become huge in a few days. You have limited storage, the only choice you have is to keep only 5 days of data. Therefore, every night, you will run a job which will remove all the rows which are outdated.
They are many ways to implement this job:
- Simple answer is to run a DELETE query. Hum... good idea if your table has only few records. If your table is big, DELETE operation may be very slow, fill your transactional log, and finally fail.
- Another answer is to use a WHILE (do not mention cursor please) and DELETE the rows per block. Nice... depending of the indexes of your table, it may solve the "transactional log full" issue, but it will still be slow.
- Another way is to create several tables, one per day, and use a UNION ALL view to "merge" all the dates. In this design, deleting outdated rows means DROP the oldest table. Do not laugh please, this is very serious. It is called partitioned views in SQL Server 7.0 and 2000, and it was, in my opinion, the best option when table partitioning did not exist.
- Last option is to use a sliding window partitioning. If your design is perfect, partition rolling will be quick and will not overload your transactional log. Partitioning is good, but not in every case. I recommend you to read this post from Brent Ozar PLF which gives some alerts about partitioning usage (for instance you need the Enterprise version of MSSQL).
Rolling a sliding window partition
Overview
You can consider a sliding window partition as a wood box with a fixed number of slots you can use. But, in this wood box, there are two slots more you should never use : the -infinite slot and the +infinite slot (I will explain why later).Sliding window partition : usable slots and forbidden slots |
- The slot which contains saved people between ages and 2012/01/01. It is red because it shall always be empty.
- The slot which contains saved people between 2012/01/01 and 2012/01/02. It is blue because it may contain data.
- The slot for people saved on 2012/01/02.
- The slot for people saved on 2012/01/03.
- The slot for people saved on 2012/01/04.
- The slot for people saved on 2012/01/05.
- The slot which contains saved people after 2012/01/06. Again, it is red because it is future and shall always be empty.
I guess this is pretty clear, and I swear the next sections are nothing more complicated than this !
Step 1 : Create the Chuck scheme, table, partition function and scheme
Here we will create the big partitioned table in which Chuck will store the guys he saved life. But, since this table is partitioned, we have to create the corresponding partition function and partition scheme.The following SQL is a bit long but does nothing complicated. It creates the objects I just mentioned, and fill the Chuck table with a few rows. Additionally, I recommend to create constraints on partitioned table to make sure no data is inserted out-of-bounds. This could lead to big slowness's in partition roll.
-- Create a partition function to store data for 5 days CREATE PARTITION FUNCTION ChuckPartitionFunction(DATE) AS RANGE RIGHT FOR VALUES ( CAST('2012/01/01' AS DATE), CAST('2012/01/02' AS DATE), CAST('2012/01/03' AS DATE), CAST('2012/01/04' AS DATE), CAST('2012/01/05' AS DATE), CAST('2012/01/06' AS DATE) ) GO -- Create a partition scheme to map previous function on filegroups CREATE PARTITION SCHEME ChuckPartitionScheme AS PARTITION ChuckPartitionFunction TO ( [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY] ) GO CREATE SCHEMA chuck GO -- Notice the final "ON [ChuckPartitionScheme](SaveDate)" : table is partitioned CREATE TABLE chuck.[SavedPeople] ( SaveDate DATE, FirstName VARCHAR(256), LastName VARCHAR(256) ) ON ChuckPartitionScheme(SaveDate) GO -- Notice the final "ON [ChuckPartitionScheme](SaveDate)" : -- clustered index is partitioned too CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople ( SaveDate ASC ) ON ChuckPartitionScheme(SaveDate) GO -- Create a constraint to ensure no data in inserted out-of partitioned dates ALTER TABLE chuck.SavedPeople WITH NOCHECK ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck CHECK ( SaveDate >= CAST('2012/01/01' AS DATE) AND SaveDate < CAST('2012/01/06' AS DATE) ) GO -- Insert a few rows in original table INSERT INTO chuck.SavedPeople (SaveDate, FirstName, LastName) VALUES (CAST('2012/01/01' AS DATE), 'Attila', 'Le Hun'), (CAST('2012/01/01' AS DATE), 'Napoleon', 'Bonaparte'), (CAST('2012/01/02' AS DATE), 'Alexandre', 'Dumas'), (CAST('2012/01/02' AS DATE), 'Emile', 'Zola'), (CAST('2012/01/03' AS DATE), 'Edgar Allan', 'Poe'), (CAST('2012/01/03' AS DATE), 'Francoise', 'Chandernagor'), (CAST('2012/01/04' AS DATE), 'Honore', 'De Balzac'), (CAST('2012/01/04' AS DATE), 'Robert', 'Merle'), (CAST('2012/01/05' AS DATE), 'Marcel', 'Proust'), (CAST('2012/01/05' AS DATE), 'Jean', 'Giono') GO -- Insert more and more rows in original table. This may be time consuming. DECLARE @COUNTER INT = 0 WHILE @COUNTER < 18 BEGIN INSERT INTO chuck.SavedPeople SELECT * FROM chuck.SavedPeople SET @COUNTER = @COUNTER + 1 END -- Ensure the original data has enough rows SELECT COUNT(*) FROM chuck.SavedPeople
Step 2 : Create a temporary box in which you will free the out-of-date slot
When you want to roll a partition, you have to switch the slot you want to free into a new table. This later table must have the exact same structure (including clustered index) as the table you want to roll, else SQL engine will not be happy. Furthermore, this new table may be partitioned the same way the original table is, or may not be... the only restriction is that it must reside in the same filegroup (and this table must be empty). In my example it is easy because I am using PRIMARY (bad bad bad !) Hence you cannot use a temp table because it would be created in tempdb database, which is not the same filegroup !-- Create a table with the exact same structure as the original one CREATE TABLE chuck.SavedPeople_TobeDropped ( SaveDate DATE, FirstName VARCHAR(256), LastName VARCHAR(256) ) GO -- Clustered index has to be created too !!! CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople_TobeDropped ( SaveDate ASC )
Step 3 : Free the out-of-date slot
Once new table is created, you have to use SWITCH keyword to move data from slot of original table (purple one) to new table (green one). After this operation, your original table does not contain anymore the data you wanted to free, they have been instantly moved to the new table. You just need to DROP this new table to get rid of the data. No big DELETE need.To switch a partition with lot of data into an empty table is efficient |
-- Move data from slot 2 of original table to the new table ALTER TABLE chuck.SavedPeople SWITCH PARTITION 2 TO chuck.SavedPeople_TobeDropped GO -- Ensure that data have been moved select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01 select COUNT(*) from chuck.SavedPeople_TobeDropped -- Only 2012/01/01 -- Destroy the temporary table in which we did operate the switch DROP TABLE chuck.SavedPeople_TobeDropped GO select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01
Step 4 : Merge and split partition function
Until now, we just destroyed the data we did not want to store anymore. We did not slide (or roll) the window partition. This is done using the keywords MERGE and SPLIT (with NEXT USED).Split and merge of partitions must be performed on EMPTY slots |
SPLIT creates 2 slots from 1 single slot (which must be empty). But, you will ask : where is the new partition created ? Good question ! Before execution SPLIT statement, you shall execute NEXT USED statement to tell SQL engine which filegroup to use on the new slot. Note that NEXT USED statement has to be called for every partition scheme using the partition function, since the filegroups are attached to scheme, not to functions. In the Chuck example, this is not important because there is only one partition scheme.
-- Merge the 2 first slots which should be empty ALTER PARTITION FUNCTION ChuckPartitionFunction() MERGE RANGE (CAST('2012/01/01' AS DATE)) -- Prepare the SPLIT operation by telling the SQL -- engine we will create a new slot in PRIMARY filegroup ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY]; -- Split the last slot which should be empty : -- create a new bound in partition function ALTER PARTITION FUNCTION ChuckPartitionFunction() SPLIT RANGE (CAST('2012/01/07' AS DATE))
Step 5 : Update the constraints
It is almost over ! One last thing before leaving : we shall update the constraint we defined on Chuck table. Currently, it is checking that any data in the table has a SaveDate between 1st of January and 5th of January. Now, since we rolled the partition, it should be between 2nd of January and 6th of January.-- Update the previous constraint on dates ALTER TABLE chuck.SavedPeople DROP CONSTRAINT ChuckSavedPeople_PartitionDateCheck ALTER TABLE chuck.SavedPeople WITH NOCHECK ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck CHECK ( SaveDate >= CAST('2012/01/02' AS DATE) AND SaveDate < CAST('2012/01/07' AS DATE) )
Before running in production...
Wait a minute before releasing this in production :)First of all, I put GO everywhere, which is not a good idea except for demo purpose.
Then, I highly recommend to put the steps 2 to 5 in a transaction, because if something goes wrong during these steps I do not guarantee stability of your partition functions and table. The step 1 is not part the partition roll (it may be an ETL process which feeds the big table), that is why I do not include it in transaction.
What is more, even though the partition roll is simple enough, it may keep some surprises, so you should stress test your code before running it in a sensitive environment. I encountered following issues which took some time to solve:
- locking issue : perform the roll while tables are being fed is a bad idea.
- bulk insert may disable constraints on table while insertion, meaning that data may be inserted in an out-of date slot ! This is dramatic because SPLIT and MERGE need empty slots in order to be efficient !
- if you have several partition scheme based on the same partition function, the above code becomes a bit more complex.
Finally, you noticed I hard-coded dates in the SQL script. Once again, this is only for demo purpose. In real production context you will have to dynamically compute the new dates (or bounds, more generally).
It is possible to write a facility (in C# for instance) using SMO and system tables to generate the roll script with the proper bounds every day (or week or whatever). But, this is very context-dependent and is hard to generalize... so my opinion is to understand the partition rolling and have fun writing your own one !
References
Partitioning within MSSQL 2005http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx
CREATE PARTITION FUNCTION by MSDN
http://msdn.microsoft.com/en-us/library/ms187802%28v=sql.100%29.aspx
ALTER TABLE by MSDN
http://msdn.microsoft.com/en-us/library/ms190273%28v=sql.100%29.aspx
ALTER PARTITION by MSDN
http://msdn.microsoft.com/en-us/library/ms186307(v=sql.100).aspx
SQL Server Partitioning: Not the Best Practices for Everything
http://www.brentozar.com/archive/2008/06/sql-server-partitioning-not-the-answer-to-everything/
HI,
ReplyDeleteIn Production we are using Last Partition (+Infinite Partition) and there is no partition on Right side.
We are trying to Add Another on right Side for March 2022 but Splitting taking Lock on that blocking insert.
Can you help How Can split Range Without taking Lock on table ?