Monday, February 13, 2012

Display long SQL text within Management Studio

Did you ever try to display long varchar's in Management Studio query result ? If you use the default option "Display result to grids" (as I do), it is a pain because:
  1. if the text is very long, it may be truncated in the grid
  2. if your text contains line-feeds, they are not displayed in the grid
My tip for this issue is to convert your text into XML, so that when you click on the grid cell containing your text, it is well displayed.

The expression to convert into XML is the following. Pay attention to the special characters that must be escaped before casting to XML type !
CAST(REPLACE(REPLACE(YourVarcharColumn, '&', '&amp;'), '<', '&lt;') AS XML)

Let's take an example:
/* ******************* */
/* CREATE SAMPLE TABLE */
/* ******************* */

CREATE TABLE OneTable ( Id INT IDENTITY(1, 1), LogDate DATETIME, TextColumn VARCHAR(4000) )


/* ******************************** */
/* FILL SAMPLE TABLE WITH SOME DATA */
/* ******************************** */

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Mignonne, allons voir si la rose
Qui ce matin avoit desclose
Sa robe de pourpre au Soleil,
A point perdu ceste vesprée
Les plis de sa robe pourprée,
Et son teint au vostre pareil.')

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Las ! voyez comme en peu d''espace,
Mignonne, elle a dessus la place
Las ! las ses beautez laissé cheoir !
Ô vrayment marastre Nature,
Puis qu''une telle fleur ne dure
Que du matin jusques au soir !')

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Donc, si vous me croyez, mignonne,
Tandis que vostre âge fleuronne
En sa plus verte nouveauté,
Cueillez, cueillez vostre jeunesse :
Comme à ceste fleur la vieillesse
Fera ternir vostre beauté.')


/* ************************* */
/* SELECT DATA THE USUAL WAY */
/* ************************* */

SELECT Id, LogDate, TextColumn
FROM OneTable


/* **************************** */
/* SELECT DATA WITH CAST AS XML */
/* **************************** */

SELECT Id, LogDate, CAST(REPLACE(REPLACE(TextColumn, '&', '&amp;'), '<', '&lt;') AS XML) AS TextColumn
FROM OneTable


/* ************************* */
/* CLEAN UP THE SAMPLE STUFF */
/* ************************* */

DROP TABLE OneTable

The first select, using varchar type, will return something like that:
Even if you can resize the column, admit this is not very user-friendly.
The second select, using XML cast, will return something like that:
Well... you will tell me this not more sexy... But if you click on the XML cells, it will display the exact text (except escape characters):

If you need to use it frequently, I recommend to create a scalar-valued function to do the job, which avoids you to type the boring replace/replace/cast expression !
CREATE FUNCTION CastVarcharToXml ( @Input VARCHAR(MAX) )
RETURNS XML
AS
BEGIN
 RETURN CAST(REPLACE(REPLACE(@Input, '&', '&amp;'), '<', '&lt;') AS XML)
END
Clearly, the limitations are :
  1. you have to click on every text you want to full-display out-of the grid
  2. the characters you had to escape (< and &) will be printed as escaped characters
Anyway, this might help you if you log long texts in your db, like .Net exceptions with call stack !

Friday, February 10, 2012

Rolling a SQL Server partition made easy

Rolling a SQL Server partition function is no big deal.
First time I had to implement it, I found it complicated because I was frightened by the SWITCH / SPLIT / MERGE / NEXT USED keywords I did not know, but as soon as I understood what was hidden behind, it became all clear.
Let's hope this post will clarify this amazing feature !

OK, just to make it clear from the beginning, I will not explain what is SQL Server partitioning... It has been widely covered on web and I have nothing to add !
Neither will I explain the best filegroup and partition strategies, which is a looong story and really depends on your specific context !

The Chuck Norris example

We need a working example. As usual when I feel lonely about finding inspiration, my friend Chuck Norris rescued me.
Suppose you are Chuck Norris and you want to store the name of all the people you saved life, and the date at which it happened. You will create an MSSQL table with 3 columns: date of intervention, first name, last name.
You save lot of lives (remember who you are), then the table will become huge in a few days. You have limited storage, the only choice you have is to keep only 5 days of data. Therefore, every night, you will run a job which will remove all the rows which are outdated.

They are many ways to implement this job:
  • Simple answer is to run a DELETE query. Hum... good idea if your table has only few records. If your table is big, DELETE operation may be very slow, fill your transactional log, and finally fail.
  • Another answer is to use a WHILE (do not mention cursor please) and DELETE the rows per block. Nice... depending of the indexes of your table, it may solve the "transactional log full" issue, but it will still be slow.
  • Another way is to create several tables, one per day, and use a UNION ALL view to "merge" all the dates. In this design, deleting outdated rows means DROP the oldest table. Do not laugh please, this is very serious. It is called partitioned views in SQL Server 7.0 and 2000, and it was, in my opinion, the best option when table partitioning did not exist.
  • Last option is to use a sliding window partitioning. If your design is perfect, partition rolling will be quick and will not overload your transactional log. Partitioning is good, but not in every case. I recommend you to read this post from Brent Ozar PLF which gives some alerts about partitioning usage (for instance you need the Enterprise version of MSSQL).

Rolling a sliding window partition

Overview

You can consider a sliding window partition as a wood box with a fixed number of slots you can use. But, in this wood box, there are two slots more you should never use : the -infinite slot and the +infinite slot (I will explain why later).
Sliding window partition : usable slots and forbidden slots
The picture above describes the Chuck Norris table. Suppose we are on 2012/01/05, meaning we want to keep people saved between 2012/01/01 and 2012/01/05 (included). Table is partitioned in 7 slots :
  1. The slot which contains saved people between ages and 2012/01/01. It is red because it shall always be empty.
  2. The slot which contains saved people between 2012/01/01 and 2012/01/02. It is blue because it may contain data.
  3. The slot for people saved on 2012/01/02.
  4. The slot for people saved on 2012/01/03.
  5. The slot for people saved on 2012/01/04.
  6. The slot for people saved on 2012/01/05.
  7. The slot which contains saved people after 2012/01/06. Again, it is red because it is future and shall always be empty.
When you want to operate a roll on this box, you have to free the oldest slot (not the -infinite one, but the very next one in blue), then merge the it with the -infinite one (it is quick because both are empty), and finally split the +infinite one in two empty slots.
I guess this is pretty clear, and I swear the next sections are nothing more complicated than this !

Step 1 : Create the Chuck scheme, table, partition function and scheme

Here we will create the big partitioned table in which Chuck will store the guys he saved life. But, since this table is partitioned, we have to create the corresponding partition function and partition scheme.
The following SQL is a bit long but does nothing complicated. It creates the objects I just mentioned, and fill the Chuck table with a few rows. Additionally, I recommend to create constraints on partitioned table to make sure no data is inserted out-of-bounds. This could lead to big slowness's in partition roll.

-- Create a partition function to store data for 5 days
CREATE PARTITION FUNCTION ChuckPartitionFunction(DATE) AS RANGE RIGHT FOR VALUES
(
    CAST('2012/01/01' AS DATE), CAST('2012/01/02' AS DATE),
    CAST('2012/01/03' AS DATE), CAST('2012/01/04' AS DATE),
    CAST('2012/01/05' AS DATE), CAST('2012/01/06' AS DATE)
)

GO

-- Create a partition scheme to map previous function on filegroups
CREATE PARTITION SCHEME ChuckPartitionScheme AS PARTITION ChuckPartitionFunction TO
(
    [PRIMARY], [PRIMARY], [PRIMARY],
    [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]
)

GO

CREATE SCHEMA chuck

GO

-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" : table is partitioned
CREATE TABLE chuck.[SavedPeople]
(
    SaveDate DATE,
    FirstName VARCHAR(256),
    LastName VARCHAR(256)
) ON ChuckPartitionScheme(SaveDate)

GO

-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" :
-- clustered index is partitioned too
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople
( SaveDate ASC )
ON ChuckPartitionScheme(SaveDate)

GO


-- Create a constraint to ensure no data in inserted out-of partitioned dates
ALTER TABLE chuck.SavedPeople
WITH NOCHECK ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
    CHECK ( SaveDate >= CAST('2012/01/01' AS DATE)
    AND SaveDate < CAST('2012/01/06' AS DATE) )

GO

-- Insert a few rows in original table
INSERT INTO chuck.SavedPeople (SaveDate, FirstName, LastName)
VALUES
(CAST('2012/01/01' AS DATE), 'Attila', 'Le Hun'),
(CAST('2012/01/01' AS DATE), 'Napoleon', 'Bonaparte'),
(CAST('2012/01/02' AS DATE), 'Alexandre', 'Dumas'),
(CAST('2012/01/02' AS DATE), 'Emile', 'Zola'),
(CAST('2012/01/03' AS DATE), 'Edgar Allan', 'Poe'),
(CAST('2012/01/03' AS DATE), 'Francoise', 'Chandernagor'),
(CAST('2012/01/04' AS DATE), 'Honore', 'De Balzac'),
(CAST('2012/01/04' AS DATE), 'Robert', 'Merle'),
(CAST('2012/01/05' AS DATE), 'Marcel', 'Proust'),
(CAST('2012/01/05' AS DATE), 'Jean', 'Giono')

GO

-- Insert more and more rows in original table. This may be time consuming.
DECLARE @COUNTER INT = 0
WHILE @COUNTER < 18
BEGIN
 INSERT INTO chuck.SavedPeople
 SELECT * FROM chuck.SavedPeople
 SET @COUNTER = @COUNTER + 1
END

-- Ensure the original data has enough rows
SELECT COUNT(*) FROM chuck.SavedPeople

Step 2 : Create a temporary box in which you will free the out-of-date slot

When you want to roll a partition, you have to switch the slot you want to free into a new table. This later table must have the exact same structure (including clustered index) as the table you want to roll, else SQL engine will not be happy. Furthermore, this new table may be partitioned the same way the original table is, or may not be... the only restriction is that it must reside in the same filegroup (and this table must be empty). In my example it is easy because I am using PRIMARY (bad bad bad !) Hence you cannot use a temp table because it would be created in tempdb database, which is not the same filegroup !

-- Create a table with the exact same structure as the original one
CREATE TABLE chuck.SavedPeople_TobeDropped
(
    SaveDate DATE,
    FirstName VARCHAR(256),
    LastName VARCHAR(256)
)

GO

-- Clustered index has to be created too !!!
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople_TobeDropped
( SaveDate ASC )

Step 3 : Free the out-of-date slot

Once new table is created, you have to use SWITCH keyword to move data from slot of original table (purple one) to new table (green one). After this operation, your original table does not contain anymore the data you wanted to free, they have been instantly moved to the new table. You just need to DROP this new table to get rid of the data. No big DELETE need.
To switch a partition with lot of data into an empty table is efficient
-- Move data from slot 2 of original table to the new table
ALTER TABLE chuck.SavedPeople SWITCH PARTITION 2 TO chuck.SavedPeople_TobeDropped

GO

-- Ensure that data have been moved
select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01
select COUNT(*) from chuck.SavedPeople_TobeDropped -- Only 2012/01/01

-- Destroy the temporary table in which we did operate the switch
DROP TABLE chuck.SavedPeople_TobeDropped

GO

select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01

Step 4 : Merge and split partition function

Until now, we just destroyed the data we did not want to store anymore. We did not slide (or roll) the window partition. This is done using the keywords MERGE and SPLIT (with NEXT USED).
Split and merge of partitions must be performed on EMPTY slots
MERGE deletes a bound from partition function, which is equivalent to regroup 2 slots into 1 single slot. This operation is efficient only if the slots you try to merge are empty, else it may consume time and space (transactional log). That is why I said that the first slot must always be empty (remember the constraint we added on Chuck table).

SPLIT creates 2 slots from 1 single slot (which must be empty). But, you will ask : where is the new partition created ? Good question ! Before execution SPLIT statement, you shall execute NEXT USED statement to tell SQL engine which filegroup to use on the new slot. Note that NEXT USED statement has to be called for every partition scheme using the partition function, since the filegroups are attached to scheme, not to functions. In the Chuck example, this is not important because there is only one partition scheme.

-- Merge the 2 first slots which should be empty
ALTER PARTITION FUNCTION ChuckPartitionFunction()
    MERGE RANGE (CAST('2012/01/01' AS DATE))

-- Prepare the SPLIT operation by telling the SQL
-- engine we will create a new slot in PRIMARY filegroup
ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY];

-- Split the last slot which should be empty :
-- create a new bound in partition function
ALTER PARTITION FUNCTION ChuckPartitionFunction()
    SPLIT RANGE (CAST('2012/01/07' AS DATE))

Step 5 : Update the constraints

It is almost over ! One last thing before leaving : we shall update the constraint we defined on Chuck table. Currently, it is checking that any data in the table has a SaveDate between 1st of January and 5th of January. Now, since we rolled the partition, it should be between 2nd of January and 6th of January.

-- Update the previous constraint on dates
ALTER TABLE chuck.SavedPeople
    DROP CONSTRAINT ChuckSavedPeople_PartitionDateCheck

ALTER TABLE chuck.SavedPeople WITH NOCHECK
    ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
    CHECK  ( SaveDate >= CAST('2012/01/02' AS DATE)
    AND SaveDate < CAST('2012/01/07' AS DATE) )

Before running in production...

Wait a minute before releasing this in production :)
First of all, I put GO everywhere, which is not a good idea except for demo purpose.
Then, I highly recommend to put the steps 2 to 5 in a transaction, because if something goes wrong during these steps I do not guarantee stability of your partition functions and table. The step 1 is not part the partition roll (it may be an ETL process which feeds the big table), that is why I do not include it in transaction.
What is more, even though the partition roll is simple enough, it may keep some surprises, so you should stress test your code before running it in a sensitive environment. I encountered following issues which took some time to solve:
  • locking issue : perform the roll while tables are being fed is a bad idea.
  • bulk insert may disable constraints on table while insertion, meaning that data may be inserted in an out-of date slot ! This is dramatic because SPLIT and MERGE need empty slots in order to be efficient !
  • if you have several partition scheme based on the same partition function, the above code becomes a bit more complex.
Finally, you noticed I hard-coded dates in the SQL script. Once again, this is only for demo purpose. In real production context you will have to dynamically compute the new dates (or bounds, more generally).
It is possible to write a facility (in C# for instance) using SMO and system tables to generate the roll script with the proper bounds every day (or week or whatever). But, this is very context-dependent and is hard to generalize... so my opinion is to understand the partition rolling and have fun writing your own one !

References

Partitioning within MSSQL 2005
http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx

CREATE PARTITION FUNCTION by MSDN
http://msdn.microsoft.com/en-us/library/ms187802%28v=sql.100%29.aspx

ALTER TABLE by MSDN
http://msdn.microsoft.com/en-us/library/ms190273%28v=sql.100%29.aspx

ALTER PARTITION by MSDN
http://msdn.microsoft.com/en-us/library/ms186307(v=sql.100).aspx

SQL Server Partitioning: Not the Best Practices for Everything
http://www.brentozar.com/archive/2008/06/sql-server-partitioning-not-the-answer-to-everything/