Monday, February 13, 2012

Display long SQL text within Management Studio

Have you ever tried to display long varchars in a Management Studio query result? If you use the default option "Results to Grid" (as I do), it is a pain because:
  1. if the text is very long, it may be truncated in the grid
  2. if your text contains line-feeds, they are not displayed in the grid
My tip for this issue is to convert the text into XML, so that when you click on the grid cell containing it, the full text is displayed properly.

The expression to convert into XML is the following. Pay attention to the special characters that must be escaped before casting to the XML type!
CAST(REPLACE(REPLACE(YourVarcharColumn, '&', '&amp;'), '<', '&lt;') AS XML)

Let's take an example:
/* ******************* */
/* CREATE SAMPLE TABLE */
/* ******************* */

CREATE TABLE OneTable ( Id INT IDENTITY(1, 1), LogDate DATETIME, TextColumn VARCHAR(4000) )


/* ******************************** */
/* FILL SAMPLE TABLE WITH SOME DATA */
/* ******************************** */

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Mignonne, allons voir si la rose
Qui ce matin avoit desclose
Sa robe de pourpre au Soleil,
A point perdu ceste vesprée
Les plis de sa robe pourprée,
Et son teint au vostre pareil.')

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Las ! voyez comme en peu d''espace,
Mignonne, elle a dessus la place
Las ! las ses beautez laissé cheoir !
Ô vrayment marastre Nature,
Puis qu''une telle fleur ne dure
Que du matin jusques au soir !')

INSERT INTO OneTable ( LogDate, TextColumn ) VALUES
(GETDATE(), 'Donc, si vous me croyez, mignonne,
Tandis que vostre âge fleuronne
En sa plus verte nouveauté,
Cueillez, cueillez vostre jeunesse :
Comme à ceste fleur la vieillesse
Fera ternir vostre beauté.')


/* ************************* */
/* SELECT DATA THE USUAL WAY */
/* ************************* */

SELECT Id, LogDate, TextColumn
FROM OneTable


/* **************************** */
/* SELECT DATA WITH CAST AS XML */
/* **************************** */

SELECT Id, LogDate, CAST(REPLACE(REPLACE(TextColumn, '&', '&amp;'), '<', '&lt;') AS XML) AS TextColumn
FROM OneTable


/* ************************* */
/* CLEAN UP THE SAMPLE STUFF */
/* ************************* */

DROP TABLE OneTable

The first select, using the varchar type, will return something like this:
Even if you can resize the column, admit that this is not very user-friendly.
The second select, using the XML cast, will return something like this:
Well... you will tell me this is not much sexier... But if you click on an XML cell, it displays the exact text (apart from the escaped characters):

If you need to use it frequently, I recommend creating a scalar-valued function to do the job, which saves you from typing the tedious replace/replace/cast expression!
CREATE FUNCTION CastVarcharToXml ( @Input VARCHAR(MAX) )
RETURNS XML
AS
BEGIN
 RETURN CAST(REPLACE(REPLACE(@Input, '&', '&amp;'), '<', '&lt;') AS XML)
END
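For instance, the earlier query can then be rewritten with the function (assuming it was created in your default dbo schema):
SELECT Id, LogDate, dbo.CastVarcharToXml(TextColumn) AS TextColumn
FROM OneTable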
Clearly, the limitations are:
  1. you have to click on every cell whose text you want to display in full outside the grid
  2. the characters you had to escape (< and &) will be printed as escaped characters
Anyway, this might help if you log long texts in your database, such as .NET exceptions with their call stacks!

Friday, February 10, 2012

Rolling a SQL Server partition made easy

Rolling a SQL Server partition function is no big deal.
The first time I had to implement it, I found it complicated because I was frightened by the SWITCH / SPLIT / MERGE / NEXT USED keywords I did not know, but as soon as I understood what was hidden behind them, it all became clear.
Let's hope this post clarifies this amazing feature!

OK, just to make it clear from the beginning, I will not explain what SQL Server partitioning is... It has been widely covered on the web and I have nothing to add!
Neither will I explain the best filegroup and partitioning strategies, which is a looong story and really depends on your specific context!

The Chuck Norris example

We need a working example. As usual, when I struggle to find inspiration, my friend Chuck Norris rescues me.
Suppose you are Chuck Norris and you want to store the names of all the people whose lives you saved, along with the date on which it happened. You create a SQL Server table with 3 columns: date of intervention, first name, last name.
You save a lot of lives (remember who you are), so the table will become huge in a few days. You have limited storage, so the only choice is to keep just 5 days of data. Therefore, every night, you run a job which removes all the outdated rows.

There are many ways to implement this job:
  • The simple answer is to run a DELETE query. Hmm... a good idea if your table has only a few records. If your table is big, the DELETE operation may be very slow, fill your transaction log, and finally fail.
  • Another answer is to use a WHILE loop (do not mention cursors, please) and DELETE the rows in blocks. Nice... depending on the indexes of your table, it may solve the "transaction log full" issue, but it will still be slow.
  • Another way is to create several tables, one per day, and use a UNION ALL view to "merge" all the days. In this design, deleting outdated rows means dropping the oldest table. Do not laugh please, this is very serious. It is called partitioned views in SQL Server 7.0 and 2000, and it was, in my opinion, the best option when table partitioning did not exist.
  • The last option is to use a sliding window partition. If your design is right, rolling the partition will be quick and will not overload your transaction log. Partitioning is good, but not in every case. I recommend reading this post from Brent Ozar PLF, which gives some warnings about partitioning usage (for instance, you need the Enterprise edition of SQL Server).

Rolling a sliding window partition

Overview

You can think of a sliding window partition as a wooden box with a fixed number of slots you can use. But, in this wooden box, there are two more slots you should never use: the -infinity slot and the +infinity slot (I will explain why later).
Sliding window partition: usable slots and forbidden slots
The picture above describes the Chuck Norris table. Suppose we are on 2012/01/05, meaning we want to keep people saved between 2012/01/01 and 2012/01/05 (inclusive). The table is partitioned into 7 slots:
  1. The slot which contains people saved from the dawn of time up to 2012/01/01. It is red because it shall always be empty.
  2. The slot which contains saved people between 2012/01/01 and 2012/01/02. It is blue because it may contain data.
  3. The slot for people saved on 2012/01/02.
  4. The slot for people saved on 2012/01/03.
  5. The slot for people saved on 2012/01/04.
  6. The slot for people saved on 2012/01/05.
  7. The slot which contains people saved on or after 2012/01/06. Again, it is red because it is in the future and shall always be empty.
When you want to roll this box, you have to free the oldest slot (not the -infinity one, but the very next one, in blue), then merge it with the -infinity one (this is quick because both are now empty), and finally split the +infinity one into two empty slots.
I guess this is pretty clear, and I swear the next sections are nothing more complicated than this!

Step 1: Create the Chuck schema, table, partition function and partition scheme

Here we will create the big partitioned table in which Chuck will store the people whose lives he saved. Since this table is partitioned, we first have to create the corresponding partition function and partition scheme.
The following SQL is a bit long but does nothing complicated: it creates the objects I just mentioned and fills the Chuck table with a few rows. Additionally, I recommend creating a constraint on the partitioned table to make sure no data is inserted out of bounds, because out-of-bounds data can make the partition roll very slow.

-- Create a partition function to store data for 5 days
CREATE PARTITION FUNCTION ChuckPartitionFunction(DATE) AS RANGE RIGHT FOR VALUES
(
    CAST('2012/01/01' AS DATE), CAST('2012/01/02' AS DATE),
    CAST('2012/01/03' AS DATE), CAST('2012/01/04' AS DATE),
    CAST('2012/01/05' AS DATE), CAST('2012/01/06' AS DATE)
)

GO

-- Create a partition scheme to map previous function on filegroups
CREATE PARTITION SCHEME ChuckPartitionScheme AS PARTITION ChuckPartitionFunction TO
(
    [PRIMARY], [PRIMARY], [PRIMARY],
    [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]
)

GO

CREATE SCHEMA chuck

GO

-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" : table is partitioned
CREATE TABLE chuck.[SavedPeople]
(
    SaveDate DATE,
    FirstName VARCHAR(256),
    LastName VARCHAR(256)
) ON ChuckPartitionScheme(SaveDate)

GO

-- Notice the final "ON [ChuckPartitionScheme](SaveDate)" :
-- clustered index is partitioned too
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople
( SaveDate ASC )
ON ChuckPartitionScheme(SaveDate)

GO


-- Create a constraint to ensure no data is inserted outside the partitioned date range
ALTER TABLE chuck.SavedPeople
WITH NOCHECK ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
    CHECK ( SaveDate >= CAST('2012/01/01' AS DATE)
    AND SaveDate < CAST('2012/01/06' AS DATE) )

GO

-- Insert a few rows in original table
INSERT INTO chuck.SavedPeople (SaveDate, FirstName, LastName)
VALUES
(CAST('2012/01/01' AS DATE), 'Attila', 'Le Hun'),
(CAST('2012/01/01' AS DATE), 'Napoleon', 'Bonaparte'),
(CAST('2012/01/02' AS DATE), 'Alexandre', 'Dumas'),
(CAST('2012/01/02' AS DATE), 'Emile', 'Zola'),
(CAST('2012/01/03' AS DATE), 'Edgar Allan', 'Poe'),
(CAST('2012/01/03' AS DATE), 'Francoise', 'Chandernagor'),
(CAST('2012/01/04' AS DATE), 'Honore', 'De Balzac'),
(CAST('2012/01/04' AS DATE), 'Robert', 'Merle'),
(CAST('2012/01/05' AS DATE), 'Marcel', 'Proust'),
(CAST('2012/01/05' AS DATE), 'Jean', 'Giono')

GO

-- Insert more and more rows in original table. This may be time consuming.
DECLARE @COUNTER INT = 0
WHILE @COUNTER < 18
BEGIN
 INSERT INTO chuck.SavedPeople
 SELECT * FROM chuck.SavedPeople
 SET @COUNTER = @COUNTER + 1
END

-- Ensure the original data has enough rows
SELECT COUNT(*) FROM chuck.SavedPeople
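If you want to check how these rows are spread across the slots, a quick query (not part of the original script) using the built-in $PARTITION function maps each row to its partition number:

-- Count rows per partition slot (slots 1 and 7 should stay empty)
SELECT $PARTITION.ChuckPartitionFunction(SaveDate) AS PartitionNumber, COUNT(*) AS RowsInPartition
FROM chuck.SavedPeople
GROUP BY $PARTITION.ChuckPartitionFunction(SaveDate)
ORDER BY PartitionNumber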

Step 2: Create a temporary box in which you will free the out-of-date slot

When you want to roll a partition, you have to switch the slot you want to free into a new table. This latter table must have the exact same structure (including the clustered index) as the table you want to roll, otherwise the SQL engine will not be happy. Furthermore, this new table may or may not be partitioned the same way as the original table... the only restrictions are that it must reside in the same filegroup and that it must be empty. In my example this is easy because I am using PRIMARY (bad, bad, bad!). Note that you cannot use a temp table, because it would be created in the tempdb database, which is not in the same filegroup!

-- Create a table with the exact same structure as the original one
CREATE TABLE chuck.SavedPeople_TobeDropped
(
    SaveDate DATE,
    FirstName VARCHAR(256),
    LastName VARCHAR(256)
)

GO

-- Clustered index has to be created too !!!
CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople_TobeDropped
( SaveDate ASC )

Step 3: Free the out-of-date slot

Once the new table is created, you use the SWITCH keyword to move the data from the slot of the original table (the purple one) to the new table (the green one). After this operation, the original table no longer contains the data you wanted to free; it has been instantly moved to the new table. You just need to DROP this new table to get rid of the data. No big DELETE needed.
Switching a partition with a lot of data into an empty table is efficient
-- Move data from slot 2 of original table to the new table
ALTER TABLE chuck.SavedPeople SWITCH PARTITION 2 TO chuck.SavedPeople_TobeDropped

GO

-- Ensure that data have been moved
select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01
select COUNT(*) from chuck.SavedPeople_TobeDropped -- Only 2012/01/01

-- Destroy the temporary table in which we did operate the switch
DROP TABLE chuck.SavedPeople_TobeDropped

GO

select COUNT(*) from chuck.SavedPeople -- All data except 2012/01/01

Step 4: Merge and split the partition function

Until now, we have only destroyed the data we did not want to store anymore. We have not yet slid (or rolled) the partition window. This is done using the MERGE and SPLIT keywords (together with NEXT USED).
Split and merge of partitions must be performed on EMPTY slots
MERGE removes a bound from the partition function, which is equivalent to regrouping 2 slots into a single slot. This operation is efficient only if the slots you merge are empty, otherwise it may consume time and space (transaction log). That is why I said the first slot must always remain empty (remember the constraint we added on the Chuck table).

SPLIT creates 2 slots from a single slot (which must be empty). But, you will ask: where is the new partition created? Good question! Before executing the SPLIT statement, you must execute a NEXT USED statement to tell the SQL engine which filegroup to use for the new slot. Note that NEXT USED has to be called for every partition scheme using the partition function, since the filegroups are attached to the scheme, not to the function. In the Chuck example, this does not matter because there is only one partition scheme.

-- Merge the 2 first slots which should be empty
ALTER PARTITION FUNCTION ChuckPartitionFunction()
    MERGE RANGE (CAST('2012/01/01' AS DATE))

-- Prepare the SPLIT operation by telling the SQL
-- engine we will create a new slot in PRIMARY filegroup
ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY];

-- Split the last slot which should be empty :
-- create a new bound in partition function
ALTER PARTITION FUNCTION ChuckPartitionFunction()
    SPLIT RANGE (CAST('2012/01/07' AS DATE))

Step 5: Update the constraint

It is almost over! One last thing before leaving: we must update the constraint we defined on the Chuck table. Currently, it checks that every row in the table has a SaveDate between the 1st of January and the 5th of January. Now that we have rolled the partition, it should be between the 2nd of January and the 6th of January.

-- Update the previous constraint on dates
ALTER TABLE chuck.SavedPeople
    DROP CONSTRAINT ChuckSavedPeople_PartitionDateCheck

ALTER TABLE chuck.SavedPeople WITH NOCHECK
    ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
    CHECK  ( SaveDate >= CAST('2012/01/02' AS DATE)
    AND SaveDate < CAST('2012/01/07' AS DATE) )

Before running in production...

Wait a minute before releasing this in production :)
First of all, I put GO everywhere, which is not a good idea except for demo purposes.
Then, I highly recommend putting steps 2 to 5 in a transaction, because if something goes wrong during these steps I cannot guarantee the stability of your partition function and table. Step 1 is not part of the partition roll (it may be an ETL process which feeds the big table), which is why I do not include it in the transaction.
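As a minimal sketch of such a wrapper (simply re-using the statements detailed in the steps above), it could look like this:

BEGIN TRY
    BEGIN TRANSACTION

        -- Step 2: staging table with the same structure and clustered index
        CREATE TABLE chuck.SavedPeople_TobeDropped
        ( SaveDate DATE, FirstName VARCHAR(256), LastName VARCHAR(256) )
        CREATE CLUSTERED INDEX PartitionedIndex ON chuck.SavedPeople_TobeDropped ( SaveDate ASC )

        -- Step 3: switch the oldest populated slot out, then drop it
        ALTER TABLE chuck.SavedPeople SWITCH PARTITION 2 TO chuck.SavedPeople_TobeDropped
        DROP TABLE chuck.SavedPeople_TobeDropped

        -- Step 4: merge the oldest bound, declare the filegroup, split a new bound
        ALTER PARTITION FUNCTION ChuckPartitionFunction() MERGE RANGE (CAST('2012/01/01' AS DATE))
        ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY]
        ALTER PARTITION FUNCTION ChuckPartitionFunction() SPLIT RANGE (CAST('2012/01/07' AS DATE))

        -- Step 5: refresh the date check constraint
        ALTER TABLE chuck.SavedPeople DROP CONSTRAINT ChuckSavedPeople_PartitionDateCheck
        ALTER TABLE chuck.SavedPeople WITH NOCHECK
            ADD CONSTRAINT ChuckSavedPeople_PartitionDateCheck
            CHECK ( SaveDate >= CAST('2012/01/02' AS DATE) AND SaveDate < CAST('2012/01/07' AS DATE) )

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
    -- Re-raise the error here (RAISERROR on SQL Server 2008, THROW on 2012 and later)
END CATCH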
What is more, even though the partition roll is simple enough, it may hold some surprises, so you should stress-test your code before running it in a sensitive environment. I ran into the following issues, which took some time to solve:
  • locking issues: performing the roll while the tables are being fed is a bad idea.
  • a bulk insert may disable the constraints on the table during insertion, meaning that data may be inserted into an out-of-date slot! This is dramatic, because SPLIT and MERGE need empty slots in order to be efficient!
  • if you have several partition schemes based on the same partition function, the above code becomes a bit more complex.
Finally, you will have noticed that I hard-coded the dates in the SQL script. Once again, this is only for demo purposes. In a real production context you will have to compute the new dates (or bounds, more generally) dynamically.
It is possible to write a facility (in C# for instance) using SMO and system tables to generate the roll script with the proper bounds every day (or week, or whatever). But this is very context-dependent and hard to generalize... so my opinion is to understand partition rolling and have fun writing your own!
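For what it is worth, here is a small T-SQL sketch (not part of the original script, to adapt to your context) which derives the bounds from the partition function metadata instead of hard-coding them:

-- Read the current bounds of the partition function from the system views
DECLARE @OldestBound DATE, @NewBound DATE

SELECT @OldestBound = MIN(CAST(prv.value AS DATE)),
       @NewBound    = DATEADD(DAY, 1, MAX(CAST(prv.value AS DATE)))
FROM sys.partition_functions pf
JOIN sys.partition_range_values prv ON prv.function_id = pf.function_id
WHERE pf.name = 'ChuckPartitionFunction'

-- Merge the oldest bound and split one day after the newest bound
ALTER PARTITION FUNCTION ChuckPartitionFunction() MERGE RANGE (@OldestBound)
ALTER PARTITION SCHEME ChuckPartitionScheme NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION ChuckPartitionFunction() SPLIT RANGE (@NewBound)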

References

Partitioning within MSSQL 2005
http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx

CREATE PARTITION FUNCTION by MSDN
http://msdn.microsoft.com/en-us/library/ms187802%28v=sql.100%29.aspx

ALTER TABLE by MSDN
http://msdn.microsoft.com/en-us/library/ms190273%28v=sql.100%29.aspx

ALTER PARTITION by MSDN
http://msdn.microsoft.com/en-us/library/ms186307(v=sql.100).aspx

SQL Server Partitioning: Not the Best Practices for Everything
http://www.brentozar.com/archive/2008/06/sql-server-partitioning-not-the-answer-to-everything/

Sunday, January 22, 2012

Get size of every partition of a SSAS cube

Today, I had to check the size of every partition of an OLAP cube, to make sure the partition sizes are homogeneous. I did not find a way to do that with a simple XMLA query, so I decided to write a small console application in C# (VS2008) to do it. The quickest way is to use the Analysis Management Objects facility provided by Microsoft: it is an API which lets you connect to the online cube and browse its structure, read data, alter the cube structure, launch processing... anything you want!


Create project and reference SSAS DLL

First of all, create a project in Visual Studio (console application for instance).
Then, you have to add a reference in the project to the SSAS management object DLL. I could not find out why Microsoft did it this way, but this DLL has a strange component name: where you would expect something like Microsoft.AnalysisServices.xxx, you have to look for Analysis Management Objects. So be careful when you link this assembly in Visual Studio.
Link the Analysis Services Management Object assembly to your project

Remember I am a bit lazy? Yes, I cannot bear to write a long namespace several times... No problem, we will use an alias.
using AS = Microsoft.AnalysisServices;



Browse the cube and get partitions size

Now, declare a small structure which will contain the information we need about a partition.
public struct PartitionInfo
{
    public string CubeName;
    public string MeasureGroup;
    public string PartitionName;
    public long EstimatedSize;
}

Here comes the interesting part. We will write the method which lists all the partitions and their sizes for a given SSAS database. The parameters are a connection string (which does not need an Initial Catalog, since that is the next parameter) and the name of the database you want to connect to.
The method performs the following actions:
  1. Connect to the AS server using the connection string provided as the first parameter (lines 4 and 6)
  2. Open the database which matches the name provided as the second parameter (line 7)
  3. Browse every cube available in the database (line 8)
  4. Browse every measure group available in the current cube (line 10)
  5. Browse every partition available in the current measure group (line 12)
  6. For each of these partitions, save its definition (name, parent measure group, parent cube) and its size (lines 14 to 19). Even though the property name is EstimatedSize, in my experience its value has always been quite accurate.
  7. Return the list of partitions we found to the caller (line 24)
public static IList<PartitionInfo> GetListOfPartitions(string ConnectionString, string DatabaseName)
{
    List<PartitionInfo> LPI = new List<PartitionInfo>();
    using (AS.Server Server = new AS.Server())
    {
        Server.Connect(ConnectionString);
        AS.Database Database = Server.Databases.FindByName(DatabaseName);
        foreach (AS.Cube Cube in Database.Cubes)
        {
            foreach (AS.MeasureGroup MG in Cube.MeasureGroups)
            {
                foreach (AS.Partition P in MG.Partitions)
                {
                    PartitionInfo PI = new PartitionInfo();
                    PI.CubeName = Cube.Name;
                    PI.MeasureGroup = MG.Name;
                    PI.PartitionName = P.Name;
                    PI.EstimatedSize = P.EstimatedSize;
                    LPI.Add(PI);
                }
            }
        }
    }
    return LPI;
}

Pay special attention if you are debugging this code with the JIT debugger: I noticed it does not work well. The Database object does not support being watched by the debugger, which can leave your SSAS objects unusable during execution. For instance, I could not find any cube in my database in debug mode, whereas it works fine in normal execution mode.


Save your data to CSV file

To finish the job, it may be useful to save the loaded data into a CSV file. In my case, I have to check that all partitions have homogeneous sizes and regroup some small partitions together, so it is a great help to load the results into Excel!
public static void SaveToCsv(IList<PartitionInfo> LPI, string Filename)
{
    using (System.IO.StreamWriter SW = new System.IO.StreamWriter(Filename))
    {
        SW.WriteLine("CubeName,MeasureGroup,PartitionName,EstimatedSizeInBytes");
        foreach (PartitionInfo PI in LPI)
        {
            string Row = string.Format("{0},{1},{2},{3}", PI.CubeName, PI.MeasureGroup, PI.PartitionName, PI.EstimatedSize);
            SW.WriteLine(Row);
        }
    }
}
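To wire these two pieces together, a minimal Main method could look like the following (the connection string, database name, and output path are placeholders to adapt to your environment; it assumes GetListOfPartitions and SaveToCsv are defined in the same class):
static void Main(string[] args)
{
    // Placeholder values: point to your own SSAS instance, database, and output file
    IList<PartitionInfo> Partitions = GetListOfPartitions("Data Source=localhost", "YourSsasDatabase");
    SaveToCsv(Partitions, @"C:\Temp\PartitionSizes.csv");
    Console.WriteLine("{0} partitions exported.", Partitions.Count);
}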


Friday, January 13, 2012

How to deploy Reporting Services reports in batch mode

Have you ever tried to bulk-deploy Reporting Services 2008 files? After some Internet searches, you may find the following ways:
  • Using Report Manager directly, through the HTTP page of the report server. Well documented here.
    Works fine if you have very few reports to deploy on very few SSRS instances.
  • Using the rs.exe tool (it should be in C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn). Documented on MSDN. For some reason this tool does not work on my company desktops (I got an authentication error). Works fine if you do not need to alter the data source of the report at deployment time.
  • But if your need is a bit more specific (like changing the data source when you deploy your reports, or quickly deploying many reports to many servers/folders), you will end up with custom code which uses the Reporting Services web service. You can find plenty of solutions on the web which use this web service to deploy reports, but I found this interesting one, which covers the essentials of the web service usage.

What does the tool do

In my specific context, there are a dozen different reports, each report being deployed to a dozen folders. Each folder has its own data source, so I need to update the data source of each report depending on the folder it is deployed to. Finally, the deployment has to be fully automated because it is part of a software factory process.

Folder structure once deployed on Reporting Services instance
I suggest a small console application which performs the following operations:
  1. Read from the command line the list of report files (.rdl) to be deployed, the target SSRS URL, the connection string to use in the data source, the target SSRS folder, and the friendly name of each report once deployed.
  2. Open the web service located at the SSRS URL (from the command line).
  3. Through the web service, ensure the target folder exists, and create it if not.
  4. Through the web service, create a dedicated data source inside the target folder, pointing to the connection string provided on the command line.
  5. For each report to be deployed, use the web service to deploy it, and then alter the data source it uses.
  6. Display an error on the standard console output and return a non-zero value if something goes wrong.

Command line parsing and usage (step 1)

As usual, I adapted the code from Ray Hayes, which is very easy to use.

The allowed options are:
  • /H /Help => Optional, displays a usage message and returns
  • /S /WebServiceUrl => Mandatory, the URL used to deploy the SSRS reports. For instance: http://localhost/ReportServer/ReportService2005.asmx
  • /C /ConnectionString => Mandatory, the connection string used by SSRS reports to connect to the SQL database, without credentials info. For instance: Data Source=YourSqlHostName\YourSqlInstance;Initial Catalog=YourDb;
  • /L /LoginSql => Mandatory, the login used by SSRS reports to connect to the SQL database. It cannot be part of the previous connection string; it must be set with /L.
  • /P /PasswordSql => Mandatory, the password to the previous login.
  • /F /TargetFolder => Optional, the SSRS folder in which to deploy the reports. If not supplied, reports will be created in root folder.
  • File1, File2, ... => List of report files to be deployed. If a directory is provided instead of a file, all the *.rdl files in it will be deployed, ignoring backup files and sub-directories.
Usually I create one cmd file for each folder I want to update.
SET DEPLOYER="ReportingServicesDeployer.exe"
SET WSDL_URL="http://_YourSSRSServer_/ReportServer/ReportService2005.asmx"
SET RS_FOLDER="TopFolder/SubFolder"
SET DS_CS="Data Source=_YourSqlHostname_;Initial Catalog=_YourSqlDb_;"
SET DS_USER="_YourSqlLogin_"
SET DS_PWD="_YourSqlPassword_"
SET CONTENT_DIR="C:\YourReportSolution\YourReportProject"

%DEPLOYER% /S %WSDL_URL% /F %RS_FOLDER% /C %DS_CS% /L %DS_USER% /P %DS_PWD% %CONTENT_DIR%

Dynamic web service reference to ReportService2005.asmx (step 2)

One of the fun parts of this tool is connecting to a web service URL provided on the command line. There are a few tricks to know in Visual Studio 2008 to achieve this without a headache. This is clearly explained here.

On my first try, I used the following URL: http://localhost/reportserver/reportservice.asmx?wsdl. It did not work at all, so I tried http://localhost/reportserver/reportservice2005.asmx?wsdl and it worked fine (even though my RS version is 2008). Note that http://localhost/reportserver/reportservice2008.asmx?wsdl does not exist. I could not find a logical explanation for this...

The key point is to provide the URL of the SSRS interface at runtime. In the VS project, I named the RS web service reference ReportingService2005. Then I extended this class with the following constructor, which allows instantiating the web service with a URL provided at runtime:
public partial class ReportingService2005 : System.Web.Services.Protocols.SoapHttpClientProtocol
{
    public ReportingService2005(string Url)
    {
        this.Url = Url;
        if ((this.IsLocalFileSystemWebService(this.Url) == true))
        {
            this.UseDefaultCredentials = true;
            this.useDefaultCredentialsSetExplicitly = false;
        }
        else
        {
            this.useDefaultCredentialsSetExplicitly = true;
        }
    }
}

RS folder creation (step 3)

Reporting Services handles a folder system, like a standard file system: the root folder is called Home, you can create several folders inside a parent one, but each folder has only one parent. When you deploy a report (or a data source) you have to choose the folder in which it will be deployed.
Folders in Reporting Services

The following snippet creates a folder tree in the Reporting Services file system. It iterates from the top-level folder down to the leaf folder, and at each level it creates the desired folder if it does not already exist.
public static void CreateFolder(ReportingService2005 rs, string Folder)
{
    string CurrentFolder = "/";

    if (!string.IsNullOrEmpty(Folder))
    {
        foreach (string NextFolder in Folder.Split(new char[] { '/' }))
            if (!string.IsNullOrEmpty(NextFolder))
            {
                if (rs.GetItemType(string.Concat(CurrentFolder.TrimEnd('/'), "/", NextFolder)) != ItemTypeEnum.Folder)
                {
                    rs.CreateFolder(NextFolder, CurrentFolder, null);
                }
                CurrentFolder = string.Concat(CurrentFolder.TrimEnd('/'), "/", NextFolder);
            }
    }
}

RS data source creation (step 4)

The first thing to know about data sources in Reporting Services: they are elements you deploy to a server, the same way you deploy reports. A data source is a stand-alone element which contains the full connection string to the SQL database it points to (server name, instance name, database, login, password, etc.). When you create a data source, you can choose the folder in which it will be created.
Data sources are used by reports to get the connection string they need to fetch data. The idea is that several reports may use the same connection string, so it is better to store it only once instead of duplicating it in every report.

In this tool, all the reports to be deployed are altered to point to a single data source. The tool does not support reports having several data sources.
The following code creates a data source in a given folder. This version supports only SQL logins, but it takes just a few modifications to handle another authentication mode.
public static void CreateDataSource(ReportingService2005 rs, string Folder, string DatasourceName, string ConnectionString, string Login, string Password)
{
    DataSourceDefinition dsd = new DataSourceDefinition();
    dsd.ConnectString = ConnectionString;
    dsd.Enabled = true;
    dsd.Extension = "SQL";
    dsd.CredentialRetrieval = CredentialRetrievalEnum.Store;
    dsd.UserName = Login;
    dsd.Password = Password;
    rs.CreateDataSource(DatasourceName, Folder, true, dsd, null);
}

RS report deployment (step 5)

Deployment of a report is handled by a single method, so if you have 5 reports to deploy, you have to call this method 5 times. It assumes the web service instance (the first parameter of the method) has been previously initialized (step 2).
Even though there is nothing complicated in it, the method is a bit long, so we will cut it into four parts and explain them separately. As you can see, I took heavy inspiration from Dan Sniderman's blog.

The first part reads the report file (.rdl extension) from the filesystem and stores it in an in-memory buffer for later use. Yes, storing it in memory is evil... but I assume you do not write reports bigger than 100 MB :)
public static void DeployOneReport(ReportingService2005 rs, string rdlPath, string parentFolder, string reportName, string DatasourceName)
{
    Byte[] definition = null;
    Warning[] warns = null;

    // Read the report file into a byte array (dispose the stream once read)
    using (FileStream stream = File.OpenRead(rdlPath))
    {
        definition = new Byte[stream.Length];
        stream.Read(definition, 0, (int)stream.Length);
    }
Through the web service instance, the CreateReport method deploys the report onto the server. The parameters are the following:
  1. Report: the name of the report in the RS file system
  2. Parent: the full path of the RS folder in which you want to store the report
  3. Overwrite: whether to replace an existing report with the same name
  4. Definition: the binary content of the report
  5. Properties: a list of key-value options
  6. The method returns a list of warnings that may be raised during the deployment.
    //    Call the web service to create the report
    rs.Timeout = 1000000;
    warns = rs.CreateReport(reportName, parentFolder, true, definition, null);
Then, check the warnings that may have been raised during the deployment. I do not care about the shared data source warning, so I explicitly ignore it.
    if (warns != null)
    {
        foreach (Warning warning in warns)
        {
            //ignore the warning about shared data source
            if (!(warning.Message.StartsWith("The dataset ‘") && warning.Message.Contains("’ refers to the shared data source ‘") && warning.Message.EndsWith("which is not published on the report server.")))
            {
                // otherwise throw it
                throw new Exception(string.Format("Warning returned from WebService: {0}", warning.Message));
            }
        }
    }
The final piece of code alters the report data source. Remember that the shared data source used by the original report has not been deployed, and that is exactly what we want! The idea is to make the online report use the data source we deployed in step 4.
To achieve this, we do the following:
  1. find the data sources referenced by the online report (see line 1)
  2. create a DataSourceReference object pointing to the online data source from step 4 (see lines 4 and 5)
  3. create a DataSource object which includes the previous DataSourceReference and name it just like the first data source of the online report (see lines 6 to 8)
  4. update the data sources of the online report with the DataSource object we just created (see line 9)
    DataSource[] ExistingDS = rs.GetItemDataSources(string.Concat(parentFolder, "/", reportName));
    if ((ExistingDS != null) && (ExistingDS.Length > 0))
    {
        DataSourceReference dsr = new DataSourceReference();
        dsr.Reference = string.Concat(parentFolder, "/", DatasourceName);
        DataSource ds = new DataSource();
        ds.Item = dsr;
        ds.Name = ExistingDS[0].Name;
        rs.SetItemDataSources(string.Concat(parentFolder, "/", reportName), new DataSource[] { ds });
    }
}
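To put the pieces together, the overall deployment sequence (simplified, with illustrative variable names coming from the parsed command line, and without error handling) could look like this:
ReportingService2005 rs = new ReportingService2005(WebServiceUrl);
CreateFolder(rs, TargetFolder);
CreateDataSource(rs, "/" + TargetFolder, "MainDataSource", ConnectionString, LoginSql, PasswordSql);
foreach (string RdlFile in ReportFiles)
{
    string ReportName = System.IO.Path.GetFileNameWithoutExtension(RdlFile);
    DeployOneReport(rs, RdlFile, "/" + TargetFolder, ReportName, "MainDataSource");
}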

Download the source code

You can download it here.
It is a VS2010 solution; you can use the source code however you want, but it comes without any warranty, etc.

References

Deploy reports using the report manager
http://dobrzanski.net/2008/08/12/reporting-services-deploying-rdl-files/

Deploy reports using the rs.exe utility
http://msdn.microsoft.com/en-us/library/ms162839(v=SQL.100).aspx

Basics of the Reporting Services web service usage
http://blog.magenic.com/blogs/daniels/archive/2008/11/25/Deploying-SSRS-Reports-from-Powershell.aspx

reportservice.asmx vs reportservice2005.asmx
http://wmug.co.uk/blogs/sean/archive/2009/04/12/adding-ssrs-2008-web-reference-in-visual-studio-2008.aspx

Referencing a dynamic webservice in VS2008
http://devlicio.us/blogs/derik_whittaker/archive/2008/07/21/referencing-2-0-web-services-asmx-in-visual-studio-2008.aspx

Command line parsing
http://www.codeproject.com/KB/recipes/commandlineparser.aspx