Channel: beyondrelational.com

SSIS - Split single row to Multiple rows


It has been a long time since I last blogged. I have also been away from the forums, so I had no ideas to blog about. I was back on the forum today and came across a nice scenario: how do you split a single row into multiple rows when the input file has only one row?

Two scenarios can arise:

Scenario 1. The number of columns expected for each output row is the same.

Input: 1*1*1~2*2*2~3*3*3~4*4*4

Treat “~” as the row delimiter and “*” as the column delimiter.

Expected Output:

1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4

Scenario 2. The number of columns for each output row is different. We need to default the values when a row has fewer columns than expected and ignore any extra columns.

Input:

1*1*1*1~2*2~3*3*3*3*3*3*3~4*4

Treat “~” as the row delimiter and “*” as the column delimiter.

Expected Output:

1 | 1 | 1
2 | 2 |
3 | 3 | 3
4 | 4 |

Notice the blanks in the 3rd column of the 2nd and 4th rows.
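
To make the Scenario 2 rule concrete, here is a minimal C# sketch of the transformation described above, run outside SSIS. The class name `RowSplitter`, the method name `Split`, and the `expectedColumns` parameter are hypothetical names chosen for illustration; the sketch assumes “~” and “*” as the delimiters and simply pads short rows with empty strings and drops surplus values.

```csharp
using System;
using System.Collections.Generic;

public static class RowSplitter
{
    // Splits a single input line into rows on '~' and into columns on '*'.
    // Rows with fewer values than expectedColumns are padded with empty strings;
    // values beyond expectedColumns are ignored.
    public static List<string[]> Split(string input, int expectedColumns)
    {
        var result = new List<string[]>();
        foreach (string row in input.Split('~'))
        {
            string[] parts = row.Split('*');
            var columns = new string[expectedColumns];
            for (int c = 0; c < expectedColumns; c++)
            {
                columns[c] = c < parts.Length ? parts[c] : string.Empty;
            }
            result.Add(columns);
        }
        return result;
    }
}
```

Calling `RowSplitter.Split("1*1*1*1~2*2~3*3*3*3*3*3*3~4*4", 3)` gives four rows of three columns each, with an empty 3rd column in the 2nd and 4th rows.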

SOLUTION:

Case 1:

This is simple: you just need to set up the Flat File connection manager with “~” as the row delimiter and “*” as the column delimiter.

images[1]

This will give us the expected results.

Now let us look at the 2nd case.

What will happen if we take the second file as the source and set it up similarly? Have a look at the image below:

images[1]

The image above shows that the output data is correctly formed; however, we have one more column than our requirement calls for. We can drop this extra column when mapping to the destination.

When you run the package and view the same data in the Data Viewer, you will see the output shown below. It is again not what we expect, but if we do not map the last column to the destination, our job is done.

images[1]

CONCLUSION:

  1. In this case SSIS first splits the data on the row delimiter and then on the column delimiter.
  2. SSIS decides the number of output columns from the number of columns in the 1st row (in the case above, the number of “*”-separated values before the first “~”).
  3. If a row has more column delimiters than the first row, the extra data goes into the last column, up to the next row delimiter (see row 3 above).
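
The three observations above can be sketched in plain C#. This is only an emulation of the behaviour described in this post, not the actual Flat File source implementation; `FlatFileSketch` and `Parse` are hypothetical names. It relies on the `string.Split(char[], int)` overload, which leaves any surplus delimiters in the last element, and it assumes short rows are padded with empty strings.

```csharp
using System;
using System.Collections.Generic;

public static class FlatFileSketch
{
    public static List<string[]> Parse(string input, char rowDelimiter, char columnDelimiter)
    {
        // Observation 1: split on the row delimiter first.
        string[] rows = input.Split(rowDelimiter);

        // Observation 2: the first row fixes the number of output columns.
        int columnCount = rows[0].Split(columnDelimiter).Length;

        var result = new List<string[]>();
        foreach (string row in rows)
        {
            // Observation 3: splitting with a maximum count keeps everything
            // after the last allowed split in the final element.
            string[] parts = row.Split(new[] { columnDelimiter }, columnCount);

            // Assumption: rows with fewer values are padded with empty strings.
            var columns = new string[columnCount];
            for (int c = 0; c < columnCount; c++)
            {
                columns[c] = c < parts.Length ? parts[c] : string.Empty;
            }
            result.Add(columns);
        }
        return result;
    }
}
```

For the Scenario 2 input, `FlatFileSketch.Parse(input, '~', '*')` yields four columns per row, and the 4th column of row 3 holds `3*3*3*3`. Leaving that last column unmapped gives the three columns we want.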

Approach to be taken for case 2:

  1. At package design time, set up a file whose 1st row has one more column than expected in the final output.
  2. Map all columns to the destination except the last one.
  3. Once the package is designed, remove the extra column set up in step 1.
  4. The job is done, very easily.

Earlier I was thinking of writing code in a Script Component to achieve this, but while preparing this blog post I came across this method.

Let me know what you think of this method.

I will also post the code required to achieve this with a Script Component. The decision is yours as to which method you choose.

C#

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.IO;
#endregion

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    //StreamReader to read the input file stream
    private StreamReader textReader;
    //String to save the source file path
    private string SrcFilePath;

    //Int to count the number of records read.
    private int i = 0;


    //Override the AcquireConnections method to set the source file path once.
    public override void AcquireConnections(object Transaction)
    {
        SrcFilePath = @"H:\MSBI\SSIS\2012\Input\SingleRowToMultiple.txt";
    }

    public override void PreExecute()
    {
        base.PreExecute();
        //Set the textReader in the PreExecute phase so that we do not initialize it for each record.
        textReader = new StreamReader(SrcFilePath);
    }

    public override void PostExecute()
    {
        base.PostExecute();

        //Close the Text reader once the file has been read in the PostExecute Phase.
        textReader.Close();
    }


    public override void CreateNewOutputRows()
    {
        char[] rowDelimiters = { '~' };
        char[] colDelimiters = { '*' };

        //Read the first line from the file
        string nextLine = textReader.ReadLine();

        //Keep reading until nextLine is null, i.e. EOF
        while (nextLine != null)
        {
            //Skip empty lines
            if (nextLine.Length > 0)
            {
                //Split the line by ~ into records
                string[] rows = nextLine.Split(rowDelimiters);

                foreach (string row in rows)
                {
                    //Add a new row to the Script Component output
                    this.Output0Buffer.AddRow();

                    //Split the record into columns based on *
                    string[] columns = row.Split(colDelimiters);

                    //Default missing columns to an empty string; extra columns are ignored
                    this.Output0Buffer.Column0 = columns.Length > 0 ? columns[0] : string.Empty;
                    this.Output0Buffer.Column1 = columns.Length > 1 ? columns[1] : string.Empty;
                    this.Output0Buffer.Column2 = columns.Length > 2 ? columns[2] : string.Empty;
                    this.Output0Buffer.Column3 = columns.Length > 3 ? columns[3] : string.Empty;
                }
            }
            i++;
            //Read the next line
            nextLine = textReader.ReadLine();
        }
    }

}

