At XTIVIA we often receive requests from clients who require somewhat more than the average database administration tasks. Recently I had a request to help a client pull normalized data from one object in one data source and merge that data with data in two de-normalized tables on a SQL Server instance. Our client had already attempted this process using server links and programmability, but the performance of a standard stored procedure running through a normal linked server between outdated versions of two different RDBMS platforms was intolerable and impractical for normal business operations. So how do you move and merge over a million rows of data in a timely manner?
SSIS is the answer we developed through BIDS. Oh great, more acronyms. Don't worry. If you have ever taken a backup of a database or created a maintenance plan, you have used SSIS. It stands for SQL Server Integration Services, and BIDS, which stands for Business Intelligence Development Studio, is the platform used to develop an SSIS package. Simply put, SSIS is a way to move data into or out of almost any data source. You can also manipulate data between the source and target, such as altering a data type, among many other operations.
But I'm not a BI specialist or developer, so as a DBA how is SSIS going to be useful to me? Let me put it in a way that most of us DBAs will immediately grasp. Think of an SSIS package as an external stored procedure between almost any two data sources. Using the BI development studio, I can set variables, query a table, conduct transactions and trap errors, much as you can in a stored procedure. Of course, SSIS is capable of doing a lot more than I will demonstrate here, but the goal is to overcome your fear of SSIS and show a simple way to use BIDS to manipulate data outside of a single database environment.
Now to be clear, SSIS finds its value in manipulating data from multiple data sources and operating outside the database environment. It can certainly be used to manipulate data within a single database or instance, but those tasks are usually much more efficient when you use a stored procedure. What I am talking about here is moving a result set to a file, moving data between two different database systems, such as from Oracle to SQL Server, and manipulating data to conform to those systems. I refer you to the right-click options in SQL Server to import or export data. Yes, folks, those are SSIS functions.
Let's take a few exploratory steps in the BIDS environment by creating an SSIS package to retrieve data from one SQL Server instance, modify a data type to change a datetime to a string, and insert that modified data into a different instance with a table that closely but not exactly matches the table on the source instance. This is easy enough to do using stored procedures, but the point is to demonstrate how we can accomplish this goal using an SSIS package.
Identify the Data Source and Target
First let’s look at the data sources we will be using:
The source table is a common table in the ‘AdventureWorks2008r2’ sample database named ‘BillOfMaterials’.
CREATE TABLE Production.BillOfMaterials(
BillOfMaterialsID int IDENTITY(1,1) NOT NULL,
ProductAssemblyID int NULL,
ComponentID int NOT NULL,
StartDate datetime NOT NULL,
EndDate datetime NULL,
UnitMeasureCode nchar(3) NOT NULL,
BOMLevel smallint NOT NULL,
PerAssemblyQty decimal(8, 2) NOT NULL,
ModifiedDate datetime NOT NULL,
) ON [PRIMARY]
The target table is very similar, except the columns are rearranged a bit and the datetime fields are now nvarchar strings.
CREATE TABLE [BillOfMaterials_Closed](
[BillOfMaterialsID] [int] IDENTITY(1,1) NOT NULL,
[ProductAssemblyID] [int] NULL,
[ComponentID] [int] NOT NULL,
[UnitMeasureCode] [nchar](3) NOT NULL,
[BOMLevel] [smallint] NOT NULL,
[PerAssemblyQty] [decimal](8, 2) NOT NULL,
[ModifiedDate] [nvarchar](30) NOT NULL,
[StartDate] [nvarchar](30) NOT NULL,
[EndDate] [nvarchar](30) NULL
) ON [PRIMARY]
Identify the Steps in the Package
Just as we would in a stored procedure, we must identify the steps we need to take to accomplish the business rules governing this process.
- Create connections to each data source.
- We need to determine what data qualifies for processing using a date we will fetch from a ‘LastProcessed’ table on the target.
- We need to create the target table in the target data source.
- We need to extract the qualifying data from the source data source.
- We need to convert the three 'datetime' columns to 'nvarchar' strings.
- We need to insert this transformed data into the target table.
- We need to update the ‘LastProcessed’ table with the new date.
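The 'LastProcessed' table referenced in the steps above can be as simple as a single-row, single-column table on the target instance. Here is a minimal sketch; the table and column names are assumptions, not details from the original environment:

```sql
-- Hypothetical single-row tracking table on the target instance
CREATE TABLE dbo.LastProcessed (
    LastProcessedDate datetime NOT NULL
);

-- Seed it once so the first run of the package has a starting point
INSERT INTO dbo.LastProcessed (LastProcessedDate)
VALUES ('1900-01-01');
```

Each run of the package will read this date to decide which rows qualify, then write a new date back at the end.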
Open BIDS and Create a New Project
Depending on your SQL Server installation, BIDS should already be installed on your server under the SQL Server program group on your Start menu. If you don't find it in the SQL Server program group, it will have to be installed before you can proceed. For the sake of argument, we'll assume that you have BIDS already installed.
When you open BIDS you will be presented the start page. From here we will create the project and a blank SSIS package within that project. On the start page center pane you will find a link that will guide you through creating a project.
Choose an 'Integration Services Project', give the project a name and click 'OK'. Now hold on to your hat as the development studio opens a blank package for you. You'll notice three areas of the studio. On the far left is the toolbox; we'll use a few items from there. In the center is the development pane. You have seen this before if you have ever built a database maintenance plan. On the far right is the project explorer. Now, to build our project, we are going to follow the steps outlined above.
Step 1: Create Connections to the Data Sources
Below the center pane (process pane) is a window that contains the connection manager. Right-click on this pane and choose ‘New OLE DB Connection’ which is what we need for this project. You will notice that there are several connection types from which you can choose.
A list of data connections will now pop up, and it will be empty. Click 'New…' and the actual connection wizard will open. At the top, choose the provider type; in this case, we will use SQL Server Native Client 10.0. Next, in the drop-down list box, choose the source server. In this case we will use the default instance on the same server as the target, but remember this can be any of a number of connection types, including those for an Excel or CSV file, an Oracle instance and many others. Now choose the authentication type and finally the database for the source data; in this case, 'AdventureWorks2008r2'. Now click 'Test Connection' to verify that the connection is valid.
Now we want to do the same thing with the target data source. For this project, I have a named instance on this server. It contains the table where we will get our ‘LastProcessed’ data qualifier. Now we have our data sources defined. Next!
Step 2: Retrieve the ‘LastProcessed’ Date
The first thing we need to do is create a variable to hold our ‘LastProcessed’ date. To do this, in the menu bar choose ‘SSIS’ then choose ‘Variables’. This will open the variables window.
The variables window is where we declare variables, assign them values and define the scope of the variable. It is now covering the toolbox but there are tabs at the bottom where you can toggle between the toolbox and variable window.
Anyway, in the upper left portion of the variable pane is the ‘New Variable’ icon. Click that icon and you will see a new variable appear.
Give the variable a name, indicate a data type (in this case, datetime), give it a default value if needed, and define the scope of the variable. The scope determines the part of the package in which the variable is valid. Since we are going to need this variable throughout the entire package, we will set the scope to 'Package'.
Now let’s populate that variable with the value we need. For this, I’m going to go back to the ‘Toolbox’ and choose the ‘Execute SQL Task’ and drag it over to the process pane.
Now you need to double-click this ‘Execute SQL Task’ to open the dialog for this task. In this dialog we want to indicate which connection type, connection and what SQL statement we will use to populate the ‘LastProcessed’ variable.
The connection type should be ‘OLE DB’. The connection is the target connection we configured earlier. The SQL statement is listed below. This statement will give us our data qualifier.
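The statement can be as simple as a single-row SELECT. This sketch assumes a 'LastProcessed' table along the lines described earlier (the dbo.LastProcessed name is an assumption):

```sql
-- Fetch the date that marks the last successful run
SELECT LastProcessedDate
FROM dbo.LastProcessed;
```

For the task to hand the value back to the package, set 'ResultSet' to 'Single row' on the task's General page, then on the 'Result Set' page map the result to the 'User::LastProcessed' variable.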
Now is a good time to perform our first debug test. Click the green arrow in the menu bar to launch the debugger. When the debugger runs, the 'Execute SQL Task' should first turn yellow and then turn green when the step completes successfully. If the item turns red instead, the task failed; check your connection and SQL statement to be sure that all is configured correctly. Finally, click on the task name and rename the task so you can identify at a glance what it does.
Good time to save. Next!
Step 3: Create the Target Table
Our next task is to create the table that will house our transformed data in the target instance. We are going to take a different tack on this task. Instead of having an 'Execute SQL Task' tool create this table directly, we are going to execute a stored procedure on the target instance that will create the table for us, and we're going to use a variable to hold the statement that executes the stored procedure. Sound confusing? No worries. It's pretty simple.
To do this, grab an ‘Execute SQL Task’ and pull it onto the process pane. Now we are going to create a variable that has a scope of only this new task. Highlight the new task and click on the tab below the toolbox labeled ‘Variables’. You’ll notice in the variable pane we only have the ‘LastProcessed’ variable in the list. Let’s add a new variable.
Click on the 'New Variable' icon and create a variable named 'CreateTargetTable'. Note that when you highlight a task and then create a variable, the scope of the variable is restricted to that task. Of course, you can change this, but for now leave the scope as the default. Change the data type to String and, in the 'Value' cell, type the statement that will execute the stored procedure. In this case, we have a procedure named 'CreateTargetTable' already on the target.
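The value typed into the 'Value' cell is simply `EXEC dbo.CreateTargetTable;`. The article only names the procedure, so here is one possible implementation on the target, sketched from the target table definition above (the existence guard is my own addition):

```sql
-- Assumed implementation: create the target table only if it is missing
CREATE PROCEDURE dbo.CreateTargetTable
AS
BEGIN
    IF OBJECT_ID('dbo.BillOfMaterials_Closed', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.BillOfMaterials_Closed (
            BillOfMaterialsID int IDENTITY(1,1) NOT NULL,
            ProductAssemblyID int NULL,
            ComponentID int NOT NULL,
            UnitMeasureCode nchar(3) NOT NULL,
            BOMLevel smallint NOT NULL,
            PerAssemblyQty decimal(8, 2) NOT NULL,
            ModifiedDate nvarchar(30) NOT NULL,
            StartDate nvarchar(30) NOT NULL,
            EndDate nvarchar(30) NULL
        );
    END
END
```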
Now we are going to associate that variable with the 'Execute SQL Task' we just added. Edit the task: the connection type is 'OLE DB' and the connection is the target instance, but this time set the 'SQLSourceType' to 'Variable'. The 'SourceVariable' option will now appear; choose the user variable 'CreateTargetTable'.
Don’t forget to highlight the first task and drag the green arrow down to the second task as you would in a normal maintenance task.
Now test and save.
Step 4: Grab the Qualifying Data from the Source Database
Now it's time to retrieve the data we need from the source database. We could use an 'Execute SQL Task' tool if we wished, but there is a tool made especially for this type of task. Grab the 'Data Flow Task' from the toolbox and drag it over to the process pane. This task will let us retrieve the data we need, transform that data and insert it into the table on the target. Understand that a 'Data Flow Task' is a subset of tasks that we will use to accomplish the retrieval of the qualifying data from the source, the transformation of that data and the insertion of the data into the target table. You can change the process pane from control flow to data flow using the tabs above the process pane.
Each ‘Data Flow Task’ must have a data source and a data target so the toolbox changes to contain these tasks. At the top of the toolbox you will see ‘Data Flow Sources’ section. Grab the ‘OLE DB Source’ task and drag it over to the data flow task pane. Right-click the ‘OLE DB’ source and choose ‘Edit’.
Here we need to change the connection in the first drop down list to point to the source database. In this case, the ‘Data access mode’ will be ‘SQL Command’ and the SQL statement will be entered in the ‘SQL command text’ box.
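The command text might look like the sketch below. The column list comes from the source table; the qualifying condition on 'ModifiedDate' is an assumption about the business rule:

```sql
-- '?' is an OLE DB parameter placeholder, later mapped to User::LastProcessed
SELECT BillOfMaterialsID, ProductAssemblyID, ComponentID,
       UnitMeasureCode, BOMLevel, PerAssemblyQty,
       ModifiedDate, StartDate, EndDate
FROM Production.BillOfMaterials
WHERE ModifiedDate > ?;
```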
You'll notice that the search condition uses a question mark instead of a variable name. That is simply a placeholder for the input parameter contained in the 'LastProcessed' variable. Now we will associate that variable with this SQL statement. Click the 'Parameters' button and you will see that a single line is visible. In the variable column, choose the user variable 'LastProcessed'. For each question mark in the SQL statement you entered earlier, there will be a parameter that needs to be assigned a value. Once you have assigned variables to the parameters, click 'OK' twice and we have completed step 4.
Step 5: Transform the Data
Our next step is to transform the three datetime fields in the dataset into strings that will fit into the target table. This task is also performed in the 'Data Flow Task' we created in the step above. Notice that the data flow toolbox has a section labeled 'Data Flow Transformations'. From that group of tools, choose the 'Data Conversion' task and drag it onto the process pane for the data flow.
Edit this new task and you will see a depiction of the source table. Each column has a checkbox next to it. Check the box next to the 'StartDate' column and that column will appear in the table at the bottom of the editor. The 'Output Alias' is the name of the output column; remove the 'Copy of' prefix in front of the column name. Next, click in the 'Data Type' column and choose 'string [DT_STR]' with a length of 30, which matches the corresponding column in the target table. Do the same with the remaining datetime columns.
That was easy. Save!
Step 6: Insert Data into Target Table
Now that we have transformed the data, we need to put it where it belongs. The bottom section of the data flow toolbox holds the 'Data Flow Destinations' tools. Grab the 'OLE DB Destination' and drag it into the process pane of the data flow task. Right-click, choose 'Edit', and you will see another editor that looks very much like the one we saw in step 4. Here we need to provide the connection for the target. The data access mode will be 'Table or view – fast load' and the table is the target table we built in step 3. I hope you ran the test after step 3, because that test will have created the target table in the target database. If that table is not listed in the drop-down list, go back and test step 3.
If you remember, the target table not only changes the data type of three columns, but those columns are also arranged differently than in the source table, so we need to map the matching columns between the two tables.
In the list on the left of the editor, choose the 'Mappings' item and you'll see the current mappings. Here's a tip: if the column names in both the source and target tables are the same, BIDS is smart enough to recognize where the columns go, but if you changed the names or added columns to the target that are not in the source, you will need to map those columns by hand. Do that by choosing the item you wish to change in the input column and changing its value as necessary.
Also, notice that the bottom three items in the input column are the '<Column Name>' entries inherited from the data conversion we performed in the last step.
Done. Test and save!
Step 7: Update the ‘LastProcessed’ Value in the Target Database.
Now all we have left to do is update the 'LastProcessed' value in the target database. In the tabs at the top of the process pane, click on the 'Control Flow' tab. Drag another 'Execute SQL Task' into the process pane. Right-click on the new task and choose 'Edit'. Just as we did in step 2, supply the type of connection, the connection for the target database and the SQL statement that will update the 'LastProcessed' table so that it contains today's date.
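The update statement can be a one-liner; this sketch again assumes the single-row dbo.LastProcessed table described earlier:

```sql
-- Record the current date so the next run only picks up newer rows
UPDATE dbo.LastProcessed
SET LastProcessedDate = GETDATE();
```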
Test and save!
And We’re Done
Well, there you have it. We have used BIDS to create an SSIS package. In this package we retrieved data and assigned it to SSIS variables. We created an object in a database. We moved data from a source to a target while changing data types. We updated a value in a table on the target. We used variables that contained SQL, used embedded SQL in a task and even executed a stored procedure. During this process we created four control flow tasks and three data flow tasks. That's a full day.
Of course, there is much more we can do with SSIS packages, as you can see from the multitude of tools that are available. We can also create event handlers to take care of any errors encountered during runtime, but I'll leave that for you to discover. Explore BIDS and enjoy a little trial and error, and soon the phrase 'business intelligence' won't make your eyes roll back anymore. And, of course, if you have a large project with which you need assistance, contact XTIVIA. We specialize in business intelligence.