Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Thursday, August 21, 2008

Introduction to Data Warehousing

Data warehousing is a very useful process to support business and organizational decision making activities to store organizational historical data which can be later presented using a portal.

Data warehousing is mainly done in three basic steps

Data warehouse mainly depends on the Extraction Transformation Load process

Extraction – Extract the data from the source systems and converts them into a format for transformation process

Resources used for extraction task.

DW can use source system to get data to the system. Such as employee allocation information, employee human resource information, and financial information ect. These resources can come in many forms such as excel/csv/rdbms ect.

These data can be extracted to the warehouses using DTS packages or SSIS packages.

Transformation – Apply set of rules and conditions to the source data and transform filtered data to the data warehouse.

Set of rules which can be applied in the transformation tasks.

    • Aggregation

    • Filtering

    • Joining

    • Sorting

    • Pivoting -Turning multiple columns into multiple rows or vice versa

Loading – Load the filtered data in to the tables arrange them accordingly to the data warehousing schema with the usage of facts and dimensions. Facts store measures calculated, usually business process is represented by a fact table. Dimension tables contain component attributes, which represents descriptive information about a fact.


Data warehouse schema is put to data cubes and they are access by the internal portal

Useful Technologies for Data Warehousing

Data can be extracted through DTS Packages and SSIS packages.

Schedule tasked can be assigned to extract data daily using scheduled job using SQL server 2000/2005.

Data transformation can be done using views and stored procedures designed accordingly and executed those using DTS packages as well as SSIS packages

Loading can be done using data cubes architecture . Table data is loaded to the cubes using SQL 2005 Business Intelligent Studio which are arrange accordingly to the relevant DW schema. These cubes can be access in three ways

  1. Access through the reports which uses SSRS 2005 MDX coding and VB.Net custom codes.

  2. Access data through the portal which uses share point technology and .Net framework.

  3. Generating pivots using Microsoft Excel


Friday, August 1, 2008

Adding A New Cube To The Warehouse

This post explains the tasks which were done/followed when adding a new data cube to an already existing data warehouse. This more like a guide line to follow when you are creating a new cube to the system.

To add a new cube to the system table structure should be created. Following steps are taken to add brand new set of tables from the sources to data warehouse

1 Extraction process

  • Create relevant tables in extraction

  • Create views for all those created tables

  • Create the DTS Package to transform data from source to extraction

2 Transformation process

  • Create the relevant fact table to store the measures

  • Create the needed dimensions to store new attributes

  • Create views for newly created tables

  • Create SPs to apply set of rules to filter information and create calculations for the fact tables

  • Create DTS packages to call the SPs.

All stored procedures and queries can be written using SQL using Microsoft SQL query analyzer.

3 Cube Structure Creations (Loading)

Create cube for the relevant fact table including the needed dimensions using Microsoft SQL business intelligent studio(2005)

After cube structure is created following task should be followed

  • Adjust the dimension property values in the cube structure

  • Under the calculation tab create an MDX query in order to create the calculation.

  • Under newly created dimensions create attribute hierarchy and adjust the properties accordingly.

  • Process the cube.

4 Unit Testing

Last but not least Unit testing. Unit testing is a very crucial task in data warehousing. Since we calculate sensitive information. You can do the unit testing process using SQL coding to check the transformation process is properly done

  • Source to extraction testing

    • Test all source records are extracted to Ext Database

  • Extraction to fact table testing

    • Test fact table calculations are properly calculated.

  • Fact tables to cubes

    • Create a pivot to the newly created cube compare data with the fact table, according to the cube calculation.


Tuesday, July 1, 2008

SSIS Vedio

This is a vedio on creating SSIS packages it might be useful for all da SSIS beginers

video

Wednesday, April 30, 2008

How to create a simple SSIS package

I would like to give the instructions from the very first step so anyone who reads the blog can understand how to do it. So to start SSIS you need to have SQL Server Business Intelligence Development Studio installed. First open Business Intelligence studio and to create the first project . Create a new project (new ->Project ) under Business Intelligence Project Select Integration Services Project give an appropriate name and click ok.

Creating A Connection Manager

What I recommend for the first step in creating SSIS package should be creating a connection manager. In the Connection Manager panel right click and click new OLEDB Connection (It can be any other depending on your need for example if you are planning to handle your operation using flat files you should select flat file connection. If its on log file it should be flat file connection). So here we are talking about DB connection. Click new and give the appropriate Connection and the DB name

Before Finishing you should always test the connection , in order to avoid failure when running the package

Tuesday, April 29, 2008

SSIS Introduction

SSIS (SQL Server 2005 Integration Services ) is the data transformation standard in SQL 2005 which replaces the DTS packages in SQL 200o.SSIS has a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing.

You can start doing SSIS using Business Intelligence Studio which provides a graphical tool kit for designing and debugging SSIS packages. It has more functionalities than DTS(in SQL 2000) and it can work parallel packages faster. I would like to explain how to create SSIS packages using this blog mainly because when I was learning there were not many resources to learn. So I would like to explain the functionalities of SSIS using graphics/Vedios in a simpler way

Monday, April 28, 2008

How To Create Control Flow Items

There are many control flow items in SSIS unlike in DTS.I am going to demonstrate few manly used control flow items such as Execute SQL task, Execute Package Task, Data Flow Task . I would like to explain other items in my next article.

Execute SQL Task – This is used to execute SQL statements like insertions and truncations.

To create SQL task drag and drop the SQL Task from the control flow item tab and double click the package.

Specify the task name and the description in properties window. Double click the SQL task and give the connection type it can be OLEDB/Excel, Then select the connection manager- which has the DB/Tables you are using to run the SQL statements.

Under SQL statement write the SQL statement and parse the query to test for the SQL syntax.

This is how it looks like in when you double click the SQL execution task,things which are marked in red are the things you should be careful when creating an execute SQL task

Execute Package Task-This is used to call other packages , drag and drop the execute package tasks and under general field give the proper name and description,

Under package give the connection and the relevant package u want (packages must be created previously).

When creating the package make sure to adjust the fail package on failure property true in properties tab so if the calling package fail it will fail the parent package as well.


Saturday, April 26, 2008

How Build A Data Flow Task ( Transform data through SSIS )

You can transform data from one location(source) to another(destination) with adding modification to the data using Data Flow Tasks..To Create a Data Flow you need to select the Data Flow Task from the control flow panel and add to the control panel tab and double click on it . chose the appropriate Data Source and the Data Destination for the data transformation.

Data Source Can Be

* Data Reader Source – Extract data from relational DB by using .Net Provider

* Excel Source – Extract data from excel work book

* Flat File Source - Extract data from flat files

* OLE DB Source - – Extract data from relational DB by using OLE DB connection

Data Destination Can be

* Data Mining Model Training

* Data Reader Destination

* Dimension Processing

* Excel Destination

* Flat File Destination

* OLE DB Destination

I am going to discuss on OLE DB Source to OLE DB data transformation.

First drag and drop the OLE DB Source task and rename the name and description

Double Click the Task –

Select the OLE DB Connection Manager

Data Access Mode

Data Excrating Source – Which can be SQL command,Table/Veiw according to the data access mode you select

Always preview the table before closing the tab in order to avoid errors at the execution state

Then Drag and drop OLE DB Destination task and name it appropraiately

Before you do your mapping you should connect the Source and the destination first

For that you can just connect two task using OLE DB Source output(select both two packages right click -> add path-> OLE DB Source output)

Or

You can put any Data Flow Transformation task and then connect

Data Flow Transformation Tasks are show in the picture, you can use any of these task in-between and map to the destination

After mapping the source with the destination you can double click the Destination task and select the appropriate destination table from the connection manager and arrange the mapping accordingly

Friday, April 25, 2008

How To Deal With Slowly Changing Data In SSIS

In a data warehouse it is important to keep historical data as well as the new data.. In SSIS there is a way to get slowly changing data using Slowly changing dimension under data transformation task..It is a way of inserting data from source to destination with the Type 1 and Type 2 changes.

First you should create the source using a proper data source .Then drag and drop the slowly changing dimension task and give the specification as shown below

Connection Manager – Make a connection to the source and give that connection, it can be a database or an excel sheet.

Table or view – Specify the source table

Input Columns – Columns from the source

Dimension Columns – map the source columns with the destination columns

Key Type – Specify the primary key/Composite keys

And click next

Select a change type for slowly changing dimension columns specify the dimension columns sand the change type (in a data ware house all attributes should be changing attributes other than attributes like ( last updated date/date created) if you put any attribute as fixed attribute it will not updates the data when it comes for type 1 changes so I prefer not to use fixed attribute type if an attribute is not changing attribute then do not mention it in the changing attribute columns) click next

Inferred Dimension Members – enable inferred member support only if you think fact tables may reference dimensions members that are not yet loaded

(you should keep it as disable if you are not sure about it).. Click next

In the fixed and changing attribute option select change all the matching records, include out dated records, when changes are detected in changing attributes and finish slowly changing dimension wizard

It will automatically create the dimension with appropriate type 1 and type 2 fields where if an attribute is type1 it will updates the specific column if its type 2 it will insert as a new record.

You can rename the task accordingly using properties.

Monday, April 21, 2008

How To Create A Dynamic Log File In SSIS

In SSIS you can create a log file to track the behavior of the SSIS package execution. These are the steps used when creating a dynamic log file
  • Create a log file in connection manager.
    • Right click on connection manager ->New File Connection->give the proper sources to the log file.(Usage type- ,Existing file/New file Path- path of the log file)

    • Or go to SSIS -Login (in the menu bar) and create a new File Connection

  • Login Configurations
    • In the menu bar Click SSIS -> Logging.

    • Click checkbox for the Package(left side).

    • Select the configuration and give a name for the log file.

    • Click the Details tab to fire the events.

  • Making the log file dynamic

    • In the properties tab of the log file connection (click on log file in the connection manager and view the properties tab) Click on expression and give the following settings



    • Property – Connection String

    • Expression - @[User::gvErrorLogDir] + @[System::PackageName] + " "+ (DT_WSTR, 4) YEAR( GETDATE() ) + "-"+ ((LEN((DT_WSTR, 2) MONTH( GETDATE() ) )>1)?(DT_WSTR, 2) MONTH( GETDATE() ):"0" + (DT_WSTR, 2) MONTH( GETDATE() ) )+"-"+((LEN((DT_WSTR, 2) DAY( GETDATE() ) )>1)?(DT_WSTR, 2) DAY( GETDATE() ): "0"+ (DT_WSTR, 2) DAY( GETDATE() ) ) + " " + ((LEN((DT_WSTR, 2) DATEPART("Hh", GETDATE() ) )>1)?(DT_WSTR, 2) DATEPART( "Hh", GETDATE() ) : "0" +(DT_WSTR, 2) DATEPART( "Hh", GETDATE() ) ) + "h-"+ ((LEN((DT_WSTR, 2)DATEPART( "mi", GETDATE() ) )>1)?(DT_WSTR, 2) DATEPART( "mi", GETDATE() ) :"0" + (DT_WSTR, 2) DATEPART( "mi", GETDATE() ) ) + "File.txt"


Relevant Links
How To Create A Logger In SSIS
SSIS Junkie
Integration Services Log Providers-MSDN