Thursday, November 21, 2013

SSIS:BI Env.:Performance Tuning SSIS


SQL Server Integration Services SSIS Best Practices



Refer Source Video: Free SQL Training - Performance Tuning SSIS

Graphically deselecting column does not take out column, in SQL query specify the columns.Its a Client Side filter, so it pulls all data across network.

oledb source: Table or View
SQL Command:
Select * From tablename
                        [ instead for performance: Select col1,col2 from tablename]

Data Flow Task
Source - Advanced editor - Input Output Properties - column:Fastparse

For each individual column the above step need to be repeated.Fastparse will trust data being correct and load , it improves performance a lot but if any row bad then it will fail and need to rollback.

--
Refer Source Video:SSIS & SSAS BI Platform Performance Tuning

OLTP -(SSIS) or TSQL ->DW->SSAS cube<-Reports(Share point Portal,Excel etc.)

SSAS Cube in separate server  for cube to process faster.
Using TSQL fine , not necessarily only SSIS component.
Source - 'FastParse' can be used if  data is correct.
Packet size- SQL Server 2008 ( 2 types compressions: datatype ( Select data fast , Insert slow: integer takes 1 byte instead of 4 bytes) Page Compression:WinZip does for files.


Wednesday, November 13, 2013

SQL:SQL FULL OUTER JOIN-SQL UNION -- SQL UNION ALL

Refer the Source Video: SQL Server join :- Inner join,Left join,Right join and full outer join

Full outer join=Result of (Inner join (common records) + Left Outer Join + Right Outer Join)


Source: SQL FULL OUTER JOIN Keyword

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;

Note: The FULL OUTER JOIN keyword returns all the rows from the left table (Customers), and all the rows from the right table (Orders). If there are rows in "Customers" that do not have matches in "Orders", or if there are rows in "Orders" that do not have matches in "Customers", those rows will be listed as well.


Source: SQL UNION -- SQL UNION ALL 

SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

Note: UNION cannot be used to list ALL cities from the two tables. If several customers and suppliers share the same city, each city will only be listed once. UNION selects only distinct values. Use UNION ALL to also select duplicate values!

SSIS:Performance Tuning


SQL Server Integration Services SSIS Best Practices



Refer the Source Video:SSIS Performance Tuning

FULL RECOVERY - lot of  logging done

DW project - generally Simple Recovery Mode (minimal logging)


Simple Recovery it loads lot of rows in sec.

Control Flow  Data Flow

Control Flow:

              Create DB 
                  {notes:Create DB ( Execute SQL Task:SQL Statement )}
                   {notes: ALTER DATABASE [DBname] SET RECOVERY FULL

              Create Table

Data Flow:


Flat File Source ( 6,000,000 rows)

OLE DB Destination

notes:
CREATE DATABASE (Transact-SQL)

  Create DB ( Execute SQL Task:SQL Statement )

USE [master]
GO
IF DB_ID(DBNAME) IS NOT NULL
BEGIN
ALTER DATABASE [DBNAME] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE DBNAME;
END
GO

CREATE DATABASE [DBNAME]
CONTAINMENT = NONE
ON PRIMARY
(NAME = N'DBNAME', FILENAME = N'C:\ProgramFiles\Microsoft SQL Server\MS SQL11.MSSQLSERVER\MSSQL\DATA\DBNAME.mdf', SIZE =10 MB, MAXSIZE = UNLIMITED, FILEGROWTH = 1 MB)

LOG ON
( NAME = N'DBNAME_log', FILENAME = N' C:\ProgramFiles\Microsoft SQL Server\MS SQL11.MSSQLSERVER\MSSQL\DATA\DBNAME.ldf', SIZE = 2 MB, MAXSIZE = 2000GB, FILEGROWTH = 10%)

GO
GO
ALTER DATABASE [DBNAME] SET RECOVERY FULL
-----------------------------------------------------------------------------------------------------------

Source: SSIS Performance Tuning Simple Load Part 2 Database Growth SSIS

In the CREATE DATABASE  statement if  the FILEGROWTH  is increased  say 20MB or 100MB , performance improves.

-----------------------------------------------------------------------------------------------------------
Source:SSIS Performance Tuning Simple Load Part 3 Initial Database Size 

In previous video using the "DBGrowth"  which refer to "FILEGROWTH" the performance improved.The table that being loaded is below  the "FILEGROWTH" so it will not impart on performance.

By changing the "Initial Database Size" which refers to "SIZE"  , it will change the performance. 

-----------------------------------------------------------------------------------------------------------
 Source:SSIS Performance Tuning Simple Load Part 4 BI vs Enterprise Edition

In Enterprise Edition its faster.  
-----------------------------------------------------------------------------------------------------------
Source:SSIS Performance Tuning Simple Load Part 5 Commit Size

Data Flow:

OLE DB Destination
-----------
OLE DB Destination Editor:

             Configure the properties used to insert data into a relational database using an OLE DB provider.

[Connection Manager
Mappings
Error Output]

Specify an OLE DB Connection manager, a data source, or a data source view, and select the data access mode.If using the SQL Command access mode, specify the SQL Command either by typing
the query or by using Query Builder. For Fast - load access, set the table update options.

OLE DB Connection manager:
oledb-LoadDatabase

Data access mode:
Table or View - fast load

Name of  the table or the view:
[dbo].[Tablename..]

                        []Keep Identity [*]Table lock
[]Keep nulls []Check Constraints

Rows. per batch:

Maximum insert commit size: 0 ( changed from a higher number to 0) ( it means 1 transaction)[1 transaction: it will commit all rows, in 1 time{Logsize,TempDB size will be  bigger]
[Caution: Table having "Clustered Index"  won't work.][Only for heap table(having no indexes)]

View Existing Data
----
~Source:Making Fast Load really fast on clustered indexed tables with SSIS

Moreover, as using MICS parameter leads to a dramatic performance degradation, I think that the OleDb destination adapter should issue a warning if both ORDER and MICS parameter are set so to make the programmer think twice before using them together.

~Source:Batch Sizes, Fast Load, Commit Size And The OLE DB Destination

Rows per batch:
This setting affects the performance and commit semantics of the SQL engine, but doesn’t affect things on the SSIS side. Lowering this value does not affect the number of times SSIS commits data to the server.

Maximum insert commit size:
Enlisting the Data Flow task in a transaction also has the same effect as setting FastLoadMaxInsertCommitSize  = 0.  All rows will be committed or rolled back at the end of the Data Flow Task.

Beware
Setting the Maximum insert commit size property to a low value will cause a lot of overhead on the destination as the rows will be committed much more often.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SSIS Performance Tuning Simple Load Part 6 Packet Size SSIS 2012 Video Example

Ole Db -right click -  All - see  "Packet Size:0" - by default it means 4kb. Change it to 32KB which is maximum, if the value disappears then its on the "Connection" , choose the "project.param" , choose connection
string and specify  "Packet Size" 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Source: SSIS Performance Tuning Simple Load Part 7 Row vs Page Compression SSIS 2012 Video Example

Connect to Server

Server type: Database Engine

Choose the Database->Table(right click "Storage" - "Manage Compression" - change Compression type "Page"

Next screen "Data Compression Wizard" , Run immediately ,

Task Manager - Look at "Performance", "Write Speed" is less , "Read speed" as usual.

Conclusion:Compression "Slows down" the load into "heap" table. 

Observation:"Row" Compression is bit faster , "Write Speed" is little faster , "Read speed" little better

But any how "Row" Compression is slower.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Source: SSIS Performance Tuning Simple Load Part 8 Balanced Data Distributor SSIS 2012 Video Example

Balanced Data Distributor , does parallel processing.It works like "Conditional Split" , but its more efficient.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Source:SSIS ole db vs SQL Server Destination Simple Performance Test


OLEDB destination is faster and its recommened but in this example SQL Server Destination is faster.
----------------------------------------------------------------------------------------------------------------------------------

Friday, November 8, 2013

SSIS:Conditional Split Transformation

Refer Source Video: SSIS Conditional Split Transformation Part 1 Video Example SSIS 2012

2005,2008,2008 R2, 2012 - same. No much difference.
DataFlow

OLEDB DataSource:

SQL Command: query ( UNION ALL  of  two resultset)

Derieved Column: To view the values , "Condition" , column define the condition for spliting. The "Order" of  the condition important, accordingly it will execute.

Why "Order" is important, see in the below video.

Refer Source Video:SSIS Conditional Split IS NOT NULL Part 3 Video Example SSIS 201

If you have NULL  value then the Conditional Split Transformation , will stop. If  one adds like ISNULL(columnname) as the codition , it still will not work and other condition it will not run until the "Order" the ISNULL having condition is first.

Refer Source Video:SSIS Conditional Split Default Output Name Part 2 Video Example SSIS

"Conditional Split Transformation" Editor, the rows that do not meet any condition defined in the condition criteria will be lost, to capture it , in "Default Output name:" give a name and in Data Flow add a Derived Column & add , so those rows are captured.

Refer Source Video:SSIS Conditional Split Configure Error Output Part 4 Video Example

If  you have NULL value or for any condition you want those row to fall into another category then in the "Conditional Split Transformation" , there is a "Red Arrow" in that you specify the "Redirect" output of conditions , thats the same you see in "Configure Error Output", so using a 'Derieved Column" , one can see the error rows.

------------------------------------------------------------------------------------------------------------
Refer Source Video:Conditional Split Transformation In SSIS

If  in "Condition Split Transformation" one want to split data and put into different sources like one to "Flat file" and other set to  any other database.

------------------------------------------------------------------------------------------------------------