1. Create indexes on the joining columns before starting extraction (especially those that have a data type other than integer, such as varchar, nvarchar, datetime, etc.).
2. Choose the data type & size of columns wisely.
3. Try to avoid type casting or manipulating data types inside the SSIS package; do any type casting in the source query of the OLE DB Source component (a sketch follows this item).
Try to populate numeric values in facts and in joining columns: if you choose a numeric key, less space means more records per page, which means less I/O for the same amount of data. Also, joining on a numeric key is usually faster than joining on a varchar/nvarchar key.
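For example, a minimal sketch of pushing the casting into the OLE DB Source query (the staging table and column names are illustrative):
-- Cast in the source query so the pipeline already receives the final data types;
-- dbo.SalesStaging and its columns are illustrative names.
SELECT
    CAST(CustomerCode AS INT)        AS CustomerKey,
    CONVERT(DATETIME, OrderDateText) AS OrderDate,
    CAST(Amount AS DECIMAL(18, 2))   AS SalesAmount
FROM dbo.SalesStaging;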
4. SELECT only the columns that you need.
5. OLE DB Destination (5 types of data access mode): use the Fast Load option.
The fast load option uses a BULK INSERT statement instead of an INSERT statement; if fast load is not selected, a plain INSERT is used by default.
Check Constraints – preferable to uncheck (reduces the overhead for the pipeline engine).
Rows per batch – RowsPerBatch is the number of rows you want in one buffer (SSIS automatically sizes the buffer based on the RowSize and MaxBufferRows properties).
Maximum insert commit size – rows are committed in batches that are the smaller of (a) the Maximum insert commit size, or (b) the remaining rows in the buffer that is currently being processed.
A rough T-SQL analogue of these settings is sketched after this item.
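Purely for intuition, here is a hedged BULK INSERT sketch showing roughly which T-SQL options the fast-load settings correspond to (the OLE DB Destination issues the bulk load itself; the table name, file path, and batch numbers are illustrative):
-- Rough T-SQL analogue of the fast-load settings; dbo.FactSales and the file path are illustrative.
BULK INSERT dbo.FactSales
FROM 'C:\staging\fact_sales.dat'
WITH (
    TABLOCK,                  -- Table Lock option
    CHECK_CONSTRAINTS,        -- only if Check Constraints is left checked
    ROWS_PER_BATCH = 100000,  -- hint comparable to Rows per batch
    BATCHSIZE = 100000        -- commit size, comparable in spirit to Maximum insert commit size
);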
6. Try to run all load processes in parallel as long as possible. Run as many load processes as you have available CPUs; the number of parallel processes you can run is directly proportional to the number of logical processors (hyper-threading technology) in your system.
--A deadlock may occur when more than one parallel process accesses the same table,
--for example when every Log Extract task uses the same table to log its extract information.
To avoid this type of problem, change the value of the IsolationLevel property from Serializable to ReadUncommitted for all Log Extract tasks. This will not take table-level locks and thus will not trigger a lock exception.
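For reference, ReadUncommitted corresponds to the following isolation level at the T-SQL level (shown only to illustrate what the SSIS property changes; dbo.ExtractLog is an illustrative logging table name):
-- Reads against the shared logging table no longer wait on locks held by other packages.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT MAX(ExtractDate) FROM dbo.ExtractLog WHERE PackageName = 'LoadCustomers';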
7. Network limitations: set the Packet Size property of the connection manager to a larger integer value.
The maximum value you can specify is 32767.
-- Allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- List the current configuration values.
EXEC sp_configure;
-- Raise the server-side default network packet size to the maximum.
EXEC sp_configure 'network packet size (B)', 32767;
RECONFIGURE WITH OVERRIDE;
RECONFIGURE;
-- Verify the new value.
EXEC sp_configure;
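On the SSIS side, the same packet size can also be set through the connection manager, for example by adding it to the connection string (the server, database, and provider below are illustrative, and the exact keyword should be verified against your provider):
Data Source=MyServer;Initial Catalog=MyDW;Provider=SQLNCLI11;Integrated Security=SSPI;Packet Size=32767;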
8. Derived Column transformation:
The Derived Column transformation is usually used for data type conversion or for handling NULL values.
Two approaches:
Derived Column transformation applied to all the records (without a Conditional Split) – takes more time.
Derived Column applied only to those records that have NULL values (with a Conditional Split) – faster.
9. If your package and SQL Server are running on the same system, prefer the SQL Server destination over the OLE DB destination to increase data load performance.
10. No Sort transformation – use an ORDER BY clause in the source SQL statement instead (see the sketch below).
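A minimal sketch (table and column names are illustrative); after sorting in the source, also mark the source output as sorted (IsSorted and SortKeyPosition in the Advanced Editor) so downstream components such as Merge Join recognize the order:
-- Sort in the database instead of using the fully blocking Sort transformation.
SELECT CustomerKey, OrderDateKey, SalesAmount
FROM dbo.FactSales
ORDER BY CustomerKey;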
11. Truncate vs Delete:
TRUNCATE removes all data from the table in one go and writes only minimal log records.
DELETE logs every row it removes, which decreases performance.
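For example (dbo.StagingSales is an illustrative table name):
-- Minimally logged: deallocates the data pages rather than logging each row.
TRUNCATE TABLE dbo.StagingSales;
-- Fully logged: writes a log record for every deleted row, much slower on large tables.
DELETE FROM dbo.StagingSales;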
12. Lookup transformation with Derived Column transformation:
Lookup transformation: picks up values when the incoming data matches the reference table.
Some incoming values may not be present in the target table, such as NULL values; convert those values to predefined values like 0 or -1 so they can then be matched against the target table (see the sketch after this item).
Perform the Lookup transformation first, take all the records that succeed in the lookup, and apply the Derived Column transformation only to the remaining unmatched (bad) records.
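A hedged T-SQL sketch of the same key-replacement idea, assuming the dimension contains an 'unknown member' row with key -1 (all object names are illustrative; in the package itself the replacement is done with a Derived Column expression on the unmatched rows):
-- Replace NULL business keys with the unknown-member key -1 so they match the dimension.
SELECT  s.OrderID,
        COALESCE(s.CustomerKey, -1) AS CustomerKey
FROM dbo.StagingSales AS s;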
13. A common table expression (CTE) is usually a much easier and faster approach than nesting subqueries when dealing with multiple aggregate functions; a short sketch follows.
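For instance (dbo.FactSales and its columns are illustrative), the aggregate is computed once in the CTE and reused, instead of repeating the same nested subquery:
-- Aggregate once in the CTE, then reuse the result in the outer query.
WITH CustomerTotals AS (
    SELECT CustomerKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactSales
    GROUP BY CustomerKey
)
SELECT CustomerKey, TotalSales
FROM CustomerTotals
WHERE TotalSales > (SELECT AVG(TotalSales) FROM CustomerTotals);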
--------------------------------------------------------------------------------------------------------------------------------
Source: SQL Server Integration Services SSIS Best Practices
Best Practice #1 - Pulling High Volumes of Data
...
Then we came up with an approach: make the target table a heap by dropping all the indexes on the target table at the beginning, transfer the data to the heap, and on completion of the data transfer recreate the indexes on the target table. With this approach, the whole process (dropping indexes, transferring data and recreating indexes) took just 3-4 hours, which was what we were expecting.
So the recommendation is to consider dropping your target table indexes, if possible, before inserting data into it, especially if the volume of inserts is very high.
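A hedged sketch of the drop/load/recreate pattern (the index and table names are illustrative; the load step itself is the SSIS data flow):
-- 1. Drop the indexes so the target is loaded as a heap.
DROP INDEX IX_FactSales_CustomerKey ON dbo.FactSales;
-- 2. Bulk load the data (done by the SSIS data flow task).
-- 3. Recreate the indexes once the load has finished.
CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey ON dbo.FactSales (CustomerKey);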
....
Best Practice #2 - Avoid SELECT *
Beware when you are using the "Table or view" or "Table name or view name from variable" data access mode in the OLEDB source: it behaves like SELECT * and pulls all the columns. Use this access mode only if you need all the columns of the table or view moved from the source to the destination.
Tip: Try to fit as many rows as possible into the buffer, which will eventually reduce the number of buffers passing through the dataflow pipeline engine and improve performance.
Best Practice #3 - Effect of OLEDB Destination Settings
Data Access Mode – the 'fast load' option (uses a BULK INSERT statement) instead of a simple INSERT statement. If you select the 'fast load' option, there are also a couple of other settings...
Keep Identity – Keep Nulls – Table Lock – Check Constraints –
Best Practice #4 - Effect of Rows Per Batch and Maximum Insert Commit Size Settings
Rows per batch – The default value for this setting is -1, which specifies that all incoming rows will be treated as a single batch.
Maximum insert commit size – The default value for this setting is 2147483647 (the largest value for a 4-byte integer type), which specifies that all incoming rows will be committed once on successful completion.
You can specify a positive value for this setting to indicate that a commit will be done for that number of records. You might be wondering whether changing the default value for this setting puts overhead on the dataflow engine by committing several times. Yes, that is true, but at the same time it relieves pressure on the transaction log and tempdb, which would otherwise grow tremendously, specifically during high volume data transfers.
Best Practice #5 - SQL Server Destination Adapter
Local SQL Server database:
If the target is a local SQL Server database, performance-wise it is similar to the "Bulk Insert Task". Its advantage over the "Bulk Insert Task" is that it can transform the data before uploading it to the destination.
Along with the OLEDB destination adapter options, the SQL Server Destination Adapter has more options.
An insert trigger option, which can validate business rules, can be set. You can specify the number of first/last rows in the input to load, and the maximum number of errors that are allowed before the bulk load operation is cancelled.
The insert column sort order, which is used in the upload process, can also be specified.
SQL Server database on a remote server:
You cannot use the SQL Server Destination Adapter; instead use the OLEDB destination adapter. If the destination might change from local to remote, or from one SQL Server instance to another, use the OLEDB destination adapter to minimize future changes.
Best Practice #6 - Avoid asynchronous transformations (such as the Sort transformation)
How SSIS works internally:
1. The SSIS runtime engine executes the package.
2. It executes every task other than the data flow task in the defined sequence.
3. The SSIS runtime engine hands the data flow task over to the data flow pipeline engine.
4. The data flow pipeline engine breaks the data flow task into one or more execution trees, which can run in parallel to achieve high performance.
Synchronous transformation:
1. Gets a record, processes it, and passes it to the next transformation or destination.
Refer to the source article; it has many more tips...