Importing data with mixed types
I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes where a column contains mixed data types. Specifically a recurring scenario is where an Excel worksheet that I get asked to import into the database contains a column with mostly numeric values but in fact the field in question is a string data type, with a smaller number of alpha numeric codes in later rows.
I understand that Excel helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 by default I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value because my observation is that Excel still decides on double float unless it finds at least one non numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows and you never know where non numeric data may lurk.
By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
ABC123
I can run a SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)
There is no error in the SSIS package but the SQL table looks like this after the load:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
NULL
Hi,
Thanks for looking at this. Something is different between our environments because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), which is derived from the id column using the IF function combined with ISNUMBER and TEXT functions, you can see from the screenshot below that this fixes the issue.
Thanks for looking at this. Something is different between our environments because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), which is derived from the id column using the IF function combined with ISNUMBER and TEXT functions, you can see from the screenshot below that this fixes the issue.
I know it is also possible to bulk load data into a SQL DB from an Excel file using the script task, which may be the answer for ETL tasks involving Excel data sources that need to be automated. If someone could walk me through creating a script task to achieve this, that'll be great. I would prefer Visual Basic as the language.
No comments:
Post a Comment