Working with Excel files in C#
I receive a monthly XLSX file that needs to be imported into SQL Server using an SSIS package. Unfortunately, the sender does not follow UNC naming conventions for the filename or worksheets, and our recent migration to SQL Server 2012 has caused the package to fail, even when using an Excel Connection Manager. We have also tried sending them a template, but they refuse to follow it, and we have no leverage to force them to do so.
I have been attempting to update the package to use a Script Task that imports each of the two Excel worksheets into its own System.Object variable, which I could then either query or loop through to load the data into the destination SQL Server tables.
So far, using the examples from Microsoft here, I have been successful at importing the Excel file path/name and both worksheet names into Object variables. However, this does not create an Object containing the actual dataset from either worksheet.
Based on examples here and elsewhere around the web, I've started a C# script that I believe will output the worksheet data into an Object variable, but I'm not very proficient with C# and am having difficulty debugging it without a complete example to copy from. This is my code so far:
The code builds successfully, but when I run the package I receive a nondescript error. I do not get any output from either of my Console.WriteLine commands, so I believe the Script Task is failing immediately. I do have Delay Validation = True, though changing it did not make a difference. Do you see any obvious/novice errors in my script? I've worked with SQL and SSIS for a number of years, but my C#/VB/Java/etc. knowledge and experience is limited.
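One note on tracing: Console.WriteLine output is generally not visible when a package runs normally, so its absence alone does not prove the task never started. A more dependable way to trace a Script Task is to raise an information event; a minimal sketch:

    bool fireAgain = true;
    // Information events appear in the package's Progress / Execution
    // Results tab, unlike Console.WriteLine output.
    Dts.Events.FireInformation(0, "ScriptTask", "Reached checkpoint A",
        string.Empty, 0, ref fireAgain);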
Also, if I'm overlooking a better method to accomplish this in SSIS (other than the Excel Connection, which does not work), please let me know.
UPDATE - 5/31/16: I had a little time to work on the project today and made a bit of progress, I think. I've updated my Script Task to include the following:
The Script Task completes successfully now, so I added a Foreach Loop container afterward, set it to Foreach From Variable Enumerator, and selected ExcelDataTable_IndemnityScores as the collection.
However, now I'm having difficulty extracting data from this Object variable. It has (or at least should have) two columns, which I've set in the Variable Mappings, and I am using an Execute SQL command to insert the values into a table. Unfortunately, only a single blank value for each column is inserted.
So next I replaced the Execute SQL with a simple Script Task to return the values of each variable. Unfortunately, instead of the values it returns "Microsoft.SqlServer.Dts.Runtime.Variable". I presume this is a novice error on my part, but I haven't found anything online explaining it yet.
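That output is the type name of the SSIS variable wrapper itself, which typically means the script is converting the Variable object to a string instead of reading its Value property. A minimal sketch, using a hypothetical variable name:

    // Dts.Variables["User::Col1"] is a Variable object; ToString() on it
    // returns the type name. The data itself is in the Value property.
    object raw = Dts.Variables["User::Col1"].Value;   // "User::Col1" is illustrative
    string col1 = raw == null ? string.Empty : raw.ToString();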
UPDATE 6/14/2016: I finally completed the package and it ran successfully in production yesterday. I ended up using advice presented here, as well as examples found elsewhere. My general workflow required triple-nested Foreach Loops to get both worksheets imported from the source workbook(s); I only expect one per month, but nothing has been 100% consistent with this task.
My outermost loop enumerates my import directory to find the file(s) downloaded by the FTP process. It contains two Script Tasks. The first simply confirms the filename of the first spreadsheet downloaded by the FTP process. I used the Microsoft link above for my code, with only minor modifications for my variable names.
The second task gets all worksheet names from the first spreadsheet, and was also built using the Microsoft link above. However, I exclude any worksheet names containing "#" to prevent the XML database from being assigned to my variable.
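A minimal sketch of that kind of worksheet-name enumeration, assuming the ACE OLE DB provider is installed and a hypothetical excelPath value (requires System.Data, System.Data.OleDb, and System.Collections.Generic):

    string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
                     @";Extended Properties=""Excel 12.0 Xml;HDR=YES""";
    var sheetNames = new List<string>();
    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        // Each worksheet shows up as a "table" named like Sheet1$.
        DataTable schema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        foreach (DataRow row in schema.Rows)
        {
            string name = row["TABLE_NAME"].ToString();
            if (!name.Contains("#"))   // skip the internal entries described above
                sheetNames.Add(name);
        }
    }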
The second loop (first inner loop) enumerates the worksheet names parsed in the first loop. It contains three Script Tasks, the first of which imports the data from the first worksheet into my object variable:
    public void Main() { try {
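A minimal sketch of a complete Main along those lines, assuming the ACE OLE DB provider is installed and using illustrative variable names (User::ExcelFile, User::SheetName, User::ExcelDataTable; requires System.Data and System.Data.OleDb):

    public void Main()
    {
        try
        {
            // Illustrative variable names; substitute your own.
            string excelFile = Dts.Variables["User::ExcelFile"].Value.ToString();
            string sheetName = Dts.Variables["User::SheetName"].Value.ToString();

            string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFile +
                             @";Extended Properties=""Excel 12.0 Xml;HDR=YES""";

            var table = new DataTable();
            using (var conn = new OleDbConnection(connStr))
            using (var adapter = new OleDbDataAdapter("SELECT * FROM [" + sheetName + "]", conn))
            {
                adapter.Fill(table);   // Fill opens and closes the connection itself
            }

            // Hand the DataTable to the package through an Object variable;
            // it must be listed in the task's ReadWriteVariables.
            Dts.Variables["User::ExcelDataTable"].Value = table;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
        catch (Exception ex)
        {
            Dts.Events.FireError(0, "ScriptTask", ex.Message, string.Empty, 0);
            Dts.TaskResult = (int)ScriptResults.Failure;
        }
    }

Worksheet names returned by the OLE DB schema already include the trailing $, so they can be dropped straight into the SELECT.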
The second Script Task in this loop simply deletes any blank rows from the Excel data. I could have incorporated it into the script above, but I kept it separate for potential future reuse somewhere else.
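A sketch of what that cleanup might look like if done against the imported DataTable, reusing the hypothetical Object variable from the sketch above (requires System.Linq):

    // Remove rows where every field is NULL or whitespace.
    var table = (DataTable)Dts.Variables["User::ExcelDataTable"].Value;
    for (int i = table.Rows.Count - 1; i >= 0; i--)   // iterate backwards so indexes stay valid
    {
        bool isBlank = table.Rows[i].ItemArray.All(
            field => field == DBNull.Value || string.IsNullOrWhiteSpace(field.ToString()));
        if (isBlank)
            table.Rows.RemoveAt(i);
    }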
The third Script Task in this loop uses the worksheet name to set a variable, which is used in the next loop to determine my destination table.
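A sketch of that kind of mapping; the worksheet prefix, table names, and variable names here are purely illustrative:

    string sheetName = Dts.Variables["User::SheetName"].Value.ToString();
    // Route each worksheet to a destination table based on its name.
    string destTable = sheetName.StartsWith("Indemnity", StringComparison.OrdinalIgnoreCase)
        ? "dbo.IndemnityScores"
        : "dbo.OtherScores";
    Dts.Variables["User::DestinationTable"].Value = destTable;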
The third loop (second inner loop) enumerates the rows in the object variable containing the worksheet data. It contains a single Execute SQL Task, which imports the data from the two source columns into the correct destination table based on the variable set from the worksheet name above. Since the worksheet names have not always been consistent, this loop connects directly to my object variable, which eliminates the need to reference the source columns by name. Instead, I just assign each one to a destination variable in the Foreach Loop and pass that data into my table row by row.
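With that setup, the Execute SQL Task's statement is typically just a parameterized insert along the lines of INSERT INTO dbo.DestTable (Col1, Col2) VALUES (?, ?), using OLE DB parameter markers (table and column names illustrative), with the two Foreach-mapped variables bound as parameters 0 and 1. Since parameters cannot replace identifiers, the destination table name itself has to be supplied through an expression on the task's SqlStatementSource.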
Thanks again for everyone's help and suggestions!
Usually when I get that message immediately, it means that I spelled the variable name wrong, in your case ExcelFile. I also get this error when I execute a SQL query and it returns null. Your best bet is commenting out sections of your code until the WriteLine executes; then you at least know which code is causing the problem.
I don't understand, though, why the Excel connection will not work. If the file is located on a UNC path and that is causing the problem, you can use your Script Task to move the file to a location that would work.
Joe C could be right; you might be referencing a variable by the wrong name. Did you pass the variable/parameter to the Script Task?
Still, I do not understand why you are not using one of the Data Flow Script Tasks. You can define input and output columns and then fill them in your Script Task's code:
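A minimal sketch of that pattern in a Script Component configured as a source, assuming a single output named Output0 with two string columns, Col1 and Col2, defined on the component:

    public override void CreateNewOutputRows()
    {
        // One AddRow call per record; the Output0Buffer properties are
        // generated from the output columns defined on the component.
        Output0Buffer.AddRow();
        Output0Buffer.Col1 = "first value";
        Output0Buffer.Col2 = "second value";
    }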
Those output rows can be transferred to the next task in your Data Flow, just like SSIS likes it. It is also much easier to use the variables; you can access them like a usual property, e.g. this.Variables.ExcelFile.
Another note: do not forget to set the (Control Flow) Script Task's result. Your task might finish successfully, but otherwise there won't be any constraints in the following sequence flow:
    Dts.TaskResult = (int)ScriptResults.Success;
Wow, don't you hate it when your job gets complicated by others like that! There are lots of ways to solve your issue; my personal opinion is that doing it all within a Script Task will probably be easiest for you to follow the logic and complete, but @Johannes also brings up another good method. There are two places for script tasks, and they involve pretty different methods of coding and thought processes. One is the "Script Task" available in the Control Flow, which appears to be where you are coding and adding the object to a variable.
The second is the "Script Component", which is available in the Data Flow Task. The former you need to think of as a standalone script that acts fairly independently of everything else; the latter is embedded within the Data Flow Task and acts as a source, destination, or transformation, meaning it can be geared toward populating a recordset variable (object) to be consumed.
So in option 1, the way you are currently proceeding, all you need to finish your code is to add some C# to update/populate the SQL table of your choice. Here is some code I stole from one of my packages where I do this:
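A representative sketch of that kind of bulk load, assuming a DataTable named excelTable already filled from the worksheet, with illustrative connection-string, table, and column names (requires System.Data and System.Data.SqlClient):

    string sqlConnStr = @"Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;";
    using (var conn = new SqlConnection(sqlConnStr))
    {
        conn.Open();
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.DestTable";   // illustrative
            // Map source columns by ordinal, since the sender's column
            // names are not guaranteed to match the destination's.
            bulk.ColumnMappings.Add(0, "Col1");
            bulk.ColumnMappings.Add(1, "Col2");
            bulk.WriteToServer(excelTable);
        }
    }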
For option 2 I used to have something written up about this; perhaps @Johannes has a link lying around, or someone can comment it here. With this method you should be able to reuse a lot of your code, but move it to the "Script Component". Then define the recordset schema on the object and use it just like any other source in a Data Flow Task.
There are two issues to consider that will need some further logic: 1) if using option 1, you will need to either rename your tables/datasets to what you expect before using bulk copy, or dynamically manage the column mappings; 2) in the Data Flow option, you will need to transform your dataset before populating your final recordset variable so that it always has the same columns and data types.
There are performance and data-validity considerations for both options. The first probably offers better performance, but data validity/error checking isn't handled by SSIS. With option 2, you will receive the benefit of SSIS's error checking, plus good performance for large data sets. If your data sets are very, very large, both options need to be tweaked. There are additional considerations, such as threading, but I do not believe they will apply to you.
I hope this helps.
I edited my original question with an overview of the solution which ended up working for me. If anyone has questions or would like more details/examples please let me know.