Wednesday, November 23, 2016

Excel - C#



Working with Excel files in C#




I receive a monthly XLSX file that needs imported into SQL Server using an SSIS package. Unfortunately, the sender does not follow UNC naming for the filename or worksheets, and our recent migration to SQL Server 2012 has caused the package to fail - even when using an Excel Connection Manager. We have also tried sending them a template, but they refuse to follow it, and we have no leverage to force them to do so.
I have been attempting an update to the package which would use a Script Task to import each of the two Excel worksheets into a System.Object for each, which I could then either query, or loop through, to import the data into the destination SQL Server tables.
So far, using the examples from Microsoft here I have been successful at importing the Excel file path/name, and both worksheet names, into Object variables. However, this does not create an Object containing the actual datasets from either worksheet.
Based on examples here and elsewhere around the web, I've started a C# script that I believe will output the worksheet data into an Object variable, but I'm not very proficient with C# and having difficulty getting it debugged without a complete example to copy from. This is my code so far:
using System;
using System.Data;
using System.Data.OleDb; 
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
        public DataSet Main()
        {
            string fileName;
            string connectionString;

            fileName = Dts.Variables["ExcelFile"].Value.ToString();
            Console.WriteLine(fileName);

            connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                "Data Source=" + fileName + ";Extended Properties=Excel 12.0 Xml";
            Console.WriteLine(connectionString);           
            DataSet data = new DataSet();
            using (OleDbConnection con = new OleDbConnection(connectionString))
            {
                con.Open();
                OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
                adapter.Fill(data);
            }

            return data;
        }
}
The code builds successfully, but when I run the package I receive a nondescript error
Error: 0x1 at Script Task: Exception has been thrown by the target of an invocation.
Task failed: Script Task
I do not get any output from either of my Console.WriteLine commands, so I believe the Script Task is failing immediately. I do have Delay Validation = True, though changing it did not make a difference. Do you see any obvious/novice errors in my script? I've worked with SQL and SSIS for a number of years, but my C#/VB/Java/etc. knowledge and experience is limited.
Also, if I'm overlooking a better method to accomplish this in SSIS(other than the Excel Connection, which does not work), please let me know.
UPDATE - 5/31/16: I had a little time to work on the project today, and made a bit of progress, I think. I've updated my Script Task to include the following:
        DataSet data = new DataSet();
        using (OleDbConnection con = new OleDbConnection(connectionString))
        {
            con.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
            //OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Indemnity Scores$]", con);
            adapter.Fill(data);
            Dts.Variables["ExcelDataTable_IndemnityScores"].Value = data;
        }
The Script Task completes successfully now, so I added a Foreach Loop container afterward, set it to Foreach From Variable Enumerator, and selected ExcelDataTable_IndemnityScores as the collection.
However, now I'm having difficulty extracting data from this Object variable. It has(or at least shouldhave) two columns, which I've set in the Variable Mappings, and am using an Execute SQL command to insert the values into a table. Unfortunately, only a single blank value for each column are inserted.
So next I replaced the Execute SQL with a simple Script Task to return the values of each variable. Unfortunately, instead of the value it returns "Microsoft.SqlServer.Dts.Runtime.Variable". I presume this is a novice error on my part, but I've not found anything online explaining the error yet?
UPDATE 6/14/2016: I finally completed the package and it ran successfully in production yesterday. I ended up using advice presented here, as well as examples found elsewhere. My general workflow required triple-nested Foreach Loops to get both worksheets imported from the source workbook(s) - I only expect one per-month, but nothing has been 100% consistent with this task.
My outermost loop simply enumerates my import directory to find the file(s) downloaded by the FTP process. It contains two script tasks. The first one simply confirms the filename of the first spreadsheet downloaded by the FTP process. I used the Microsoft link above for my code, with only minor modification for my variable names.
The second task gets all worksheet names from the first spreadsheet, and was also built using the Microsoft link above. However I exclude any worksheet names with "#" to prevent the XML database from being assigned to my variable.
The second loop(first inner loop) enumerates through each worksheet name parsed within the first loop. It contains three script tasks, the first of which imports the data from the first worksheet into my object variable.
public void Main() { try {
            string fileName;
            string connectionString;
            string worksheetName;
            string query;

            fileName = Dts.Variables["ExcelFile"].Value.ToString();
            //MessageBox.Show("InsertWorksheetDataIntoObject - Filename: " + fileName);

            connectionString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;" +
                "Data Source={0};Extended Properties=Excel 12.0 Xml;", fileName);
            //MessageBox.Show("Connection: " + connectionString);

            worksheetName = Dts.Variables["ExcelTable"].Value.ToString();
            worksheetName = worksheetName.Replace("'", "");
            //MessageBox.Show("InsertWorksheetDataIntoObject - Worksheet: " + worksheetName);

            query = string.Format("SELECT * FROM [" + worksheetName + "]");
            //MessageBox.Show("Query: " + query);

            DataSet data = new DataSet();
            using (OleDbConnection con = new OleDbConnection(connectionString))
            {
                con.Open();
                OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
                adapter.Fill(data);
                Dts.Variables["ExcelDataTable"].Value = data;
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        catch (Exception ex)
        {
            Dts.Events.FireError(-1, "ErrorMessage", ex.ToString(), "", 0);
            Dts.TaskResult = (int)ScriptResults.Failure;
        }


        //return data;

    }
The second scrip task in this loop simply deletes any blank rows from Excel. I could have incorporated it with the script above, but I kept it portable for potential future reuse somewhere else.
The third script task in this loop uses the worksheet name to set a variable which is used in the next loop to determine my destination table.
The third loop(2nd inner loop) enumerates the rows in the object variable containing the data from the worksheet. It contains a single Execute SQL task which imports the data from the two source columns into the correct destination table based on the variable value set by the worksheet name above. Since the worksheet names have not always been consistent, this loop connects directly to my object variable, which eliminates the need to call the source columns by name. Rather, I just assign each one to a destination variable in the Foreach loop, and pass that data into my table row-by-row.
Thanks again for everyone's help and suggestions!
shareimprove this question
   
Have you tried enabling this excel-sql-server.com/… (ad hoc, dynamic parameters, allow in process etc) and get data from that file in SSMS? – gofr1 May 28 at 7:29

4 Answers

Usually when I get that message immediately it means that I spelled the variable name wrong in your case ExcelFile. I also get this error when I execute a sql query and it returns null. Your best bet it commenting out sections of your code until the writeline executes then you at least know what code is causing the problem.
I dont understand though why the excel connection will not work. If the file is located on a UNC path and that is causing the problem you can use your script task to move the file to a location that would work.
shareimprove this answer
   
Good catch on the variable Joe C! I do have a variable named "ExcelFile" used in a previous task, while this task uses "ExcelFiles" FYI - those were the variable names from the Microsoft example I started out with, not one I will be using in the actual package. – Pete Gossett May 31 at 13:41
Joe C could be right, you might be referencing a variable by its wrong name. Did you pass the variable / parameter to the script task?
Still, I do not understand why you are not using one of the Data Flow-Script Tasks. You can define Input- and Output-Columns and then fill them in your script task's code:
    public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
    }
Those output-rows can be transferred to the next task in your Data Flow - just like SSIS likes it. Also, it is much easier to use the variables. You can access them like a usual property by this.Variables.ExcelFile (f. e.).
Another Note: Do not forget to set the (Control Flow's) script task's result. Your task might finish successfully, but there won't be any constraints in the following sequence flow.
Dts.TaskResult = (int)ScriptResults.Success;
shareimprove this answer
   
I believe the Data Flow Script Task requires an Excel connection, correct? The filename we receive does not follow standard UNC naming, so the Excel connection fails - or at least that appears to be true. If I manually rename the file, removing spaces and special characters, the Excel connection will work...if only there were a away to ensure the client would use our template. – Pete Gossett May 31 at 13:44
   
Is the location always the same? Why wouldn't you use a script task to rename the file? Then take an Excel-Source which is followed by another (script) task. – Johannes Jun 1 at 11:12
   
That might be a possible solution Johannes, though I'll also need to rename the worksheets as they also have spaces in their names. I'll explore that option and see what problems I encounter. Thanks! – Pete GossettJun 1 at 16:40
   
Renaming a worksheet in an Excel-workbook shouldn't be a problem with .Net's libraries. I'm glad I could help.– Johannes Jun 2 at 8:45
Wow don't you hate it when your job gets complicated by others like that! So there are lots of ways to solve your issue my personal opinion is all within a script task will probably be easier for you to follow the logic and complete but @Johannes also brings up another good method. There are 2 places for script tasks and they are pretty different methods of coding and thought processes. One is the "Script Task" that is available in the Control Flow which is what appears to be where you are coding and adding the object to a variable. enter image description here
The second is the "Script Component" which is available in the Data Flow Task.enter image description hereThe former you need to think of as a standalone script that acts fairly independent of everything else and the latter is embedded within the Data Flow task and is what acts as a source, destination or transformation. Meaning that it can be geared toward populating a record set variable (object) to be consumed.
So in option 1 the way you are currently proceedign all you need to finish your code is to add some c# to update/populate a SQL table of your desire. Here is some code I stole from one of my packages I do this in:
            SqlConnection sqlConnection = new SqlConnection(sqlConnectionString);
        sqlConnection.Open();

        SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnection);
        bulkCopy.DestinationTableName = _stagingTableName;
        foreach (DataColumn col in _jobRecDT.Columns)
        {
            //System.Windows.Forms.MessageBox.Show(col.ColumnName);
            bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
        }


        bulkCopy.WriteToServer(_jobRecDT);

        sqlConnection.Close();
For option 2 I used to have something about this perhaps @Johannes has a link laying around or someone can comment it here. But in this method you should be able to reuse a lot of your code but then move it to the "script component". Then define the recordset schema on the object and use it just like any other source in a data flow task.
There are 2 issues to consider and will need some further logic. 1) if using option 1 you will need to either rename your tables/datasets to what you expect before using bulk copy or dynamically manage column mappings. 2) in the data flow option you will need to transform your dataset before populating your final recordset variable to always have the same columns and datatypes.
There are performance and data validity considerations to both options. The first is probably a better performance, but data validity/error checking isn't handled by SSIS. Option 2, you will recieve the benefit of the SSIS error checking and performance for large data sets. If your data set are very very large both options need to be tweaked. There are additional considerations such as threading but I do not believe the will apply to you.
I hope this helps.
shareimprove this answer
   
Thanks Matt for the detailed example! – Pete Gossett Jun 14 at 20:34
   
my pleasure I remember the first time I had to figure out scripts in SSIS it was lots of fun :) – Matt Jun 14 at 20:37
I edited my original question with an overview of the solution which ended up working for me. If anyone has questions or would like more details/examples please let me know.

No comments:

Post a Comment