Sunday, December 18, 2016

Configure a Report Portal in SSRS




Date is a common conformed dimension because its attributes (day, week, month, quarter, year, etc.) have the same meaning when joined to any fact table.

---

  • Key. A key is one or more data attributes that uniquely identify an entity.  In a physical database a key would be formed of one or more table columns whose value(s) uniquely identifies a row within a relational table. 
  • Composite key. A key that is composed of two or more attributes. 
  • Natural key. A key that is formed of attributes that already exist in the real world.  For example, U.S. citizens are issued a Social Security Number (SSN)  that is unique to them (this isn't guaranteed to be true, but it's pretty darn close in practice).  SSN could be used as a natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is limited to the U.S.). 
  • Surrogate key. A key with no business meaning.
  • Candidate key. An entity type in a logical data model will have zero or more candidate keys, also referred to simply as unique identifiers (note: some people don't believe in identifying candidate keys in LDMs, so there are no hard and fast rules). For example, if we only interact with American citizens then SSN is one candidate key for the Person entity type, and the combination of name and phone number (assuming the combination is unique) is potentially a second candidate key. Both of these keys are called candidate keys because they are candidates to be chosen as the primary key, an alternate key, or perhaps not even a key at all within a physical data model. 
  • Primary key. The preferred key for an entity type.
  • Alternate key. Also known as a secondary key, is another unique identifier of a row within a table. 
  • Foreign key. One or more attributes in an entity type that represent a key, either primary or secondary, in another entity type.
- See more at: http://www.agiledata.org/essays/keys.html
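
As a quick illustration of these definitions, here is a toy C# sketch (not from the essay; the classes and properties are hypothetical):

// Person: PersonId is a surrogate key (no business meaning); Ssn is a natural
// key; (Name, PhoneNumber) together could form a composite candidate key. One
// candidate key is chosen as the primary key; the others become alternate keys.
class Person
{
    public int PersonId { get; set; }        // surrogate key
    public string Ssn { get; set; }          // natural key
    public string Name { get; set; }         // half of a composite candidate key
    public string PhoneNumber { get; set; }  // other half of that composite key
}

// Order: OrderId is Order's primary key; PersonId is a foreign key that
// references a key of Person.
class Order
{
    public int OrderId { get; set; }
    public int PersonId { get; set; }
}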

---





select cast((630/60.0) as decimal(16,2))
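-- dividing by 60.0 forces decimal division and returns 10.50;
-- 630/60 alone would be integer division and return 10, losing the fraction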

Wednesday, November 23, 2016

How to Handle NULL Values Loaded by SSIS from Excel Files




By Ying Zhang
I was assigned a project to import two Excel files to our SQL 2005 server. Some DBAs and developers may have noticed that the SSIS Excel Source loads NULL values for some columns even though the Excel file contains real values. Figure 1 and Figure 2 show one example: cell I11 in the Excel file (Figure 1) appears as NULL if you preview the file in SSIS (Figure 2). This happens because the driver samples the first eight rows of data and determines the data types of the fields (according to Excel Inserts Null Values by Allen Mitchell). Obviously, the Zip_Code field was treated as a numeric field, and any value containing a dash was replaced with NULL.
Figure 1
Figure 2

Because our client could not provide the data in other formats and we had to return the same zip codes to them, I needed to find a way to resolve this problem. I tried saving the file in CSV format and bulk inserting the data into SQL from a Script Task; however, neither solved the problem. After spending a few hours trying different methods and reading articles online, I found the two solutions below. I chose the second method so that the process is completely automated.

Method 1

The quick fix is to convert the values in the Zip_Code column to TEXT format in Excel. As you can see in Figure 1, the values in the SSN column are aligned to the left, which means it is a text field, while the values in Zip_Code are aligned to the right, indicating a numeric field. If you convert the column by right-clicking it and choosing to convert to Text, SSIS will be able to see all the data correctly. If your Office is an older version, you can create a new column next to it and use the TEXT() function to make the conversion. Figure 3 shows the new column generated. This method is fast but needs some manual work.
Figure 3

Method 2

This method is to use Script Task inside SSIS to open the Excel as a connection, query the data from the spreadsheet, and then load them to SQL. I had two Excel files to load, so I used a Foreach Loop Container to load the two files. Below are the steps to setup the task.
  1. Declare variables to be used by the Foreach Loop Container.
  2. Map values to variables in the Foreach Loop Container.
  3. Pass variables to the Script Task.
  4. Add code to the Script Task to import data from Excel.

Declare variables to be used by the Foreach Loop Container

As mentioned earlier, I have two Excel files to import, which means there are two source files, two select queries, and two destination tables. Thus, a set of variables needs to be set up for them. Table 1 illustrates the variables and their values. The variable "path" is used in case the two files are saved in different locations. A variable, "run", was also added to enable/disable the import of a certain file.
Table 1. Variables and Their Values
Variable Name     Data Type  Value for File 1                                                 Value for File 2
path              String     H:\                                                              H:\
excelfile         String     Customers.xls                                                    Employees.xls
destinationtable  String     Customers                                                        Employees
querystring       String     select F1,F2,F3,F4,F5,F6,F7,F8 from [Sheet1$] WHERE F1<>'SSN'    select F1,F2,F3 from [Employees$] WHERE F3<>'Zip'
run               Boolean    True                                                             False
Declaring Variables
In the Control Flow tab, add a Foreach Loop Container from the Control Flow Items under Toolbox. Right-click on the package area and choose Variables; a Variables window will appear on the left panel where the Toolbox is normally located. Before adding new variables, click on the Foreach Loop Container so it is selected. This will default the scope of the added variables to "Foreach Loop Container". If the scope of these variables were set to "Package", you may receive an error telling you the Excel file defined by the variables is locked or read-only because it is in use by another process in the SSIS package, even though nothing else in the package touches those variables.
Click the first icon in the variables window to add variables. Figure 4 shows the inputs. Because the paths of both files are the same, I specified it in the default value. The default value for the "run" variable was also set to be "true".
Figure 4
Map variables in the Foreach Loop Container
Double-click the Foreach Loop Container and the Foreach Loop Editor appears. Click on Collection on the left-hand side and the contents of the editor change (Figure 5).
Figure 5
In the dropdown box next to Enumerator, choose "Foreach Item Enumerator" (Figure 6).
Figure 6
Click on the "Columns..." button at the bottom on the right and a For Each Item Columns window will pop up (Figure 7). Click the Add button multiple times to add 4 columns. These are for the four variables to be used by the container. Change the data type of the last column to Boolean and this is for variable "run". Click OK and you are back to the Foreach Loop Editor window.
Figure 7
Click on Variable Mappings in the list on the left to map the variables to the columns you just created. Choose each variable by clicking the dropdown box under the Variable column. Remember to specify the index in the second column.
Figure 8
Figure 9
Click on Collection on the left-hand side again, and now you can input the values for each variable. Figure 10 shows the values entered for the variables. Click OK.
Figure 10

Pass variables to Script Task
Still in the Control Flow tab, drag a Script Task from the Control Flow Items in Toolbox to the Foreach Loop Container (Figure 11).
Figure 11
Double-click the Script Task and a Script Task Editor appears (Figure 12).
Figure 12
Choose Script in the left panel and the contents of the Script Task Editor change as shown in Figure 13. Since the Script Task will not modify the variables used in this package, they are handled as read-only variables: type all the variable names, separated by commas, into the ReadOnlyVariables field (Figure 13). Click on the Design Script... button and the Visual Studio for Applications Designer appears. Now we can add the code to load data.
Figure 13
Code Script Task to import data from Excel to SQL.
The code is shown below. If you have not added a reference to System.Xml.dll, you need to add it by right-clicking References under Project Explorer in the left panel and choosing Add Reference.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Xml
Imports ADODB
Imports System.Data.OleDb
Imports System.Data.SqlClient

Public Class ScriptMain

    Public Sub Main()

        Dim excelFile As String            'Excel file to be imported
        Dim connectionString1 As String    'connection string for the Excel file
        Dim excelConnection As OleDbConnection
        Dim connectionString2 As String    'connection string for SQL Server
        Dim dbconnection As SqlConnection

        If Dts.Variables("run").Value.ToString = "True" Then

            excelFile = Dts.Variables("path").Value.ToString & Dts.Variables("excelfile").Value.ToString
            connectionString1 = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & excelFile & _
                ";Extended Properties=""Excel 8.0;IMEX=1;HDR=NO"""

            'Open the Excel connection
            excelConnection = New OleDbConnection(connectionString1)
            excelConnection.Open()

            'Open the SQL Server connection
            connectionString2 = "Data Source=SERVERNAME;Initial Catalog=DATABASENAME;Integrated Security=True;"
            dbconnection = New SqlConnection(connectionString2)
            dbconnection.Open()

            'Query the worksheet and bulk copy the rows into the destination table
            Dim command As New OleDbCommand(Dts.Variables("querystring").Value.ToString, excelConnection)
            Dim rdr As OleDbDataReader = command.ExecuteReader
            Dim BulkCopy As New SqlBulkCopy(dbconnection)
            BulkCopy.DestinationTableName = Dts.Variables("destinationtable").Value.ToString
            BulkCopy.WriteToServer(rdr)

            rdr.Close()
            excelConnection.Close()
            dbconnection.Close()

        End If

        Dts.TaskResult = Dts.Results.Success

    End Sub

End Class
In the connection string for Excel, I added "IMEX=1" to tell the driver to handle the Excel file in "Import Mode":
0 is Export mode
1 is Import mode
2 is Linked mode (full update capabilities)
I also specified HDR=NO so that SSIS reads the first row, which contains the column names, as a data record; that way all columns are considered character data, which I believe is what avoided the NULL values. I was worried this would cause data type conversion problems when loading the DOB and DOH columns, because the destination fields are smalldatetime. Fortunately, the values were converted implicitly with no errors.
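To see what these settings do outside of SSIS, here is a minimal standalone C# sketch (not from the original article) that opens a worksheet with the same Jet connection string and prints every cell, reporting NULL where the driver suppressed a value. The file path and sheet name are assumptions, and the Jet 4.0 provider is 32-bit only, so this must run as an x86 process.

using System;
using System.Data.OleDb;

class ExcelProbe
{
    static void Main()
    {
        // Hypothetical file and sheet names; IMEX=1 and HDR=NO mirror the article's settings.
        string connectionString =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=H:\\Customers.xls;" +
            "Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"";

        using (var connection = new OleDbConnection(connectionString))
        {
            connection.Open();
            using (var command = new OleDbCommand("SELECT * FROM [Sheet1$]", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        // DBNull here is the symptom described above: the driver
                        // guessed a numeric type and discarded non-numeric cells.
                        Console.Write(reader.IsDBNull(i) ? "NULL\t" : reader.GetValue(i) + "\t");
                    }
                    Console.WriteLine();
                }
            }
        }
    }
}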
Click OK multiple times to get back to the Control Flow tab, then run the package. Figure 14 shows that all the zip codes were loaded correctly. The Employees table is empty because the run variable for the second file was set to False.
Figure 14

This is the best way I can think of to solve this problem using SSIS. If you have a better solution or feedback, I'd like to know. Thanks to the web links below that provided the code and configurations. If your project requires looping through multiple worksheets in multiple Excel files, the code given in the first link below should help.
Source:

Importing data with mixed types





I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes when a column contains mixed data types. Specifically, a recurring scenario is an Excel worksheet that I get asked to import into the database containing a column of mostly numeric values, where the field in question is really a string data type with a smaller number of alphanumeric codes in later rows.

I understand that Excel helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 by default, I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value, because my observation is that Excel still decides on double float unless it finds at least one non-numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows, and you never know where non-numeric data may lurk.

By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
ABC123

I can run a SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)

There is no error in the SSIS package but the SQL table looks like this after the load:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
NULL


Hi,

Thanks for looking at this. Something is different between our environments because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), which is derived from the id column using the IF function combined with ISNUMBER and TEXT functions, you can see from the screenshot below that this fixes the issue.

 Screenshot


I know it is also possible to bulk load data into a SQL DB from an Excel file using the script task, which may be the answer for ETL tasks involving Excel data sources that need to be automated. If someone could walk me through creating a script task to achieve this, that'll be great. I would prefer Visual Basic as the language.

Mixed data types in Excel column






Case
My column in Excel contains mixed datatypes (strings and numbers). The string values are shown as NULL. How can I get all values?







Solution
The Excel driver reads the first 8 rows to determine the data type. If the majority of the rows are numbers it will choose a numeric data type instead of a string data type, and vice versa.

So if your Excel sheet looks like the one shown, the first and second columns get DT_R8 and the third column gets DT_WSTR.

The standard result in SSIS



TypeGuessRows
You can change the default of checking 8 rows (1 to 16) by editing the Windows registry, but that doesn't change the majority rule! To change the number, start the Windows Registry Editor and search for "TypeGuessRows". The location of that key depends on the version of Windows and Office. Examples:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.0\Engines\Excel

For the ACE provider (Excel 2007, 2010 & 2013):
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Note: Setting the TypeGuessRows value to zero (0) will result in checking the first 16384 rows in Excel. 

Change at your own risk!
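
For illustration (not part of any of the quoted tips), here is a small C# sketch that reads and updates the Jet 4.0 TypeGuessRows value using Microsoft.Win32.Registry. The Wow6432Node path assumes 32-bit Jet on 64-bit Windows, and writing to HKEY_LOCAL_MACHINE requires running elevated; as the tip says, change at your own risk.

using System;
using Microsoft.Win32;

class TypeGuessRowsTool
{
    static void Main()
    {
        // Assumed path: 32-bit Jet 4.0 provider on 64-bit Windows.
        const string keyPath = @"SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel";

        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(keyPath, writable: true))
        {
            if (key == null)
            {
                Console.WriteLine("Key not found - check the path for your Office/Jet version.");
                return;
            }

            Console.WriteLine("Current TypeGuessRows: " + key.GetValue("TypeGuessRows"));

            // 0 = sample all rows (up to 16384) instead of the first 8.
            key.SetValue("TypeGuessRows", 0, RegistryValueKind.DWord);
            Console.WriteLine("TypeGuessRows set to 0.");
        }
    }
}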



A little trick I have used: set IMEX=1, and if you have column headings, add them into the query but use HDR=NO. That way the first row of every column contains text, so everything else will be treated as text. Then, after the import has happened, delete all header rows.

--

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=pathname_variable;Extended Properties="Excel 8.0;HDR=NO;IMEX=1"

But the numeric values still come through as NULL; my column holds both text and digits.

IMEX=1 works if I am not building the connection string in an expression, but my requirement is a dynamic path, so I have to use an expression.
--

The best way is to convert the column data type in Excel to text and then load the data.


---

Importing Data From Excel Using SSIS - Part 1





Problem
Recently, while working on a project to import data from an Excel worksheet using SSIS, I realized that sometimes the SSIS Package failed even though there were no changes in the structure/schema of the Excel worksheet. I investigated and noticed that the SSIS Package succeeded for some sets of files, but for others it failed. I found that the structure/schema of the worksheets in both sets of Excel files was the same; the data was the only difference. How can just changing the data make an SSIS Package fail? What actually causes this failure? What can we do to fix it? Check out this tip to learn more.
Solution
You must be wondering how changes in the data can cause the SSIS Package to fail. Before I talk about this issue in detail, let me demonstrate it with an example, which should show both the actual failure and the solution for this problem. As you can see in the image below, I have 18 records in the Excel worksheet; when I ran my SSIS Package to load the data from this worksheet, it worked fine.

In the next image, I made some changes to row number 7. The description for ProductDescriptionId 907 is much larger than the previous data load. When I ran my SSIS package again to load the data from this worksheet, it worked fine as well.

In the next image, I reverted the previous change and made some changes to the row number 14. The description for ProductDescriptionId 1203 is much larger than the previous data load. When I ran my SSIS package again to load the data from this worksheet, it failed with the following exception:
[Excel Source [1]] Error: There was an error with output column "Description" (18) on output "Excel Source Output" (9). The column status returned was: "Text was truncated or one or more characters had no match in the target code page.".
[Excel Source [1]] Error: The "output column "Description" (18)" failed because truncation occurred, and the truncation row disposition on "output column "Description" (18)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Excel Source" (1) returned error code 0xC020902A. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

What caused the above SSIS package to fail?
The SSIS Excel Connection Manager determines the data type of each column of the worksheet on the basis of that column's data in the first 8 rows. This is the default behavior, and the connection manager uses a value in the registry to determine the number of rows for data type determination. Before I explain more about the registry key and its setting, let's see if we can do anything in the SSIS package itself to prevent this issue.
In your SSIS Package, right-click the Excel source in the data flow task and open the "Advanced Editor for Excel Source". Next, change the data type and length of the column from its default value. For example, in my case I have a "Description" column which holds text data up to 500 characters long, and I set the maximum length for this column to 1000. Now click the "OK" button.

But now the validation of the Excel Source itself fails, as you can see below:

When you double-click the Excel Source task, it will inform you that the component is not in a valid state and ask for your confirmation to fix the issue. When you click the "Yes" button, it will reset the data type and length of the column to what they were before we made the changes above.

Now to summarize the whole thing: the SSIS Excel Connection Manager determines the data type and length of columns in the worksheet on the basis of the first eight rows of data. Even though we can change the data type and length in the Advanced Editor for Excel Source, the change will not be valid, and the information will be reset automatically by the SSIS Excel Connection Manager using the same determination process.
Fixing the problem now in the registry
So as I said before, the number of rows the SSIS Excel Connection Manager considers when determining the data type and length is controlled by a registry key called TypeGuessRows. By default its value is 8, and hence 8 rows are considered when determining the data type and length.
Now coming back to the solution, these are some of the options to address this problem:
  • Configure the source system to provide your Excel file with the data sorted on the length of the data in each column, so that the largest value of each column appears in the first row and alphanumeric data appears before numeric data.
  • Configure the source system to provide a dummy record in your Excel file as the first row, with the desired data type and size; after the data import you can remove/delete that dummy record from the database.
  • Configure the source system to provide a CSV file instead of an Excel file, because with a CSV file you have more control over determining the data type and length of a column.
  • Change the TypeGuessRows registry key from its default value of 8 to 0. This will make the Excel connection manager consider all the rows when determining the data type and length of each column.
Unfortunately, the first three options do not apply in my scenario, as I have no control over the source system providing the data in the worksheet. With this being said, I made the change in the TypeGuessRows registry key and updated its value from 8 to 0. After making this change, the same package worked like a charm for the same Excel worksheets for which it had failed last time.
Registry Key Location - [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel]
Please note: on a 64-bit Windows Server machine the registry key will be available here:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

Please be cautious when changing the value of the TypeGuessRows registry key, and keep these points in mind:
  • The TypeGuessRows registry key is a global setting; it will impact not only your SSIS Package but every application that references it.
  • Changing the TypeGuessRows registry key from its default value of 8 to 0 makes the Excel connection manager consider all the rows when determining the data type and length of each column, so it could severely impact the performance of your SSIS package if the number of rows in your Excel worksheet is large.

Now, as we saw, the easiest way to fix this issue is to make a change in the registry, but in many scenarios you will not have control over making this change, as the servers are managed by the Operations Team or there might be several other applications running on the same machine. Even if you do have control over this setting, the change might cause your SSIS Package to perform poorly, depending on the amount of data in your Excel worksheet, and it may impact other systems that reference this registry key as well. So now the question is: is there any way we can avoid making changes in the registry but still solve the problem? Well, stay tuned for part 2 of this tip for a solution which does not require a registry change but still solves the problem.

In SSIS Excel data source column values are returned as NULL






Hiii Friends!!! A few days back I noticed some unexpected behavior with an Excel sheet, so I thought of sharing that experience with you people. Here is what happened: I was given an Excel sheet to import into a database table, so I created a package with an Excel source and an OLE DB destination. Sorry, what was that? Yes, I know, you geniuses, I could have done that using the Import/Export wizard, which also creates a package, but I wanted to create the package myself. So I created the package and ran it. The data got imported, so without even querying the table I conveyed that the data import was done and could be verified. To my surprise, I got an immediate reply, with my manager in CC, stating that the imported data was incorrect: in one of the columns the text data had been replaced by NULL during the import. Now, in order to test what went wrong during the import process I had to run it all over again, and geniuses, you know what, I had the package ready with me. When I verified, I found that that particular column had mixed types of values (numeric and character), so ideally Excel should have treated this column as a TEXT data type. However, interestingly, it did not, and it treated it as a numeric column. The question is, why did this happen? If we find out how Excel decides on the data type of a column, we will have the solution. 

We all know that every installation makes some registry entries/changes in Windows, which control the future conduct of that software. Therefore let us go to the registry (as the MS Office suite was installed) and find out if we can gather any evidence.

Open the Run prompt, type regedit.exe, and the Registry Editor will be displayed. Then traverse to the following path:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel 

You will see the following registry entries:


The highlighted entries:

  • ImportMixedTypes - the data type to use for a column with mixed types of data. Default is TEXT.

  • TypeGuessRows - the number of rows analyzed by Excel before deciding the data type of the column. Default value is 8.


So, by default, Excel treats the data type of a column with mixed types of data as TEXT, but in our case it did not. We also learned that Excel scans the first 8 rows before deciding the data type of the column. 

Let us check out the first 8 rows of the column which did not import properly…

1000
2000
ASBD
230
AHGS
1000
asd
1.25

Out of 8 rows, 5 are numeric and 3 contain characters. The TypeGuessRows sampling overrides the ImportMixedTypes setting when deciding the data type of the column, which resulted in all the text values being displayed as NULL (as can be observed in the screenshot below).



Now, how do we make Excel consider this column's data type as TEXT? Well, if we make Excel work in Import mode, we should be good. What is Import mode in Excel? Import mode is where we explicitly tell the Excel driver to honor the ImportMixedTypes registry setting. Now the question to ask is, how do I do that? It can be done by adding IMEX to the connection string of the Excel connection manager.

Right Click on the Excel connection Manager -> Go to properties -> Connection string 

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel file path\Test_IMEX.xlsx;Extended Properties="Excel 12.0;HDR=YES;"

After adding IMEX=1 it should look like the one given below

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel file path\Test_IMEX.xlsx;Extended Properties="Excel 12.0;HDR=YES;IMEX=1;"

Save it and re-run the package 



Now, as you can see, the text values are displayed as they are. This way we can make Excel treat the column as TEXT. But this setting won't work if the first 8 rows are all of the same data type (say, character or numeric) and the rest of the rows have mixed types; in that case Excel will treat the column data type as char/numeric anyway. Thus it is good practice to set TypeGuessRows=0. But if you have millions of rows in the sheet, it will hamper performance, since Excel will loop over every row in the sheet before deciding the data type of the column. 

Note : 

If you have the Excel provider Microsoft.ACE.OLEDB.12.0, then you will have to check the registry setting of Excel at:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel

Excel - C#



Working with Excel files in C#




I receive a monthly XLSX file that needs to be imported into SQL Server using an SSIS package. Unfortunately, the sender does not follow UNC naming for the filename or worksheets, and our recent migration to SQL Server 2012 has caused the package to fail - even when using an Excel Connection Manager. We have also tried sending them a template, but they refuse to follow it, and we have no leverage to force them to do so.
I have been attempting an update to the package which would use a Script Task to import each of the two Excel worksheets into a System.Object for each, which I could then either query, or loop through, to import the data into the destination SQL Server tables.
So far, using the examples from Microsoft here I have been successful at importing the Excel file path/name, and both worksheet names, into Object variables. However, this does not create an Object containing the actual datasets from either worksheet.
Based on examples here and elsewhere around the web, I've started a C# script that I believe will output the worksheet data into an Object variable, but I'm not very proficient with C# and am having difficulty getting it debugged without a complete example to copy from. This is my code so far:
using System;
using System.Data;
using System.Data.OleDb; 
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
        public DataSet Main()
        {
            string fileName;
            string connectionString;

            fileName = Dts.Variables["ExcelFile"].Value.ToString();
            Console.WriteLine(fileName);

            connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                "Data Source=" + fileName + ";Extended Properties=Excel 12.0 Xml";
            Console.WriteLine(connectionString);           
            DataSet data = new DataSet();
            using (OleDbConnection con = new OleDbConnection(connectionString))
            {
                con.Open();
                OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
                adapter.Fill(data);
            }

            return data;
        }
}
The code builds successfully, but when I run the package I receive a nondescript error
Error: 0x1 at Script Task: Exception has been thrown by the target of an invocation.
Task failed: Script Task
I do not get any output from either of my Console.WriteLine commands, so I believe the Script Task is failing immediately. I do have Delay Validation = True, though changing it did not make a difference. Do you see any obvious/novice errors in my script? I've worked with SQL and SSIS for a number of years, but my C#/VB/Java/etc. knowledge and experience is limited.
Also, if I'm overlooking a better method to accomplish this in SSIS(other than the Excel Connection, which does not work), please let me know.
UPDATE - 5/31/16: I had a little time to work on the project today, and made a bit of progress, I think. I've updated my Script Task to include the following:
        DataSet data = new DataSet();
        using (OleDbConnection con = new OleDbConnection(connectionString))
        {
            con.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
            //OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Indemnity Scores$]", con);
            adapter.Fill(data);
            Dts.Variables["ExcelDataTable_IndemnityScores"].Value = data;
        }
The Script Task completes successfully now, so I added a Foreach Loop container afterward, set it to Foreach From Variable Enumerator, and selected ExcelDataTable_IndemnityScores as the collection.
However, now I'm having difficulty extracting data from this Object variable. It has (or at least should have) two columns, which I've set in the Variable Mappings, and I am using an Execute SQL command to insert the values into a table. Unfortunately, only a single blank value for each column is inserted.
So next I replaced the Execute SQL with a simple Script Task to return the values of each variable. Unfortunately, instead of the value it returns "Microsoft.SqlServer.Dts.Runtime.Variable". I presume this is a novice error on my part, but I've not found anything online explaining it yet.
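(That type name is the classic symptom of calling ToString() on the Variable wrapper object itself instead of on its Value property, which is a likely cause here. A minimal sketch, using a hypothetical variable name:)

// Inside a Script Task, with a hypothetical SSIS variable named "RowValue":
string wrong = Dts.Variables["RowValue"].ToString();
// prints "Microsoft.SqlServer.Dts.Runtime.Variable", the wrapper's type name,
// because the Variable class does not override ToString()

string right = Dts.Variables["RowValue"].Value.ToString();
// prints the actual value stored in the variable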
UPDATE 6/14/2016: I finally completed the package and it ran successfully in production yesterday. I ended up using advice presented here, as well as examples found elsewhere. My general workflow required triple-nested Foreach Loops to get both worksheets imported from the source workbook(s) - I only expect one per month, but nothing has been 100% consistent with this task.
My outermost loop simply enumerates my import directory to find the file(s) downloaded by the FTP process. It contains two script tasks. The first one simply confirms the filename of the first spreadsheet downloaded by the FTP process. I used the Microsoft link above for my code, with only minor modification for my variable names.
The second task gets all worksheet names from the first spreadsheet, and was also built using the Microsoft link above. However I exclude any worksheet names with "#" to prevent the XML database from being assigned to my variable.
The second loop (first inner loop) enumerates each worksheet name parsed within the first loop. It contains three script tasks, the first of which imports the data from the first worksheet into my object variable.
public void Main()
{
    try
    {
            string fileName;
            string connectionString;
            string worksheetName;
            string query;

            fileName = Dts.Variables["ExcelFile"].Value.ToString();
            //MessageBox.Show("InsertWorksheetDataIntoObject - Filename: " + fileName);

            connectionString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;" +
                "Data Source={0};Extended Properties=Excel 12.0 Xml;", fileName);
            //MessageBox.Show("Connection: " + connectionString);

            worksheetName = Dts.Variables["ExcelTable"].Value.ToString();
            worksheetName = worksheetName.Replace("'", "");
            //MessageBox.Show("InsertWorksheetDataIntoObject - Worksheet: " + worksheetName);

            query = string.Format("SELECT * FROM [" + worksheetName + "]");
            //MessageBox.Show("Query: " + query);

            DataSet data = new DataSet();
            using (OleDbConnection con = new OleDbConnection(connectionString))
            {
                con.Open();
                OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
                adapter.Fill(data);
                Dts.Variables["ExcelDataTable"].Value = data;
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        catch (Exception ex)
        {
            Dts.Events.FireError(-1, "ErrorMessage", ex.ToString(), "", 0);
            Dts.TaskResult = (int)ScriptResults.Failure;
        }


        //return data;

    }
The second script task in this loop simply deletes any blank rows from Excel. I could have incorporated it into the script above, but I kept it portable for potential future reuse somewhere else.
The third script task in this loop uses the worksheet name to set a variable which is used in the next loop to determine my destination table.
The third loop (second inner loop) enumerates the rows in the object variable containing the data from the worksheet. It contains a single Execute SQL task which imports the data from the two source columns into the correct destination table, based on the variable value set from the worksheet name above. Since the worksheet names have not always been consistent, this loop connects directly to my object variable, which eliminates the need to call the source columns by name. Rather, I just assign each one to a destination variable in the Foreach loop and pass that data into my table row by row.
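(For illustration only, not from the original package: a sketch of what that row-by-row insert could look like if done in a Script Task instead of the Execute SQL task. The connection string, table name, and column layout are hypothetical, and it requires System.Data and System.Data.SqlClient.)

// Sketch: read the DataSet stored in the SSIS object variable and insert each
// row with a parameterized command.
DataTable worksheetRows = ((DataSet)Dts.Variables["ExcelDataTable"].Value).Tables[0];

using (SqlConnection conn = new SqlConnection(
    "Data Source=SERVERNAME;Initial Catalog=DATABASENAME;Integrated Security=True;"))
{
    conn.Open();
    foreach (DataRow row in worksheetRows.Rows)
    {
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO dbo.DestinationTable (Col1, Col2) VALUES (@c1, @c2)", conn))
        {
            // Hypothetical two-column layout, matching the two source columns above.
            cmd.Parameters.AddWithValue("@c1", row[0]);
            cmd.Parameters.AddWithValue("@c2", row[1]);
            cmd.ExecuteNonQuery();
        }
    }
}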
Thanks again for everyone's help and suggestions!
Have you tried enabling this excel-sql-server.com/… (ad hoc, dynamic parameters, allow in process etc) and get data from that file in SSMS? – gofr1 May 28 at 7:29

4 Answers

Usually when I get that message immediately, it means I spelled a variable name wrong - in your case, ExcelFile. I also get this error when I execute a SQL query and it returns null. Your best bet is commenting out sections of your code until the WriteLine executes; then you at least know what code is causing the problem.
I don't understand, though, why the Excel connection will not work. If the file is located on a UNC path and that is causing the problem, you can use your script task to move the file to a location that works.
Good catch on the variable Joe C! I do have a variable named "ExcelFile" used in a previous task, while this task uses "ExcelFiles" FYI - those were the variable names from the Microsoft example I started out with, not one I will be using in the actual package. – Pete Gossett May 31 at 13:41
Joe C could be right; you might be referencing a variable by the wrong name. Did you pass the variable / parameter to the script task?
Still, I do not understand why you are not using a Script Component in the Data Flow. You can define input and output columns and then fill them in your script's code:
    public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
    }
Those output rows can be transferred to the next task in your Data Flow - just like SSIS likes it. Also, it is much easier to use the variables there. You can access them like a usual property, e.g. this.Variables.ExcelFile.
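(To make the skeleton concrete: assuming a hypothetical output named "MyOutput" with an Id column and a Name column defined on the Script Component, the method might be filled in like this:)

public override void CreateNewOutputRows()
{
    // Assumes an output named "MyOutput" with columns Id (four-byte signed
    // integer) and Name (Unicode string) defined on the Script Component.
    MyOutputBuffer.AddRow();
    MyOutputBuffer.Id = 1;
    MyOutputBuffer.Name = "example";
}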
Another note: do not forget to set the (Control Flow) script task's result. Your task might finish, but without it the precedence constraints in the following sequence flow won't get the result they expect.
Dts.TaskResult = (int)ScriptResults.Success;
I believe the Data Flow Script Task requires an Excel connection, correct? The filename we receive does not follow standard UNC naming, so the Excel connection fails - or at least that appears to be true. If I manually rename the file, removing spaces and special characters, the Excel connection will work... if only there were a way to ensure the client would use our template. – Pete Gossett May 31 at 13:44
   
Is the location always the same? Why wouldn't you use a script task to rename the file? Then take an Excel-Source which is followed by another (script) task. – Johannes Jun 1 at 11:12
   
That might be a possible solution Johannes, though I'll also need to rename the worksheets as they also have spaces in their names. I'll explore that option and see what problems I encounter. Thanks! – Pete Gossett Jun 1 at 16:40
   
Renaming a worksheet in an Excel workbook shouldn't be a problem with .NET's libraries. I'm glad I could help. – Johannes Jun 2 at 8:45
Wow, don't you hate it when your job gets complicated by others like that! There are lots of ways to solve your issue; my personal opinion is that doing it all within a script task will probably be easier for you to follow the logic and complete, but @Johannes also brings up another good method. There are 2 places for scripting in SSIS, and they involve pretty different methods of coding and thought processes. One is the "Script Task" available in the Control Flow, which appears to be where you are coding and adding the object to a variable.
The second is the "Script Component" available in the Data Flow Task. The former you need to think of as a standalone script that acts fairly independently of everything else; the latter is embedded within the Data Flow Task and acts as a source, destination or transformation, meaning it can be geared toward populating a recordset variable (object) to be consumed.
So in option 1, the way you are currently proceeding, all you need to finish your code is some C# to update/populate the SQL table of your choice. Here is some code I stole from one of my packages where I do this:
        // sqlConnectionString, _stagingTableName, and _jobRecDT (a DataTable of
        // the rows to load) are fields defined elsewhere in the package.
        SqlConnection sqlConnection = new SqlConnection(sqlConnectionString);
        sqlConnection.Open();

        SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnection);
        bulkCopy.DestinationTableName = _stagingTableName;
        foreach (DataColumn col in _jobRecDT.Columns)
        {
            //System.Windows.Forms.MessageBox.Show(col.ColumnName);
            bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
        }

        bulkCopy.WriteToServer(_jobRecDT);

        sqlConnection.Close();
For option 2 I used to have a reference about this; perhaps @Johannes has a link lying around or someone can comment it here. With this method you should be able to reuse a lot of your code by moving it into the "Script Component", then define the recordset schema on the output and use it just like any other source in a Data Flow task.
There are 2 issues to consider that will need some further logic: 1) if using option 1, you will need to either rename your tables/datasets to what you expect before using bulk copy, or dynamically manage the column mappings; 2) in the data flow option, you will need to transform your dataset before populating your final recordset variable so it always has the same columns and data types.
There are performance and data validity considerations for both options. The first probably performs better, but data validity/error checking isn't handled by SSIS. With option 2 you receive the benefit of SSIS error checking and its performance for large data sets. If your data sets are very large, both options need to be tweaked. There are additional considerations such as threading, but I do not believe they will apply to you.
I hope this helps.
Thanks Matt for the detailed example! – Pete Gossett Jun 14 at 20:34
   
my pleasure I remember the first time I had to figure out scripts in SSIS it was lots of fun :) – Matt Jun 14 at 20:37
I edited my original question with an overview of the solution which ended up working for me. If anyone has questions or would like more details/examples please let me know.