Tuesday, July 30, 2013

SSIS Package Configuration

Please refer to the video for details. SSIS Package Configuration

ssispackage-configuration

------------------

Finalpoint:

PackageConfiguration: XML Indirect method:

The .dtsconfig file which holds the connection to database , the path of the file is put in
an environment variable.

PackageConfiguration:SQL Server

Configure and it creates a Table , say "SSISConfigurations" in the specified database.

In this configuration , specify a Filter that identifies it as unique along with PackagePath.

SSMS:Navigate to the specified database, risht click choose "Generate Script" , then choose
under "Table ".. "Script - True... , choose the table "
"SSISConfigurations" , in query window it puts the script.

To move production:

This
"SSISConfigurations" script give to dba and the environment variable that holds the connection
detail.

-----------------

1.Environment variable: (My Computer - Properties -Advanced - Environmental Variable -Product Id ,
Right Click-Solution Explorer-Package Configuration option [ Enable Package Configuration ]-Environmental variable-,Wizard -Map Variable Value
2.Variables - New User Variables : regedit-HKEY_CURRENT_USER-Add -Key-
Right Click-Solution Explorer-Package Configuration-Registry-,Wizard -Map Variable Value
3.xml configuration - 2 methods :Direct ( create the Connection string as Env. variable,map in package configuration), Indirect
i.Direct- Right Click-Solution Explorer-Package Configuration-choose a new file to store all parameters - say PkgConfig.dtsConfig
ii.Environmental variable - ConfigPath point to the PkgConfig.dtsConfig ( it has connection string)
4."SQL Server Configuration" - Server Name -.[dbo].[SSISConfigurations] , Configuration filter:VariableDemo { this makes package unique}
5. Go to DB(in explorer)- right click-Task-Generate Script - Script Data(True),Tables(SSISConfigurations):Puts script into Query window
and with the values the DBA has to just run that script (Test or Prod) and create the necessary Environmental variables.

Monday, July 29, 2013

Oracle OBIEE ( Oracle Business Intelligence Enterprise Edition)

Operational Data Store (ODS):Data Warehousing (DWH):Enterprise Data Warehouse (EDW)

DWH Concepts and Fundamentals

Operational Data Store (ODS) : - Staging Area. Data updated from ERP etc.

Data Warehousing (DWH) : Data from Staging Area loaded for data storage.

Enterprise Data Warehouse (EDW) : Data Sources from different Business Area updating data (Data Marts etc.) [ If clear pictures of Business processes are known , then this is having integrated data.]

Bill Inmon (Normalization process , RDBMS) vs. Ralph Kimball (Dimensional Modeling)

datawarehousing-inmon-kimball

Monday, July 22, 2013

DW - Normalization

Answers collected from the below site.Please refer to the site for details. database-normalization

Normalization of Database (eliminate data duplication & maintain data integrity)
1NF: student(s_id,s_name) subject(subject_id,student_id,subject)
Each column have a unique value.
Each row of data have a unique identifier i.e. Primary key.
Table:Student , Subject (sid,sname,subject - sname repeats in rows)
2NF: All 1NF + if table has concatenated primary key ( no partial dependency
of any column on primary key.) ( if any column depends on one part of concatenated key then table fails 2NF)
Customer(cust_id,cust_name,ord_id,ord_name,sale_det)
PK:(cust_id+ord_id)
cust_name ( depends on only customer_id:FAILURE)
ord_name ( depends on only ord_id:FAILURE)
Customer(cust_id,cust_name) Order_Detail( ord_id, ord_name)
Sales(cust_id,ord_id,sale_det)
3NF: move the transitive functional dependency to another table.non-prime attribute must depend on primary key.
Student(stud_id,stud_name,dob,zip(street,city,state - transitive) -move to)
Student(stud_id,stud_name,dob,zip) Address(Zip,Street,city,state)
BCNF:higher version of 3 NF does not have multiple candidate keys.

SQL SERVER – Difference Between Candidate Keys and Primary Key
Primary key: not null & uniquely defines the row Candidate Key: can be null, unique

DW - SCD - Slowly changing dimensions

Dimension - in data management & data warehousing - logical grouping of data
[geographical location, customer , product]
Type 0 - Values remain as they were at the time the dimension record was first inserted. [ history ]
Type I - This methodology overwrites old with new data, and therefore does not track historical data.
Supplier_Key Supplier_Code Supplier_Name Supplier_State
123 ABC xyz CA
Supplier_Code is the natural key.
Supplier_Key is a surrogate key ( joins use integer rather than character keys.)
Type II:This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers.
Start_Date,End_Date( null - current one)
Type IV:Is usually referred to as using "history tables".Where one table keeps the current data, and an additional table is used to keep a record of some or all changes.

Slowly_changing_dimension

Saturday, July 13, 2013

Oracle - Tuning SQL Statements

Answers collected from the below site.Please refer to the site for details.
Tuning SQL Statements
1.Oracle SQL Analyzer:statement TopSQL:Hint Wizard:SQL Tuning Wizard
2.Index:full table scans:Oracle cost-based optimizer
3.specifying Hints:

Oracle SQL Analyze applies these "rules-of-thumb" when you tune a statement with the Tuning Wizard, and supplies alternative SQL statements when possible.

Use NOT EXISTS instead of NOT IN ( NOT IN uses full Table scan)
Use NOT EXISTS or NOT IN with hints instead of MINUS (MINUS does not use Indexes)
Use TRUNC differently to enable indexes ,Use operators differently to enable indexes
Do not use columns on both sides of operator,Use WHERE in place of HAVING,
Use UNION ALL instead of UNION

The logic of the NOT EXISTS clause tells Oracle not to return the row if it finds a match in both tables.

NOT IN" vs "MINUS": "NOT IN" is much slower than "MINUS" as for "NOT IN" for each row the subquery all rows are scanned.In "MINUS" at one time both table Full Scan is done.

SELECT mod_code FROM SRS.Table1 WHERE mod_code NOT IN (SELECT mod_code FROM SRS.Table2);

SELECT mod_code FROM SRS.Table1 MINUS SELECT mod_code FROM SRS.Table2;

Friday, July 12, 2013

SQL - DEPT table who has no employees assigned in EMP table

Answers collected from the below site.Please refer to the site for details.
To take out those dept who has no employees assigned to it
Table Structure:EMP:( EMPNO ENAME DEPTNO) DEPT:DEPTNO DNAME
Ms Sql Server:->select distinct d.dname from dept d EXCEPT select d.dname from dept d JOIN emp e where d.deptno=e.deptno
SQL SERVER – EXCEPT Clause in SQL Server is Similar to MINUS Clause in Oracle
select dname from dept where deptno not in (select deptno from emp)
Notes:i.Above to work assume all emplyoees have a deptno.CEO deptno could be NULL = fail, for that the below query.Not in uses Full scan of the table very slow,instead use NOT EXISTS.
select distinct d.dname from dept d left join emp e on d.deptno=e.deptno where e.deptno is null
Oracle:->select distinct d.dname from dept d where NOT EXISTS ( select * from emp e where d.deptno=e.deptno)

Reference:T-SQL for Starters: Simple and Correlated Subqueries

subquery:A subquery is a query that SQL Server must evaluate before it can process the main query. note:Instead of IN operator JOIN is more efficient.
correlated subquery:A correlated subquery is one that depends on a value in the outer query.The net effect is that the subquery runs once for every row in the main query; this situation is inefficient. Correlated subqueries evaluate once for each row of the outer query, so use these queries only for single-row results.
EXISTS:But the convention with EXISTS is that you use an asterisk, rather than specifying a column, because you do not expect SQL Server to return any data values—EXISTS returns only Boolean values (i.e., true or false).

Get employees who worked in more than one department with SQL query

Employee (employeeName, wage, contactNo)Department (employeeName, departmentNo,hours,startDate)

select e.employeename from Employee e INNER JOIN Department d ON e.employeename=d.employeename GROUP BY e.employeename HAVING COUNT(departmentNo) > 2
sql query to display dept name,location name,number of employees
select dname as DNAME, loc as LOC, SUM(1) as Number_of_People, avg(sal) as AvgSalary from dept,emp group by dname, loc;

Wednesday, July 10, 2013

SSIS- What is the difference between data flow and control flow?

Control Flow has the Business Logic.Data Flow deals with data.

Control Flow vs Data Flow

What is the difference between data flow and control flow?

SSIS Basics Package Creation Video Example SSIS 2012

SSIS Basics Package Creation Video Example SSIS 2012

------------------------------------------------------------------------------------------------------------
SSIS Basics Package Creation Video Example SSIS 2012

Task:
Control Flow Tab:Data Flow Task:clicking on it take to "Data Flow" tab [ here specified "Source" system from which Data is pulled and populated to "Destination".]

Control Flow Tab:Send Mail Task:A "precedent" "constraint" is applied - on Job failure to send an email.

Transformation:
Data Flow Tab:
Derived Column Transformation:
Derived Column Name: Full Name Expression:FirstName+" "+Surname

note:
Using Derieved Column Transformation , the value of the column also changed.

------------------------------------------------------------------------------------------------------------
1.Visual Studio:SQL Server Data Tools (versn.2012):BIDS(Business Intelligence Development Studio):Part of Sql Server Installation.

1.To create SSIS Package need , "SQL Server Data Tools" installed ( part of Sql Server 2012).

note1:In previous vers. of Sql Server: "Business Intelligence Development Studio"("BIDS").
note2:Both of them integrated into "Visual Studio" , part of "Sql Server Installation".

2.[i.Project:Solution Explorer:Project.params:Connection Managers] [ii.SSIS Packages:package.dtsx] SSIS Toolbox.

2.To create package you click on "SQL Server Data Tools"

Also when you create Package first you create a Solution for SSIS.

To do that: File->New Project->Integration Services Project

Business Intelligence

Analysis Services
Integration Services
Reporting Services

SQL Server
Other Project Types

notes:
i.when you create a solution , it creates project , in one solution, you can have multiple projects.One can even include one SSRS project.But most solution contains one project.

SSIS Basics
Project.params
Connection Managers
SSIS Packages

Package.dtsx

3.Control Flow(Precedence Constraint):Data Flow(Derieved Column Transformation: "FNANE" + " "+ "LNAME"):Parameters:Event Handlers:Package Explorer

(vers.2012:Variables(change value during execution of the package),Parameters(deploy in diff. env.(dev,prod) and change value(connection)

ii.Tabs:
Control Flow,Data Flow,Parameters,Event Handler,Package Explorer

Parameters:change connection string to point to Dev. or Prod.

a.Variables - change the values in runtime.
note:Previous vers. only had variables , which one can use to change values inside package

and also outside of package but now it has "Parameters".

b.Control Flow

Data Flow Task ( Name It : Import MyFile)

Send Mail Task

i.Constraint (Precedence Constraint)

Value:Failure

4.Flat File Source:OLE DB Destination:SQL Server Management Studio
[OLE DB: Destination(check constraints,keep identity,table lock,keep nulls)][Data access mode:Table or view - fast load][Mapping]

5.Configuration:Connection Manager:Expression( dynamically change):Solution Explorer:Project.params

6.New 2012 Vers.:Connection Manager(right click):Parameterize:

In "Connection Manager" , in property: Connection String: the value is hard coded , there is another property "Expression:" that at run time can change the value ( its in 2012 ) , when required to point it to dev.,prod. or test.

To do that: just below solution explorer the proect.param which is outside of project , there create a parameter for the connection string. Then choose the OLEDB connection string and right click and choose "Parameterize" , there put this paratermeter that is newly created.

7.Deploy:Configuration properties:Deployment:Server name:

Righ click on Solution Explorer , choose "Deployment" that deployes the package to the Server. But first check all configuration so right click then choose properties and in the configuration properties set the values.
---------------------------------------------------------------------------------------------
SQL Server Management Studio(SSMS)
[from visual studio:Data Tools:Click on icon SSMS & log on to Database Engine]

-Database -Security -Server Objects
-Replication -Management -Notification Services
-SQL Server Agent
-right click on Job node-New Job-

Run as: SQL Agent Service Account
Use Windows Authentication
{Package:Properties:Security:ProtectionLevel:EncryptSensitiveWithUserKey
Transactions:IsolationLevel:Serializable

ProtectionLevel - for encrypting the package(default: EncryptSensitiveWithUserKey will often cause problems. sensitive information is a database connection string that contains a password in it.The default setting will encrypt this information using the user key of the person who created the package. When another user executes the package it will fail because that user's key will not be able to decrypt the connection string. A good way around this is to change the ProtectionLevel to DontSaveSensitive, meaning you aren't going to put sensitive information in the package so there's no need to worry about encryption.

IsolationLevel - Serializable is the level where read locks are held until a transaction commits or rolls back which provides that no data read can be updated.choose one of the other levels such as ReadCommitted..

TransactionOption

}

Note that the Run as setting is the SQL Agent Service Account. This is the default setting although from a security standpoint it may not be what you want. You can setup a Proxy that allows you to give a particular credential permission to execute an SSIS package from a SQL Server Agent job step.

-Security
Security then Credentials in SSMS Object Explorer and right click to create a new credential as shown below:

Credential name: SSIS Execution Account
...

Navigate to SQL Server Agent then Proxies in SSMS Object Explorer and right click to create a new proxy as shown below:

Proxy name: SSIS Proxy
Credential name:SSIS Execution Account
Active to the following subsystems:
SQL Server Integration Services Package
---
Now when you create or edit a SQL Server Agent job step

-------

Scheduling SSIS packages with SQL Server Agent
-you to execute an SSIS package in a SQL Server Agent job step.
-it allows you to schedule the execution of an SSIS package so that it runs without any user interaction.

---------
SQL Server Import & Export Wizard:
Data source:SQL Native Client Server name: Database:
Destination: Specify Table Copy or Query:
Select Source Tables & views: Edit Mapping
Save & Execute Package: to perform export operation & to
create an SSIS package and save it to SQL Server or file system.

------
OLE DB Source: Data access mode: SQL command: SQL command text: query
---
Deployment utility:DTUTIL.EXE
---
MSDB database - Sql server packages
---
Deploy to package store: File System node:Import package
---
Executing SSIS package:
DTEXEC,DTEXECUI,SQL Server Agent
---

---------------------------------------------------------------------------------------------

Containers:Sequence,For Each Loop,For Loop
Load:Fast Load,Precedence Constraint,Row By Row Insert
Package Configuration:[2005-08]SQL Table or XML file.[2012]:new added :catalog,parameters
MergeJoin:Sorted data in Source query[Order By]:Instead of:[Sort Transformation: Slow in large dataset.]
MDX query:'MDXResult' variable:NewConnection:Sql Execute Task

------------------------------------------------------------------------------------------------------------

Reference:

SQL Server Integration Services (SSIS)
SSIS Package Architecture Overview Basics Video Example SSIS 2
SSIS Interview Questions and Answers Video SSIS 2012

------------------------------------------------------------------------------------------------------------

Tuesday, July 9, 2013

SQL - How Using MERGE in SQL Server 2008 to insert, update and delete at the same time

Answers collected from the below site.Please refer to the site for details.
Using MERGE in SQL Server to insert, update and delete at the same time
MERGE <target_table> [AS TARGET]
USING <table_source> [AS SOURCE]
ON <search_condition>
[WHEN MATCHED 
THEN <merge_matched> ]
[WHEN NOT MATCHED [BY TARGET]
THEN <merge_not_matched> ]
[WHEN NOT MATCHED BY SOURCE
THEN <merge_ matched> ];
The MERGE statement basically merges data from a source result set to a target table based on a condition that you specify and if the data from the source already exists in the target or not.

-----------------------------------------------------------------------
Source Video:  SQL Server 2008 MERGE
--MERGE Statement
MERGE INTO dbo.tbl1  tbl ---target
   USING (Select 'Seven' as Col2) src --source
ON (tbl.col2 = src.col2)
WHEN MATCH THEN
UPDATE SET COL2 = 'Eight'
WHEN TARGET NOT MATCHED THEN 
INSERT (Col2) Values ('Seven');
Go
Select * from dbo.tbl1;
Go
-------------------------------------------------------------------------
USE DemoDb
Go
Insert into dbo.tbl (col2) VALUES ('One');
Go
Select  *  from dbo.tbl1;
Go
--If-else Statement
IF EXISTS(SELECT * from dbo.tbl1 WHERE col2 = 'FIVE')
    UPDATE  dbo.tbl1 Set Col2 = 'Six' WHERE Col2= 'FIVE';
ELSE
    INSERT INTO dbo.tbl1 (col2) VALUES('Five);
Go
Select * from dbo.tbl1 ;

--Drop Table
IF EXISTS(......
     DROP TABLE [dbo].[tbl1];
Go
--Create table
CREATE TABLE [dbo].[tbl1] ( [Col1] [int] IDENTITY(1,1) NOT NULL, [Col2] [Varchar] (50) NOT NULL,
     CONSTRAINT [PK_tbl1] PRIMARY KEY CLUSTERED [COL1] ASC
)
GO

Col1. Col2.
1 One
2 Two
3 Three
4 Four

--MERGE Statement

MERGE INTO dbo.tbl1  tbl ---target
   USING (Select 'Seven' as Col2) src --source
ON (tbl.col2 = src.col2)
WHEN MATCH THEN
UPDATE SET COL2 = 'Eight'
WHEN TARGET NOT MATCHED THEN 
INSERT (Col2) Values ('Seven');
Go
Select * from dbo.tbl1;
Go

Monday, July 8, 2013

SQL - How to find 2nd highest salary from the table employee?.

Answers collected from the below site.Please refer to the site for details.

www.sqllion.com-dense_rank

select * from (

select DENSE_RANK() OVER(partition by DeptNo order by salary desc) as RankID,* from dbo.employee) InnQ

where InnQ.RankID = 2

www.questpond.com

(Using corelated queries)

select * from Empsal MainTable

where 2 = (select count(*) from Empsal Auxillary where Auxillary.salary >= MainTable.salary)

----------------------------------------------------------------------------------------------------------------------------------

Reference:

Explain RowNumber,Partition,Rank and DenseRank ?

Refer the above video for the details.

Select ROW_NUMBER( ) OVER(Order By CustomerName) as OrderNumber,
ROW_NUMBER( ) OVER( PARTITION By VendorName Order By VendorName) as VendorNumber,
DENSE_RANK( ) OVER( Order by CustomerName) as CustomerNumber,
CustomerName,ProductName,Amount,VendorName From Sales

note:If Rank( ) is used in place of DENSE_RANK( ) , it does not give a sequential number.

----------------------------------------------------------------------------------------------------------------------------------

SQL - How to delete DUPLICATE ROWS in a table?

Answer collected from the below site.Please refer to the site for details.
blog.sqlauthority.com-sql-server-delete-duplicate-records-rows

Table in example has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

------
blog.sqlauthority.com-delete-duplicate-rows-Using CTE
/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO

---

Refer Source: How can I remove duplicate rows?

...

Another possible way of doing this is

;WITH ct.

AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3

ORDER BY ( SELECT 0)) RN

FROM #MyTable)

DELETE FROM cte

WHERE RN > 1

I am using ORDER BY (SELECT 0) above as it is arbitrary which row to preserve in the event of a tie.

To preserve the latest one in RowID order for example you could use ORDER BY RowID DESC

Refer Source:

SQL Query - Delete duplicates if more than 3 dups?

with cte as (
  select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
     from table)
delete from cte
   where rn > 2; -- or >3 etc

The query is manufacturing a 'row number' for each record, grouped by the (dupcol1, dupcol2) and ordered by ID. In effect this row number counts 'duplicates' that have the same dupcol1 and dupcol2 and assigns then the number 1, 2, 3.. N, order by ID. If you want to keep just 2 'duplicates', then you need to delete those that were assigned the numbers 3,4,.. N and that is the part taken care of by the DELLETE.. WHERE rn > 2;

Using this method you can change the ORDER BY to suit your preferred order (eg. ORDER BY ID DESC), so that the LATEST has rn=1, then the next to latest is rn=2 and so on. The rest stays the same, the DELETE will remove only the oldest ones as they have the highest row numbers.

Unlike this closely related question, as the condition becomes more complex, using CTEs and row_number() becomes simpler. Performance may be problematic still if no proper access index exists.

Refer Source: ROW_NUMBER (Transact-SQL)

Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

Refer Source:

How to use ROW_NUMBER() to enumerate and partition records in SQL Server

Here is our final SQL statement, which achieves the business logic we wanted to implement.

SELECT
       [PersonID]
     [FamilyID]
      ,[FirstName]
      ,[LastName]
      ,[DateOfBirth]
      ,ROW_NUMBER() over(PARTITION BY FamilyID,
                         CONVERT(NVARCHAR(25), DateOfBirth, 111)
                         ORDER BY DateOfBirth ASC) TwinCode

  FROM [People]
ORDER BY PersonID

IIn the ROW_NUMBER function above, I am doing several things. I’m grouping on FamilyID, and also grouping on a converted DateOfBirth. I convert the DateOfBirth to an nvarchar using the 111 conversion code, because that gets results like ‘2009/10/11′ and ‘2009/10/12′ which can easily be grouped by to achieve distinct dates.

------------------------------------------------------------------------------------------------------------
Reference:

What is CTE ( Common table expression) in SQL Server?

------------------------------------------------------------------------------------------------------------