Exploring: August 2013

Friday, August 30, 2013

Snowflake, Star Schema

1.Database Design - Snowflake, Star Schema?

i.Star Schema :
Too much normalization would need lot join which slows down performance.
In Snow Flake the Dimension Tables are further normalized to reduce duplication of data.
That is typical Relational Database model. As its for storing data and reducing as much duplication and
saving physical storage as reports are not generated here so speed is not here the driving force.
Snowflake - Star Schema

2.In the Star schema Is the Fact Table Normalized?
No answer on this after reading the below article.
forum.kimballgroup.com/t364-normalization-in-dwh-environment

But Fact tables have the numberic values and FK's ( those are PK in Dimension tables to get , Dimension
details.) Dimension tables have details , for better performance less join is better so Dimension table is not
further normalized.
-----------------------
For Product Hierarchy: In staging area with normalized tables load first the lowest one then the highest this
Product table , then join all tables build the query and through the query load data of Product Dimension Table.So staging area tables are normalized but the actual load to the Product Dimension table which
is through the Source Query , which builds the hierarchy structure also prior to load to Dimension table.
----
Source:Maintaining Dimension Hierarchies
The product dimension in the data warehouse should be denormalized into a single flattened dimension table. The normalization illustrated above is the design pattern for the source system and staging areas, not the actual dimension table that users query.

3.How many Fact & Dimension Tables are in Star Schemas?

Generally how many Fact Tables and Dimensions Table you

have used in the Project?
Which one is loaded first Fact Table or Dimensions Table 
into the warehouse?
What is the size of the Fact Table and Dimension Table?
what is the size of the table and warehouse

http://www.allinterview.com/showanswers/29468.html

---

it depends upon the requirement of the client . dimension table is loaded first , using the primary keys of the dimension table , fact tables are loaded .size of the fact and dimension table also depends upon the requirement . size of the table and warehouse also depends upon clients requirement .

---

Dimension will be loaded first then fact,
Table size: Dimenstion > fact
Number of Records: fact > dimention
Size: Table < DWH
---
The 10 Essential Rules of Dimensional Modeling

http://www.kimballgroup.com/2009/05/29/the-10-essential-rules-of-dimensional-modeling/

What is the approximate size of data warehouse?

http://www.allinterview.com/showanswers/15990.html

ans.

---

it could range somewhere between 25-40 terabytes

---

the minimum size of data warehouse is 100Giga byte and maximum size is in n tera bytes(n=any integer)

---

smal projects:::100gb to 500 gb medium projects:: 500gb to 1000 gb large projects :::1000 gb to 1200gb etc

FAQ >> Understanding file sizes (Bytes, KB, MB, GB, TB)


http://myrepono.com/faq/4




1024 bytes


 = 

1 KB



1024 KB


 = 

1 MB



1024 MB


 = 

1 GB



1024 GB


 = 

1 TB



1024 TB


 = 

1 PB

Thursday, August 29, 2013

DW - Normalization Details

Unnormalized Relation
------------------------------------------------------------------------------------------------------------
Order no. Order date Item lines
------------------------------------------------------------------------------------------------------------
Item code Quantity Price/Unit

1 1/1/2001 100 1 100
200 2 50

Item code Quantity Price/Unit

2 2/1/2001 100 5 100

First Normal Form(1NF) [Flat File] - there is no composite attributes, and every attribute is single & describes 1 property.
------------------------------------------------------------------------------------------------------------
Order no. Order date Item code Quantity Price/Unit
------------------------------------------------------------------------------------------------------------
1 1/1/2001 100 1 100
1 1/1/2001 200 2 50
2 2/1/2001 100 5 100

2NF:if it is in 1NF and non-key attributes are functionally dependant on the key attribute(s).
Orders: Order Details:
--------------------------------
Order no. Order date
--------------------------------
Order Details:
----------------------------------------------
Order no. Item code Quantity
----------------------------------------------
Prices:
----------------------------------------------
Item code Price/Unit
----------------------------------------------
3NF:where all attributes in a relation are not functionally dependent only on the key. If 2 non-key attributes
are functionally dependent.
2NF:

Stud_id Name Department Year Hostel name
1 A Physics 1 xyz
2 B Math 3 abc

Converting 2NF: to 3NF:

Stud_id Name Department Year
1 A Physics 1
2 B Math 3

Year Hostel name
1 xyz
2 def
3 abc

Reference book:Analysis & Design of Information Systems By:V.RAJARAMAN

Friday, August 9, 2013

SQL Tuning

SQL Tuning or SQL Optimization

source: count(*) vs count(column-name) - which is more correct? [duplicate]

COUNT(*) counts all rows
COUNT(column) counts non-NULLs only
COUNT(1) is the same as COUNT(*) because 1 is a non-null expressions

COUNT(star) vs COUNT(column) vs. COUNT(DISTINCT) vs. COUNT(col) OVER(…)

SQL SERVER – Solution – Puzzle – SELECT * vs SELECT COUNT(*)

COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better?

1.Actual column names than Select *
2.Instead of HAVING(use when required only): use subject != 'Science'
note: HAVING clause used to filter rows after all rows are selected.
3.Minimize subqueries in main query.
select name from employee where (salary, age) = (select max(salary),max(age) from employee_details)
and dept = 'Electronics';
4.Use EXIST(efficient filter criteria in main query), IN(slowest performance, efficient when criteria in subquery)
and table joins appropriately.
5.Use EXISTS:instead of DISTINCT when using joins when table having 1:M.
6.Use UNION ALL instead of UNION.

1024 bytes	=	1 KB
1024 KB	=	1 MB
1024 MB	=	1 GB
1024 GB	=	1 TB
1024 TB	=	1 PB