Sunday, February 11, 2018

Data Warehouse Page



1.Database Design - Snowflake, Star Schema?

i.Star Schema ( Fact - Dimension Tables) Normalization is not done too much.

      a.Too much normalization would need lot join which slows down performance.
      b.In Snow Flake the Dimension Tables are further normalized to reduce duplication of data.
         That is typical Relational Database model. As its for storing data and reducing as much duplication and
         saving physical storage as reports are not generated here so speed is not here the driving force.
         Essbase was OLAP model thats the Star Schema design.
       
         (See the pictures below as it looks as Star.)
         Snowflake - Star Schema


2. Relational - Dimensional Modelling



In simple words OLTP normalized database are designed with most optimal "transactional" point of view. Databases are normalized to work optimally to a transactional system. When I say optimization of transactional system i mean ..getting to a design state of database structure where all transactional operations like delete,insert,update and select are balanced to give equal or optimum importance to all of them at any point of time...as they are equally valued in a transactional system.
And that what a normalized system offer ..minimal updates possible for a data update,minimal insert possible for new entry,one place delete for category deletion etc (e.g. new product category )...all this is possible a we branch a create master tables .....but this comes at the cost of "select" operation delay ..but as I said its(normalization) not most efficient model for all operations ..its "Optimal"...having said we get other methods to enhance data fetching speed..like indexing etc
On the other hand Dimensional model (mostly used for data-ware house design)..meant for giving importance to only one kind of operations thats Selection of data...as in data-ware houses ..data update/insertion happens periodically ..and its a one time cost.
So if one try to tweak normalized data structure so that only selection is the most important operation at any point in time ...we will end up getting a denormalized (I would say partially denormalized)..dimensional star structure.
  • all the foreign keys a one place Fact -no dimension to dimension join (i.e. master to master table join)..snowflake represent same dimension
    • ideally designed facts carry only numbers ..measures or foreign keys
    • dimension are used to carry description and non aggregatable info
    • redundancy of data is ignored ...but in rare cases if Dimensions itself grow too much .snowflake design is seen as option..but that still is avoidable
For details please go through detailed books on this topic.



3.Normalization
Database Normalization Basics

From the book.

Example of  an Unnormalized Relation

Order no.                  Order date                                Item Lines
                                                                Item Code       Quantity     Price/unit
1000                        100112                    100                  10             10.25
                                                                300                  15              5.5
2000                        101512                    200                   1               100
3000                        102012                    100                  30              5
4000                        110112                    400                  1               500
                                                                300                  20             5.5
1)Example : 1NF

Order no.                 Order date        Item Code       Quantity         Price/unit
1000                        100112                    100                  10             10.25
1000                        100112                    300                  15              5.5

2000                        101512                    200                   1               100
3000                        102012                    100                  30              5
4000                        110112                    400                  1               500

4000                        110112                    300                  20             5.5

Layman point of view can see unnecessary the  Unit price is repeating. Its just a flat file(1NF).
There  are no composite attributes, and every attribute is single and describes one property.

When we want to move into a computer and store electronically the storage space saving becomes important. Then smart logical database design and physical database design is important.

2)2NF:
A relation is said to be in 2NF , if  it is in 1NF and non-key attributes are functionally dependent on the key
attribute(s).


Orders:
Order no.                 Order date

Order  Details:
 Order no.  Item Code       Quantity

Prices:
Item Code      Price/unit

3)3NF: If  two non-key attributes are functionally dependent, then there will be unnecessary duplication of data.

A  2NF  Form Relation

Roll no.     Name       Department     Year   Hostel Name


Year, Hostel Name are dependant.

3NF:

Roll no.     Name       Department     Year


Year   Hostel Name



4. MOLAP, ROLAP, And HOLAP 

MOLAP, ROLAP, And HOLAP 



Collected Notes:

Introduction

Data Warehouse - Basic Concepts - SSDI



5.

------------------------------------------------------------------------------------------------------------
Very Basic questions:

1.What is the difference between Data Warehousing, Business Intelligence.?
ans.

In simple term in data warehouse data is stored. From this stored data, management make decision by looking at reports or by a data analysis tool.

In Microsoft the BI tool is Analysis Services , Reporting Services , Integration Services.

In the data warehouse more focus on data modelling, normalization and optimizing data. Deciding on Star schema (OLAP applns. - Fact Dimension Table) or Snowflake Schema ( RDBMs applns.).

------------------------------------------------------------------------------------------------------------
Theoretical Concept Questions.

1.. What is   Agile software development ?


Agile_software_development


One of the differences between agile and waterfall, is that testing of the software is conducted at different points during the software development lifecycle. In the waterfall model, there is a separate testing phase after implementation. In Agile XP, testing is done concurrently with implemetation.
------------------------------------------------------------------------------------------------------------
[Just Brief theoretical knowledge]*

2.Software as a service (SaaS) ?

software as a Service (SaaS) is a model of software deployment where an application is hosted as a service outside of the customer’s site and delivered to customers across the Internet.

saas - in nutshell


saas - deployment model
--
Software_as_a_service

sometimes referred to as "on-demand software" , is a software delivery model in which software and associated data are centrally hosted on the cloud. SaaS is typically accessed by users using a thin client via a web browser.

SaaS has become a common delivery model for many business applications, including accountingcollaborationcustomer relationship management (CRM), management information systems (MIS), enterprise resource planning (ERP), invoicing, human resource management (HRM), content management (CM) and service desk management.[


The term "software as a service" (SaaS) is considered to be part of the nomenclature of cloud computing, along with infrastructure as a service (IaaS), platform as a service (PaaS),desktop as a service (DaaS), and backend as a service (BaaS).[8]



Architecture

While an exception rather than the norm, some SaaS solutions do not use multi-tenancy, or use other mechanisms—such as virtualization—to cost-effectively manage a large number of customers in place of multi-tenancy.[12] Whether multi-tenancy is a necessary component for software-as-a-service is a topic of controversy.[13]



Open integration protocols


Since SaaS applications cannot access a company's internal systems (databases or internal services), they predominantly offer integration protocols and application programming interfaces (APIs) that operate over a wide area network. Typically, these are protocols based on HTTPRESTSOAP and JSON.


References:


Developing a SaaS product? Scrum is NOT for you.


Scrum:
More methodical way of developing.

------------------------------------------------------------------------------------------------------------


No comments:

Post a Comment