Thursday, November 22, 2018

Database Security Model - Row/Column Level


------------------------------------------------------------------------------------------

                     How to Set Up Row-Level Security for SQL Server


------------------------------------------------------------------------------------------

How do we set up such a security model?

We want a row-level security model for a database.
Users connect with their Windows user accounts.
Those accounts belong to Windows groups (Active Directory domain(s)).

Tables: Department, Employee

We need to build a mechanism on top of these tables.

Security mapping table (DepartmentID, Windows group name)
- Ties: security -> department -> view

Preference: create database roles.

Advantages:

Testing the model (no Windows groups yet):
- Use database roles directly (add or remove users).

Switching domains between Production & Development:
- No need to re-seed your mapping table.
- You only need to associate the right Windows security groups with the right database roles.


Use ownership chaining:
        grant SELECT permission on the view,
        but do not grant any permissions on the underlying tables.

-- Database roles:

-- Create database roles to go along with the Windows groups
CREATE ROLE HR_User;
GO
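
In production, each database role is then mapped to the matching Windows security group. A hedged sketch, assuming a hypothetical Active Directory group DOMAIN\HR_Group:

```sql
-- Create a login for the (hypothetical) Windows group, map it into the
-- database, and add it to the matching database role.
CREATE LOGIN [DOMAIN\HR_Group] FROM WINDOWS;
GO
CREATE USER [DOMAIN\HR_Group] FOR LOGIN [DOMAIN\HR_Group];
GO
EXEC sp_addrolemember @rolename = 'HR_User', @membername = 'DOMAIN\HR_Group';
GO
```

Switching between Production and Development then only means pointing the roles at that domain's groups; the SecurityMap table stays untouched.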

--Security Mapping table:

-- Create the security mapping table and populate it so we can enforce row-level security
CREATE TABLE dbo.SecurityMap (
DepartmentID INT NOT NULL,
RoleName NVARCHAR(255),
CONSTRAINT PK_SecurityMap PRIMARY KEY CLUSTERED (DepartmentID, RoleName)
);
GO 
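
The mapping table needs rows tying each department to the role allowed to see it. A minimal sketch with made-up DepartmentIDs and role names:

```sql
-- Example mappings only; real DepartmentIDs and role names would come
-- from the Department table and the roles created above.
INSERT INTO dbo.SecurityMap (DepartmentID, RoleName)
VALUES (1, N'Engineering_User'),
       (2, N'Sales_User');
GO
```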


--Finally, we need the view that uses the SecurityMap table to do the filtering:

-- Create a view which uses row-level security to filter the roles
-- Note the OR IS_MEMBER('HR_User') = 1 as well as the OR IS_MEMBER('db_owner') = 1
-- as part of the WHERE clause. If a user is a member of either of these roles,
-- then all the rows will be returned.

CREATE VIEW dbo.DepartmentEmployees
AS
SELECT Emp.FirstName, Emp.MiddleName, Emp.SurName, Dept.DepartmentName
FROM dbo.Department Dept
INNER JOIN dbo.Employee Emp
ON Dept.DepartmentID = Emp.DepartmentID
LEFT JOIN dbo.SecurityMap
ON Emp.DepartmentID = dbo.SecurityMap.DepartmentID
WHERE IS_MEMBER(dbo.SecurityMap.RoleName) = 1
OR IS_MEMBER('HR_User') = 1
OR IS_MEMBER('db_owner') = 1;
GO 

-- (Assumes any other roles referenced here, such as Manager, were created the same way as HR_User.)
GRANT SELECT ON dbo.DepartmentEmployees TO Manager;
GRANT SELECT ON dbo.DepartmentEmployees TO HR_User;

~

Note that the "magic" is worked by the IS_MEMBER() function.
If the user is a member of the role that maps to the department in the SecurityMap table, the user sees the rows containing that department's employee information.
If the user is a member of the HR_User or db_owner role, the user sees all the rows.

-- Query as a test user to verify the row-level filtering
CREATE USER Abc WITHOUT LOGIN;
GO

EXEC sp_addrolemember @membername = 'Abc', @rolename = 'HR_User';
GO

EXECUTE AS USER = 'Abc';
GO

SELECT * FROM dbo.DepartmentEmployees;
GO

REVERT;
GO


------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------

~

Now, we don't want interns to have this level of permissions. We only want them to have access to specific columns. There's a way to do this. Immediately after the table name, we can specify the columns we want to grant permission to (or DENY, if we needed to do that) within a set of parentheses, like so:
GRANT SELECT ON dbo.Employee (EmployeeID, FirstName, MiddleName, SurName) TO HR_Intern;
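
The parenthesized column list also works with DENY. A sketch, assuming a hypothetical sensitive Salary column on dbo.Employee:

```sql
-- Grant table-wide SELECT but explicitly block one column;
-- the column-level DENY wins over the broader GRANT.
GRANT SELECT ON dbo.Employee TO HR_Intern;
DENY SELECT ON dbo.Employee (Salary) TO HR_Intern;
GO
```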

~

Filtering Columns in SQL Server Using Views



CREATE SCHEMA MARKETING;
GO

SELECT
    [name] AS [schema],
    [schema_id],
    USER_NAME(principal_id) AS [Owner]
FROM sys.schemas;


SQL Server will use the owner of the respective schema as the "owner" of the object when ownership chaining is considered.


An object can have an explicit owner, set by using ALTER AUTHORIZATION on the object, as shown here (the CREATE USER statement creates a valid user to make the owner of the table being created):

CREATE USER TestUser WITHOUT LOGIN;
GO

CREATE TABLE Marketing.OwnedTable (
    TableValue INT
);
GO

ALTER AUTHORIZATION ON Marketing.OwnedTable TO TestUser;
GO

SELECT
    so.[name] AS [Object],
    sch.[name] AS [Schema],
    USER_NAME(COALESCE(so.[principal_id], sch.[principal_id])) AS [Owner],
    type_desc AS [ObjectType]
FROM sys.objects so
JOIN sys.schemas sch
    ON so.[schema_id] = sch.[schema_id]
WHERE [type] IN ('U', 'P');


Sunday, February 11, 2018

Database Design - Snowflake, Star Schema?



1. Database Design - Snowflake, Star Schema?

i. Star Schema (Fact - Dimension tables): normalization is kept to a minimum.

      a. Too much normalization would require many joins, which slows down query performance.
      b. In a Snowflake schema, the dimension tables are further normalized to reduce duplication of data.
         That is the typical relational database model: it is for storing data, reducing duplication as much
         as possible, and saving physical storage. Reports are not generated there, so speed is not the driving force.
         Essbase is an OLAP model, which uses the Star Schema design.

         (See the pictures below; the layout looks like a star.)
         Snowflake - Star Schema

2. In the Star schema, is the Fact table normalized?

No clear answer on this after reading the article below.

forum.kimballgroup.com/t364-normalization-in-dwh-environment

But fact tables hold the numeric values and foreign keys (which are primary keys in the dimension tables,
used to look up dimension details). Dimension tables hold the descriptive details. For better performance,
fewer joins are better, so dimension tables are not further normalized.
-----------------------
For a product hierarchy: in the staging area with normalized tables, load the lowest level first, then the
higher levels up to the Product table; then join all the tables and load the data into the Product dimension table.
----
Source: Maintaining Dimension Hierarchies
The product dimension in the data warehouse should be denormalized into a single flattened dimension table. The normalization illustrated above is the design pattern for the source system and staging areas, not the actual dimension table that users query.
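
The load described above can be sketched as a single join from the normalized staging tables into the flattened dimension; all table and column names here are assumptions:

```sql
-- Flatten the normalized staging hierarchy into one product dimension row
-- per product, carrying its subcategory and category names along.
INSERT INTO dbo.DimProduct (ProductKey, ProductName, SubcategoryName, CategoryName)
SELECT p.ProductID,
       p.ProductName,
       s.SubcategoryName,
       c.CategoryName
FROM staging.Product p
JOIN staging.Subcategory s ON p.SubcategoryID = s.SubcategoryID
JOIN staging.Category    c ON s.CategoryID    = c.CategoryID;
```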

-----------------------

3.How many Fact & Dimension Tables are in Star Schemas?



Generally, how many fact tables and dimension tables have you used in the project?
Which is loaded first into the warehouse, the fact table or the dimension tables?
What is the size of the fact table and the dimension tables?
What is the size of the tables and of the warehouse?
---

http://www.allinterview.com/showanswers/29468.html
---
It depends upon the requirements of the client. The dimension
tables are loaded first; using the primary keys of the
dimension tables, the fact tables are then loaded. The sizes of the fact
and dimension tables also depend upon the requirements,
as does the size of the warehouse.
---
Dimensions will be loaded first, then facts.
Table size: dimension > fact
Number of records: fact > dimension
Size: table < DWH

---
-----------------------------------------------------------------------------------------------------------


The 10 Essential Rules of Dimensional Modeling


-----------------------------------------------------------------------------------------------------------

What is the approximate size of data warehouse?

http://www.allinterview.com/showanswers/15990.html

ans.
---
It could range somewhere between 25-40 terabytes.
---
The minimum size of a data warehouse is 100 gigabytes,
and the maximum size is n terabytes (n = any integer).
---
Small projects: 100 GB to 500 GB
Medium projects: 500 GB to 1000 GB
Large projects: 1000 GB to 1200 GB, etc.
----

FAQ >> Understanding file sizes (Bytes, KB, MB, GB, TB)

1024 bytes = 1 KB
1024 KB    = 1 MB
1024 MB    = 1 GB
1024 GB    = 1 TB
1024 TB    = 1 PB
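
As a quick check of the arithmetic, 1 TB is 1024^4 bytes:

```sql
-- 1024 * 1024 * 1024 * 1024 bytes in one terabyte
SELECT CAST(1024 AS BIGINT) * 1024 * 1024 * 1024 AS BytesPerTB;
-- 1099511627776
```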


Data Warehouse Page



1.Database Design - Snowflake, Star Schema?



2. Relational - Dimensional Modelling



In simple words, OLTP normalized databases are designed from the most optimal "transactional" point of view. Databases are normalized to work optimally for a transactional system. By optimization of a transactional system I mean getting to a database structure where all transactional operations (delete, insert, update, and select) are balanced, giving equal or optimum importance to all of them at any point in time, as they are equally valued in a transactional system.

That is what a normalized system offers: the minimal updates possible for a data change, the minimal inserts possible for a new entry, a one-place delete for a category deletion (e.g. a new product category), and so on. All this is possible because we branch out and create master tables. But it comes at the cost of a delay in the "select" operation. As I said, normalization is not the most efficient model for all operations; it is "optimal". That said, there are other methods to enhance data-fetching speed, like indexing.

On the other hand, the dimensional model (mostly used for data warehouse design) is meant to give importance to only one kind of operation: selection of data. In a data warehouse, data updates/inserts happen periodically, and they are a one-time cost.

So if one tries to tweak a normalized data structure so that selection is the most important operation at any point in time, we end up with a denormalized (I would say partially denormalized) dimensional star structure:
  • All the foreign keys are in one place, the fact table - there are no dimension-to-dimension joins (i.e. master-to-master-table joins); a snowflake splits out the same dimension.
  • Ideally designed facts carry only numbers: measures or foreign keys.
  • Dimensions are used to carry descriptions and non-aggregatable info.
  • Redundancy of data is ignored, but in rare cases, if a dimension itself grows too much, the snowflake design is seen as an option - though that is still avoidable.
For details please go through detailed books on this topic.
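
The bullet points above can be sketched as a minimal star schema: the fact table carries only foreign keys and measures, and the dimensions carry the descriptions. All names here are illustrative:

```sql
CREATE TABLE dbo.DimDate (
    DateKey      INT PRIMARY KEY,   -- surrogate key, e.g. 20180211
    CalendarDate DATE,
    [Year]       INT,
    [Month]      INT
);

CREATE TABLE dbo.DimProduct (
    ProductKey   INT PRIMARY KEY,
    ProductName  NVARCHAR(100),     -- descriptive, non-aggregatable info
    Category     NVARCHAR(50)
);

-- The fact table: only foreign keys and numeric measures.
CREATE TABLE dbo.FactSales (
    DateKey     INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity    INT,
    SalesAmount DECIMAL(18, 2)
);
```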



3.Normalization
Database Normalization Basics

From the book.

Example of an Unnormalized Relation

Order no.   Order date   Item Lines
                         Item Code   Quantity   Price/unit
1000        100112       100         10         10.25
                         300         15         5.5
2000        101512       200         1          100
3000        102012       100         30         5
4000        110112       400         1          500
                         300         20         5.5

1) Example: 1NF

Order no.   Order date   Item Code   Quantity   Price/unit
1000        100112       100         10         10.25
1000        100112       300         15         5.5
2000        101512       200         1          100
3000        102012       100         30         5
4000        110112       400         1          500
4000        110112       300         20         5.5

Even from a layman's point of view, you can see the unit price repeating unnecessarily. It is just a flat file (1NF).
There are no composite attributes, and every attribute is single-valued and describes one property.

When we want to move the data into a computer and store it electronically, saving storage space becomes important. Then smart logical database design and physical database design matter.

2) 2NF:
A relation is said to be in 2NF if it is in 1NF and all non-key attributes are functionally dependent on the key
attribute(s).

Orders:
Order no.   Order date

Order Details:
Order no.   Item Code   Quantity

Prices:
Item Code   Price/unit

3) 3NF: If two non-key attributes are functionally dependent on each other, there will be unnecessary duplication of data.

A 2NF Form Relation

Roll no.   Name   Department   Year   Hostel Name

Year and Hostel Name are dependent on each other (Hostel Name is determined by Year).

3NF:

Roll no.   Name   Department   Year

Year   Hostel Name
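
The 3NF decomposition above, written as table definitions (names are illustrative):

```sql
-- Year determines the hostel, so that dependency gets its own table.
CREATE TABLE YearHostel (
    [Year]     INT PRIMARY KEY,
    HostelName NVARCHAR(100)
);

CREATE TABLE Student (
    RollNo     INT PRIMARY KEY,
    Name       NVARCHAR(100),
    Department NVARCHAR(50),
    [Year]     INT REFERENCES YearHostel ([Year])  -- hostel looked up via Year
);
```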



4. MOLAP, ROLAP, And HOLAP 

MOLAP, ROLAP, And HOLAP 



Collected Notes:

Introduction

Data Warehouse - Basic Concepts - SSDI



5.

------------------------------------------------------------------------------------------------------------
Very Basic questions:

1. What is the difference between Data Warehousing and Business Intelligence?
ans.

In simple terms, in a data warehouse the data is stored. From this stored data, management makes decisions by looking at reports or by using a data analysis tool.

In the Microsoft stack, the BI tools are Analysis Services, Reporting Services, and Integration Services.

In the data warehouse, the focus is more on data modelling, normalization, and optimizing data: deciding on a Star schema (OLAP applications - Fact/Dimension tables) or a Snowflake schema (RDBMS applications).

------------------------------------------------------------------------------------------------------------
Theoretical Concept Questions.

1. What is Agile software development?


Agile_software_development


One of the differences between agile and waterfall is that testing of the software is conducted at different points during the software development lifecycle. In the waterfall model, there is a separate testing phase after implementation. In Agile XP, testing is done concurrently with implementation.
------------------------------------------------------------------------------------------------------------
[Just brief theoretical knowledge]

2.Software as a service (SaaS) ?

Software as a Service (SaaS) is a model of software deployment where an application is hosted as a service outside of the customer’s site and delivered to customers across the Internet.

saas - in nutshell


saas - deployment model
--
Software_as_a_service

Sometimes referred to as "on-demand software", SaaS is a software delivery model in which software and associated data are centrally hosted on the cloud. SaaS is typically accessed by users using a thin client via a web browser.

SaaS has become a common delivery model for many business applications, including accounting, collaboration, customer relationship management (CRM), management information systems (MIS), enterprise resource planning (ERP), invoicing, human resource management (HRM), content management (CM), and service desk management.


The term "software as a service" (SaaS) is considered to be part of the nomenclature of cloud computing, along with infrastructure as a service (IaaS), platform as a service (PaaS), desktop as a service (DaaS), and backend as a service (BaaS).[8]



Architecture

While an exception rather than the norm, some SaaS solutions do not use multi-tenancy, or use other mechanisms—such as virtualization—to cost-effectively manage a large number of customers in place of multi-tenancy.[12] Whether multi-tenancy is a necessary component for software-as-a-service is a topic of controversy.[13]



Open integration protocols


Since SaaS applications cannot access a company's internal systems (databases or internal services), they predominantly offer integration protocols and application programming interfaces (APIs) that operate over a wide area network. Typically, these are protocols based on HTTP, REST, SOAP, and JSON.


References:


Developing a SaaS product? Scrum is NOT for you.


Scrum:
More methodical way of developing.

------------------------------------------------------------------------------------------------------------


Saturday, February 10, 2018

Learning steps


--------------------------------------------------------------------------------------------------------------------------
                                                                  Unix:
--------------------------------------------------------------------------------------------------------------------------




--------------------------------------------------------------------------------------------------------------------------
                                                             Scripting - Python

--------------------------------------------------------------------------------------------------------------------------




    Coursera:



      Course1:2-4 hours/week
             Programming for Everybody (Getting Started with Python)
      Course2:2-4 hours/week
          Python Data Structures
     Course1,2:
          variables and expressions, 
          conditional execution (loops, branching, and try/except), 
          functions, 
          Python data structures (strings, lists, dictionaries, and tuples), 
          and manipulating files.

       Course3:Using Python to Access Web Data(6 weeks of study, 2-4 hours/week)
   -access data using web APIs
       -HTML, XML, and JSON data formats in Python



        Course4: 5 weeks of study, 2-3 hours/week

         Using Databases with Python

         -Database:SQLite3
         -build web crawlers and multi-step data gathering and visualization processes
          -SQL  -database design  -D3.js library to do basic data visualization


         Course5: 6 weeks of study, 2-4 hours/week

Capstone: Retrieving, Processing, and Visualizing Data with Python



Course1: Introduction to Data Science in Python
Course2: Applied Plotting, Charting & Data Representation in Python
Course3: Applied Machine Learning in Python
Course4: Applied Text Mining in Python

Course5: Applied Social Network Analysis in Python


--------------------------------------------------------------------------------------------------------------------------


                                                                  Java:

--------------------------------------------------------------------------------------------------------------------------




   Coursera:



Course1: Programming Foundations with JavaScript, HTML and CSS (4 weeks of study, 3-7 hours/week)

Course2:Java Programming: Solving Problems with Software(4 weeks of study, 4-8 hours/week)

Course3:Java Programming: Arrays, Lists, and Structured Data(4 weeks of study, 4-8 hours/week)

Course4:Java Programming: Principles of Software Design(4 weeks of study, 4-8 hours/week)

Course5:Java Programming: Build a Recommendation System(4 weeks of study, 3-6 hours/week)



--------------------------------------------------------------------------------------------------------------------------

                                                          3.Microsoft Azure

--------------------------------------------------------------------------------------------------------------------------


                                           Azure Tutorial for Beginners


--------------------------------------------------------------------------------------------------------------------------


                                                         4.Hadoop

--------------------------------------------------------------------------------------------------------------------------






--------------------------------------------------------------------------------------------------------------------------


                                                5.Big Data Specialization

--------------------------------------------------------------------------------------------------------------------------


Big Data Specialization


Course1: Introduction to Big Data(3 weeks of study, 5-6 hours/week)
Course2:Big Data Modeling and Management Systems(6 weeks of study, 2-3 hours/week)
Course3:Big Data Integration and Processing
Course4:Machine Learning With Big Data
Course5:Graph Analytics for Big Data

Course6:Big Data - Capstone Project


--------------------------------------------------------------------------------------------------------------------------


                                             6.Data Scientist :Data Science Specialization:

--------------------------------------------------------------------------------------------------------------------------





Coursera:


Course1: The Data Scientist’s Toolbox(1-4 hours/week)
  introduction to the tools: version control, markdown, git, GitHub, R, and RStudio.

Course2: R Programming

    statistical computing, which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
Course3: Getting and Cleaning Data
Course4: Exploratory Data Analysis
Course5: Reproducible Research
Course6: Statistical Inference
Course7: Regression Models
Course8: Practical Machine Learning
Course9: Developing Data Products
Course10: Data Science Capstone (4-9 hours/week)



--------------------------------------------------------------------------------------------------------------------------

Wednesday, February 7, 2018

SQL:SQL Server Programming Part 9 - Table Variables


SQL Server Programming Part 9 - Table Variables
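
A minimal table-variable sketch (illustrative; not taken from the linked video):

```sql
-- A table variable lives for the duration of the batch, like a local variable.
DECLARE @Temp TABLE (
    ID   INT PRIMARY KEY,
    Name NVARCHAR(50)
);

INSERT INTO @Temp (ID, Name)
VALUES (1, N'Alpha'), (2, N'Beta');

SELECT ID, Name FROM @Temp;
```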

SQL: SET ANSI_NULLS ON/OFF



SET ANSI_NULLS ON/OFF 

QUOTED_IDENTIFIER



This setting controls how quotation marks ".." are interpreted by the SQL compiler. When QUOTED_IDENTIFIER is ON then quotes are treated like brackets ([...]) and can be used to quote SQL object names like table names, column names, etc. When it is OFF (not recommended), then quotes are treated like apostrophes ('..') and can be used to quote text strings in SQL commands.
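
A short sketch of the difference:

```sql
-- ON (recommended, and required by many features): double quotes
-- delimit identifiers, like square brackets.
SET QUOTED_IDENTIFIER ON;
SELECT 1 AS "Column Name";   -- same as: SELECT 1 AS [Column Name]

-- OFF: double quotes act like single quotes around string literals.
SET QUOTED_IDENTIFIER OFF;
SELECT "just a string";      -- same as: SELECT 'just a string'
```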

ANSI_NULLS

This setting controls what happens when you try to use any comparison operator other than IS on NULL. When it is ON, these comparisons follow the standard, which says that comparing to NULL always fails (because NULL isn't a value, it's a flag) and returns FALSE. When this setting is OFF (really not recommended) you can successfully treat it like a value and use =, <>, etc. on it and get back TRUE as appropriate.
The proper way to handle this is to instead use IS (ColumnValue IS NULL ...).
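
A sketch of both settings; literal NULLs are used here for illustration:

```sql
SET ANSI_NULLS ON;
-- NULL = NULL evaluates to UNKNOWN, so the ELSE branch is taken.
SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END;  -- 'not equal'

SET ANSI_NULLS OFF;
-- The non-standard behavior: NULL = NULL is treated as TRUE.
SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END;  -- 'equal'

SET ANSI_NULLS ON;
-- The proper, setting-independent test:
SELECT CASE WHEN NULL IS NULL THEN 'is null' END;  -- 'is null'
```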



In SQL Server, what does “SET ANSI_NULLS ON” mean?




SET ANSI_NULLS ON/OFF: The ANSI_NULLS option specifies how SQL Server handles comparison operations with NULL values. When it is set to ON, any comparison with NULL using = or <> yields false. This is the ISO-defined standard behavior.


~


Returns one row. So even when both operands are columns, NULL does not equal NULL. And the documentation for = doesn't have anything to say about the operands:
When you compare two NULL expressions, the result depends on the ANSI_NULLS setting:
If ANSI_NULLS is set to ON, the result is NULL [1], following the ANSI convention that a NULL (or unknown) value is not equal to another NULL or unknown value.
If ANSI_NULLS is set to OFF, the result of NULL compared to NULL is TRUE.
Comparing NULL to a non-NULL value always results in FALSE [2].
However, both [1] and [2] are incorrect - the result of both comparisons is UNKNOWN.
*The cryptic meaning of this text was finally discovered years later. What it actually means is that, for those comparisons, the setting has no effect and it always acts as if the setting were ON. It would have been clearer if it had stated that SET ANSI_NULLS OFF was the setting that had no effect.