Data Modeling for BI

Effective data modeling creates a foundation for reliable and consistent business reporting. Learn how proper data organization improves decision-making, ensures data quality, and makes analytics more accessible to business users.

Dimensional Modeling Fundamentals

Dimensional Model Overview

Core components of dimensional models and their relationships

100%

🔍 Use Ctrl+Scroll to zoom

Legend

Components

Fact

Dimension

Key

Connection Types

Process Flow

Contains

References

Uses

Key Dimensional Modeling Concepts

📊

Facts vs. Dimensions

Technical

The distinction between facts and dimensions is central to business analytics:

Facts represent business processes and events that can be measured
Dimensions provide context and filtering capabilities for analysis
Proper modeling allows flexible "slicing and dicing" of metrics
This approach aligns with how business users naturally think about data
Enables both high-level summaries and detailed drill-downs

📊

Facts vs. Dimensions

Non-Technical

Technical implementation details for facts and dimensions:

Fact tables contain foreign keys to dimensions and numerical measures
Facts are typically normalized and partitioned by date
Dimension tables contain descriptive attributes and hierarchies
Dimensions often use surrogate keys generated during ETL
Fact grain determines the level of detail captured in analysis

Star Schema Design

Star Schema Architecture

The structure of a star schema with a central fact table and related dimension tables

100%

🔍 Use Ctrl+Scroll to zoom

Legend

Components

Fact

Dimension

Connection Types

Process Flow

date_key

product_key

customer_key

Example Star Schema Design

-- Dimension: Date
CREATE TABLE dim_date (
  date_key INT PRIMARY KEY,
  full_date DATE UNIQUE,
  day_of_week VARCHAR(10),
  day_of_month INT,
  month_number INT,
  month_name VARCHAR(10),
  quarter INT,
  year INT,
  is_weekend BOOLEAN,
  is_holiday BOOLEAN,
  fiscal_year INT,
  fiscal_quarter INT
);

-- Dimension: Product
CREATE TABLE dim_product (
  product_key INT PRIMARY KEY,
  product_id VARCHAR(50) UNIQUE,  -- natural key
  product_name VARCHAR(100),
  product_description TEXT,
  brand VARCHAR(50),
  category VARCHAR(50),
  subcategory VARCHAR(50),
  department VARCHAR(50),
  unit_cost DECIMAL(10,2),
  unit_price DECIMAL(10,2),
  current_flag BOOLEAN,
  effective_date DATE,
  expiration_date DATE
);

-- Dimension: Customer
CREATE TABLE dim_customer (
  customer_key INT PRIMARY KEY,
  customer_id VARCHAR(50) UNIQUE,  -- natural key
  customer_name VARCHAR(100),
  email VARCHAR(100),
  phone VARCHAR(20),
  address_line1 VARCHAR(100),
  address_line2 VARCHAR(100),
  city VARCHAR(50),
  state VARCHAR(50),
  postal_code VARCHAR(20),
  country VARCHAR(50),
  customer_segment VARCHAR(50),
  acquisition_date DATE,
  lifetime_value DECIMAL(10,2),
  current_flag BOOLEAN,
  effective_date DATE,
  expiration_date DATE
);

-- Fact: Sales
CREATE TABLE fact_sales (
  sales_key BIGINT PRIMARY KEY,
  date_key INT REFERENCES dim_date(date_key),
  product_key INT REFERENCES dim_product(product_key),
  customer_key INT REFERENCES dim_customer(customer_key),
  store_key INT REFERENCES dim_store(store_key),
  promotion_key INT REFERENCES dim_promotion(promotion_key),
  transaction_id VARCHAR(50),
  sales_amount DECIMAL(12,2),
  quantity INT,
  unit_price DECIMAL(10,2),
  discount_amount DECIMAL(10,2),
  net_amount DECIMAL(12,2),
  tax_amount DECIMAL(10,2),
  gross_profit DECIMAL(12,2),
  transaction_timestamp TIMESTAMP,
  return_flag BOOLEAN
);

❄️

Star vs. Snowflake Schema

Technical

The choice between star and snowflake has business implications:

Star schemas provide faster reports and dashboards for end users
Snowflake schemas offer better data consistency for reference data
Star schemas are more intuitive for business users creating reports
Snowflake schemas may be better for managing hierarchical dimensions
Most organizations prioritize query performance over storage efficiency

❄️

Star vs. Snowflake Schema

Non-Technical

Technical comparison of star and snowflake schemas:

Star schemas denormalize dimensions for simpler queries and faster joins
Snowflake schemas normalize dimensions to reduce redundancy
Star schemas typically have fewer joins and better query performance
Snowflake schemas save storage space but add query complexity
Star schemas are preferred in most modern data warehouses

Slowly Changing Dimensions

🔄

Slowly Changing Dimension Types

Technical

SCD strategies allow businesses to manage changing reference data:

Track customer attribute changes for accurate point-in-time reporting
Maintain historical product categorizations for trend analysis
Support compliance requirements by preserving data lineage
Enable accurate historical comparisons despite master data changes
Balance history preservation with performance and storage needs

🔄

Slowly Changing Dimension Types

Non-Technical

Technical implementation of SCD types:

Type 1: Overwrite the attribute value (no history)
Type 2: Add a new row with effective dates and current flag
Type 3: Add new columns for previous values (limited history)
Type 4: Use history tables for full change tracking
Type 6: Hybrid approach combining Types 1, 2, and 3

SCD Type 2 Implementation Example

-- SCD Type 2 dimension with history tracking
CREATE TABLE dim_customer (
  customer_key SERIAL PRIMARY KEY,  -- surrogate key
  customer_id VARCHAR(50),         -- natural key (business key)
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  email VARCHAR(100),
  phone VARCHAR(20),
  address VARCHAR(100),
  city VARCHAR(50),
  state VARCHAR(50),
  postal_code VARCHAR(20),
  segment VARCHAR(50),
  effective_date DATE NOT NULL,    -- when this version became effective
  expiration_date DATE,           -- when this version expired (null if current)
  current_flag BOOLEAN NOT NULL,   -- indicates if this is current version
  
  -- Create an index on the combination of natural key and current flag
  -- for efficient lookups during ETL
  CONSTRAINT idx_customer_current UNIQUE (customer_id, current_flag)
    WHERE current_flag = TRUE
);

-- Query to handle a customer address change in Type 2 SCD
BEGIN TRANSACTION;

-- Step 1: Set expiration and current flag on existing record
UPDATE dim_customer
SET 
  expiration_date = CURRENT_DATE - 1,
  current_flag = FALSE
WHERE 
  customer_id = '12345' 
  AND current_flag = TRUE;

-- Step 2: Insert new record with updated address
INSERT INTO dim_customer (
  customer_id, first_name, last_name, email, phone, 
  address, city, state, postal_code, segment,
  effective_date, expiration_date, current_flag
)
SELECT
  customer_id, first_name, last_name, email, phone,
  'New Address', city, state, postal_code, segment, -- Only address changes
  CURRENT_DATE, NULL, TRUE
FROM dim_customer
WHERE 
  customer_id = '12345'
  AND expiration_date = CURRENT_DATE - 1;

COMMIT;

Aggregate Tables and Materialized Views

📈

Aggregation Strategies

Technical

Aggregation strategies deliver significant business benefits:

Dashboard performance improvements of 10-100x for large datasets
Reduced operational costs through efficient resource utilization
Support for interactive data exploration at scale
Enablement of mobile BI with low latency requirements
Cost-effective scaling of analytics to more business users

📈

Aggregation Strategies

Non-Technical

Technical implementation of aggregation strategies:

Precalculated aggregate tables with dimension foreign keys
Materialized views with incremental refresh mechanisms
Summary tables with different temporal granularities
Aggregate navigation in semantic layers
Indexing strategies for aggregate tables

Materialized Aggregate Example

-- Create a daily sales aggregate table
CREATE TABLE agg_daily_sales (
  date_key INT REFERENCES dim_date(date_key),
  product_key INT REFERENCES dim_product(product_key),
  store_key INT REFERENCES dim_store(store_key),
  
  -- Aggregate measures
  total_sales_amount DECIMAL(14,2),
  total_quantity INT,
  total_discount DECIMAL(12,2),
  transaction_count INT,
  customer_count INT,
  
  -- Meta information
  last_updated_timestamp TIMESTAMP,
  
  -- Primary key
  PRIMARY KEY (date_key, product_key, store_key)
);

-- Population/refresh logic
INSERT INTO agg_daily_sales
SELECT
  f.date_key,
  f.product_key,
  f.store_key,
  SUM(f.sales_amount) AS total_sales_amount,
  SUM(f.quantity) AS total_quantity,
  SUM(f.discount_amount) AS total_discount,
  COUNT(DISTINCT f.transaction_id) AS transaction_count,
  COUNT(DISTINCT f.customer_key) AS customer_count,
  CURRENT_TIMESTAMP AS last_updated_timestamp
FROM 
  fact_sales f
WHERE
  f.date_key >= (SELECT MAX(date_key) FROM agg_daily_sales) - 7 -- Incremental refresh
GROUP BY
  f.date_key,
  f.product_key,
  f.store_key
ON CONFLICT (date_key, product_key, store_key) 
DO UPDATE SET
  total_sales_amount = EXCLUDED.total_sales_amount,
  total_quantity = EXCLUDED.total_quantity,
  total_discount = EXCLUDED.total_discount,
  transaction_count = EXCLUDED.transaction_count,
  customer_count = EXCLUDED.customer_count,
  last_updated_timestamp = CURRENT_TIMESTAMP;

Conformed Dimensions and Facts

Enterprise Data Warehouse Architecture

Structure of a data warehouse with shared conformed dimensions across multiple fact tables

100%

🔍 Use Ctrl+Scroll to zoom

Legend

Components

Fact

Dimension

Connection Types

Process Flow

date_key

product_key

customer_key

🔄

Conformed Dimensions

Technical

Conformed dimensions deliver essential business capabilities:

Ensures consistent reporting across different business processes
Enables integrated dashboards that combine metrics from multiple areas
Facilitates drill-across capabilities between related subject areas
Creates a shared business vocabulary across the organization
Supports enterprise-wide analytics governance

🔄

Conformed Dimensions

Non-Technical

Technical implementation of conformed dimensions:

Shared dimension tables referenced by multiple fact tables
Consistent surrogate key generation across data pipelines
Master data management integration for consistency
Attribute standardization and centralized business rules
Dimension versioning coordination across subject areas

Data Vault Modeling

🏛️

Data Vault Architecture

Technical

Data Vault modeling addresses key business challenges:

Adaptability to changing business requirements without model redesign
Auditability for regulatory compliance with full data lineage
Integration of disparate data sources while preserving source information
Scalability for enterprise data volumes with parallel loading
Historical preservation across the enterprise data landscape

🏛️

Data Vault Architecture

Non-Technical

Technical implementation of Data Vault modeling:

Hub tables that store business keys with minimal context
Link tables that maintain relationships between business keys
Satellite tables that store descriptive attributes and history
Hash keys for performance and referential integrity
Layered architecture with raw, business, and presentation vaults

Data Vault Core Components

-- Hub: Customer (business entity)
CREATE TABLE hub_customer (
  customer_hk CHAR(32) PRIMARY KEY,  -- hash key of business key
  customer_bk VARCHAR(50) NOT NULL,  -- business key
  record_source VARCHAR(100) NOT NULL,
  load_date TIMESTAMP NOT NULL,
  UNIQUE(customer_bk)
);

-- Link: Customer to Account relationship
CREATE TABLE link_customer_account (
  link_hk CHAR(32) PRIMARY KEY,  -- hash of relationship
  customer_hk CHAR(32) NOT NULL REFERENCES hub_customer(customer_hk),
  account_hk CHAR(32) NOT NULL REFERENCES hub_account(account_hk),
  record_source VARCHAR(100) NOT NULL,
  load_date TIMESTAMP NOT NULL,
  UNIQUE(customer_hk, account_hk)
);

-- Satellite: Customer details
CREATE TABLE sat_customer_details (
  customer_hk CHAR(32) NOT NULL REFERENCES hub_customer(customer_hk),
  hashdiff CHAR(32) NOT NULL,  -- hash of all attributes
  effective_from TIMESTAMP NOT NULL,
  load_date TIMESTAMP NOT NULL,
  record_source VARCHAR(100) NOT NULL,
  
  -- Descriptive attributes
  customer_name VARCHAR(100),
  email VARCHAR(100),
  phone VARCHAR(20),
  address VARCHAR(200),
  city VARCHAR(50),
  state VARCHAR(50),
  postal_code VARCHAR(20),
  
  -- Audit/metadata attributes
  is_current BOOLEAN,
  expiry_date TIMESTAMP,
  
  PRIMARY KEY (customer_hk, effective_from)
);

Best Practices for BI Data Modeling

Design Principles

Prioritize Query Performance
- Denormalize dimensions where appropriate
- Create indexes aligned with common query patterns
- Implement aggregates for frequently used metrics
Plan for Scalability
- Design partitioning strategies aligned with data volumes
- Implement incremental processing patterns
- Consider distribution keys in MPP databases
Balance Flexibility and Standardization
- Create conformed dimensions for cross-subject analysis
- Document business rules and calculations
- Establish naming conventions and metadata standards
Design for Understandability
- Use business terminology in model naming
- Document metrics and dimension definitions
- Create intuitive hierarchies for drill-down analysis
Implement Proper History Management
- Choose appropriate SCD types based on business requirements
- Document time variance policies for dimensions
- Consider bi-temporal modeling for regulated industries

Dimensional Modeling Fundamentals​

Dimensional Model Overview

Legend

Components

Connection Types

Key Dimensional Modeling Concepts​

Facts vs. Dimensions

Facts vs. Dimensions

Star Schema Design​

Star Schema Architecture

Legend

Components

Connection Types

Example Star Schema Design​

Star vs. Snowflake Schema

Star vs. Snowflake Schema

Slowly Changing Dimensions​

Slowly Changing Dimension Types

Slowly Changing Dimension Types

SCD Type 2 Implementation Example​

Aggregate Tables and Materialized Views​

Aggregation Strategies

Aggregation Strategies

Materialized Aggregate Example​

Conformed Dimensions and Facts​

Enterprise Data Warehouse Architecture

Legend

Components

Connection Types

Conformed Dimensions

Conformed Dimensions

Data Vault Modeling​

Data Vault Architecture

Data Vault Architecture

Data Vault Core Components​

Best Practices for BI Data Modeling​

Design Principles​

Additional Resources​

Dimensional Modeling Fundamentals

Key Dimensional Modeling Concepts

Star Schema Design

Example Star Schema Design

Slowly Changing Dimensions

SCD Type 2 Implementation Example

Aggregate Tables and Materialized Views

Materialized Aggregate Example

Conformed Dimensions and Facts

Data Vault Modeling

Data Vault Core Components

Best Practices for BI Data Modeling

Design Principles

Additional Resources