Data Modelling

What is Data Modelling?

Data modelling is the process of creating a visual representation of data and its relationships within a database system. It involves defining how data is connected, how it will be stored and organized, and the format of the data.

Think of data modelling as creating a blueprint before building a house. It helps you plan the database structure before actually implementing it, saving time and reducing errors.

Purpose of Data Modelling

Data modelling serves several important purposes:

Communication: Provides a common language for database designers, developers, and business users to discuss data requirements
Documentation: Creates a visual reference that documents data structures and relationships
Planning: Helps plan database structure before implementation begins
Error Detection: Identifies potential problems in database design early in the development process
Standardization: Establishes consistent naming conventions and data formats across the organization
Scope Management: Defines clear boundaries of what data will be included in the system

The Data Modelling Process

Data modelling typically follows these steps:

1. Requirements Gathering

Identify business needs and objectives
Gather information about data elements needed
Document data sources and destinations
Understand access patterns and volumes
Identify security and privacy requirements

2. Conceptual Data Modelling

Create high-level representation of data
Identify main entities and relationships
Focus on business concepts, not technical details
Establish scope of the data model
Communicate with business stakeholders

3. Logical Data Modelling

Transform conceptual model into more detailed form
Define attributes for each entity
Establish relationships and cardinality
Normalize the data structure
Remain independent of any specific database technology

4. Physical Data Modelling

Transform logical model into database-specific implementation
Define tables, columns, data types
Establish primary and foreign keys
Plan indexes and storage structures
Optimize for performance and storage efficiency

5. Validation and Verification

Review model with stakeholders
Validate against requirements
Test with sample data
Perform walkthroughs of common scenarios
Refine the model as needed

Data Modelling Techniques

Several techniques are used for data modelling, each with different strengths:

1. Entity-Relationship Modelling

The most common approach for relational databases, focusing on entities and their relationships.

Key Components:

Entities (things or objects)
Attributes (properties of entities)
Relationships (connections between entities)
Cardinality (one-to-one, one-to-many, many-to-many)

Notation Styles:

Chen notation (original ER notation)
Crow’s Foot notation (widely used in industry)
UML Class Diagrams (object-oriented approach)
IDEF1X (used in government and manufacturing)

2. Dimensional Modelling

Used primarily for data warehouses and business intelligence systems.

Key Components:

Fact tables (contain measurable data)
Dimension tables (provide context for facts)
Star schema (central fact table with surrounding dimensions)
Snowflake schema (normalized dimensions in multiple tables)

3. Object-Oriented Modelling

Based on object-oriented programming concepts.

Key Components:

Classes (templates for objects)
Objects (instances of classes)
Inheritance (parent-child relationships)
Encapsulation (hiding internal details)
Polymorphism (multiple forms)

4. Hierarchical Modelling

Represents data in a tree-like structure with parent-child relationships.

Key Components:

Root node (top-level entity)
Parent nodes (entities with children)
Child nodes (entities related to parents)
Links (connections between nodes)

5. Network Modelling

Extends hierarchical model to allow many-to-many relationships.

Key Components:

Records (collections of related fields)
Sets (named relationships between records)
Owner records (parent records in relationships)
Member records (child records in relationships)

Data Modelling Notations

Different visual notations are used to represent data models:

1. Entity-Relationship Diagrams (ERD)

Components:

Rectangles represent entities
Ovals or circles represent attributes
Diamonds or lines represent relationships
Various symbols indicate cardinality

2. Unified Modeling Language (UML)

Components:

Classes shown as rectangles with three sections
Associations represented by lines
Multiplicities shown at ends of associations
Attributes and operations listed in class boxes

3. Crow’s Foot Notation

Components:

Entities shown as rectangles
Relationships shown as lines
Cardinality indicated by symbols at line ends:
- Crow’s foot (many)
- Bar (exactly one)
- Circle (zero)
- Bar with circle (zero or one)

4. IDEF1X (Integration Definition for Information Modeling)

Components:

Entity boxes with rounded corners (dependent) or square corners (independent)
Relationships with solid or dashed lines
Key attributes listed at top of entity box
Non-key attributes below a horizontal line

Levels of Data Modelling

Data modelling occurs at different levels of abstraction:

1. Conceptual Data Model

Purpose: To represent data requirements from a business perspective

Characteristics:

High-level overview
Identifies major entities and relationships
No technical details
Understandable by non-technical stakeholders
Typically 10-20 entities for a moderate system

Example Elements:

Customer
Order
Product
“Customer places Order”
“Order contains Product”

2. Logical Data Model

Purpose: To represent the detailed structure of data independent of any database technology

Characteristics:

Complete set of entities
All attributes identified
Primary keys established
Foreign keys defined
Normalized to reduce redundancy
Technology-independent

Example Elements:

Customer (customer_id, name, email, phone, address)
Order (order_id, customer_id, order_date, total_amount)
Product (product_id, name, description, price)
Primary and foreign key relationships

3. Physical Data Model

Purpose: To represent how data will be implemented in a specific database system

Characteristics:

Database-specific syntax and features
Physical storage considerations
Performance optimization elements
Indexing strategies
Partitioning schemes
Specific data types and constraints

Example Elements:

Customers (customer_id INT PRIMARY KEY, name VARCHAR(100), etc.)
Indexes (CREATE INDEX idx_customer_name ON Customers(name))
Storage parameters (TABLESPACE users STORAGE(INITIAL 100K))

Cardinality in Data Modelling

Cardinality defines the numerical relationship between entities:

1. One-to-One (1:1)

Each entity in the first set is related to exactly one entity in the second set, and vice versa.

Example: Each Person has exactly one Passport, and each Passport belongs to exactly one Person.

Notation:

Crow’s Foot: Bar on both ends of the relationship line
Chen: “1” on both ends

2. One-to-Many (1:N)

Each entity in the first set can be related to multiple entities in the second set, but each entity in the second set is related to only one entity in the first set.

Example: Each Department has many Employees, but each Employee belongs to only one Department.

Notation:

Crow’s Foot: Bar on one end, crow’s foot on the other
Chen: “1” on one end, “N” on the other

3. Many-to-Many (M:N)

Each entity in the first set can be related to multiple entities in the second set, and vice versa.

Example: Each Student can enroll in many Courses, and each Course can have many Students.

Notation:

Crow’s Foot: Crow’s foot on both ends
Chen: “N” on both ends

Implementation: Typically requires an intermediary table (junction or bridge table) in relational databases.

4. Zero or One (0:1)

An optional relationship where an entity may have no relationship or at most one.

Example: An Employee may or may not have a Company Car.

Notation:

Crow’s Foot: Circle and bar
Chen: “0,1”

Entity Types in Data Modelling

Entities can be classified into different types:

1. Strong Entity (Independent Entity)

Exists independently of other entity types
Has its own primary key
Not dependent on any other entity

Example: Customer, Product

2. Weak Entity (Dependent Entity)

Depends on a strong entity for its existence
Cannot be uniquely identified by its attributes alone
Requires the primary key of its owner entity

Example: Order Line Item (depends on Order)

3. Associative Entity (Junction Entity)

Represents a many-to-many relationship
Contains foreign keys from the entities it connects
May contain additional attributes about the relationship

Example: Enrollment (connects Student and Course)

4. Subtype Entity (Child Entity)

Represents a specialization of a supertype entity
Inherits attributes from the supertype
Has additional attributes specific to the subtype

Example: SavingsAccount and CheckingAccount as subtypes of Account

Data Model Quality Characteristics

A good data model should exhibit these qualities:

1. Completeness

Captures all required data elements
Represents all necessary relationships
Includes all business rules and constraints

2. Correctness

Accurately represents the real-world system
Uses appropriate modelling constructs
Free from logical contradictions

3. Simplicity

As simple as possible while meeting requirements
Avoids unnecessary complexity
Easy to understand and explain

4. Flexibility

Accommodates future changes
Allows for reasonable extensions
Minimizes impact of changes

5. Integration

Consistent with existing data models
Uses standard terminology
Compatible with related systems

6. Implementability

Can be implemented in target technology
Performs well with expected data volumes
Practical to build and maintain

Common Data Modelling Challenges

Data modellers often face these challenges:

1. Handling Temporal Data

Historical data requirements
Effective dating
Time-variant attributes
Audit trails

2. Modelling Hierarchies

Parent-child relationships
Organizational structures
Product categorizations
Recursive relationships

3. Representing Business Rules

Complex constraints
Conditional relationships
Validation rules
Derivation formulas

4. Managing Data Volumes

Partitioning strategies
Archiving approaches
Summarization techniques
Performance considerations

5. Integration with Legacy Systems

Mapping to existing structures
Compatibility constraints
Migration paths
Conversion requirements

Data Modelling Best Practices

Follow these practices for effective data modelling:

Involve stakeholders throughout the process
Start simple and add complexity as needed
Use consistent naming conventions
Document assumptions and decisions
Validate models with sample data and scenarios
Consider performance implications early
Review models with both business and technical teams
Normalize first, then denormalize if needed
Create reusable components for common patterns
Version control your data models

Tools for Data Modelling

Several tools are available to assist with data modelling:

Database-specific tools:
- Oracle SQL Developer Data Modeler
- Microsoft SQL Server Management Studio
- MySQL Workbench
Specialized data modelling tools:
- ERwin Data Modeler
- ER/Studio
- PowerDesigner
- Lucidchart
General-purpose diagramming tools:
- Microsoft Visio
- Draw.io
- Gliffy
Collaborative modelling platforms:
- Vertabelo
- QuickDBD
- DbSchema

Data modelling is a critical step in database design that helps ensure the resulting database will effectively meet business needs while providing a solid foundation for application development.

Data Modelling

What is Data Modelling?

Purpose of Data Modelling

The Data Modelling Process

1. Requirements Gathering

2. Conceptual Data Modelling

3. Logical Data Modelling

4. Physical Data Modelling

5. Validation and Verification

Data Modelling Techniques

1. Entity-Relationship Modelling

2. Dimensional Modelling

3. Object-Oriented Modelling

4. Hierarchical Modelling

5. Network Modelling

Data Modelling Notations

1. Entity-Relationship Diagrams (ERD)

2. Unified Modeling Language (UML)

3. Crow’s Foot Notation

4. IDEF1X (Integration Definition for Information Modeling)

Levels of Data Modelling

1. Conceptual Data Model

2. Logical Data Model

3. Physical Data Model

Cardinality in Data Modelling

1. One-to-One (1:1)

2. One-to-Many (1:N)

3. Many-to-Many (M:N)

4. Zero or One (0:1)

Entity Types in Data Modelling

1. Strong Entity (Independent Entity)

2. Weak Entity (Dependent Entity)

3. Associative Entity (Junction Entity)

4. Subtype Entity (Child Entity)

Data Model Quality Characteristics

1. Completeness

2. Correctness

3. Simplicity

4. Flexibility

5. Integration

6. Implementability

Common Data Modelling Challenges

1. Handling Temporal Data

2. Modelling Hierarchies

3. Representing Business Rules

4. Managing Data Volumes

5. Integration with Legacy Systems

Data Modelling Best Practices

Tools for Data Modelling

Install Notes App