Data Modelling

What is Data Modelling?

Data modelling is the process of creating a visual representation of data and its relationships within a database system. It involves defining how data is connected, how it will be stored and organized, and the format of the data.

Think of data modelling as creating a blueprint before building a house. It helps you plan the database structure before actually implementing it, saving time and reducing errors.

Purpose of Data Modelling

Data modelling serves several important purposes:

  1. Communication: Provides a common language for database designers, developers, and business users to discuss data requirements

  2. Documentation: Creates a visual reference that documents data structures and relationships

  3. Planning: Helps plan database structure before implementation begins

  4. Error Detection: Identifies potential problems in database design early in the development process

  5. Standardization: Establishes consistent naming conventions and data formats across the organization

  6. Scope Management: Defines clear boundaries of what data will be included in the system

The Data Modelling Process

Data modelling typically follows these steps:

1. Requirements Gathering

  • Identify business needs and objectives
  • Gather information about data elements needed
  • Document data sources and destinations
  • Understand access patterns and volumes
  • Identify security and privacy requirements

2. Conceptual Data Modelling

  • Create high-level representation of data
  • Identify main entities and relationships
  • Focus on business concepts, not technical details
  • Establish scope of the data model
  • Communicate with business stakeholders

3. Logical Data Modelling

  • Transform conceptual model into more detailed form
  • Define attributes for each entity
  • Establish relationships and cardinality
  • Normalize the data structure
  • Remain independent of any specific database technology

4. Physical Data Modelling

  • Transform logical model into database-specific implementation
  • Define tables, columns, data types
  • Establish primary and foreign keys
  • Plan indexes and storage structures
  • Optimize for performance and storage efficiency

5. Validation and Verification

  • Review model with stakeholders
  • Validate against requirements
  • Test with sample data
  • Perform walkthroughs of common scenarios
  • Refine the model as needed

Data Modelling Techniques

Several techniques are used for data modelling, each with different strengths:

1. Entity-Relationship Modelling

The most common approach for relational databases, focusing on entities and their relationships.

Key Components:

  • Entities (things or objects)
  • Attributes (properties of entities)
  • Relationships (connections between entities)
  • Cardinality (one-to-one, one-to-many, many-to-many)

Notation Styles:

  • Chen notation (original ER notation)
  • Crow’s Foot notation (widely used in industry)
  • UML Class Diagrams (object-oriented approach)
  • IDEF1X (used in government and manufacturing)

2. Dimensional Modelling

Used primarily for data warehouses and business intelligence systems.

Key Components:

  • Fact tables (contain measurable data)
  • Dimension tables (provide context for facts)
  • Star schema (central fact table with surrounding dimensions)
  • Snowflake schema (normalized dimensions in multiple tables)

3. Object-Oriented Modelling

Based on object-oriented programming concepts.

Key Components:

  • Classes (templates for objects)
  • Objects (instances of classes)
  • Inheritance (parent-child relationships)
  • Encapsulation (hiding internal details)
  • Polymorphism (multiple forms)

4. Hierarchical Modelling

Represents data in a tree-like structure with parent-child relationships.

Key Components:

  • Root node (top-level entity)
  • Parent nodes (entities with children)
  • Child nodes (entities related to parents)
  • Links (connections between nodes)

5. Network Modelling

Extends hierarchical model to allow many-to-many relationships.

Key Components:

  • Records (collections of related fields)
  • Sets (named relationships between records)
  • Owner records (parent records in relationships)
  • Member records (child records in relationships)

Data Modelling Notations

Different visual notations are used to represent data models:

1. Entity-Relationship Diagrams (ERD)

Components:

  • Rectangles represent entities
  • Ovals or circles represent attributes
  • Diamonds or lines represent relationships
  • Various symbols indicate cardinality

2. Unified Modeling Language (UML)

Components:

  • Classes shown as rectangles with three sections
  • Associations represented by lines
  • Multiplicities shown at ends of associations
  • Attributes and operations listed in class boxes

3. Crow’s Foot Notation

Components:

  • Entities shown as rectangles
  • Relationships shown as lines
  • Cardinality indicated by symbols at line ends:
    • Crow’s foot (many)
    • Bar (exactly one)
    • Circle (zero)
    • Bar with circle (zero or one)

4. IDEF1X (Integration Definition for Information Modeling)

Components:

  • Entity boxes with rounded corners (dependent) or square corners (independent)
  • Relationships with solid or dashed lines
  • Key attributes listed at top of entity box
  • Non-key attributes below a horizontal line

Levels of Data Modelling

Data modelling occurs at different levels of abstraction:

1. Conceptual Data Model

Purpose: To represent data requirements from a business perspective

Characteristics:

  • High-level overview
  • Identifies major entities and relationships
  • No technical details
  • Understandable by non-technical stakeholders
  • Typically 10-20 entities for a moderate system

Example Elements:

  • Customer
  • Order
  • Product
  • “Customer places Order”
  • “Order contains Product”

2. Logical Data Model

Purpose: To represent the detailed structure of data independent of any database technology

Characteristics:

  • Complete set of entities
  • All attributes identified
  • Primary keys established
  • Foreign keys defined
  • Normalized to reduce redundancy
  • Technology-independent

Example Elements:

  • Customer (customer_id, name, email, phone, address)
  • Order (order_id, customer_id, order_date, total_amount)
  • Product (product_id, name, description, price)
  • Primary and foreign key relationships

3. Physical Data Model

Purpose: To represent how data will be implemented in a specific database system

Characteristics:

  • Database-specific syntax and features
  • Physical storage considerations
  • Performance optimization elements
  • Indexing strategies
  • Partitioning schemes
  • Specific data types and constraints

Example Elements:

  • Customers (customer_id INT PRIMARY KEY, name VARCHAR(100), etc.)
  • Indexes (CREATE INDEX idx_customer_name ON Customers(name))
  • Storage parameters (TABLESPACE users STORAGE(INITIAL 100K))

Cardinality in Data Modelling

Cardinality defines the numerical relationship between entities:

1. One-to-One (1:1)

Each entity in the first set is related to exactly one entity in the second set, and vice versa.

Example: Each Person has exactly one Passport, and each Passport belongs to exactly one Person.

Notation:

  • Crow’s Foot: Bar on both ends of the relationship line
  • Chen: “1” on both ends

2. One-to-Many (1:N)

Each entity in the first set can be related to multiple entities in the second set, but each entity in the second set is related to only one entity in the first set.

Example: Each Department has many Employees, but each Employee belongs to only one Department.

Notation:

  • Crow’s Foot: Bar on one end, crow’s foot on the other
  • Chen: “1” on one end, “N” on the other

3. Many-to-Many (M:N)

Each entity in the first set can be related to multiple entities in the second set, and vice versa.

Example: Each Student can enroll in many Courses, and each Course can have many Students.

Notation:

  • Crow’s Foot: Crow’s foot on both ends
  • Chen: “N” on both ends

Implementation: Typically requires an intermediary table (junction or bridge table) in relational databases.

4. Zero or One (0:1)

An optional relationship where an entity may have no relationship or at most one.

Example: An Employee may or may not have a Company Car.

Notation:

  • Crow’s Foot: Circle and bar
  • Chen: “0,1”

Entity Types in Data Modelling

Entities can be classified into different types:

1. Strong Entity (Independent Entity)

  • Exists independently of other entity types
  • Has its own primary key
  • Not dependent on any other entity

Example: Customer, Product

2. Weak Entity (Dependent Entity)

  • Depends on a strong entity for its existence
  • Cannot be uniquely identified by its attributes alone
  • Requires the primary key of its owner entity

Example: Order Line Item (depends on Order)

3. Associative Entity (Junction Entity)

  • Represents a many-to-many relationship
  • Contains foreign keys from the entities it connects
  • May contain additional attributes about the relationship

Example: Enrollment (connects Student and Course)

4. Subtype Entity (Child Entity)

  • Represents a specialization of a supertype entity
  • Inherits attributes from the supertype
  • Has additional attributes specific to the subtype

Example: SavingsAccount and CheckingAccount as subtypes of Account

Data Model Quality Characteristics

A good data model should exhibit these qualities:

1. Completeness

  • Captures all required data elements
  • Represents all necessary relationships
  • Includes all business rules and constraints

2. Correctness

  • Accurately represents the real-world system
  • Uses appropriate modelling constructs
  • Free from logical contradictions

3. Simplicity

  • As simple as possible while meeting requirements
  • Avoids unnecessary complexity
  • Easy to understand and explain

4. Flexibility

  • Accommodates future changes
  • Allows for reasonable extensions
  • Minimizes impact of changes

5. Integration

  • Consistent with existing data models
  • Uses standard terminology
  • Compatible with related systems

6. Implementability

  • Can be implemented in target technology
  • Performs well with expected data volumes
  • Practical to build and maintain

Common Data Modelling Challenges

Data modellers often face these challenges:

1. Handling Temporal Data

  • Historical data requirements
  • Effective dating
  • Time-variant attributes
  • Audit trails

2. Modelling Hierarchies

  • Parent-child relationships
  • Organizational structures
  • Product categorizations
  • Recursive relationships

3. Representing Business Rules

  • Complex constraints
  • Conditional relationships
  • Validation rules
  • Derivation formulas

4. Managing Data Volumes

  • Partitioning strategies
  • Archiving approaches
  • Summarization techniques
  • Performance considerations

5. Integration with Legacy Systems

  • Mapping to existing structures
  • Compatibility constraints
  • Migration paths
  • Conversion requirements

Data Modelling Best Practices

Follow these practices for effective data modelling:

  1. Involve stakeholders throughout the process
  2. Start simple and add complexity as needed
  3. Use consistent naming conventions
  4. Document assumptions and decisions
  5. Validate models with sample data and scenarios
  6. Consider performance implications early
  7. Review models with both business and technical teams
  8. Normalize first, then denormalize if needed
  9. Create reusable components for common patterns
  10. Version control your data models

Tools for Data Modelling

Several tools are available to assist with data modelling:

  1. Database-specific tools:

    • Oracle SQL Developer Data Modeler
    • Microsoft SQL Server Management Studio
    • MySQL Workbench
  2. Specialized data modelling tools:

    • ERwin Data Modeler
    • ER/Studio
    • PowerDesigner
    • Lucidchart
  3. General-purpose diagramming tools:

    • Microsoft Visio
    • Draw.io
    • Gliffy
  4. Collaborative modelling platforms:

    • Vertabelo
    • QuickDBD
    • DbSchema

Data modelling is a critical step in database design that helps ensure the resulting database will effectively meet business needs while providing a solid foundation for application development.