Entity Integrity

What is Entity Integrity?

Entity integrity is a fundamental rule in relational databases that ensures each entity (row) in a table can be uniquely identified. It’s one of the core integrity rules defined by E.F. Codd in the relational model and serves as the foundation for reliable data storage and retrieval.

The Primary Key Rule

Entity integrity is enforced through the primary key constraint, which states:

Every table must have a primary key, and the column(s) that make up the primary key must contain unique, non-null values for each row in the table.

This rule guarantees that:

  1. Each row in a table represents a distinct entity
  2. Each entity can be uniquely identified and accessed
  3. No duplicate entities exist in the table

Implementing Entity Integrity

Primary Key Definition

A primary key can be:

  • A single column (simple primary key)
  • Multiple columns combined (composite primary key)

Example of a Simple Primary Key:

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATE
);

Example of a Composite Primary Key:

CREATE TABLE CourseEnrollment (
    StudentID INT,
    CourseID INT,
    EnrollmentDate DATE,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);

Primary Key Requirements

A primary key should have the following properties. The DBMS enforces the first two; the last two are design requirements that it does not check:

  1. Uniqueness: No two rows can have the same primary key value
  2. Non-null: Primary key columns cannot contain NULL values; in a composite key, this applies to every column
  3. Immutability: Primary key values should rarely or never change, because other tables refer to them
  4. Minimality: The key should contain only the columns necessary for uniqueness
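
For a composite key, uniqueness applies to the combination of columns rather than to each column on its own. A brief sketch using the CourseEnrollment table defined earlier (the ID values, dates, and grades are made up for illustration):

```sql
-- Same student, two different courses: both rows are accepted
INSERT INTO CourseEnrollment (StudentID, CourseID, EnrollmentDate, Grade)
VALUES (101, 10, '2024-09-01', 'A');
INSERT INTO CourseEnrollment (StudentID, CourseID, EnrollmentDate, Grade)
VALUES (101, 20, '2024-09-01', 'B');

-- Same (StudentID, CourseID) pair a second time: rejected as a duplicate key
INSERT INTO CourseEnrollment (StudentID, CourseID, EnrollmentDate, Grade)
VALUES (101, 10, '2025-01-15', 'C');
```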

How DBMS Enforces Entity Integrity

Database management systems enforce entity integrity through several mechanisms:

  1. Unique Indexes: Most systems automatically create a unique index on the primary key columns to enforce uniqueness
  2. NOT NULL Constraints: Primary key columns are treated as NOT NULL, so NULL values are rejected
  3. Validation Checks: Each INSERT and UPDATE is checked so that new or changed values don’t duplicate an existing primary key
  4. Error Messages: A statement that would violate entity integrity fails with an error and its change is not applied
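
Seen from the constraint side, a primary key behaves much like a UNIQUE constraint combined with NOT NULL, with the added rule that a table can have only one primary key. The sketch below illustrates this with a hypothetical Departments table that is not part of the examples above:

```sql
CREATE TABLE Departments (
    DepartmentID   INT PRIMARY KEY,          -- implies unique and non-null
    DepartmentCode CHAR(4) NOT NULL UNIQUE,  -- a second candidate key, with both rules declared explicitly
    DepartmentName VARCHAR(100) NOT NULL
);
```

Both DepartmentID and DepartmentCode can identify a row here, but only DepartmentID is designated as the primary key.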

Natural vs. Surrogate Keys

When implementing entity integrity, you can choose between two approaches:

Natural Keys

Natural keys use existing attributes of an entity that naturally identify it.

Characteristics:

  • Based on real-world attributes
  • Have business meaning
  • May be composite (multiple columns)
  • Can sometimes change over time

Examples:

  • Social Security Number for a person (a common textbook example, although not everyone has one and it is sensitive personal data)
  • ISBN for a book
  • Product code for inventory

When to use:

  • When a natural identifier already exists
  • When the identifier has business meaning
  • When the identifier is stable
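
As an illustration, a natural key might be declared as follows. The Books table and its columns are hypothetical, and the sketch assumes ISBN-13 values stored as 13 digits without hyphens:

```sql
CREATE TABLE Books (
    ISBN      CHAR(13) PRIMARY KEY,   -- natural key: the identifier printed on the book
    Title     VARCHAR(200) NOT NULL,
    Publisher VARCHAR(100)
);
```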

Surrogate Keys

Surrogate keys are artificial identifiers created solely for the purpose of uniquely identifying rows.

Characteristics:

  • No business meaning
  • Usually auto-incremented numbers or GUIDs
  • Simple (single column)
  • Never change

Examples:

  • Auto-incremented CustomerID
  • System-generated UUID
  • Sequential order numbers

When to use:

  • When no natural key exists
  • When natural keys might change
  • When natural keys are complex or long
  • When performance is critical, since a short integer key keeps indexes and foreign key columns small
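
A possible surrogate key definition is sketched below. The Customers table is hypothetical, and the identity syntax shown is the SQL-standard form accepted by PostgreSQL, among others; other systems use their own syntax for the same idea, such as AUTO_INCREMENT in MySQL or IDENTITY in SQL Server:

```sql
CREATE TABLE Customers (
    CustomerID INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key assigned by the DBMS
    Email      VARCHAR(255) NOT NULL UNIQUE,                  -- natural identifier, still kept unique
    FullName   VARCHAR(100) NOT NULL
);

-- CustomerID is omitted; the DBMS generates it
INSERT INTO Customers (Email, FullName)
VALUES ('jane@example.com', 'Jane Smith');
```

Keeping a UNIQUE constraint on the natural attribute stops the same real-world customer from being stored twice under different surrogate values.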

Entity Integrity Violations

Entity integrity is violated when:

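  1. Duplicate Primary Keys: Attempting to insert a row, or update an existing one, so that its primary key value matches one that already exists
  2. NULL Primary Keys: Attempting to insert a row with a NULL value in any primary key column
  3. Missing Primary Key: Creating a table without defining a primary key. Most SQL systems allow this, so it is a design flaw to catch in review rather than an error the DBMS reports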

Example of a Violation:

-- Assuming we already have a student with StudentID = 101
INSERT INTO Students (StudentID, FirstName, LastName, BirthDate)
VALUES (101, 'Jane', 'Smith', '2000-05-15');
-- This will fail with a primary key violation error
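
The other cases can be sketched the same way. The statements below assume the Students table from earlier and, for the UPDATE, that students 101 and 102 both already exist:

```sql
-- Rejected: the primary key column cannot be NULL
INSERT INTO Students (StudentID, FirstName, LastName, BirthDate)
VALUES (NULL, 'John', 'Doe', '2001-03-22');

-- Rejected: the update would give two rows the same StudentID
UPDATE Students
SET StudentID = 101
WHERE StudentID = 102;
```

Behavior can differ between systems. SQLite, for example, accepts NULL in a primary key column such as the one above unless the column is also declared NOT NULL, so it is worth testing these cases on the DBMS in use.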

Benefits of Entity Integrity

  1. Data Reliability: Ensures each entity is uniquely identifiable
  2. Query Accuracy: Guarantees precise retrieval of specific records
  3. Relationship Foundation: Provides reliable targets for foreign keys
  4. Prevents Duplicates: Rejects rows that repeat an existing primary key value
  5. Indexing Efficiency: Most systems index the primary key automatically, which makes lookups by key fast

Entity Integrity in Database Design

When designing databases, consider these best practices for entity integrity:

  1. Choose appropriate primary keys - balance between natural and surrogate keys
  2. Keep primary keys simple - short, single-column keys give smaller indexes and simpler joins
  3. Ensure keys are stable - avoid keys that might change
  4. Use appropriate data types - integers and fixed-length fields are usually more compact and faster to compare than long variable-length strings
  5. Consider future growth - ensure your key strategy can accommodate data growth
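
These practices also apply when adding a key to a table that was created without one. A rough sketch follows, assuming a hypothetical LegacyStudents table with a StudentID column; the table and constraint names are illustrative only:

```sql
-- Candidate key values that appear more than once
SELECT StudentID, COUNT(*) AS Occurrences
FROM LegacyStudents
GROUP BY StudentID
HAVING COUNT(*) > 1;

-- Number of rows where the candidate key is missing
SELECT COUNT(*) AS MissingKeys
FROM LegacyStudents
WHERE StudentID IS NULL;

-- Once both checks come back clean, declare the key
ALTER TABLE LegacyStudents
    ADD CONSTRAINT PK_LegacyStudents PRIMARY KEY (StudentID);
```

Support for this varies: some systems require the column to be declared NOT NULL before the constraint can be added, and SQLite does not support adding a primary key to an existing table.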

Entity Integrity vs. Other Integrity Types

Entity integrity works alongside other integrity rules:

  • Domain Integrity: Ensures the values in each column match its declared data type, format, and permitted range
  • Referential Integrity: Ensures relationships between tables are valid
  • User-Defined Integrity: Implements business-specific rules

Together, these integrity types create a robust foundation for reliable database systems.
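
As a sketch of how the four types can appear side by side in one table definition, consider the following. The Enrollments table, its CreditHours column, and the grade rule are hypothetical, and the foreign key assumes the Students table from earlier:

```sql
CREATE TABLE Enrollments (
    StudentID   INT NOT NULL,
    CourseID    INT NOT NULL,
    CreditHours INT NOT NULL,
    Grade       CHAR(2),
    -- Entity integrity: each (StudentID, CourseID) pair identifies exactly one row
    PRIMARY KEY (StudentID, CourseID),
    -- Referential integrity: the student must exist in Students
    FOREIGN KEY (StudentID) REFERENCES Students (StudentID),
    -- Domain integrity: credit hours restricted to a permitted range
    CHECK (CreditHours BETWEEN 1 AND 6),
    -- User-defined integrity: an example business rule on allowed grades
    CHECK (Grade IS NULL OR Grade IN ('A', 'B', 'C', 'D', 'F'))
);
```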