What is Entity Integrity?
Entity integrity is a fundamental rule in relational databases that ensures each entity (row) in a table can be uniquely identified. It’s one of the core integrity rules defined by E.F. Codd in the relational model and serves as the foundation for reliable data storage and retrieval.
The Primary Key Rule
Entity integrity is enforced through the primary key constraint, which states:
Every table must have a primary key, and the column(s) that make up the primary key must contain unique, non-null values for each row in the table.
This rule guarantees that:
- Each row in a table represents a distinct entity
- Each entity can be uniquely identified and accessed
- No two rows in the table share the same primary key value
Implementing Entity Integrity
Primary Key Definition
A primary key can be:
- A single column (simple primary key)
- Multiple columns combined (composite primary key)
Example of a Simple Primary Key:
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATE
);
Example of a Composite Primary Key:
CREATE TABLE CourseEnrollment (
    StudentID INT,
    CourseID INT,
    EnrollmentDate DATE,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);
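A composite key constrains the combination of its columns, not each column on its own. The following is a minimal sketch of that behaviour, assuming Python's built-in sqlite3 module and an in-memory database; the inserted values are made up for illustration, and NOT NULL is spelled out because SQLite does not always imply it for primary key columns.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CourseEnrollment (
        StudentID      INT NOT NULL,
        CourseID       INT NOT NULL,
        EnrollmentDate DATE,
        Grade          CHAR(2),
        PRIMARY KEY (StudentID, CourseID)
    )
""")

# The same student in two courses is fine: the (StudentID, CourseID) pairs differ.
conn.execute("INSERT INTO CourseEnrollment VALUES (101, 1, '2024-01-10', NULL)")
conn.execute("INSERT INTO CourseEnrollment VALUES (101, 2, '2024-01-10', NULL)")

# Repeating an existing pair is rejected by the primary key constraint.
try:
    conn.execute("INSERT INTO CourseEnrollment VALUES (101, 1, '2024-02-01', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM CourseEnrollment").fetchone()[0]
print(duplicate_rejected, row_count)
```

Only the two distinct pairs remain in the table; the third insert never takes effect.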
Primary Key Requirements
For entity integrity to be maintained, a primary key must be unique and non-null. Stability and minimality are design properties that keep the key dependable over time:
- Uniqueness: No two rows can have the same primary key value
- Non-null: Primary key columns cannot contain NULL values
- Immutability: Primary key values should rarely or never change, because every foreign key that references them would have to change as well
- Minimality: The key should contain only the columns necessary for uniqueness
How DBMS Enforces Entity Integrity
Database management systems enforce entity integrity through several mechanisms:
- Unique Indexes: Most systems automatically create a unique index on the primary key columns to enforce uniqueness
- NOT NULL Constraints: Primary key columns are implicitly NOT NULL in standard SQL, so NULL values are rejected
- Validation Checks: Each INSERT or UPDATE is checked so that new or changed values don’t duplicate an existing primary key
- Error Messages: A statement that would violate entity integrity is rejected with an error
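One way to see the index-based enforcement is to ask the database which indexes it created on its own. The sketch below assumes SQLite through Python's sqlite3 module; `PRAGMA index_list` is SQLite-specific, and other systems expose the same information through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        StudentID INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName  VARCHAR(50),
        BirthDate DATE
    )
""")

# Each row of PRAGMA index_list is (seq, name, unique, origin, partial).
# origin == "pk" marks an index created to back a PRIMARY KEY constraint.
indexes = conn.execute("PRAGMA index_list(Students)").fetchall()
pk_indexes = [row for row in indexes if row[2] == 1 and row[3] == "pk"]

for seq, name, unique, origin, partial in pk_indexes:
    print(f"index {name}: unique={unique}, origin={origin}")
```

No CREATE INDEX statement appears in the sketch; the unique index exists only because the primary key was declared.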
Natural vs. Surrogate Keys
When implementing entity integrity, you can choose between two approaches:
Natural Keys
Natural keys use existing attributes of an entity that naturally identify it.
Characteristics:
- Based on real-world attributes
- Have business meaning
- May be composite (multiple columns)
- Can sometimes change over time
Examples:
- Social Security Number for a person
- ISBN for a book
- Product code for inventory
When to use:
- When a natural identifier already exists
- When the identifier has business meaning
- When the identifier is stable
Surrogate Keys
Surrogate keys are artificial identifiers created solely for the purpose of uniquely identifying rows.
Characteristics:
- No business meaning
- Usually auto-incremented numbers or GUIDs
- Simple (single column)
- Have no business reason to change
Examples:
- Auto-incremented CustomerID
- System-generated UUID
- Sequential order numbers
When to use:
- When no natural key exists
- When natural keys might change
- When natural keys are complex or long
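- When join and index performance is critical, since a compact integer key is cheaper to store and compare than a long natural key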
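The two approaches can also be combined: a surrogate column serves as the primary key while the natural identifier keeps its own uniqueness constraint. A minimal sketch, assuming SQLite through Python's sqlite3 module; the `Books` table, its columns, and the ISBN-like strings are hypothetical placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Books (
        BookID INTEGER PRIMARY KEY,          -- surrogate key, assigned by SQLite
        ISBN   VARCHAR(13) NOT NULL UNIQUE,  -- natural key, still kept unique
        Title  VARCHAR(200) NOT NULL
    )
""")

insert_sql = "INSERT INTO Books (ISBN, Title) VALUES (?, ?)"

# BookID is omitted, so the database generates it.
first = conn.execute(insert_sql, ("9780000000001", "Example Title A"))
second = conn.execute(insert_sql, ("9780000000002", "Example Title B"))
surrogate_ids = (first.lastrowid, second.lastrowid)

# The surrogate key alone would accept the same book twice;
# the UNIQUE constraint on the natural key is what rejects it.
try:
    conn.execute(insert_sql, ("9780000000001", "Example Title A, again"))
    duplicate_isbn_rejected = False
except sqlite3.IntegrityError:
    duplicate_isbn_rejected = True

print(surrogate_ids, duplicate_isbn_rejected)
```

Foreign keys elsewhere would reference the short `BookID`, while the ISBN stays searchable and protected against duplicates.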
Entity Integrity Violations
Entity integrity is violated when:
- Duplicate Primary Keys: Attempting to insert a row with a primary key value that already exists
- NULL Primary Keys: Attempting to insert a row with a NULL value in the primary key
- Missing Primary Key: Creating a table without defining a primary key. Most SQL systems allow this, but the table then has no guaranteed way to identify a single row
Example of a Violation:
-- Assuming we already have a student with StudentID = 101
INSERT INTO Students (StudentID, FirstName, LastName, BirthDate)
VALUES (101, 'Jane', 'Smith', '2000-05-15');
-- This will fail with a primary key violation error
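The duplicate-key and NULL-key cases can both be reproduced in a few lines. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, the helper `try_insert` and the first student's details are made up for illustration, and NOT NULL is written explicitly because SQLite, for historical reasons, accepts NULL in most primary key columns unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        StudentID INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName  VARCHAR(50),
        BirthDate DATE
    )
""")
# The existing row that the later inserts collide with.
conn.execute("INSERT INTO Students VALUES (101, 'John', 'Doe', '1999-03-02')")


def try_insert(student_id):
    """Return the database's error message if the insert is rejected, else None."""
    try:
        conn.execute(
            "INSERT INTO Students (StudentID, FirstName, LastName, BirthDate) "
            "VALUES (?, 'Jane', 'Smith', '2000-05-15')",
            (student_id,),
        )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_error = try_insert(101)   # key already exists
null_error = try_insert(None)       # None is sent to SQLite as NULL
accepted = try_insert(102)          # new, non-null key

print("duplicate:", duplicate_error)
print("null:", null_error)
print("valid:", accepted)
```

The exact wording of the error messages differs between database systems, but in each case the statement is rejected and the table is left unchanged.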
Benefits of Entity Integrity
- Data Reliability: Ensures each entity is uniquely identifiable
- Query Accuracy: Guarantees precise retrieval of specific records
- Relationship Foundation: Provides reliable targets for foreign keys
- Prevents Duplicates: Rejects any row whose key value already exists in the table
- Indexing Efficiency: Most DBMSs automatically index the primary key, which speeds up lookups and joins on it
Entity Integrity in Database Design
When designing databases, consider these best practices for entity integrity:
- Choose appropriate primary keys - balance between natural and surrogate keys
- Keep primary keys simple - narrow, single-column keys give smaller indexes and cheaper joins
- Ensure keys are stable - avoid keys that might change
- Use appropriate data types - integers and other compact, fixed-length types are cheaper to index and compare
- Consider future growth - ensure your key strategy can accommodate data growth
Entity Integrity vs. Other Integrity Types
Entity integrity works alongside other integrity rules:
- Domain Integrity: Ensures each value matches its column's data type, format, and permitted range
- Referential Integrity: Ensures relationships between tables are valid
- User-Defined Integrity: Implements business-specific rules
Together, these integrity types create a robust foundation for reliable database systems.
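To illustrate how these rules sit side by side, the sketch below declares a primary key, a foreign key, and a CHECK constraint, then tries one insert against each. It assumes SQLite through Python's sqlite3 module; the trimmed-down tables, the grade list, and the helper `rejected` are illustrative choices, and the `PRAGMA foreign_keys = ON` line is needed because SQLite does not enforce foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE Students (
        StudentID INT NOT NULL PRIMARY KEY
    );
    CREATE TABLE CourseEnrollment (
        StudentID INT NOT NULL REFERENCES Students (StudentID),
        CourseID  INT NOT NULL,
        Grade     CHAR(2) CHECK (Grade IN ('A', 'B', 'C', 'D', 'F')),
        PRIMARY KEY (StudentID, CourseID)
    );
    INSERT INTO Students VALUES (101);
""")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # Entity integrity: StudentID 101 already exists.
    "entity": rejected("INSERT INTO Students VALUES (101)"),
    # Referential integrity: there is no student 999 to point at.
    "referential": rejected("INSERT INTO CourseEnrollment VALUES (999, 1, 'A')"),
    # Domain integrity: 'Z' is outside the permitted grades.
    "domain": rejected("INSERT INTO CourseEnrollment VALUES (101, 1, 'Z')"),
    # A row that satisfies all three rules is accepted.
    "valid": rejected("INSERT INTO CourseEnrollment VALUES (101, 1, 'A')"),
}
print(results)
```

Each constraint catches a different kind of bad row, and only the insert that satisfies all of them succeeds.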