Types of Keys

What are Keys?

Keys are attributes or combinations of attributes that uniquely identify rows in a table and establish relationships between tables. They are fundamental to relational database design, providing a way to ensure data integrity, eliminate redundancy, and enable efficient data retrieval.

Keys serve several critical functions in databases:

  • Uniquely identifying records
  • Enforcing entity integrity
  • Establishing relationships between tables
  • Enabling efficient indexing and searching
  • Preventing duplicate data

Primary Key

Definition

A primary key is a column or combination of columns that uniquely identifies each row in a table. It serves as the main identifier for records in the table.

Characteristics

  • Uniqueness: Each value must be unique within the table
  • Non-nullability: Cannot contain NULL values
  • Minimality: Should use the minimum number of columns necessary
  • Stability: Values should rarely or never change
  • Simplicity: Preferably simple data types for performance

Examples

  • employee_id in an Employees table
  • product_code in a Products table
  • Combination of order_id and product_id in an Order_Items table

Implementation

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);

Types of Primary Keys

1. Natural Key

  • Based on data that naturally exists in the real world
  • Uses attributes that are inherently unique
  • Examples: Social Security Number, ISBN, Vehicle Identification Number

Advantages:

  • Meaningful to users
  • Inherent data validation
  • No additional storage required

Disadvantages:

  • May change over time
  • Sometimes complex or long
  • May expose sensitive information
  • May not exist for all entity types

2. Surrogate Key

  • Artificially generated identifier
  • Not derived from application data
  • Examples: Auto-incrementing integer, UUID/GUID

Advantages:

  • Never changes
  • Simple and efficient
  • Consistent across all tables
  • Not affected by business rule changes

Disadvantages:

  • No inherent meaning
  • Additional storage required
  • May hide data quality issues

3. Composite Primary Key

  • Consists of two or more columns combined
  • Used when no single column has unique values
  • Examples: Combination of student_id and course_id in an Enrollments table

Advantages:

  • Enforces business rules at the database level
  • May eliminate need for additional unique constraints

Disadvantages:

  • More complex to work with
  • Can make joins more complicated
  • Less efficient for indexing

Candidate Key

Definition

A candidate key is any column or combination of columns that could potentially serve as a primary key, satisfying the requirements of uniqueness and non-nullability.

Characteristics

  • Uniqueness: Each value must be unique within the table
  • Non-nullability: Cannot contain NULL values
  • Irreducibility: No subset of the columns in a composite candidate key can uniquely identify each row

Examples

  • In a Students table, both student_id and email might be candidate keys
  • In a Countries table, both country_code and country_name might be candidate keys

Significance

  • Each table can have multiple candidate keys
  • One candidate key is selected as the primary key
  • Other candidate keys can be designated as unique keys/alternate keys

Super Key

Definition

A super key is any set of one or more columns that uniquely identifies each row in a table. Unlike candidate keys, super keys may contain additional columns beyond those needed for unique identification.

Characteristics

  • Uniqueness: Each combination of values must be unique
  • Potentially redundant: May include more columns than necessary

Examples

  • If student_id alone is a candidate key, then {student_id, name}, {student_id, email}, and {student_id, name, email} are all super keys
  • If {order_id, product_id} is a candidate key, then {order_id, product_id, quantity} is a super key

Relationship to Other Keys

  • Every candidate key is a super key
  • Not every super key is a candidate key (if it has unnecessary columns)

Alternate Key

Definition

An alternate key (also called a unique key) is a candidate key that is not selected as the primary key. It provides an alternative way to uniquely identify rows.

Characteristics

  • Uniqueness: Values must be unique within the table
  • May allow nulls: Unlike primary keys, some databases allow unique keys to contain NULL values (though only one NULL)

Examples

  • If student_id is the primary key, email might be an alternate key
  • If employee_id is the primary key, social_security_number might be an alternate key

Implementation

CREATE TABLE students (
    student_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);

Foreign Key

Definition

A foreign key is a column or combination of columns in one table that refers to the primary key (or unique key) in another table. It establishes a relationship between the two tables.

Characteristics

  • Referential integrity: Ensures that relationships between tables remain consistent
  • May allow nulls: Foreign keys can often be NULL (representing optional relationships)
  • Not necessarily unique: Multiple rows can have the same foreign key value

Examples

  • department_id in an Employees table referencing department_id in a Departments table
  • customer_id in an Orders table referencing customer_id in a Customers table

Relationship Types

  • One-to-One: Foreign key with a unique constraint
  • One-to-Many: Regular foreign key (most common)
  • Many-to-Many: Implemented using a junction table with two foreign keys

Implementation

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_id INT,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Referential Actions

When implementing foreign keys, you can specify what happens when the referenced key is updated or deleted:

  • CASCADE: Automatically update or delete related records
  • SET NULL: Set the foreign key to NULL
  • SET DEFAULT: Set the foreign key to its default value
  • RESTRICT: Prevent the update or deletion if related records exist
  • NO ACTION: Similar to RESTRICT, but checked at the end of the transaction
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) 
    REFERENCES customers(customer_id)
    ON DELETE CASCADE
    ON UPDATE RESTRICT
);

Composite Key

Definition

A composite key is a key (primary, candidate, foreign, etc.) that consists of two or more columns combined.

Characteristics

  • Combined uniqueness: The combination of values must be unique, not each column individually
  • Multiple columns: Uses two or more columns together
  • All columns required: All parts are needed to establish uniqueness or the relationship

Examples

  • Primary key of {student_id, course_id} in an Enrollments table
  • Foreign key of {order_id, product_id} in an Order_Details table referencing an Order_Items table

Implementation

CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Natural vs. Surrogate Keys

Natural Keys

Definition: Keys that use existing data attributes that are inherently unique in the real world.

Examples:

  • Social Security Number for a Person
  • ISBN for a Book
  • Vehicle Identification Number for a Car

Advantages:

  • Have business meaning
  • Provide inherent validation
  • Save storage space

Disadvantages:

  • May change over time
  • May be complex or long
  • May contain sensitive information
  • Not always available for all entity types

Surrogate Keys

Definition: Artificially generated identifiers not derived from application data.

Examples:

  • Auto-incrementing integers
  • UUIDs/GUIDs
  • System-generated codes

Advantages:

  • Never change
  • Simple and consistent
  • Not affected by business rule changes
  • Easier to use in relationships

Disadvantages:

  • No inherent meaning
  • Extra storage required
  • May hide data quality issues
  • Requires additional unique constraints for business keys

When to Use Each

Use Natural Keys When:

  • The attribute is guaranteed to be unique and never change
  • The attribute is simple and efficient
  • The attribute is not sensitive information
  • Performance is not a major concern

Use Surrogate Keys When:

  • Natural keys might change over time
  • Natural keys are complex or composite
  • Natural keys contain sensitive information
  • You need consistent key structure across tables
  • Performance is critical

Secondary Key

Definition

A secondary key is an attribute or combination of attributes used to retrieve data but not necessarily to identify unique records. It’s primarily used for searching and filtering data.

Characteristics

  • Non-unique: May identify multiple records
  • Search-oriented: Used mainly for data retrieval
  • Often indexed: Commonly used as index keys for performance

Examples

  • last_name in an Employees table
  • department in a Products table
  • order_date in an Orders table

Significance

  • Determines which columns should be indexed
  • Influences query optimization
  • Important for reporting and data analysis

Partial Key

Definition

A partial key is part of a primary key in a weak entity. It distinguishes among all entities that depend on the same owner entity.

Characteristics

  • Partially identifies: Cannot uniquely identify records without the owner’s primary key
  • Used in weak entities: Part of the identification for weak entities
  • Combined with owner’s key: Forms the complete primary key when combined with the owner’s primary key

Examples

  • dependent_name in an Employee_Dependents table (combined with employee_id)
  • line_number in an Order_Lines table (combined with order_id)
  • room_number in a Building_Rooms table (combined with building_id)

Implementation

CREATE TABLE employee_dependents (
    employee_id INT,
    dependent_name VARCHAR(100),
    birth_date DATE,
    relationship VARCHAR(50),
    PRIMARY KEY (employee_id, dependent_name),
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);

Key Selection Criteria

When selecting keys for database design, consider these factors:

1. Uniqueness

  • Will the key uniquely identify each record?
  • Will uniqueness be maintained over time?

2. Nullability

  • Can the key columns contain NULL values?
  • How will NULL values affect identification?

3. Stability

  • How likely are the values to change?
  • What impact would changes have on related tables?

4. Simplicity

  • How complex is the key structure?
  • How many columns are needed?

5. Familiarity

  • Will users understand and recognize the key?
  • Does it have business meaning?

6. Performance

  • How efficient is the key for indexing?
  • How well will it perform in joins?

7. Security

  • Does the key contain sensitive information?
  • Will it expose confidential data?

Key Constraints in SQL

Different types of keys are implemented through constraints in SQL:

Primary Key Constraint

-- In table creation
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10,2)
);

-- Or as a separate constraint
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100),
    price DECIMAL(10,2),
    PRIMARY KEY (product_id)
);

Unique Key Constraint

-- In table creation
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    email VARCHAR(100) UNIQUE
);

-- Or as a separate constraint
CREATE TABLE users (
    user_id INT,
    username VARCHAR(50),
    email VARCHAR(100),
    PRIMARY KEY (user_id),
    UNIQUE (username),
    UNIQUE (email)
);

Foreign Key Constraint

-- In table creation
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE
);

-- Or as a separate constraint
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    PRIMARY KEY (order_id),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Composite Key Constraint

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Best Practices for Key Design

  1. Choose appropriate key types

    • Use surrogate keys for most entities
    • Apply natural keys where they make sense
    • Implement foreign keys for all relationships
  2. Keep keys simple

    • Avoid overly complex composite keys
    • Use simple data types for performance
    • Consider maintenance and usability
  3. Enforce key constraints

    • Always declare primary keys
    • Implement foreign key constraints
    • Use unique constraints for alternate keys
  4. Index strategically

    • Create indexes on frequently queried keys
    • Consider performance implications
    • Monitor and adjust as needed
  5. Document key decisions

    • Record reasons for key selections
    • Document business rules related to keys
    • Include key information in data dictionaries
  6. Plan for scale

    • Consider future data volume
    • Select key types that scale well
    • Avoid potential bottlenecks (e.g., sequential keys in distributed systems)

Keys are fundamental building blocks of relational database design, establishing the structure and relationships that make databases useful and efficient. Proper key selection and implementation ensure data integrity, prevent redundancy, and enable effective data retrieval.