What are Keys?
Keys are attributes or combinations of attributes that uniquely identify rows in a table and establish relationships between tables. They are fundamental to relational database design, providing a way to ensure data integrity, eliminate redundancy, and enable efficient data retrieval.
Keys serve several critical functions in databases:
- Uniquely identifying records
- Enforcing entity integrity
- Establishing relationships between tables
- Enabling efficient indexing and searching
- Preventing duplicate data
Primary Key
Definition
A primary key is a column or combination of columns that uniquely identifies each row in a table. It serves as the main identifier for records in the table.
Characteristics
- Uniqueness: Each value must be unique within the table
- Non-nullability: Cannot contain NULL values
- Minimality: Should use the minimum number of columns necessary
- Stability: Values should rarely or never change
- Simplicity: Preferably simple data types for performance
Examples
employee_idin an Employees tableproduct_codein a Products table- Combination of
order_idandproduct_idin an Order_Items table
Implementation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
);
Types of Primary Keys
1. Natural Key
- Based on data that naturally exists in the real world
- Uses attributes that are inherently unique
- Examples: Social Security Number, ISBN, Vehicle Identification Number
Advantages:
- Meaningful to users
- Inherent data validation
- No additional storage required
Disadvantages:
- May change over time
- Sometimes complex or long
- May expose sensitive information
- May not exist for all entity types
2. Surrogate Key
- Artificially generated identifier
- Not derived from application data
- Examples: Auto-incrementing integer, UUID/GUID
Advantages:
- Never changes
- Simple and efficient
- Consistent across all tables
- Not affected by business rule changes
Disadvantages:
- No inherent meaning
- Additional storage required
- May hide data quality issues
3. Composite Primary Key
- Consists of two or more columns combined
- Used when no single column has unique values
- Examples: Combination of
student_idandcourse_idin an Enrollments table
Advantages:
- Enforces business rules at the database level
- May eliminate need for additional unique constraints
Disadvantages:
- More complex to work with
- Can make joins more complicated
- Less efficient for indexing
Candidate Key
Definition
A candidate key is any column or combination of columns that could potentially serve as a primary key, satisfying the requirements of uniqueness and non-nullability.
Characteristics
- Uniqueness: Each value must be unique within the table
- Non-nullability: Cannot contain NULL values
- Irreducibility: No subset of the columns in a composite candidate key can uniquely identify each row
Examples
- In a Students table, both
student_idandemailmight be candidate keys - In a Countries table, both
country_codeandcountry_namemight be candidate keys
Significance
- Each table can have multiple candidate keys
- One candidate key is selected as the primary key
- Other candidate keys can be designated as unique keys/alternate keys
Super Key
Definition
A super key is any set of one or more columns that uniquely identifies each row in a table. Unlike candidate keys, super keys may contain additional columns beyond those needed for unique identification.
Characteristics
- Uniqueness: Each combination of values must be unique
- Potentially redundant: May include more columns than necessary
Examples
- If
student_idalone is a candidate key, then {student_id,name}, {student_id,email}, and {student_id,name,email} are all super keys - If {
order_id,product_id} is a candidate key, then {order_id,product_id,quantity} is a super key
Relationship to Other Keys
- Every candidate key is a super key
- Not every super key is a candidate key (if it has unnecessary columns)
Alternate Key
Definition
An alternate key (also called a unique key) is a candidate key that is not selected as the primary key. It provides an alternative way to uniquely identify rows.
Characteristics
- Uniqueness: Values must be unique within the table
- May allow nulls: Unlike primary keys, some databases allow unique keys to contain NULL values (though only one NULL)
Examples
- If
student_idis the primary key,emailmight be an alternate key - If
employee_idis the primary key,social_security_numbermight be an alternate key
Implementation
CREATE TABLE students (
student_id INT PRIMARY KEY,
email VARCHAR(100) UNIQUE,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
Foreign Key
Definition
A foreign key is a column or combination of columns in one table that refers to the primary key (or unique key) in another table. It establishes a relationship between the two tables.
Characteristics
- Referential integrity: Ensures that relationships between tables remain consistent
- May allow nulls: Foreign keys can often be NULL (representing optional relationships)
- Not necessarily unique: Multiple rows can have the same foreign key value
Examples
department_idin an Employees table referencingdepartment_idin a Departments tablecustomer_idin an Orders table referencingcustomer_idin a Customers table
Relationship Types
- One-to-One: Foreign key with a unique constraint
- One-to-Many: Regular foreign key (most common)
- Many-to-Many: Implemented using a junction table with two foreign keys
Implementation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Referential Actions
When implementing foreign keys, you can specify what happens when the referenced key is updated or deleted:
- CASCADE: Automatically update or delete related records
- SET NULL: Set the foreign key to NULL
- SET DEFAULT: Set the foreign key to its default value
- RESTRICT: Prevent the update or deletion if related records exist
- NO ACTION: Similar to RESTRICT, but checked at the end of the transaction
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE
ON UPDATE RESTRICT
);
Composite Key
Definition
A composite key is a key (primary, candidate, foreign, etc.) that consists of two or more columns combined.
Characteristics
- Combined uniqueness: The combination of values must be unique, not each column individually
- Multiple columns: Uses two or more columns together
- All columns required: All parts are needed to establish uniqueness or the relationship
Examples
- Primary key of {
student_id,course_id} in an Enrollments table - Foreign key of {
order_id,product_id} in an Order_Details table referencing an Order_Items table
Implementation
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
Natural vs. Surrogate Keys
Natural Keys
Definition: Keys that use existing data attributes that are inherently unique in the real world.
Examples:
- Social Security Number for a Person
- ISBN for a Book
- Vehicle Identification Number for a Car
Advantages:
- Have business meaning
- Provide inherent validation
- Save storage space
Disadvantages:
- May change over time
- May be complex or long
- May contain sensitive information
- Not always available for all entity types
Surrogate Keys
Definition: Artificially generated identifiers not derived from application data.
Examples:
- Auto-incrementing integers
- UUIDs/GUIDs
- System-generated codes
Advantages:
- Never change
- Simple and consistent
- Not affected by business rule changes
- Easier to use in relationships
Disadvantages:
- No inherent meaning
- Extra storage required
- May hide data quality issues
- Requires additional unique constraints for business keys
When to Use Each
Use Natural Keys When:
- The attribute is guaranteed to be unique and never change
- The attribute is simple and efficient
- The attribute is not sensitive information
- Performance is not a major concern
Use Surrogate Keys When:
- Natural keys might change over time
- Natural keys are complex or composite
- Natural keys contain sensitive information
- You need consistent key structure across tables
- Performance is critical
Secondary Key
Definition
A secondary key is an attribute or combination of attributes used to retrieve data but not necessarily to identify unique records. It’s primarily used for searching and filtering data.
Characteristics
- Non-unique: May identify multiple records
- Search-oriented: Used mainly for data retrieval
- Often indexed: Commonly used as index keys for performance
Examples
last_namein an Employees tabledepartmentin a Products tableorder_datein an Orders table
Significance
- Determines which columns should be indexed
- Influences query optimization
- Important for reporting and data analysis
Partial Key
Definition
A partial key is part of a primary key in a weak entity. It distinguishes among all entities that depend on the same owner entity.
Characteristics
- Partially identifies: Cannot uniquely identify records without the owner’s primary key
- Used in weak entities: Part of the identification for weak entities
- Combined with owner’s key: Forms the complete primary key when combined with the owner’s primary key
Examples
dependent_namein an Employee_Dependents table (combined withemployee_id)line_numberin an Order_Lines table (combined withorder_id)room_numberin a Building_Rooms table (combined withbuilding_id)
Implementation
CREATE TABLE employee_dependents (
employee_id INT,
dependent_name VARCHAR(100),
birth_date DATE,
relationship VARCHAR(50),
PRIMARY KEY (employee_id, dependent_name),
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
Key Selection Criteria
When selecting keys for database design, consider these factors:
1. Uniqueness
- Will the key uniquely identify each record?
- Will uniqueness be maintained over time?
2. Nullability
- Can the key columns contain NULL values?
- How will NULL values affect identification?
3. Stability
- How likely are the values to change?
- What impact would changes have on related tables?
4. Simplicity
- How complex is the key structure?
- How many columns are needed?
5. Familiarity
- Will users understand and recognize the key?
- Does it have business meaning?
6. Performance
- How efficient is the key for indexing?
- How well will it perform in joins?
7. Security
- Does the key contain sensitive information?
- Will it expose confidential data?
Key Constraints in SQL
Different types of keys are implemented through constraints in SQL:
Primary Key Constraint
-- In table creation
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10,2)
);
-- Or as a separate constraint
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
price DECIMAL(10,2),
PRIMARY KEY (product_id)
);
Unique Key Constraint
-- In table creation
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(100) UNIQUE
);
-- Or as a separate constraint
CREATE TABLE users (
user_id INT,
username VARCHAR(50),
email VARCHAR(100),
PRIMARY KEY (user_id),
UNIQUE (username),
UNIQUE (email)
);
Foreign Key Constraint
-- In table creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE
);
-- Or as a separate constraint
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Composite Key Constraint
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
price DECIMAL(10,2),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Best Practices for Key Design
-
Choose appropriate key types
- Use surrogate keys for most entities
- Apply natural keys where they make sense
- Implement foreign keys for all relationships
-
Keep keys simple
- Avoid overly complex composite keys
- Use simple data types for performance
- Consider maintenance and usability
-
Enforce key constraints
- Always declare primary keys
- Implement foreign key constraints
- Use unique constraints for alternate keys
-
Index strategically
- Create indexes on frequently queried keys
- Consider performance implications
- Monitor and adjust as needed
-
Document key decisions
- Record reasons for key selections
- Document business rules related to keys
- Include key information in data dictionaries
-
Plan for scale
- Consider future data volume
- Select key types that scale well
- Avoid potential bottlenecks (e.g., sequential keys in distributed systems)
Keys are fundamental building blocks of relational database design, establishing the structure and relationships that make databases useful and efficient. Proper key selection and implementation ensure data integrity, prevent redundancy, and enable effective data retrieval.