What is Normalisation?
Normalisation is a systematic approach to database design that reduces data redundancy and improves data integrity by organising fields and tables in a relational database. Developed by E.F. Codd in the early 1970s, normalisation breaks down larger tables into smaller, well-structured tables and defines relationships between them.
Purpose of Normalisation
Normalisation serves several important purposes:
- Eliminate Data Redundancy: Reduce duplicate data to save storage space
- Minimise Data Anomalies: Prevent update, insert, and delete anomalies
- Improve Data Consistency: Ensure data is stored consistently across the database
- Simplify Data Maintenance: Make it easier to update and maintain the database
- Create Flexible Database Design: Allow for database expansion without major redesign
Data Anomalies That Normalisation Prevents
Without proper normalisation, databases can suffer from three types of anomalies:
1. Update Anomalies
When data is duplicated and only some instances are updated, the database becomes inconsistent.
Example: If customer addresses are stored in multiple tables and a customer moves, we might update some records but miss others, leading to inconsistent address information.
2. Insertion Anomalies
When certain data cannot be added because the design requires other, unrelated data to exist first.
Example: In a poorly designed university database, you might not be able to add a new course until a student enrolls in it, because the course information is only stored in a student enrollment table.
3. Deletion Anomalies
When deleting some data unintentionally causes other, unrelated data to be deleted.
Example: If you delete the last student enrolled in a particular course, you might inadvertently delete all information about that course if course details are only stored in the student enrollment records.
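This deletion anomaly is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module and an illustrative single-table design (the table and column names are assumptions, not part of any real schema): because course details live only in enrollment rows, removing the last enrollment erases the course itself.

```python
import sqlite3

# A single denormalised table: course details live only in enrollment rows.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Enrollments (
        StudentID  INTEGER,
        CourseID   TEXT,
        CourseName TEXT
    )
""")
con.execute("INSERT INTO Enrollments VALUES (1, 'C1', 'Mathematics')")
con.execute("INSERT INTO Enrollments VALUES (2, 'C2', 'Physics')")

# Delete the only student enrolled in Physics...
con.execute("DELETE FROM Enrollments WHERE StudentID = 2")

# ...and every trace of the Physics course goes with the student.
rows = con.execute(
    "SELECT CourseName FROM Enrollments WHERE CourseID = 'C2'"
).fetchall()
print(rows)  # [] -- the course information has been lost
```

Storing course details in their own table, as the normal forms below require, removes this failure mode.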
The Normalisation Process
Normalisation is typically performed in stages, with each stage representing a “normal form.” Each normal form has specific criteria that a database must meet. The most commonly used normal forms are:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
Higher normal forms also exist (BCNF, 4NF, 5NF), but they are less commonly applied in practice.
First Normal Form (1NF)
A table is in First Normal Form if:
- It has no repeating groups or arrays
- All column values are atomic (indivisible)
- Each row is unique (typically enforced by a primary key)
Before 1NF:
| StudentID | Name | Courses |
|---|---|---|
| 1 | John Smith | Math, Physics, Chemistry |
| 2 | Mary Jones | History, English |
After 1NF (the composite {StudentID, Course} now serves as the primary key):
| StudentID | Name | Course |
|---|---|---|
| 1 | John Smith | Math |
| 1 | John Smith | Physics |
| 1 | John Smith | Chemistry |
| 2 | Mary Jones | History |
| 2 | Mary Jones | English |
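The transformation above is mechanical: split each repeating group into one row per atomic value. A minimal Python sketch, using the same sample data:

```python
# Pre-1NF rows: the Courses column holds a comma-separated repeating group.
unnormalised = [
    (1, "John Smith", "Math, Physics, Chemistry"),
    (2, "Mary Jones", "History, English"),
]

# 1NF: one atomic course value per row, one row per (student, course) pair.
first_normal_form = [
    (student_id, name, course.strip())
    for student_id, name, courses in unnormalised
    for course in courses.split(",")
]

for row in first_normal_form:
    print(row)
```

Each of the five resulting rows now holds a single, indivisible course value, matching the "After 1NF" table.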
Second Normal Form (2NF)
A table is in Second Normal Form if:
- It is already in 1NF
- All non-key attributes are fully functionally dependent on the entire primary key (not just part of it)
2NF is particularly important for tables with composite primary keys.
Before 2NF:
| StudentID | CourseID | CourseName | InstructorName | StudentGrade |
|---|---|---|---|---|
| 1 | C1 | Mathematics | Dr. Thompson | A |
| 1 | C2 | Physics | Dr. Wilson | B+ |
| 2 | C1 | Mathematics | Dr. Thompson | B |
In this table, the primary key is the combination of StudentID and CourseID. However, CourseName and InstructorName depend only on CourseID, not on the entire primary key.
After 2NF:
Courses Table:
| CourseID | CourseName | InstructorName |
|---|---|---|
| C1 | Mathematics | Dr. Thompson |
| C2 | Physics | Dr. Wilson |
Enrollments Table:
| StudentID | CourseID | StudentGrade |
|---|---|---|
| 1 | C1 | A |
| 1 | C2 | B+ |
| 2 | C1 | B |
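The decomposed design can be expressed directly as a schema. This sketch uses SQLite via Python's `sqlite3` module; the column names follow the tables above, and the foreign-key constraint is an added assumption that makes the relationship explicit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# Course facts depend on CourseID alone, so they get their own table.
con.execute("""
    CREATE TABLE Courses (
        CourseID       TEXT PRIMARY KEY,
        CourseName     TEXT,
        InstructorName TEXT
    )
""")

# The grade depends on the whole composite key {StudentID, CourseID}.
con.execute("""
    CREATE TABLE Enrollments (
        StudentID    INTEGER,
        CourseID     TEXT REFERENCES Courses(CourseID),
        StudentGrade TEXT,
        PRIMARY KEY (StudentID, CourseID)
    )
""")

con.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("C1", "Mathematics", "Dr. Thompson"),
    ("C2", "Physics", "Dr. Wilson"),
])
con.executemany("INSERT INTO Enrollments VALUES (?, ?, ?)", [
    (1, "C1", "A"), (1, "C2", "B+"), (2, "C1", "B"),
])

# Each course name is now stored once; a join reconstructs the original view.
rows = con.execute("""
    SELECT e.StudentID, c.CourseName, e.StudentGrade
    FROM Enrollments e JOIN Courses c ON e.CourseID = c.CourseID
    ORDER BY e.StudentID, c.CourseID
""").fetchall()
print(rows)  # [(1, 'Mathematics', 'A'), (1, 'Physics', 'B+'), (2, 'Mathematics', 'B')]
```

Updating an instructor's name now touches exactly one row in Courses, which is precisely the update anomaly 2NF eliminates.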
Third Normal Form (3NF)
A table is in Third Normal Form if:
- It is already in 2NF
- All non-key attributes are non-transitively dependent on the primary key (no non-key attribute depends on another non-key attribute)
Before 3NF:
| StudentID | DepartmentID | DepartmentName | DepartmentPhone |
|---|---|---|---|
| 1 | D1 | Computer Science | 555-1234 |
| 2 | D2 | Mathematics | 555-5678 |
| 3 | D1 | Computer Science | 555-1234 |
In this table, DepartmentName and DepartmentPhone depend on DepartmentID, not directly on StudentID.
After 3NF:
Students Table:
| StudentID | DepartmentID |
|---|---|
| 1 | D1 |
| 2 | D2 |
| 3 | D1 |
Departments Table:
| DepartmentID | DepartmentName | DepartmentPhone |
|---|---|---|
| D1 | Computer Science | 555-1234 |
| D2 | Mathematics | 555-5678 |
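The 3NF decomposition is two projections of the original table: one onto the key plus the determining attribute, and one onto that attribute plus everything it transitively determines. A minimal sketch with the sample data above:

```python
# Rows of the pre-3NF table (DepartmentName and DepartmentPhone depend
# transitively on StudentID, via DepartmentID).
students_raw = [
    (1, "D1", "Computer Science", "555-1234"),
    (2, "D2", "Mathematics", "555-5678"),
    (3, "D1", "Computer Science", "555-1234"),
]

# Students table: project onto the key and the determining attribute.
students = [(sid, dept) for sid, dept, _, _ in students_raw]

# Departments table: the transitively dependent attributes, stored once
# per department (the dict key removes the duplicated D1 row).
departments = {
    dept: (name, phone) for _, dept, name, phone in students_raw
}

print(students)     # [(1, 'D1'), (2, 'D2'), (3, 'D1')]
print(departments)  # {'D1': ('Computer Science', '555-1234'),
                    #  'D2': ('Mathematics', '555-5678')}
```

Note how the duplicated Computer Science row collapses to a single entry: the redundancy the transitive dependency caused is gone.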
Functional Dependencies
Normalisation is based on the concept of functional dependencies. A functional dependency exists when the value of one attribute (or set of attributes) determines the value of another attribute.
We write X → Y to indicate that “Y is functionally dependent on X” or “X determines Y.”
Examples:
- StudentID → StudentName (A student ID uniquely determines a student’s name)
- CourseID → CourseName (A course ID uniquely determines a course name)
- {StudentID, CourseID} → Grade (A student’s grade in a course is determined by both the student and the course)
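Whether a dependency X → Y holds in a given table instance can be checked mechanically: every X-value must map to exactly one Y-value. A small sketch (the helper `holds` is an illustrative name, not a standard function):

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # the same X-value determines two different Y-values
    return True

rows = [
    {"StudentID": 1, "CourseID": "C1", "CourseName": "Mathematics", "Grade": "A"},
    {"StudentID": 1, "CourseID": "C2", "CourseName": "Physics",     "Grade": "B+"},
    {"StudentID": 2, "CourseID": "C1", "CourseName": "Mathematics", "Grade": "B"},
]

print(holds(rows, ("CourseID",), ("CourseName",)))         # True
print(holds(rows, ("StudentID",), ("Grade",)))             # False: student 1 has two grades
print(holds(rows, ("StudentID", "CourseID"), ("Grade",)))  # True
```

Keep in mind that an instance can only refute a dependency, not prove it in general; the dependencies that drive a design come from the meaning of the data.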
Keys and Functional Dependencies
Different types of keys in database design are related to functional dependencies:
- Superkey: A set of attributes that functionally determines all attributes in the table
- Candidate Key: A minimal superkey (no subset of it is a superkey)
- Primary Key: The chosen candidate key
- Foreign Key: An attribute (or set of attributes) that references a primary key in another table
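The superkey definition translates directly into a uniqueness test: a set of attributes determines every attribute in a given instance exactly when no two rows share the same values for it. A minimal sketch (`is_superkey` is an illustrative helper, not a library function):

```python
def is_superkey(rows, attrs):
    """In this instance, attrs determines every attribute iff no two
    rows share the same projection onto attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(projected) == len(set(projected))

rows = [
    {"StudentID": 1, "CourseID": "C1", "Grade": "A"},
    {"StudentID": 1, "CourseID": "C2", "Grade": "B+"},
    {"StudentID": 2, "CourseID": "C1", "Grade": "A"},
]

print(is_superkey(rows, ("StudentID",)))             # False: student 1 appears twice
print(is_superkey(rows, ("CourseID",)))              # False: C1 appears twice
print(is_superkey(rows, ("StudentID", "CourseID")))  # True
# Neither attribute alone is a superkey, so the pair is minimal here --
# a candidate key. As with dependencies, real keys should be derived from
# the semantics of the data, not from one sample instance, which can make
# non-keys look unique by accident.
```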
Normalisation vs. Denormalisation
While normalisation offers many benefits, there are situations where denormalisation (deliberately introducing redundancy) might be appropriate:
When to Normalise:
- To eliminate redundancy and inconsistencies
- When data integrity is crucial
- When the database has frequent updates
- For OLTP (Online Transaction Processing) systems
When to Denormalise:
- To improve query performance
- When joins are too expensive
- For read-heavy applications
- For OLAP (Online Analytical Processing) systems
- For data warehousing
Practical Tips for Normalisation
- Start with a complete data set: Understand all the data elements and their relationships
- Create an initial table design: Put all related data in a single table
- Apply 1NF: Eliminate repeating groups and ensure atomic values
- Apply 2NF: Identify partial dependencies and create separate tables
- Apply 3NF: Identify transitive dependencies and create separate tables
- Review the design: Ensure it meets all requirements and supports necessary queries
- Consider performance implications: Denormalise if necessary for specific performance needs
Common Normalisation Mistakes
- Over-normalisation: Breaking tables down too much, requiring excessive joins
- Under-normalisation: Not eliminating enough redundancy, leading to anomalies
- Ignoring business requirements: Focusing too much on theory without considering practical usage
- Neglecting performance considerations: Creating designs that are theoretically correct but perform poorly
- Failing to document dependencies: Not clearly defining the functional dependencies that drive the design
Normalisation is a powerful technique for creating efficient, consistent, and maintainable database designs. By following the normalisation process, you can eliminate redundancies and anomalies while ensuring that your database structure accurately reflects the relationships in your data.