What is Normalisation?
Normalisation is a systematic approach to database design that reduces data redundancy and improves data integrity by organising fields and tables in a relational database. Developed by E.F. Codd in the early 1970s, normalisation breaks down larger tables into smaller, well-structured tables and defines relationships between them.
Purpose of Normalisation
Normalisation serves several important purposes:
- Eliminate Data Redundancy: Reduce duplicate data to save storage space
- Minimise Data Anomalies: Prevent update, insert, and delete anomalies
- Improve Data Consistency: Ensure data is stored consistently across the database
- Simplify Data Maintenance: Make it easier to update and maintain the database
- Create Flexible Database Design: Allow for database expansion without major redesign
Data Anomalies That Normalisation Prevents
Without proper normalisation, databases can suffer from three types of anomalies:
1. Update Anomalies
When data is duplicated and only some instances are updated, the database becomes inconsistent.
Example: If customer addresses are stored in multiple tables and a customer moves, we might update some records but miss others, leading to inconsistent address information.
2. Insertion Anomalies
When certain data cannot be added because the design requires other, unrelated data to exist first.
Example: In a poorly designed university database, you might not be able to add a new course until a student enrolls in it, because the course information is only stored in a student enrollment table.
3. Deletion Anomalies
When deleting some data unintentionally causes other, unrelated data to be deleted.
Example: If you delete the last student enrolled in a particular course, you might inadvertently delete all information about that course if course details are only stored in the student enrollment records.
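This deletion anomaly is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module and an illustrative single-table design (the table and column names are assumptions, not part of any real schema): because course details live only in enrollment rows, removing the last enrollment erases the course itself.

```python
import sqlite3

# A single denormalised table: course details live only in enrollment rows.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Enrollments (
        StudentID  INTEGER,
        CourseID   TEXT,
        CourseName TEXT
    )
""")
con.execute("INSERT INTO Enrollments VALUES (1, 'C1', 'Mathematics')")
con.execute("INSERT INTO Enrollments VALUES (2, 'C2', 'Physics')")

# Delete the only student enrolled in Physics...
con.execute("DELETE FROM Enrollments WHERE StudentID = 2")

# ...and every trace of the Physics course goes with the student.
rows = con.execute(
    "SELECT CourseName FROM Enrollments WHERE CourseID = 'C2'"
).fetchall()
print(rows)  # [] -- the course information has been lost
```

Storing course details in their own table, as the normal forms below require, removes this failure mode.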
The Normalisation Process
Normalisation is typically performed in stages, with each stage representing a “normal form.” Each normal form has specific criteria that a database must meet. The most commonly used normal forms are:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
Higher normal forms also exist (BCNF, 4NF, 5NF), but they are less commonly applied in practice.
First Normal Form (1NF)
A table is in First Normal Form if:
- It has no repeating groups or arrays
- All column values are atomic (indivisible)
- Each row is unique (typically enforced by a primary key)
Before 1NF:
| StudentID | Name | Courses |
|---|---|---|
| 1 | John Smith | Math, Physics, Chemistry |
| 2 | Mary Jones | History, English |
After 1NF (the composite {StudentID, Course} now serves as the primary key):
| StudentID | Name | Course |
|---|---|---|
| 1 | John Smith | Math |
| 1 | John Smith | Physics |
| 1 | John Smith | Chemistry |
| 2 | Mary Jones | History |
| 2 | Mary Jones | English |
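The transformation above is mechanical: split each repeating group into one row per atomic value. A minimal Python sketch, using the same sample data:

```python
# Pre-1NF rows: the Courses column holds a comma-separated repeating group.
unnormalised = [
    (1, "John Smith", "Math, Physics, Chemistry"),
    (2, "Mary Jones", "History, English"),
]

# 1NF: one atomic course value per row, one row per (student, course) pair.
first_normal_form = [
    (student_id, name, course.strip())
    for student_id, name, courses in unnormalised
    for course in courses.split(",")
]

for row in first_normal_form:
    print(row)
```

Each of the five resulting rows now holds a single, indivisible course value, matching the "After 1NF" table.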
Second Normal Form (2NF)
A table is in Second Normal Form if:
- It is already in 1NF
- All non-key attributes are fully functionally dependent on the entire primary key (not just part of it)
2NF is particularly important for tables with composite primary keys.
Before 2NF:
| StudentID | CourseID | CourseName | InstructorName | StudentGrade |
|---|---|---|---|---|
| 1 | C1 | Mathematics | Dr. Thompson | A |
| 1 | C2 | Physics | Dr. Wilson | B+ |
| 2 | C1 | Mathematics | Dr. Thompson | B |
In this table, the primary key is the combination of StudentID and CourseID. However, CourseName and InstructorName depend only on CourseID, not on the entire primary key.
After 2NF:
Courses Table:
| CourseID | CourseName | InstructorName |
|---|---|---|
| C1 | Mathematics | Dr. Thompson |
| C2 | Physics | Dr. Wilson |
Enrollments Table:
| StudentID | CourseID | StudentGrade |
|---|---|---|
| 1 | C1 | A |
| 1 | C2 | B+ |
| 2 | C1 | B |
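The decomposed design can be expressed directly as a schema. This sketch uses SQLite via Python's `sqlite3` module; the column names follow the tables above, and the foreign-key constraint is an added assumption that makes the relationship explicit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# Course facts depend on CourseID alone, so they get their own table.
con.execute("""
    CREATE TABLE Courses (
        CourseID       TEXT PRIMARY KEY,
        CourseName     TEXT,
        InstructorName TEXT
    )
""")

# The grade depends on the whole composite key {StudentID, CourseID}.
con.execute("""
    CREATE TABLE Enrollments (
        StudentID    INTEGER,
        CourseID     TEXT REFERENCES Courses(CourseID),
        StudentGrade TEXT,
        PRIMARY KEY (StudentID, CourseID)
    )
""")

con.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("C1", "Mathematics", "Dr. Thompson"),
    ("C2", "Physics", "Dr. Wilson"),
])
con.executemany("INSERT INTO Enrollments VALUES (?, ?, ?)", [
    (1, "C1", "A"), (1, "C2", "B+"), (2, "C1", "B"),
])

# Each course name is now stored once; a join reconstructs the original view.
rows = con.execute("""
    SELECT e.StudentID, c.CourseName, e.StudentGrade
    FROM Enrollments e JOIN Courses c ON e.CourseID = c.CourseID
    ORDER BY e.StudentID, c.CourseID
""").fetchall()
print(rows)  # [(1, 'Mathematics', 'A'), (1, 'Physics', 'B+'), (2, 'Mathematics', 'B')]
```

Updating an instructor's name now touches exactly one row in Courses, which is precisely the update anomaly 2NF eliminates.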
Third Normal Form (3NF)
A table is in Third Normal Form if:
- It is already in 2NF
- All non-key attributes are non-transitively dependent on the primary key (no non-key attribute depends on another non-key attribute)
Before 3NF:
| StudentID | DepartmentID | DepartmentName | DepartmentPhone |
|---|---|---|---|
| 1 | D1 | Computer Science | 555-1234 |
| 2 | D2 | Mathematics | 555-5678 |
| 3 | D1 | Computer Science | 555-1234 |
In this table, DepartmentName and DepartmentPhone depend on DepartmentID, not directly on StudentID.
After 3NF:
Students Table:
| StudentID | DepartmentID |
|---|---|
| 1 | D1 |
| 2 | D2 |
| 3 | D1 |
Departments Table:
| DepartmentID | DepartmentName | DepartmentPhone |
|---|---|---|
| D1 | Computer Science | 555-1234 |
| D2 | Mathematics | 555-5678 |
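The 3NF decomposition is two projections of the original table: one onto the key plus the determining attribute, and one onto that attribute plus everything it transitively determines. A minimal sketch with the sample data above:

```python
# Rows of the pre-3NF table (DepartmentName and DepartmentPhone depend
# transitively on StudentID, via DepartmentID).
students_raw = [
    (1, "D1", "Computer Science", "555-1234"),
    (2, "D2", "Mathematics", "555-5678"),
    (3, "D1", "Computer Science", "555-1234"),
]

# Students table: project onto the key and the determining attribute.
students = [(sid, dept) for sid, dept, _, _ in students_raw]

# Departments table: the transitively dependent attributes, stored once
# per department (the dict key removes the duplicated D1 row).
departments = {
    dept: (name, phone) for _, dept, name, phone in students_raw
}

print(students)     # [(1, 'D1'), (2, 'D2'), (3, 'D1')]
print(departments)  # {'D1': ('Computer Science', '555-1234'),
                    #  'D2': ('Mathematics', '555-5678')}
```

Note how the duplicated Computer Science row collapses to a single entry: the redundancy the transitive dependency caused is gone.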
Functional Dependencies
Normalisation is based on the concept of functional dependencies. A functional dependency exists when the value of one attribute (or set of attributes) determines the value of another attribute.
We write X → Y to indicate that “Y is functionally dependent on X” or “X determines Y.”
Examples:
- StudentID → StudentName (A student ID uniquely determines a student’s name)
- CourseID → CourseName (A course ID uniquely determines a course name)
- {StudentID, CourseID} → Grade (A student’s grade in a course is determined by both the student and the course)
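Whether a dependency X → Y holds in a given table instance can be checked mechanically: every X-value must map to exactly one Y-value. A small sketch (the helper `holds` is an illustrative name, not a standard function):

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # the same X-value determines two different Y-values
    return True

rows = [
    {"StudentID": 1, "CourseID": "C1", "CourseName": "Mathematics", "Grade": "A"},
    {"StudentID": 1, "CourseID": "C2", "CourseName": "Physics",     "Grade": "B+"},
    {"StudentID": 2, "CourseID": "C1", "CourseName": "Mathematics", "Grade": "B"},
]

print(holds(rows, ("CourseID",), ("CourseName",)))         # True
print(holds(rows, ("StudentID",), ("Grade",)))             # False: student 1 has two grades
print(holds(rows, ("StudentID", "CourseID"), ("Grade",)))  # True
```

Keep in mind that an instance can only refute a dependency, not prove it in general; the dependencies that drive a design come from the meaning of the data.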
Keys and Functional Dependencies
Different types of keys in database design are related to functional dependencies:
- Superkey: A set of attributes that functionally determines all attributes in the table
- Candidate Key: A minimal superkey (no subset of it is a superkey)
- Primary Key: The chosen candidate key
- Foreign Key: An attribute (or set of attributes) that references a primary key in another table
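The superkey definition translates directly into a uniqueness test: a set of attributes determines every attribute in a given instance exactly when no two rows share the same values for it. A minimal sketch (`is_superkey` is an illustrative helper, not a library function):

```python
def is_superkey(rows, attrs):
    """In this instance, attrs determines every attribute iff no two
    rows share the same projection onto attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(projected) == len(set(projected))

rows = [
    {"StudentID": 1, "CourseID": "C1", "Grade": "A"},
    {"StudentID": 1, "CourseID": "C2", "Grade": "B+"},
    {"StudentID": 2, "CourseID": "C1", "Grade": "A"},
]

print(is_superkey(rows, ("StudentID",)))             # False: student 1 appears twice
print(is_superkey(rows, ("CourseID",)))              # False: C1 appears twice
print(is_superkey(rows, ("StudentID", "CourseID")))  # True
# Neither attribute alone is a superkey, so the pair is minimal here --
# a candidate key. As with dependencies, real keys should be derived from
# the semantics of the data, not from one sample instance, which can make
# non-keys look unique by accident.
```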
Normalisation vs. Denormalisation
While normalisation offers many benefits, there are situations where denormalisation (deliberately introducing redundancy) might be appropriate:
When to Normalise:
- To eliminate redundancy and inconsistencies
- When data integrity is crucial
- When the database has frequent updates
- For OLTP (Online Transaction Processing) systems
When to Denormalise:
- To improve query performance
- When joins are too expensive
- For read-heavy applications
- For OLAP (Online Analytical Processing) systems
- For data warehousing
Practical Tips for Normalisation
- Start with a complete data set: Understand all the data elements and their relationships
- Create an initial table design: Put all related data in a single table
- Apply 1NF: Eliminate repeating groups and ensure atomic values
- Apply 2NF: Identify partial dependencies and create separate tables
- Apply 3NF: Identify transitive dependencies and create separate tables
- Review the design: Ensure it meets all requirements and supports necessary queries
- Consider performance implications: Denormalise if necessary for specific performance needs
Common Normalisation Mistakes
- Over-normalisation: Breaking tables down too much, requiring excessive joins
- Under-normalisation: Not eliminating enough redundancy, leading to anomalies
- Ignoring business requirements: Focusing too much on theory without considering practical usage
- Neglecting performance considerations: Creating designs that are theoretically correct but perform poorly
- Failing to document dependencies: Not clearly defining the functional dependencies that drive the design
Normalisation is a powerful technique for creating efficient, consistent, and maintainable database designs. By following the normalisation process, you can eliminate redundancies and anomalies while ensuring that your database structure accurately reflects the relationships in your data.