Normal Forms

Introduction to Normal Forms

Normal forms are formal guidelines for database design that help create efficient, organized, and flexible database structures. The process of applying these guidelines is called normalization. Each normal form represents a level of organization with specific rules that build upon the previous form.

Normal forms help eliminate:

  • Data redundancy
  • Update anomalies
  • Insertion anomalies
  • Deletion anomalies

First Normal Form (1NF)

Definition

A relation is in First Normal Form (1NF) if:

  1. It contains only atomic (indivisible) values
  2. Each column contains values of the same domain
  3. Each column has a unique name
  4. The order of rows and columns doesn’t matter
  5. No duplicate rows exist

Key Characteristics

  • No repeating groups or arrays
  • No multi-valued attributes
  • Each cell contains a single value

Example of Violating 1NF

Not in 1NF:

StudentIDNameCourses
1John SmithMath, Physics, Chemistry
2Mary JonesBiology, English, History

The “Courses” column violates 1NF because it contains multiple values in a single cell.

How to Convert to 1NF

  1. Identify repeating groups
  2. Create separate rows for each value in the repeating group

In 1NF:

StudentIDNameCourse
1John SmithMath
1John SmithPhysics
1John SmithChemistry
2Mary JonesBiology
2Mary JonesEnglish
2Mary JonesHistory

Now each cell contains only one value.

Second Normal Form (2NF)

Definition

A relation is in Second Normal Form (2NF) if:

  1. It is already in 1NF
  2. All non-key attributes are fully functionally dependent on the entire primary key (not on just part of the key)

In other words, 2NF deals with removing partial dependencies in tables with composite primary keys.

Key Characteristics

  • No non-key attribute depends on only part of the primary key
  • Particularly important when dealing with composite primary keys

Example of Violating 2NF

In 1NF but not in 2NF:

StudentIDCourseIDCourseNameInstructorNameStudentGrade
1C1MathematicsDr. ThompsonA
1C2PhysicsDr. WilsonB+
2C1MathematicsDr. ThompsonB

The primary key is the combination of StudentID and CourseID. However:

  • CourseName depends only on CourseID (part of the key)
  • InstructorName depends only on CourseID (part of the key)
  • StudentGrade depends on both StudentID and CourseID (the entire key)

This violates 2NF because of partial dependencies.

How to Convert to 2NF

  1. Identify partial dependencies
  2. Create separate tables for attributes that depend on part of the key

In 2NF:

Courses Table:

CourseIDCourseNameInstructorName
C1MathematicsDr. Thompson
C2PhysicsDr. Wilson

Enrollments Table:

StudentIDCourseIDStudentGrade
1C1A
1C2B+
2C1B

Now all attributes in each table depend on the entire primary key of that table.

Third Normal Form (3NF)

Definition

A relation is in Third Normal Form (3NF) if:

  1. It is already in 2NF
  2. All non-key attributes are non-transitively dependent on the primary key (no attribute depends on another non-key attribute)

In other words, 3NF deals with removing transitive dependencies.

Key Characteristics

  • No non-key attribute depends on another non-key attribute
  • Each non-key attribute depends directly on the primary key
  • Often remembered as “depends on the key, the whole key, and nothing but the key”

Example of Violating 3NF

In 2NF but not in 3NF:

StudentIDDepartmentIDDepartmentNameDepartmentPhone
1D1Computer Science555-1234
2D2Mathematics555-5678
3D1Computer Science555-1234

The primary key is StudentID. However:

  • DepartmentName depends on DepartmentID (not on StudentID)
  • DepartmentPhone depends on DepartmentID (not on StudentID)

This creates transitive dependencies: StudentID → DepartmentID → DepartmentName/DepartmentPhone

How to Convert to 3NF

  1. Identify transitive dependencies
  2. Create separate tables for attributes that are transitively dependent

In 3NF:

Students Table:

StudentIDDepartmentID
1D1
2D2
3D1

Departments Table:

DepartmentIDDepartmentNameDepartmentPhone
D1Computer Science555-1234
D2Mathematics555-5678

Now all non-key attributes depend directly on the primary key of their respective tables.

Comparing the Normal Forms

Normal FormMain FocusEliminatesKey Requirement
1NFAtomic valuesRepeating groupsIdentify a key that makes each row unique
2NFFull functional dependencyPartial dependenciesAll non-key attributes depend on the whole key
3NFDirect dependency on keyTransitive dependenciesNon-key attributes depend on “nothing but the key”

Informal Summary of Normal Forms

  • 1NF: “Each attribute contains only atomic values and each row is unique.”
  • 2NF: “It’s in 1NF and all non-key attributes depend on the whole key.”
  • 3NF: “It’s in 2NF and all non-key attributes depend directly on the key.”

Practical Steps to Normalize a Database

Step 1: Convert to 1NF

  1. Identify repeating groups
  2. Create separate rows for each value in repeating groups
  3. Ensure all attributes are atomic
  4. Identify a primary key

Step 2: Convert to 2NF

  1. Identify the primary key (especially composite keys)
  2. Check if any attributes depend on only part of the key
  3. Move these partially dependent attributes to new tables
  4. Create appropriate foreign keys to maintain relationships

Step 3: Convert to 3NF

  1. Identify non-key attributes that determine other non-key attributes
  2. Move these transitively dependent attributes to new tables
  3. Create appropriate foreign keys to maintain relationships

Benefits of Normalization

  1. Minimized Redundancy: Data is stored in only one place
  2. Improved Data Integrity: Changes are made in one place, ensuring consistency
  3. Smaller Tables: Focused tables for specific entities
  4. More Flexible Structure: Easier to modify and extend
  5. Accurate Representation: Better represents real-world entities and relationships

When to Denormalize

Sometimes it makes sense to intentionally denormalize (introduce redundancy) for:

  1. Performance Optimization: When join operations are too costly
  2. Reporting Systems: For faster read-only access
  3. Data Warehousing: When analytical processing is the priority
  4. Read-Heavy Applications: When reads vastly outnumber writes
  5. Historical Data: When data doesn’t change frequently

Higher Normal Forms

Beyond 3NF, there are additional normal forms that address more specialized dependency issues:

  1. Boyce-Codd Normal Form (BCNF): A stronger version of 3NF
  2. Fourth Normal Form (4NF): Deals with multi-valued dependencies
  3. Fifth Normal Form (5NF): Addresses join dependencies

However, for most practical applications, achieving 3NF is sufficient.

Common Mistakes in Normalization

  1. Over-normalization: Creating too many tables, making queries unnecessarily complex
  2. Ignoring performance: Normalizing without considering query patterns
  3. Misidentifying dependencies: Incorrectly analyzing how attributes relate
  4. Normalizing at the wrong time: Normalizing before fully understanding requirements

Practical Example: Normalizing a University Database

Unnormalized Table

StudentIDStudentNameCourseIDCourseNameInstructorIDInstructorNameInstructorOfficeGrade
S1John SmithC1DatabaseI1Dr. WilsonRoom 101A
S1John SmithC2NetworksI2Dr. JohnsonRoom 102B+
S2Mary BrownC1DatabaseI1Dr. WilsonRoom 101A-

First Normal Form (1NF)

Already in 1NF as each cell contains only one value

Second Normal Form (2NF)

Primary Key: (StudentID, CourseID)

Students Table:

StudentIDStudentName
S1John Smith
S2Mary Brown

Courses Table:

CourseIDCourseNameInstructorIDInstructorNameInstructorOffice
C1DatabaseI1Dr. WilsonRoom 101
C2NetworksI2Dr. JohnsonRoom 102

Enrollments Table:

StudentIDCourseIDGrade
S1C1A
S1C2B+
S2C1A-

Third Normal Form (3NF)

Students Table:

StudentIDStudentName
S1John Smith
S2Mary Brown

Courses Table:

CourseIDCourseNameInstructorID
C1DatabaseI1
C2NetworksI2

Instructors Table:

InstructorIDInstructorNameInstructorOffice
I1Dr. WilsonRoom 101
I2Dr. JohnsonRoom 102

Enrollments Table:

StudentIDCourseIDGrade
S1C1A
S1C2B+
S2C1A-

This fully normalized design eliminates redundancy while preserving all the original data and relationships.