Simplification of Context-Free Grammars

Why Simplify CFGs?

Context-Free Grammars can be messy! They may contain:

Useless symbols: Variables/terminals that never appear in derivations
ε-productions: Rules like A → ε (complicates parsing)
Unit productions: Rules like A → B (adds unnecessary steps)
Redundant rules: Multiple ways to derive same strings

Goal: Transform CFG into simpler equivalent form while preserving the language

Simple Analogy

Think of simplifying CFG like cleaning code:

Remove unused variables (dead code)
Eliminate unnecessary indirection (unit productions)
Simplify empty cases (ε-productions)
Keep functionality same (same language)

Result: Cleaner, more efficient grammar!

Types of Problematic Productions

1. Useless Symbols

Definition: Symbols that don’t contribute to generating terminal strings

Two types:

Type 1: Non-Terminating Symbols

Definition: Variables that cannot derive any terminal string

Example:

S → AB
A → a
B → bB    (B never reaches terminal!)

Problem: B loops forever, never produces terminal string

Type 2: Unreachable Symbols

Definition: Symbols that cannot be reached from start symbol

Example:

S → AB
A → a
B → b
C → c     (C unreachable from S!)

Problem: C is isolated, never used in any derivation from S

2. ε-Productions (Nullable Productions)

Definition: Rules of form A → ε

Example:

S → AB
A → aA | ε
B → bB | ε

Problem: Complicates parsing, makes grammar analysis harder

Note: If ε ∈ L(G), we need S → ε where S is start symbol

3. Unit Productions

Definition: Rules of form A → B (single variable on right)

Example:

S → A
A → B
B → a

Problem: Unnecessary indirection, adds derivation steps

Can simplify to: S → a, A → a, B → a

Simplification Algorithm Overview

Order matters! Follow this sequence:

1. Remove useless symbols
   a. First: Remove non-terminating symbols
   b. Then: Remove unreachable symbols

2. Remove ε-productions (except S → ε if needed)

3. Remove unit productions

Why this order?

Removing useless symbols first reduces grammar size
Must remove non-terminating before unreachable
ε-elimination before unit production removal

Step 1: Removing Useless Symbols

Step 1a: Remove Non-Terminating Symbols

Goal: Keep only variables that can derive terminal strings

Algorithm:

1. Mark all terminals as "terminating"
2. Repeat until no new variables marked:
   - If A → α and all symbols in α are terminating
   - Mark A as terminating
3. Remove all non-terminating variables and their rules

Example 1

Original Grammar:

S → AB | a
A → b
B → bB

Step 1: Mark terminals: a, b are terminating

Step 2: Check rules

S → a: right side has only terminal a → S is terminating ✓
A → b: right side has only terminal b → A is terminating ✓
B → bB: B appears on right, not yet marked → skip

Step 3: Check again

B → bB: b is terminating, but B is not → skip

Step 4: No new variables marked → done

Terminating: {S, A} Non-terminating: {B}

Simplified Grammar:

S → a
A → b

(Rules with B removed)

Example 2

Original Grammar:

S → AB | C
A → a
B → BC | b
C → c | D
D → dD

Analysis:

Terminals: a, b, c, d (terminating)
A → a: A is terminating ✓
B → b: B is terminating ✓
C → c: C is terminating ✓
D → dD: D loops, never terminating ✗
S → AB: A and B terminating → S is terminating ✓

Terminating: {S, A, B, C} Non-terminating: {D}

Simplified Grammar:

S → AB | C
A → a
B → BC | b
C → c

(D and rules with D removed)

Step 1b: Remove Unreachable Symbols

Goal: Keep only symbols reachable from start symbol

Algorithm:

1. Mark S (start symbol) as reachable
2. Repeat until no new symbols marked:
   - For each rule A → α where A is reachable
   - Mark all symbols in α as reachable
3. Remove all unreachable symbols and their rules

Example 1

Grammar (after removing non-terminating):

S → AB
A → a
B → b
C → c
D → d

Step 1: S is reachable

Step 2: Check S → AB

A is reachable ✓
B is reachable ✓

Step 3: Check A → a

a is reachable ✓

Step 4: Check B → b

b is reachable ✓

Reachable: {S, A, B, a, b} Unreachable: {C, D, c, d}

Simplified Grammar:

S → AB
A → a
B → b

Example 2

Grammar:

S → AB | C
A → a
B → b | C
C → c
D → d

Analysis:

S reachable (start)
From S → AB | C: A, B, C reachable
From A → a: a reachable
From B → b | C: b reachable (C already marked)
From C → c: c reachable
D never reached ✗

Reachable: {S, A, B, C, a, b, c} Unreachable: {D, d}

Simplified Grammar:

S → AB | C
A → a
B → b | C
C → c

Complete Example: Removing Useless Symbols

Original Grammar:

S → AB | a
A → b
B → C
C → D
D → E
E → a
F → f

Step 1a: Remove non-terminating

S → a: S terminating ✓
A → b: A terminating ✓
E → a: E terminating ✓
D → E: E terminating → D terminating ✓
C → D: D terminating → C terminating ✓
B → C: C terminating → B terminating ✓
F → f: F terminating ✓

All variables terminate!

Step 1b: Remove unreachable

S reachable (start)
From S → AB | a: A, B, a reachable
From A → b: b reachable
From B → C: C reachable
From C → D: D reachable
From D → E: E reachable
F never reached ✗

Final Grammar:

S → AB | a
A → b
B → C
C → D
D → E
E → a

(F removed)

Step 2: Removing ε-Productions

Goal: Eliminate A → ε rules (except S → ε if ε ∈ L(G))

Nullable Variables

Definition: Variable A is nullable if A ⇒* ε

Finding nullable variables:

1. If A → ε, then A is nullable
2. If A → B₁B₂...Bₙ and all Bᵢ are nullable, then A is nullable
3. Repeat until no new nullable variables found

Algorithm to Remove ε-Productions

1. Find all nullable variables
2. For each rule A → X₁X₂...Xₙ:
   - Add new rules with all combinations of nullable variables present/absent
   - But don't add A → ε (unless A is start and ε ∈ L(G))
3. Remove all A → ε rules

Example 1: Simple ε-Removal

Original Grammar:

S → AB
A → aA | ε
B → bB | ε

Step 1: Find nullable variables

A → ε: A nullable ✓
B → ε: B nullable ✓
S → AB: Both nullable → S nullable ✓

Nullable: {S, A, B}

Step 2: Generate new rules

For S → AB:

Both present: AB (original)
A absent: B
B absent: A
Both absent: ε (only if S is start and ε ∈ L)

New rules for S: S → AB | A | B | ε

For A → aA:

A present: aA (original)
A absent: a

New rules for A: A → aA | a

For B → bB:

B present: bB (original)
B absent: b

New rules for B: B → bB | b

Step 3: Remove ε-productions

Final Grammar:

S → AB | A | B | ε
A → aA | a
B → bB | b

Note: Kept S → ε because S is start and ε was in original language

Example 2: More Complex

Original Grammar:

S → ABC
A → aA | ε
B → bB | ε
C → c

Step 1: Nullable variables

A → ε: A nullable ✓
B → ε: B nullable ✓
C → c: not nullable ✗
S → ABC: A, B nullable, but C not → S not nullable ✗

Nullable: {A, B}

Step 2: Generate new rules

For S → ABC:

All present: ABC
A absent: BC
B absent: AC
A and B absent: C

New rules for S: S → ABC | BC | AC | C

For A → aA:

A present: aA
A absent: a

New rules for A: A → aA | a

For B → bB:

B present: bB
B absent: b

New rules for B: B → bB | b

Step 3: Remove ε

Final Grammar:

S → ABC | BC | AC | C
A → aA | a
B → bB | b
C → c

Special Case: Start Symbol

If ε ∈ L(G), need special handling:

1. Remove all ε-productions as above
2. If S was nullable, add new rule: S₀ → S | ε
3. S₀ becomes new start symbol

Example:

Original: S → A, A → a | ε
ε ∈ L(G) because S ⇒ A ⇒ ε

After removal:
S₀ → S | ε    (new start)
S → A
A → a

Step 3: Removing Unit Productions

Goal: Eliminate rules of form A → B

Algorithm

1. For each variable A, find all variables B such that A ⇒* B using only unit productions
2. For each such B, if B → α (non-unit production), add A → α
3. Remove all unit productions

Example 1: Simple Unit Removal

Original Grammar:

S → A
A → B
B → a | b

Step 1: Find unit production chains

S ⇒* A ⇒* B (using S → A, A → B)

Step 2: Add direct rules

B → a, B → b (non-unit)
Since S ⇒* B: add S → a | b
Since A ⇒* B: add A → a | b

Step 3: Remove unit productions

Final Grammar:

S → a | b
A → a | b
B → a | b

Example 2: More Complex

Original Grammar:

S → A | a
A → B | b
B → C | c
C → S | d

Step 1: Find unit production closure

S ⇒* A ⇒* B ⇒* C ⇒* S (cycle!)
So S ⇒* {S, A, B, C}
A ⇒* {A, B, C, S}
B ⇒* {B, C, S, A}
C ⇒* {C, S, A, B}

Step 2: For each variable, add non-unit productions from closure

For S (reaches {S, A, B, C}):

S → a (from S)
S → b (from A)
S → c (from B)
S → d (from C)

For A (reaches {A, B, C, S}):

A → b (from A)
A → c (from B)
A → d (from C)
A → a (from S)

Similarly for B and C

Final Grammar:

S → a | b | c | d
A → a | b | c | d
B → a | b | c | d
C → a | b | c | d

Example 3: Realistic Grammar

Original (after ε-removal):

E → E + T | T
T → T * F | F
F → (E) | id

Step 1: Unit production chains

E ⇒* T ⇒* F
T ⇒* F

Step 2: Add non-unit productions

For E (reaches T, F):

E → E + T (already there)
From T: E → T * F
From F: E → (E) | id

For T (reaches F):

T → T * F (already there)
From F: T → (E) | id

Final Grammar:

E → E + T | T * F | (E) | id
T → T * F | (E) | id
F → (E) | id

Complete Simplification Example

Original Grammar:

S → AB | ε
A → aA | ε
B → Bb | C
C → c
D → d

Step 1: Remove Useless Symbols

Step 1a: Non-terminating

All variables can reach terminals ✓

Step 1b: Unreachable

D is unreachable ✗

After Step 1:

S → AB | ε
A → aA | ε
B → Bb | C
C → c

Step 2: Remove ε-Productions

Nullable variables: {S, A}

New productions:

S → AB | B | ε (A can be absent)
A → aA | a (A can be absent)
B → Bb | C (unchanged)
C → c (unchanged)

After Step 2:

S → AB | B | ε
A → aA | a
B → Bb | C
C → c

Step 3: Remove Unit Productions

Unit productions: B → C

Closure:

B ⇒* C
Add B → c (from C)

After Step 3:

S → AB | B | ε
A → aA | a
B → Bb | c
C → c

Wait, S → B is also unit production!

Closure: S ⇒* B

Add S → Bb | c (from B)

Final Simplified Grammar:

S → AB | Bb | c | ε
A → aA | a
B → Bb | c
C → c

Actually, C is now unreachable! Remove it:

Final Final Grammar:

S → AB | Bb | c | ε
A → aA | a
B → Bb | c

Why Simplification Matters

1. Efficiency

Smaller grammar = faster parsing

Fewer rules to check
Fewer derivation steps
Less memory usage

2. Clarity

Cleaner grammar = easier to understand

No dead code (useless symbols)
No unnecessary indirection (unit productions)
Clearer structure

3. Normal Forms

Simplified grammar is prerequisite for:

Chomsky Normal Form
Greibach Normal Form
Other canonical forms

4. Parsing Algorithms

Many parsing algorithms require:

No ε-productions
No unit productions
No useless symbols

Examples: CYK algorithm, LL parsing, LR parsing

Important Properties

Language Preservation

Theorem: Simplification preserves the language

L(G_original) = L(G_simplified)

Except: May need to handle ε specially

Canonical Form

Simplified grammar is not unique!

Different simplification orders can give different (but equivalent) grammars

Complexity

Simplification is efficient:

Useless symbol removal: O(n²)
ε-production removal: O(2ⁿ) worst case, but usually practical
Unit production removal: O(n²)

Where n = grammar size

Common Mistakes to Avoid

✗ Mistake 1: Wrong order (unreachable before non-terminating)
✓ Correct: Non-terminating first, then unreachable

✗ Mistake 2: Removing S → ε when ε ∈ L(G)
✓ Correct: Keep S → ε if ε is in the language

✗ Mistake 3: Missing combinations in ε-removal
✓ Correct: Generate ALL combinations of nullable variables

✗ Mistake 4: Not finding all unit production chains
✓ Correct: Use closure to find all A ⇒* B relationships

✗ Mistake 5: Forgetting to remove newly unreachable symbols
✓ Correct: After each step, check for new useless symbols

✗ Mistake 6: Applying steps in wrong order
✓ Correct: Useless → ε-productions → Unit productions

Important Exam Points

✓ Three types of problems: Useless symbols, ε-productions, unit productions
✓ Order matters: Non-terminating → Unreachable → ε → Unit
✓ Useless symbols: Both non-terminating and unreachable
✓ Nullable variables: Variables that derive ε
✓ ε-removal: Generate all combinations, keep S → ε if needed
✓ Unit productions: A → B (single variable)
✓ Unit removal: Find closure, add direct rules, remove units
✓ Language preserved: L(G) unchanged (except ε handling)
✓ Not unique: Different simplifications possible
✓ Prerequisite: Needed for normal forms

Practice Questions

Q1: What’s the difference between non-terminating and unreachable?

Answer: Non-terminating can’t derive terminals; unreachable can’t be reached from S.

Q2: Why remove non-terminating before unreachable?

Answer: Removing non-terminating may make new symbols unreachable.

Q3: Can we remove S → ε?

Answer: Only if ε ∉ L(G). If ε ∈ L(G), must keep it or use S₀ → S | ε.

Q4: What’s a nullable variable?

Answer: Variable A where A ⇒* ε.

Q5: In S → AB with A, B nullable, what new rules?

Answer: S → AB | A | B (and S → ε if S is start and ε ∈ L).

Q6: Why remove unit productions?

Answer: Unnecessary indirection, adds derivation steps.

Q7: How to find unit production closure?

Answer: Find all B such that A ⇒* B using only unit productions.

Q8: Does simplification change the language?

Answer: No, L(G_original) = L(G_simplified).

Summary

Goal: Simplify CFG while preserving language
Three steps (in order):
1. Remove useless symbols (non-terminating, then unreachable)
2. Remove ε-productions (except S → ε if needed)
3. Remove unit productions
Useless symbols: Can’t reach terminals or can’t be reached from S
ε-productions: A → ε (find nullable, generate combinations)
Unit productions: A → B (find closure, add direct rules)
Language preserved: L(G) unchanged
Benefits: Cleaner, faster parsing, prerequisite for normal forms
Applications: Parser optimization, normal form conversion
Key insight: Systematic cleanup makes grammar more manageable

Simplification is essential preparation for converting CFGs to normal forms!

Simplification of Context-Free Grammars

Why Simplify CFGs?

Simple Analogy

Types of Problematic Productions

1. Useless Symbols

Type 1: Non-Terminating Symbols

Type 2: Unreachable Symbols

2. ε-Productions (Nullable Productions)

3. Unit Productions

Simplification Algorithm Overview

Step 1: Removing Useless Symbols

Step 1a: Remove Non-Terminating Symbols

Example 1

Example 2

Step 1b: Remove Unreachable Symbols

Example 1

Example 2

Complete Example: Removing Useless Symbols

Step 2: Removing ε-Productions

Nullable Variables

Algorithm to Remove ε-Productions

Example 1: Simple ε-Removal

Example 2: More Complex

Special Case: Start Symbol

Step 3: Removing Unit Productions

Algorithm

Example 1: Simple Unit Removal

Example 2: More Complex

Example 3: Realistic Grammar

Complete Simplification Example

Step 1: Remove Useless Symbols

Step 2: Remove ε-Productions

Step 3: Remove Unit Productions

Why Simplification Matters

1. Efficiency

2. Clarity

3. Normal Forms

4. Parsing Algorithms

Important Properties

Language Preservation

Canonical Form

Complexity

Common Mistakes to Avoid

Important Exam Points

Practice Questions

Summary

Install Notes App