What are Context-Free Grammars?
Context-Free Grammar (CFG) is a formal system for describing the syntax of languages. CFGs are more powerful than regular expressions and can describe languages that regular languages cannot, such as balanced parentheses and nested structures.
Simple Analogy
Think of CFG like construction rules for sentences:
- Sentence → Subject + Verb + Object
- Subject → “The cat” | “A dog” | “John”
- Verb → “ate” | “chased” | “saw”
- Object → “food” | “the mouse” | “her”
Just like grammar rules build valid sentences, CFG rules build valid strings!
Why Context-Free?
“Context-Free” means:
- A variable can be replaced regardless of context
- Don’t need to check what’s around it
- Simpler than context-sensitive grammars
Example:
- A → aA (can always replace A with aA, no matter where A appears)
Formal Definition
A Context-Free Grammar is a 4-tuple G = (V, Σ, R, S) where:
- V: Finite set of variables (non-terminals)
- Σ: Finite set of terminals (alphabet)
- R: Finite set of production rules (V → (V ∪ Σ)*)
- S: Start symbol (S ∈ V)
Constraint: V ∩ Σ = ∅ (variables and terminals are disjoint)
Components Explained
1. Variables (Non-terminals)
Notation: Usually capital letters A, B, C, S, …
Purpose: Placeholders that can be expanded using rules
Example: S, A, B, Expr, Term, Factor
2. Terminals
Notation: Usually lowercase letters a, b, c, … or symbols
Purpose: Actual symbols in the final string
Example: a, b, 0, 1, +, *, (, )
3. Production Rules
Form: Variable → String of variables and/or terminals
Notation:
- A → α (A produces α)
- Multiple rules: A → α | β | γ
Example:
S → aSb
S → ε
4. Start Symbol
Purpose: Where derivation begins
Usually: S (by convention)
Example: All strings derived starting from S
Simple Examples
Example 1: {0ⁿ1ⁿ | n ≥ 0}
Language: Equal number of 0’s and 1’s in sequence
CFG:
S → 0S1 | ε
How it works:
- S → ε gives: ε
- S → 0S1 → 0ε1 gives: 01
- S → 0S1 → 00S11 → 00ε11 gives: 0011
- S → 0S1 → 00S11 → 000S111 → 000ε111 gives: 000111
Pattern: Each application adds one 0 to left, one 1 to right
Note: This language is NOT regular, but IS context-free!
Example 2: Balanced Parentheses
Language: Properly nested parentheses
CFG:
S → (S) | SS | ε
Derivations:
S → ε gives: ε
S → (S) → (ε) gives: ()
S → (S) → ((S)) → ((ε)) gives: (())
S → SS → (S)S → ()S → ()(S) → ()() gives: ()()
S → (S) → (SS) → ((S)S) → (()S) → (()()) gives: (()())
Examples: (), (()), ()(), (())(), (()()), …
Example 3: Palindromes over {a, b}
Language: Strings that read the same forwards and backwards
CFG:
S → aSa | bSb | a | b | ε
Derivations:
S → ε gives: ε
S → a gives: a
S → b gives: b
S → aSa → aεa gives: aa
S → aSa → abSba → abεba gives: abba
S → aSa → aaSaa → aabaa gives: aabaa
Example 4: {aⁿbᵐ | n ≠ m}
Language: Unequal number of a’s and b’s (in sequence)
CFG:
S → AB | BA
A → aA | a
B → bBb | b
Wait, that’s wrong! Let me fix:
Correct CFG:
S → A | B
A → aAb | aA | a (more a's than b's)
B → aBb | bB | b (more b's than a's)
Better approach:
S → AB | BA
A → aAb | aA | a (at least one more a)
B → aBb | bB | b (at least one more b)
Example 5: {0ⁿ1²ⁿ | n ≥ 0}
Language: Twice as many 1’s as 0’s
CFG:
S → 0S11 | ε
Derivations:
S → ε gives: ε
S → 0S11 → 0ε11 gives: 011
S → 0S11 → 00S1111 → 00ε1111 gives: 001111
Language Generated by CFG
Definition
L(G) = {w ∈ Σ* | S ⇒* w}
The language generated by grammar G is the set of all terminal strings derivable from the start symbol.
Derivation Notation
Single step: α ⇒ β (α derives β in one step)
Multiple steps: α ⇒* β (α derives β in zero or more steps)
One or more steps: α ⇒⁺ β (α derives β in one or more steps)
Sentential Forms
Definition: Any string derivable from start symbol (may contain variables)
Examples (for S → 0S1 | ε):
- S (sentential form)
- 0S1 (sentential form)
- 00S11 (sentential form)
- 0011 (sentential form AND sentence)
Sentence: Sentential form with only terminals (w ∈ Σ*)
Types of Production Rules
1. Recursive Rules
Self-referencing on right side
Left Recursion:
A → Aα
Example: A → Aa | b
- Generates: b, ba, baa, baaa, …
Right Recursion:
A → αA
Example: A → aA | b
- Generates: b, ab, aab, aaab, …
Both Sides:
A → αAβ
Example: A → aAb | ε
- Generates: ε, ab, aabb, aaabbb, …
2. Epsilon Productions
Form: A → ε
Purpose: Generate empty string or make variable optional
Example:
S → aSb | ε
3. Unit Productions
Form: A → B (single variable on right)
Purpose: Pass control from one variable to another
Example:
S → A
A → B
B → a
4. Useless Productions
Definition: Productions that never appear in valid derivations
Types:
- Unreachable from start symbol
- Cannot derive terminal string
Example:
S → aA
A → bB
B → cB (useless! B loops forever, never reaches terminal)
C → d (useless! C unreachable from S)
Derivations
Leftmost Derivation
Rule: Always expand leftmost variable first
Notation: ⇒ₗₘ
Example: S → aSb | ε, derive “aabb”
S ⇒ₗₘ aSb ⇒ₗₘ aaSbb ⇒ₗₘ aabb
└─expand S (leftmost variable)
└─expand S (leftmost variable)
└─apply ε to S
Rightmost Derivation
Rule: Always expand rightmost variable first
Notation: ⇒ᵣₘ
Example: Same grammar, derive “aabb”
S ⇒ᵣₘ aSb ⇒ᵣₘ aaSbb ⇒ᵣₘ aabb
└─expand S (rightmost variable)
└─expand S (rightmost variable)
└─apply ε to S
Note: Same result, different order!
Example: Arithmetic Expressions
Grammar:
E → E + T | T
T → T * F | F
F → (E) | id
Derive: id + id * id
Leftmost:
E ⇒ₗₘ E + T
⇒ₗₘ T + T
⇒ₗₘ F + T
⇒ₗₘ id + T
⇒ₗₘ id + T * F
⇒ₗₘ id + F * F
⇒ₗₘ id + id * F
⇒ₗₘ id + id * id
Rightmost:
E ⇒ᵣₘ E + T
⇒ᵣₘ E + T * F
⇒ᵣₘ E + T * id
⇒ᵣₘ E + F * id
⇒ᵣₘ E + id * id
⇒ᵣₘ T + id * id
⇒ᵣₘ F + id * id
⇒ᵣₘ id + id * id
Designing CFGs
Strategy 1: Balanced Structures
Pattern: Wrap terminals around recursive variable
Examples:
Equal a's and b's: S → aSb | ε
Balanced parens: S → (S) | SS | ε
Palindromes: S → aSa | bSb | a | b | ε
Strategy 2: Concatenation
Pattern: Use multiple variables for different parts
Example: {aⁿbᵐ | n, m ≥ 0}
S → AB
A → aA | ε
B → bB | ε
Strategy 3: Union
Pattern: Multiple alternatives from start symbol
Example: {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0}
S → A | B
A → aA | ε
B → bB | ε
Strategy 4: Nesting
Pattern: Recursive in the middle
Example: {aⁿbⁿcᵐdᵐ | n, m ≥ 0}
S → AB
A → aAb | ε
B → cBd | ε
Strategy 5: Conditions
Pattern: Different rules for different conditions
Example: {aⁿbᵐ | n ≠ m}
S → A | B
A → aAb | aA | a (more a's)
B → aBb | bB | b (more b's)
CFG vs Regular Languages
What CFG Can Do That Regular Can’t
1. Counting and Matching:
{0ⁿ1ⁿ}: CFG ✓, Regular ✗
S → 0S1 | ε
2. Nested Structures:
{(ⁿ)ⁿ}: CFG ✓, Regular ✗
S → (S) | ε
3. Palindromes:
{w | w = wᴿ}: CFG ✓, Regular ✗
S → aSa | bSb | a | b | ε
What Regular Can Do That CFG Also Can
Every regular language is context-free!
Proof: Convert regex/DFA to CFG
Example: (a|b)*
Regex: (a|b)*
CFG:
S → aS | bS | ε
Example: ab
Regex: a*b*
CFG:
S → AB
A → aA | ε
B → bB | ε
Conversion: Regular → CFG
From DFA:
- Variable for each state
- Rule qᵢ → a qⱼ for each transition δ(qᵢ, a) = qⱼ
- Rule qf → ε for each accept state qf
Example DFA: States {q₀, q₁}, δ(q₀, a) = q₁, F = {q₁}
CFG:
q₀ → aq₁
q₁ → ε
Applications of CFGs
1. Programming Language Syntax
Example: Java/C assignment statement
Statement → Variable = Expression ;
Variable → id
Expression → Expression + Term | Term
Term → Term * Factor | Factor
Factor → ( Expression ) | id | number
2. XML/HTML Structure
Example: Nested tags
Document → <tag> Content </tag>
Content → Text | Document | Document Content
3. Natural Language Processing
Example: Simple English sentences
Sentence → NP VP
NP → Det Noun | Noun
VP → Verb NP | Verb
Det → the | a
Noun → cat | dog | mouse
Verb → chased | ate | saw
4. Mathematical Expressions
Example: Arithmetic with precedence
Expr → Expr + Term | Term
Term → Term * Factor | Factor
Factor → ( Expr ) | number
Handles precedence: * before +
5. Data Formats
Example: JSON-like structure
Object → { Members }
Members → Pair | Pair , Members
Pair → String : Value
Value → String | number | Object
Ambiguity (Preview)
A grammar is ambiguous if a string has multiple parse trees
Example: E → E + E | E * E | id
Derive: id + id * id
Parse Tree 1: (id + id) _ id Parse Tree 2: id + (id _ id)
Two different meanings!
Solution: Rewrite grammar to be unambiguous (next topic!)
Important Properties
Closure Properties
CFLs are closed under:
- ✓ Union
- ✓ Concatenation
- ✓ Kleene Star
- ✗ Intersection (NOT closed!)
- ✗ Complement (NOT closed!)
Decision Problems
Decidable:
- ✓ Is w ∈ L(G)? (membership)
- ✓ Is L(G) = ∅? (emptiness)
Undecidable:
- ✗ Is L(G₁) = L(G₂)? (equivalence)
- ✗ Is L(G) = Σ*? (universality)
- ✗ Is G ambiguous? (ambiguity)
Important Exam Points
- ✓ CFG Definition: G = (V, Σ, R, S)
- ✓ Components: Variables, terminals, rules, start symbol
- ✓ Rules: A → α (A ∈ V, α ∈ (V ∪ Σ)*)
- ✓ Language: L(G) = {w ∈ Σ* | S ⇒* w}
- ✓ Derivation: S ⇒ α₁ ⇒ α₂ ⇒ … ⇒ w
- ✓ Leftmost/Rightmost: Order of variable expansion
- ✓ More powerful: CFG ⊃ Regular languages
- ✓ Key examples: {0ⁿ1ⁿ}, balanced parentheses, palindromes
- ✓ Applications: Programming languages, parsers, compilers
- ✓ Limitations: Not closed under intersection/complement
Common Mistakes to Avoid
✗ Mistake 1: Confusing variables and terminals
✓ Correct: Variables (capitals) can be expanded, terminals (lowercase) are final
✗ Mistake 2: Thinking all CFLs are ambiguous
✓ Correct: Many CFLs have unambiguous grammars
✗ Mistake 3: Forgetting ε productions
✓ Correct: S → ε allows empty string in language
✗ Mistake 4: Assuming CFG = Regular
✓ Correct: CFG ⊃ Regular (strictly more powerful)
✗ Mistake 5: Multiple start symbols
✓ Correct: Exactly one start symbol S
✗ Mistake 6: Wrong notation for rules
✓ Correct: A → α (not A := α or A = α)
Practice Questions
Q1: Write CFG for {aⁿbⁿ | n ≥ 1}
Answer: S → aSb | ab
Q2: What language does S → SS | (S) | ε generate?
Answer: Balanced parentheses
Q3: Is every regular language context-free?
Answer: Yes! Regular ⊂ Context-Free
Q4: What’s the difference between S ⇒ α and S ⇒* α?
Answer: ⇒ is one step, ⇒* is zero or more steps
Q5: Can CFG generate {aⁿbⁿcⁿ}?
Answer: No, this requires context-sensitive grammar
Q6: What’s a sentential form?
Answer: Any string derivable from S (may contain variables)
Q7: Write CFG for palindromes over {0, 1}
Answer: S → 0S0 | 1S1 | 0 | 1 | ε
Q8: Why is CFG called “context-free”?
Answer: Variable can be replaced regardless of surrounding context
Summary
- CFG: G = (V, Σ, R, S) - formal system for describing languages
- Components: Variables, terminals, production rules, start symbol
- Language: L(G) = all terminal strings derivable from S
- More powerful: Can express {0ⁿ1ⁿ}, balanced parentheses, palindromes
- Derivations: Leftmost (expand left variable first) vs rightmost
- Design strategies: Balanced structures, concatenation, union, nesting
- Applications: Programming language syntax, parsers, natural language, data formats
- Properties: Closed under union/concatenation/star, NOT under intersection/complement
- Key advantage: Can count and match unbounded nested structures
- Next level: Parse trees, ambiguity, normal forms
Context-Free Grammars are the foundation of syntax analysis in compilers and formal language theory!