Derivations, Parse Trees, and Ambiguity

What are Derivations?

Derivation is the process of generating a string from a grammar by repeatedly applying production rules.

Simple Analogy

Think of derivation like cooking from a recipe:

  • Recipe: Grammar rules
  • Ingredients: Variables and terminals
  • Cooking process: Apply rules step by step
  • Final dish: The string generated

Each step replaces a variable with the right side of a rule!

Types of Derivations

1. Leftmost Derivation

Leftmost Derivation: Always replace the leftmost variable in each step

Notation: S ⇒ₗₘ α

Example

Grammar:

S → AB
A → aA | ε
B → bB | ε

Derive: “aabb”

Leftmost derivation:

S ⇒ₗₘ AB          (apply S → AB)
  ⇒ₗₘ aAB         (expand A, leftmost variable)
  ⇒ₗₘ aaAB        (expand A again)
  ⇒ₗₘ aaB         (apply A → ε)
  ⇒ₗₘ aabB        (expand B)
  ⇒ₗₘ aabbB       (expand B)
  ⇒ₗₘ aabb        (apply B → ε)

Each step expands the leftmost variable!

2. Rightmost Derivation

Rightmost Derivation: Always replace the rightmost variable in each step

Notation: S ⇒ᵣₘ α

Also called: Canonical derivation

Example

Same grammar, derive: “aabb”

Rightmost derivation:

S ⇒ᵣₘ AB          (apply S → AB)
  ⇒ᵣₘ AbB         (expand B, rightmost variable)
  ⇒ᵣₘ AbbB        (expand B again)
  ⇒ᵣₘ Abb         (apply B → ε)
  ⇒ᵣₘ aAbb        (expand A)
  ⇒ᵣₘ aaAbb       (expand A)
  ⇒ᵣₘ aabb        (apply A → ε)

Each step expands the rightmost variable!

3. Arbitrary Derivation

Can expand any variable at any step

Not restricted to left or right

Example: Same grammar

S ⇒ AB
  ⇒ aAB         (expand A - left)
  ⇒ aAbB        (expand B - right)
  ⇒ aaAbB       (expand A - middle left)
  ⇒ aabB        (expand A - middle left)
  ⇒ aabbB       (expand B - right)
  ⇒ aabb        (expand B - right)

Key Observations

Important:

  • Different derivation orders can produce the same string
  • Leftmost and rightmost derivations are canonical (unique order)
  • Arbitrary derivations are flexible but not standard

Parse Trees (Derivation Trees)

Parse Tree is a graphical representation of a derivation.

Structure

  • Root: Start symbol S
  • Internal nodes: Variables (non-terminals)
  • Leaves: Terminals (or ε)
  • Edges: Applications of production rules

How to Construct

Given derivation: Build tree step by step

Example: S → AB, A → aA | ε, B → b

Derive: “aab”

Derivation:

S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaB ⇒ aab

Parse Tree:

         S
        / \
       A   B
      /|   |
     a A   b
      /|
     a A
       |
       ε

Reading leaves left-to-right gives: a, a, ε, b → “aab” ✓

Properties of Parse Trees

1. Unique representation: One parse tree can have multiple derivations (leftmost, rightmost, arbitrary)

2. Yield: Reading leaves left-to-right gives the derived string

3. Structure: Shows hierarchical relationship of grammar rules

4. Ambiguity: Multiple parse trees for same string means ambiguous grammar

Example: Arithmetic Expression

Grammar:

E → E + E | E * E | (E) | id

Derive: id + id * id

Parse Tree 1 (+ at root):

       E
      /||\
     E + E
     |  /|\
    id E * E
       |   |
      id  id

Meaning: (id + id) * id ❌ Wrong!

Parse Tree 2 (* at root):

       E
      /|\
     E + E
     |  /|\
    id E * E
       |   |
      id  id

Actually, let me fix the tree structure…

Parse Tree 1 (+ at root, then *):

           E
          /|\
         E + E
         |  /|\
        id E * E
           |   |
          id  id

Meaning: id + (id * id) ✓ Correct!

Parse Tree 2 (* at root, then +):

           E
          /|\
         E * E
        /|\  |
       E + E id
       |   |
      id  id

Meaning: (id + id) * id ❌ Wrong precedence!

Problem: Grammar is ambiguous - multiple parse trees possible!

Ambiguity

Definition

A grammar G is ambiguous if there exists a string w ∈ L(G) that has:

  • Two or more distinct leftmost derivations, OR
  • Two or more distinct rightmost derivations, OR
  • Two or more distinct parse trees

All three definitions are equivalent!

Why Ambiguity is a Problem

In compilers:

  • Ambiguity leads to multiple meanings
  • Same expression could evaluate differently
  • Unpredictable behavior

Example: id + id * id

  • Interpretation 1: (id + id) _ id = (2 + 3) _ 4 = 20
  • Interpretation 2: id + (id _ id) = 2 + (3 _ 4) = 14

Different results! 🚨

Detecting Ambiguity

No algorithm exists to determine if arbitrary grammar is ambiguous!

Ambiguity is undecidable for CFGs

Must check manually:

  1. Find a string
  2. Try to construct multiple parse trees
  3. If successful → ambiguous

Examples of Ambiguous Grammars

Example 1: Arithmetic Expressions

Ambiguous Grammar:

E → E + E | E * E | id

String: id + id * id

Parse Tree 1: (id + id) _ id Parse Tree 2: id + (id _ id)

Ambiguous!

Example 2: If-Then-Else

Ambiguous Grammar:

S → if E then S
  | if E then S else S
  | other

String: if E1 then if E2 then S1 else S2

Parse Tree 1: else matches second if

if E1 then (if E2 then S1 else S2)

Parse Tree 2: else matches first if

if E1 then (if E2 then S1) else S2

Dangling else problem!

Example 3: String Concatenation

Ambiguous Grammar:

S → SS | a

String: aaa

Parse Tree 1:

    S
   / \
  S   S
  |  / \
  a S   S
    |   |
    a   a

Parse Tree 2:

    S
   / \
  S   S
 / \  |
 S  S  a
|  |
a  a

Ambiguous!

Removing Ambiguity

Goal: Rewrite grammar to be unambiguous

Note: Not all ambiguous grammars can be rewritten! Some languages are inherently ambiguous.

Strategy 1: Introduce Precedence

Problem: E → E + E | E * E | id (ambiguous)

Solution: Use multiple variables for precedence levels

Unambiguous Grammar:

E → E + T | T         (addition, lowest precedence)
T → T * F | F         (multiplication, higher precedence)
F → (E) | id          (parentheses, highest precedence)

Why it works:

  • Must build * first (deeper in tree)
  • Then + at higher level
  • Parentheses can override

Parse Tree for id + id * id:

       E
      /|\
     E + T
     |  /|\
     T T * F
     |  |   |
     F  F  id
     |  |
    id id

Meaning: id + (id * id) ✓ Correct!

Only one parse tree possible!

Strategy 2: Enforce Associativity

Left Associative (+ and - evaluate left-to-right):

E → E + T | T
T → T * F | F
F → (E) | id

Right Associative (e.g., exponentiation):

E → T ^ E | T
T → (E) | id

Example: 2 ^ 3 ^ 4

  • Left associative: (2 ^ 3) ^ 4 = 8 ^ 4 = 4096
  • Right associative: 2 ^ (3 ^ 4) = 2 ^ 81 = huge!

Exponentiation is naturally right associative

Strategy 3: If-Then-Else (Dangling Else)

Ambiguous:

S → if E then S
  | if E then S else S
  | other

Solution 1: Match else with nearest if

S → if E then S else S | U
U → if E then S | if E then U else U | other

Solution 2: Use explicit delimiters

S → if E then S fi
  | if E then S else S fi
  | other

Solution 3: Require begin-end blocks (like Pascal)

S → if E then begin S end
  | if E then begin S end else begin S end
  | other

Strategy 4: Make Operations Non-Associative

Problem: S → SS | a (ambiguous)

Solution: Force left or right association

Left-associative:

S → SA | A
A → a

Right-associative:

S → AS | A
A → a

Inherently Ambiguous Languages

Definition: A language is inherently ambiguous if every grammar for it is ambiguous.

Example: Classic Inherently Ambiguous Language

L = {aⁿbⁿcᵐdᵐ | n, m ≥ 1} ∪ {aⁿbᵐcᵐdⁿ | n, m ≥ 1}

Meaning:

  • Either n a’s match n b’s AND m c’s match m d’s
  • OR n a’s match n d’s AND m b’s match m c’s

When n = m: Ambiguous!

Example: a²b²c²d²

  • Could be: n=2, m=2 from first pattern (a²b² c²d²)
  • Could be: n=2, m=2 from second pattern (a² b²c² d²)

Every grammar for this language must be ambiguous!

Comparing Derivations and Parse Trees

One Parse Tree, Multiple Derivations

Key Insight: Same parse tree can correspond to different derivation orders

Example:

Parse Tree:

    S
   / \
  A   B
  |   |
  a   b

Leftmost derivation:

S ⇒ₗₘ AB ⇒ₗₘ aB ⇒ₗₘ ab

Rightmost derivation:

S ⇒ᵣₘ AB ⇒ᵣₘ Ab ⇒ᵣₘ ab

Same tree, different order!

Unambiguous Grammar Properties

For unambiguous grammar and string w:

  • Exactly one parse tree
  • Exactly one leftmost derivation
  • Exactly one rightmost derivation
  • All three correspond to each other

For ambiguous grammar and string w:

  • Multiple parse trees possible
  • Multiple leftmost derivations possible
  • Multiple rightmost derivations possible

Practical Examples

Example 1: Programming Language Statement

Grammar:

Stmt → if (Expr) Stmt
     | if (Expr) Stmt else Stmt
     | Assign
     | Block

Block → { StmtList }
StmtList → Stmt | Stmt StmtList
Assign → id = Expr ;

This is ambiguous! (dangling else)

Example 2: Expression Grammar (Unambiguous)

Grammar:

Expr → Expr + Term | Expr - Term | Term
Term → Term * Factor | Term / Factor | Factor
Factor → ( Expr ) | id | number

Features:

  • ✓ Precedence: * / before + -
  • ✓ Left associative: 1+2+3 = (1+2)+3
  • ✓ Unambiguous

Example 3: JSON Structure

Grammar:

Value → Object | Array | String | Number | true | false | null
Object → { } | { Members }
Members → Pair | Pair , Members
Pair → String : Value
Array → [ ] | [ Elements ]
Elements → Value | Value , Elements

Unambiguous!

Applications

1. Compiler Parsing

Parse tree is used for:

  • Syntax checking
  • Semantic analysis
  • Code generation

Must be unambiguous for correct compilation!

2. Natural Language Processing

Parse trees show sentence structure:

Sentence → NP VP
NP → Det Noun
VP → Verb NP

"The cat chased the mouse"

      Sentence
      /      \
     NP       VP
    / \      /  \
  Det Noun Verb  NP
   |   |    |   / \
  The cat chased Det Noun
                |    |
               the  mouse

3. Mathematical Expressions

Parse tree determines evaluation order:

2 + 3 * 4

     +
    / \
   2   *
      / \
     3   4

Evaluate: 3*4 = 12, then 2+12 = 14 ✓

4. XML/HTML Parsing

Nested structure requires parse trees:

<html>
    <body>
        <h1>Title</h1>
        <p>Text</p>
    </body>
</html>

Parse tree validates nesting!

Important Exam Points

  1. Leftmost derivation: Always expand leftmost variable
  2. Rightmost derivation: Always expand rightmost variable
  3. Parse tree: Graphical representation of derivation
  4. Yield: Leaves of parse tree (left-to-right) give derived string
  5. Ambiguous grammar: Multiple parse trees for same string
  6. Ambiguity undecidable: No algorithm to detect in general
  7. Remove ambiguity: Use precedence, associativity, restructuring
  8. Inherently ambiguous: Some languages have no unambiguous grammar
  9. Applications: Compilers, parsers, syntax analysis
  10. One tree, many derivations: Same parse tree, different orders

Common Mistakes to Avoid

Mistake 1: Thinking ambiguity is always fixable
Correct: Some languages are inherently ambiguous

Mistake 2: Confusing derivation order with parse tree
Correct: Same parse tree can have multiple derivation orders

Mistake 3: Not expanding leftmost/rightmost variable
Correct: Follow the rule strictly in leftmost/rightmost derivation

Mistake 4: Assuming all grammars are unambiguous
Correct: Many practical grammars are ambiguous (need fixing)

Mistake 5: Reading parse tree leaves randomly
Correct: Always read left-to-right for yield

Mistake 6: Thinking precedence and associativity are same
Correct: Precedence (which first), associativity (which direction)

Practice Questions

Q1: What’s the difference between leftmost and rightmost derivation?

Answer: Leftmost expands leftmost variable first, rightmost expands rightmost variable first.

Q2: Can one parse tree have multiple derivations?

Answer: Yes! Same parse tree can have leftmost, rightmost, and arbitrary derivations.

Q3: How to detect ambiguity?

Answer: Find a string with multiple parse trees (or multiple leftmost/rightmost derivations).

Q4: Is E → E + E | E * E | id ambiguous?

Answer: Yes! String “id + id * id” has multiple parse trees.

Q5: What’s the dangling else problem?

Answer: Ambiguity in “if E1 then if E2 then S1 else S2” - which if does else match?

Q6: Can all ambiguous grammars be rewritten?

Answer: No! Some languages are inherently ambiguous.

Q7: What’s the yield of a parse tree?

Answer: The string obtained by reading leaf nodes left-to-right.

Q8: Why is ambiguity bad for compilers?

Answer: Multiple parse trees = multiple meanings = unpredictable behavior.

Summary

  • Derivations: Process of generating strings using grammar rules
  • Leftmost/Rightmost: Order of variable expansion (canonical forms)
  • Parse trees: Graphical representation showing structure
  • Yield: Reading leaves gives derived string
  • Ambiguity: Multiple parse trees for same string
  • Problem: Undecidable in general, causes unpredictable behavior
  • Solutions: Introduce precedence, enforce associativity, restructure grammar
  • Inherent ambiguity: Some languages have no unambiguous grammar
  • Applications: Compilers (syntax analysis), NLP, expression evaluation
  • Key insight: Parse tree structure determines meaning

Understanding derivations, parse trees, and ambiguity is crucial for parser design and compiler construction!