Languages

What is a Language?

A language is a set of strings over an alphabet. While a string is a single sequence, a language is a collection of strings. Languages can be finite (containing a specific number of strings) or infinite (containing unlimited strings).

Formal Definition

A language L over an alphabet Σ is a subset of Σ*.

  • L ⊆ Σ*
  • L can be finite or infinite
  • L can even be empty

Simple Analogy

  • Alphabet Σ = Letters (a-z)
  • Strings = All possible sequences of letters
  • Language = Valid English words (a specific collection)

Just like “cat”, “dog”, “hello” are valid English words but “xyz”, “zzz” might not be, a formal language defines which strings are “valid” or “accepted”.

Examples of Languages

Example 1: Finite Languages

Language 1: L₁ = {0, 1, 01, 10}
Alphabet: Σ = {0, 1}
Size: |L₁| = 4 (contains exactly 4 strings)

Language 2: L₂ = {ε, a, aa, aaa}
Alphabet: Σ = {a, b}
Size: |L₂| = 4

Language 3: L₃ = {cat, dog, bird}
Alphabet: Σ = {a, b, c, …, z}
Size: |L₃| = 3

Example 2: Infinite Languages

Language 1: L₄ = {0, 00, 000, 0000, …}
Description: All strings of 0’s
Formal Notation: L₄ = {0ⁿ | n ≥ 1}

Language 2: L₅ = {ε, 01, 0011, 000111, …}
Description: Strings with equal number of 0’s followed by equal number of 1’s
Formal Notation: L₅ = {0ⁿ1ⁿ | n ≥ 0}

Language 3: L₆ = {all binary numbers divisible by 3}
Examples: {0, 11, 110, 1001, 1100, …}

Example 3: Special Languages

Empty Language: L₇ = ∅ = { }
Description: Contains no strings at all
Note: ∅ ≠ {ε}

Language with Empty String: L₈ = {ε}
Description: Contains only the empty string
Size: |L₈| = 1

Universal Language: L₉ = Σ*
Description: Contains all possible strings
Size: Infinite

Types of Languages

1. Finite Languages

Languages with a finite (countable) number of strings.

Examples:

L = {a, ab, abc}
L = {0, 1}
L = {hello, world}

Properties:

  • Can be listed completely
  • Has a definite size |L|
  • Easy to define explicitly

2. Infinite Languages

Languages with infinite number of strings.

Examples:

L = {aⁿ | n ≥ 0} = {ε, a, aa, aaa, aaaa, ...}
L = {0ⁿ1ⁿ | n ≥ 0} = {ε, 01, 0011, 000111, ...}
L = Σ* = all possible strings

Properties:

  • Cannot be listed completely
  • Must be defined using a rule or pattern
  • Most interesting languages in TOC are infinite

Ways to Define Languages

Method 1: Enumeration (Listing)

For finite languages, simply list all strings.

Example:

L = {0, 1, 00, 11, 000}

Limitation: Only works for finite languages

Method 2: Set Builder Notation

Define a rule that describes which strings belong to the language.

Example 1:

L = {w ∈ {0,1}* | w starts with 0}
Examples: 0, 01, 00, 011, 0001, ...

Example 2:

L = {w ∈ {a,b}* | |w| is even}
Examples: ε, aa, ab, ba, bb, aaa, ...

Example 3:

L = {aⁿbⁿ | n ≥ 0}
Examples: ε, ab, aabb, aaabbb, ...

Method 3: Using Automata or Grammars

Define machines (like DFA, NFA) or grammars (like CFG) that generate or accept the language.

Example: A DFA that accepts all strings ending in ‘01’

We’ll study this in detail later.

Language Operations

Languages can be combined using various operations:

1. Union (L₁ ∪ L₂)

All strings that are in L₁ OR in L₂.

Example:

L₁ = {0, 00}
L₂ = {1, 11}
L₁ ∪ L₂ = {0, 00, 1, 11}

2. Concatenation (L₁ · L₂)

All strings formed by concatenating a string from L₁ with a string from L₂.

Example:

L₁ = {a, b}
L₂ = {0, 1}
L₁ · L₂ = {a0, a1, b0, b1}

3. Kleene Star (L*)

All possible concatenations of strings from L, including ε.

Example:

L = {a}
L* = {ε, a, aa, aaa, aaaa, ...}

L = {0, 1}
L* = {ε, 0, 1, 00, 01, 10, 11, 000, ...} = Σ*

4. Complement (L̄)

All strings in Σ* that are NOT in L.

Example:

Σ = {0, 1}
L = {0, 1}
L̄ = Σ* - L = {ε, 00, 01, 10, 11, 000, ...}

5. Intersection (L₁ ∩ L₂)

All strings that are in BOTH L₁ AND L₂.

Example:

L₁ = {0, 01, 10}
L₂ = {01, 10, 11}
L₁ ∩ L₂ = {01, 10}

We’ll study these operations in detail in the next section.

Special Languages in TOC

1. Regular Languages

Languages that can be recognized by finite automata.

Example: L = {w | w ends with ‘01’}

2. Context-Free Languages

Languages that can be recognized by pushdown automata.

Example: L = {0ⁿ1ⁿ | n ≥ 0}

3. Recursively Enumerable Languages

Languages that can be recognized by Turing machines.

Example: All computable languages

We’ll study these classifications throughout the course.

Empty Language vs Language with Empty String

This is a very important distinction that often confuses students:

AspectEmpty Language ∅Language with ε
Notation∅ or { }{ε}
Size|∅| = 0|{ε}| = 1
Contains ε?NoYes
ExamplesNo stringsOne string (empty)

Key Point:

  • ∅ is a language with no strings
  • {ε} is a language with one string (which happens to be empty)

Example:

L₁ = ∅
ε ∈ L₁? NO
|L₁| = 0

L₂ = {ε}
ε ∈ L₂? YES
|L₂| = 1

Language Recognition

A language is recognized by a computational model if the model accepts exactly the strings in that language.

Example:

Language: L = {w | w starts with 0}
Recognition: Build a DFA that accepts any string starting with 0
Strings accepted: 0, 01, 00, 011, 0101, ...
Strings rejected: 1, 10, 11, 101, ...

Decidable vs Undecidable Languages

Decidable Language

A language for which we can build an algorithm that always terminates and correctly decides membership.

Example: L = {even numbers in binary}

Undecidable Language

A language for which no such algorithm exists.

Example: Halting problem (we’ll study this later)

Practical Applications

1. Programming Languages

Alphabet: {keywords, identifiers, operators, ...}
Language: All valid programs in Python/Java/C

2. Regular Expressions

Regex: [a-z]+@[a-z]+\.com
Language: Valid email addresses

3. Compilers

Language: All syntactically correct programs
Recognition: Compiler checks if source code is valid

4. Network Protocols

Language: Valid sequences of network packets
Recognition: Protocol validator

Important Exam Points

  1. ✓ A language is a set of strings over an alphabet
  2. ✓ L ⊆ Σ* (language is subset of all possible strings)
  3. ✓ Languages can be finite or infinite
  4. ✓ ∅ (empty language) ≠ {ε} (language with empty string)
  5. ✓ |∅| = 0, but |{ε}| = 1
  6. ✓ Σ* is the universal language (all possible strings)
  7. ✓ Infinite languages need a rule or pattern to define them
  8. ✓ Set builder notation: L = {w | condition on w}

Common Mistakes to Avoid

Mistake 1: Confusing ∅ with {ε}
Correct: ∅ has 0 strings, {ε} has 1 string

Mistake 2: Thinking all languages are finite
Correct: Most interesting languages are infinite

Mistake 3: Writing L = {0ⁿ1ⁿ} without specifying n
Correct: L = {0ⁿ1ⁿ | n ≥ 0}

Mistake 4: Confusing language with alphabet
Correct: Alphabet is Σ, Language is L ⊆ Σ*

Practice Questions

Q1: Define the language of all binary strings with exactly two 1’s.

Answer: L = {w ∈ {0,1}* | w contains exactly two 1’s}
Examples: 11, 011, 101, 110, 0011, 0101, 0110, 1001, …

Q2: Is ε in the language L = {0ⁿ | n ≥ 1}?

Answer: No, because n ≥ 1, so minimum string is “0”

Q3: What is |L| if L = ∅?

Answer: |L| = 0

Q4: Give an example of an infinite language over Σ = {a}.

Answer: L = {aⁿ | n ≥ 0} = {ε, a, aa, aaa, aaaa, …}

Q5: Define the language of all strings that start and end with the same symbol over Σ = {0, 1}.

Answer: L = {w ∈ {0,1}* | first symbol = last symbol}
Examples: ε, 0, 1, 00, 11, 000, 010, 101, 111, …

Summary

  • A language is a set of strings: L ⊆ Σ*
  • Languages can be finite or infinite
  • Empty language ∅ contains no strings
  • {ε} is a language containing one string (empty string)
  • Languages are defined by:
    • Enumeration (for finite)
    • Set builder notation (for infinite)
    • Automata/Grammars (computational models)
  • Languages can be combined using operations (union, concatenation, etc.)
  • Different classes: Regular, Context-Free, Recursive, etc.

Languages are the central objects we study in Theory of Computation. Every computational problem can be viewed as a language recognition problem!