What is a Language?
A language is a set of strings over an alphabet. While a string is a single sequence, a language is a collection of strings. Languages can be finite (containing a specific number of strings) or infinite (containing unlimited strings).
Formal Definition
A language L over an alphabet Σ is a subset of Σ*.
- L ⊆ Σ*
- L can be finite or infinite
- L can even be empty
Simple Analogy
- Alphabet Σ = Letters (a-z)
- Strings = All possible sequences of letters
- Language = Valid English words (a specific collection)
Just like “cat”, “dog”, “hello” are valid English words but “xyz”, “zzz” might not be, a formal language defines which strings are “valid” or “accepted”.
Examples of Languages
Example 1: Finite Languages
Language 1: L₁ = {0, 1, 01, 10}
Alphabet: Σ = {0, 1}
Size: |L₁| = 4 (contains exactly 4 strings)
Language 2: L₂ = {ε, a, aa, aaa}
Alphabet: Σ = {a, b}
Size: |L₂| = 4
Language 3: L₃ = {cat, dog, bird}
Alphabet: Σ = {a, b, c, …, z}
Size: |L₃| = 3
Example 2: Infinite Languages
Language 1: L₄ = {0, 00, 000, 0000, …}
Description: All strings of 0’s
Formal Notation: L₄ = {0ⁿ | n ≥ 1}
Language 2: L₅ = {ε, 01, 0011, 000111, …}
Description: Strings with equal number of 0’s followed by equal number of 1’s
Formal Notation: L₅ = {0ⁿ1ⁿ | n ≥ 0}
Language 3: L₆ = {all binary numbers divisible by 3}
Examples: {0, 11, 110, 1001, 1100, …}
Example 3: Special Languages
Empty Language: L₇ = ∅ = { }
Description: Contains no strings at all
Note: ∅ ≠ {ε}
Language with Empty String: L₈ = {ε}
Description: Contains only the empty string
Size: |L₈| = 1
Universal Language: L₉ = Σ*
Description: Contains all possible strings
Size: Infinite
Types of Languages
1. Finite Languages
Languages with a finite (countable) number of strings.
Examples:
L = {a, ab, abc}
L = {0, 1}
L = {hello, world}
Properties:
- Can be listed completely
- Has a definite size |L|
- Easy to define explicitly
2. Infinite Languages
Languages with infinite number of strings.
Examples:
L = {aⁿ | n ≥ 0} = {ε, a, aa, aaa, aaaa, ...}
L = {0ⁿ1ⁿ | n ≥ 0} = {ε, 01, 0011, 000111, ...}
L = Σ* = all possible strings
Properties:
- Cannot be listed completely
- Must be defined using a rule or pattern
- Most interesting languages in TOC are infinite
Ways to Define Languages
Method 1: Enumeration (Listing)
For finite languages, simply list all strings.
Example:
L = {0, 1, 00, 11, 000}
Limitation: Only works for finite languages
Method 2: Set Builder Notation
Define a rule that describes which strings belong to the language.
Example 1:
L = {w ∈ {0,1}* | w starts with 0}
Examples: 0, 01, 00, 011, 0001, ...
Example 2:
L = {w ∈ {a,b}* | |w| is even}
Examples: ε, aa, ab, ba, bb, aaa, ...
Example 3:
L = {aⁿbⁿ | n ≥ 0}
Examples: ε, ab, aabb, aaabbb, ...
Method 3: Using Automata or Grammars
Define machines (like DFA, NFA) or grammars (like CFG) that generate or accept the language.
Example: A DFA that accepts all strings ending in ‘01’
We’ll study this in detail later.
Language Operations
Languages can be combined using various operations:
1. Union (L₁ ∪ L₂)
All strings that are in L₁ OR in L₂.
Example:
L₁ = {0, 00}
L₂ = {1, 11}
L₁ ∪ L₂ = {0, 00, 1, 11}
2. Concatenation (L₁ · L₂)
All strings formed by concatenating a string from L₁ with a string from L₂.
Example:
L₁ = {a, b}
L₂ = {0, 1}
L₁ · L₂ = {a0, a1, b0, b1}
3. Kleene Star (L*)
All possible concatenations of strings from L, including ε.
Example:
L = {a}
L* = {ε, a, aa, aaa, aaaa, ...}
L = {0, 1}
L* = {ε, 0, 1, 00, 01, 10, 11, 000, ...} = Σ*
4. Complement (L̄)
All strings in Σ* that are NOT in L.
Example:
Σ = {0, 1}
L = {0, 1}
L̄ = Σ* - L = {ε, 00, 01, 10, 11, 000, ...}
5. Intersection (L₁ ∩ L₂)
All strings that are in BOTH L₁ AND L₂.
Example:
L₁ = {0, 01, 10}
L₂ = {01, 10, 11}
L₁ ∩ L₂ = {01, 10}
We’ll study these operations in detail in the next section.
Special Languages in TOC
1. Regular Languages
Languages that can be recognized by finite automata.
Example: L = {w | w ends with ‘01’}
2. Context-Free Languages
Languages that can be recognized by pushdown automata.
Example: L = {0ⁿ1ⁿ | n ≥ 0}
3. Recursively Enumerable Languages
Languages that can be recognized by Turing machines.
Example: All computable languages
We’ll study these classifications throughout the course.
Empty Language vs Language with Empty String
This is a very important distinction that often confuses students:
| Aspect | Empty Language ∅ | Language with ε |
|---|---|---|
| Notation | ∅ or { } | {ε} |
| Size | |∅| = 0 | |{ε}| = 1 |
| Contains ε? | No | Yes |
| Examples | No strings | One string (empty) |
Key Point:
- ∅ is a language with no strings
- {ε} is a language with one string (which happens to be empty)
Example:
L₁ = ∅
ε ∈ L₁? NO
|L₁| = 0
L₂ = {ε}
ε ∈ L₂? YES
|L₂| = 1
Language Recognition
A language is recognized by a computational model if the model accepts exactly the strings in that language.
Example:
Language: L = {w | w starts with 0}
Recognition: Build a DFA that accepts any string starting with 0
Strings accepted: 0, 01, 00, 011, 0101, ...
Strings rejected: 1, 10, 11, 101, ...
Decidable vs Undecidable Languages
Decidable Language
A language for which we can build an algorithm that always terminates and correctly decides membership.
Example: L = {even numbers in binary}
Undecidable Language
A language for which no such algorithm exists.
Example: Halting problem (we’ll study this later)
Practical Applications
1. Programming Languages
Alphabet: {keywords, identifiers, operators, ...}
Language: All valid programs in Python/Java/C
2. Regular Expressions
Regex: [a-z]+@[a-z]+\.com
Language: Valid email addresses
3. Compilers
Language: All syntactically correct programs
Recognition: Compiler checks if source code is valid
4. Network Protocols
Language: Valid sequences of network packets
Recognition: Protocol validator
Important Exam Points
- ✓ A language is a set of strings over an alphabet
- ✓ L ⊆ Σ* (language is subset of all possible strings)
- ✓ Languages can be finite or infinite
- ✓ ∅ (empty language) ≠ {ε} (language with empty string)
- ✓ |∅| = 0, but |{ε}| = 1
- ✓ Σ* is the universal language (all possible strings)
- ✓ Infinite languages need a rule or pattern to define them
- ✓ Set builder notation: L = {w | condition on w}
Common Mistakes to Avoid
❌ Mistake 1: Confusing ∅ with {ε}
✓ Correct: ∅ has 0 strings, {ε} has 1 string
❌ Mistake 2: Thinking all languages are finite
✓ Correct: Most interesting languages are infinite
❌ Mistake 3: Writing L = {0ⁿ1ⁿ} without specifying n
✓ Correct: L = {0ⁿ1ⁿ | n ≥ 0}
❌ Mistake 4: Confusing language with alphabet
✓ Correct: Alphabet is Σ, Language is L ⊆ Σ*
Practice Questions
Q1: Define the language of all binary strings with exactly two 1’s.
Answer: L = {w ∈ {0,1}* | w contains exactly two 1’s}
Examples: 11, 011, 101, 110, 0011, 0101, 0110, 1001, …
Q2: Is ε in the language L = {0ⁿ | n ≥ 1}?
Answer: No, because n ≥ 1, so minimum string is “0”
Q3: What is |L| if L = ∅?
Answer: |L| = 0
Q4: Give an example of an infinite language over Σ = {a}.
Answer: L = {aⁿ | n ≥ 0} = {ε, a, aa, aaa, aaaa, …}
Q5: Define the language of all strings that start and end with the same symbol over Σ = {0, 1}.
Answer: L = {w ∈ {0,1}* | first symbol = last symbol}
Examples: ε, 0, 1, 00, 11, 000, 010, 101, 111, …
Summary
- A language is a set of strings: L ⊆ Σ*
- Languages can be finite or infinite
- Empty language ∅ contains no strings
- {ε} is a language containing one string (empty string)
- Languages are defined by:
- Enumeration (for finite)
- Set builder notation (for infinite)
- Automata/Grammars (computational models)
- Languages can be combined using operations (union, concatenation, etc.)
- Different classes: Regular, Context-Free, Recursive, etc.
Languages are the central objects we study in Theory of Computation. Every computational problem can be viewed as a language recognition problem!