What is a String?
A string (also called a word) is a finite sequence of symbols chosen from an alphabet. If an alphabet provides us with the building blocks, strings are what we build with them.
Formal Definition
A string w over an alphabet Σ is a finite sequence of symbols from Σ.
- w = a₁a₂a₃…aₙ where each aᵢ ∈ Σ
- The length of w is n, denoted as |w| = n
Simple Analogy
Think of it this way:
- Alphabet = Letters of the English language
- String = Words formed using those letters
- Just like “cat”, “dog”, “hello” are words formed from English letters, in Theory of Computation, strings are sequences formed from an alphabet.
Examples of Strings
Example 1: Binary Strings
Alphabet: Σ = {0, 1}
Valid Strings:
- w₁ = 0
- w₂ = 1
- w₃ = 01
- w₄ = 10
- w₅ = 101
- w₆ = 111000
- w₇ = 010101
Example 2: English Letter Strings
Alphabet: Σ = {a, b, c}
Valid Strings:
- w₁ = a
- w₂ = ab
- w₃ = abc
- w₄ = aabbcc
- w₅ = cabba
- w₆ = bcabca
Example 3: Decimal Strings
Alphabet: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Valid Strings:
- w₁ = 123
- w₂ = 456789
- w₃ = 0000
- w₄ = 42
The Empty String (ε or λ)
The empty string is a special string that contains no symbols. It is denoted by:
- ε (epsilon) - most common notation
- λ (lambda) - alternative notation
Properties of Empty String
- Length: |ε| = 0
- Universal: Exists over any alphabet
- Unique: There is only one empty string
- Identity: For any string w, w·ε = ε·w = w (concatenation identity)
Example:
Σ = {0, 1}
ε = "" (no symbols)
|ε| = 0
Important: Empty string (ε) ≠ Empty set (∅)
- ε is a string with zero symbols
- ∅ is a set with zero elements
Length of a String
The length of a string is the number of symbols it contains, denoted by |w|.
Examples:
| String | Alphabet | Length |
|---|---|---|
| w = “010” | {0, 1} | |w| = 3 |
| w = “aaaa” | {a, b} | |w| = 4 |
| w = “abc” | {a, b, c} | |w| = 3 |
| w = ε | any | |w| = 0 |
| w = “1” | {0, 1} | |w| = 1 |
Important:
- Repeated symbols count: “aaa” has length 3, not 1
- Empty string: Always has length 0
- Length ≥ 0: Length cannot be negative
Operations on Strings
1. Concatenation
Concatenation is joining two strings together, denoted by · (dot) or simply placing them side by side.
Definition: If w = a₁a₂…aₘ and x = b₁b₂…bₙ, then
w·x = a₁a₂…aₘb₁b₂…bₙ
Examples:
w = "abc", x = "def"
w·x = "abcdef"
w = "01", x = "10"
w·x = "0110"
w = "hello", x = ε
w·x = "hello" (ε is identity)
Properties:
- Associative: (w·x)·y = w·(x·y)
- NOT Commutative: w·x ≠ x·w (usually)
- Identity: w·ε = ε·w = w
- Length: |w·x| = |w| + |x|
2. Reversal (wᴿ)
The reversal of a string w, denoted wᴿ, is the string written backwards.
Examples:
w = "abc" → wᴿ = "cba"
w = "0110" → wᴿ = "0110" (palindrome)
w = "hello" → wᴿ = "olleh"
w = "a" → wᴿ = "a"
ε ᴿ = ε
Properties:
- (wᴿ)ᴿ = w (reversing twice gives original)
- |wᴿ| = |w|
- (w·x)ᴿ = xᴿ·wᴿ
3. Power of a String (wⁿ)
The power of a string means concatenating it with itself n times.
Definition:
- w⁰ = ε (by convention)
- w¹ = w
- wⁿ = w·w·w·…·w (n times)
Examples:
w = "ab"
w⁰ = ε
w¹ = "ab"
w² = "abab"
w³ = "ababab"
w⁴ = "abababab"
w = "0"
w⁵ = "00000"
4. Substring
A string v is a substring of w if v appears consecutively within w.
Example:
w = "computer"
Substrings: "com", "put", "ter", "compu", "uter", "c", "e", ε, "computer"
Not substrings: "cpt", "mpu" (not consecutive)
5. Prefix and Suffix
Prefix: A substring that starts from the beginning Suffix: A substring that ends at the end
Example:
w = "testing"
Prefixes: ε, "t", "te", "tes", "test", "testi", "testin", "testing"
Suffixes: ε, "g", "ng", "ing", "ting", "sting", "esting", "testing"
Note: Every string is both a prefix and suffix of itself. ε is a prefix and suffix of every string.
Special String Sets
Σ* (Sigma Star)
Σ* is the set of all possible strings over alphabet Σ, including the empty string.
Example:
Σ = {0, 1}
Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, ...}
Properties:
- Σ* is infinite (even if Σ is finite)
- ε ∈ Σ* (always includes empty string)
- Contains strings of all possible lengths
Σ⁺ (Sigma Plus)
Σ⁺ is the set of all non-empty strings over alphabet Σ.
Example:
Σ = {0, 1}
Σ⁺ = {0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, ...}
Properties:
- Σ⁺ = Σ* - {ε}
- ε ∉ Σ⁺ (does NOT include empty string)
- Σ⁺ is also infinite
Relationship:
Σ* = Σ⁺ ∪ {ε}
Strings vs Symbols
| Aspect | Symbol | String |
|---|---|---|
| Definition | Single element from Σ | Sequence of symbols |
| Example | a | ”aaa” |
| Notation | a, b, c | w, x, y |
| Length | Always 1 | Can be 0, 1, 2, … |
| Empty | N/A | ε exists |
Important: A single symbol can also be viewed as a string of length 1.
Practical Applications
1. Programming
string = "hello" # String of length 5
alphabet = {'a', 'b', ..., 'z'} # Alphabet used
2. Pattern Matching
Text: "The quick brown fox"
Pattern: "quick" (substring to find)
3. DNA Analysis
Alphabet: {A, C, G, T}
DNA sequence: "ACGTACGTAACC" (string)
4. Binary Data
Alphabet: {0, 1}
Byte: "01100001" (string of length 8)
Important Exam Points
- ✓ A string is a finite sequence of symbols from an alphabet
- ✓ Length of a string is denoted by |w|
- ✓ Empty string (ε) has length 0
- ✓ ε is different from ∅ (empty set)
- ✓ Σ* includes ε, but Σ⁺ does not
- ✓ Concatenation: w·x simply places x after w
- ✓ |w·x| = |w| + |x|
- ✓ wⁿ means w repeated n times
Common Mistakes to Avoid
❌ Mistake 1: Confusing ε with 0 or space
✓ Correct: ε is completely empty, not “0” or ” ”
❌ Mistake 2: Thinking |“aaa”| = 1
✓ Correct: |“aaa”| = 3 (count all symbols)
❌ Mistake 3: Assuming w·x = x·w
✓ Correct: Concatenation is NOT commutative
❌ Mistake 4: Forgetting ε ∈ Σ*
✓ Correct: Σ* always includes the empty string
Practice Questions
Q1: Given w = “abc” and x = “de”, find:
- a) w·x = ?
- b) |w·x| = ?
- c) (w·x)ᴿ = ?
Answers:
- a) w·x = “abcde”
- b) |w·x| = 5
- c) (w·x)ᴿ = “edcba”
Q2: If w = “01”, find w³.
Answer: w³ = “010101”
Q3: Is “ab” a substring of “abcabc”?
Answer: Yes, it appears at positions 0 and 3.
Q4: What is the difference between Σ* and Σ⁺?
Answer: Σ* includes the empty string ε, while Σ⁺ does not.
Summary
- A string is a finite sequence of symbols from an alphabet
- Length |w| = number of symbols in w
- Empty string ε has length 0
- Σ* = all possible strings (including ε)
- Σ⁺ = all non-empty strings (excluding ε)
- Operations: concatenation, reversal, power, substring
- Strings form the basic data that automata and grammars process
Strings are the fundamental units of data in computation. Every input, output, program, and data file is ultimately a string!