Thrashing

Definition

Thrashing: OS spends more time paging (swapping pages to/from disk) than executing processes. System performance degrades severely. CPU idle while waiting for disk I/O.

Thrashing Scenario

Cascade Effect

Condition: Working set > RAM

Process runs, needs page not in memory
Page Fault! OS loads page (5ms)
For that, OS must evict another page
That evicted page immediately needed again
Another Page Fault! (5ms)
Cycle continues...

Result:
Time spent: 99% paging
Time spent: 1% computing
System paralyzed!

Performance Degradation

Normal operation:
- Paging: 1% of time
- Computing: 99% of time
- System responsive

Thrashing:
- Paging: 99% of time
- Computing: 1% of time
- System appears frozen

CPU Utilization:
Normal: 90% (doing useful work)
Thrashing: 5% (mostly waiting for disk)

Visible to User

Task Manager / Activity Monitor shows:
CPU: 50-90% (appears busy)
Disk: 100% (constant disk activity)
Application: Frozen (not responding)

Memory working set exceeds available RAM

Causes of Thrashing

1. Insufficient RAM

System: 4GB RAM
Running programs need: 8GB active working set
Result: Thrashing

Solution: Close programs or add RAM

2. Poor Page Replacement

Using FIFO (poor algorithm):
Instead of keeping working set
Evicts frequently-used pages
Causes page faults on next access
Thrashing!

Solution: Use LRU or better algorithm

3. Large Working Set

Program with large, random access pattern:
int data[1000000];  // 4MB

for (i in random_order) {
    access(data[i])
}

Every access might page fault
Poor locality
Thrashing!

4. Many Processes Fighting for Memory

System running:
Process A: 512MB active
Process B: 512MB active
Process C: 512MB active
Process D: 512MB active
Total: 2GB needed

RAM: 512MB only
(Each process gets 128MB)

All thrashing!

Thrashing Detection

CPU Utilization Monitoring

If CPU low despite disk activity:
CPU: 10% (idle, waiting)
Disk: 100% (paging)
→ Thrashing detected!

Normal operation:
CPU: 80-90% (productive)
Disk: 5-10% (occasional I/O)

Page Fault Rate Monitoring

Count page faults per second:

Normal: 10-100 faults/sec
Acceptable: CPU still 70%+ utilized

High: 10,000+ faults/sec
Result: CPU < 20% utilized
→ Thrashing!

Prevention Strategies

Strategy 1: Working Set Model

Concept

Keep actively-used working set in RAM:

Working Set (τ): Pages accessed in last τ memory references

Example: τ = 10,000 references

If page accessed within last 10,000 references:
In working set → Keep in memory

If not accessed:
Not in working set → Can evict

Track working set for each process
Allocate RAM to fit all active working sets

Implementation

For each process:
1. Identify working set size (WS)
2. Determine required frames
3. Ensure: Sum of WS ≤ RAM available

If not possible:
Suspend a process (free its memory)
Resume when space available

Algorithm:
while (sum of working sets > RAM):
    Suspend one process (lowest priority)
    Free its frames
Remaining processes run without thrashing

Benefits

✓ No thrashing (all working sets fit)
✓ Only active processes run
✓ Good performance

Overhead:
- Must track working set size
- Need algorithm to estimate
- Periodically suspend/resume

Strategy 2: Page Replacement Algorithm

Use efficient algorithm (LRU):

FIFO: Keep frequently-used pages? Maybe not
LRU: Keep recently-used pages
Result: Working set tends to stay in memory
Natural resistance to thrashing

Strategy 3: Program Optimization

Improve locality:

Before (poor locality):
for (i = 0 to 10) {
    for (j in random_order) {
        process(array[j])
    }
}

After (good locality):
for (i = 0 to 10) {
    for (j = 0 to n) {
        process(array[j])  // Sequential
    }
}

Fewer page faults
No thrashing!

Strategy 4: Sufficient Memory

Add RAM:

Working set: 4GB
RAM: 2GB
→ Thrashing!

Add RAM: 8GB
Now: Working set < RAM
→ No thrashing!

Direct solution but expensive

Strategy 5: Rate Limiting

Prevent excessive paging:

If page fault rate > threshold:
Suspend process
Free its memory
Resume others

Prevents complete system freeze
Ensures other processes run

Real-World Thrashing Examples

Example 1: Large Compilation

Compiler with 1GB code
System RAM: 512MB

First run:
Compiler loaded, starts compiling
Accesses different parts of code
Page faults for each new section
Thrashing! Compiles slowly

Adding RAM:
2GB RAM installed
Now compiler fits better
Fewer faults
Compiles 10x faster!

Example 2: Virtualization

Virtual machine with 8GB allocated
Host system RAM: 4GB
VM accesses 4GB at once

VM memory pressures host
Both host and VM thrash
System unresponsive

Solution:
Allocate less to VM (2GB)
OR
Add host RAM

Example 3: Too Many Browser Tabs

User opens 50 browser tabs:
Each tab: 50-100MB memory
Total: 2.5-5GB

System RAM: 4GB

Browser thrashes
System unresponsive
Close tabs: Performance returns

Recovery from Thrashing

Short Term

Kill one or more processes:
Free significant RAM
Remaining processes no longer thrash
System responsive again

User impact: Lost process data (bad!)

Medium Term

Implement working set model
Suspend low-priority processes
Keep system runnable without thrashing

Long Term

Add more RAM (best solution)
Upgrade hardware
Problem solved

Measuring Thrashing

Formulas

Page Fault Rate: $$\text{Fault Rate} = \frac{\text{Page Faults}}{\text{Memory References}}$$

Normal: 0.001 (0.1%)
Thrashing: 0.1-0.5 (10-50%!)

Service Time Impact: $$\text{Effective Access Time} = \text{RAM access time} + \text{Fault Rate} \times \text{Fault Service Time}$$

Example:
RAM: 100ns
Fault service: 5ms
Fault rate: 1%
Effective: 100ns + 0.01×5,000,000ns = 50,100ns (501x slower!)

Fault rate: 50%
Effective: 100ns + 0.5×5,000,000ns = 2,500,050ns (25,000x slower!)

Avoiding Thrashing

As User

✓ Close unnecessary applications
✓ Close tabs/files not needed
✓ Monitor memory usage
✓ Restart when system slow
✓ Upgrade RAM if needed

As Developer

✓ Optimize for locality (sequential access)
✓ Use appropriate data structures
✓ Profile memory usage
✓ Avoid large random-access arrays
✓ Consider streaming for large data

As System Administrator

✓ Monitor page fault rates
✓ Implement admission control
✓ Size system appropriately
✓ Use working set model
✓ Choose good page replacement algorithm

Modern Perspective

How Thrashing Reduced

1. More RAM

  • Past: 64MB RAM (thrashed easily)
  • Now: 4-16GB RAM (rare thrashing)

2. SSDs

  • Page fault slower but faster than HDD
  • Reduces severity of thrashing
  • Still want to avoid

3. Better Algorithms

  • LRU approximation
  • Compression instead of paging (Linux, iOS)

4. Swap Control

  • Modern Linux: Can disable swap
  • No swap = no thrashing (but OOM kills processes)

Summary

Thrashing occurs when working set exceeds RAM, causing excessive paging and system paralysis. CPU idle while disk busy. Detected by low CPU but high disk activity. Prevention: working set model (suspend low-priority), sufficient RAM, good page replacement, program optimization. Recovery: kill processes, add RAM. Modern systems rarely thrash due to larger RAM and compression. Understanding critical for system performance management.