Definition
Thrashing: OS spends more time paging (swapping pages to/from disk) than executing processes. System performance degrades severely. CPU idle while waiting for disk I/O.
Thrashing Scenario
Cascade Effect
Condition: Working set > RAM
Process runs, needs page not in memory
Page Fault! OS loads page (5ms)
For that, OS must evict another page
That evicted page immediately needed again
Another Page Fault! (5ms)
Cycle continues...
Result:
Time spent: 99% paging
Time spent: 1% computing
System paralyzed!
Performance Degradation
Normal operation:
- Paging: 1% of time
- Computing: 99% of time
- System responsive
Thrashing:
- Paging: 99% of time
- Computing: 1% of time
- System appears frozen
CPU Utilization:
Normal: 90% (doing useful work)
Thrashing: 5% (mostly waiting for disk)
Visible to User
Task Manager / Activity Monitor shows:
CPU: 50-90% (appears busy)
Disk: 100% (constant disk activity)
Application: Frozen (not responding)
Memory working set exceeds available RAM
Causes of Thrashing
1. Insufficient RAM
System: 4GB RAM
Running programs need: 8GB active working set
Result: Thrashing
Solution: Close programs or add RAM
2. Poor Page Replacement
Using FIFO (poor algorithm):
Instead of keeping working set
Evicts frequently-used pages
Causes page faults on next access
Thrashing!
Solution: Use LRU or better algorithm
3. Large Working Set
Program with large, random access pattern:
int data[1000000]; // 4MB
for (i in random_order) {
access(data[i])
}
Every access might page fault
Poor locality
Thrashing!
4. Many Processes Fighting for Memory
System running:
Process A: 512MB active
Process B: 512MB active
Process C: 512MB active
Process D: 512MB active
Total: 2GB needed
RAM: 512MB only
(Each process gets 128MB)
All thrashing!
Thrashing Detection
CPU Utilization Monitoring
If CPU low despite disk activity:
CPU: 10% (idle, waiting)
Disk: 100% (paging)
→ Thrashing detected!
Normal operation:
CPU: 80-90% (productive)
Disk: 5-10% (occasional I/O)
Page Fault Rate Monitoring
Count page faults per second:
Normal: 10-100 faults/sec
Acceptable: CPU still 70%+ utilized
High: 10,000+ faults/sec
Result: CPU < 20% utilized
→ Thrashing!
Prevention Strategies
Strategy 1: Working Set Model
Concept
Keep actively-used working set in RAM:
Working Set (τ): Pages accessed in last τ memory references
Example: τ = 10,000 references
If page accessed within last 10,000 references:
In working set → Keep in memory
If not accessed:
Not in working set → Can evict
Track working set for each process
Allocate RAM to fit all active working sets
Implementation
For each process:
1. Identify working set size (WS)
2. Determine required frames
3. Ensure: Sum of WS ≤ RAM available
If not possible:
Suspend a process (free its memory)
Resume when space available
Algorithm:
while (sum of working sets > RAM):
Suspend one process (lowest priority)
Free its frames
Remaining processes run without thrashing
Benefits
✓ No thrashing (all working sets fit)
✓ Only active processes run
✓ Good performance
Overhead:
- Must track working set size
- Need algorithm to estimate
- Periodically suspend/resume
Strategy 2: Page Replacement Algorithm
Use efficient algorithm (LRU):
FIFO: Keep frequently-used pages? Maybe not
LRU: Keep recently-used pages
Result: Working set tends to stay in memory
Natural resistance to thrashing
Strategy 3: Program Optimization
Improve locality:
Before (poor locality):
for (i = 0 to 10) {
for (j in random_order) {
process(array[j])
}
}
After (good locality):
for (i = 0 to 10) {
for (j = 0 to n) {
process(array[j]) // Sequential
}
}
Fewer page faults
No thrashing!
Strategy 4: Sufficient Memory
Add RAM:
Working set: 4GB
RAM: 2GB
→ Thrashing!
Add RAM: 8GB
Now: Working set < RAM
→ No thrashing!
Direct solution but expensive
Strategy 5: Rate Limiting
Prevent excessive paging:
If page fault rate > threshold:
Suspend process
Free its memory
Resume others
Prevents complete system freeze
Ensures other processes run
Real-World Thrashing Examples
Example 1: Large Compilation
Compiler with 1GB code
System RAM: 512MB
First run:
Compiler loaded, starts compiling
Accesses different parts of code
Page faults for each new section
Thrashing! Compiles slowly
Adding RAM:
2GB RAM installed
Now compiler fits better
Fewer faults
Compiles 10x faster!
Example 2: Virtualization
Virtual machine with 8GB allocated
Host system RAM: 4GB
VM accesses 4GB at once
VM memory pressures host
Both host and VM thrash
System unresponsive
Solution:
Allocate less to VM (2GB)
OR
Add host RAM
Example 3: Too Many Browser Tabs
User opens 50 browser tabs:
Each tab: 50-100MB memory
Total: 2.5-5GB
System RAM: 4GB
Browser thrashes
System unresponsive
Close tabs: Performance returns
Recovery from Thrashing
Short Term
Kill one or more processes:
Free significant RAM
Remaining processes no longer thrash
System responsive again
User impact: Lost process data (bad!)
Medium Term
Implement working set model
Suspend low-priority processes
Keep system runnable without thrashing
Long Term
Add more RAM (best solution)
Upgrade hardware
Problem solved
Measuring Thrashing
Formulas
Page Fault Rate: $$\text{Fault Rate} = \frac{\text{Page Faults}}{\text{Memory References}}$$
Normal: 0.001 (0.1%)
Thrashing: 0.1-0.5 (10-50%!)
Service Time Impact: $$\text{Effective Access Time} = \text{RAM access time} + \text{Fault Rate} \times \text{Fault Service Time}$$
Example:
RAM: 100ns
Fault service: 5ms
Fault rate: 1%
Effective: 100ns + 0.01×5,000,000ns = 50,100ns (501x slower!)
Fault rate: 50%
Effective: 100ns + 0.5×5,000,000ns = 2,500,050ns (25,000x slower!)
Avoiding Thrashing
As User
✓ Close unnecessary applications
✓ Close tabs/files not needed
✓ Monitor memory usage
✓ Restart when system slow
✓ Upgrade RAM if needed
As Developer
✓ Optimize for locality (sequential access)
✓ Use appropriate data structures
✓ Profile memory usage
✓ Avoid large random-access arrays
✓ Consider streaming for large data
As System Administrator
✓ Monitor page fault rates
✓ Implement admission control
✓ Size system appropriately
✓ Use working set model
✓ Choose good page replacement algorithm
Modern Perspective
How Thrashing Reduced
1. More RAM
- Past: 64MB RAM (thrashed easily)
- Now: 4-16GB RAM (rare thrashing)
2. SSDs
- Page fault slower but faster than HDD
- Reduces severity of thrashing
- Still want to avoid
3. Better Algorithms
- LRU approximation
- Compression instead of paging (Linux, iOS)
4. Swap Control
- Modern Linux: Can disable swap
- No swap = no thrashing (but OOM kills processes)
Summary
Thrashing occurs when working set exceeds RAM, causing excessive paging and system paralysis. CPU idle while disk busy. Detected by low CPU but high disk activity. Prevention: working set model (suspend low-priority), sufficient RAM, good page replacement, program optimization. Recovery: kill processes, add RAM. Modern systems rarely thrash due to larger RAM and compression. Understanding critical for system performance management.