Definition
Deadlock Recovery is the process of breaking a deadlock and restoring system to normal operation after deadlock is detected. Recovery strategies aim to minimize work loss while restoring system functionality. Combined with detection, creates detection-and-recovery approach.
Recovery Strategies
Strategy 1: Process Termination
Kill processes to break the cycle and free resources.
Approach 1a: Abort All Processes in Deadlock
Kill all processes involved in cycle:
Deadlock cycle: P1 → R1 → P2 → R2 → P1
Action: Kill P1 and P2 (both in cycle)
Result: Resources released, deadlock broken
Advantages:
- ✓ Simple (one action breaks all cycles)
- ✓ Guaranteed to work
- ✓ Fast recovery
Disadvantages:
- ❌ Lose ALL work done by both processes
- ❌ If processes have been running for hours, lose all progress
- ❌ User frustration (why kill my process?)
When Used:
- Last resort
- Processes haven’t done much work
- Critical to restore system
Approach 1b: Abort One Process at a Time
Iteratively kill processes until deadlock resolved:
Step 1: Detect cycle (P1, P2, P3, P4)
Step 2: Kill P1
Step 3: Check for deadlock - still exists (P2, P3, P4 in cycle)
Step 4: Kill P2
Step 5: Check for deadlock - still exists (P3, P4 in cycle)
Step 6: Kill P3
Step 7: Check for deadlock - gone!
Advantages:
- ✓ Minimize processes killed
- ✓ Keep processes that aren’t causing deadlock
Disadvantages:
- ❌ Expensive (check after each kill)
- ❌ Takes longer (multiple iterations)
When Used:
- Want to minimize work loss
- Detection overhead acceptable
Selection Criteria
If aborting, which process to abort?
Criteria (use one or combine):
-
Shortest Job Remaining Time:
- Kill process closest to completion
- Minimize wasted work
-
Least Resource Used:
- Kill process using fewest resources
- Quickly frees resources
-
Lowest Priority:
- Kill lowest priority process
- Preserve high-priority work
-
Fewest Dependencies:
- Kill process least dependent on others
- Minimize cascade effects
-
Most Likely to Restart:
- Kill process that can easily restart
- Batch jobs vs long-running analysis
Example:
Processes in deadlock cycle:
P1: CPU usage=5%, priority=LOW, dependencies=HIGH
P2: CPU usage=80%, priority=HIGH, dependencies=LOW
P3: CPU usage=20%, priority=MED, dependencies=MED
Best candidate: P1
- Low priority (important processes preserved)
- Low resource usage (little to free but OK)
- Will kill instead of P2 (high priority preserved)
Strategy 2: Resource Preemption
Take resources from processes and reallocate.
How It Works
Deadlock: P1 has R1, wants R2. P2 has R2, wants R1.
Preemption:
1. Take R2 from P2
2. Give R2 to P1
3. P1 completes, releases R2
4. P2 resumes
Result: Deadlock broken
Challenges
Problem 1: Which Process to Preempt From?
Same criteria as termination:
- Least resources invested
- Lowest priority
- Most likely to restart
Problem 2: Process State Inconsistency
When preempted mid-operation, process state uncertain:
Process doing database update:
1. Lock table
2. Read record
3. Modify record (halfway through)
← Preempted here!
4. Write modified record
5. Unlock table
If preempted at step 3:
- Table locked but modifications incomplete
- Process can't continue from step 3 (state corrupted)
- Must restart entire transaction
Solution: Checkpointing
Save process state at safe points:
Process:
1. Lock table
2. Read record
3. CHECKPOINT (save state)
4. Modify record
5. CHECKPOINT
6. Write modified record
7. Unlock table
If preempted at step 5:
- Restore to checkpoint after step 3
- Process restarts from step 4
- State consistent!
Challenges:
- ❌ Checkpointing overhead (save entire state)
- ❌ Complex to implement
- ❌ Process must support checkpointing
Selective Preemption
Only preempt specific resource, not entire process:
P1 holds (R1, R2, R3), wants R4
P2 holds R4
Preempt only R2 from P1:
- P1 keeps R1, R3
- R2 given to P2 (if helps)
- P1 can continue with R1, R3 (but not those using R2)
Complex: Need to know resource dependencies
Strategy 3: Rollback and Restart
Mechanism
- Detect deadlock
- Kill process (or checkpoint)
- Rollback to safe point (before deadlock)
- Restart process
In Database Systems
Transactions use this approach:
Transaction 1: BEGIN
Lock Account 1
Debit $100
[waiting for lock on Account 2]
Transaction 2: BEGIN
Lock Account 2
Debit $200
[waiting for lock on Account 1]
DEADLOCK detected!
Recovery:
Roll back Transaction 2 (to BEGIN)
Commit Transaction 1
Restart Transaction 2 (from BEGIN)
Advantages:
- ✓ State consistent (rollback to known good point)
- ✓ Minimal work loss (if checkpoints frequent)
- ✓ Process restarts automatically
Disadvantages:
- ❌ Requires checkpointing/logging
- ❌ Some work lost (between checkpoints)
- ❌ Complex to implement
Comparison of Strategies
| Strategy | Simplicity | Recovery Time | Work Loss | System Dependent |
|---|---|---|---|---|
| Abort All | High | Low | Very High | No |
| Abort One | Medium | High | High | Yes |
| Preemption | Low | Medium | Low | Depends |
| Rollback | Low | Medium | Medium | Yes |
Process Termination Details
Safe Termination
Graceful shutdown:
1. Signal process: "You're terminated"
2. Process catches signal
3. Process cleans up:
- Close files
- Release partial locks
- Save state if possible
4. Process exits
5. OS cleans remaining resources
Advantage: Process can clean up Time: Takes a moment (process must respond)
Forced Termination
Immediate kill:
1. Detect deadlock
2. Kill process immediately (no warning)
3. OS force cleans resources
4. Resources freed
Advantage: Immediate Disadvantage: No cleanup (may corrupt state)
Practical Approach
1. Send termination signal (SIGTERM)
2. Give process N seconds to exit gracefully
3. If not exited, send kill signal (SIGKILL)
4. Force termination
Recovery in Modern Systems
Database Systems
Transaction deadlock → Automatic rollback
Transaction restarts automatically
Application retries if needed
Users don't see deadlock (mostly transparent)
Operating Systems
Process deadlock → Rare (careful design prevents)
If occurs → Kill process, restart if critical
Users may lose unsaved work
Web Services
Request deadlock → Timeout and retry
Load balancer may route to different server
User sees retry, usually transparent
Handling Aborted Processes
Option 1: Manual Restart
User restarts process:
System kills P1 due to deadlock
User runs: "run_process P1"
Process restarts from beginning
Option 2: Automatic Restart
System restarts automatically:
System kills P1 due to deadlock
System queues P1 for restart
P1 automatically restarts
User doesn't notice (usually)
Used when: Important long-running processes
Option 3: Partial Restart
Restart with saved state:
System killed P1 after step 5
Saved state at step 5
Restart P1 from step 6 (not step 1)
Requires: Checkpointing support
Starvation After Recovery
Problem
If always abort same process:
Deadlock detected
Always kill Process X (lowest priority)
Process X never completes!
X starves (keeps getting killed)
Solution: Tracking
Track kills per process:
If Process X killed N times already → kill someone else next time
Ensures fair treatment
Prevents starvation
Cost Analysis
Cost of Each Approach
Abort All: High work loss, low recovery cost
Cost = Time lost in process × 2 (restart) + work loss
Abort One: Medium work loss, high recovery cost
Cost = Multiple detection + kills + work loss
Preemption: Low work loss, high recovery cost
Cost = Checkpointing + preemption overhead
Rollback: Medium work loss, medium cost
Cost = Checkpoint maintenance + rollback time
Summary
Recovery involves breaking deadlock after detection. Three main approaches: process termination (simple but loses work), resource preemption (complex but preserves work), and rollback (system-dependent but clean). Which to abort can use priority, resource usage, or restart likelihood. Modern systems prefer preventing deadlock (prevention/avoidance) over recovery due to cost. But recovery still needed for edge cases and as backup strategy. Understanding recovery essential for building resilient systems.