Deadlock Recovery

Definition

Deadlock Recovery is the process of breaking a deadlock and restoring system to normal operation after deadlock is detected. Recovery strategies aim to minimize work loss while restoring system functionality. Combined with detection, creates detection-and-recovery approach.

Recovery Strategies

Strategy 1: Process Termination

Kill processes to break the cycle and free resources.

Approach 1a: Abort All Processes in Deadlock

Kill all processes involved in cycle:

Deadlock cycle: P1 → R1 → P2 → R2 → P1

Action: Kill P1 and P2 (both in cycle)
Result: Resources released, deadlock broken

Advantages:

  • ✓ Simple (one action breaks all cycles)
  • ✓ Guaranteed to work
  • ✓ Fast recovery

Disadvantages:

  • ❌ Lose ALL work done by both processes
  • ❌ If processes have been running for hours, lose all progress
  • ❌ User frustration (why kill my process?)

When Used:

  • Last resort
  • Processes haven’t done much work
  • Critical to restore system

Approach 1b: Abort One Process at a Time

Iteratively kill processes until deadlock resolved:

Step 1: Detect cycle (P1, P2, P3, P4)
Step 2: Kill P1
Step 3: Check for deadlock - still exists (P2, P3, P4 in cycle)
Step 4: Kill P2
Step 5: Check for deadlock - still exists (P3, P4 in cycle)
Step 6: Kill P3
Step 7: Check for deadlock - gone!

Advantages:

  • ✓ Minimize processes killed
  • ✓ Keep processes that aren’t causing deadlock

Disadvantages:

  • ❌ Expensive (check after each kill)
  • ❌ Takes longer (multiple iterations)

When Used:

  • Want to minimize work loss
  • Detection overhead acceptable

Selection Criteria

If aborting, which process to abort?

Criteria (use one or combine):

  1. Shortest Job Remaining Time:

    • Kill process closest to completion
    • Minimize wasted work
  2. Least Resource Used:

    • Kill process using fewest resources
    • Quickly frees resources
  3. Lowest Priority:

    • Kill lowest priority process
    • Preserve high-priority work
  4. Fewest Dependencies:

    • Kill process least dependent on others
    • Minimize cascade effects
  5. Most Likely to Restart:

    • Kill process that can easily restart
    • Batch jobs vs long-running analysis

Example:

Processes in deadlock cycle:
P1: CPU usage=5%, priority=LOW, dependencies=HIGH
P2: CPU usage=80%, priority=HIGH, dependencies=LOW
P3: CPU usage=20%, priority=MED, dependencies=MED

Best candidate: P1
- Low priority (important processes preserved)
- Low resource usage (little to free but OK)
- Will kill instead of P2 (high priority preserved)

Strategy 2: Resource Preemption

Take resources from processes and reallocate.

How It Works

Deadlock: P1 has R1, wants R2. P2 has R2, wants R1.

Preemption:
1. Take R2 from P2
2. Give R2 to P1
3. P1 completes, releases R2
4. P2 resumes

Result: Deadlock broken

Challenges

Problem 1: Which Process to Preempt From?

Same criteria as termination:

  • Least resources invested
  • Lowest priority
  • Most likely to restart

Problem 2: Process State Inconsistency

When preempted mid-operation, process state uncertain:

Process doing database update:
1. Lock table
2. Read record
3. Modify record (halfway through)
   ← Preempted here!
4. Write modified record
5. Unlock table

If preempted at step 3:
- Table locked but modifications incomplete
- Process can't continue from step 3 (state corrupted)
- Must restart entire transaction

Solution: Checkpointing

Save process state at safe points:

Process:
1. Lock table
2. Read record
3. CHECKPOINT (save state)
4. Modify record
5. CHECKPOINT
6. Write modified record
7. Unlock table

If preempted at step 5:
- Restore to checkpoint after step 3
- Process restarts from step 4
- State consistent!

Challenges:

  • ❌ Checkpointing overhead (save entire state)
  • ❌ Complex to implement
  • ❌ Process must support checkpointing

Selective Preemption

Only preempt specific resource, not entire process:

P1 holds (R1, R2, R3), wants R4
P2 holds R4

Preempt only R2 from P1:
- P1 keeps R1, R3
- R2 given to P2 (if helps)
- P1 can continue with R1, R3 (but not those using R2)

Complex: Need to know resource dependencies

Strategy 3: Rollback and Restart

Mechanism

  1. Detect deadlock
  2. Kill process (or checkpoint)
  3. Rollback to safe point (before deadlock)
  4. Restart process

In Database Systems

Transactions use this approach:

Transaction 1: BEGIN
  Lock Account 1
  Debit $100
  [waiting for lock on Account 2]

Transaction 2: BEGIN
  Lock Account 2
  Debit $200
  [waiting for lock on Account 1]

DEADLOCK detected!

Recovery:
  Roll back Transaction 2 (to BEGIN)
  Commit Transaction 1
  Restart Transaction 2 (from BEGIN)

Advantages:

  • ✓ State consistent (rollback to known good point)
  • ✓ Minimal work loss (if checkpoints frequent)
  • ✓ Process restarts automatically

Disadvantages:

  • ❌ Requires checkpointing/logging
  • ❌ Some work lost (between checkpoints)
  • ❌ Complex to implement

Comparison of Strategies

StrategySimplicityRecovery TimeWork LossSystem Dependent
Abort AllHighLowVery HighNo
Abort OneMediumHighHighYes
PreemptionLowMediumLowDepends
RollbackLowMediumMediumYes

Process Termination Details

Safe Termination

Graceful shutdown:

1. Signal process: "You're terminated"
2. Process catches signal
3. Process cleans up:
   - Close files
   - Release partial locks
   - Save state if possible
4. Process exits
5. OS cleans remaining resources

Advantage: Process can clean up Time: Takes a moment (process must respond)

Forced Termination

Immediate kill:

1. Detect deadlock
2. Kill process immediately (no warning)
3. OS force cleans resources
4. Resources freed

Advantage: Immediate Disadvantage: No cleanup (may corrupt state)

Practical Approach

1. Send termination signal (SIGTERM)
2. Give process N seconds to exit gracefully
3. If not exited, send kill signal (SIGKILL)
4. Force termination

Recovery in Modern Systems

Database Systems

Transaction deadlock → Automatic rollback
Transaction restarts automatically
Application retries if needed
Users don't see deadlock (mostly transparent)

Operating Systems

Process deadlock → Rare (careful design prevents)
If occurs → Kill process, restart if critical
Users may lose unsaved work

Web Services

Request deadlock → Timeout and retry
Load balancer may route to different server
User sees retry, usually transparent

Handling Aborted Processes

Option 1: Manual Restart

User restarts process:

System kills P1 due to deadlock
User runs: "run_process P1"
Process restarts from beginning

Option 2: Automatic Restart

System restarts automatically:

System kills P1 due to deadlock
System queues P1 for restart
P1 automatically restarts
User doesn't notice (usually)

Used when: Important long-running processes

Option 3: Partial Restart

Restart with saved state:

System killed P1 after step 5
Saved state at step 5
Restart P1 from step 6 (not step 1)

Requires: Checkpointing support

Starvation After Recovery

Problem

If always abort same process:

Deadlock detected
Always kill Process X (lowest priority)
Process X never completes!
X starves (keeps getting killed)

Solution: Tracking

Track kills per process:
If Process X killed N times already → kill someone else next time
Ensures fair treatment
Prevents starvation

Cost Analysis

Cost of Each Approach

Abort All: High work loss, low recovery cost

Cost = Time lost in process × 2 (restart) + work loss

Abort One: Medium work loss, high recovery cost

Cost = Multiple detection + kills + work loss

Preemption: Low work loss, high recovery cost

Cost = Checkpointing + preemption overhead

Rollback: Medium work loss, medium cost

Cost = Checkpoint maintenance + rollback time

Summary

Recovery involves breaking deadlock after detection. Three main approaches: process termination (simple but loses work), resource preemption (complex but preserves work), and rollback (system-dependent but clean). Which to abort can use priority, resource usage, or restart likelihood. Modern systems prefer preventing deadlock (prevention/avoidance) over recovery due to cost. But recovery still needed for edge cases and as backup strategy. Understanding recovery essential for building resilient systems.