Deadlock Recovery

Definition

Deadlock Recovery is the process of breaking a deadlock and restoring system to normal operation after deadlock is detected. Recovery strategies aim to minimize work loss while restoring system functionality. Combined with detection, creates detection-and-recovery approach.

Recovery Strategies

Strategy 1: Process Termination

Kill processes to break the cycle and free resources.

Approach 1a: Abort All Processes in Deadlock

Kill all processes involved in cycle:

Deadlock cycle: P1 → R1 → P2 → R2 → P1

Action: Kill P1 and P2 (both in cycle)
Result: Resources released, deadlock broken

Advantages:

✓ Simple (one action breaks all cycles)
✓ Guaranteed to work
✓ Fast recovery

Disadvantages:

❌ Lose ALL work done by both processes
❌ If processes have been running for hours, lose all progress
❌ User frustration (why kill my process?)

When Used:

Last resort
Processes haven’t done much work
Critical to restore system

Approach 1b: Abort One Process at a Time

Iteratively kill processes until deadlock resolved:

Step 1: Detect cycle (P1, P2, P3, P4)
Step 2: Kill P1
Step 3: Check for deadlock - still exists (P2, P3, P4 in cycle)
Step 4: Kill P2
Step 5: Check for deadlock - still exists (P3, P4 in cycle)
Step 6: Kill P3
Step 7: Check for deadlock - gone!

Advantages:

✓ Minimize processes killed
✓ Keep processes that aren’t causing deadlock

Disadvantages:

❌ Expensive (check after each kill)
❌ Takes longer (multiple iterations)

When Used:

Want to minimize work loss
Detection overhead acceptable

Selection Criteria

If aborting, which process to abort?

Criteria (use one or combine):

Shortest Job Remaining Time:
- Kill process closest to completion
- Minimize wasted work
Least Resource Used:
- Kill process using fewest resources
- Quickly frees resources
Lowest Priority:
- Kill lowest priority process
- Preserve high-priority work
Fewest Dependencies:
- Kill process least dependent on others
- Minimize cascade effects
Most Likely to Restart:
- Kill process that can easily restart
- Batch jobs vs long-running analysis

Example:

Processes in deadlock cycle:
P1: CPU usage=5%, priority=LOW, dependencies=HIGH
P2: CPU usage=80%, priority=HIGH, dependencies=LOW
P3: CPU usage=20%, priority=MED, dependencies=MED

Best candidate: P1
- Low priority (important processes preserved)
- Low resource usage (little to free but OK)
- Will kill instead of P2 (high priority preserved)

Strategy 2: Resource Preemption

Take resources from processes and reallocate.

How It Works

Deadlock: P1 has R1, wants R2. P2 has R2, wants R1.

Preemption:
1. Take R2 from P2
2. Give R2 to P1
3. P1 completes, releases R2
4. P2 resumes

Result: Deadlock broken

Challenges

Problem 1: Which Process to Preempt From?

Same criteria as termination:

Least resources invested
Lowest priority
Most likely to restart

Problem 2: Process State Inconsistency

When preempted mid-operation, process state uncertain:

Process doing database update:
1. Lock table
2. Read record
3. Modify record (halfway through)
   ← Preempted here!
4. Write modified record
5. Unlock table

If preempted at step 3:
- Table locked but modifications incomplete
- Process can't continue from step 3 (state corrupted)
- Must restart entire transaction

Solution: Checkpointing

Save process state at safe points:

Process:
1. Lock table
2. Read record
3. CHECKPOINT (save state)
4. Modify record
5. CHECKPOINT
6. Write modified record
7. Unlock table

If preempted at step 5:
- Restore to checkpoint after step 3
- Process restarts from step 4
- State consistent!

Challenges:

❌ Checkpointing overhead (save entire state)
❌ Complex to implement
❌ Process must support checkpointing

Selective Preemption

Only preempt specific resource, not entire process:

P1 holds (R1, R2, R3), wants R4
P2 holds R4

Preempt only R2 from P1:
- P1 keeps R1, R3
- R2 given to P2 (if helps)
- P1 can continue with R1, R3 (but not those using R2)

Complex: Need to know resource dependencies

Strategy 3: Rollback and Restart

Mechanism

Detect deadlock
Kill process (or checkpoint)
Rollback to safe point (before deadlock)
Restart process

In Database Systems

Transactions use this approach:

Transaction 1: BEGIN
  Lock Account 1
  Debit $100
  [waiting for lock on Account 2]

Transaction 2: BEGIN
  Lock Account 2
  Debit $200
  [waiting for lock on Account 1]

DEADLOCK detected!

Recovery:
  Roll back Transaction 2 (to BEGIN)
  Commit Transaction 1
  Restart Transaction 2 (from BEGIN)

Advantages:

✓ State consistent (rollback to known good point)
✓ Minimal work loss (if checkpoints frequent)
✓ Process restarts automatically

Disadvantages:

❌ Requires checkpointing/logging
❌ Some work lost (between checkpoints)
❌ Complex to implement

Comparison of Strategies

Strategy	Simplicity	Recovery Time	Work Loss	System Dependent
Abort All	High	Low	Very High	No
Abort One	Medium	High	High	Yes
Preemption	Low	Medium	Low	Depends
Rollback	Low	Medium	Medium	Yes

Process Termination Details

Safe Termination

Graceful shutdown:

1. Signal process: "You're terminated"
2. Process catches signal
3. Process cleans up:
   - Close files
   - Release partial locks
   - Save state if possible
4. Process exits
5. OS cleans remaining resources

Advantage: Process can clean up Time: Takes a moment (process must respond)

Forced Termination

Immediate kill:

1. Detect deadlock
2. Kill process immediately (no warning)
3. OS force cleans resources
4. Resources freed

Advantage: Immediate Disadvantage: No cleanup (may corrupt state)

Practical Approach

1. Send termination signal (SIGTERM)
2. Give process N seconds to exit gracefully
3. If not exited, send kill signal (SIGKILL)
4. Force termination

Recovery in Modern Systems

Database Systems

Transaction deadlock → Automatic rollback
Transaction restarts automatically
Application retries if needed
Users don't see deadlock (mostly transparent)

Operating Systems

Process deadlock → Rare (careful design prevents)
If occurs → Kill process, restart if critical
Users may lose unsaved work

Web Services

Request deadlock → Timeout and retry
Load balancer may route to different server
User sees retry, usually transparent

Handling Aborted Processes

Option 1: Manual Restart

User restarts process:

System kills P1 due to deadlock
User runs: "run_process P1"
Process restarts from beginning

Option 2: Automatic Restart

System restarts automatically:

System kills P1 due to deadlock
System queues P1 for restart
P1 automatically restarts
User doesn't notice (usually)

Used when: Important long-running processes

Option 3: Partial Restart

Restart with saved state:

System killed P1 after step 5
Saved state at step 5
Restart P1 from step 6 (not step 1)

Requires: Checkpointing support

Starvation After Recovery

Problem

If always abort same process:

Deadlock detected
Always kill Process X (lowest priority)
Process X never completes!
X starves (keeps getting killed)

Solution: Tracking

Track kills per process:
If Process X killed N times already → kill someone else next time
Ensures fair treatment
Prevents starvation

Cost Analysis

Cost of Each Approach

Abort All: High work loss, low recovery cost

Cost = Time lost in process × 2 (restart) + work loss

Abort One: Medium work loss, high recovery cost

Cost = Multiple detection + kills + work loss

Preemption: Low work loss, high recovery cost

Cost = Checkpointing + preemption overhead

Rollback: Medium work loss, medium cost

Cost = Checkpoint maintenance + rollback time

Summary

Recovery involves breaking deadlock after detection. Three main approaches: process termination (simple but loses work), resource preemption (complex but preserves work), and rollback (system-dependent but clean). Which to abort can use priority, resource usage, or restart likelihood. Modern systems prefer preventing deadlock (prevention/avoidance) over recovery due to cost. But recovery still needed for edge cases and as backup strategy. Understanding recovery essential for building resilient systems.

Deadlock Recovery

Definition

Recovery Strategies

Strategy 1: Process Termination

Approach 1a: Abort All Processes in Deadlock

Approach 1b: Abort One Process at a Time

Selection Criteria

Strategy 2: Resource Preemption

How It Works

Challenges

Solution: Checkpointing

Selective Preemption

Strategy 3: Rollback and Restart

Mechanism

In Database Systems

Comparison of Strategies

Process Termination Details

Safe Termination

Forced Termination

Practical Approach

Recovery in Modern Systems

Database Systems

Operating Systems

Web Services

Handling Aborted Processes

Option 1: Manual Restart

Option 2: Automatic Restart

Option 3: Partial Restart

Starvation After Recovery

Problem

Solution: Tracking

Cost Analysis

Cost of Each Approach

Summary

Install Notes App