Parallel Systems

Definition

A parallel system (also called a multiprocessor system) is a computer system in which multiple processors (CPUs) work simultaneously to solve computational problems and manage system resources. All processors share a common memory and are controlled by a single operating system. They communicate through the shared memory and coordinate their activities to achieve faster execution.

Basic Concept

In a parallel system:

  • Multiple CPUs work at the exact same time
  • Shared memory allows all CPUs to access the same data
  • Tight coupling - CPUs are closely connected
  • Single OS manages all processors
  • Simultaneous processing - different tasks run on different CPUs at the same time

Key Difference from Multiprogramming

Multiprogramming:

  • One CPU
  • Rapidly switches between tasks
  • Only one task runs at any moment
  • Feels like multitasking but is really sequential

Parallel System:

  • Multiple CPUs
  • Different tasks run simultaneously
  • True parallel execution
  • Actually multiple things happening together

Types of Parallel Systems

1. Symmetric Multiprocessing (SMP)

What is it? All processors are equal in power and status. They share the same memory, I/O devices, and operating system, and any processor can execute any task.

Characteristics:

  • All CPUs have equal capability
  • No master or slave processors
  • Shared memory accessible to all
  • Single operating system for all
  • Uniform access time to memory (ideally)
  • Any processor can execute operating system code

Advantages:

  • Simple conceptually (all processors equal)
  • Good load balancing possible
  • Fault tolerance (if one CPU fails, others continue)
  • Scalable to a reasonable number of CPUs
  • Better resource utilization

Disadvantages:

  • Bus contention (all CPUs competing for memory access)
  • Cache coherence problems (explained below)
  • Synchronization complexity
  • Difficult to scale beyond 32-64 CPUs
  • Performance not linear with CPU addition

Example: Modern laptop with 4 or 8 cores is an SMP system!

2. Asymmetric Multiprocessing (AMP)

What is it? One processor acts as the master, controlling the others. The master assigns tasks to the slave processors and manages the system; the slaves only execute their assigned tasks.

Characteristics:

  • One master processor (control)
  • Multiple slave processors (execution)
  • Master controls task allocation
  • Slaves only execute
  • Master is single point of control
  • Less common in modern systems

Advantages:

  • Simpler to design than SMP
  • Master has full control and visibility
  • Fewer cache coherence issues (less data shared between processors)
  • Easier synchronization

Disadvantages:

  • Master is bottleneck (all decisions through master)
  • If master fails, entire system stops
  • Unequal processor status
  • Less flexible
  • Inefficient use of master’s processing power

Cache Coherence Problem (Very Important!)

What is the Problem?

Each processor has its own cache (small, fast memory). When multiple CPUs hold copies of the same data, the copies can become inconsistent.

Example of Problem:

Main Memory: x = 5

CPU 1 Cache: x = 5
CPU 2 Cache: x = 5
CPU 3 Cache: x = 5

Now CPU 1 changes x to 10:
CPU 1 Cache: x = 10

But:
CPU 2 Cache: x = 5 (old value!)
CPU 3 Cache: x = 5 (old value!)

Which value is correct? Inconsistency!

Solutions to Cache Coherence

1. Write-Invalidate Protocol:

  • When CPU 1 writes x = 10
  • Invalidate x in all other caches
  • Other CPUs know they must read fresh from main memory
  • Next access gets value 10 from main memory
  • Advantage: Updates only when needed
  • Disadvantage: Other CPUs must wait for main memory read

2. Write-Update Protocol:

  • When CPU 1 writes x = 10
  • Update x in all other caches immediately
  • All caches have new value
  • No main memory delay for reading
  • Advantage: No main memory access needed
  • Disadvantage: More network traffic

3. Directory-Based Protocol:

  • Keep directory of which CPUs have copies of data
  • When update happens, only notify interested CPUs
  • Reduces network traffic
  • More complex to implement
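The write-invalidate idea can be replayed in a toy simulation (an illustrative sketch only; real coherence protocols run in hardware, and the `read`/`write` helpers here are made up for the example):

```python
# Toy write-invalidate simulation: one shared main memory, one private
# cache per CPU. A write updates memory and invalidates all other copies.
main_memory = {"x": 5}
caches = [dict() for _ in range(3)]   # private cache for CPU 0, 1, 2

def read(cpu, var):
    # On a cache miss, fetch the current value from main memory.
    if var not in caches[cpu]:
        caches[cpu][var] = main_memory[var]
    return caches[cpu][var]

def write(cpu, var, value):
    # Write through to main memory and the local cache...
    caches[cpu][var] = value
    main_memory[var] = value
    # ...then invalidate every other CPU's copy.
    for other, cache in enumerate(caches):
        if other != cpu:
            cache.pop(var, None)

for cpu in range(3):      # all three CPUs cache x = 5
    read(cpu, "x")

write(0, "x", 10)         # CPU 0 writes x = 10; CPUs 1 and 2 are invalidated
print(read(1, "x"))       # prints 10 -- CPU 1 re-reads the fresh value
```

Note how CPU 1 pays the cost of a main-memory read on its next access, which is exactly the disadvantage listed for write-invalidate above.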

Synchronization Issues

Race Condition Problem

When multiple CPUs access shared data simultaneously, results can be unpredictable.

Example:

Bank account with balance = 100

CPU 1: Withdraw 50
CPU 2: Deposit 30

Without synchronization:
Both read value 100
CPU 1 calculates: 100 - 50 = 50, writes 50
CPU 2 calculates: 100 + 30 = 130, writes 130

Final balance: 130 (should be 80!)

Result depends on timing - unpredictable!
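The lost-update interleaving above can be replayed deterministically in a few lines (the two "CPUs" are just sequential steps here, forced into the bad order):

```python
balance = 100

# Both CPUs read the balance *before* either one writes it back.
read_by_cpu1 = balance            # CPU 1 reads 100
read_by_cpu2 = balance            # CPU 2 reads 100

balance = read_by_cpu1 - 50       # CPU 1 writes 50
balance = read_by_cpu2 + 30       # CPU 2 overwrites it with 130

print(balance)                    # 130 -- the withdrawal was lost; should be 80
```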

Solution: Mutual Exclusion

Only one CPU can access shared data at a time using locks.

CPU 1: LOCK (balance)
CPU 1: Read 100
CPU 1: Calculate 100 - 50 = 50
CPU 1: Write 50
CPU 1: UNLOCK

CPU 2: LOCK (balance) - had to wait
CPU 2: Read 50 (new value!)
CPU 2: Calculate 50 + 30 = 80
CPU 2: Write 80
CPU 2: UNLOCK

Final balance: 80 (correct!)
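The LOCK/UNLOCK discipline above maps directly onto threads and a mutex; a minimal sketch using Python's `threading.Lock`:

```python
import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:            # LOCK ... UNLOCK around the whole read-modify-write
        balance -= amount

def deposit(amount):
    global balance
    with lock:
        balance += amount

t1 = threading.Thread(target=withdraw, args=(50,))
t2 = threading.Thread(target=deposit, args=(30,))
t1.start(); t2.start()
t1.join(); t2.join()

print(balance)            # 80, no matter which thread runs first
```

Because each thread holds the lock for its entire read-calculate-write sequence, the second thread always sees the first thread's result, exactly as in the trace above.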

Load Balancing

Purpose: Keep all CPUs equally busy

Static Load Balancing

  • Divide work before execution
  • Each CPU gets assigned portion
  • Some CPUs may finish early while others still working
  • Less overhead but less flexible

Dynamic Load Balancing

  • OS monitors CPU utilization continuously
  • Moves tasks from busy CPUs to idle CPUs
  • Keeps all CPUs equally loaded
  • More complex but better utilization
  • Overhead of moving tasks
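A small simulation of the static-vs-dynamic trade-off (the task costs and two-CPU setup are made up for illustration):

```python
# Tasks with uneven costs, 2 CPUs. Compare the finishing time of a
# static up-front split against dynamic pull-from-queue assignment.
from collections import deque

tasks = [9, 8, 1, 1, 1, 1, 1, 2]   # hypothetical task costs
n_cpus = 2

# Static: split the task list in half before execution.
half = len(tasks) // 2
static_times = [sum(tasks[:half]), sum(tasks[half:])]

# Dynamic: whichever CPU is least loaded pulls the next task.
queue = deque(tasks)
dynamic_times = [0] * n_cpus
while queue:
    idlest = dynamic_times.index(min(dynamic_times))
    dynamic_times[idlest] += queue.popleft()

print(max(static_times), max(dynamic_times))   # static finishes at 19, dynamic at 13
```

With the static split one CPU gets both expensive tasks and the other sits idle; the dynamic scheme evens the load out, at the cost of the bookkeeping in the loop.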

Scheduling in Parallel Systems

Challenges

  • Which CPU gets which task?
  • How to balance load?
  • When to migrate tasks?
  • Minimize context switching
  • Maintain fairness

Scheduling Approaches

  • Gang Scheduling: All threads of job run together
  • Space Sharing: Divide processors among jobs
  • Time Sharing: CPU time-sliced among jobs
  • Work Stealing: Idle CPU takes work from busy CPU
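Work stealing, the last approach, can be sketched with per-CPU deques (a simplified single-threaded model, not a real scheduler; the alternating loop stands in for two CPUs running concurrently):

```python
from collections import deque

# Per-CPU work deques: the owner pops from the right end,
# a thief steals from the left end to reduce contention.
queues = [deque(["a", "b", "c", "d"]), deque()]   # CPU 0 busy, CPU 1 idle
done = [[], []]

def run_one(cpu):
    if queues[cpu]:                        # run own work first
        done[cpu].append(queues[cpu].pop())
        return True
    for victim, q in enumerate(queues):    # otherwise steal from a busy CPU
        if victim != cpu and q:
            done[cpu].append(q.popleft())
            return True
    return False                           # nothing left anywhere

progress = True
while progress:                            # alternate the two "CPUs"
    progress = run_one(0)
    progress = run_one(1) or progress

print(done)   # the idle CPU ends up doing some of CPU 0's tasks
```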

Advantages of Parallel Systems

  1. Increased Speed - Multiple processors working together

    • 4 CPUs ≈ 4x faster
    • 8 CPUs ≈ 8x faster (ideally)
  2. Reliability and Fault Tolerance

    • System continues if one CPU fails
    • Degraded performance but still working
    • Unlike a single-processor system, where one failure stops everything
  3. Scalability

    • Can add more processors to increase power
    • System grows with demand
    • Flexible architecture
  4. Efficient Resource Use

    • No idle processors
    • All resources utilized
    • No bottlenecks (ideally)
  5. Better Throughput

    • More work completed per unit time
    • Higher productivity
    • Better utilization

Disadvantages of Parallel Systems

  1. Very Complex to Design

    • Difficult to program correctly
    • Race conditions possible
    • Deadlocks can occur
    • Synchronization overhead
  2. Cache Coherence Overhead

    • Keeping caches consistent costs time
    • Cache coherence protocol overhead
    • Bus traffic increases
    • Slows down access to shared memory
  3. Synchronization Overhead

    • Locks and synchronization mechanisms cost time
    • Too much synchronization = sequential execution
    • Too little = race conditions
    • Finding balance is difficult
  4. Limited Scalability

    • Can’t keep adding CPUs forever
    • Eventually hits bottleneck
    • Bus bandwidth limited
    • Memory bandwidth limited
    • Practical limit for bus-based shared-memory designs: roughly 32-64 CPUs
  5. Expensive

    • Multiple processors = higher cost
    • Complex design = more development cost
    • Not affordable for all systems
  6. Debugging Difficulty

    • Hard to reproduce problems
    • Timing-dependent bugs
    • Difficult to find and fix issues

Modern Parallel Systems

Multi-Core Processors

Today’s computers have multiple cores on a single chip:

System Type       Typical Cores
Budget PC         2-4 cores
Mid-range         6-8 cores
High-end          12-16 cores
Gaming PC         8-10 cores
Servers           32-64+ cores
Supercomputers    10,000+ cores

Example: Intel Core i7-12700K has 12 cores (8 performance + 4 efficiency cores)

Graphics Processing Units (GPUs)

GPUs are massively parallel processors:

  • CPU: 8-16 cores typical
  • GPU: 1,000-10,000+ cores!

Why so many?

  • Each core handles one pixel or data element
  • Process thousands of elements simultaneously
  • Perfect for graphics and parallel algorithms
  • Used for AI and machine learning

Performance Analysis

Speedup Calculation

Speedup = Time with 1 CPU / Time with N CPUs

Ideal: Speedup = N (linear)
Reality: Speedup < N (due to overhead)

Amdahl’s Law (Important Limitation!)

Speedup = 1 / [f + (1-f)/N]

Where f = fraction of code that must run sequentially

What this means:

  • If 10% of code must be sequential:
    • With 4 CPUs: 3.08x speedup (not 4x!)
    • With 100 CPUs: about 9.2x speedup (not 100x!)

Implication: Some problems can’t benefit from many CPUs because sequential parts become bottleneck.
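The numbers above follow directly from plugging into the formula; a quick check:

```python
def amdahl_speedup(f, n):
    """Speedup with n CPUs when fraction f of the work is sequential."""
    return 1 / (f + (1 - f) / n)

print(round(amdahl_speedup(0.10, 4), 2))      # 3.08 with 4 CPUs
print(round(amdahl_speedup(0.10, 100), 2))    # 9.17 with 100 CPUs
print(round(amdahl_speedup(0.10, 10**6), 2))  # approaches 1/f = 10 as n grows
```

The last line shows the hard ceiling: with 10% sequential code, no number of CPUs can ever give more than 10x speedup.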

Real-World Applications

Scientific Computing

  • Weather simulation
  • Molecular dynamics
  • Climate modeling
  • Use supercomputers with thousands of CPUs

Artificial Intelligence and Machine Learning

  • Training neural networks
  • GPU acceleration
  • Process millions of parameters

Gaming

  • Multi-threaded rendering
  • Better graphics
  • Physics calculations
  • Smoother gameplay

Video Processing

  • Encoding videos (multiple frames in parallel)
  • Each CPU encodes different frame
  • 4 CPUs can approach 4x faster encoding
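The per-frame parallelism can be sketched with a worker pool (`encode_frame` is a hypothetical stand-in for real encoding work; a CPU-bound encoder in CPython would use a process pool rather than threads to get true parallelism):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame):
    # Hypothetical stand-in for the real per-frame encoding work.
    return f"encoded-{frame}"

frames = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:   # roughly one worker per CPU
    encoded = list(pool.map(encode_frame, frames))

print(encoded[0], encoded[-1])   # encoded-0 encoded-7 (map preserves order)
```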

Data Analysis

  • Process large datasets
  • Distribute analysis across CPUs
  • Big data processing

Servers

  • Web servers with many cores
  • Handle many requests simultaneously
  • Database servers for queries

Important Concepts

Context Switch

Saving current CPU state and loading next task’s state.

Cache Line

Unit of memory that moves between caches and main memory together.

False Sharing

Two CPUs modifying different variables that happen to sit on the same cache line, causing unnecessary cache coherence traffic.

Atomic Operation

Operation that cannot be interrupted - completes fully or not at all.

Deadlock

Circular wait where CPUs wait for each other indefinitely.
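One standard way to prevent the circular wait is to make every thread acquire locks in a single global order (a sketch; ordering by `id` is just one common convention):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
completed = []

def transfer(first, second):
    # Deadlock avoidance: acquire both locks in one fixed global order
    # (here: by id), so no thread can hold one lock while waiting for
    # a lock that another thread holds in the opposite order.
    lo, hi = sorted((first, second), key=id)
    with lo:
        with hi:
            completed.append(threading.current_thread().name)

t1 = threading.Thread(target=transfer, args=(lock_a, lock_b))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a))  # opposite request order
t1.start(); t2.start()
t1.join(); t2.join()
print(len(completed))   # 2 -- both threads finish, no deadlock
```

Without the `sorted` step, the two threads could each grab their first lock and then wait forever for the other's.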

Exam Important Points

  1. Define parallel system
  2. Difference between SMP and AMP
  3. Cache coherence problem and solutions
  4. Synchronization issues and mutual exclusion
  5. Load balancing (static vs dynamic)
  6. Advantages and disadvantages
  7. Speedup and Amdahl’s Law
  8. Modern multi-core systems
  9. GPU as parallel system
  10. Applications of parallel systems