Definition
A parallel system (also called a multiprocessor system) is a computer system with multiple processors (CPUs) that work together simultaneously to solve computational problems and manage system resources. All processors share a common memory and are controlled by a single operating system; they communicate through that shared memory and coordinate their activities to achieve faster execution.
Basic Concept
In a parallel system:
- Multiple CPUs work at the same time
- Shared memory lets all CPUs access the same data
- Tight coupling: CPUs are closely connected
- A single OS manages all processors
- Simultaneous processing: different tasks run on different CPUs at the same time
Key Difference from Multiprogramming
Multiprogramming:
- One CPU
- Rapidly switches between tasks
- Only one task runs at any moment
- Feels like multitasking but is really sequential
Parallel System:
- Multiple CPUs
- Different tasks run simultaneously
- True parallel execution
- Actually multiple things happening together
Types of Parallel Systems
1. Symmetric Multiprocessing (SMP)
What is it? All processors are equal in power and status. They share the same memory, I/O devices, and operating system. Any processor can execute any task.
Characteristics:
- All CPUs have equal capability
- No master or slave processors
- Shared memory accessible to all
- Single operating system for all
- Uniform access time to memory (ideally)
- Any processor can execute operating system code
Advantages:
- Simple conceptually (all processors equal)
- Good load balancing possible
- Fault tolerance (if one CPU fails, others continue)
- Scalable to reasonable number of CPUs
- Better resource utilization
Disadvantages:
- Bus contention (all CPUs competing for memory access)
- Cache coherence problems (explained below)
- Synchronization complexity
- Difficult to scale beyond roughly 32-64 CPUs on a shared bus
- Performance not linear with CPU addition
Example: Modern laptop with 4 or 8 cores is an SMP system!
2. Asymmetric Multiprocessing (AMP)
What is it? One processor acts as the master CPU, controlling the others. The master assigns tasks to slave processors and manages the system. Slave processors only execute assigned tasks.
Characteristics:
- One master processor (control)
- Multiple slave processors (execution)
- Master controls task allocation
- Slaves only execute
- Master is single point of control
- Less common in modern systems
Advantages:
- Simpler to design than SMP
- Master has full control and visibility
- Fewer cache coherence issues (less shared writable data)
- Easier synchronization
Disadvantages:
- Master is bottleneck (all decisions through master)
- If master fails, entire system stops
- Unequal processor status
- Less flexible
- Inefficient use of master’s processing power
Cache Coherence Problem (Very Important!)
What is the Problem?
Each processor has its own cache (fast memory). When multiple CPUs hold copies of the same data, inconsistency problems occur.
Example of Problem:
Main Memory: x = 5
CPU 1 Cache: x = 5
CPU 2 Cache: x = 5
CPU 3 Cache: x = 5
Now CPU 1 changes x to 10:
CPU 1 Cache: x = 10
But:
CPU 2 Cache: x = 5 (old value!)
CPU 3 Cache: x = 5 (old value!)
Which value is correct? Inconsistency!
Solutions to Cache Coherence
1. Write-Invalidate Protocol:
- When CPU 1 writes x = 10
- Invalidate x in all other caches
- Other CPUs know they must read fresh from main memory
- Next access gets value 10 from main memory
- Advantage: Updates only when needed
- Disadvantage: Other CPUs must wait for main memory read
2. Write-Update Protocol:
- When CPU 1 writes x = 10
- Update x in all other caches immediately
- All caches have new value
- No main memory delay for reading
- Advantage: other CPUs read the new value without a main memory access
- Disadvantage: More network traffic
3. Directory-Based Protocol:
- Keep directory of which CPUs have copies of data
- When update happens, only notify interested CPUs
- Reduces network traffic
- More complex to implement
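The write-invalidate idea can be sketched as a toy simulation (the `Memory` and `CPU` classes are hypothetical models for illustration, not real hardware): each CPU keeps a private cache, and a write removes the variable from every other cache, so the next read misses and fetches the fresh value from main memory.

```python
# Toy write-invalidate simulation (illustrative model, not real hardware).

class Memory:
    def __init__(self):
        self.data = {}

class CPU:
    def __init__(self, name, memory, all_cpus):
        self.name = name
        self.memory = memory
        self.cache = {}            # private cache: variable -> value
        self.all_cpus = all_cpus
        all_cpus.append(self)

    def read(self, var):
        if var not in self.cache:                    # cache miss
            self.cache[var] = self.memory.data[var]  # fetch from main memory
        return self.cache[var]

    def write(self, var, value):
        self.cache[var] = value
        self.memory.data[var] = value                # write-through, for simplicity
        for cpu in self.all_cpus:                    # invalidate all other copies
            if cpu is not self:
                cpu.cache.pop(var, None)

mem = Memory()
mem.data["x"] = 5
cpus = []
cpu1, cpu2 = CPU("CPU1", mem, cpus), CPU("CPU2", mem, cpus)

cpu1.read("x")
cpu2.read("x")          # both caches now hold x = 5
cpu1.write("x", 10)     # invalidates x in CPU2's cache
print(cpu2.read("x"))   # CPU2 misses and re-fetches: prints 10, not the stale 5
```

After the write, CPU 2's cached copy is gone, so its next read is forced to go to main memory and picks up the value 10.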
Synchronization Issues
Race Condition Problem
When multiple CPUs access shared data simultaneously, results can be unpredictable.
Example:
Bank account with balance = 100
CPU 1: Withdraw 50
CPU 2: Deposit 30
Without synchronization:
Both read value 100
CPU 1 calculates: 100 - 50 = 50, writes 50
CPU 2 calculates: 100 + 30 = 130, writes 130
Final balance: 130, or 50 if CPU 1 happens to write last (should be 80!)
Result depends on timing - unpredictable!
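The lost update above can be reproduced deterministically by simulating that interleaving step by step (a sketch; the variable names are illustrative):

```python
# Simulate the unsynchronized interleaving: both CPUs read before either writes.
balance = 100

cpu1_read = balance          # CPU 1 reads 100
cpu2_read = balance          # CPU 2 also reads 100 (before CPU 1 writes!)

balance = cpu1_read - 50     # CPU 1 writes 50
balance = cpu2_read + 30     # CPU 2 overwrites with 130 -- the withdrawal is lost

print(balance)               # 130, not the correct 80
```

Because both CPUs read the old value of 100 before either wrote, one update silently overwrites the other.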
Solution: Mutual Exclusion
Only one CPU can access shared data at a time using locks.
CPU 1: LOCK (balance)
CPU 1: Read 100
CPU 1: Calculate 100 - 50 = 50
CPU 1: Write 50
CPU 1: UNLOCK
CPU 2: LOCK (balance) - had to wait
CPU 2: Read 50 (new value!)
CPU 2: Calculate 50 + 30 = 80
CPU 2: Write 80
CPU 2: UNLOCK
Final balance: 80 (correct!)
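With real threads, the same transfer becomes safe once every read-modify-write of the balance happens inside a lock. A minimal sketch using Python's threading module (the module-level `balance` variable and the two helper functions are introduced for illustration):

```python
import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:                  # only one thread may touch balance at a time
        current = balance
        balance = current - amount

def deposit(amount):
    global balance
    with lock:
        current = balance
        balance = current + amount

t1 = threading.Thread(target=withdraw, args=(50,))
t2 = threading.Thread(target=deposit, args=(30,))
t1.start(); t2.start()
t1.join(); t2.join()

print(balance)                  # always 80, regardless of scheduling order
```

Whichever thread acquires the lock second is forced to wait and then reads the other thread's result, exactly as in the LOCK/UNLOCK trace above.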
Load Balancing
Purpose: Keep all CPUs equally busy
Static Load Balancing
- Divide work before execution
- Each CPU gets assigned portion
- Some CPUs may finish early while others still working
- Less overhead but less flexible
Dynamic Load Balancing
- OS monitors CPU utilization continuously
- Moves tasks from busy CPUs to idle CPUs
- Keeps all CPUs equally loaded
- More complex but better utilization
- Overhead of moving tasks
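One simple dynamic strategy is greedy assignment: each incoming task goes to whichever CPU currently has the least work. A sketch (the task costs and CPU count are made-up example numbers):

```python
import heapq

def balance_dynamic(task_costs, n_cpus):
    """Assign each task to the currently least-loaded CPU (greedy strategy)."""
    loads = [(0, cpu) for cpu in range(n_cpus)]   # (current load, cpu id)
    heapq.heapify(loads)
    assignment = {cpu: [] for cpu in range(n_cpus)}
    for cost in task_costs:
        load, cpu = heapq.heappop(loads)          # least-loaded CPU right now
        assignment[cpu].append(cost)
        heapq.heappush(loads, (load + cost, cpu))
    return assignment

tasks = [7, 3, 5, 2, 8, 1]
plan = balance_dynamic(tasks, 2)
print(plan)   # cpu 0 gets [7, 2, 1] (load 10), cpu 1 gets [3, 5, 8] (load 16)
```

Note that greedy assignment is not always optimal (here one CPU still ends up with more work), which is why real schedulers also migrate tasks after assignment, at the cost of extra overhead.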
Scheduling in Parallel Systems
Challenges
- Which CPU gets which task?
- How to balance load?
- When to migrate tasks?
- Minimize context switching
- Maintain fairness
Scheduling Approaches
- Gang Scheduling: All threads of job run together
- Space Sharing: Divide processors among jobs
- Time Sharing: CPU time-sliced among jobs
- Work Stealing: Idle CPU takes work from busy CPU
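Work stealing can be sketched with one queue per CPU: a CPU takes tasks from the front of its own queue, and when idle it steals from the back of the busiest queue. This is a simplified single-threaded model of the idea, not a real scheduler:

```python
from collections import deque

def run_with_stealing(queues):
    """Each CPU runs its own tasks; an idle CPU steals from the busiest queue."""
    executed = {cpu: [] for cpu in queues}
    while any(queues.values()):
        for cpu, q in queues.items():
            if q:
                executed[cpu].append(q.popleft())   # run a task from own queue
            else:
                victim = max(queues, key=lambda c: len(queues[c]))
                if queues[victim]:
                    # steal from the *back* of the victim's queue
                    executed[cpu].append(queues[victim].pop())
    return executed

queues = {0: deque(["a", "b", "c", "d"]), 1: deque()}
done = run_with_stealing(queues)
print(done)   # {0: ['a', 'b'], 1: ['d', 'c']} -- CPU 1 stole half the work
```

Stealing from the opposite end of the victim's queue is the classic design choice: it reduces contention with the victim, which keeps working at the front.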
Advantages of Parallel Systems
1. Increased Speed
- Multiple processors working together
- 4 CPUs ≈ 4x faster, 8 CPUs ≈ 8x faster (ideally)
2. Reliability and Fault Tolerance
- System continues if one CPU fails
- Degraded performance but still working
- Unlike a single-processor system, where one failure stops everything
3. Scalability
- More processors can be added to increase power
- System grows with demand
- Flexible architecture
4. Efficient Resource Use
- No idle processors
- All resources utilized
- No bottlenecks (ideally)
5. Better Throughput
- More work completed per unit time
- Higher productivity
- Better utilization
Disadvantages of Parallel Systems
1. Very Complex to Design
- Difficult to program correctly
- Race conditions possible
- Deadlocks can occur
- Synchronization overhead
2. Cache Coherence Overhead
- Keeping caches consistent costs time
- Coherence protocol traffic increases bus load
- Slows down access to shared memory
3. Synchronization Overhead
- Locks and synchronization mechanisms cost time
- Too much synchronization = nearly sequential execution
- Too little = race conditions
- Finding the balance is difficult
4. Limited Scalability
- Can't keep adding CPUs forever
- Bus bandwidth and memory bandwidth are limited
- Practical limit for bus-based SMP: roughly 32-64 CPUs
5. Expensive
- Multiple processors = higher cost
- Complex design = more development cost
- Not affordable for all systems
6. Debugging Difficulty
- Hard to reproduce problems
- Timing-dependent bugs
- Difficult to find and fix issues
Modern Parallel Systems
Multi-Core Processors
Today’s computers have multiple cores on single chip:
| System Type | Typical Cores |
|---|---|
| Budget PC | 2-4 cores |
| Mid-range | 6-8 cores |
| High-end | 12-16 cores |
| Gaming PC | 8-10 cores |
| Servers | 32-64+ cores |
| Supercomputers | 10,000+ cores |
Example: Intel Core i7-12700K has 12 cores (8 performance + 4 efficiency cores)
Graphics Processing Units (GPUs)
GPUs are extreme parallel processors:
- CPU: 8-16 cores typical
- GPU: 1,000-10,000+ cores!
Why so many?
- Each core handles one pixel or data element
- Process thousands of elements simultaneously
- Perfect for graphics and parallel algorithms
- Used for AI and machine learning
Performance Analysis
Speedup Calculation
Speedup = Time with 1 CPU / Time with N CPUs
Ideal: Speedup = N (linear)
Reality: Speedup < N (due to overhead)
Amdahl’s Law (Important Limitation!)
Speedup = 1 / [f + (1-f)/N]
Where f = fraction of code that must run sequentially
What this means:
- If 10% of the code must be sequential (f = 0.1):
- With 4 CPUs: 3.08x speedup (not 4x!)
- With 100 CPUs: about 9.2x speedup (not 100x!), and never more than 1/f = 10x no matter how many CPUs are added
Implication: Some problems can’t benefit from many CPUs because sequential parts become bottleneck.
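Plugging the numbers into the formula confirms the limit:

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: f = sequential fraction, n = number of CPUs."""
    return 1 / (f + (1 - f) / n)

for n in (4, 100, 1_000_000):
    print(f"{n} CPUs: {amdahl_speedup(0.1, n):.2f}x")
# 4 CPUs give 3.08x, 100 CPUs give 9.17x, and even a million CPUs
# approach but never exceed the 1/f = 10x ceiling
```

No matter how large n grows, the (1-f)/n term vanishes and the speedup approaches 1/f, so the sequential 10% caps the speedup at 10x.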
Real-World Applications
Scientific Computing
- Weather simulation
- Molecular dynamics
- Climate modeling
- Use supercomputers with thousands of CPUs
Artificial Intelligence and Machine Learning
- Training neural networks
- GPU acceleration
- Process millions of parameters
Gaming
- Multi-threaded rendering
- Better graphics
- Physics calculations
- Smoother gameplay
Video Processing
- Encoding videos (multiple frames in parallel)
- Each CPU encodes different frame
- 4 CPUs ≈ 4x faster video encoding (ideally)
Data Analysis
- Process large datasets
- Distribute analysis across CPUs
- Big data processing
Servers
- Web servers with many cores
- Handle many requests simultaneously
- Database servers for queries
Important Concepts
Context Switch
Saving current CPU state and loading next task’s state.
Cache Line
Unit of memory that moves between caches and main memory together.
False Sharing
Two CPUs modifying different variables on same cache line, causing unnecessary cache traffic.
Atomic Operation
Operation that cannot be interrupted - completes fully or not at all.
Deadlock
Circular wait where CPUs wait for each other indefinitely.
Exam Important Points
- Define parallel system
- Difference between SMP and AMP
- Cache coherence problem and solutions
- Synchronization issues and mutual exclusion
- Load balancing (static vs dynamic)
- Advantages and disadvantages
- Speedup and Amdahl’s Law
- Modern multi-core systems
- GPU as parallel system
- Applications of parallel systems