Definition
A parallel system (also called a multiprocessor system) is a computer system with multiple processors (CPUs) that work together simultaneously to solve computational problems and manage system resources. All processors share a common memory and are controlled by a single operating system; they communicate through that shared memory and coordinate their activities to achieve faster execution.
Basic Concept
In a parallel system:
- Multiple CPUs work at the same time
- Shared memory lets all CPUs access the same data
- Tight coupling: CPUs are closely connected
- A single OS manages all processors
- Simultaneous processing: different tasks run on different CPUs at the same time
Key Difference from Multiprogramming
Multiprogramming:
- One CPU
- Rapidly switches between tasks
- Only one task runs at any moment
- Feels like multitasking but is really sequential
Parallel System:
- Multiple CPUs
- Different tasks run simultaneously
- True parallel execution
- Actually multiple things happening together
Types of Parallel Systems
1. Symmetric Multiprocessing (SMP)
What is it? All processors are equal in power and status. They share the same memory, I/O devices, and operating system. Any processor can execute any task.
Characteristics:
- All CPUs have equal capability
- No master or slave processors
- Shared memory accessible to all
- Single operating system for all
- Uniform access time to memory (ideally)
- Any processor can execute operating system code
Advantages:
- Simple conceptually (all processors equal)
- Good load balancing possible
- Fault tolerance (if one CPU fails, others continue)
- Scalable to reasonable number of CPUs
- Better resource utilization
Disadvantages:
- Bus contention (all CPUs competing for memory access)
- Cache coherence problems (explained below)
- Synchronization complexity
- Difficult to scale beyond roughly 32-64 CPUs on a shared bus
- Performance not linear with CPU addition
Example: Modern laptop with 4 or 8 cores is an SMP system!
2. Asymmetric Multiprocessing (AMP)
What is it? One processor acts as the master CPU, controlling the others. The master assigns tasks to slave processors and manages the system. Slave processors only execute assigned tasks.
Characteristics:
- One master processor (control)
- Multiple slave processors (execution)
- Master controls task allocation
- Slaves only execute
- Master is single point of control
- Less common in modern systems
Advantages:
- Simpler to design than SMP
- Master has full control and visibility
- Fewer cache coherence issues (less shared writable data)
- Easier synchronization
Disadvantages:
- Master is bottleneck (all decisions through master)
- If master fails, entire system stops
- Unequal processor status
- Less flexible
- Inefficient use of master’s processing power
Cache Coherence Problem (Very Important!)
What is the Problem?
Each processor has its own cache (fast memory). When multiple CPUs hold copies of the same data, inconsistency problems occur.
Example of Problem:
Main Memory: x = 5
CPU 1 Cache: x = 5
CPU 2 Cache: x = 5
CPU 3 Cache: x = 5
Now CPU 1 changes x to 10:
CPU 1 Cache: x = 10
But:
CPU 2 Cache: x = 5 (old value!)
CPU 3 Cache: x = 5 (old value!)
Which value is correct? Inconsistency!
Solutions to Cache Coherence
1. Write-Invalidate Protocol:
- When CPU 1 writes x = 10
- Invalidate x in all other caches
- Other CPUs know they must read fresh from main memory
- Next access gets value 10 from main memory
- Advantage: Updates only when needed
- Disadvantage: Other CPUs must wait for main memory read
2. Write-Update Protocol:
- When CPU 1 writes x = 10
- Update x in all other caches immediately
- All caches have new value
- No main memory delay for reading
- Advantage: other CPUs read the new value without a main memory access
- Disadvantage: More network traffic
3. Directory-Based Protocol:
- Keep directory of which CPUs have copies of data
- When update happens, only notify interested CPUs
- Reduces network traffic
- More complex to implement
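The write-invalidate idea can be sketched as a toy simulation (the `Memory` and `CPU` classes are hypothetical models for illustration, not real hardware): each CPU keeps a private cache, and a write removes the variable from every other cache, so the next read misses and fetches the fresh value from main memory.

```python
# Toy write-invalidate simulation (illustrative model, not real hardware).

class Memory:
    def __init__(self):
        self.data = {}

class CPU:
    def __init__(self, name, memory, all_cpus):
        self.name = name
        self.memory = memory
        self.cache = {}            # private cache: variable -> value
        self.all_cpus = all_cpus
        all_cpus.append(self)

    def read(self, var):
        if var not in self.cache:                    # cache miss
            self.cache[var] = self.memory.data[var]  # fetch from main memory
        return self.cache[var]

    def write(self, var, value):
        self.cache[var] = value
        self.memory.data[var] = value                # write-through, for simplicity
        for cpu in self.all_cpus:                    # invalidate all other copies
            if cpu is not self:
                cpu.cache.pop(var, None)

mem = Memory()
mem.data["x"] = 5
cpus = []
cpu1, cpu2 = CPU("CPU1", mem, cpus), CPU("CPU2", mem, cpus)

cpu1.read("x")
cpu2.read("x")          # both caches now hold x = 5
cpu1.write("x", 10)     # invalidates x in CPU2's cache
print(cpu2.read("x"))   # CPU2 misses and re-fetches: prints 10, not the stale 5
```

After the write, CPU 2's cached copy is gone, so its next read is forced to go to main memory and picks up the value 10.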
Synchronization Issues
Race Condition Problem
When multiple CPUs access shared data simultaneously, results can be unpredictable.
Example:
Bank account with balance = 100
CPU 1: Withdraw 50
CPU 2: Deposit 30
Without synchronization:
Both read value 100
CPU 1 calculates: 100 - 50 = 50, writes 50
CPU 2 calculates: 100 + 30 = 130, writes 130
Final balance: 130, or 50 if CPU 1 happens to write last (should be 80!)
Result depends on timing - unpredictable!
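The lost update above can be reproduced deterministically by simulating that interleaving step by step (a sketch; the variable names are illustrative):

```python
# Simulate the unsynchronized interleaving: both CPUs read before either writes.
balance = 100

cpu1_read = balance          # CPU 1 reads 100
cpu2_read = balance          # CPU 2 also reads 100 (before CPU 1 writes!)

balance = cpu1_read - 50     # CPU 1 writes 50
balance = cpu2_read + 30     # CPU 2 overwrites with 130 -- the withdrawal is lost

print(balance)               # 130, not the correct 80
```

Because both CPUs read the old value of 100 before either wrote, one update silently overwrites the other.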
Solution: Mutual Exclusion
Only one CPU can access shared data at a time using locks.
CPU 1: LOCK (balance)
CPU 1: Read 100
CPU 1: Calculate 100 - 50 = 50
CPU 1: Write 50
CPU 1: UNLOCK
CPU 2: LOCK (balance) - had to wait
CPU 2: Read 50 (new value!)
CPU 2: Calculate 50 + 30 = 80
CPU 2: Write 80
CPU 2: UNLOCK
Final balance: 80 (correct!)
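With real threads, the same transfer becomes safe once every read-modify-write of the balance happens inside a lock. A minimal sketch using Python's threading module (the module-level `balance` variable and the two helper functions are introduced for illustration):

```python
import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:                  # only one thread may touch balance at a time
        current = balance
        balance = current - amount

def deposit(amount):
    global balance
    with lock:
        current = balance
        balance = current + amount

t1 = threading.Thread(target=withdraw, args=(50,))
t2 = threading.Thread(target=deposit, args=(30,))
t1.start(); t2.start()
t1.join(); t2.join()

print(balance)                  # always 80, regardless of scheduling order
```

Whichever thread acquires the lock second is forced to wait and then reads the other thread's result, exactly as in the LOCK/UNLOCK trace above.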
Load Balancing
Purpose: Keep all CPUs equally busy
Static Load Balancing
- Divide work before execution
- Each CPU gets assigned portion
- Some CPUs may finish early while others still working
- Less overhead but less flexible
Dynamic Load Balancing
- OS monitors CPU utilization continuously
- Moves tasks from busy CPUs to idle CPUs
- Keeps all CPUs equally loaded
- More complex but better utilization
- Overhead of moving tasks
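One simple dynamic strategy is greedy assignment: each incoming task goes to whichever CPU currently has the least work. A sketch (the task costs and CPU count are made-up example numbers):

```python
import heapq

def balance_dynamic(task_costs, n_cpus):
    """Assign each task to the currently least-loaded CPU (greedy strategy)."""
    loads = [(0, cpu) for cpu in range(n_cpus)]   # (current load, cpu id)
    heapq.heapify(loads)
    assignment = {cpu: [] for cpu in range(n_cpus)}
    for cost in task_costs:
        load, cpu = heapq.heappop(loads)          # least-loaded CPU right now
        assignment[cpu].append(cost)
        heapq.heappush(loads, (load + cost, cpu))
    return assignment

tasks = [7, 3, 5, 2, 8, 1]
plan = balance_dynamic(tasks, 2)
print(plan)   # cpu 0 gets [7, 2, 1] (load 10), cpu 1 gets [3, 5, 8] (load 16)
```

Note that greedy assignment is not always optimal (here one CPU still ends up with more work), which is why real schedulers also migrate tasks after assignment, at the cost of extra overhead.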
Scheduling in Parallel Systems
Challenges
- Which CPU gets which task?
- How to balance load?
- When to migrate tasks?
- Minimize context switching
- Maintain fairness
Scheduling Approaches
- Gang Scheduling: All threads of job run together
- Space Sharing: Divide processors among jobs
- Time Sharing: CPU time-sliced among jobs
- Work Stealing: Idle CPU takes work from busy CPU
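Work stealing can be sketched with one queue per CPU: a CPU takes tasks from the front of its own queue, and when idle it steals from the back of the busiest queue. This is a simplified single-threaded model of the idea, not a real scheduler:

```python
from collections import deque

def run_with_stealing(queues):
    """Each CPU runs its own tasks; an idle CPU steals from the busiest queue."""
    executed = {cpu: [] for cpu in queues}
    while any(queues.values()):
        for cpu, q in queues.items():
            if q:
                executed[cpu].append(q.popleft())   # run a task from own queue
            else:
                victim = max(queues, key=lambda c: len(queues[c]))
                if queues[victim]:
                    # steal from the *back* of the victim's queue
                    executed[cpu].append(queues[victim].pop())
    return executed

queues = {0: deque(["a", "b", "c", "d"]), 1: deque()}
done = run_with_stealing(queues)
print(done)   # {0: ['a', 'b'], 1: ['d', 'c']} -- CPU 1 stole half the work
```

Stealing from the opposite end of the victim's queue is the classic design choice: it reduces contention with the victim, which keeps working at the front.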
Advantages of Parallel Systems
1. Increased Speed
- Multiple processors working together
- 4 CPUs ≈ 4x faster, 8 CPUs ≈ 8x faster (ideally)
2. Reliability and Fault Tolerance
- System continues if one CPU fails
- Degraded performance but still working
- Unlike a single-processor system, where one failure stops everything
3. Scalability
- More processors can be added to increase power
- System grows with demand
- Flexible architecture
4. Efficient Resource Use
- No idle processors
- All resources utilized
- No bottlenecks (ideally)
5. Better Throughput
- More work completed per unit time
- Higher productivity
- Better utilization
Disadvantages of Parallel Systems
1. Very Complex to Design
- Difficult to program correctly
- Race conditions possible
- Deadlocks can occur
- Synchronization overhead
2. Cache Coherence Overhead
- Keeping caches consistent costs time
- Coherence protocol traffic increases bus load
- Slows down access to shared memory
3. Synchronization Overhead
- Locks and synchronization mechanisms cost time
- Too much synchronization = nearly sequential execution
- Too little = race conditions
- Finding the balance is difficult
4. Limited Scalability
- Can't keep adding CPUs forever
- Bus bandwidth and memory bandwidth are limited
- Practical limit for bus-based SMP: roughly 32-64 CPUs
5. Expensive
- Multiple processors = higher cost
- Complex design = more development cost
- Not affordable for all systems
6. Debugging Difficulty
- Hard to reproduce problems
- Timing-dependent bugs
- Difficult to find and fix issues
Modern Parallel Systems
Multi-Core Processors
Today’s computers have multiple cores on single chip:
| System Type | Typical Cores |
|---|---|
| Budget PC | 2-4 cores |
| Mid-range | 6-8 cores |
| High-end | 12-16 cores |
| Gaming PC | 8-10 cores |
| Servers | 32-64+ cores |
| Supercomputers | 10,000+ cores |
Example: Intel Core i7-12700K has 12 cores (8 performance + 4 efficiency cores)
Graphics Processing Units (GPUs)
GPUs are extreme parallel processors:
- CPU: 8-16 cores typical
- GPU: 1,000-10,000+ cores!
Why so many?
- Each core handles one pixel or data element
- Process thousands of elements simultaneously
- Perfect for graphics and parallel algorithms
- Used for AI and machine learning
Performance Analysis
Speedup Calculation
Speedup = Time with 1 CPU / Time with N CPUs
Ideal: Speedup = N (linear)
Reality: Speedup < N (due to overhead)
Amdahl’s Law (Important Limitation!)
Speedup = 1 / [f + (1-f)/N]
Where f = fraction of code that must run sequentially
What this means:
- If 10% of the code must be sequential (f = 0.1):
- With 4 CPUs: 3.08x speedup (not 4x!)
- With 100 CPUs: about 9.2x speedup (not 100x!), and never more than 1/f = 10x no matter how many CPUs are added
Implication: Some problems can’t benefit from many CPUs because sequential parts become bottleneck.
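Plugging the numbers into the formula confirms the limit:

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: f = sequential fraction, n = number of CPUs."""
    return 1 / (f + (1 - f) / n)

for n in (4, 100, 1_000_000):
    print(f"{n} CPUs: {amdahl_speedup(0.1, n):.2f}x")
# 4 CPUs give 3.08x, 100 CPUs give 9.17x, and even a million CPUs
# approach but never exceed the 1/f = 10x ceiling
```

No matter how large n grows, the (1-f)/n term vanishes and the speedup approaches 1/f, so the sequential 10% caps the speedup at 10x.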
Real-World Applications
Scientific Computing
- Weather simulation
- Molecular dynamics
- Climate modeling
- Use supercomputers with thousands of CPUs
Artificial Intelligence and Machine Learning
- Training neural networks
- GPU acceleration
- Process millions of parameters
Gaming
- Multi-threaded rendering
- Better graphics
- Physics calculations
- Smoother gameplay
Video Processing
- Encoding videos (multiple frames in parallel)
- Each CPU encodes different frame
- 4 CPUs ≈ 4x faster video encoding (ideally)
Data Analysis
- Process large datasets
- Distribute analysis across CPUs
- Big data processing
Servers
- Web servers with many cores
- Handle many requests simultaneously
- Database servers for queries
Important Concepts
Context Switch
Saving current CPU state and loading next task’s state.
Cache Line
Unit of memory that moves between caches and main memory together.
False Sharing
Two CPUs modifying different variables on same cache line, causing unnecessary cache traffic.
Atomic Operation
Operation that cannot be interrupted - completes fully or not at all.
Deadlock
Circular wait where CPUs wait for each other indefinitely.
Exam Important Points
- Define parallel system
- Difference between SMP and AMP
- Cache coherence problem and solutions
- Synchronization issues and mutual exclusion
- Load balancing (static vs dynamic)
- Advantages and disadvantages
- Speedup and Amdahl’s Law
- Modern multi-core systems
- GPU as parallel system
- Applications of parallel systems