Definition
A distributed system is a collection of independent computers, connected through a network, that communicate and coordinate with each other to achieve a common goal. To the user, the collection appears as a single coherent system, even though the computers may be geographically dispersed and each has its own processor, memory, and clock.
Basic Concept
In a distributed system:
- Multiple independent computers (nodes) with separate memories and processors
- Connected via network (LAN, WAN, Internet)
- No shared memory - only message passing communication
- No shared clock - each computer has own clock
- Autonomous operation - each can work independently
- Transparent to user - appears as single system
- Loosely coupled - computers operate independently
Key Differences from Parallel Systems
| Aspect | Parallel | Distributed |
|---|---|---|
| Memory | Shared | Separate |
| Coupling | Tight | Loose |
| Communication | Shared memory | Network messages |
| Clock | Synchronized | Independent |
| Location | Same place | Different locations |
| Failure | Affects others | May be isolated |
System Architectures
1. Client-Server Architecture
Structure:
- One or more powerful server computers providing services
- Multiple client computers requesting services
- Centralized resource management
How it works:
- Client sends request to server
- Server processes request
- Server sends response back to client
- Client receives response
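The four steps above can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production server: the function names (`handle_one_request`, `send_request`, `run_demo`) are hypothetical, the "processing" is just an echo, and both sides run on the loopback address of one machine.

```python
import socket
import threading


def handle_one_request(server_sock: socket.socket) -> None:
    """Server side: accept one client, read its request, send a response."""
    conn, _addr = server_sock.accept()
    with conn:
        # A single recv is enough for this tiny message; real servers loop.
        request = conn.recv(1024).decode("utf-8")
        response = f"echo: {request}"  # stand-in for real request processing
        conn.sendall(response.encode("utf-8"))


def send_request(host: str, port: int, message: str) -> str:
    """Client side: send a request, then block until the response arrives."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def run_demo() -> str:
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(
            target=handle_one_request, args=(server_sock,), daemon=True
        )
        worker.start()
        reply = send_request("127.0.0.1", port, "hello")
        worker.join(timeout=5)
    return reply
```

Calling `run_demo()` starts the server in a background thread, sends one request from the client, and returns the server's reply.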
Characteristics:
- Clear separation of roles
- Server is centralized
- Multiple clients
- Request-response communication (often synchronous: the client waits for the reply)
Examples:
- Web servers (client = browser, server = web server)
- Email systems (client = email client, server = mail server)
- Database servers
- File servers
Advantages:
- Centralized management
- Easy to understand
- Security at server
- Easy to scale clients
Disadvantages:
- Server can become a bottleneck under heavy load
- Single point of failure: a server failure affects all clients
2. Peer-to-Peer (P2P) Architecture
Structure:
- All nodes are equal peers
- No central server
- Each node acts as both client and server
- Direct communication between peers
How it works:
- Peer A requests data from Peer B
- Peer B provides data
- Peer B requests data from Peer C
- All peers cooperate
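The exchange above can be modelled in memory. The sketch below is an assumption-laden toy: the `Peer` class and its `serve`/`fetch` methods are hypothetical names, and direct method calls stand in for network messages. It shows the key idea that every node plays both the server role and the client role.

```python
class Peer:
    """A node that serves the chunks it holds and requests the ones it lacks."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}      # chunk index -> data held by this peer
        self.neighbors = []   # other peers this peer can contact directly

    def serve(self, index):
        """Server role: return the chunk if this peer has it, else None."""
        return self.chunks.get(index)

    def fetch(self, index):
        """Client role: ask each neighbor in turn until one has the chunk."""
        for peer in self.neighbors:
            data = peer.serve(index)
            if data is not None:
                self.chunks[index] = data  # this peer can now serve it too
                return data
        return None


peer_a, peer_b, peer_c = Peer("A"), Peer("B"), Peer("C")
peer_b.chunks[0] = "chunk-0"
peer_c.chunks[1] = "chunk-1"
peer_a.neighbors = [peer_b]
peer_b.neighbors = [peer_c]

from_b = peer_a.fetch(0)         # Peer A requests data from Peer B
from_c = peer_b.fetch(1)         # Peer B requests data from Peer C
relayed = peer_a.fetch(1)        # B now serves a chunk it obtained from C
```

After the last call, Peer A holds both chunks even though it never contacted Peer C, which is how load spreads across peers.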
Characteristics:
- No hierarchy
- Decentralized
- Shared resources
- Self-organizing
Examples:
- BitTorrent (file sharing)
- Blockchain (cryptocurrency)
- Skype (VoIP; early versions used a P2P design)
- Instant messaging
Advantages:
- Highly scalable
- No single point of failure
- Better load distribution
- Robust and resilient
Disadvantages:
- More complex
- Harder to control quality
- Security challenges
- Harder to implement
3. Hybrid Architecture
Structure:
- Combines client-server with P2P
- Some centralized components
- Some distributed components
Example:
- Torrent downloading (P2P) with tracker server (client-server)
Communication in Distributed Systems
Message Passing
Message passing is the only way for computers without shared memory to communicate.
Components:
- Sender: Computer initiating communication
- Message: Data to be transmitted
- Network: Medium of transmission
- Receiver: Computer receiving message
Communication Types
Synchronous Communication:
- Sender waits for receiver response
- Sender blocked until response arrives
- Simple but slower
Asynchronous Communication:
- Sender doesn’t wait for response
- Sender continues working
- Faster but complex coordination
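The difference between the two styles can be sketched with `concurrent.futures` from the standard library. Here `remote_call` is a hypothetical stand-in for a network request, and the short `time.sleep` is an assumed substitute for network delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_call(x):
    """Hypothetical remote operation; the sleep stands in for network delay."""
    time.sleep(0.05)
    return x * 2


def call_synchronously(values):
    # The sender is blocked inside each call until its response arrives.
    return [remote_call(v) for v in values]


def call_asynchronously(values):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a Future immediately; the sender is not blocked.
        futures = [pool.submit(remote_call, v) for v in values]
        # The sender could do other work here before collecting the results.
        return [f.result() for f in futures]
```

Both functions return the same results. The asynchronous version overlaps the waiting time of the calls, at the cost of having to track futures and collect their results later.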
Reliable Communication:
- Messages guaranteed to arrive
- Example: TCP (reliability comes from TCP, which runs on top of IP)
- Slower due to acknowledgments
Unreliable Communication:
- Messages may be lost
- UDP protocol
- Faster but less safe
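A reliable service is typically built on top of an unreliable one by adding sequence numbers, acknowledgments, and retransmission. The sketch below simulates this with a stop-and-wait scheme. It is a simplified teaching model, not how TCP is actually implemented, and `LossyChannel`, `send_unreliable`, and `send_reliable` are hypothetical names.

```python
import random


class LossyChannel:
    """Simulated network link that drops each packet with a fixed probability."""

    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def deliver(self, packet):
        """Return the packet, or None if the link dropped it."""
        return None if self.rng.random() < self.loss_rate else packet


def send_unreliable(channel, messages):
    """Send each message once, with no acknowledgment (UDP-like)."""
    received = []
    for message in messages:
        packet = channel.deliver(message)
        if packet is not None:
            received.append(packet)
    return received


def send_reliable(channel, messages, max_tries=50):
    """Stop-and-wait: resend each message until its acknowledgment arrives."""
    received = []
    expected_seq = 0  # receiver state: next sequence number it will accept
    for seq, message in enumerate(messages):
        for _ in range(max_tries):
            packet = channel.deliver((seq, message))
            if packet is None:
                continue  # data lost: sender times out and retransmits
            if packet[0] == expected_seq:
                received.append(packet[1])
                expected_seq += 1
            # Duplicates are not stored again, but they are still acknowledged.
            ack = channel.deliver(packet[0])
            if ack is not None:
                break  # sender saw the ack and moves to the next message
        else:
            raise TimeoutError(f"message {seq} was never acknowledged")
    return received
```

The extra round trip for each acknowledgment is the reason reliable delivery is slower than sending once and hoping for the best.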
Network Protocols
TCP:
- Reliable, ordered delivery on top of IP
- Used for email, web, file transfer
- Slower than UDP because of connection setup, acknowledgments, and retransmission
UDP:
- Fast, connectionless delivery with no guarantee of arrival or ordering
- Used for video and audio streaming
- Fast but may lose data
HTTP/HTTPS:
- Web protocol
- Client-server over TCP
- Request-response pattern
RPC (Remote Procedure Call):
- Call functions on remote computer
- Appears like local call
- Abstracts network complexity
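To illustrate how RPC makes a remote call look local, the sketch below uses a client-side stub that serializes the call to JSON and a server-side dispatcher that decodes and executes it. Everything here is a hypothetical toy (`RpcServer`, `RpcStub`, and the `add` procedure are made-up names), and the "network" is an in-process function call that passes bytes.

```python
import json


class RpcServer:
    """Holds registered procedures and executes decoded requests."""

    def __init__(self):
        self._procedures = {}

    def register(self, func):
        self._procedures[func.__name__] = func
        return func

    def handle(self, request_bytes: bytes) -> bytes:
        request = json.loads(request_bytes.decode("utf-8"))
        try:
            result = self._procedures[request["method"]](*request["params"])
            reply = {"result": result, "error": None}
        except Exception as exc:  # report any failure back to the caller
            reply = {"result": None, "error": str(exc)}
        return json.dumps(reply).encode("utf-8")


class RpcStub:
    """Client-side proxy: attribute access turns into a remote call."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes, returning bytes

    def __getattr__(self, method):
        def call(*params):
            request = json.dumps({"method": method, "params": list(params)})
            raw_reply = self._transport(request.encode("utf-8"))
            reply = json.loads(raw_reply.decode("utf-8"))
            if reply["error"] is not None:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return call


server = RpcServer()


@server.register
def add(a, b):
    return a + b


# In a real system the transport would send the bytes over a network.
stub = RpcStub(server.handle)
total = stub.add(2, 3)  # reads like a local call
```

The caller writes `stub.add(2, 3)` and never sees the serialization or the transport, which is the abstraction RPC provides.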
Distributed System Challenges
1. Heterogeneity
Problem: Different hardware, software, networks
- Different processors (Intel, ARM, etc.)
- Different operating systems (Windows, Linux, etc.)
- Different network technologies
- Different file formats and protocols
Solution: Standardized protocols and middleware
- Use standard TCP/IP
- Use standard file formats
- Use middleware to hide differences
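One concrete source of heterogeneity is byte order: different processors may store multi-byte integers differently. A common remedy is to agree on a fixed wire format. The sketch below uses Python's `struct` module with network byte order (`!`). The message layout (a 2-byte type, a 4-byte length, then UTF-8 text) is an assumption made up for this example.

```python
import struct

# "!" selects network (big-endian) byte order with no padding:
# H = 2-byte unsigned message type, I = 4-byte unsigned payload length.
HEADER = struct.Struct("!HI")


def encode(msg_type: int, payload: str) -> bytes:
    """Build a message that any machine can parse, whatever its native order."""
    body = payload.encode("utf-8")
    return HEADER.pack(msg_type, len(body)) + body


def decode(data: bytes):
    """Parse a message produced by encode() back into (type, text)."""
    msg_type, length = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + length]
    return msg_type, body.decode("utf-8")
```

Because both sides follow the same explicit layout, an Intel machine and an ARM machine exchange the same bytes and decode them identically.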
2. Transparency
Goal: Hide distributed nature from users
Types:
- Location Transparency: User doesn’t know where resource is
- Migration Transparency: Resources can move without affecting users
- Replication Transparency: Multiple copies appear as one
- Concurrency Transparency: Multiple users don’t interfere
3. Reliability
Problem: Components may fail
- Computer crashes
- Network disconnection
- Software errors
- Data corruption
Solutions:
- Redundancy: Duplicate computers and data
- Replication: Multiple copies of data
- Backup: Regular backups
- Error Detection: Find and fix errors
4. Synchronization and Ordering
Problem: No shared clock, so ordering is difficult
- Which event happened first?
- Need to coordinate actions
Solutions:
- Logical Clocks: Order based on causality
- Vector Clocks: Track causality relationships
- Physical Clock Synchronization: Use NTP to keep clocks approximately in step (small differences remain)
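A minimal sketch of the first two solutions follows. The `LamportClock` class implements the usual logical clock rules (increment on every event, take the maximum plus one on receive), and `happened_before` compares two vector clocks. The class and function names are hypothetical, and the vector clocks are assumed to be plain lists of equal length with one entry per process.

```python
class LamportClock:
    """Logical clock: orders events by causality, not by wall-clock time."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Jump ahead of the sender's timestamp so the receive is ordered after it."""
        self.time = max(self.time, message_time) + 1
        return self.time


def happened_before(v1, v2):
    """Vector clocks: v1 -> v2 if v1 <= v2 in every entry and they differ."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2


clock_a, clock_b = LamportClock(), LamportClock()
clock_a.local_event()                    # A's clock: 1
sent_at = clock_a.send()                 # A's clock: 2
clock_b.local_event()                    # B's clock: 1
received_at = clock_b.receive(sent_at)   # B's clock: max(1, 2) + 1 = 3
```

With vector clocks, two events are concurrent when neither happened before the other, for example `[1, 0]` and `[0, 1]`. A single Lamport timestamp cannot express that distinction.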
5. Security
Problems:
- Network eavesdropping
- Unauthorized access
- Data corruption
- Denial of service attacks
Solutions:
- Encryption: Scramble data in transit
- Authentication: Verify user identity
- Authorization: Control access rights
- Firewall: Filter network traffic
- Intrusion Detection: Detect attacks
6. Fault Tolerance and Availability
Problem: Failures happen in distributed systems
Availability: Percentage of time system is operational
- 99% availability = ~3.65 days downtime per year
- 99.9% availability = ~8.8 hours downtime per year
- 99.99% availability = ~53 minutes downtime per year
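Downtime for a given availability can be worked out by multiplying the unavailable fraction by the hours in a year. The short sketch below does that arithmetic, assuming a 365-day year. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_hours_per_year(availability_percent):
    """Hours per year the system is allowed to be down at this availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


two_nines = downtime_hours_per_year(99)       # about 87.6 hours (3.65 days)
three_nines = downtime_hours_per_year(99.9)   # about 8.76 hours
four_nines = downtime_hours_per_year(99.99)   # about 0.876 hours (52.6 minutes)
```

Each additional "nine" cuts the permitted downtime by a factor of ten.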
Fault Tolerance: System continues despite failures
- Replication: Multiple copies continue service
- Redundancy: Backup systems take over
- Graceful Degradation: Reduced service continues
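Failover across replicas can be sketched as trying each copy in turn until one answers. The `DataReplica` class and `read_with_failover` function are hypothetical names, and a crashed node is simulated with an `alive` flag rather than a real network failure.

```python
class DataReplica:
    """One copy of the data; alive=False simulates a crashed node."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = data
        self.alive = alive

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_with_failover(replicas, key):
    """Return the value from the first replica that responds."""
    last_error = None
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next copy
    raise ConnectionError("all replicas are down") from last_error


primary = DataReplica("primary", {"x": 42}, alive=False)
backup = DataReplica("backup", {"x": 42})
value = read_with_failover([primary, backup], "x")  # served by the backup
```

The caller still gets an answer although the primary is down. The service only fails once every replica is unavailable.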
Distributed System Advantages
- Resource Sharing - Share expensive resources across network
- Cost Effective - Use cheaper computers instead of mainframe
- Reliability - System survives individual failures
- Performance - Distribute processing load
- Scalability - Easy to add more computers
- Flexibility - Can add/remove computers dynamically
- Accessibility - Access resources from anywhere
- Incremental Growth - Grow system gradually
Distributed System Disadvantages
- Complexity - Much more complex than single computer systems
- Difficult Debugging - Hard to diagnose problems across network
- Network Dependency - Depends on reliable network
- Security Issues - More vulnerable to attacks
- Synchronization - Coordinating is difficult
- Data Consistency - Keeping copies identical is hard
- Latency - Network communication is much slower than local memory access
- Overhead - Message passing overhead
- Difficult Testing - Hard to test all scenarios
Real Examples of Distributed Systems
Internet
- Largest distributed system
- Billions of computers
- TCP/IP based
- DNS for name resolution
World Wide Web
- Distributed information system
- HTTP protocol
- Millions of servers
- Client browsers request resources
Cloud Computing
- AWS (Amazon Web Services)
- Google Cloud
- Microsoft Azure
- On-demand computing resources
Email Systems
- Distributed message delivery
- SMTP for sending
- IMAP/POP3 for retrieving
- Messages routed through network
Content Delivery Networks
- Akamai, Cloudflare
- Replicate content across world
- Serve from nearest location
- Reduce latency
Databases
- Distributed databases (MongoDB, Cassandra)
- Data replicated across nodes
- Queries processed by the nodes that hold the relevant data
- Synchronization across replicas
Blockchain and Cryptocurrency
- Bitcoin, Ethereum
- Decentralized ledger
- Every full node keeps a copy of the ledger
- Consensus mechanism
Distributed Operating System Functions
1. Process Management
- Create processes on different computers
- Process migration between computers
- Remote execution of programs
- Load balancing across computers
2. Memory Management
- Distributed memory allocation
- Cache management across network
- Distributed virtual memory
- Page replacement across nodes
3. File Management
- Distributed file systems (NFS, AFS)
- Replicate files across computers
- Consistency maintenance
- Access control
4. Communication Management
- Message passing infrastructure
- Network protocol management
- Data serialization
- Latency hiding
5. Synchronization
- Distributed mutual exclusion
- Deadlock detection
- Clock synchronization
- Concurrency control
Types of Distributed Systems
Cluster Computing
- Multiple computers in same location
- High-speed network connection
- Tightly coupled
- Shared storage
- Example: Supercomputer clusters
Grid Computing
- Computers in different locations
- Variable speed network
- Loosely coupled
- Shared resources
- Example: SETI@home
Cloud Computing
- On-demand computing resources
- Shared infrastructure
- Pay per use
- Elastic scaling
Edge Computing
- Processing at network edge
- Closer to data sources
- Lower latency
- Reduced bandwidth
Important Concepts
Consistency
All copies of a data item hold the same value, so every read sees the most recent write.
Availability
System accessible when needed.
Partition Tolerance
System continues to operate despite a network partition (a failure that leaves groups of nodes unable to reach each other).
CAP Theorem
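A distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once. In practice partitions cannot be ruled out, so when one occurs the system must choose between consistency and availability.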
Eventual Consistency
If no new updates are made, all copies eventually converge to the same value, though not immediately.
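One simple way to reach eventual consistency is for replicas to exchange their state periodically and keep the newest write for each key (last-writer-wins). The sketch below assumes writes carry comparable timestamps supplied by the caller. `LwwReplica` and its methods are hypothetical names.

```python
class LwwReplica:
    """Replica that resolves conflicts by keeping the newest timestamp."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Pull the other replica's entries; older ones are ignored by write()."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


replica_a, replica_b = LwwReplica(), LwwReplica()
replica_a.write("x", "old", timestamp=1)
replica_b.write("x", "new", timestamp=2)

before_sync = (replica_a.read("x"), replica_b.read("x"))  # replicas disagree

replica_a.sync_from(replica_b)
replica_b.sync_from(replica_a)

after_sync = (replica_a.read("x"), replica_b.read("x"))   # replicas agree
```

Between the writes and the sync, a reader may see different values on different replicas. That window is the price paid for not coordinating every write.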
Quorum
Minimum number of nodes that must agree before an operation or decision is accepted, typically a majority.
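The arithmetic behind quorums is small enough to write out. The sketch below assumes `n` replicas, a read quorum of `r` nodes, and a write quorum of `w` nodes. These parameter names and the function names are illustrative choices, not something defined earlier in these notes.

```python
def majority(n):
    """Smallest group size such that any two such groups share a node."""
    return n // 2 + 1


def quorums_overlap(n, r, w):
    """True if every read quorum is guaranteed to include a node from
    the most recent write quorum, which requires r + w > n."""
    return r + w > n
```

For example, with 5 nodes a majority is 3, so the system can keep making decisions with 2 nodes down. With 3 replicas, reading from 2 and writing to 2 guarantees overlap, while reading from 1 and writing to 1 does not.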
Exam Important Points
- Define distributed system
- Difference from parallel systems
- Client-server vs P2P vs hybrid architecture
- Communication in distributed systems
- Main challenges (heterogeneity, transparency, reliability, synchronization, security)
- Advantages and disadvantages
- Real examples (Internet, Web, Cloud, Email)
- Fault tolerance and availability
- CAP theorem
- Distributed system functions