Distributed Systems

Mino
August 16, 2024
knowledge_camp

Introduction to Distributed Computing

Distributed Computing

The term "distributed" as used in contexts such as "distributed system," "distributed programming," and "distributed algorithm" originally described networks where computers were physically spread across geographical locations. Nowadays, this term broadly covers autonomous processes, even those running on the same physical computer, which interact via message passing.

Definition and Key Properties

While a unified definition of a distributed system remains elusive, certain properties are commonly accepted:

  • Autonomy: The system consists of multiple computational entities (either "computers" or "nodes"), each with its own local memory.
  • Communication: These entities communicate through message passing.

Distributed systems may aim to solve large computational problems, appearing as a single unit to the user. Alternatively, they might manage shared resources or provide communication services, with each computer serving different user needs.
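
To make these two properties concrete, here is a minimal sketch in Python: two autonomous processes, each with its own local memory, interacting only by message passing over a pipe. (The worker's behavior and the "stop" sentinel are illustrative assumptions, not part of any standard.)

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Autonomy: this counter lives in the worker's own local memory;
    # the other process cannot read or write it.
    local_counter = 0
    while True:
        msg = conn.recv()                 # communication: receive a message
        if msg == "stop":                 # "stop" is an assumed sentinel
            break
        local_counter += 1
        conn.send(f"ack {local_counter}: {msg}")   # reply with a message

if __name__ == "__main__":
    parent_end, child_end = Pipe()        # a two-way message channel
    p = Process(target=worker, args=(child_end,))
    p.start()
    for text in ["hello", "world"]:
        parent_end.send(text)
        print(parent_end.recv())          # ack 1: hello / ack 2: world
    parent_end.send("stop")
    p.join()
```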

Characteristics of Distributed Systems

Distributed systems are designed to handle several complexities:

  • Fault Tolerance: The system must continue to function despite the failure of individual computers.
  • Dynamic Structure: The network configuration, including topology and latency, often changes dynamically during operation. The system might include diverse computer types and network links.
  • Limited Perspective: Each computer typically has a partial view of the system's overall state.

Parallel vs. Distributed Computing

The terms "distributed computing," "parallel computing," and "concurrent computing" overlap significantly:

  • Parallel Computing: Processors may access shared memory to exchange information.
  • Distributed Computing: Each processor operates independently with its own memory, communicating through message passing.

The accompanying figure differentiates between distributed (Figures a and b) and parallel (Figure c) systems. Figure a provides a network topology view of a distributed system, while Figure b zooms in to show individual computers linked by communication channels. Figure c illustrates a parallel system where all processors share memory.
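
A rough illustration of the distinction in code, under the assumption that Python's multiprocessing primitives stand in for "processors": the first worker coordinates through a shared counter (parallel style), while the second keeps its state private and communicates only by sending messages (distributed style).

```python
from multiprocessing import Process, Queue, Value

def shared_memory_worker(total, n):
    # Parallel-computing style: processors exchange information by
    # reading and writing a shared memory location.
    with total.get_lock():
        total.value += n

def message_passing_worker(outbox, n):
    # Distributed-computing style: each processor has only its own
    # memory and communicates by sending messages.
    outbox.put(n)

if __name__ == "__main__":
    total = Value("i", 0)                          # shared memory
    workers = [Process(target=shared_memory_worker, args=(total, k))
               for k in (1, 2, 3)]
    for w in workers: w.start()
    for w in workers: w.join()
    print("shared-memory sum:", total.value)       # 6

    outbox = Queue()                               # message channel
    workers = [Process(target=message_passing_worker, args=(outbox, k))
               for k in (1, 2, 3)]
    for w in workers: w.start()
    print("message-passing sum:", sum(outbox.get() for _ in range(3)))  # 6
    for w in workers: w.join()
```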

Historical Context

The use of concurrent, message-passing processes has its roots in operating-system research of the 1960s. ARPANET, a precursor to the Internet, was introduced in the late 1960s, and its e-mail system, developed in the early 1970s, became one of the first successful large-scale distributed applications. The proliferation of distributed systems was furthered by local-area networks such as Ethernet, invented in the 1970s, and later by networks like Usenet and FidoNet, which supported distributed applications from the 1980s. The formal study of distributed computing emerged as a distinct branch of computer science by the early 1980s, marked by the establishment of dedicated conferences such as the Symposium on Principles of Distributed Computing (PODC) and the International Symposium on Distributed Computing (DISC).

For more detailed discussion, see the literature on distributed algorithms and distributed systems.

Architectures in Distributed Computing

Distributed computing involves various hardware and software architectures. All of them require interconnecting multiple CPUs through a network, whether that network is tightly integrated or made up of loosely coupled devices and cables.

Types of Architectures

The fundamental architectures of distributed systems are primarily categorized by resource sharing:

  • Shared Memory: All CPUs have access to a common memory pool.
  • Shared Disk: CPUs share the same disk resources but operate independent memories.
  • Shared Nothing: Each CPU operates independently without sharing memory or disk resources.

Distributed programming often adopts specific architectural frameworks:

  • Client–Server: This model involves "smart" clients that request data from servers, process it locally, and manage the display; changes made by clients are sent back to the server as updates. A minimal socket-based sketch appears after this list.
  • Three-Tier: This structure shifts client-side logic to a middle tier, allowing for stateless clients which simplifies deployment. It is prevalent in web applications.
  • N-Tier: These architectures extend web applications by forwarding requests to additional enterprise services; this model was a key factor in the rise of application servers.
  • Peer-to-Peer: In this architecture, all nodes, or peers, equally share the responsibilities of services and resource management. Notable examples include BitTorrent and the Bitcoin network.
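
As a hedged sketch of the client–server model, the following Python example runs a trivial server and client over TCP sockets; the address, port, and "uppercase" service are assumptions made for illustration, not features of any particular system.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 50007        # assumed address for the example
ready = threading.Event()

def serve_once():
    # Server role: wait for one request, apply the "service", reply.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        ready.set()                     # signal that the server is listening
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)
            conn.sendall(request.upper())   # the assumed "uppercase" service

if __name__ == "__main__":
    threading.Thread(target=serve_once, daemon=True).start()
    ready.wait()
    # Client role: send a request and display the server's response.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"hello from the client")
        print(cli.recv(1024).decode())  # HELLO FROM THE CLIENT
```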

Communication Methods

Distributed systems also vary in their method of communication among concurrent processes:

  • Direct inter-process communication is common, using various message passing protocols.
  • A Database-Centric Architecture enables distributed computing without direct inter-process communication: processes coordinate by reading from and writing to a shared database, as sketched below.
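
A compact sketch of the database-centric style, assuming a local SQLite database and a hypothetical tasks table: the producer and consumer never exchange messages directly, coordinating only through the shared database. In a real deployment they would be separate processes, possibly on different machines, sharing a networked database.

```python
import sqlite3

db = sqlite3.connect(":memory:")       # stands in for a shared, networked DB
db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY,"
           " payload TEXT, done INTEGER DEFAULT 0)")

def producer(payload):
    # One component records work in the shared database...
    db.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    db.commit()

def consumer():
    # ...and another discovers and claims it by querying the database,
    # with no direct message ever passing between the two.
    row = db.execute("SELECT id, payload FROM tasks"
                     " WHERE done = 0 LIMIT 1").fetchone()
    if row is None:
        return None
    task_id, payload = row
    db.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    db.commit()
    return payload

producer("resize image 42")            # hypothetical task payload
print(consumer())                      # resize image 42
```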

Applications of Distributed Systems

The deployment of distributed systems is often driven by specific needs and advantages:

  • Necessity: Some applications inherently require a distributed framework, such as data generated in one location but needed in another.
  • Benefits:
    • Greater storage capacity, faster processing, and higher bandwidth than single computers.
    • Enhanced reliability through redundancy, reducing the risk of total system failure.
    • Scalability and easier management compared to centralized systems.
    • Cost efficiency by utilizing clusters of low-end computers instead of a single high-end machine.

Examples of Distributed Systems

Distributed computing is integral to a vast array of systems and sectors:

  • Telecommunications: Includes telephone and cellular networks, the Internet, and wireless sensor networks.
  • Network Applications: Encompasses the World Wide Web, peer-to-peer networks, massively multiplayer online games, virtual reality communities, distributed databases and file systems, and distributed cache systems.
  • Real-Time Process Control: Used in aircraft control systems and industrial control mechanisms.
  • Parallel Computation: Covers scientific computing through cluster, grid, and cloud computing, and is pivotal in distributed rendering for computer graphics and various volunteer computing initiatives.

These architectures and applications illustrate the diversity and complexity of distributed computing systems, reflecting their critical role in modern technology and industry.

Theoretical Foundations of Distributed Computing

Distributed computing involves solving computational problems across multiple computers or processes. This field intersects with theories of computability and computational complexity, which explore what problems can be solved and the efficiency of these solutions.

Computational Problems

In theoretical computer science, computational problems are defined by instances (questions) and their corresponding solutions. Problems suitable for automation by computers often follow this question-answer format.

Key Theories

Computational problems are considered solvable if an algorithm can produce a correct solution for any instance. Such algorithms can be implemented as programs that run on a general-purpose computer, which reads a problem instance as input and produces a solution as output. Models like random-access machines or universal Turing machines serve as abstract models for these computations. For more on these models, see Formal Models of Computation.

Concurrent and Distributed Computing

This domain extends the discussion to systems involving multiple computers or concurrent processes on a single machine, exploring the solvability and efficiency of problems in these contexts.

Common Viewpoints in Distributed Computing:

1. Parallel Algorithms in Shared-Memory Model:

  • Access: All processors access a shared memory.
  • Model: Utilizes parallel random-access machines (PRAM). For asynchronous models, see Asynchronous Shared Memory Systems.
  • Extension: Shared-memory programs can be extended to distributed systems if the operating system encapsulates communication between nodes and presents the memory of all machines as a single unified space (distributed shared memory). A rough analogy in code follows.
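
As a loose analogy (not a faithful PRAM, which is a synchronous abstract machine), the sketch below has all workers read and write a single shared array, halving the problem each round as in the classic logarithmic-depth parallel sum.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data):
    a = list(data)                     # the shared memory
    n = len(a)

    with ThreadPoolExecutor() as pool:
        step = 1
        while step < n:
            def combine(i):
                # "Processor" i reads two shared cells and writes one back.
                a[i] = a[i] + a[i + step]
            # One synchronous round: all active processors run in parallel,
            # and the round completes before the next one begins.
            list(pool.map(combine, range(0, n - step, 2 * step)))
            step *= 2
    return a[0]

print(parallel_sum(range(1, 9)))       # 36, computed in log2(8) = 3 rounds
```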

2. Parallel Algorithms in Message-Passing Model:

  • Design: The algorithm designer specifies the network structure and each computer's program.
  • Usage: Implements models like Boolean circuits and sorting networks, where computational elements function as network nodes. For more on Boolean circuits, see Boolean Circuit Complexity. A small sorting-network example follows.
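
A small sketch of the sorting-network idea: the "network" is a fixed sequence of compare–exchange elements, here the standard five-comparator network for four inputs, with each comparator acting as a computational element.

```python
# The standard five-comparator sorting network for four inputs.
COMPARATORS = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def sort4(values):
    wires = list(values)
    for i, j in COMPARATORS:
        # Each comparator is a fixed computational element: it outputs the
        # minimum on its upper wire and the maximum on its lower wire.
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires

print(sort4([3, 1, 4, 2]))             # [1, 2, 3, 4]
```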

3. Distributed Algorithms in Message-Passing Model:

  • Uniformity: All computers run the same program, and the system must work correctly regardless of the structure of the network.
  • Graphical Models: Often uses graphs in which each node represents a computer in the network; a simulated round-based sketch follows. For more on graph-based models in distributed algorithms, see Graph Theory and Distributed Algorithms.
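
The following simulation sketches this viewpoint under illustrative assumptions (the sample graph and the flooding rule are chosen for the example): every node runs the same local rule, sees only its neighbors' announcements, and learns its hop distance from a source after a few synchronous rounds.

```python
import math

# node -> neighbors; an assumed example topology.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
dist = {v: 0 if v == 0 else math.inf for v in graph}   # node 0 is the source

changed = True
while changed:                          # one iteration = one synchronous round
    changed = False
    announced = dict(dist)              # what each node said last round
    for v in graph:
        # Every node runs the same rule, seeing only its neighbors:
        # my distance is at most one more than my closest neighbor's.
        best = min(announced[u] + 1 for u in graph[v])
        if best < dist[v]:
            dist[v] = best
            changed = True

print(dist)                             # {0: 0, 1: 1, 2: 1, 3: 2}
```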

Applications

In distributed computing, many computational problems are inherently graph-related, reflecting the structure and dynamics of the underlying computer network. Often the graph that describes the network topology is itself the problem instance.

Further Reading

See Distributed Algorithms and Systems for a fuller treatment of the theoretical underpinnings and practical applications of distributed algorithms.

Properties of Distributed Systems

Research on distributed systems not only involves designing systems to solve specific problems but also studying the inherent properties of these systems.

Understanding System Behavior

Centralized computing has the halting problem: deciding whether a given program will halt or run indefinitely is undecidable in general. This highlights how hard it is to predict the behavior of even a single computer, and the difficulty only grows in distributed networks of interacting machines.

Special Cases and Decidability

Despite the overarching challenge, certain special cases within distributed systems can be decided. Notably, the behavior of networks consisting of finite-state machines offers intriguing possibilities:

  • Problem Example: Determining whether a network of asynchronous and non-deterministic finite-state machines might reach a deadlock.
  • Complexity: This problem is PSPACE-complete: it is decidable, but finding an efficient solution (whether centralized, parallel, or distributed) for large networks remains unlikely. A brute-force search sketch appears below.
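
As an illustrative (and deliberately brute-force) sketch, the following Python program searches the joint state space of a small network of finite-state machines that synchronize on shared action labels; the two "process" machines acquire two "lock" machines in opposite orders, so a deadlock is reachable. The machines and labels are assumptions for the example; the joint state space grows exponentially with the number of machines, consistent with the problem's PSPACE-completeness.

```python
from itertools import chain

# Each machine: (initial state, {(state, action): next_state}). Machines
# synchronize CSP-style: an action fires only if every machine whose
# alphabet contains it can take it. A and B grab the locks in opposite order.
MACHINES = {
    "A":     ("a0", {("a0", "acq1A"): "a1", ("a1", "acq2A"): "a2",
                     ("a2", "rel1A"): "a3", ("a3", "rel2A"): "a0"}),
    "B":     ("b0", {("b0", "acq2B"): "b1", ("b1", "acq1B"): "b2",
                     ("b2", "rel2B"): "b3", ("b3", "rel1B"): "b0"}),
    "lock1": ("free", {("free", "acq1A"): "A", ("A", "rel1A"): "free",
                       ("free", "acq1B"): "B", ("B", "rel1B"): "free"}),
    "lock2": ("free", {("free", "acq2A"): "A", ("A", "rel2A"): "free",
                       ("free", "acq2B"): "B", ("B", "rel2B"): "free"}),
}
ALPHABET = {m: {a for (_, a) in trans} for m, (_, trans) in MACHINES.items()}
ACTIONS = set(chain.from_iterable(ALPHABET.values()))

def enabled(state, action):
    # An action is enabled iff every machine that knows it can take it.
    return all(action not in ALPHABET[m]
               or (state[i], action) in MACHINES[m][1]
               for i, m in enumerate(MACHINES))

def step(state, action):
    # Machines that know the action move; the others stay put.
    return tuple(MACHINES[m][1].get((state[i], action), state[i])
                 for i, m in enumerate(MACHINES))

# Depth-first search of the joint state space, looking for a reachable
# state in which no action is enabled (a deadlock).
start = tuple(init for init, _ in MACHINES.values())
seen, frontier = {start}, [start]
while frontier:
    state = frontier.pop()
    moves = [a for a in ACTIONS if enabled(state, a)]
    if not moves:
        print("deadlock reachable:", dict(zip(MACHINES, state)))
        break
    for a in moves:
        nxt = step(state, a)
        if nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)
else:
    print("no deadlock reachable")
```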

Further Reading

For more on the theoretical challenges and properties of distributed systems, see detailed studies such as those found in the Journal of Physics: Conference Series, which discuss various aspects of complex computing environments.