Management of Parallelism
In the abstract world of PRAM computing, we ignore how computation is started, how results are saved, and how processes are scheduled. Computation also proceeds at a predictable, synchronous rate, so we do not consider the many problems of unbalanced load and hardware failures. In practical multicore architecture, there are two extremes. The synchronous cores found in GPU hardware have a single clock driving all the cores, so they all run at the same rate; GPU programs tend to be SIMD-like, so we can expect all cores to do computations of the same size. Thus GPU programs tend to resemble the simple world of PRAM computing, except that operations on memory are constrained; the GPU programming model tends to look more like a Dataflow way of doing things. At the other extreme, cores can be general purpose and have private memory caches. If different cores are doing different things, and their memory caches have different contents, then computation appears to be asynchronous (different processing rates) even if all the cores have a common clock and there is a shared read/write memory. Operating systems manage multiple cores using the same vocabulary developed during the single-core days: abstractions like "processes", "threads", "jobs", "tasks", and "address spaces" simulated a world of asynchronous parallelism, especially when the work often included substantial periods of a CPU waiting on input/output operations.
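The lock-step, SIMD-like style can be sketched in plain Python (not actual GPU code) as every "core" applying the same instruction to its own data element, advancing one synchronous step at a time:

```python
# Toy sketch of the SIMD model: all "cores" execute the same instruction
# in lock step, each on its own data element. (Illustrative only; real
# GPU code would be written in CUDA, OpenCL, or a shader language.)

def simd_step(instruction, lanes):
    """Apply one instruction across all lanes in lock step."""
    return [instruction(x) for x in lanes]

# Eight "cores", each holding one data element.
lanes = list(range(8))

# Every core runs the same two-instruction program: square, then add one.
lanes = simd_step(lambda x: x * x, lanes)
lanes = simd_step(lambda x: x + 1, lanes)

print(lanes)  # [1, 2, 5, 10, 17, 26, 37, 50]
```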
Let's review some of the terminology:
Process - one logical flow of control through a program, typically in its own private address space (made possible through virtual memory or hardware that can separate user memory from operating system kernel memory). A "process" is a heavyweight entity in terms of the overhead needed to track where the process is in the program and what resources and permissions it has. The operating system needs functions that set up memory space for a process, functions that respond to system calls from the process, functions that take care of process failures or unfair use of resources, and functions that allow a process to communicate with device drivers (printers, files, network, keyboard, etc). The life of an operating system consists of transitioning from one process to the kernel, then to another process (see Context switch), back to the kernel, and so on. If there are multiple cores, the story is similar, but there is another level of management where the operating system tries to keep all the cores busy by running multiple processes concurrently. This scheme of having an operating system manage processes works quite well, except when processes are very brief (high overhead for little actual work) and when processes contend for resources, possibly running into trouble when they attempt to share or monopolize resources.
Thread - again, a logical flow of control through a program, but multiple threads can share the same address space. When originally proposed, threads were just an abstraction outside of the operating system, implemented by a threads library. In effect, someone wrote a simulator of concurrent processing, and the operating system was unaware of it. As an interesting aside, the illusion of concurrent processing by threads can be implemented using the programming concepts Coroutine, Switch statement, and State transition table. Because the operating system does not do anything special for threads, threads can have lower overhead than processes. The flip side of this observation is that threads are not so well protected from each other and do not have the same kind of protection, permissions, and failure cleanup that processes have.
Job - there's no widely agreed definition of a job, though we might think of it as a programming "session" which can have multiple steps, each of which might use one or more processes.
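The coroutine-based simulation of threads mentioned above can be sketched in a few lines of Python: each generator is a user-level "thread", and each yield is a voluntary context switch handled by a simple round-robin scheduler. This is a toy sketch of the idea, not a production threads library:

```python
# User-level "threads" via coroutines: the operating system sees only one
# flow of control, yet several logical threads make interleaved progress.
from collections import deque

log = []  # records the interleaving of the simulated threads

def worker(name, steps):
    # One user-level "thread": a coroutine that periodically yields control.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary context switch back to the scheduler

def run(threads):
    # Round-robin scheduler: resume each live coroutine in turn.
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run the thread until its next yield
            ready.append(t)  # still alive, so reschedule it
        except StopIteration:
            pass             # this thread has finished
    return log

print(run([worker("A", 2), worker("B", 3)]))
# ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Note that a thread here keeps control until it chooses to yield; this is cooperative scheduling, in contrast to the preemptive scheduling an operating system applies to processes.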
Whether the definitions sketched above are accurate or not, the takeaway is that operating systems on single computers implement these or similar abstractions, and these concepts are familiar to most programmers.
Management of Distributed Systems
The question arises: how should the resources of a distributed system (a collection of many computers networked together) be managed? At an even bigger scale, what about an enterprise with multiple data centers (and perhaps three hundred thousand server-scale computers per data center)? Will the historical concepts of process, thread, and job be sufficient for this scale of system? The word scale is crucial here. The fundamental concerns at this large scale are the same as for a single-computer operating system: manage the execution of programs, control resources, schedule efficiently and/or fairly (ideally maximizing parallelism while reducing overhead), and deal with failures. The scale of a large distributed system has two main consequences: (1) the algorithms, data structures, software "architecture", and network protocols can't get significantly worse as the distributed system is made to scale up and scale out; and (2) by a simple probability argument, failures of component hardware will be commonplace when a system is large.
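The probability argument can be made concrete with a quick back-of-envelope calculation. The per-machine failure rate below is an assumed, illustrative number, not a measured one:

```python
# Why failures are commonplace at scale: even a small per-machine failure
# probability makes "some failure somewhere" near-certain across a fleet.
# The 0.001 daily failure probability is an assumed illustrative figure.

p_fail = 0.001  # assumed chance that one machine fails on a given day
for n in (1, 100, 10_000, 300_000):
    p_any = 1 - (1 - p_fail) ** n  # chance at least one of n machines fails
    expected = n * p_fail          # expected number of failed machines
    print(f"{n:>7} machines: P(any failure) = {p_any:.3f}, "
          f"expected failures = {expected:.1f}")
```

At data-center scale (the last line of the loop), the probability of at least one failure per day is indistinguishable from 1, so failure handling must be a routine part of the design rather than an exceptional case.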
The "operating system" for a distributed group of machines, possibly spanning multiple data centers, uses familiar concepts. There need to be the same kinds of services that an operating system has: resource allocation, management of jobs, file and database storage. The difference is that the system is not a single program. It is usually designed in terms of servers. There can be a set of servers for starting, stopping, and monitoring jobs. Another set of servers does authentication (passwords and credentials); there are file servers and database servers; there are servers that observe the health of all machines and detect failures. Though it might be simpler to have one server for everything, splitting the responsibility across multiple servers works better for software development. And beyond having a single server for each specialty, it's often useful to run backup servers for crash tolerance.
The architecture of multiple servers described in the previous paragraph leaves out some details. First, the servers themselves run on standard operating systems (Linux, Unix, Windows). The servers implement documented APIs, usually based on REST, with call parameters and responses encoded using XML, JSON, or similar standard data formats. The call itself is thus an HTTP or HTTPS request. Each API has well-known ports that clients connect to for API calls.
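To make the pattern concrete, here is a minimal, self-contained sketch in Python: a tiny HTTP server exposing a JSON API on a port, and a client calling it. The /status endpoint, its payload, and the service name are invented for illustration; a real service documents its own API:

```python
# Sketch of the server pattern: a JSON-over-HTTP API plus a client call.
# The endpoint and payload below are hypothetical, for illustration only.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/status":  # hypothetical health-check endpoint
            body = json.dumps({"service": "demo", "healthy": True}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), StatusHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "client" side: an ordinary HTTP request with a JSON-encoded response.
url = f"http://127.0.0.1:{server.server_port}/status"
reply = json.loads(urlopen(url).read())
print(reply)  # {'service': 'demo', 'healthy': True}
server.shutdown()
```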
Apache YARN is an example of a job/task scheduling service. Here are some references about Apache YARN:
The diagrams in these references show how YARN can be combined with Hadoop and HDFS; YARN can also serve as an "execution engine" for Spark.
Another example is Apache ZooKeeper. This is a highly specialized service for tracking distributed processes. One can think of ZooKeeper as a database of processes, from which one can request information about the status of a process, what its parent process is, and so on.
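ZooKeeper organizes this information as a hierarchy of named nodes, which can be illustrated with a toy in-memory model. This is not the real ZooKeeper client API; the ZNodeTree class and the paths used below are invented for illustration:

```python
# Toy model of a ZooKeeper-style hierarchical namespace: slash-separated
# paths act as nodes holding small pieces of status data, and a client can
# ask for a node's data, its children, or its parent.

class ZNodeTree:
    def __init__(self):
        self.nodes = {"/": None}  # path -> data; root always exists

    def parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=None):
        # Like the real service, a node's parent must already exist.
        assert self.parent(path) in self.nodes, "parent must exist first"
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        return sorted(p for p in self.nodes
                      if p != "/" and self.parent(p) == path)

tree = ZNodeTree()
tree.create("/jobs")
tree.create("/jobs/job1", {"status": "running"})
tree.create("/jobs/job2", {"status": "done"})

print(tree.children("/jobs"))    # ['/jobs/job1', '/jobs/job2']
print(tree.get("/jobs/job1"))    # {'status': 'running'}
print(tree.parent("/jobs/job1")) # '/jobs'
```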
Apache Mesos is a job scheduler, similar in some ways to YARN, but more ambitious in scale and potential applications.