Consensus is perhaps the central problem in distributed computing, in the sense that once it is solved, just about any other problem has a solution (though perhaps not the best solution). That claim may be too strong: many important tasks are provably weaker (that is, they can be solved in conditions under which consensus cannot). But before we get too theoretical, let's take a tour (thanks to Wikipedia links) through motivating background.
An important motivating example for consensus is the problem of synchronizing the sites of a distributed database. The easiest case to consider is a replicated database, where consistency means keeping all copies identical (we have already seen how this is useful to cloud providers). Here are some motivating concepts:
Bottom line: beneath all the abstractions of database transaction management and commit protocols, the sites of the database use transactions to attempt updates to the state of the data.
An interesting recent development is the renewed interest in transactions from the point of view of multicore programming:
The "database" is shared memory, and we need consistent updates, especially when different cores cache the contents of shared memory for the sake of performance. When considering what consistency means for shared memory, or even for some types of databases and transactions, there is a minor academic industry devoted to defining the various types:
It's useful to drill down on a few of these types, such as:
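To make the shared-memory problem concrete, here is a toy simulation of two cores that each cache a shared variable. This is an illustrative sketch, not real hardware or any particular consistency model: the Core class and its load/store/write-back discipline are assumptions made up for this example. Both cores increment the variable from their cached copy, and one increment is lost.

```python
# Toy simulation of two cores caching a shared variable (illustrative only;
# real cache-coherence protocols are far more involved).

class Core:
    def __init__(self, memory):
        self.memory = memory   # shared "main memory" (a dict)
        self.cache = {}        # this core's private cache

    def load(self, addr):
        if addr not in self.cache:          # cache miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]             # may be stale!

    def store(self, addr, value):
        self.cache[addr] = value            # write goes to the cache only

    def write_back(self):
        self.memory.update(self.cache)      # flush cached values to memory

memory = {"x": 0}
a, b = Core(memory), Core(memory)

# Both cores execute x = x + 1 against their cached copy of x.
a.store("x", a.load("x") + 1)
b.store("x", b.load("x") + 1)
a.write_back()
b.write_back()

print(memory["x"])  # 1, not 2: one increment was lost
```

The lost update happens because each core read `x = 0` into its own cache before either write-back; without a protocol that orders or invalidates the cached copies, the two updates are not serialized.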
The Consensus Problem
Consensus, or agreement, has a general meaning and a specific, distributed-computing meaning:
and, more specifically, Byzantine consensus; see Byzantine_generals
The consensus problem has a long, interesting history and has been scrutinized in all sorts of computing contexts, including message passing, shared memory, abstract linguistic models of evaluation, and more. The point of such study is to locate the dividing line between what is possible and what is impossible and, within the possible, to distinguish what resources are needed to solve a given consensus problem.
Consensus problems are formally specified by properties of Termination, Validity, Integrity, and Agreement. Additional considerations are given to the allowable modes of failure (crash, slowness, message loss, and general craziness, i.e., Byzantine behavior), the domain of values for agreement, levels of approximation, and whether randomized approaches are allowed.
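The four properties can be stated as checks over a finished run of a protocol. The following is a hedged sketch under simplifying assumptions: the names `proposals`, `decisions`, and `correct` are invented for this example, and Integrity (each process decides at most once) is built into the data model rather than checked explicitly.

```python
# Checking the consensus properties against a recorded run.
# proposals: process id -> value it proposed
# decisions: process id -> value it decided (absent if it never decided)
# correct:   set of processes that did not crash

def check_consensus(proposals, decisions, correct):
    # Termination: every correct process eventually decides.
    termination = all(p in decisions for p in correct)
    # Agreement: no two correct processes decide different values.
    decided = [decisions[p] for p in correct if p in decisions]
    agreement = len(set(decided)) <= 1
    # Validity: any decided value was actually proposed by some process.
    validity = all(v in proposals.values() for v in decisions.values())
    # Integrity: at most one decision per process -- guaranteed here by
    # modeling decisions as a map from process to a single value.
    return termination and agreement and validity

proposals = {1: "a", 2: "b", 3: "a"}
good_run = {1: "a", 2: "a", 3: "a"}
bad_run  = {1: "a", 2: "b", 3: "a"}   # violates Agreement

print(check_consensus(proposals, good_run, correct={1, 2, 3}))  # True
print(check_consensus(proposals, bad_run,  correct={1, 2, 3}))  # False
```

Note that these are safety/liveness conditions on runs, not a protocol: the hard part of consensus is designing a protocol whose every run, under the chosen failure mode, passes such checks.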
With respect to transactions, one way to look at the progress of transactions in a distributed database is that sites must reach a consensus on the order of updates to apply to the respective sites: if the order of updates is the same everywhere, then the database sites remain consistent. Techniques useful to build up a consensus protocol are:
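The claim that same-order application keeps replicas consistent can be sketched directly. The `Site` class and the update format below are illustrative assumptions, not from any particular database: each site applies the same agreed log of key-value writes, and a site that applies them out of order diverges.

```python
# Minimal sketch: replicas stay consistent iff they apply the same
# updates in the same order (names and format are illustrative).

class Site:
    def __init__(self):
        self.data = {}

    def apply(self, update):
        key, value = update
        self.data[key] = value

agreed_log = [("x", 1), ("y", 2), ("x", 3)]   # order fixed by consensus

sites = [Site(), Site(), Site()]
for site in sites:
    for update in agreed_log:     # same updates, same order, every site
        site.apply(update)

print(sites[0].data)  # {'x': 3, 'y': 2} -- identical at all three sites

rogue = Site()
for update in reversed(agreed_log):   # out-of-order delivery
    rogue.apply(update)
print(rogue.data)  # {'x': 1, 'y': 2} -- diverged from the others
```

This is the replicated-state-machine view: once the sites agree on the log order, each site can apply updates independently and deterministically, and consistency follows.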
As mentioned above, the literature on consensus and related problems holds mathematical proofs of impossibility in high regard: similar in spirit to older results on undecidability, there are many theorems about the conditions that make consensus unsolvable. One very influential theorem comes from economics, not computing:
Like consensus, Arrow's result is based on an axiomatic definition of a problem with autonomous entities, which are similar to how we might view a set of distributed processes. The lesson of the result is that the precise form of the axioms can matter --- even a small change can be the difference between possibility and impossibility.
The most cited work about impossibility of consensus is:
This is a result in the message-passing model where at most one process may crash during execution of the consensus protocol. There are many similar impossibility theorems for other models of computation (shared memory, etc.).