Linda

From 118wiki

Jump to: navigation, search

Contents

Linda

"Linda" is a coordination language, proposed in ISBN 026203171X, that enables parallel and distributed programs to be written in C, Java, or whatever is your favorite programming language. So what is a coordination language? Think of a coordination language as the glue that puts pieces of a program together, where these pieces might be spread around a number of threads or even on different nodes. A coordination language is usually implemented as an API or a library of methods. See Linda (coordination language) for a brief history.

Distributed Shared Memory

The main message of our introduction of Linda is this: we do not have to use messages to write all networked applications. Consider how threads communicate in one machine. They can read and write, or invoke methods, on shared variables and objects (maybe using locks or other techniques to serialize things properly). Why can't we do this in a network sense also? One answer is that there is no common memory available to different hosts on a network, whereas threads running on the same machine do have access to the same memory hardware. But why let that answer stop us? After all, the whole idea of a network stack is to create the illusion of a more powerful, reliable, and feature-filled abstraction from the lower layers. Can't we create the illusion of a common (virtual) memory and then allow programs on a network communicate by reading and writing using this virtual memory?

This "virtual" memory, which could be shared and manipulated by multiple hosts on a network has a name: distributed shared memory, or DSM. There are many interesting technical issues in implementing and using DSM, but we will look instead a related idea, which was proposed in the Linda system.

Tuplespace

Instead of a DSM, Linda is based on an abstraction called tuplespace. Programs written using Linda communicate with each other by reading from and writing into tuplespace (so far, it sounds just like a DSM). However, unlike a DSM -- which mimics the way hardware memories behave -- the tuplespace is based on a data structure called a multiset: tuplespace is a multiset of tuples. The tuples in this multiset can be any kind of data we like, for example (1,3.14159,"alpha") is a tuple. The fact that tuplespace is multiset rather than a set implies that the tuplespace can have any number of copies of a given tuple.

Operations on Tuplespace

Initially, the tuplespace is empty. Program operations add tuples to the tuplespace, read tuples in the tuplespace, and remove tuples from the tuplespace. Here are the basic operations:

out 
The out operation adds a tuple to the tuplespace.
rd 
The rd operation reads a tuple in the tuplespace.
in 
The in operation both reads a tuple in the tuplespace and removes that tuple from the tuplespace (it thus resembles a dequeue operation, though the tuplespace is not in any way ordered like a queue).

A radical difference between traditional DSM and the tuplespace is how tuples are "addressed". In a DSM, memory is addressed by the location of a word of memory; in tuplespace, a tuple is specified by search parameter(s). The best way to understand this is by an example.

rd("A", ? x, ? y)

The operation above asks to read a tuple with three elements, whose first element is the letter "A". The cryptic-looking "? x" specifies that we don't care about the value in the second element of the tuple, however when the tuple is read, kindly let x equal the value of the tuple's second element (similarly, y should be equal to the tuple's third element). It could be that the tuplespace contains thousands of tuples with three elements whose first element is "A" -- in such a case, the programmer has no control over which tuple will be read.

But what happens if the tuplespace contains no triple of the form ("A", ... , ... )? In this case, the program executing rd("A",?x,?y) blocks, that is, execution stops and the program is frozen. The program remains frozen until a tuple shows up in the tuplespace which has the matching form; after that, execution continues by reading a matching tuple.

The in operation is like rd, except that the tuple read is also removed from the tuplespace. If there are many (even duplicate) matching tuples, then only one tuple is read and removed, but we don't have any control over which one is removed. Obviously, a program that executes

in("A", ? x, ? y)

could block if no matching tuple is found.

Sometimes, we'd prefer that programs do not block when trying to read or input a tuple (we saw this theme previously with thread and select). Linda provides two variants of in/rd,

rdp 
The rdp operation returns True if a matching tuple was found and read, otherwise it returns False (implying that no matching tuple existed at the time the operation executed).
inp 
The inp operation similarly returns True or False, and in the former case, removes a matching tuple from the tuplespace.

There is one other operation defined in Linda that we won't use or study, called eval:

eval 
The eval operation forks a new process (or starts a new thread) to evaluate some expression (which could be some subroutine). When this new process or thread finishes its work, a new tuple is added to the tuplespace. This new tuple contains the result of the expression that the process/thread evaluated.

Linda in Python

There are some implementations of Linda, or Linda-like languages, for Python. But for learning purposes, its best to start with a simplified version. I wrote a very simple adaptation of Linda for Python: media:Linda.txt contains two implementations of Linda, one is sequential, the other is threaded. The sequential implementation is very simple, but only supports inp, out, and rdp operations (blocking operations only make sense when there are threads). The threading implementation has all operations except for eval. The operations and syntax is changed a bit, because in Python we cannot use the "?x" syntax. The program media:testLinda.txt demonstrates use of the Linda.py implementation; you can see how the operations work in that example.

Networked Linda in Python

The media:Linda.txt implementation of Linda does allow for concurrent (and parallel) execution of programs using threads. But what if we would like to have multiple hosts involved in a parallel program? In that case, we need more than just threads. There are some interesting (but complicated!) ideas of how to implement the tuplespace abstraction in a network. The simplest idea is just to have a "tuplespace server" that manages all the tuples. This is not a very scalable technique, because a single server can be a bottleneck. But it is simple to demonstrate: media:NetworkLindaServer.txt is an example (written by students in a previous semester of this course). To use this server, each "client" is a Python program that instead of using media:Linda.txt, instead uses media:NetworkLinda.txt, which is essentially a proxy to having the server directly on the same host. The client uses TCP to communicate tuple requests and responses with the server.

Personal tools