![]() |
The MEMO System : A Shared Directory of Unordered Queues |
Copyright ©
1994. Thomas W. Christopher
| 4.1 | Abstract |
| The Memo System provides synchronization and
communication facilities to parallel processes. Processes
communicate through a shared directory of unordered
queues. Memo is shown to provide a wide variety of shared
data structures -- including arrays, I-structures,
barriers, locks, and semaphores -- and programming
techniques -- including job jars, dataflow, and reactive
objects. In comparing Memo to the Linda programming model, it is shown that almost all Linda algorithms use tuple space to simulate directories of queues, which is provided directly in Memo. Keywords : directory of queues, parallel software, shared data structures, Linda. |
|
| 4.2 | Introduction |
| Many conventional distributed-memory programming
systems allocate one process to each node of a
multicomputer. The nodes send messages directly to each
other. A problem with such systems is that data
structures are not global, but rather are localized in
each process. One of the most common programming
techniques is to manipulate a data structure, but when
trying to use that technique on most distributed-memory
systems, the data structure must be partitioned and the
parts hidden in a fixed number of processes, which
creates an artificiality and complexity in parallel
programs. A candidate for a more natural parallel
programming system is Linda [AHUJ86] [CARR89a] [CARR89b]
[GELE85] [LELE90] which provides a shared associative
memory, the tuple space. A tuple is a sequence of fields,
each of which has a type and contains a value or a
variable. Tuples are written into the tuple space with an out operation, are removed with an in, and are read without being removed with a rd. For an in or rd, the tuple accessed in tuple space must match the tuple provided with the command. The number and types of fields must be identical. A value must match an identical value. A variable in either must match a value in the other. A variable will not match a variable. The in or rd will block until there is a matching tuple in tuple space. Recent implementations of Linda have provided non-blocking versions or rd and in, rdp and inp, which return a Boolean indication of success. Linda also has the command eval, which will start a process, an "active tuple," which will return its value by becoming a normal tuple. Various implementations of Linda have restricted the format of tuples, or the implementation of eval, or have provided multiple tuple spaces. The tuple space does allow data structures to be shared among processes, and the designers of Linda are quick to point out the advantages of being able to design programs around the shared data structures. Still, a question persists: why does tuple space suddenly become a natural structure for parallel programming when there was no demand for it in sequential programs? An examination of the published Linda fragments suggests an explanation: in almost all cases the tuple space is being used as a shared, flat directory of queues. Since both directories and queues have long been known to be useful in multiprogramming and multitasking systems and since they can so easily be implemented in tuple space, it is no surprise that Linda is successful. But this does suggest that a system built directly on a shared directory of queues might work as well as or better than Linda and that clarity on the data structures actually being used might suggest useful facilities that tuple space would not suggest. The Memo System (Memo) was designed and implemented to investigate the possibilities for a parallel programming system built around a shared directory of queues. We describe Memo and show some data structures and parallel programming techniques it makes possible, including named objects, arrays of objects, locks and semaphores, unordered and ordered queues, job jars, futures, and barriers. We compare Memo to Linda and suggest that Linda is successful not because of its tuple space abstraction, but rather that tuple space is valuable primarily because it allows the simulation of a directory of queues. |
|
| 4.3 | Memo facilities |
| The Memo system provides communication among parallel processes through a shared directory of unordered queues (or a shared table of bags, if you prefer). In Memo, the queues are called folders and appear only in the directory associated with names. In an early version of the Memo system, the names of folders were character strings. In the version presented here, a folder names consists of a "symbol" and three integers, allowing the simulation of up-to-three-dimensional arrays of folders. Fixed-sized names speed up accesses. | |
| 4.3.1 | Types and folder naming |
There are three new data types provided by the Memo
system:
The type MEMO is a pointer to a block of dynamic
storage. The S member represents the name of the data structure that this folder is part of, e.g. the name of an array. For an array, the X member contains the indices. The system includes a definition for NUM_X in the header file. We provide NUM_X equal to three, but it can be changed if the application demands it. SYMBOL memo_symbol() |
|
| 4.3.2 | Basic Procedures |
| The central operations of Memo place memos into and
remove memos from folders. void memo_write(key,block_addr,len)
|
|
| 4.3.3 | Alternative gets |
MEMO
memo_alt(key_list,len_key_list,one_found)
|
|
| 4.3.4 | Delayed write |
void
memo_write_delayed(key1,key2,block_addr,len)
|
|
| 4.3.5 | Unsafe procedures |
MEMO memo_alloc(len)
|
|
| 4.3.6 | Process management |
| The actual procedures used to create parallel
processes are not central to the concept of the memo
system. In our current implementations we assume a
limited number of processes will be created at the
beginning of the program execution and will continue in
existence until the end. We provide two functions to
accomplish this: memo_init
and memo_exec. void memo_init() void memo_exec(num_workers,argc,argv)
|
|
| 4.4 | Data Structures and Programming Techniques |
| Memo facilitates the creation of shared data and
control structures. The following is a list of some of
the most useful: Named Objects. A folder that holds at most one memo can represent a dynamically allocated object on a heap. Instead of pointers to the objects, we use folder names. Arrays.
Arrays of shared objects may be created similarly. The
element a[i,j] can be stored in a folder whose name, key,
is constructed thus: FOLDER_NAME key; Locked data structures.
Shared records are accessed by getting them from their
folders, examining and updating them, and putting them
back. While the record is being updated, its folder is
empty, and any other process trying to access it will
have to wait; the records are implicitly locked. The
following exemplifies operations on shared records: Locks. Naturally, an explicit lock can simply be represented by an empty memo in a folder. FOLDER_NAME lock; Semaphores.
The simplest implementation of a counting semaphore is
identical to a lock, except that to initialize the
semaphore, a process places as many empty memos in the
semaphore's folder as initial count. Job Jar.
An important use of an unordered queue is as a job jar.
The memos in the job jar indicate tasks to perform.
Whenever a process requires more work to do, it pulls a
memo out of the job jar. Whenever a process creates more
work to do, it drops memos in the job jar. It is often
convenient to have one job jar for each process and one
common jar for all. The individual job jars are used for
operations that must be performed by a particular
process, e.g. file I/O - a file is typically open in only
one particular process. Futures and I-Structures.
A future is an assign-once variable used to communicate
between a producer (typically a subroutine) and a
consumer (its caller). Both the producer and consumer may
run in parallel, with the consumer only being delayed if
it attempts to fetch from the variable before it has been
assigned. An I-structure (an "incremental
structure") is a collection (e.g. an array) of
futures. I-structures were invented for dataflow hardware
[ARVI80]. Dataflow.
Dataflow programming triggers execution of code when its
operands become available [ACKE82] [ARVI86] [DAVI82]
[DENN80] [TREL82] [VEEN86]. The Memo system facilitates
dataflow programming by providing the memo_write_delayed
procedure. Assume the operands are futures. One simply
arranges to have an operation dropped into a job jar when
an operand memo arrives in a folder, thus: If the operation requires more than one operand, then
the operation can poll for each of the operands (e.g. by
memo_present) and delay itself in absent operands'
folders until all are available. Reactive Objects.
A reactive object is an object that executes only in
response to messages it receives. Reactive objects are
central to the Actors model of parallel computation
[AGHA86] [ATHA86] [ATHA87] [ATHA88] [BODE88]. A reactive
object can be implemented with a job jar folder plus one
input folder per object. A memo containing the object is
delayed on its input folder. When an input memo arrives,
the object is placed in the job jar. Eventually some
process fetches the object from the job jar and executes
its code, which reads an input message and responds to
it. (For efficiency, it is better to loop reading all
available input messages with memo_get_skip, rather than
delaying the object after processing each one.) Ordered queues.The
simplest implementation of an ordered queue is an array
of futures. For example, the producer can write memos
into folders thus: Barriers. In parallelizing a loop, it is often important to synchronize the processes executing the loop at one or more points within it using a barrier. If a barrier is initialized with a count N, then N processes must wait at the barrier before any of them are allowed to proceed. The barrier is automatically reinitialized so that they can synchronize over and over again at the same barrier. In Memo, a barrier can be implemented with two folders and a single memo containing an integer count. The presence of the memo in a folder allows the processes to proceed past the barrier. The processes alternate which folder they look for the memo in. When a process comes to a barrier, it gets the memo and decrements the count in it. If the count is greater than zero, this is not the last process to arrive, so it puts the memo back and tries to get a copy of the memo in the other folder, which is not present yet. If the process decrements the count to zero, then this is the last process to arrive. It puts a memo with the full count of the number of processes to synchronize into the other folder and continues executing. Code for this follows: FOLDER_NAME barrier[2] =
{{BARR0,0,0,0},{BARR1,0,0,0}};
|
|
| 4.5 | Implementations |
| Memo has been implemented in C in several versions on
an Encore shared-memory multiprocessor using the Encore's
parallel programming library and using a package that
simulates conventional distributed memory parallel
programming. The two shared memory versions both allocate memos in shared memory and both use a hash table to look up folders. One version locks the entire memo package to achieve mutual exclusion; the other locks individual hash table buckets. A memo_get is handled by linking a "get structure" in a folder telling which process to deliver the memo to. Memo_get will wait for one to be delivered. Memo_get_skip will check to see if a memo was found while the get structure was being installed. Memo_alt is handled like memo_get: it simply installs get structures in the listed folders. Memo_alt_skip and memo_strict_alt_skip are identical; they install get structures in all named folders and check if a memo was latched during the installation. All the three distributed memory versions have used servers to handle the directory of folders. The simplest distributed implementation uses a single server; the Memo procedures are replaced with stubs that pass messages to the server to invoke the operations. Two distributed memory implementations use multiple servers. Folder names are hashed to determine which server is handling the folder. For both, the implementation of the varieties of memo_alt is difficult. For memo_alt itself, messages are sent to all servers containing any of the folders listed. Servers that find memos must latch one of the memos and respond. It must not remove the memo from its folder unless its memo is the one chosen - other servers can also have memos. A latched memo may delay other get_skip operations: if it is not chosen, the memo will be logically present to a get_skip or alt_skip, even though it cannot be immediately returned. The implementation of memo_strict_alt_skip is
particularly problematic; before it can return NULL, the
system has to guarantee that there is an instant of time
across the entire machine during which no memos are
present in any of the folders. In one implementation, the
process executing the strict_alt_skip potentially
communicates with the servers three times: the first time
to install the get structures; the second time, after the
servers respond, to remove the get structures; and the
third time to accept one latched memo and unlatch the
others. (It can communicate only twice if the first
communication finds and latches a memo immediately.) |
|
| 4.6 | DISCUSSION |
| 4.6.1 | Comparison with Linda |
| Linda is claimed to be an utterly new programming
model. In contrast, Memo cannot claim to be utterly new,
since both directories and queues have a long history and
have been known to be useful in operating systems and
parallel programming. However, Memo casts doubt on the
claims for Linda's originality. The vast majority of the published Linda algorithms are simple directory-of-queues algorithms. In most Linda algorithms, the first several fields of a tuple, which we will call the key fields, are specified explicitly in both out and in operations. The remaining fields, the entry fields, are all given values by an out and are all assigned to variables by an in. Hence the key fields correspond to the name of a queue, and the entry fields correspond to the structure written into it. Translated into Memo, out corresponds to memo_write; in, to memo_get; and rd, to memo_get_copy. There are a few other usages of Linda that require slightly more work than simple transliteration. For the same keys, there may be several different numbers and types of non-key fields. Linda provides further matching on their number and types. This can easily be encoded in Memo by using multiple folders, or by writing and reading discriminated unions. The realities of Linda usage must have been apparent to the implementers of Kernel Linda [LELE90], since they provided tuple spaces called dictionaries and limited tuples to a single key field and a single entry field. The existence of tuples containing variables has no direct equivalent in Memo, but the examples of their use are infrequent. One published use [GELE85] is to provide an alternate, asynchronous write: server processes wait for tuples specifying their individual names. A request for service by a particular server specifies the server's name. A request for service by any server specifies a variable in the name field, so that any server can pick it up. It is probably much cleaner to have one queue for all servers in addition to an individual queue for each and to have the servers wait for commands with a memo_alt. Linda lacks a read with alternatives, so this solution is unavailable to it. Linda does try to integrate process creation and termination by eval creating active tuples. The ability to create many processes does make the lack of analogues to memo_alt and memo_write_delayed less serious. However, eval has been difficult to implement in portable forms, and the many small processes it creates are expensive to run, especially on RISC processors. The assumption in Memo is that one either will write data-parallel programs or will simulate macro-dataflow, reactive objects, or multiple small tasks using memos for state vectors and scheduling via job jars. In any case, a limited number of large processes is adequate. The vast bulk of Linda research shows the usefulness of directories and queues. Linda implementations do show that unordered queues can be as useful as strictly FIFO queues and are considerably easier to implement on distributed-memory machines. The most original aspect of Linda, tuple space, appears to be an obfuscation; Linda research does not seem to support the idea that tuple space has many uses beyond its ability to simulate directories of queues. Memo offers a number of advances over Linda (in addition to clarifying the abstraction). The memo_alt function allows the implementation of an analogue of a guarded input command. The memo_write_delayed procedure facilitates dataflow and reactive object programming. |
|
| 4.6.2 | Future Work |
| There are a number of trade-offs to be considered in
defining the semantics of Memo. Should memo_write be
allowed to return before the memo is in the folder, or
not? An early return would be more efficient on a
distributed memory system, but would that hinder
reasoning about program correctness? Should the
memo_strict_alt_skip exist? What are its implications for
reasoning about programs? And what fairness policy should
memo_alt use? On shared memory and single server
distributed memory computers, it uses a strict priority:
it returns the first memo it encounters while processing
the list of folder names in order. Should memo_alt be
defined to choose a memo from the first non-empty folder?
Should it be defined to choose randomly or round-robin?
Should more routines be implemented incorporating the
different options? We are currently implementing macro-dataflow and reactive object algorithms in Memo to compare its performance to other systems (including MDC90, our distributed-memory/pattern-driven control based language). More programming experience is needed to find the best collection of operations for Memo. |
|
| 4.7 | Conclusion |
| Memo is a parallel programming system based on
communication through a shared directory of unordered
queues. It makes possible a variety of shared data
structures, including named objects, arrays of objects,
locks and semaphores, unordered and ordered queues, job
jars, futures, I-structures, and barriers. It facilitates
macro-dataflow and reactive object programming. We compared Memo to Linda and suggested that Linda is successful not because of its tuple space abstraction, but rather that tuple space is valuable primarily because it allows the simulation of a directory of queues. |
|
| 4.8 | ACKNOWLEDGEMENTS |
| We gratefully acknowledge the Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory and the Pittsburgh Supercomputing Center for supporting our efforts to port Memo to other platforms by their provision of computing facilities. | |
| 4.9 | References |
|
Copyright © 1994. Thomas W. Christopher