![]() |
The MEMO System : A Shared Directory of Unordered Queues |
| 4.1 | Abstract |
| The Memo System provides synchronization and communication facilities to parallel
processes. Processes communicate through a shared directory of unordered queues.
Memo is shown to provide a wide variety of shared data structures -- including arrays, I-structures,
barriers, locks, and semaphores -- and programming techniques -- including
job jars, dataflow, and reactive objects. In comparing Memo to the Linda programming model, it is shown that almost all Linda algorithms use tuple space to simulate directories of queues, which is provided directly in Memo. Keywords : directory of queues, parallel software, shared data structures, Linda. |
|
| 4.2 | Introduction |
| Many conventional distributed-memory programming systems allocate one process to
each node of a multicomputer. The nodes send messages directly to each other. A problem
with such systems is that data structures are not global, but rather are localized in
each process. One of the most common programming techniques is to manipulate a data
structure, but when trying to use that technique on most distributed-memory systems,
the data structure must be partitioned and the parts hidden in a fixed number of processes,
which creates an artificiality and complexity in parallel programs.
A candidate for a more natural parallel programming system is Linda [AHUJ86]
[CARR89a] [CARR89b] [GELE85] [LELE90] which provides a shared associative
memory, the tuple space. A tuple is a sequence of fields, each of which has a type and
contains a value or a variable. Tuples are written into the tuple space with an out operation, are removed with an in, and are read without being removed with a rd. For an in or rd, the tuple accessed in tuple space must match the tuple provided with the command. The number and types of fields must be identical. A value must match an identical value. A variable in either must match a value in the other. A variable will not match a variable. The in or rd will block until there is a matching tuple in tuple space. Recent implementations of Linda have provided non-blocking versions or rd and in, rdp and inp, which return a Boolean indication of success. Linda also has the command eval, which will start a process, an "active tuple," which will return its value by becoming a normal tuple. Various implementations of Linda have restricted the format of tuples, or the implementation of eval, or have provided multiple tuple spaces. The tuple space does allow data structures to be shared among processes, and the designers of Linda are quick to point out the advantages of being able to design programs around the shared data structures. Still, a question persists: why does tuple space suddenly become a natural structure for parallel programming when there was no demand for it in sequential programs? An examination of the published Linda fragments suggests an explanation: in almost all cases the tuple space is being used as a shared, flat directory of queues. Since both directories and queues have long been known to be useful in multiprogramming and multitasking systems and since they can so easily be implemented in tuple space, it is no surprise that Linda is successful. But this does suggest that a system built directly on a shared directory of queues might work as well as or better than Linda and that clarity on the data structures actually being used might suggest useful facilities that tuple space would not suggest. The Memo System (Memo) was designed and implemented to investigate the possibilities for a parallel programming system built around a shared directory of queues. We describe Memo and show some data structures and parallel programming techniques it makes possible, including named objects, arrays of objects, locks and semaphores, unordered and ordered queues, job jars, futures, and barriers. We compare Memo to Linda and suggest that Linda is successful not because of its tuple space abstraction, but rather that tuple space is valuable primarily because it allows the simulation of a directory of queues. |
|
| 4.3 | Memo facilities |
| The Memo system provides communication among parallel processes through a shared directory of unordered queues (or a shared table of bags, if you prefer). In Memo, the queues are called folders and appear only in the directory associated with names. In an early version of the Memo system, the names of folders were character strings. In the version presented here, a folder names consists of a "symbol" and three integers, allowing the simulation of up-to-three-dimensional arrays of folders. Fixed-sized names speed up accesses. | |
| 4.3.1 | Types and folder naming |
There are three new data types provided by the Memo system:
The type MEMO is a pointer to a block of dynamic storage. The S member represents the name of the data structure that this folder is part of, e.g. the name of an array. For an array, the X member contains the indices. The system includes a definition for NUM_X in the header file. We provide NUM_X equal to three, but it can be changed if the application demands it.
SYMBOL memo_symbol() |
|
| 4.3.2 | Basic Procedures |
| The central operations of Memo place memos into and remove memos from folders.
void memo_write(key,block_addr,len) MEMO memo_get(key) MEMO memo_get_copy(key) MEMO memo_get_skip(key) MEMO memo_get_copy_skip(key) void memo_free(memo) int memo_len(memo)
Memo_write puts a memo into a folder named by the key. If a folder by that name does
not already exist, one is created. The contents of the memo are the len bytes beginning
at address block_addr.
Memo_free frees the dynamic storage that a memo occupies.
|
|
| 4.3.3 | Alternative gets |
|
MEMO memo_alt(key_list,len_key_list,one_found) MEMO memo_alt_skip(key_list,len_key_list,one_found) MEMO memo_strict_alt_skip(key_list,len_key_list,one_found)
Memo_alt is a version of memo_get that returns a memo taken from one of a list of
alternative folders. Parameter key_list points to an array keys, the names of the folders
to be examined. The length of the list of keys is len_key_list. As with memo_get,
memo_alt blocks until a memo can be returned. Memo_strict_alt_skip is a variant of memo_alt_skip. The distinction is this: alt_skip will not return NULL if one of the folders named has a a memo in it from the time alt_skip is called until it returns; that is, alt_skip is required to look in each folder before returning NULL, and if one of the folders has a memo in it when alt_skip looks, alt_skip is not allowed to return NULL. Memo_strict_alt_skip will not return NULL unless there is some instant of time between its call and return when none of the folders contains a memo; that is, strict_alt_skip behaves as if it looks in all folders simultaneously. |
|
| 4.3.4 | Delayed write |
| void memo_write_delayed(key1,key2,block_addr,len) Memo_write_delayed |
|
| 4.3.5 | Unsafe procedures |
|
MEMO memo_alloc(len) void memo_put(key,memo) void memo_put_delayed(key1,key2,memo) void memo_replace(memo) FOLDER_NAME *memo_key(memo)
These procedures will link a memo into the system queues that a user program retains a
pointer to. In shared memory implementations, the system may deliver the memo to
another user. These routines are therefore not safe - the sender can easily cause mischief
to the system or the receiver; they are provided to increase efficiency.
Memo_key returns a pointer to the name of the folder a memo was removed or copied
from by memo_get... or memo_alt.... This will be a field
of the memo's header; the pointer should not be used after the memo has been freed, replaced, or put.
Assigning indirectly through this pointer may make memo_replace malfunction. |
|
| 4.3.6 | Process management |
| The actual procedures used to create parallel processes are not central to the
concept of the memo system. In our current implementations we assume a limited
number of processes will be created at the beginning of the program execution and
will continue in existence until the end. We provide two functions to accomplish this:
memo_init and memo_exec.
void memo_init() void memo_exec(num_workers,argc,argv) int memo_get_my_id() int memo_num_workers() Memo_exec forks num_workers+1 processes. The main process, which executed this procedure, is assigned the number 0 and executes the procedure "boss(argc,argv)". The worker processes are assigned the numbers 1 through num_workers and execute procedure "worker()". Memo_exec only returns when the boss and all worker subroutines have returned. Memo_get_my_id returns the number assigned the process by the memo system. Memo_num_workers returns the total number of workers started by memo_init() |
|
| 4.4 | Data Structures and Programming Techniques |
| Memo facilitates the creation of shared data and control structures. The following is a
list of some of the most useful:
Named Objects. A folder that holds at most one memo can represent a dynamically allocated object on a heap. Instead of pointers to the objects, we use folder names.
Arrays. Arrays of shared objects may be created similarly. The element a[i,j] can be
stored in a folder whose name, key, is constructed thus:
FOLDER_NAME key;
Locked data structures. Shared records are accessed by getting them from
their folders, examining and updating them, and putting them back. While the record is being
updated, its folder is empty, and any other process trying to access it will have to wait;
the records are implicitly locked. The following exemplifies operations on shared records: Locks. Naturally, an explicit lock can simply be represented by an empty memo in a folder.
FOLDER_NAME lock;
Semaphores. The simplest implementation of a counting semaphore is identical to a
lock, except that to initialize the semaphore, a process places as many empty memos in the semaphore's folder as
initial count.
Job Jar. An important use of an unordered queue is as a job jar. The memos in the job
jar indicate tasks to perform. Whenever a process requires more work to do, it pulls a
memo out of the job jar. Whenever a process creates more work to do, it drops memos in
the job jar. It is often convenient to have one job jar for each process and one common
jar for all. The individual job jars are used for operations that must be performed by a
particular process, e.g. file I/O - a file is typically open in only one particular process.
Futures and I-Structures. A future is an assign-once variable used to communicate
between a producer (typically a subroutine) and a consumer (its caller). Both the producer and consumer may
run in parallel, with the consumer only being delayed if it attempts to fetch from the variable before it has
been assigned. An I-structure (an "incremental structure") is a collection (e.g. an array) of futures.
I-structures were invented for dataflow hardware [ARVI80].
Dataflow. Dataflow programming triggers execution of code when its operands become
available [ACKE82] [ARVI86] [DAVI82] [DENN80] [TREL82] [VEEN86]. The Memo system facilitates dataflow
programming by providing the memo_write_delayed procedure. Assume the operands are futures. One simply
arranges to have an operation dropped into a job jar when an operand memo arrives in a folder, thus:
If the operation requires more than one operand, then the operation can poll for each of
the operands (e.g. by memo_present) and delay itself in absent operands' folders until all
are available.
Reactive Objects. A reactive object is an object that executes only in response to messages
it receives. Reactive objects are central to the Actors model of parallel computation
[AGHA86] [ATHA86] [ATHA87] [ATHA88] [BODE88]. A reactive object can be implemented with a job jar
folder plus one input folder per object. A memo containing the object is delayed on its input folder.
When an input memo arrives, the object is placed in the job jar. Eventually some process fetches the
object from the job jar and executes its code, which reads an input message and responds to it.
(For efficiency, it is better to loop reading all available input messages with memo_get_skip,
rather than delaying the object after processing each one.)
Ordered queues.The simplest implementation of an ordered queue is an array of
futures. For example, the producer can write memos into folders thus:
Barriers. In parallelizing a loop, it is often important to synchronize the
processes executing the loop at one or more points within it using a barrier. If a barrier is initialized
with a count N, then N processes must wait at the barrier before any of them are allowed to proceed.
The barrier is automatically reinitialized so that they can synchronize over and over again at
the same barrier. In Memo, a barrier can be implemented with two folders and a single memo containing
an integer count. The presence of the memo in a folder allows the processes to proceed past the barrier. |
|
| 4.5 | Implementations |
| Memo has been implemented in C in several versions on an Encore shared-memory
multiprocessor using the Encore's parallel programming library and using a package
that simulates conventional distributed memory parallel programming. The two shared memory versions both allocate memos in shared memory and both use a hash table to look up folders. One version locks the entire memo package to achieve mutual exclusion; the other locks individual hash table buckets. A memo_get is handled by linking a "get structure" in a folder telling which process to deliver the memo to. Memo_get will wait for one to be delivered. Memo_get_skip will check to see if a memo was found while the get structure was being installed. Memo_alt is handled like memo_get: it simply installs get structures in the listed folders. Memo_alt_skip and memo_strict_alt_skip are identical; they install get structures in all named folders and check if a memo was latched during the installation.
All the three distributed memory versions have used servers to handle the directory of
folders. The simplest distributed implementation uses a single server; the Memo procedures
are replaced with stubs that pass messages to the server to invoke the operations. |
|
| 4.6 | DISCUSSION |
| 4.6.1 | Comparison with Linda |
| Linda is claimed to be an utterly new programming model. In contrast, Memo cannot
claim to be utterly new, since both directories and queues have a long history and have
been known to be useful in operating systems and parallel programming. However,
Memo casts doubt on the claims for Linda's originality. The vast majority of the published Linda algorithms are simple directory-of-queues algorithms. In most Linda algorithms, the first several fields of a tuple, which we will call the key fields, are specified explicitly in both out and in operations. The remaining fields, the entry fields, are all given values by an out and are all assigned to variables by an in. Hence the key fields correspond to the name of a queue, and the entry fields correspond to the structure written into it. Translated into Memo, out corresponds to memo_write; in, to memo_get; and rd, to memo_get_copy. There are a few other usages of Linda that require slightly more work than simple transliteration. For the same keys, there may be several different numbers and types of non-key fields. Linda provides further matching on their number and types. This can easily be encoded in Memo by using multiple folders, or by writing and reading discriminated unions.
The realities of Linda usage must have been apparent to the implementers of Kernel
Linda [LELE90], since they provided tuple spaces called dictionaries and limited tuples
to a single key field and a single entry field. |
|
| 4.6.2 | Future Work |
| There are a number of trade-offs to be considered in defining the semantics of Memo.
Should memo_write be allowed to return before the memo is in the folder, or not? An
early return would be more efficient on a distributed memory system, but would that
hinder reasoning about program correctness? Should the memo_strict_alt_skip exist?
What are its implications for reasoning about programs? And what fairness policy
should memo_alt use? On shared memory and single server distributed memory computers,
it uses a strict priority: it returns the first memo it encounters while processing
the list of folder names in order. Should memo_alt be defined to choose a memo from
the first non-empty folder? Should it be defined to choose randomly or round-robin?
Should more routines be implemented incorporating the different options? We are currently implementing macro-dataflow and reactive object algorithms in Memo to compare its performance to other systems (including MDC90, our distributed-memory/pattern-driven control based language). More programming experience is needed to find the best collection of operations for Memo. |
|
| 4.7 | Conclusion |
| Memo is a parallel programming system based on communication through a shared
directory of unordered queues. It makes possible a variety of shared data structures,
including named objects, arrays of objects, locks and semaphores, unordered and
ordered queues, job jars, futures, I-structures, and barriers. It facilitates macro-dataflow
and reactive object programming. We compared Memo to Linda and suggested that Linda is successful not because of its tuple space abstraction, but rather that tuple space is valuable primarily because it allows the simulation of a directory of queues. |
|
| 4.8 | ACKNOWLEDGEMENTS |
| We gratefully acknowledge the Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory and the Pittsburgh Supercomputing Center for supporting our efforts to port Memo to other platforms by their provision of computing facilities. | |
| 4.9 | References |
|