| |
We define Message Driven Computation as a style of
computing with two characteristics: 1) Messages, not
sequential processes, convey both control and data.
Indeed, there are no sequential processes unless they are
programmed explicitly in terms of messages. 2)
Computation is invoked by the presence of a collection of
messages at the same location.
We have embodied MDC in two languages, extensions of Icon
and of C. MDC is designed to be coherent with the global
structure of multicomputers and hence forms a basis for
general purpose multicomputing. MDC may also be viewed as
a generalization of Actors [AGHA86].
Our development of Message Driven Computation has three
main goals: 1) to be coherent with the structure of
multicomputers, 2) to be efficient, 3) to be useful for
general purpose computation. Here we introduce the
features of Message Driven Computation and relate them to
the features of multicomputers, to Actors, to other
parallel programming systems, and to our goals.
It should be understood from the outset that our current
version of MDC is nearly at the assembly language level.
It is not much easier to use than any other low level
language.
We base the design of MDC on the structure of
multicomputers. Consider the differences between von
Neumann computers and multicomputers at their global
level. |
| |
In a von Neumann computer, processors and memories
are separate and connected through a switch, which forms
a bottleneck. In a multicomputer, processors and memories
are combined at nodes, which are interconnected, in most
designs, without bottlenecks.
In a von Neumann computer, memory locations are small. In
a multicomputer, the node memories are large and can
store large, structured objects.
In a von Neumann computer, addressing logic is simple and
memory locations are addressed by integer. In a
multicomputer, node memories are not directly addressable
from outside, and accesses to non-local data must use the
processor of the node on which the data resides. Since a
full processor will be used anyway, creation of some form
of an associative memory is trivial.
In a von Neumann computer, a process contains data in its
local memory and executes instructions to control the
transformation of the data. In a multicomputer, messages
carry data between the nodes and the arrival of messages
trigger the execution of instructions and the
transformation of data. We hypothesize that messages in a
multicomputer are the equivalent of processes on a von
Neumann computer.
| Von
Neumann |
Multicomputer |
| P-S-M (Processors connected to memories
through a switch |
C-C(Full computers at nodes) |
| small memory locations |
large memories on nodes |
| integer memory addresses |
associative memories as easily as linear
address spaces |
| sequential processes encapsulate control and
data |
hypothesis: Messages encapsulate control and
data |
Considering the global structure of
multicomputers, we have designed MDC with the following
features:
Loci : Since on a
multicomputer processing and storage are performed at the
same locations, we provide loci (singular locus),
locations where data are stored, and computations are
performed.
Portals : Each locus is a
collection of labeled message queues, called portals.
Each message at a locus is in one queue, at one portal.
Messages are not sent to loci directly, but to portals:
the destination locus and the portal label must both be
specified.
Message Driven Computations :
On multicomputers, messages encapsulate data and
control and thus stand in the place of processes. In MDC,
messages contain all data and
messages cause all computation. There is no storage
except at loci and no storage at a
locus except in the messages waiting there. Messages may
contain an arbitrary number
of fields and the fields may be tuples.
Behaviors : Computation is
specified by behavior declarations * which
consist of a pattern and a procedure body (the behavior
body). The pattern specifies a pattern of messages. When
a collection of messages at a locus match the pattern,
they may be removed and the behavior body executed
("triggered") with the fields of the messages
as its parameters. The body may send messages to portals
at other loci or leave messages at the current locus.
Pattern matching : There are
currently two forms of pattern that may trigger
behaviors. They are sufficient to provide control flow,
data flow, and demand driven models of computation
[TREL82a][TREL82b]. One form of behavior pattern may
specify a certain number of messages are to be removed
from each of several portals. When each of the portals
has the required number of messages, the behavior may be
triggered. This kind of pattern permits data-flow and
control-flow computation models. Another form of pattern
specifies that if the first message to arrive at a locus
comes to a specified portal, the behavior is to be
triggered. We use this for demand-driven models and for
certain dynamic programming algorithms, where the first
demand for a value starts its computation and the
subsequent demands simply wait.
Data types: Locus and portal
are data types, which permits them to be passed in
messages and manipulated easily. Two examples, when a
message corresponding to a procedure call is sent, it
contains a portal-type field giving the continuation to
which the result is to be sent. In a MDC interpreter for
a data-flow language, a message corresponding to a
data-flow instruction contains portals indicating where
the result token messages are to be sent.
Associative memory : We use
the processing power available in a multicomputer to
create an associative memory of loci. In MDC, the names
of loci are tuples of arbitrary length. The elements of
the tuple may be any primitive data type: integer,
character, string, symbol *,
real. It is required that the first element of the locus
name be a symbol to make it easier to write a garbage
collector - if a symbol can no longer be accessed by any
computation, then all loci that begin with it may be
removed.
A locus will occupy one node of the multicomputer. The
number of the node it occupies is computed from its name
by hashing or by some other user-specified function. The
storage it occupies on that node is found by looking its
name up in a hash table. The name of a locus may be
constructed independently on many computers and its
single exact location computed. Notice that MDC's global
data base differs from that of Linda [AHUJ88]. Linda
maintains a global tuple space and permits a pattern
matching fetch of tuples. In MDC the data base contains
loci which are named by tuples. A locus contains
potentially any number of messages. A message is sent to
a single locus by specifying its exact name so the exact
node on which it is located can be computed. The Linda
and the MDC associative memories have different functions
and uses and are not comparable.
Adjustable granularity : MDC
permits fields of messages to be tuples of information,
small arrays. Many parallel processing language
implementations are fine grain, passing a single data
value in a message. MDC permits the granularity to be
adjusted by passing tuples of values in messages. The
programming problem is to lower the message overhead per
datum adequately while not depriving the system of too
many opportunities for parallelism.
|