![]() |
Termination Detection with The Overlapping-Intervals / Charged Systems Metaphor In Systems With Asynchronous Message Passing |
Copyright ©
1994. Thomas W. Christopher
| 5.1 | Abstract | |
| The overlapping interval/charged system metaphor
distributed termination detection technique is presented.
This technique is applicable to systems with a fixed
number of computational locations and asynchronous
message passing. It is experimentally compared to the
conventional technique of maintaining a tree of active
locations. Keywords: distributed termination detection, asynchronous message passing, Bell-man-Ford algorithm. |
||
| 5.2 | INTRODUCTION | |
| One of the problems with distributed algorithms is
simply determining when the algorithm has terminated.
Distributed termination detection algorithms have
received considerable attention [DIJK80] [DIJK83]
[FRAN80] [FRAN82] [APT84] [TOPO84] [APT86], typically
with synchronous message passing of the CSP style.
Asynchronous message passing complicates termination
detection, since messages cannot be detected while in
transit, but the computation has terminated only when
there are no further such messages. Our work with Message
Driven Computing [CHRI92a] necessitated our investigation
of termination detection with asynchronous message
passing. We were inspired by the weighted throw count
scheme of [ROKU88], although our algorithm differs from
theirs. In this paper we investigate a pair of termination
detection algorithms based on overlapping
intervals to determine the global state of the
system at some instant of time and the charged system metaphor to
determine that no messages are in transit. We prove the
cor- |
||
| 5.3 | ALGORITHMS | |
| 5.3.1 | Active Tree Algorithm | |
| The conventional active tree termination detection
algorithm [DIJK80] works as follows: A location may be
either active or inactive. A location is initially
inactive. It becomes active upon receipt of a task
message, which will be called the critical message. An
active location maintains a count of the number of tasks
it has sent which have not yet been acknowledged. While
active, a location immediately acknowledges task
messages, increments its count for each task message it
sends, and decrements the count for each acknowledgement
it receives. When it has received acknowledgements for
all the task messages it has sent, a location becomes
inactive and acknowledges the critical message. The active locations from a tree. Each directed edge leads from one active location to the location that sent the critical message that made it active. All paths lead to the root. An edge cannot lead to an inactive location, since that location cannot become inactive until the critical message is acknowledged, which will not happen until the source of the edge becomes inactive, dropping out of the tree, and implicitly removing the edge. The computation has terminated when the root becomes inactive. |
||
| 5.3.2 | Ordering of computational events | |
| Before presenting the overlapping
intervals/charged-system metaphor termination detection
algorithm, we need to present a model of distributed
computation in terms of computational events connected by
message passing. A computational event is an execution of
a block of code which begins and ends with sending or
receiving messages (or the initiation or termination of
the task). Message transmission imposes a partial order on the events in an execution of a distributed algorithm. Let
|
||
| 5.3.3 | Overlapping Intervals | |
| We will use the term location to
refer to an object or process that computes in response
to asynchronous messages. We will deal only with
algorithms in which the computations occur at a fixed
number of locations. Let the n locations in the system be
numbered from 1 to n: L1, L2, ...,
Ln. To determine whether an execution of the algorithm has terminated, probe messages are sent throughout the system. The arrival of a probe message at a location will cause a probe event. Let Pik be the kth probe event at location i. We would like to probe all locations simultaneously to determine the global state of the system, but unfortunately we are unable to. What we can and will do is determine the state of the system over an interval. The kth
interval at location i is the time between probe Pi(k-1)
and Pik. (Pi0 is defined as the
beginning of the execution at location i.) Probe tree. In the tree
method [FRAN82] [TOPO84], one location Lr is
selected as the root, a set of locations, I, are the
internal nodes, and the remaining locations form the set
of leaf nodes, F. If there is more than one location,
then Lr is in set I. We shall assume this
case, since a single location system does not need
elaborate termination detection. Query messages travel down the tree from root to
leaves, and probes travel back up the tree from leaves to
the root. Query messages are sent by Q events. Let Qik be
the kth Q event at location i. Note: |
||
| 5.3.4 | Charged system metaphor | |
| The computational events relevant to the algorithm
(i.e. the events other than those only used for
termination detection) will be called actions. Let Aijk
be the jth action at location i in the kth
interval. Actions are triggered by the arrival of one or
more task messages and may
in turn send zero or more task messages. When there are
no more task messages in transit, no further actions can
occur, and the algorithm has terminated. Unfortunately,
the number of messages in transit cannot be directly
examined. We will detect the number of task messages in transit reaching zero by using a charged system metaphor. Each location Li keeps a count ci which records the current charge of Li. Each task message carries a charge of -1. Each task message sent increases the charge of its sender by one; each task message received decreases the charge of its receiver. Action Aijk which is triggered by the receipt of g>0 task messages and which sends h >= 0 task messages will add h-g to the charge of location Li. Assuming that when the algorithm begins executing the charges in the locations sum up to the initial number of task messages, the sum of the charges in a linearization of the execution will always equal the number of task messages in transit. |
||
| 5.3.5 | Overlapping Interval/charged system metaphor | |
| The overlapping interval/charged system metaphor
termination detection algorithm works as follows: Probe messages are used as described above to create a sequence of overlapping intervals. The probes accumulate the sum of the charges and the logical or of the busy bits across all locations. The system detects that the computation has terminated if at the end of some set of overlapping intervals, the sum of the charges is zero and the or of the busy bits is false. The proof is essentially this:
This proof is at a rather high level. For a more
detailed proof, it is advantageous to phrase the proof in
terms of linearizations of the computational events. Theorem :
The algorithm has terminated iff |
||
| Proof | ||
|
||
| 5.4 | TEST METHODS | |
| We compared the overlapping interval/charged-system metaphor termination detection algorithm to the conventional active tree algorithm. For our experiments, we used the Bellman-Ford algorithm for the single source/shortest path problem. | ||
| Test problem | ||
| Single-source,
shortest path problem. A
network is a directed graph where each edge has a
non-negative length. The length of a directed path is the
sum of the lengths of the edges in the path. The single
source shortest path problem associates a distance Dv
with each vertex v where Dv
is the minimum length of any directed path from the start
vertex to v. The Bellman-Ford Algorithm. The asynchronous Bellman-Ford algorithm works as follows: messages are sent along edges of the graph giving the length of a path from the start vertex that ends with that edge. All vertices have an initial distance of infinity, except for the start vertex which has distance zero. When a vertex receives a message specifying a lower distance than it is already assigned, the vertex assigns itself that distance and sends messages along each of its outgoing edges containing its new distance plus the length of the edge. The algorithm begins with the start vertex sending messages along its outgoing edges giving the length of the edge. In our experiments, we use the two following networks: Chain+Random. The "chain+random" graph has a chain of all the vertices in numeric order. In addition, each vertex except the last has two extra edges leading from it to other vertices chosen at random. The final vertex has three randomly chosen edges. There are no duplicate edges. The lengths of the edges are uniformly distributed random integers in the range [0..9]. X-ladder. The x-ladder graph is shown in Figure 1. There are even number, N, of vertices numbered 0 ... N-1. |
||
| The edges are | ||
As shown in [CHRI92b], the X-ladder graph
with random edge lengths yields an number of distance
messages in the Bellman-Ford algorithm exponential in the
number of vertices.
Experiments. In our experiments, we ran each algorithm 32 times for a given number of vertices to permit the use of large sample statistics. The lengths assigned to the edges were assigned using a random number generator seeded from a different random number generator. No two runs in the same experiment used the same seed. The order of runs was randomized to avoid being influenced by secular trends (e.g. other load on the machine). Hardware system. The experiments were run on an Encore Multimax computer with 20 processors at a time of low utilization. To implement MDC90's distributed memory model, processes on the Encore simulate the nodes on a multicomputer: an MDC90 location is assigned to a particular process. Messages, however, are allocated in shared memory and passed among the processes through linked queues, not by copying. |
||
| 5.5 | RESULTS | |
| We would expect the chain+random networks
to have relatively shallow active trees and relatively
few task messages sent. Hence these networks should
terminate the Bellman-Ford algorithm quickly. The
X-ladder networks would have deep active trees and an
exponential number of task messages. The cost of termination detection in Active Tree algorithm is proportional to the number of task messages sent since there is one acknowledgement message per task message. Hence we would expect the Active Tree algorithm to be more efficient for the Chain+random networks than the X-ladder. The cyclic probe algorithm puts the least demand of the system while the underlying algorithm is running: at any one time there is at most one probe message waiting to be processed. The major problem with the cyclic probe technique is that it visits the locations sequentially and the probe message must make two cycles after the underlying algorithm has terminated before detecting the termination. Worse yet, our cyclic probe visits all locations in numeric order. Since location i is allocated to process (i modulus P) for P processes, the probe cycles through the processes round robin. Since there are no other activities keeping the processes busy, each probe transmission during the last cycles wakes up a process that is waiting on a semaphore. Our probe tree algorithm uses a multiway tree. The probe intervals run reasonably quickly, but that means that the system has a high density of query and probe messages to process, which takes processing power away from the underlying algorithm. The probe tree algorithm could potentially be the worst of the three by swamping the system with termination detection messages. Figures 2 and 3 show Chain+Random networks of a range of sizes both with four and with ten processes. The cyclic probe algorithm is uniformly the worst, probably due to the high cost of the final two cycles waking a process for each location visited. The Active Tree algorithm is better with four processes; the Probe Tree with ten. It is unclear why, but it is possible that having extra processors available to process the probe messages speeds up the Probe Tree. Figure 4 shows the algorithms under heavier load. The X-ladder graph has an exponential growth in the number of task messages sent, so fewer total vertices were tested to save computing time. Here the Active Tree algorithm was the worst, due to the costs of termination growing proportionally to the number of task messages. The Probe Tree algorithm is the best and appears to be diverging from the Cyclic Probe algorithm as in the earlier figures. Figures 5 and 6 hold the number of vertices constant while varying the number of processors. Figure 5 tests Chain+Random networks composed of 500 vertices. Figure 6 tests the X-ladder network with 150 vertices. Folklore says that the best number of processors to use on a shared-memory machine is about half the total available, so the results for around ten processors are likely to be of most interest. Probably the cyclic+random graph computation (Figure 5) terminates very quickly causing the cyclic probe to show the worst behavior: the probe message must move from location to location, all of which are on separate processes, waking each of them from a semaphore wait. However, that fact alone would argue that the costs across varying numbers of processes should be nearly constant with a slight decline for the underlying algorithm terminating more quickly as more processors are applied. The rising cost is unexpected. The rising cost may be due to a strain placed on underlying UNIX scheduling algorithms by a strictly round-robin activation of processes (due to page replacement, perhaps). In Figure 6, the active tree algorithm is significantly worse than the other techniques. The acknowledgement messages equal the number of distance messages sent, and since the number of distance messages are exponential in the number of nodes, the number of probe messages that have to be processed are far overshadowed by the number of acknowledgements. |
||
| 5.6 | DISCUSSION | |
| We have presented the overlapping
interval/charged system metaphor technique for detecting
termination of distributed computations that take place
at a fixed collection of locations. We have compared this
technique to the conventional "active tree"
technique using the Bellman-Ford single-source, shortest
path algorithm. We find the overlapping interval/charged
system metaphor technique is slower for some underlying
networks and faster for others. The time taken to detect
termination in the active tree technique is proportional
to the number of task messages sent, but the time taken
in the overlapping interval/charged system metaphor
technique is independent of the number of task messages
sent. Probe trees were faster than cyclic probes for creating sets of overlapping intervals, but cyclic probes may not have been treated fairly; further experiments should be run with the probe visiting all locations in the same process before going on to (and waking) the next process. |
||
| 5.7 | ACKNOWLEDGEMENTS | |
| We ran these experiments at the Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory, whose provision of computing facilities we gratefully acknowledge. | ||
| 5.8 | REFERENCES | |
|
||
Copyright © 1994. Thomas W. Christopher