.\" expand -8 | grn | eqn | tbl | troff -ms
.EQ
delim @@
.EN
.nr CH
.nr CF
.nr LL 6.5i
.nr PO 1.0i
.nr PI 0.2i
.\" get a BIG section header
.de BH
.SH 
.LG
.sp .1i
\\$1
.NL
.sp .1i
..
.\" Get an -me horizontal line
.de hl
.br
\l'\\n(.lu-\\n(.iu'
.sp
..
.\" Two macros for code display --
.\" Use fixed width font so indents show up right,
.\" decrease point size and vertical spacing by two
.de LS
.br
.DS B \\$1
.\" Need courier font for code display
.ft C
.ps -2
.vs -2
..
.de LE
.vs +2
.ps +2
.ft P
.DE
.br
..
.TL
An Open Environment for Building Parallel Programming Systems
.AU
Brian N. Bershad
Edward D. Lazowska
Henry M. Levy
David B. Wagner
.AI
Department of Computer Science
University of Washington
Seattle, WA  98195
.sp 2
April 1988
.AB
PRESTO is a set of tools for building parallel
programming systems on shared-memory
multiprocessors.  
PRESTO's goal is to provide a
framework within which one can easily
build efficient support for any of a wide variety of
"models" of parallel programming.
PRESTO is
designed for easy modification and extension,
not only at the level of the primitives and structures
made available for the application programmer's use, but also
at the level of the run-time kernel that supports
parallel applications.
PRESTO is implemented in the object-oriented
language C++ on a Sequent Balance 21000 and 
has been used in a number of applications that are
described in this paper.
.AE
.2C
.FS
.hy 0
Our work is supported by the National Science
Foundation (Grants No. CCR-8619663, CCR-8703049,
and CCR-8700106),
the Naval Ocean
Systems Center, U S WEST Advanced Technologies, the
Washington Technology Center, the USENIX Association,
Hewlett-Packard, 
and Digital Equipment Corporation (the
Systems Research Center and the External Research Program).
.hy 1
.\" Next page has a figure
.sp 2i
.FE
.nr HM 5i
.NH 1
Introduction
.PP
Most parallel programming
systems present themselves in terms
of a fixed set of primitives (e.g., send a
message, receive a message, acquire a monitor) running
on top of a closed run-time kernel.
The primitives together with the kernel define a
"model" of parallel programming
that, while pleasing to the implementor, may not
always be satisfactory to the application programmer.
Should incompatibility arise (e.g., due to the demands
of a particular application), the programmer
must find another system, build one,
or attempt to conform to the parallel programming model supported
by the available system.
The first option may be impossible since there may not be another
system that runs on the given hardware.  The second option,
while possible, is likely to be
prohibitively expensive.  This leaves only the third
choice \- shoehorning.
.PP
PRESTO addresses this dilemma.
PRESTO is a set of tools for building parallel programming
systems on shared-memory
multiprocessors.  
PRESTO's goal is to provide a
framework within which one can easily
build efficient support for any of a wide variety 
of parallel programming models.
PRESTO has been used to emulate
existing models, and to create new ones.
Among these are
a Mesa-like environment, one providing
ACTOR-like futures, and one for writing
parallel simulations.
The ease with which support for new parallel programming
paradigms can be built using PRESTO allows the
programmer to choose for himself or herself the
model of parallel computation that is
most appropriate to a given problem.
(Of course, support for various common models
is intended to be built once and shared.)\ 
Figure 1 illustrates
the relationship between PRESTO,
the various parallel programming systems implemented
in PRESTO (in heavy boxes),
and the various applications implemented
in each of these parallel programming
systems (in dashed boxes).
.\".DS
.\".1C
.\".hl
.\".GS
.\"pointscale on
.\"width 5
.\"height 4
.\"file tree.grn
.\".GE
.\".sz +2
.\".ce
.\"Figure 1 \- PRESTO, Parallel Programming Systems, and Applications
.\".hl
.\".2C
.\".DE
.\" next page has a figure
.PP
PRESTO's "openness" \- its ease
of modification and extension \- applies
not only to the primitives and structures
made available for the application programmer's use,
but also to the lowest levels of the run-time kernel.
Many implementors of new programming systems
in PRESTO would not be concerned
with the low-level details of a multiprocessor environment
such as scheduling, preemption, and processor control:  they
would take what PRESTO gives them in these areas.
However, if the characteristics of a new environment
do require exceptional handling in any of these
low-level areas, changes are possible through exactly
the same mechanisms used to customize PRESTO at
higher levels.
.NH 1
Achieving an Open System through Object-Oriented Design
.PP
A parallel programming environment must deal with
issues such as 
processor control, scheduling, concurrency, and
synchronization.  PRESTO encapsulates each of these
issues inside a default structure having a fixed interface.
Programming systems utilize these structures.
Sometimes the basic structures serve the needs of a
programming system, sometimes the structures need
to be modified or extended in order to be useful,
and sometimes the structures must serve as a base
for other, higher-level structures.
PRESTO allows this degree of customization
by employing
an \fIobject-oriented\fP programming paradigm.
.PP
Objects are recognized as providing
an effective means for structuring
sequential software systems in terms of 
components and interfaces.  An object has a name, private data,
and a set of interfaces that allow other objects
to view and manipulate it.  The interfaces serve to
contain and insulate an object's state, so that its own implementation
is invisible to those who use it.  
.PP
In terms of PRESTO's goals, the most important aspect of
an object-oriented environment is the ability to redefine
an object's behavior.  As long as the object's interface
remains unchanged, other objects in the system need not
be informed of the changes.  This property allows the designer
to modify the behavior of system objects.
.PP
As well as being an ideal vehicle on which to structure an open
system such as PRESTO, objects are also a useful abstraction
for writing parallel programs.
An object can maintain its own internal
parallelism, \fIand\fP control any concurrency imposed upon
it by other objects \&[19].
We use these
points to argue that PRESTO-derived programming
systems should present object-oriented structures to
their (application) programmers.  In most cases these systems
do so, although there have been some exceptions.\**
.FS
Specifically, the Poker \&[14]
\& simulation environment is
programmed in C, a language that is definitely not object-oriented.
.FE
.PP
HYDRA \&[17]
\& was the first system to adopt an
object-oriented view to address the fact that the
design of parallel systems
is as much an art as a science.
Both HYDRA and PRESTO are open systems in recognition of
the fact
that there is no "right" way to build a system
for a parallel machine,
that \fIany\fP system should have a
clear separation between mechanism and policy,
and that strict hierarchical layering of
system components limits
flexibility.   Unfortunately, in
a full-blown operating system such
as HYDRA these principles
must be balanced against real-world issues such as protection,
fairness and reliability.  Consequently, much of the openness
may be compromised.
For example, it is infeasible for an operating system to permit
easy redefinition of the concepts of a processor,
scheduler, lock, or even a thread.
These are the most basic components of an operating system,
and allowing users the freedom to change them could result in chaos.
Indeed, there is no evidence in the 
literature suggesting that these types of objects
were ever redefined in HYDRA.
PRESTO runs on top of existing operating systems,
and provides full flexibility in those areas that are
critical to the construction of parallel applications.
.nr HM 1i
.\" No more pictures
.NH 1
The PRESTO System Structure
.PP
The most important aspect of PRESTO's design is its simplicity,
both in concept and in implementation.  The conceptual simplicity
allows programmers to quickly grasp the functions
that the system \fIdoes\fP provide, while the simplicity of 
the implementation
allows them to introduce extensions without concern for
subtle interactions between components.
.PP
There are five
fundamental PRESTO objects:  the scheduler, the processor,
the thread, the spinlock, and the synchronization object.
From these,
the system
supplies a basic parallel programming system that includes:
.\".DS L
.\"\(bu a preemptive scheduler,
.\"\(bu the ability to create new threads of control,
.\"\(bu busy waiting synchronization based on hardware atomic locks, and
.\"\(bu primitives allowing a thread to deschedule itself and be rescheduled by another thread.
.\".DE
a preemptive scheduler, the ability to create new threads
of control, busy waiting synchronization based on hardware atomic
locks, and primitives allowing a thread to deschedule itself
and be rescheduled  by another thread.
.PP
Threads are
created, destroyed, put to sleep, and awakened.  
A thread is put to sleep by a synchronization object,
which consists
of a queue, a spinlock, and whatever other state is needed
to implement the synchronization object's semantics.  
The spinlock
guards the critical sections that describe a given
synchronization object.
The synchronization objects provided by PRESTO have
no semantics in the sense of \fIP\fP and \fIV\fP [5] 
\& or \fInotify\fP and \fIwait\fP \&[13]
\&.  
PRESTO interprets them merely as objects on which threads
can be blocked, queued and resumed.  The policies governing
when to block are provided by more sophisticated synchronization objects
derived from the basic ones provided by PRESTO.
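.PP
The relationship between a policy-free synchronization object
and a derived object that supplies the blocking policy can be
sketched as follows.
This is a minimal single-threaded model written in modern standard C++,
not PRESTO source:  the names \fIThread\fP, \fISpinlock\fP,
\fISyncObject\fP, and \fISemaphore\fP, and all of their operations,
are assumptions made for illustration, and a thread is reduced to
its scheduling state.
.LS
```cpp
#include <atomic>
#include <cassert>
#include <deque>

// Hypothetical stand-in for a thread: only its scheduling state is modeled.
struct Thread {
    enum State { Ready, Sleeping } state = Ready;
};

// Busy-waiting lock built on an atomic test-and-set flag.
class Spinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { } }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Base object: a queue and a spinlock, with no policy about when to block.
class SyncObject {
protected:
    Spinlock guard;
    std::deque<Thread*> waiters;
    void block(Thread* t) { t->state = Thread::Sleeping; waiters.push_back(t); }
    Thread* resumeOne() {
        if (waiters.empty()) return nullptr;
        Thread* t = waiters.front();
        waiters.pop_front();
        t->state = Thread::Ready;
        return t;
    }
public:
    virtual ~SyncObject() {}
};

// Derived object that adds a policy: counting-semaphore P and V.
class Semaphore : public SyncObject {
    int count;
public:
    explicit Semaphore(int initial) : count(initial) { assert(initial >= 0); }
    void P(Thread* t) {
        guard.lock();
        if (count > 0) --count; else block(t);
        guard.unlock();
    }
    void V() {
        guard.lock();
        if (resumeOne() == nullptr) ++count;   // nobody waiting: bank the signal
        guard.unlock();
    }
};
```
.LE
The base class only queues and resumes threads; the decision about
when to block lives entirely in the derived class.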
.PP
The scheduler maintains a pool of runnable
threads.  Threads enter the pool when they become ready,
and processors extract threads
from the pool when they become idle.
The main
body of the processor object supplied by PRESTO does
.DS L
.ft I
forever do
        ask scheduler for the next ready thread
        if a ready thread is available
                request the ready thread to run
        else if there will never be a ready thread again
                quit
.ft R
.DE
When a processor requests that a thread
run, the thread becomes active, using the power of the
requesting processor.
Once active, the thread
is able to execute within any other object.  The thread runs until it is
preempted, goes to sleep on a synchronization object, or terminates.
After any of these actions, the processor object reactivates and 
continues looking for ready threads.  
If a processor idles, finding nothing to do, and all other
processors are idle, the system halts.
Figure 2 illustrates 
the main PRESTO components "in action,"
highlighting the states of threads as
they progress through the system.
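.PP
The processor loop described above can be modeled in a few lines.
The sketch below is an illustration in modern standard C++ under
strong simplifying assumptions:  there is a single processor,
threads run to completion rather than being preempted, and
the names \fIThread\fP, \fIScheduler\fP, \fIProcessor\fP,
\fImakeReady\fP, and \fInext\fP are hypothetical rather than
taken from PRESTO.
.LS
```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

class Scheduler;

// Hypothetical thread: a body that runs to completion and may ready other threads.
struct Thread {
    std::function<void(Scheduler&)> body;
};

class Scheduler {
    std::deque<Thread> pool;   // runnable threads, first come first served
public:
    void makeReady(Thread t) { pool.push_back(std::move(t)); }
    bool next(Thread& out) {
        if (pool.empty()) return false;
        out = std::move(pool.front());
        pool.pop_front();
        return true;
    }
};

class Processor {
public:
    // Returns the number of threads this processor ran before quitting.
    int run(Scheduler& s) {
        int executed = 0;
        for (;;) {
            Thread t;
            if (s.next(t)) {
                assert(t.body != nullptr);
                t.body(s);
                ++executed;
            } else {
                // One processor and nothing running:
                // no thread can ever become ready again, so quit.
                return executed;
            }
        }
    }
};

int demoProcessorLoop() {
    Scheduler s;
    s.makeReady(Thread{[](Scheduler& inner) {
        inner.makeReady(Thread{[](Scheduler&) {}});   // a thread creating a thread
    }});
    Processor p;
    return p.run(s);
}
```
.LE
With several processors the quit test would also have to establish
that every other processor is idle, which this model omits.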
.\".DS
.\".hl
.\".GS
.\"pointscale on
.\"width 5
.\"height 4
.\"file presto.parts.grn
.\".GE
.\".sz +2
.\".ce
.\"Figure 2 \- PRESTO Components
.\".hl
.\".DE
.PP
The structure of PRESTO is powerful enough to serve
as the base for any parallel programming environment that must
map threads onto processors.  Its simplicity allows the functions
that define that mapping to be easily extended, either in
isolation or in concert with other functions.
.NH 1
Techniques For Customizing PRESTO
.PP
There are three
basic methods for customizing the PRESTO 
environment:  layered extension, differential
extension,  and lateral extension.
The first is a property of all programming languages, the second of
most object-oriented languages, and the third of open systems
such as PRESTO.
.PP
Layering allows the programmer
to build new primitives through the composition of
existing ones.  There are three problems with
layering.  First, a layered system's performance can quickly degrade
as layers are added.
Second,
it may be difficult to express a new abstraction
in terms of the existing ones.
For example, consider the
distinction between Hoare monitors \&[9]
\& and Mesa monitors \&[13]
\&.  
The stricter semantics of Hoare monitors makes them 
difficult to implement on a multiprocessor, and even
more difficult (and costly) if they must be implemented
on top of less stringent Mesa monitors.
Finally, layering is only useful for expressing the behavior
of something new; existing code cannot be affected.
.PP
Differential extension allows the programmer to exploit
the hierarchical type system of an object-oriented programming
language having inheritance.
New classes may be differentially specified
in terms of existing ones.  This allows the programmer to
construct classes similar to existing ones, while only specifying
the changes in the classes' behavior.  Differential extensions
can be combined with layering, so that new or modified
operations invoke the primitive ones in the more basic class.
Operations that are unchanged are executed directly.
.PP
Lateral extension makes it possible to change the behavior
of an object dynamically (the other extensions are specified
at compile-time) by affecting its relationships with
other objects.  For example, PRESTO provides a global scheduler
object responsible for mapping runnable threads onto idle processors.
The system's processor objects interact with the scheduler object.
By replacing the system's scheduler object with a different
one, the programmer can radically affect the behavior of
the system.  
Lateral extensions such as this allow the programmer
to install changes at any level in the system.
.PP
Layered, differential, and lateral extensions can be combined
to achieve new results with a minimum of coding and effort.
System components can be redefined 
differentially and installed laterally.  When appropriate,
the new
version of an operation can be layered on top of existing ones.
The programmer never changes the basic PRESTO classes, but
instead derives new ones from them.  Instances of these newer
classes are created, and then bound to the names used
by other objects so they can be referenced.  For example, to
force the scheduler to use priority scheduling instead of
the default first-come-first-served, the following would be done:
.DS  L
.ft I
\(bu derive a class ThreadPriorityHeap from 
  the system class ThreadPool
\(bu define the operations get() and put() for a 
  ThreadPriorityHeap
\(bu create a new instance of ThreadPriorityHeap
\(bu inform the scheduler to replace its current 
  thread pool with the new ThreadPriorityHeap
.ft R
.DE
When given a new thread pool, the scheduler moves all threads from the
old pool into the new pool, allowing the scheduling discipline to be
changed dynamically.  For large
parallel applications that compute in phases, this 
flexibility is important.
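.PP
The four steps listed above might look as follows in modern standard C++.
The class names \fIThreadPool\fP and \fIThreadPriorityHeap\fP and the
operations \fIget\fP and \fIput\fP come from the text; everything else
(the \fIThread\fP fields, \fIsetPool\fP, \fImakeReady\fP, \fInext\fP)
is assumed for illustration and should not be read as PRESTO's interface.
.LS
```cpp
#include <cassert>
#include <deque>
#include <queue>
#include <vector>

struct Thread { int id; int priority; };   // hypothetical fields

// Default pool: first come, first served.
class ThreadPool {
    std::deque<Thread*> fifo;
public:
    virtual ~ThreadPool() {}
    virtual void put(Thread* t) { fifo.push_back(t); }
    virtual Thread* get() {
        if (fifo.empty()) return nullptr;
        Thread* t = fifo.front();
        fifo.pop_front();
        return t;
    }
};

// Differential extension: only get() and put() are redefined.
class ThreadPriorityHeap : public ThreadPool {
    struct Lower {
        bool operator()(const Thread* a, const Thread* b) const {
            return a->priority < b->priority;
        }
    };
    std::priority_queue<Thread*, std::vector<Thread*>, Lower> heap;
public:
    void put(Thread* t) override { heap.push(t); }
    Thread* get() override {
        if (heap.empty()) return nullptr;
        Thread* t = heap.top();
        heap.pop();
        return t;
    }
};

class Scheduler {
    ThreadPool* pool;
public:
    explicit Scheduler(ThreadPool* p) : pool(p) { assert(p != nullptr); }
    void makeReady(Thread* t) { pool->put(t); }
    Thread* next() { return pool->get(); }
    // Lateral extension: move every queued thread into the new pool, then switch.
    void setPool(ThreadPool* fresh) {
        assert(fresh != nullptr);
        while (Thread* t = pool->get()) fresh->put(t);
        pool = fresh;
    }
};

int demoPriorityPool() {
    Thread a{1, 1}, b{2, 9}, c{3, 5};
    ThreadPool fcfs;
    ThreadPriorityHeap heap;
    Scheduler s(&fcfs);
    s.makeReady(&a); s.makeReady(&b); s.makeReady(&c);
    s.setPool(&heap);        // discipline changes while threads are queued
    return s.next()->id;     // highest priority thread is b
}
```
.LE
The scheduler itself is unchanged; it sees only the \fIThreadPool\fP
interface, so the new discipline takes effect as soon as the pool is replaced.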
.NH 1
Exploiting PRESTO's Flexibility
.PP
To add concreteness to the preceding discussion, we will
present three customizations of quite different styles that
have been implemented using PRESTO's open architecture.
The first customization involves a redefinition of one of the system's most
fundamental objects, the thread, in order to gather data about
parallel program performance.  
The second customization involves the construction
of a number of higher-level general parallel programming environments,
each providing its own set of primitives and model of programming.
The third customization involves the
construction of specialized environments for building parallel
programs in narrow application domains; in particular,
a PRESTO-based environment
for writing parallel discrete-event simulations.
The characteristics of these narrow-domain applications often
require special handling at very low levels in order to obtain
reasonable performance; the simulation environment
provides its
own scheduler for handling situations that are
particular to the simulation world.
.NH 2
Building an Instrumented Execution Environment
.PP
Understanding how to improve a parallel application
requires understanding its behavior.  This, in turn,
requires the ability to observe the run-time
characteristics of an application (either in real time,
or in "play-back" mode).
Expecting the designers of a programming environment
to include code for monitoring \fIall the right things\fP
is unreasonable.  Redefinition allows the programmer
to use the base system components while collecting
data about their behavior.  
.PP
PRESTO objects have been instrumented to permit the collection
of data during an application's execution.  The instrumentation
is implemented by creating, for each PRESTO system class,
an instrumented version of that class.
For example, once an application has been written, a programmer
might need to know the percentage of time that
a thread is blocked with respect to its total lifetime.
.\" Knowing
.\" this for all threads gives a good indication of the inherent
.\" parallelism in an application.
This information is easily
obtainable by deriving a new class \fIInstrumentedThread\fP from
the basic PRESTO class \fIThread\fP.  An \fIInstrumentedThread\fP
redefines the two thread operations \fIsleep\fP and \fIwakeup\fP
so that they adjust a timer
before invoking the real \fIsleep\fP and \fIwakeup\fP operations
in the superclass \fIThread\fP.
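.PP
A minimal sketch of such a derived class is shown below, in modern
standard C++.
The names \fIInstrumentedThread\fP, \fIThread\fP, \fIsleep\fP, and
\fIwakeup\fP come from the text.
The \fITickClock\fP type, the \fIblockedFraction\fP operation, and the
\fIasleep\fP flag are assumptions introduced so that the example is
deterministic and self-contained; a real implementation would read a
hardware or system clock.
.LS
```cpp
#include <cassert>

class Thread {
public:
    bool asleep = false;
    virtual ~Thread() {}
    virtual void sleep()  { asleep = true; }
    virtual void wakeup() { asleep = false; }
};

// Hypothetical clock: a tick counter advanced by the caller.
struct TickClock { long now = 0; };

class InstrumentedThread : public Thread {
    TickClock& clock;
    long created;
    long sleptAt = 0;
    long blocked = 0;      // total ticks spent asleep
public:
    explicit InstrumentedThread(TickClock& c) : clock(c), created(c.now) {}

    // Adjust the timer, then invoke the real operation in the superclass.
    void sleep() override {
        sleptAt = clock.now;
        Thread::sleep();
    }
    void wakeup() override {
        assert(asleep);
        blocked += clock.now - sleptAt;
        Thread::wakeup();
    }

    // Fraction of the thread's lifetime spent blocked so far.
    double blockedFraction() const {
        long life = clock.now - created;
        return life == 0 ? 0.0 : double(blocked) / double(life);
    }
};

double demoInstrumentedThread() {
    TickClock clk;
    InstrumentedThread t(clk);
    Thread& asBase = t;            // callers see only the Thread interface
    clk.now = 10; asBase.sleep();
    clk.now = 40; asBase.wakeup();
    clk.now = 100;
    return t.blockedFraction();    // 30 ticks blocked out of 100
}
```
.LE
Because the interface is unchanged, code that puts threads to sleep and
wakes them needs no modification to gain the measurement.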
.PP
System objects that create new instances of other system
objects rely on a prototypical instance of the new object to
direct the creation.  Part of the scheduler's responsibility, for
instance, is to create one thread per processor in the system.  These
threads busy themselves by looking for something to do, doing
it, and then continuing looking.  The scheduler does not
create these new threads directly; instead, it asks its own
thread  (since the scheduler must be running, it must have
a thread) to create a new instance of itself.  By specifying
the scheduler's prototypical thread, the programmer can
control the behavior of all other system threads.  Were
PRESTO to assume that threads always looked the same,
this would not be possible.
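.PP
One way to picture prototype-directed creation is the sketch below,
in modern standard C++.
It is an assumption-laden illustration, not PRESTO code:  the operations
\fInewInstance\fP, \fIkind\fP, \fIstart\fP, and \fIworker\fP are
hypothetical names, and the threads created here do no work.
.LS
```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Thread {
public:
    virtual ~Thread() {}
    // A thread knows how to create another instance of its own class.
    virtual std::unique_ptr<Thread> newInstance() const {
        return std::unique_ptr<Thread>(new Thread);
    }
    virtual std::string kind() const { return "Thread"; }
};

class InstrumentedThread : public Thread {
public:
    std::unique_ptr<Thread> newInstance() const override {
        return std::unique_ptr<Thread>(new InstrumentedThread);
    }
    std::string kind() const override { return "InstrumentedThread"; }
};

class Scheduler {
    std::unique_ptr<Thread> prototype;              // the scheduler's own thread
    std::vector<std::unique_ptr<Thread>> workers;   // one per processor
public:
    explicit Scheduler(std::unique_ptr<Thread> proto)
        : prototype(std::move(proto)) { assert(prototype != nullptr); }

    // The scheduler never names a concrete class; it asks its prototype.
    void start(int processors) {
        for (int i = 0; i < processors; ++i)
            workers.push_back(prototype->newInstance());
    }
    const Thread& worker(std::size_t i) const { return *workers.at(i); }
};

std::string demoPrototype() {
    Scheduler s(std::unique_ptr<Thread>(new InstrumentedThread));
    s.start(4);
    return s.worker(3).kind();
}
```
.LE
Supplying a different prototype changes the class of every system
thread without touching the scheduler.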
.NH 2
Building Various Parallel Programming Environments
.LP
\fIA First Model\fR
.PP
The first real 
applications built on top of PRESTO were implemented
using a Mesa-like environment that can be included
when a PRESTO program is compiled.  The environment simply
provides threads (Mesa \fIprocesses\fP)
that can fork other
threads, join on their results, and synchronize using
the standard
Mesa monitors and condition variables.   In a sense,
the environment is bland, but it \fIis\fP effective, and it
was cheap to build.
The entire implementation required only about 200 lines of
code when built using PRESTO.
.PP
Despite its wide appeal,
Mesa's model of parallel programming is not without flaws.
Threads have
special semantics apart from other program objects, and
synchronization constraints must be made explicit by the programmer.
Nonetheless, we have used this programming model for a
wide range of applications that include
matrix multiplication, sorting,
analysis of multi-class queueing networks, and a parallel Othello
program.
.sp 0.5
.LP
\fIImplementing Higher-Level Abstractions\fR
.PP
The primary goal of any parallel programming environment is
to help make it easier to write and reason about parallel programs.
To this end, there have been
a large number of notable systems \&[12]
.\".[[
.\"needref
.\".]][1,\|6-8,\|10,\|11,\|18],
each providing its
designers' notion of the best possible environment for constructing
parallel applications.  Although each system is unique in its goals
and implementation, each has had to address a similar set of key
questions.  By answering these questions differently, different
models of parallel programming are
realized.  Rather than enumerating
the features of each system and discussing its (potential)
implementation in PRESTO, we instead touch on some of
the key questions, discussing how they would be answered
by PRESTO.
.IP \(bu
\fIShould dynamic creation of threads be allowed?\fP
.IP
Concurrent Euclid and CSP do not support the dynamic
creation of new threads.  The basic process structure
is specified at compile time and never changes.  Although
thread creation in PRESTO is inherently dynamic, static objects
containing their own thread(s) of execution can be declared
at compile time, and the freedom to create new threads at run time
can be denied.  
.IP \(bu
\fIShould objects and threads be disjoint or unified concepts?\fP
.IP
Systems such as Smalltalk-80, Mesa, and Modula-2+
make a strong distinction between an object, which is
inherently passive, and a thread (or process), which is used
to animate objects.  Threads may not even be first class objects.
This dichotomy can be confusing to
programmers, making it difficult to model problems and enforce
protection among objects.
On the other hand,
systems such as ConcurrentSmalltalk and Act 1 unify the two
notions.
An object is a schedulable entity in these 
systems, and there is no conceptual gap between an object and a
thread \- one cannot exist without the other.
.IP
PRESTO supports both models.  Threads can be separate
entities from the objects in which they execute,
or threads and objects can be unified by defining classes
that \fIthread\fP themselves upon instantiation.  The PRESTO-derived
class \fCTask\fP serves this function.  
Tasks
execute autonomously, communicating with one another
via messages.  The basic structure of a task is
.DS L
.ft I
forever do
        m @<-@ receiveMessage
        decode m
        execute the operation requested in m 
         resulting in a new message m'
        sendMessage m' to the sender of m
.ft R
.DE
.IP
A user's definition for an object derived from a \fCTask\fP has no main
body.  It just provides the operations that are invoked by the task's
thread upon receiving a message from another task.  Synchronization
within the object is implicit in the serial processing of incoming
messages.  
.IP \(bu
\fIShould object operations be asynchronous or synchronous?\fP
.IP
Act 1 and Multilisp have only asynchronous object operations,
while POOL-T is entirely synchronous.  ConcurrentSmalltalk
supports both.  The advantage of having asynchronous operations
is that it allows inter-object parallelism to be naturally 
expressed; interacting objects proceed in parallel
without programmer intervention.  Synchronous operations are easier
to reason about, but require that the programmer create a new
thread of execution to gain concurrency.
.IP
Both synchronous and asynchronous operations can be
realized using PRESTO.  Synchronous
operations essentially come "for free"
from the underlying sequential programming model.
An asynchronous operation having no return value (or none
of interest) is possible by creating a new thread and
using it to request the operation.  Synchronizing later
with an asynchronous operation requires extra handling.
.IP
In Multilisp and Act 1, an asynchronous operation is described
by a \fIfuture\fP, which is an object whose value is either
"in progress" or "ready."
If a future's value is referenced in an expression, the thread
executing the expression 
is blocked until this value becomes ready.
In PRESTO, a future is represented by
an instance of a 
class derived from the class \fIFuture\fP.  
When a future is declared, a new thread
is created, and the computation represented
by the future executes
asynchronously to the thread running in the
object holding the future's reference.  This reference
may be passed around freely.
The thread computing the future terminates
with a return value that becomes the future's own value.  Upon
termination, the future has been reached, and any
threads waiting on the future's value are resumed.
.IP \(bu
\fIShould any parallel program try to use all of this?\fP
.IP
PRESTO provides the basic tools
with which to construct different types of
parallel programming models.  While misuse (or abuse) of these
tools can be discouraged, it cannot be prevented.
Application builders will fail to realize any model if they
try to realize them all.  Tasks and futures, for instance,
can be used together:  a task could
create an asynchronously executing
future object and return it to the object invoking
the \fIsendAndReceiveMessage\fP operation.  
Mixing models like this is not recommended, though, since
it quickly results in incomprehensible programs.
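.PP
The life cycle of a future described in the list above
("in progress," then "ready," with waiting threads resumed)
can be modeled without real concurrency.
The sketch below, in modern standard C++, is such a single-threaded
model and makes several assumptions:  only the class name \fIFuture\fP
comes from the text, while \fIreference\fP, \fIrunToCompletion\fP,
\fIcompute\fP, and the derived class \fISumTo\fP are hypothetical, and
the thread that computes the future is replaced by an explicit call.
.LS
```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a thread: only its scheduling state is modeled.
struct Thread {
    enum State { Ready, Blocked } state = Ready;
};

class Future {
    bool ready = false;             // "in progress" until the computation ends
    int value = 0;
    std::vector<Thread*> waiters;
protected:
    virtual int compute() = 0;      // the computation the future represents
public:
    virtual ~Future() {}
    bool isReady() const { return ready; }

    // A thread references the value: succeed if ready, else block the caller.
    bool reference(Thread* caller, int& out) {
        if (!ready) {
            caller->state = Thread::Blocked;
            waiters.push_back(caller);
            return false;
        }
        out = value;
        return true;
    }

    // Stands in for the future's own thread running to termination.
    void runToCompletion() {
        assert(!ready);
        value = compute();
        ready = true;
        for (Thread* t : waiters) t->state = Thread::Ready;   // resume waiters
        waiters.clear();
    }
};

// A concrete future, derived from Future as the text describes.
class SumTo : public Future {
    int n;
    int compute() override {
        int s = 0;
        for (int i = 1; i <= n; ++i) s += i;
        return s;
    }
public:
    explicit SumTo(int limit) : n(limit) {}
};

int demoFuture() {
    SumTo f(10);
    Thread consumer;
    int v = -1;
    bool early = f.reference(&consumer, v);   // in progress: consumer blocks
    assert(!early && consumer.state == Thread::Blocked);
    f.runToCompletion();                      // waiters are resumed
    assert(consumer.state == Thread::Ready);
    f.reference(&consumer, v);
    return v;
}
```
.LE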
.NH 2
Synapse \- An Environment For Parallel Simulation
.PP
Synapse \&[16]
\& provides a message-oriented programming
environment for writing parallel simulations of the
type described by Chandy and Misra \&[4]
\& and Bryant \&[3].
\& A simulation is structured as a set of processes that communicate
by sending timestamped messages to one another.  Each
process is guaranteed to receive its messages in monotonically
increasing timestamp order.  The Synapse environment's primary function is
to guarantee this ordering, \fIwithout\fP requiring that all processes
run lockstep in simulation time.   In this way,
processes that do not have intermessage dependencies can proceed
in parallel.
.sp 0.5
.LP
\fIThe Synapse Abstractions\fR
.PP
Synapse presents itself as a standalone support system
for parallel simulations.  Programmers are unaware that the foundation
on which Synapse implements its abstractions is provided
by PRESTO.  Instead of threads, spinlocks, and synchronization
objects, Synapse programs manipulate higher-level abstractions such 
as \fILogical Processes\fP (LP), \fIlinks\fP and \fImessages\fP.
LPs communicate with one another by sending timestamped
messages across
links, which are one-way communication channels guaranteeing
that all messages are received in increasing timestamp order.
.PP
In the same way that synchronization semantics
are absent from PRESTO synchronization objects,
Synapse's LPs don't simulate anything.  Rather, they are a basic
class from which more useful simulation processes can be
derived.
An LP is an object with its own thread of control and its
own private simulation clock.  It can
open links, and send and receive messages on them. 
Every message includes the clock value of the source LP
at the time the message is sent.
A clock advances in response
to the messages that an LP receives,  implying that clocks
in different LPs can progress at different rates.
The main job of Synapse is to ensure that this variability is
transparent to the LPs.
.PP
A message cannot
be received on a link by an LP unless no other logically
earlier message will arrive on any of the LP's other links.
This constraint can cause an LP to block \fIeven though\fP
there are pending messages on some of its links.
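.PP
The receive constraint can be illustrated with the following sketch
in modern standard C++.
It assumes one simple conservative test, which may differ from what
Synapse actually does:  an LP may receive its earliest pending message
only when every one of its input links holds a pending message, because
an empty link might still deliver something earlier.
The names \fIMessage\fP, \fILink\fP, \fILogicalProcess\fP, and
\fItryReceive\fP are hypothetical.
.LS
```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Message {
    long timestamp;
    int payload;
};

// One-way channel whose messages are sent in nondecreasing timestamp order.
class Link {
    std::deque<Message> pending;
public:
    void send(const Message& m) {
        assert(pending.empty() || pending.back().timestamp <= m.timestamp);
        pending.push_back(m);
    }
    bool empty() const { return pending.empty(); }
    const Message& front() const { return pending.front(); }
    void pop() { pending.pop_front(); }
};

class LogicalProcess {
    std::vector<Link*> inputs;
    long clock = 0;                 // private simulation clock
public:
    void open(Link* l) { inputs.push_back(l); }
    long now() const { return clock; }

    // Returns false (the LP must block) unless the earliest pending
    // message is known to be safe to receive.
    bool tryReceive(Message& out) {
        Link* earliest = nullptr;
        for (Link* l : inputs) {
            if (l->empty()) return false;   // an earlier message may still arrive
            if (earliest == nullptr ||
                l->front().timestamp < earliest->front().timestamp)
                earliest = l;
        }
        if (earliest == nullptr) return false;   // no links opened yet
        out = earliest->front();
        earliest->pop();
        clock = out.timestamp;      // the clock advances with received messages
        return true;
    }
};

bool demoBlocksWithPending() {
    Link a, b;
    LogicalProcess lp;
    lp.open(&a); lp.open(&b);
    a.send(Message{5, 50});
    Message m{0, 0};
    return !lp.tryReceive(m);       // b is empty, so the LP blocks
}

int demoReceivesEarliest() {
    Link a, b;
    LogicalProcess lp;
    lp.open(&a); lp.open(&b);
    a.send(Message{5, 50});
    b.send(Message{3, 30});
    Message m{0, 0};
    lp.tryReceive(m);
    return m.payload;
}
```
.LE
The first demonstration shows the case discussed above:  a message is
pending on one link, yet the LP cannot safely receive it.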
.sp 0.5
.LP
\fIHandling Deadlock\fR
.PP
A Synapse simulation can deadlock if a cycle of empty links exists
among a subset of LPs.  If the subset is proper, then the
system is partially deadlocked and some LPs can continue
to make progress.  If complete, then all LPs are blocked
and nothing can proceed without external intervention.
To recover from deadlock, the blocked LP with the 
earliest pending input message is allowed to receive that message.
.PP
Part of the research associated with Synapse is the investigation of different
methods of handling deadlock (partial and full) in parallel distributed
simulations.  The structure of PRESTO and, in particular, PRESTO's
scheduler, simplifies the mechanics of this research.  When the
scheduler concludes that there are no threads to be run (all LPs
are deadlocked), it invokes the operation \fIhalt\fP on itself.
Synapse defines a simulation scheduler that is exactly the same as the
PRESTO scheduler, \fIexcept\fP for its halting criterion.  The simulation
scheduler finds the set of all LPs that are blocked but have
unread messages on their links.  If any exist, the scheduler
starts the one with the earliest unread message.
.PP
Dealing with deadlock only when the system halts does
not solve the problem of partial deadlock.  As long as a
simulation fully utilizes all of the \fIphysical\fP processors in the
system, a deadlocked subset of LPs does not affect a parallel
simulation's
performance.  Only when there are fewer non-deadlocked LPs
than physical processors is the simulation "in trouble."
.PP
The typical way of dealing with partial deadlock is to use
low-priority threads that find, and break, the deadlock when
it occurs.
A processor that can't find an LP to execute runs one of these
threads instead.  Although functionally correct, this solution incurs
a scheduling and context-switch overhead.  This overhead can be avoided,
however, by changing the behavior of the low-level system components.
.PP
PRESTO encourages a solution in which the
processors themselves solve the deadlock.  When a processor object
requests a ready thread from the scheduler, the scheduler invokes the
operation \fIget\fP on the pool of ready threads.  With this,
the locus of control moves from the processor object to the scheduler object
to the ready pool.  Synapse supplants the scheduler's ready pool
with one of its own.  A \fIget\fP operation on a Synapse ready pool that
would otherwise return nothing, instead scans the set of blocked LPs
to determine which, if any, should be restarted.  The algorithms for
making this determination are wholly enclosed within Synapse's own
ready pool.  The processor and scheduler
objects are essentially duped into breaking partial
deadlock without their knowledge.
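.PP
A ready pool of this kind might be sketched as follows, in modern
standard C++.
The sketch is illustrative only.
It assumes that the pool restarts the blocked LP with the earliest
unread message, the recovery rule given earlier for full deadlock;
the text does not say which algorithm Synapse's pool actually uses.
The names \fILP\fP, \fISynapsePool\fP, and \fIwatch\fP, and the fields
of \fILP\fP, are hypothetical.
.LS
```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical summary of a logical process as the pool sees it.
struct LP {
    int id;
    bool blocked;
    bool hasUnread;
    long earliestUnread;   // timestamp of its earliest unread message
};

// Default pool: first come, first served.
class ThreadPool {
    std::deque<LP*> ready;
public:
    virtual ~ThreadPool() {}
    virtual void put(LP* lp) { ready.push_back(lp); }
    virtual LP* get() {
        if (ready.empty()) return nullptr;
        LP* lp = ready.front();
        ready.pop_front();
        return lp;
    }
};

class SynapsePool : public ThreadPool {
    std::vector<LP*> known;    // every LP in the simulation
public:
    void watch(LP* lp) { assert(lp != nullptr); known.push_back(lp); }

    LP* get() override {
        if (LP* lp = ThreadPool::get()) return lp;   // normal case
        // Nothing is ready: look for a blocked LP that can be restarted.
        LP* best = nullptr;
        for (LP* lp : known) {
            if (!lp->blocked || !lp->hasUnread) continue;
            if (best == nullptr || lp->earliestUnread < best->earliestUnread)
                best = lp;
        }
        if (best != nullptr) best->blocked = false;
        return best;           // nullptr only if no LP can be restarted
    }
};

int demoSynapsePool() {
    LP a{1, true, true, 40}, b{2, true, true, 15}, c{3, true, false, 0};
    SynapsePool pool;
    pool.watch(&a); pool.watch(&b); pool.watch(&c);
    LP* next = pool.get();     // pool is empty, so the scan runs
    return next->id;
}
```
.LE
The caller of \fIget\fP cannot tell whether the returned LP came from
the ready queue or from the deadlock scan.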
.PP
Synapse and PRESTO together demonstrate
that efficiency and abstraction need not be incompatible.
The difference between a PRESTO thread and a Synapse LP is one of
semantics, not performance.  Similarly, links are built
using the basic thread primitives (sleep/wakeup), 
and are not constrained by an abstraction that obscures their
goals.
The combination of object-oriented programming with
PRESTO's open system design allows a very
high-level concept, namely deadlock detection, to execute efficiently
at a very low level.  Currently, Synapse is serving as the implementation
base for another parallel programming environment, namely Poker \&[14].
.NH 1
The PRESTO Implementation
.PP
PRESTO is implemented in the object-oriented programming
language C++ \&[15]. \&
The system runs on a Sequent Balance 21000
shared-memory multiprocessor
on top of the DYNIX 
operating system, and
on single-processor DEC VAX machines running the
ULTRIX operating system.  The
system should soon be operational on DEC SRC Firefly
experimental prototype multiprocessor workstations.
.PP
Sequent's DYNIX is a UNIX-lookalike.  The only way
to achieve true multiprocessor parallelism is to create
multiple DYNIX processes, a fairly expensive task requiring
about 55 msec\**.
.FS
Some timings for other Sequent operations:
execute one iteration of a for-loop: 4 @mu@secs.;
make a procedure call with no arguments: 15 @mu@secs.
.FE
In contrast, a PRESTO thread on the Sequent
can be created and started in as little as 700 @mu@secs.
.\" A thread can be put to sleep and resumed in XX @mu@secs.
.\" *** timing needed above
A large percentage of these times is spent acquiring the atomic
hardware locks needed
to guarantee mutual exclusion
in the various system components (about 30 @mu@secs. per
lock).  We expect the Sequent implementation
to speed up considerably when new Symmetry hardware makes
it possible to acquire free locks in only a few @mu@secs.
It is encouraging that our design, which
invites extension and modification, has performance comparable
to that of several other multiprocessor threads packages
known to us.
.PP
In this paper we have concentrated on the way in which
PRESTO's object orientation provides a
framework within which one can easily
build efficient support for a wide variety of
parallel programming models.
PRESTO's implementation and performance are described
more fully in a companion paper [2].
.NH 1
Conclusions
.PP
PRESTO is not a toy.
It is the current system of choice for parallel programming
at the University of Washington.
.PP
PRESTO began merely as an effort to address the high
cost of the parallel programming constructs provided
in the DYNIX environment, where we found that our use
of threads had to be governed by their overhead rather
than by the natural decomposition of our problems.
PRESTO succeeded in this goal.
.PP
After a significant period of use, though, we have
come to the conclusion that PRESTO's ability to be
customized to provide efficient support for any of
a wide variety of parallel programming models is of
much greater importance.
Correct and efficient parallel programs are
notoriously hard to engineer.
There is no one parallel programming model that is
right for all applications.
The ability to construct an appropriate model using
PRESTO makes correct and efficient programs less
difficult to achieve.
.SH
Acknowledgements
.PP
We'd like to thank Kenneth Almquist, Tom Anderson, Jeff Chase,
Bjorn Freeman-Benson, Ellen Ratajak, Alan Shaw and Ken Whaley
for their input on the system's design and implementation
as well as their many helpful comments on earlier
drafts of this paper.
.]<
.\"America.P.-POOL-T:-A-Parallel-O-1
.ds [F 1
.]-
.ds [A P. America
.ds [T POOL-T: A Parallel Object-Oriented Language
.ds [E M. Tokoro, A. Yonezawa
.ds [B Object-Oriented Concurrent Programming
.ds [I MIT Press
.ds [C Cambridge, Mass
.ds [D 1987
.ds [K pool-t
.nr [T 0
.nr [A 0
.nr [O 0
.][ 3 article-in-book
.\"Bershad.B.N.-Lazowska.E.D.-Levy.H.M.-PRESTO:-A-System-For-2
.ds [F 2
.]-
.ds [A B.N. Bershad
.as [A ", E.D. Lazowska
.as [A ", and H.M. Levy
.ds [T PRESTO: A System For Object-Oriented Parallel Programming
.ds [R Technical Report TR 87-09-01, Department of Computer Science, University of Washington, (submitted for publication)
.ds [D September 1987
.ds [K parallel, object-oriented, multiprocessor
.ds [K original presto paper,presto1
.nr [T 0
.nr [A 0
.nr [O 0
.][ 4 tech-report
.\"Bryant.R.E.-Simulation-of-Packet-3
.ds [F 3
.]-
.ds [A R.E. Bryant
.ds [T Simulation of Packet Communication Architecture Computer Systems
.ds [I Massachusetts Institute of Technology, Laboratory for Computer Science
.ds [R Technical Report MIT, LCS, TR-188
.ds [D 1977
.nr [T 0
.nr [A 0
.nr [O 0
.][ 4 tech-report
.\"Chandy.K.M.-Misra.J.-Asynchronous-Distrib-4
.ds [F 4
.]-
.ds [A K.M. Chandy
.as [A " and J. Misra
.ds [T Asynchronous Distributed Simulation Via A Sequence of Parallel Computations
.ds [J Communications of the ACM
.ds [I ACM
.ds [V 24
.ds [N 11
.ds [P 198-206
.nr [P 1
.ds [D November 1981
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Dijkstra.E.W.-Structure-of-the-`TH-5
.ds [F 5
.]-
.ds [A E.W. Dijkstra
.ds [T The Structure of the `THE'-Multiprogramming System
.ds [J Communications of the ACM
.ds [I ACM
.ds [V 11
.ds [N 5
.ds [P 341-346
.nr [P 1
.ds [D 1968
.ds [K seminal semaphore paper
.ds [K THE-system
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Goldberg.A.-Robson.D.-Smalltalk-80:-The-La-6
.ds [F 6
.]-
.ds [T Smalltalk-80: The Language and its Implementation
.ds [A A. Goldberg
.as [A " and D. Robson
.ds [I Addison-Wesley
.ds [D 1983
.ds [K smalltalk-80
.nr [T 0
.nr [A 0
.nr [O 0
.][ 2 book
.\"Halstead.R.-Multilisp:-A-Languag-7
.ds [F 7
.]-
.ds [A R. Halstead
.ds [T Multilisp: A Language for Concurrent Symbolic Computation
.ds [J ACM Transactions on Programming Languages and Systems
.ds [D October 1985
.ds [K multilisp, parallel programming
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Hoare.C.A.R.-Communicating-Sequen-8
.ds [F 8
.]-
.ds [A C.A.R. Hoare
.ds [T Communicating Sequential Processes
.ds [J Communications of the ACM
.ds [I ACM
.ds [V 21
.ds [N 8
.ds [P 666-677
.nr [P 1
.ds [D August 1978
.ds [K seminal paper,csp
.ds [K csp, parallel programming languages
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Hoare.C.A.R.-Monitors:-An-Operati-9
.ds [F 9
.]-
.ds [A C.A.R. Hoare
.ds [J Communications of the ACM
.ds [T Monitors: An Operating System Structuring Concept
.ds [I ACM
.ds [V 17
.ds [N 10
.ds [P 549-557
.nr [P 1
.ds [D October 1974
.ds [K hoare monitors, synchronization, concurrent programming
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Holt.R.-Short-Introduction-T-10
.ds [F 10
.]-
.ds [A R. Holt
.ds [T A Short Introduction To Concurrent Euclid
.ds [J SIGPLAN Notices
.ds [V 17
.ds [P 60-79
.nr [P 1
.ds [D May 1982
.ds [K CE concurrent euclid
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Lieberman.H.-Concurrent-Object-Or-11
.ds [F 11
.]-
.ds [A H. Lieberman
.ds [T Concurrent Object-Oriented Programming in Act 1
.ds [E M. Tokoro, A. Yonezawa
.ds [B Object-Oriented Concurrent Programming
.ds [I MIT Press
.ds [C Cambridge, Mass
.ds [D 1987
.ds [K actors, act 1
.nr [T 0
.nr [A 0
.nr [O 0
.][ 3 article-in-book
.\"Mundie.D.A.-Fisher.D.A.-Parallel-Processing--12
.ds [F 12
.]-
.ds [A D.A. Mundie
.as [A " and D.A. Fisher
.ds [T Parallel Processing in Ada
.ds [J IEEE Computer
.ds [D August 1985
.ds [P 20-25
.nr [P 1
.ds [K ada, parallel processing
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Redell.B.W.L.D.D.-Experiences-with-Pro-13
.ds [F 13
.]-
.ds [A B.W. Lampson
.as [A " and D.D. Redell
.ds [T Experience with Processes and Monitors in Mesa
.ds [J Communications of the ACM
.ds [I ACM
.ds [V 23
.ds [N 2
.ds [P 105-117
.nr [P 1
.ds [D February 1980
.ds [K mesa, monitors, concurrent programming, experiences with mesa
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Snyder.L.-Parallel-Programming-14
.ds [F 14
.]-
.ds [A L. Snyder
.ds [T Parallel Programming and the Poker Programming Environment
.ds [J IEEE Computer
.ds [V 17
.ds [N 7
.ds [D July 1984
.ds [K poker
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Stroustrup.B.-C++-Programming-Lang-15
.ds [F 15
.]-
.ds [A B. Stroustrup
.ds [T The C++ Programming Language
.ds [I Addison-Wesley
.ds [D March 1986
.ds [K c++ cplusplus
.nr [T 0
.nr [A 0
.nr [O 0
.][ 2 book
.\"Wagner.D.B.-Lazowska.E.D.-Bershad.B.N.-Techniques-for-Effic-16
.ds [F 16
.]-
.ds [A D.B. Wagner
.as [A ", E.D. Lazowska
.as [A ", and B.N. Bershad
.ds [T Techniques for Efficient Shared-Memory Parallel Simulation
.ds [I Department of Computer Science, University of Washington
.ds [R Technical Report 88-04-05
.ds [K synapse
.ds [D April 1988
.nr [T 0
.nr [A 0
.nr [O 0
.][ 4 tech-report
.\"Wulf.W.-Cohen.E.-Corwin.W.-Jones.A.-Levin.R.-Pollack.F.-HYDRA:-The-Kernel-of-17
.ds [F 17
.]-
.ds [A W. Wulf
.as [A ", E. Cohen
.as [A ", W. Corwin
.as [A ", A. Jones
.as [A ", R. Levin
.as [A ", and F. Pollack
.ds [T HYDRA: The Kernel of a Multiprocessor Operating System
.ds [J Communications of the ACM
.ds [I ACM
.ds [V 17
.ds [N 6
.ds [P 337-345
.nr [P 1
.ds [D June 1974
.ds [K HYDRA, hydra
.nr [T 0
.nr [A 0
.nr [O 0
.][ 1 journal-article
.\"Yokote.Y.-Tokoro.M.-Concurrent-Programmi-18
.ds [F 18
.]-
.ds [A Y. Yokote
.as [A " and M. Tokoro
.ds [T Concurrent Programming in ConcurrentSmalltalk
.ds [E M. Tokoro, A. Yonezawa
.ds [B Object-Oriented Concurrent Programming
.ds [I MIT Press
.ds [C Cambridge, Mass
.ds [D 1987
.ds [K concurrentsmalltalk
.nr [T 0
.nr [A 0
.nr [O 0
.][ 3 article-in-book
.\"Yonezawa.A.-Tokoro.M.-Object-Oriented-Conc-19
.ds [F 19
.]-
.ds [A A. Yonezawa
.as [A " and M. Tokoro
.ds [T Object-Oriented Concurrent Programming: An Introduction
.ds [E M. Tokoro, A. Yonezawa
.ds [B Object-Oriented Concurrent Programming
.ds [I MIT Press
.ds [C Cambridge, Mass
.ds [D 1987
.ds [K intro-to-object-oriented-concurrent-programming
.nr [T 0
.nr [A 0
.nr [O 0
.][ 3 article-in-book
.]>

