Performance Programming
Performance programming seeks to improve performance
beyond what is achieved by programming an algorithm in the
most expedient manner. The goal is that each processing
element be kept as busy as possible doing useful work. This
entails satisfying four requirements: breaking problems into
independent subproblems that can be executed concurrently,
distributing these subproblems appropriately among the processing
elements, making sure that the necessary data is close to its
processing element, and overlapping communication with computation
where possible. To attain high performance, these requirements must
be satisfied whether one views "processing elements" as
stages of an arithmetic or vector pipeline, functional units of a CPU,
processors of a tightly coupled shared-memory multiprocessor, nodes of
a distributed-memory supercomputer, or heterogeneous computers on a
network.
Performance programming can require weeks of work even for a small
program. This effort can be justified for some frequently-executed
kernels and computations for which there is a large premium for fast
answers. Performance programming techniques are applicable, but rarely
appropriate, for sequential programs. They are often of paramount
importance on parallel supercomputers.
Some useful url's
A bibliography of my papers on topics related to performance
programming. Preprints of most of these papers are provided.
An introduction to uniprocessor optimization in
PostScript or
PDF Format
for the 1998 NPACI Parallel Computing Institute.
Here are some materials for a
one-day
course in performance programming developed by Bowen Alpern
and Larry Carter.
An outline of a graduate course for Columbia University's Computer
Science Department taught by Bowen Alpern in the Fall of 1994 is
available here.
I taught a
course on performance programming at UCSD in the Fall of 1995.
And again in Fall 98.