Performance Programming

Performance programming seeks to improve performance beyond what is achieved by programming an algorithm in the most expedient manner. The goal is that each processing element be kept as busy as possible doing useful work. This entails satisfying four requirements: breaking problems into independent subproblems that can be executed concurrently, distributing these subproblems appropriately among the processing elements, making sure that the necessary data is close to its processing element, and overlapping communication with computation where possible. To attain high performance, these requirements must be satisfied whether one views "processing elements" as stages of an arithmetic or vector pipeline, functional units of a CPU, processors of a tightly coupled shared-memory multiprocessor, nodes of a distributed-memory supercomputer, or heterogeneous computers on a network.

Performance programming can require weeks of work even for a small program. This effort can be justified for some frequently-executed kernels and computations for which there is a large premium for fast answers. Performance programming techniques are applicable, but rarely appropriate, for sequential programs. They are often of paramount importance on parallel supercomputers.

Some useful url's

CPU Information Center
The ASCI Blue Benchmarks.
LMBench performance analysis tools, including some figures on memory latencies for various caches.
Netlib Repository of software, papers & databases.
PTOOLS Parallel Tools Consortium.
PTools-related web sites.
WARTS: Wisconsin's Archtecture Research Tool Set.
Tom Anderson's web page on tools for tuning program performance.
Don Pardo's instruction-level simulation and tracing list of tools.
Rudolf Eigenmann's directory of computational applications and benchmarks.
Ian Foster's online book, Designing and Building Parallel Programs.
NPACI's userguide for the T3E.

Performance Programming Papers

A bibliography of my papers on topics related to performance programming. Preprints of most of these papers are provided.

Performance Programming Courses

An introduction to uniprocessor optimization in PostScript or PDF Format for the 1998 NPACI Parallel Computing Institute.

Here are some materials for a one-day course in performance programming developed by Bowen Alpern and Larry Carter.

An outline of a graduate course for Columbia University's Computer Science Department taught by Bowen Alpern in the Fall of 1994 is available here.

I taught a course on performance programming at UCSD in the Fall of 1995. And again in Fall 98.