Performance Programming
Columbia, Fall 1994
Professor: Bowen Alpern
SORRY! These are no longer online.
Course Outline
What is performance programming? When is it justified? Review
of RAM and PRAM analysis. Overview of examples to be used in the
course. Poisson distributions. Extended example: seismic migration.
Extended example: unblocked matrix multiplication. The Two Level
model of computation. The memory hierarchy of an RS/6000. Locality:
local, semilocal, and nonlocal data passes. Why Quicksort is better
than Heapsort. Communication cost on SP-1 and SP-2.
The PMH model of computation and its parameters. Visualizing computers.
Programming a PMH. Performance programming techniques. Hierarchial tiling.
Extended example: a PDE-like code.
LAPACK and the BLAS. daxpy vs. ddot.
Eliminating sequential dependencies in ddot.
Computation ``sticks.'' Prefetching. LU decomposition. The importants
of proper parenthesization.
Cache. Translation Lookaside Buffer (TLB). Disk. Blocking.
Associativity: what it is and how to fight it. Blocking multiple
access patterns. Examples: ranking integers, LU decomposition, and
FFT. Poisson distribution (again).
Distributed and shared address space parallelism. Latency and
bandwidth. Collective Communication. Choreography. 1D and 2D LU
decomposition.
Example: NAS/CG
Scalability analysis. Performance landscapes. Examples: LU
decomposition and sparse matrix-vector product.
What would a portable performance program look like? Program
variants. Generic Models. Deriving a PMH model for a machine.
Modeling the CM-5. Distortions of PMH models.
Message Compression: fewer messages, fewer values, or fewer bits.
Hardware support for message compression. Example: NAS/IS. Natural
sequence compression. Standard techniques: run-length coding, Huffman
codes, arithmetic coding.
IEEE floating-point format. Rounding modes. Integer arithmetic in
the floating-point unit. Examples: Pseudo-random number generation
and data compression. Microparallelism. Inner-loop tricks. Table
lookup and the NAS/EP benchmark. Approximate (inverse) squareroots.
Fast Fourier Transforms.
Uniformity and the UMH model. Some theorems and their
proofs. Threshold functions and candidate threshold functions.
Examples: transpose, matrix multiplication, FFT and parallel matrix
multiplication.
Rules of thumb. Dynamic programming. Extended example: optimal
Steiner trees.
In addition, Rick Lawrence, taught one class on his experience tuning
the NAS/MG benchmark for the IBM SP2.