CSE 260 – Background Survey

CSE 260 – Quizlet # 3 11/8/02 Name: ____________________

1. Given an EREW PRAM with P processors, explain how a value can be distributed to all processors in O(lg P) steps. (Perhaps the following will make the question clearer: Suppose memory locations X and Y [1], Y[2], ... Y[P] are initialized to arbitrary values. Describe how, in O(lg P) time steps, processor i could compute X+Y[i].)

Let’s say X=X[1], and that X[2], X[3], ..., X[P] are additional memory locations..

Processor i does nothing for the first k = 2 lg i timesteps. Then at time k+1, it reads the value in X[i]; at time k + 2, it will store it in location X[2^k/2+ i]; and finally at time k+3, it can add it to Y[i].

IMPORTANT: PRAM’s don’t send messages between processors! They only read and write from their shared memory.

2. Suppose that a program for LU decomposition on a problem of size N, using P processors, takes N³/P + 20N² time.

(a) What does Amdahl’s Law say is the maximum possible speedup?

Unparallelizable portion is 20N² of N³ + 20N². Amdahl’s law says limit on speedup is the reciprocal, that is (N³ + 20N²⁾ / 20N² = (N+20)/20.

(b) What is the parallel efficiency of this program?

Serial Time / P x Parallel Time = (N³ + 20N²)/ (P (N³/P + 20N²))

= (N + 20) / (N + 20P).

Keeping the parallel efficiency constant (say at c), means (N+20)/(N+20P) = c.

Thus, we want N(1-c) = 20Pc – 20, i.e. N = (20Pc – 20)/(1-c) = O(P).

4. Suppose message passing in a computer has a latency of 10 microseconds and a bandwidth of 500 MB/sec. Compute what would be the “N_1/2” message length.

The asymptotic (though unattainable) best performance you can get is 500 MB/sec.

The question is, what message length gets half this speed? Well, the time to send a message of length N Bytes is 10 x 10^-6 sec + N/500x10⁶ B/sec = (10 + N/500)x10^–6 sec.

Thus, we need to solve N Bytes/((10 + N/500)x10^–6 sec) = 250 MB/sec.

A little algebra gets N = 5000 Bytes.

Even though Little’s Law says that the number of bits “in flight” in a communication channel that has latency L and bandwidth B is LB, this has little to do with this problem!

5. Which would you expect to run faster (in terms of MFLOP/sec) on a typical multicomputer: a Finite Difference Method problem or a Finite Element Method problem? Why?

Finite Difference Method. Fewer loads are needed per floating point operation (since the

coefficients are the same for all stencils), memory references have better locality, and it’s easier to get good load balancing among processors.

“Less communication required” isn’t necessarily true.

6. Give example of an important problem for each of the following application areas:

(a) CFD Airflow over car or airplane wing, weather prediction, climate modeling, ...

(b) Structural dynamics. Car crash simulation, response of buildings and bridges to earthquakes, ...

Genomics, protein/DNA string matching aren’t molecular dynamics – they don’t

study the forces on, and movement of, molecules.

(d) Signal processing. Radar, Sonar, SETI, seismic data processing (for earthquake detection or oil exploration), processing of sounds (e.g. talking) in cell phones, ...

“Analyzing electromagnetic radiation” is a good guess, but it isn’t really considered signal processing.

7. Here’s a list of US government organizations: DOD, DOE, NASA, NIH, NSF.

For each of the following programs or institutions, tell which government organization from the list is the major sponsor:

(a) NCSA ____NSF_________

(b) DARPA ____DOD_________