1. Given an EREW PRAM with P processors, explain how a value can be distributed to all processors in O(lg P) steps. (Perhaps the following will make the question clearer: Suppose memory locations X and Y [1], Y[2], ... Y[P] are initialized to arbitrary values. Describe how, in O(lg P) time steps, processor i could compute X+Y[i].)
Let’s say X=X[1], and that X[2],
X[3], ..., X[P] are additional memory locations..
Processor i does nothing for the
first k = 2 lg i timesteps. Then at time k+1, it reads the
value in X[i]; at time k + 2, it will
store it in location X[2k/2 + i]; and finally at time k+3, it can
add it to Y[i].
IMPORTANT: PRAM’s don’t send messages between processors! They only read and write from their shared memory.
2. Suppose that a program for LU decomposition on a problem of size N, using P processors, takes N3/P + 20N2 time.
(a) What does Amdahl’s Law say is the maximum possible speedup?
Unparallelizable
portion is 20N2 of N3 + 20N2. Amdahl’s law
says limit on speedup is the reciprocal, that is (N3 + 20N2) / 20N2 =
(N+20)/20.
(b) What is the parallel efficiency of this program?
Serial
Time / P x Parallel Time = (N3 + 20N2 )/ (P (N3/P
+ 20N2))
= (N + 20) / (N + 20P).
(c) Calculate the isoefficiency function for this program.
Keeping the parallel efficiency constant (say at c), means
(N+20)/(N+20P) = c.
Thus, we want N(1-c)
= 20Pc – 20, i.e. N = (20Pc – 20)/(1-c) = O(P).
4. Suppose message passing in a computer has a latency of 10 microseconds and a bandwidth of 500 MB/sec. Compute what would be the “N1/2 ” message length.
The asymptotic (though
unattainable) best performance you can get is 500 MB/sec.
The question is, what message
length gets half this speed? Well, the time to send a message of length N Bytes
is 10 x 10-6 sec + N/500x106 B/sec = (10 + N/500)x10–6
sec.
Thus, we need to solve N Bytes/((10 + N/500)x10–6
sec) = 250 MB/sec.
A little algebra gets N = 5000
Bytes.
Even though Little’s Law says that the number of bits “in flight” in a communication channel that has latency L and bandwidth B is LB, this has little to do with this problem!
5. Which would you expect to run faster (in terms of MFLOP/sec) on a typical multicomputer: a Finite Difference Method problem or a Finite Element Method problem? Why?
Finite Difference Method. Fewer
loads are needed per floating point operation (since the
coefficients are the same for all
stencils), memory references have better locality, and it’s easier to get good
load balancing among processors.
“Less communication required” isn’t necessarily true.
6. Give example of an important problem for each of the following application areas:
(a) CFD Airflow over car or airplane wing, weather prediction, climate modeling, ...
(b) Structural
dynamics. Car crash simulation, response of buildings
and bridges to earthquakes, ...
(c) Molecular dynamics Protein folding, drug design, ...
Genomics, protein/DNA string matching aren’t molecular
dynamics – they don’t
study the forces on, and movement of, molecules.
(d) Signal processing. Radar, Sonar, SETI, seismic data processing (for earthquake detection or oil exploration), processing of sounds (e.g. talking) in cell phones, ...
“Analyzing
electromagnetic radiation” is a good guess, but it isn’t really considered
signal processing.
7. Here’s a list of US government organizations: DOD, DOE, NASA, NIH, NSF.
For each of the following programs or institutions, tell which government organization from the list is the major sponsor:
(a) NCSA ____NSF_________
(b) DARPA ____DOD_________
(c) ASCI ____DOE_________