Abstract for Tullsen, Brown, "Handling Long-latency Loads in a Simultaneous
Multithreading Processor"
Simultaneous multithreading architectures have been defined
previously with fully shared execution resources. When one thread
in such an architecture experiences a very long-latency operation, such
as a load miss, the thread will eventually stall, potentially holding
resources which other threads could be using to make forward progress.
This paper shows that in many cases it is better to free the resources
associated with a stalled thread rather than keep that thread ready to
immediately begin execution upon return of the loaded data.
Several possible architectures are examined, and some simple solutions
are shown to be very effective, achieving speedups close to 6.0 in some
cases, and averaging 15% speedup with four threads and over 100% speedup with
two threads running. Response times are cut in half for several workloads
in open system experiments.