Abstract for Tullsen, Eggers, Emer, Levy, Lo, Stamm, "Exploiting Choice:
Instruction Fetch and Issue on an Implementable Simultaneous Multithreading
Processor"

Simultaneous multithreading is a technique that permits multiple independent
threads to issue multiple instructions each cycle.
In previous work we demonstrated the performance potential of simultaneous
multithreading based on a somewhat idealized model. In this paper we show
that the throughput gains from simultaneous multithreading can be achieved
without extensive changes to a conventional wide-issue superscalar, in either
hardware structures or sizes.
We present an architecture for simultaneous multithreading
that achieves three goals: (1) it minimizes the architectural impact on the
conventional superscalar design, (2) it has minimal performance impact on
a single thread executing alone, and (3) it achieves significant throughput gains
when running multiple threads. Our simultaneous multithreading architecture
achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement
over an unmodified superscalar with similar hardware resources.
This speedup is enhanced by an advantage of multithreading previously
unexploited in other architectures: the ability, each cycle, to favor for
fetch and issue those threads that are using the processor most efficiently,
thereby providing the ``best'' instructions to the processor.
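
The idea of favoring some threads for fetch can be illustrated with a minimal
sketch. The abstract does not say how efficiency is measured, so everything
here is an assumption made for illustration: the `ThreadState` record, the
`pending` count (instructions fetched but not yet issued, used as a stand-in
for how well a thread is using the processor), and the choice of two fetch
slots per cycle. It is not presented as the mechanism the paper implements.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int      # hardware thread id
    pending: int  # hypothetical metric: instructions fetched but not yet issued


def pick_fetch_threads(threads, slots=2):
    """Return the ids of up to `slots` threads to fetch from this cycle.

    Assumed heuristic: a thread with few pending instructions is moving
    through the pipeline quickly, so it is preferred. Ties are broken by
    thread id to keep the choice deterministic.
    """
    ranked = sorted(threads, key=lambda t: (t.pending, t.tid))
    return [t.tid for t in ranked[:slots]]


# Example cycle with four threads; threads 1 and 3 have the fewest pending
# instructions, so they are chosen for fetch.
threads = [
    ThreadState(0, 12),
    ThreadState(1, 3),
    ThreadState(2, 7),
    ThreadState(3, 3),
]
chosen = pick_fetch_threads(threads)
print(chosen)  # [1, 3]
```

A real fetch unit would recompute such a priority every cycle from hardware
counters; the sort stands in for that selection logic only.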