Abstract for Wallace, Calder, Tullsen, "Threaded Multiple Path Execution"

This paper presents Threaded Multi-Path Execution (TME), which
exploits existing hardware on a Simultaneous Multithreading (SMT)
processor to speculatively execute multiple paths of execution.  When
there are fewer threads in an SMT processor than hardware contexts,
threaded multi-path execution uses spare contexts to fetch and execute
code along the less likely path of hard-to-predict branches.
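
The forking idea above can be illustrated with a small sketch.  This is
not the paper's mechanism or policy: the Branch record, the confidence
counter, the threshold value, and every function name are hypothetical,
chosen only to show spare contexts being assigned to the less likely
path of low-confidence branches.  It also assumes, for simplicity, that
a context is never released once it has been used.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    pc: int                # address of the branch (illustrative)
    predicted_taken: bool  # direction followed by the primary path
    confidence: int        # hypothetical counter; higher = easier to predict


def idle_contexts(total_contexts: int, active_threads: int) -> int:
    """Number of hardware contexts not occupied by a program thread."""
    return max(0, total_contexts - active_threads)


def should_fork(branch: Branch, spare: int, threshold: int = 2) -> bool:
    """Fork only if a context is free and the branch is hard to predict."""
    return spare > 0 and branch.confidence < threshold


def schedule_paths(branches, total_contexts, active_threads, threshold=2):
    """Return (pc, alternate_direction) for each branch given a spare context.

    Simplification: contexts are consumed in program order and never freed.
    """
    spare = idle_contexts(total_contexts, active_threads)
    forked = []
    for b in branches:
        if should_fork(b, spare, threshold):
            # The spare context follows the direction the predictor rejected.
            forked.append((b.pc, not b.predicted_taken))
            spare -= 1
    return forked
```

With eight contexts and six running threads, at most two low-confidence
branches receive an alternate path in this sketch; with all eight
contexts occupied, none do.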

This paper describes the hardware mechanisms needed to enable an SMT
processor to efficiently spawn speculative threads for threaded
multi-path execution.  We describe the Mapping Synchronization Bus,
which enables the spawning of these multiple paths.
Policies are examined for deciding which branches to fork, and for
managing competition between primary and alternate path threads for
critical resources.  Our results show that, for programs with a high
misprediction rate, TME increases the single-program performance of an
SMT with eight thread contexts by 14%-23% on average, depending on the
misprediction penalty.