This paper describes the hardware mechanisms needed to enable a
simultaneous multithreading (SMT) processor to efficiently spawn
speculative threads for threaded multi-path execution (TME). We
describe the Mapping Synchronization Bus, which enables the spawning
of these multiple paths.
We examine policies for deciding which branches to fork and for
managing competition between primary and alternate path threads for
critical resources. Our results show that TME increases the
single-program performance of an SMT processor with eight thread
contexts by 14%-23%
on average, depending on the misprediction penalty, for programs with
a high misprediction rate.