# Multiprocessors and Multiprocessing

CSE 240B: Advanced Computer Architecture

Dean Tullsen

### Multiprocessors

- Why would you want a multiprocessor?
- What things can it do well?
- What things can't it do well?
- What things can it do that a *bunch of computers* can't do?
- How much are you willing to pay?

[Figure: three processors, each with its own cache, connected by a single bus to memory and I/O]

### Classifying Multiprocessors

- Flynn Taxonomy
- Interconnection Network
- Memory Topology
- Programming Model

#### Flynn Taxonomy

- SISD (Single Instruction Single Data)
  - Uniprocessors
- MISD (Multiple Instruction Single Data)
  - ???
- SIMD (Single Instruction Multiple Data)
  - Examples: Illiac-IV, CM-2
    - Simple programming model
    - Low overhead
    - All custom
- MIMD (Multiple Instruction Multiple Data)
  - Examples: many, nearly all modern MPs
    - Flexible
    - Use off-the-shelf micros

#### Interconnection Network

- Bus
- Network
- pros/cons?

[Figure: processors with caches sharing a single bus to memory and I/O, versus processor/cache/memory nodes connected by a network]

#### Memory Topology

- UMA (Uniform Memory Access)
- NUMA (Non-uniform Memory Access)
- pros/cons?

[Figure: UMA, with processors and caches sharing one memory and I/O, versus NUMA, with a memory attached to each processor/cache node on a network]

#### Programming Model

- Shared Memory: every processor can name every address location
- Message Passing: each processor can name only its local memory. Communication is through explicit messages.

[Figure: cpu/memory (M) pairs connected by a network]

A shared-memory architecture with a network interconnect is sometimes called *Distributed Shared Memory (DSM)*.
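To make the two programming models concrete, here is a minimal Python sketch that computes the same sum both ways. The function names are hypothetical, and Python threads always share one address space, so the message-passing version is message passing only by discipline: its workers touch nothing but their own chunk and an explicit mailbox.

```python
import queue
import threading


def shared_memory_sum(values, num_workers=4):
    """Shared-memory style: every worker can name the same `total` location."""
    total = [0]              # one location visible to all threads
    lock = threading.Lock()  # synchronization is a separate concern from communication

    def worker(chunk):
        partial = sum(chunk)
        with lock:           # protect the read-modify-write of the shared location
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(values[i::num_workers],))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(values, num_workers=4):
    """Message-passing style: workers use only local data and send explicit messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # the only communication is this explicit send

    threads = [threading.Thread(target=worker, args=(values[i::num_workers],))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One receive per worker gathers the partial results.
    return sum(mailbox.get() for _ in range(num_workers))
```

In the shared-memory version communication is implicit (a store to `total`) and needs a lock; in the message-passing version the send doubles as synchronization.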




### Potential Solutions

- Snooping Solution (Snoopy Bus):
  - Send all requests for unknown data to all processors
  - Processors snoop to see if they have a copy and respond accordingly
  - Requires "broadcast", since caching information is at processors
  - Works well with bus (natural broadcast medium)
  - Dominates for small scale machines (most of the market)
- Directory-Based Schemes
  - Keep track of what is being shared in one logically centralized place (the directory)
  - Distributed memory => distributed directory (avoids bottlenecks)
  - Send point-to-point requests to processors
  - Scales better than snooping
  - Actually existed before snoop-based schemes
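The difference in who has to see a write can be sketched in a few lines of Python. This is an illustrative model only: the `Directory` class and `snoop_write_targets` function are hypothetical names, and the directory here is a single dictionary rather than a distributed structure.

```python
class Directory:
    """Toy directory: records which processors hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block address -> set of processor ids

    def read(self, proc, addr):
        # A read adds the requester to the block's sharer set.
        self.sharers.setdefault(addr, set()).add(proc)

    def write(self, proc, addr):
        """Return the processors that need a point-to-point invalidate."""
        targets = self.sharers.get(addr, set()) - {proc}
        self.sharers[addr] = {proc}  # the writer is now the only holder
        return sorted(targets)


def snoop_write_targets(proc, num_procs):
    """Snooping: the request is broadcast, so every other processor must check it."""
    return [p for p in range(num_procs) if p != proc]
```

With 8 processors and a block shared by processors 0 and 3, a write by processor 0 reaches only processor 3 under the directory scheme but all seven other processors under snooping, which is the scaling argument in the list above.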


# **Basic Snoopy Protocols**

- Write Invalidate Protocol:
  - Write to shared data: an invalidate is sent to all caches which snoop and *invalidate* any copies
  - Read Miss:
    - Write-through: memory is always up-to-date
    - Write-back: snoop in caches to find most recent copy
- Write Update Protocol:
  - Write to shared data: broadcast on bus, processors snoop, and *update* copies
  - Read miss: memory is always up-to-date
- Write serialization: bus serializes requests
  - Bus is single point of arbitration
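The two policies can be compared with a small simulation. The sketch below assumes write-through caches (so memory is always up to date on a read miss) and models the bus as ordinary sequential method calls, which stands in for write serialization; the `Bus` class and its fields are hypothetical names.

```python
class Bus:
    """Toy single-bus system with write-through caches (an assumption of this sketch)."""

    def __init__(self, num_caches, policy):
        assert policy in ("invalidate", "update")
        self.policy = policy
        self.memory = {}                                   # addr -> value
        self.caches = [dict() for _ in range(num_caches)]  # one addr -> value map per cache

    def read(self, cid, addr):
        cache = self.caches[cid]
        if addr not in cache:  # read miss: fetch from memory, which is up to date
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cid, addr, value):
        # Calls execute one at a time, standing in for the bus serializing writes.
        self.memory[addr] = value  # write-through
        self.caches[cid][addr] = value
        for other, cache in enumerate(self.caches):
            if other == cid or addr not in cache:
                continue           # snoop found no copy in this cache
            if self.policy == "invalidate":
                del cache[addr]    # write invalidate: drop the stale copy
            else:
                cache[addr] = value  # write update: refresh the copy in place
```

After a write, an invalidate system leaves other caches without the block (their next read misses), while an update system leaves them holding the new value (their next read hits).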


# **Basic Snoopy Protocols**

- Write Invalidate versus Write Update (Broadcast):
  - Invalidate requires one transaction per write-run (a series of writes by one processor with no intervening access by another)
  - Invalidate exploits spatial locality: one transaction per block
  - Broadcast has lower latency between write and read
  - Broadcast: BW (increased) vs. latency (decreased) tradeoff
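The bandwidth side of this tradeoff can be counted on a small trace. The function below is a simplified, hypothetical model: it tracks blocks rather than words, charges one bus transaction per read miss, per broadcast write (update), or per first write of a write run (invalidate), and ignores evictions.

```python
def count_bus_transactions(trace, policy):
    """Count bus transactions for a trace of (proc, op, block), op in {"R", "W"}."""
    assert policy in ("invalidate", "update")
    holders = {}    # block -> set of processors with a valid copy
    exclusive = {}  # block -> processor holding it exclusively (invalidate only)
    transactions = 0
    for proc, op, block in trace:
        copies = holders.setdefault(block, set())
        if op == "R":
            if proc not in copies:
                transactions += 1            # read miss goes to the bus
                copies.add(proc)
                exclusive.pop(block, None)   # any exclusive owner is now a sharer
        elif policy == "update":
            transactions += 1                # every write is broadcast
            copies.add(proc)
        elif exclusive.get(block) != proc:
            transactions += 1                # first write of a write run
            holders[block] = {proc}          # all other copies are invalidated
            exclusive[block] = proc
    return transactions


# P1 reads block A, P0 writes it four times, then P1 reads it again.
trace = [(1, "R", "A")] + [(0, "W", "A")] * 4 + [(1, "R", "A")]
```

On this trace the invalidate count is 3 (P1's miss, one invalidate for the whole write run, P1's re-read miss) and the update count is 5 (P1's miss plus four broadcasts), but under update P1's final read hits in its own cache, which is the latency benefit.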

# An Example Snoopy Protocol

- Invalidation protocol, write-back cache
- Each block of memory is in one state:
  - Clean in all caches and up-to-date in memory
  - Dirty in exactly one cache
  - Not in any caches
- Each cache block is in one state:
  - (S)hared: block can be read
  - (E)xclusive: cache has the only copy; it is writable and dirty
  - (I)nvalid: block contains no data
- Read misses: cause all caches to snoop
- Writes to a clean line are treated as misses
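The state rules above can be sketched as a small state machine for a single block. The slide does not spell out the transitions, so the ones below (an exclusive owner writes back and drops to shared on a remote read miss, and is invalidated on a remote write miss) are assumptions following the usual textbook form of this three-state protocol; `SnoopySystem` and its `log` are hypothetical names.

```python
class SnoopySystem:
    """Per-cache state for one block: 'I', 'S', or 'E' (exclusive, hence dirty)."""

    def __init__(self, num_caches):
        self.state = ["I"] * num_caches
        self.log = []  # bus events, in the order the bus serialized them

    def read(self, cid):
        if self.state[cid] != "I":
            return "hit"  # S and E blocks can both be read locally
        self.log.append(("read_miss", cid))
        for other in range(len(self.state)):  # every cache snoops the miss
            if other != cid and self.state[other] == "E":
                self.log.append(("write_back", other))  # owner supplies the dirty data
                self.state[other] = "S"
        self.state[cid] = "S"
        return "miss"

    def write(self, cid):
        if self.state[cid] == "E":
            return "hit"  # already the only copy: write locally
        # Writes to a clean (S) or absent (I) line are both treated as misses.
        self.log.append(("write_miss", cid))
        for other in range(len(self.state)):
            if other == cid:
                continue
            if self.state[other] == "E":
                self.log.append(("write_back", other))
            self.state[other] = "I"  # invalidate every other copy
        self.state[cid] = "E"
        return "miss"
```

A short run: two caches read the block (both end in S), one of them writes (it moves to E and the other to I), and a later read by a third cache forces the owner to write back and return to S.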


