CacheCoherenceProtocols

The protocols that maintain coherence across the caches of multiple processors are called CacheCoherenceProtocols.

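A memory system is coherent if the following three conditions hold. In the notation below, P is a processor, X is a memory location, $P \to X [v]$ means P writes the value v to X, and $P \leftarrow X [v]$ means P reads the value v from X.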

Program order preserved: $P \to X [10], P \leftarrow X [10]$. A processor that writes X and then reads X gets back the value it wrote, provided no other processor wrote to X in between.
Memory coherence: $P_{2} \to X [5], P_{1} \leftarrow X [5]$. A read by one processor returns the value written by another, provided the write came sufficiently before the read and no other write to X occurred in between.
Serialization: writes to the same location are serialized, so two writes to the same location are seen in the same order by all processors.
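The three conditions can be illustrated with a toy model in which every write to X passes through a single ordering point. This is only a sketch: the `Location` class and its method names are hypothetical, and real hardware does not keep a write history like this.

```python
class Location:
    """Toy model of one memory location X shared by several processors."""

    def __init__(self, initial=0):
        # Every write to X, kept in the single order all processors agree on.
        self.writes = [initial]

    def write(self, value):
        # Writes pass through one serialization point (e.g. a shared bus).
        self.writes.append(value)

    def read(self):
        # A read returns the latest write in that agreed order.
        return self.writes[-1]


x = Location()

x.write(10)                 # P  -> X[10]
p_sees = x.read()           # P  <- X[10]: P reads back its own write

x.write(5)                  # P2 -> X[5]
p1_sees = x.read()          # P1 <- X[5]: P1 sees P2's earlier write

x.write(1)                  # P1 writes 1
x.write(2)                  # P2 writes 2
order_seen = x.writes[-2:]  # every processor observes 1 before 2
```

Because there is only one list of writes, no processor can ever observe the last two writes in the opposite order, which is the serialization condition.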

There are two classes of protocols:

Directory Based protocols
Snooping protocols
A common technique, the write invalidate protocol, ensures that a processor has exclusive access to a data item before it writes to that item. It is called an invalidate protocol because it invalidates the other copies on a write. This is by far the most common technique for both snooping protocols and directory based protocols.
A second technique, called write update or write broadcast, updates all the cached copies of a data item when that item is written. An invalidate protocol works on whole cache blocks, while an update protocol must work on individual words.

Because bus and memory bandwidth is usually the commodity most in demand in a bus based multiprocessor, and because invalidation protocols generate less bus and memory traffic than update protocols, invalidation has become the protocol of choice for almost all multiprocessors.
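The traffic difference can be sketched by counting bus transactions for a sequence of writes. This is a simplified model under stated assumptions: other caches are assumed to hold copies of each block, no reads occur between the writes, and both function names are hypothetical.

```python
def bus_transactions_invalidate(writes):
    """Count bus transactions under write invalidate.

    `writes` is a list of (processor, block) pairs, one per word written.
    A processor uses the bus only when it does not already hold the block
    exclusively.
    """
    owner = {}   # block -> processor that currently holds it exclusively
    count = 0
    for proc, block in writes:
        if owner.get(block) != proc:
            count += 1            # one invalidate broadcast gains exclusivity
            owner[block] = proc
    return count


def bus_transactions_update(writes):
    """Count bus transactions under write update: every write is broadcast."""
    return len(writes)


# One processor writing four words of the same block.
same_block = [("P1", "A")] * 4
invalidate_cost = bus_transactions_invalidate(same_block)
update_cost = bus_transactions_update(same_block)

# Two processors taking turns writing the same block.
ping_pong = [("P1", "A"), ("P2", "A")] * 2
ping_pong_cost = bus_transactions_invalidate(ping_pong)
```

Repeated writes by one processor cost a single invalidate but one broadcast per word under update. When two processors alternate writes to the same block, the invalidate protocol loses that advantage in this model.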

Snooping Protocols

Every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept. The caches usually have a shared memory bus and all cache controllers monitor or snoop the bus to determine whether or not they have a copy of the block that is requested on the bus.

This concept works for both reads and writes. On a write invalidate, every other cache snooping the bus invalidates its copy of the block if it holds that address. A cache miss can be more complex. When a request is made, all the other processors monitor the bus, and if one of them holds a dirty copy of the requested block, it writes its copy onto the bus and cancels the memory operation. This means that we can use a write back cache just as easily as a write through cache. Note: with a write through cache, no other processor ever needs to put a dirty block onto the bus, because every write has already been written through to memory (which is much slower, though). For this reason write back caches are usually used in multiprocessor systems.
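The behaviour described above can be sketched as a small simulation of write back caches on a snooped bus. This is a minimal model, not a real protocol implementation: blocks are assumed to be one word, so a writer never needs to fetch the rest of a block, and the `Bus` and `Cache` classes and their methods are hypothetical names.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory      # address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read_miss(self, requester, addr):
        # Every other cache snoops the request. A cache holding a dirty
        # copy supplies it, and the memory read is cancelled.
        for cache in self.caches:
            if cache is not requester:
                value = cache.snoop_read(addr)
                if value is not None:
                    self.memory[addr] = value   # memory picks up the copy too
                    return value
        return self.memory[addr]

    def invalidate(self, requester, addr):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_invalidate(addr)


class Cache:
    def __init__(self, bus):
        self.lines = {}           # address -> [value, dirty]
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.bus.read_miss(self, addr), False]
        return self.lines[addr][0]

    def write(self, addr, value):
        self.bus.invalidate(self, addr)     # write invalidate on every write
        self.lines[addr] = [value, True]    # write back: memory not updated

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line is not None and line[1]:
            line[1] = False                 # copy is clean once it is on the bus
            return line[0]
        return None

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)


memory = {0x10: 0}
bus = Bus(memory)
p1, p2 = Cache(bus), Cache(bus)

p1.write(0x10, 10)                   # P1 holds a dirty copy; memory is stale
stale_memory = memory[0x10]
p2_value = p2.read(0x10)             # P1 supplies the block over the bus
p2.write(0x10, 5)                    # P1's copy is invalidated
p1_holds_block = 0x10 in p1.lines
p1_value = p1.read(0x10)             # P1 misses and gets P2's dirty copy
```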

We further complicate issues by keeping track of whether or not a block is shared. If it is not shared, we do not have to send a write invalidate for it, which saves time on a write. We do this by changing each cache line to the following format:

Invalid-bit | Dirty-bit | Shared-bit | Tag | Data Block 1 | Data Block 2 ...

We turn the shared bit off whenever we send a write invalidate for that block. Then if someone later asks for it, we will give it to them and turn on the shared bit.
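The effect of the shared bit can be sketched as follows. This is an illustrative model only: the `Line` and `SnoopyCache` classes, the `peers` argument standing in for the bus, and the one-value data field are all assumptions, not part of any real design.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    data: int
    invalid: bool = False
    dirty: bool = False
    shared: bool = False


class SnoopyCache:
    def __init__(self):
        self.lines = {}             # tag -> Line
        self.invalidates_sent = 0   # number of write invalidates broadcast

    def write(self, tag, value, peers):
        line = self.lines.get(tag)
        if line is None or line.invalid or line.shared:
            # Other caches may hold copies, so broadcast a write invalidate.
            self.invalidates_sent += 1
            for peer in peers:
                peer.snoop_invalidate(tag)
        # After the write we hold the only copy: shared bit off.
        self.lines[tag] = Line(tag, value, dirty=True, shared=False)

    def read(self, tag, peers, memory):
        line = self.lines.get(tag)
        if line is not None and not line.invalid:
            return line.data
        for peer in peers:
            value = peer.snoop_read(tag)
            if value is not None:
                memory[tag] = value     # the supplied copy also reaches memory
                self.lines[tag] = Line(tag, value, shared=True)
                return value
        self.lines[tag] = Line(tag, memory[tag])
        return memory[tag]

    def snoop_invalidate(self, tag):
        if tag in self.lines:
            self.lines[tag].invalid = True

    def snoop_read(self, tag):
        line = self.lines.get(tag)
        if line is not None and not line.invalid:
            line.shared = True          # someone else now has a copy
            line.dirty = False
            return line.data
        return None


memory = {7: 0}
a, b = SnoopyCache(), SnoopyCache()

a.write(7, 1, peers=[b])     # miss: must broadcast an invalidate
a.write(7, 2, peers=[b])     # shared bit off: no bus traffic
b_value = b.read(7, peers=[a], memory=memory)   # a supplies, turns shared on
a.write(7, 3, peers=[b])     # shared bit on: broadcast again
a.write(7, 4, peers=[b])     # exclusive once more: no bus traffic
```

Of the four writes by `a`, only the two made while the block might be held elsewhere go onto the bus.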

Since every bus transaction must check the cache address tags, snooping could interfere with the CPU's own cache accesses. This potential interference is reduced by one of two techniques: duplicating the tags, or employing a multilevel cache with inclusion, whereby the contents of the levels closer to the CPU are a subset of those further away. Arbitration is still needed in some cases; see Computer Architecture, A Quantitative Approach 3rd Ed. page 556, first and second paragraphs, for more details.

The problem is that snooping does not scale well. Think about having 32 processors gabbing on a single shared bus: it had better be a really fast bus, possibly faster than the CPU.
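A rough back-of-the-envelope sketch shows why the bus becomes the bottleneck. The figure of 5 bus transactions per 100 CPU cycles is a made-up illustration, not a measurement, the function name is hypothetical, and each transaction is assumed to occupy the bus for one bus cycle.

```python
def bus_demand(processors, bus_uses_per_100_cpu_cycles):
    """Bus transactions requested per CPU cycle, summed over all processors."""
    return processors * bus_uses_per_100_cpu_cycles / 100


# Hypothetical load: each processor needs the bus in 5 of every 100 cycles.
demand_4 = bus_demand(4, 5)      # well under one transaction per CPU cycle
demand_32 = bus_demand(32, 5)    # more than one transaction per CPU cycle
```

Once the summed demand exceeds one transaction per CPU cycle, a bus clocked at the CPU's speed cannot keep up under these assumptions, so it would have to run faster than the processors it serves.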

Directory based protocols

The sharing status of a block of physical memory is kept in just one location, called the directory.
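One way to picture this is a table that records, for each block, which processors hold a copy, so that a write only has to notify those processors instead of broadcasting on a bus. The text above does not describe the directory's layout, so the `Directory` class, its methods, and the use of a set of processor ids are all assumptions made for illustration.

```python
class Directory:
    """Toy directory: the one place where sharing status is recorded."""

    def __init__(self):
        self.sharers = {}   # block -> set of processor ids holding a copy

    def record_read(self, proc, block):
        # A processor fetching a block is added to that block's sharers.
        self.sharers.setdefault(block, set()).add(proc)

    def record_write(self, proc, block):
        """Return the processors whose copies must be invalidated."""
        to_invalidate = self.sharers.get(block, set()) - {proc}
        self.sharers[block] = {proc}    # the writer becomes the sole holder
        return to_invalidate


directory = Directory()
directory.record_read(0, "A")
directory.record_read(3, "A")
targets = directory.record_write(0, "A")   # only processor 3 is notified
```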
